mongo-python-driver/doc/api/pymongo/stream_processing.rst

191 lines
6.4 KiB
ReStructuredText

:mod:`stream_processing` -- Atlas Stream Processing
====================================================
.. warning::
The Atlas Stream Processing API is **experimental**. The driver
specification is in Draft status and the API surface — including
retryability behavior, error code mappings, and method signatures —
may change in a backward-incompatible way before the spec is
finalized.
Overview
--------
Atlas Stream Processing (ASP) lets you build continuous, stateful pipelines
that process data from one or more sources in real time. A *stream processing
workspace* is a dedicated Atlas endpoint that hosts one or more named stream
processors; it is distinct from a standard MongoDB cluster and is accessed
through :class:`~pymongo.asynchronous.stream_processing.AsyncStreamProcessingClient`
(or its sync twin :class:`~pymongo.synchronous.stream_processing.StreamProcessingClient`)
rather than ``MongoClient``.
Workspace connection strings use the standard ``mongodb://`` scheme — ``mongodb+srv://``
is not supported. TLS is always required for workspace connections and cannot be
disabled. ``authSource`` defaults to ``"admin"`` if not explicitly set. Users must
hold the ``atlasAdmin`` role to execute ASP commands.
Quickstart
----------
Async
~~~~~
.. code-block:: python
import asyncio
from pymongo import AsyncStreamProcessingClient
async def main():
uri = (
"mongodb://user:pass@"
"atlas-stream-<workspaceId>-<suffix>.<region>.a.query.mongodb.net/"
)
async with AsyncStreamProcessingClient(uri) as client:
sps = client.stream_processors()
await sps.create("demo", pipeline=[
{"$source": {"connectionName": "<conn>", "topic": "events"}},
{"$match": {"value": {"$gt": 100}}},
{"$emit": {"connectionName": "<conn>", "db": "out", "coll": "high"}},
])
proc = sps.get("demo")
await proc.start()
info = await sps.get_info("demo")
print(info.state, info.pipeline_version)
async for doc in proc.sample(limit=10):
print(doc)
await proc.stop()
await proc.drop()
asyncio.run(main())
Sync
~~~~
.. code-block:: python
from pymongo import StreamProcessingClient
uri = (
"mongodb://user:pass@"
"atlas-stream-<workspaceId>-<suffix>.<region>.a.query.mongodb.net/"
)
with StreamProcessingClient(uri) as client:
sps = client.stream_processors()
sps.create("demo", pipeline=[
{"$source": {"connectionName": "<conn>", "topic": "events"}},
{"$match": {"value": {"$gt": 100}}},
{"$emit": {"connectionName": "<conn>", "db": "out", "coll": "high"}},
])
proc = sps.get("demo")
proc.start()
info = sps.get_info("demo")
print(info.state, info.pipeline_version)
for doc in proc.sample(limit=10):
print(doc)
proc.stop()
proc.drop()
Sample cursor semantics
-----------------------
The sample cursor is a custom two-phase protocol distinct from the standard
MongoDB ``getMore`` mechanism. It MUST NOT be confused with standard
:class:`~pymongo.asynchronous.cursor.AsyncCursor` objects.
For most use cases, call :meth:`~pymongo.asynchronous.stream_processing.AsyncStreamProcessor.sample`
to obtain an :class:`~pymongo.asynchronous.stream_processing.AsyncSampleCursor` and iterate it
with ``async for``. The cursor drives the underlying protocol automatically:
- The first iteration sends ``startSampleStreamProcessor``, optionally with a ``limit``.
- Subsequent iterations send ``getMoreSampleStreamProcessor`` with the ``cursorId``
returned by the previous call, optionally with a ``batchSize``.
- When the server returns ``cursorId: 0`` the cursor is exhausted and no further
wire calls are made.
For fine-grained control — tracking ``cursorId`` across calls yourself — use
:meth:`~pymongo.asynchronous.stream_processing.AsyncStreamProcessor.get_stream_processor_samples`
directly. Pass ``cursor_id=0`` and an :exc:`~pymongo.errors.InvalidOperation` is raised
immediately, before any wire call is sent.
Commands not yet supported
--------------------------
The following commands from the ASP server specification are intentionally
deferred and not yet wrapped by this API:
- ``modifyStreamProcessor`` — rename, pipeline replacement, and DLQ reconfiguration
- ``listStreamProcessors`` — enumerate processors in a workspace
- ``listStreamConnections`` — enumerate available connections
- ``processStreamProcessor`` — one-shot ad-hoc pipeline execution
- ``listWorkspaceDefaults`` — fetch workspace tier defaults
Users can still call any of these directly via ``run_command`` on a plain
:class:`~pymongo.asynchronous.mongo_client.AsyncMongoClient` connected to the
workspace endpoint.
Async classes
-------------
.. autoclass:: pymongo.asynchronous.stream_processing.AsyncStreamProcessingClient
:members:
:show-inheritance:
.. autoclass:: pymongo.asynchronous.stream_processing.AsyncStreamProcessors
:members:
:show-inheritance:
.. autoclass:: pymongo.asynchronous.stream_processing.AsyncStreamProcessor
:members:
:show-inheritance:
.. autoclass:: pymongo.asynchronous.stream_processing.AsyncSampleCursor
:members:
:show-inheritance:
Sync classes
------------
.. autoclass:: pymongo.synchronous.stream_processing.StreamProcessingClient
:members:
:show-inheritance:
.. autoclass:: pymongo.synchronous.stream_processing.StreamProcessors
:members:
:show-inheritance:
.. autoclass:: pymongo.synchronous.stream_processing.StreamProcessor
:members:
:show-inheritance:
.. autoclass:: pymongo.synchronous.stream_processing.SampleCursor
:members:
:show-inheritance:
Options and result types
------------------------
.. autoclass:: pymongo.stream_processing_options.CreateStreamProcessorOptions
:members:
:show-inheritance:
.. autoclass:: pymongo.stream_processing_options.StartStreamProcessorOptions
:members:
:show-inheritance:
.. autoclass:: pymongo.stream_processing_options.GetStreamProcessorStatsOptions
:members:
:show-inheritance:
.. autoclass:: pymongo.stream_processing_options.GetStreamProcessorSamplesOptions
:members:
:show-inheritance:
.. autoclass:: pymongo.stream_processing_options.GetStreamProcessorSamplesResult
:members:
:show-inheritance:
.. autoclass:: pymongo.stream_processing_options.StreamProcessorInfo
:members:
:show-inheritance: