430 lines
14 KiB
ReStructuredText
430 lines
14 KiB
ReStructuredText
Async Tutorial
|
|
==============
|
|
|
|
.. warning:: This API is currently in beta, meaning the classes, methods,
|
|
and behaviors described within may change before the full release.
|
|
If you come across any bugs during your use of this API,
|
|
please file a Jira ticket in the "Python Driver" project at https://jira.mongodb.org/browse/PYTHON.
|
|
|
|
.. code-block:: pycon
|
|
|
|
from pymongo import AsyncMongoClient
|
|
|
|
client = AsyncMongoClient()
|
|
await client.drop_database("test-database")
|
|
|
|
This tutorial is intended as an introduction to working with
|
|
**MongoDB** and **PyMongo** using the asynchronous API.
|
|
|
|
Prerequisites
|
|
-------------
|
|
Before we start, make sure that you have the **PyMongo** distribution
|
|
:doc:`installed <installation>`. In the Python shell, the following
|
|
should run without raising an exception:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> import pymongo
|
|
|
|
This tutorial also assumes that a MongoDB instance is running on the
|
|
default host and port. Assuming you have `downloaded and installed
|
|
<https://www.mongodb.com/docs/manual/installation/>`_ MongoDB, you
|
|
can start it like so:
|
|
|
|
.. code-block:: bash
|
|
|
|
$ mongod
|
|
|
|
Making a Connection with AsyncMongoClient
|
|
-----------------------------------------
|
|
The first step when working with **PyMongo** is to create a
|
|
:class:`~pymongo.asynchronous.mongo_client.AsyncMongoClient` to the running **mongod**
|
|
instance. Doing so is easy:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> from pymongo import AsyncMongoClient
|
|
>>> client = AsyncMongoClient()
|
|
|
|
The above code will connect on the default host and port. We can also
|
|
specify the host and port explicitly, as follows:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> client = AsyncMongoClient("localhost", 27017)
|
|
|
|
Or use the MongoDB URI format:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> client = AsyncMongoClient("mongodb://localhost:27017/")
|
|
|
|
By default, :class:`~pymongo.asynchronous.mongo_client.AsyncMongoClient` only connects to the database on its first operation.
|
|
To explicitly connect before performing an operation, use :meth:`~pymongo.asynchronous.mongo_client.AsyncMongoClient.aconnect`:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> client = await AsyncMongoClient().aconnect()
|
|
|
|
Getting a Database
|
|
------------------
|
|
A single instance of MongoDB can support multiple independent
|
|
`databases <https://www.mongodb.com/docs/manual/core/databases-and-collections>`_. When
|
|
working with PyMongo you access databases using attribute style access
|
|
on :class:`~pymongo.asynchronous.mongo_client.AsyncMongoClient` instances:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> db = client.test_database
|
|
|
|
If your database name is such that using attribute style access won't
|
|
work (like ``test-database``), you can use dictionary style access
|
|
instead:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> db = client["test-database"]
|
|
|
|
Getting a Collection
|
|
--------------------
|
|
A `collection <https://www.mongodb.com/docs/manual/core/databases-and-collections>`_ is a
|
|
group of documents stored in MongoDB, and can be thought of as roughly
|
|
the equivalent of a table in a relational database. Getting a
|
|
collection in PyMongo works the same as getting a database:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> collection = db.test_collection
|
|
|
|
or (using dictionary style access):
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> collection = db["test-collection"]
|
|
|
|
An important note about collections (and databases) in MongoDB is that
|
|
they are created lazily - none of the above commands have actually
|
|
performed any operations on the MongoDB server. Collections and
|
|
databases are created when the first document is inserted into them.
|
|
|
|
Documents
|
|
---------
|
|
Data in MongoDB is represented (and stored) using JSON-style
|
|
documents. In PyMongo we use dictionaries to represent documents. As
|
|
an example, the following dictionary might be used to represent a blog
|
|
post:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> import datetime
|
|
>>> post = {
|
|
... "author": "Mike",
|
|
... "text": "My first blog post!",
|
|
... "tags": ["mongodb", "python", "pymongo"],
|
|
... "date": datetime.datetime.now(tz=datetime.timezone.utc),
|
|
... }
|
|
|
|
Note that documents can contain native Python types (like
|
|
:class:`datetime.datetime` instances) which will be automatically
|
|
converted to and from the appropriate `BSON
|
|
<https://bsonspec.org/>`_ types.
|
|
|
|
Inserting a Document
|
|
--------------------
|
|
To insert a document into a collection we can use the
|
|
:meth:`~pymongo.asynchronous.collection.AsyncCollection.insert_one` method:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> posts = db.posts
|
|
>>> post_id = (await posts.insert_one(post)).inserted_id
|
|
>>> post_id
|
|
ObjectId('...')
|
|
|
|
When a document is inserted a special key, ``"_id"``, is automatically
|
|
added if the document doesn't already contain an ``"_id"`` key. The value
|
|
of ``"_id"`` must be unique across the
|
|
collection. :meth:`~pymongo.asynchronous.collection.AsyncCollection.insert_one` returns an
|
|
instance of :class:`~pymongo.results.InsertOneResult`. For more information
|
|
on ``"_id"``, see the `documentation on _id
|
|
<https://www.mongodb.com/docs/manual/reference/method/ObjectId/>`_.
|
|
|
|
After inserting the first document, the *posts* collection has
|
|
actually been created on the server. We can verify this by listing all
|
|
of the collections in our database:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> await db.list_collection_names()
|
|
['posts']
|
|
|
|
Getting a Single Document With :meth:`~pymongo.asynchronous.collection.AsyncCollection.find_one`
|
|
------------------------------------------------------------------------------------------------
|
|
The most basic type of query that can be performed in MongoDB is
|
|
:meth:`~pymongo.asynchronous.collection.AsyncCollection.find_one`. This method returns a
|
|
single document matching a query (or ``None`` if there are no
|
|
matches). It is useful when you know there is only one matching
|
|
document, or are only interested in the first match. Here we use
|
|
:meth:`~pymongo.asynchronous.collection.AsyncCollection.find_one` to get the first
|
|
document from the posts collection:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> import pprint
|
|
>>> pprint.pprint(await posts.find_one())
|
|
{'_id': ObjectId('...'),
|
|
'author': 'Mike',
|
|
'date': datetime.datetime(...),
|
|
'tags': ['mongodb', 'python', 'pymongo'],
|
|
'text': 'My first blog post!'}
|
|
|
|
The result is a dictionary matching the one that we inserted previously.
|
|
|
|
.. note:: The returned document contains an ``"_id"``, which was
|
|
automatically added on insert.
|
|
|
|
:meth:`~pymongo.asynchronous.collection.AsyncCollection.find_one` also supports querying
|
|
on specific elements that the resulting document must match. To limit
|
|
our results to a document with author "Mike" we do:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> pprint.pprint(await posts.find_one({"author": "Mike"}))
|
|
{'_id': ObjectId('...'),
|
|
'author': 'Mike',
|
|
'date': datetime.datetime(...),
|
|
'tags': ['mongodb', 'python', 'pymongo'],
|
|
'text': 'My first blog post!'}
|
|
|
|
If we try with a different author, like "Eliot", we'll get no result:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> await posts.find_one({"author": "Eliot"})
|
|
>>>
|
|
|
|
.. _async-querying-by-objectid:
|
|
|
|
Querying By ObjectId
|
|
--------------------
|
|
We can also find a post by its ``_id``, which in our example is an ObjectId:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> post_id
|
|
ObjectId(...)
|
|
>>> pprint.pprint(await posts.find_one({"_id": post_id}))
|
|
{'_id': ObjectId('...'),
|
|
'author': 'Mike',
|
|
'date': datetime.datetime(...),
|
|
'tags': ['mongodb', 'python', 'pymongo'],
|
|
'text': 'My first blog post!'}
|
|
|
|
Note that an ObjectId is not the same as its string representation:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> post_id_as_str = str(post_id)
|
|
>>> await posts.find_one({"_id": post_id_as_str}) # No result
|
|
>>>
|
|
|
|
A common task in web applications is to get an ObjectId from the
|
|
request URL and find the matching document. It's necessary in this
|
|
case to **convert the ObjectId from a string** before passing it to
|
|
``find_one``::
|
|
|
|
from bson.objectid import ObjectId
|
|
|
|
# The web framework gets post_id from the URL and passes it as a string
|
|
async def get(post_id):
|
|
# Convert from string to ObjectId:
|
|
document = await client.db.collection.find_one({'_id': ObjectId(post_id)})
|
|
|
|
.. seealso:: :ref:`web-application-querying-by-objectid`
|
|
|
|
Bulk Inserts
|
|
------------
|
|
In order to make querying a little more interesting, let's insert a
|
|
few more documents. In addition to inserting a single document, we can
|
|
also perform *bulk insert* operations, by passing a list as the
|
|
first argument to :meth:`~pymongo.asynchronous.collection.AsyncCollection.insert_many`.
|
|
This will insert each document in the list, sending only a single
|
|
command to the server:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> new_posts = [
|
|
... {
|
|
... "author": "Mike",
|
|
... "text": "Another post!",
|
|
... "tags": ["bulk", "insert"],
|
|
... "date": datetime.datetime(2009, 11, 12, 11, 14),
|
|
... },
|
|
... {
|
|
... "author": "Eliot",
|
|
... "title": "MongoDB is fun",
|
|
... "text": "and pretty easy too!",
|
|
... "date": datetime.datetime(2009, 11, 10, 10, 45),
|
|
... },
|
|
... ]
|
|
>>> result = await posts.insert_many(new_posts)
|
|
>>> result.inserted_ids
|
|
[ObjectId('...'), ObjectId('...')]
|
|
|
|
There are a couple of interesting things to note about this example:
|
|
|
|
- The result from :meth:`~pymongo.asynchronous.collection.AsyncCollection.insert_many` now
|
|
returns two :class:`~bson.objectid.ObjectId` instances, one for
|
|
each inserted document.
|
|
- ``new_posts[1]`` has a different "shape" than the other posts -
|
|
there is no ``"tags"`` field and we've added a new field,
|
|
``"title"``. This is what we mean when we say that MongoDB is
|
|
*schema-free*.
|
|
|
|
Querying for More Than One Document
|
|
-----------------------------------
|
|
To get more than a single document as the result of a query we use the
|
|
:meth:`~pymongo.asynchronous.collection.AsyncCollection.find`
|
|
method. :meth:`~pymongo.asynchronous.collection.AsyncCollection.find` returns a
|
|
:class:`~pymongo.asynchronous.cursor.AsyncCursor` instance, which allows us to iterate
|
|
over all matching documents. For example, we can iterate over every
|
|
document in the ``posts`` collection:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> async for post in posts.find():
|
|
... pprint.pprint(post)
|
|
...
|
|
{'_id': ObjectId('...'),
|
|
'author': 'Mike',
|
|
'date': datetime.datetime(...),
|
|
'tags': ['mongodb', 'python', 'pymongo'],
|
|
'text': 'My first blog post!'}
|
|
{'_id': ObjectId('...'),
|
|
'author': 'Mike',
|
|
'date': datetime.datetime(...),
|
|
'tags': ['bulk', 'insert'],
|
|
'text': 'Another post!'}
|
|
{'_id': ObjectId('...'),
|
|
'author': 'Eliot',
|
|
'date': datetime.datetime(...),
|
|
'text': 'and pretty easy too!',
|
|
'title': 'MongoDB is fun'}
|
|
|
|
Just like we did with :meth:`~pymongo.asynchronous.collection.AsyncCollection.find_one`,
|
|
we can pass a document to :meth:`~pymongo.asynchronous.collection.AsyncCollection.find`
|
|
to limit the returned results. Here, we get only those documents whose
|
|
author is "Mike":
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> async for post in posts.find({"author": "Mike"}):
|
|
... pprint.pprint(post)
|
|
...
|
|
{'_id': ObjectId('...'),
|
|
'author': 'Mike',
|
|
'date': datetime.datetime(...),
|
|
'tags': ['mongodb', 'python', 'pymongo'],
|
|
'text': 'My first blog post!'}
|
|
{'_id': ObjectId('...'),
|
|
'author': 'Mike',
|
|
'date': datetime.datetime(...),
|
|
'tags': ['bulk', 'insert'],
|
|
'text': 'Another post!'}
|
|
|
|
Counting
|
|
--------
|
|
If we just want to know how many documents match a query we can
|
|
perform a :meth:`~pymongo.asynchronous.collection.AsyncCollection.count_documents` operation
|
|
instead of a full query. We can get a count of all of the documents
|
|
in a collection:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> await posts.count_documents({})
|
|
3
|
|
|
|
or just of those documents that match a specific query:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> await posts.count_documents({"author": "Mike"})
|
|
2
|
|
|
|
Range Queries
|
|
-------------
|
|
MongoDB supports many different types of `advanced queries
|
|
<https://www.mongodb.com/docs/manual/reference/operator/>`_. As an
|
|
example, lets perform a query where we limit results to posts older
|
|
than a certain date, but also sort the results by author:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> d = datetime.datetime(2009, 11, 12, 12)
|
|
>>> async for post in posts.find({"date": {"$lt": d}}).sort("author"):
|
|
... pprint.pprint(post)
|
|
...
|
|
{'_id': ObjectId('...'),
|
|
'author': 'Eliot',
|
|
'date': datetime.datetime(...),
|
|
'text': 'and pretty easy too!',
|
|
'title': 'MongoDB is fun'}
|
|
{'_id': ObjectId('...'),
|
|
'author': 'Mike',
|
|
'date': datetime.datetime(...),
|
|
'tags': ['bulk', 'insert'],
|
|
'text': 'Another post!'}
|
|
|
|
Here we use the special ``"$lt"`` operator to do a range query, and
|
|
also call :meth:`~pymongo.asynchronous.cursor.AsyncCursor.sort` to sort the results
|
|
by author.
|
|
|
|
Indexing
|
|
--------
|
|
|
|
Adding indexes can help accelerate certain queries and can also add additional
|
|
functionality to querying and storing documents. In this example, we'll
|
|
demonstrate how to create a `unique index
|
|
<https://mongodb.com/docs/manual/core/index-unique/>`_ on a key that rejects
|
|
documents whose value for that key already exists in the index.
|
|
|
|
First, we'll need to create the index:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> result = await db.profiles.create_index([("user_id", pymongo.ASCENDING)], unique=True)
|
|
>>> sorted(list(await db.profiles.index_information()))
|
|
['_id_', 'user_id_1']
|
|
|
|
Notice that we have two indexes now: one is the index on ``_id`` that MongoDB
|
|
creates automatically, and the other is the index on ``user_id`` we just
|
|
created.
|
|
|
|
Now let's set up some user profiles:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> user_profiles = [{"user_id": 211, "name": "Luke"}, {"user_id": 212, "name": "Ziltoid"}]
|
|
>>> result = await db.profiles.insert_many(user_profiles)
|
|
|
|
The index prevents us from inserting a document whose ``user_id`` is already in
|
|
the collection:
|
|
|
|
.. code-block:: pycon
|
|
|
|
>>> new_profile = {"user_id": 213, "name": "Drew"}
|
|
>>> duplicate_profile = {"user_id": 212, "name": "Tommy"}
|
|
>>> result = await db.profiles.insert_one(new_profile) # This is fine.
|
|
>>> result = await db.profiles.insert_one(duplicate_profile)
|
|
Traceback (most recent call last):
|
|
DuplicateKeyError: E11000 duplicate key error index: test_database.profiles.$user_id_1 dup key: { : 212 }
|
|
|
|
.. seealso:: The MongoDB documentation on `indexes <https://www.mongodb.com/docs/manual/indexes/>`_
|
|
|
|
Task Cancellation
|
|
-----------------
|
|
`Cancelling <https://docs.python.org/3/library/asyncio-task.html#task-cancellation>`_ an asyncio Task
|
|
that is running a PyMongo operation is treated as a fatal interrupt. Any connections, cursors, and transactions
|
|
involved in a cancelled Task will be safely closed and cleaned up as part of the cancellation. If those resources are
|
|
also used elsewhere, attempting to utilize them after the cancellation will result in an error.
|