180 lines
5.8 KiB
ReStructuredText
180 lines
5.8 KiB
ReStructuredText
Aggregation Examples
|
|
====================
|
|
|
|
There are several methods of performing aggregations in MongoDB. These
|
|
examples cover the new aggregation framework, using map reduce and using the
|
|
group method.
|
|
|
|
.. testsetup::
|
|
|
|
from pymongo import MongoClient
|
|
client = MongoClient()
|
|
client.drop_database('aggregation_example')
|
|
|
|
Setup
|
|
-----
|
|
To start, we'll insert some example data which we can perform
|
|
aggregations on:
|
|
|
|
.. doctest::
|
|
|
|
>>> from pymongo import MongoClient
|
|
>>> db = MongoClient().aggregation_example
|
|
>>> result = db.things.insert_many([{"x": 1, "tags": ["dog", "cat"]},
|
|
... {"x": 2, "tags": ["cat"]},
|
|
... {"x": 2, "tags": ["mouse", "cat", "dog"]},
|
|
... {"x": 3, "tags": []}])
|
|
>>> result.inserted_ids
|
|
[ObjectId('...'), ObjectId('...'), ObjectId('...'), ObjectId('...')]
|
|
|
|
.. _aggregate-examples:
|
|
|
|
Aggregation Framework
|
|
---------------------
|
|
|
|
This example shows how to use the
|
|
:meth:`~pymongo.collection.Collection.aggregate` method to use the aggregation
|
|
framework. We'll perform a simple aggregation to count the number of
|
|
occurrences for each tag in the ``tags`` array, across the entire collection.
|
|
To achieve this we need to pass in three operations to the pipeline.
|
|
First, we need to unwind the ``tags`` array, then group by the tags and
|
|
sum them up, finally we sort by count.
|
|
|
|
As python dictionaries don't maintain order you should use :class:`~bson.son.SON`
|
|
or :class:`collections.OrderedDict` where explicit ordering is required
|
|
eg "$sort":
|
|
|
|
.. note::
|
|
|
|
aggregate requires server version **>= 2.1.0**.
|
|
|
|
.. doctest::
|
|
|
|
>>> from bson.son import SON
|
|
>>> pipeline = [
|
|
... {"$unwind": "$tags"},
|
|
... {"$group": {"_id": "$tags", "count": {"$sum": 1}}},
|
|
... {"$sort": SON([("count", -1), ("_id", -1)])}
|
|
... ]
|
|
>>> import pprint
|
|
>>> pprint.pprint(list(db.things.aggregate(pipeline)))
|
|
[{u'_id': u'cat', u'count': 3},
|
|
{u'_id': u'dog', u'count': 2},
|
|
{u'_id': u'mouse', u'count': 1}]
|
|
|
|
To run an explain plan for this aggregation use the
|
|
:meth:`~pymongo.database.Database.command` method::
|
|
|
|
>>> db.command('aggregate', 'things', pipeline=pipeline, explain=True)
|
|
{u'ok': 1.0, u'stages': [...]}
|
|
|
|
As well as simple aggregations the aggregation framework provides projection
|
|
capabilities to reshape the returned data. Using projections and aggregation,
|
|
you can add computed fields, create new virtual sub-objects, and extract
|
|
sub-fields into the top-level of results.
|
|
|
|
.. seealso:: The full documentation for MongoDB's `aggregation framework
|
|
<http://docs.mongodb.org/manual/applications/aggregation>`_
|
|
|
|
Map/Reduce
|
|
----------
|
|
|
|
Another option for aggregation is to use the map reduce framework. Here we
|
|
will define **map** and **reduce** functions to also count the number of
|
|
occurrences for each tag in the ``tags`` array, across the entire collection.
|
|
|
|
Our **map** function just emits a single `(key, 1)` pair for each tag in
|
|
the array:
|
|
|
|
.. doctest::
|
|
|
|
>>> from bson.code import Code
|
|
>>> mapper = Code("""
|
|
... function () {
|
|
... this.tags.forEach(function(z) {
|
|
... emit(z, 1);
|
|
... });
|
|
... }
|
|
... """)
|
|
|
|
The **reduce** function sums over all of the emitted values for a given key:
|
|
|
|
.. doctest::
|
|
|
|
>>> reducer = Code("""
|
|
... function (key, values) {
|
|
... var total = 0;
|
|
... for (var i = 0; i < values.length; i++) {
|
|
... total += values[i];
|
|
... }
|
|
... return total;
|
|
... }
|
|
... """)
|
|
|
|
.. note:: We can't just return ``values.length`` as the **reduce** function
|
|
might be called iteratively on the results of other reduce steps.
|
|
|
|
Finally, we call :meth:`~pymongo.collection.Collection.map_reduce` and
|
|
iterate over the result collection:
|
|
|
|
.. doctest::
|
|
|
|
>>> result = db.things.map_reduce(mapper, reducer, "myresults")
|
|
>>> for doc in result.find():
|
|
... pprint.pprint(doc)
|
|
...
|
|
{u'_id': u'cat', u'value': 3.0}
|
|
{u'_id': u'dog', u'value': 2.0}
|
|
{u'_id': u'mouse', u'value': 1.0}
|
|
|
|
Advanced Map/Reduce
|
|
-------------------
|
|
|
|
PyMongo's API supports all of the features of MongoDB's map/reduce engine.
|
|
One interesting feature is the ability to get more detailed results when
|
|
desired, by passing `full_response=True` to
|
|
:meth:`~pymongo.collection.Collection.map_reduce`. This returns the full
|
|
response to the map/reduce command, rather than just the result collection:
|
|
|
|
.. doctest::
|
|
|
|
>>> pprint.pprint(
|
|
... db.things.map_reduce(mapper, reducer, "myresults", full_response=True))
|
|
{...u'counts': {u'emit': 6, u'input': 4, u'output': 3, u'reduce': 2},
|
|
u'ok': ...,
|
|
u'result': u'...',
|
|
u'timeMillis': ...}
|
|
|
|
All of the optional map/reduce parameters are also supported, simply pass them
|
|
as keyword arguments. In this example we use the `query` parameter to limit the
|
|
documents that will be mapped over:
|
|
|
|
.. doctest::
|
|
|
|
>>> results = db.things.map_reduce(
|
|
... mapper, reducer, "myresults", query={"x": {"$lt": 2}})
|
|
>>> for doc in results.find():
|
|
... pprint.pprint(doc)
|
|
...
|
|
{u'_id': u'cat', u'value': 1.0}
|
|
{u'_id': u'dog', u'value': 1.0}
|
|
|
|
You can use :class:`~bson.son.SON` or :class:`collections.OrderedDict` to
|
|
specify a different database to store the result collection:
|
|
|
|
.. doctest::
|
|
|
|
>>> from bson.son import SON
|
|
>>> pprint.pprint(
|
|
... db.things.map_reduce(
|
|
... mapper,
|
|
... reducer,
|
|
... out=SON([("replace", "results"), ("db", "outdb")]),
|
|
... full_response=True))
|
|
{...u'counts': {u'emit': 6, u'input': 4, u'output': 3, u'reduce': 2},
|
|
u'ok': ...,
|
|
u'result': {u'collection': ..., u'db': ...},
|
|
u'timeMillis': ...}
|
|
|
|
.. seealso:: The full list of options for MongoDB's `map reduce engine <http://www.mongodb.org/display/DOCS/MapReduce>`_
|