The Aggregation Framework in MongoDB is a powerful tool for data processing and analysis, and it is widely used in PHP applications. It lets you perform complex transformations and aggregations on collections so that you can extract insights from your data. The framework runs the documents of a collection through a pipeline, which consists of stages that define the data processing steps.
Here's an explanation of how the Aggregation Framework works:
Pipeline Stages:
An aggregation pipeline consists of multiple stages, where each stage represents a specific data processing step. Each stage takes input from the previous stage and produces output for the next stage. There are several pipeline stages available in the Aggregation Framework, including $match, $group, $project, $sort, $limit, $skip, and many more.
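In the PHP library, a pipeline is an ordered array in which every element is itself an array with exactly one key, the stage operator. The following sketch builds such a pipeline without connecting to a server. The field names (status, createdAt) and the helper stageNames() are hypothetical and chosen only for illustration.

```php
<?php
// A pipeline is an ordered list of stages; each stage is an array
// with exactly one key naming the stage operator.
$pipeline = [
    ['$match' => ['status' => 'active']],   // hypothetical field
    ['$sort'  => ['createdAt' => -1]],      // hypothetical field
    ['$limit' => 5],
];

// Hypothetical helper: list the stage operators in execution order.
function stageNames(array $pipeline): array
{
    return array_map(fn(array $stage) => array_key_first($stage), $pipeline);
}

echo implode(' -> ', stageNames($pipeline)), "\n";
// prints: $match -> $sort -> $limit
```

Because the stages run in array order, moving a stage changes the result: a $limit placed before $match would cut the input first and filter afterwards.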
Filtering Documents with $match:
The $match stage allows you to filter documents based on specific conditions. It works similarly to the find() method and uses the same query syntax. You can specify criteria using query operators to match documents that satisfy certain conditions.
//...
$collection = (new MongoDB\Client)->mydatabase->mycollection;
$keyword = 'example';
$pipeline = [
    [
        '$match' => [
            'field' => $keyword,
            'otherField' => ['$gt' => 10]
        ]
    ]
];
$options = [];
$cursor = $collection->aggregate($pipeline, $options);
foreach ($cursor as $document) {
    // Process each matching document
}
//...
In the above example, we have a collection named mycollection in the mydatabase database. We want to filter documents based on the values of the fields field and otherField.
Within the $match stage of the pipeline, we specify the fields and their respective values to match. In this case, we keep the documents where field is equal to the value stored in $keyword and otherField is greater than 10.
You can add more fields and conditions within the $match stage to further filter the documents.
You can also pass execution options to aggregate() through the $options parameter, for example maxTimeMS to cap the processing time or allowDiskUse to let large stages write temporary files to disk. Note that aggregate() has no limit or sort option; to cap or order the results, add $limit and $sort stages to the pipeline.
Finally, we execute the aggregation pipeline using the aggregate() method, passing the pipeline and options as parameters. The resulting cursor iterates over the matching documents, and you can process each document as needed.
You can extend this example to include more complex filtering conditions using various operators such as $eq, $gt, $lt, $in, $regex, and logical operators like $and, $or, and $not, depending on your specific filtering needs.
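As a sketch of how several of these operators combine, the following builds a $match stage that uses $regex, $in, $gte, and $or together. The helper buildMatchStage() and the field names (name, status, quantity, featured) are assumptions made for this example and do not come from any particular schema.

```php
<?php
// Hypothetical helper: build a $match stage combining several operators.
function buildMatchStage(string $keyword, array $statuses, int $minQty): array
{
    return [
        '$match' => [
            // name starts with the keyword, case-insensitive
            'name'   => ['$regex' => '^' . preg_quote($keyword), '$options' => 'i'],
            // status is one of the given values
            'status' => ['$in' => array_values($statuses)],
            // at least one of these conditions must hold
            '$or'    => [
                ['quantity' => ['$gte' => $minQty]],
                ['featured' => true],
            ],
        ],
    ];
}

$stage = buildMatchStage('ex', ['active', 'pending'], 10);
echo json_encode($stage, JSON_PRETTY_PRINT), "\n";
```

Conditions listed side by side in the same $match array are combined with an implicit AND, so an explicit $and is only needed when the same field or operator must appear more than once.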
Grouping Documents with $group:
The $group stage enables you to group documents based on a specific field or set of fields. It allows you to perform aggregations like sum, count, average, minimum, maximum, and more on the grouped data. You can use various accumulator operators like $sum, $avg, $min, $max, and $push to perform calculations within the groups.
//...
$collection = (new MongoDB\Client)->mydatabase->mycollection;
$pipeline = [
    [
        '$group' => [
            '_id' => '$category',
            'count' => ['$sum' => 1],
            'totalAmount' => ['$sum' => '$amount']
        ]
    ]
];
$options = [];
$cursor = $collection->aggregate($pipeline, $options);
foreach ($cursor as $document) {
    $category = $document['_id'];
    $count = $document['count'];
    $totalAmount = $document['totalAmount'];
    // Process the grouped data
    echo "Category: $category\n";
    echo "Count: $count\n";
    echo "Total Amount: $totalAmount\n";
    echo "\n";
}
//...
In the above example, we have a collection named mycollection in the mydatabase database. We want to group documents based on the value of the category field.
Within the $group stage of the pipeline, the _id key defines the grouping key. Here it is set to '$category', so documents with the same category value end up in the same group.
Inside the $group stage, we use the $sum operator to perform aggregations within each group. The count field adds 1 for every document, which gives the number of documents per group, and the totalAmount field adds up the amount field within each group.
You can add additional accumulators and calculations within the $group stage to perform various aggregations and transformations on the grouped data.
You can also pass execution options such as maxTimeMS or allowDiskUse through the $options parameter. The number and order of the returned groups are not controlled by options; add $sort and $limit stages after $group for that.
Finally, we execute the aggregation pipeline using the aggregate() method, passing the pipeline and options as parameters. The resulting cursor iterates over the grouped data, and you can access the grouped fields and their respective aggregations for further processing.
You can extend this example to include more complex grouping operations, perform additional aggregations like calculating averages, minimums, maximums, and apply various operators and expressions based on your specific grouping needs.
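To make the semantics of $group concrete, here is a plain-PHP emulation that runs on an in-memory array instead of a collection. It is only an illustration of what the server computes for count, totalAmount, and an added average; the sample documents, the avgAmount field, and the function groupByCategory() are hypothetical.

```php
<?php
// Hypothetical sample documents standing in for a collection.
$docs = [
    ['category' => 'books', 'amount' => 12],
    ['category' => 'books', 'amount' => 8],
    ['category' => 'games', 'amount' => 30],
];

// Plain-PHP emulation of the stage:
// ['$group' => ['_id' => '$category',
//               'count' => ['$sum' => 1],
//               'totalAmount' => ['$sum' => '$amount'],
//               'avgAmount' => ['$avg' => '$amount']]]
function groupByCategory(array $docs): array
{
    $groups = [];
    foreach ($docs as $doc) {
        $key = $doc['category'];
        $groups[$key] ??= ['_id' => $key, 'count' => 0, 'totalAmount' => 0];
        $groups[$key]['count']++;                         // like '$sum' => 1
        $groups[$key]['totalAmount'] += $doc['amount'];   // like '$sum' => '$amount'
    }
    foreach ($groups as &$group) {
        $group['avgAmount'] = $group['totalAmount'] / $group['count'];
    }
    unset($group); // break the reference left by the loop
    return array_values($groups);
}

$result = groupByCategory($docs);
foreach ($result as $group) {
    echo "{$group['_id']}: count={$group['count']}, total={$group['totalAmount']}\n";
}
```

Unlike this emulation, MongoDB does not guarantee the order in which $group emits its groups, so add a $sort stage when the order matters.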
Modifying Documents with $project:
The $project stage allows you to reshape the documents by including or excluding specific fields. You can also create new fields, modify existing fields, rename fields, and perform various data transformations using expressions. It is useful for shaping the output of the aggregation pipeline.
//...
$collection = (new MongoDB\Client)->mydatabase->mycollection;
$pipeline = [
    [
        '$project' => [
            'newField' => '$existingField',
            'newField2' => ['$add' => ['$existingField1', '$existingField2']],
            '_id' => 0 // Exclude _id; unlisted fields such as existingField3 are dropped automatically
        ]
    ]
];
$options = [];
$cursor = $collection->aggregate($pipeline, $options);
foreach ($cursor as $document) {
    // Process each modified document
}
//...
In the above example, we have a collection named mycollection in the mydatabase database. We want to reshape and modify the documents using the $project stage.
Within the $project stage of the pipeline, we specify the fields we want in the output document. In this case, we create a new field newField and assign it the value of an existing field existingField. We also create newField2 by adding the values of existingField1 and existingField2. Every field that is not listed, such as existingField3, is omitted from the output automatically. A single $project stage cannot mix included or computed fields with explicit exclusions ('existingField3' => 0); the only exception is _id, which is included by default and can be switched off with '_id' => 0.
You can add more fields to include or exclude, perform calculations and transformations, rename fields, or even include computed fields using expressions and operators provided by the Aggregation Framework.
You can also pass execution options such as maxTimeMS or allowDiskUse through the $options parameter. To cap or order the projected documents, add $limit and $sort stages to the pipeline; aggregate() has no limit or sort option.
Finally, we execute the aggregation pipeline using the aggregate() method, passing the pipeline and options as parameters. The resulting cursor iterates over the modified documents, and you can process each document as needed.
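One way to keep projections readable is to generate the stage from a list of fields to keep and a map of renames. The sketch below does that; buildProjectStage() and the field names (title, price, author, authorName) are hypothetical and only illustrate the shape of the resulting array.

```php
<?php
// Hypothetical helper: build a $project stage from fields to keep
// and a map of new name => existing field.
function buildProjectStage(array $keep, array $rename = [], bool $hideId = false): array
{
    $spec = [];
    foreach ($keep as $field) {
        $spec[$field] = 1;                 // include the field as-is
    }
    foreach ($rename as $newName => $oldName) {
        $spec[$newName] = '$' . $oldName;  // copy the value under a new name
    }
    if ($hideId) {
        $spec['_id'] = 0;                  // _id may be excluded alongside inclusions
    }
    return ['$project' => $spec];
}

$stage = buildProjectStage(['title', 'price'], ['authorName' => 'author'], true);
echo json_encode($stage), "\n";
// prints: {"$project":{"title":1,"price":1,"authorName":"$author","_id":0}}
```

Renaming here means that the output contains authorName and no longer contains author, because author itself is not listed in the projection.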
Sorting Documents with $sort:
The $sort stage is used to sort the documents based on one or more fields. It enables you to specify the sort order, which can be ascending (1) or descending (-1).
//...
$collection = (new MongoDB\Client)->mydatabase->mycollection;
$pipeline = [
    [
        '$sort' => [
            'field1' => 1, // Ascending order
            'field2' => -1 // Descending order
        ]
    ]
];
$options = [];
$cursor = $collection->aggregate($pipeline, $options);
foreach ($cursor as $document) {
    // Process each sorted document
}
//...
We want to sort the documents based on the values of field1 and field2.
Within the $sort stage of the pipeline, we specify the fields we want to sort on and the sort order. In this case, we sort field1 in ascending order (1) and field2 in descending order (-1).
You can add more fields to the $sort stage to define a multi-field sort order. MongoDB applies the sorting based on the order in which fields are listed within the $sort stage.
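The effect of listing field1 before field2 can be seen with a small in-memory emulation using usort(). This is not how MongoDB sorts internally; it is a sketch with made-up sample documents that shows field2 only acting as a tie-breaker when field1 values are equal.

```php
<?php
// Hypothetical sample documents.
$docs = [
    ['field1' => 1, 'field2' => 5],
    ['field1' => 1, 'field2' => 9],
    ['field1' => 0, 'field2' => 2],
];

// Emulates ['$sort' => ['field1' => 1, 'field2' => -1]].
usort($docs, function (array $a, array $b): int {
    return ($a['field1'] <=> $b['field1'])    // field1 ascending
        ?: ($b['field2'] <=> $a['field2']);   // ties broken by field2 descending
});

foreach ($docs as $doc) {
    echo $doc['field1'], ' ', $doc['field2'], "\n";
}
// prints:
// 0 2
// 1 9
// 1 5
```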
Limiting and Skipping Documents with $limit and $skip:
The $limit stage is used to limit the number of documents returned in the result set, while the $skip stage allows you to skip a certain number of documents before returning the remaining ones. These stages are useful for pagination and controlling the amount of data returned.
//...
$collection = (new MongoDB\Client)->mydatabase->mycollection;
$limit = 10; // Maximum number of documents to return
$skip = 5;   // Number of documents to skip
$pipeline = [
    ['$skip' => $skip],
    ['$limit' => $limit]
];
$options = [];
$cursor = $collection->aggregate($pipeline, $options);
foreach ($cursor as $document) {
    // Process each document that remains after skipping and limiting
}
//...
We want to limit the number of documents returned to 10 and skip the first 5 documents.
Within the pipeline, we use the $skip stage to specify the number of documents to skip ($skip = 5 in this example) and the $limit stage to limit the number of documents to retrieve ($limit = 10 in this example).
You can adjust the values of $skip and $limit based on your specific requirements.
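For pagination, the skip value is usually derived from a page number. The following sketch shows one way to do that; paginationStages() is a hypothetical helper, and sorting by _id is an assumption made here so that page boundaries stay stable between requests.

```php
<?php
// Hypothetical helper: translate a 1-based page number into $skip/$limit stages.
function paginationStages(int $page, int $perPage): array
{
    $page = max(1, $page); // treat anything below 1 as the first page
    return [
        ['$skip'  => ($page - 1) * $perPage],
        ['$limit' => $perPage],
    ];
}

// Sort first so that each page is cut from the same ordering.
$pipeline = array_merge(
    [['$sort' => ['_id' => 1]]],
    paginationStages(3, 10)
);

echo json_encode($pipeline), "\n";
// prints: [{"$sort":{"_id":1}},{"$skip":20},{"$limit":10}]
```

The order of the two stages matters: $skip followed by $limit returns one page, whereas $limit followed by $skip would first cut the input to $perPage documents and then skip into that small set.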
You can also specify additional options when executing the aggregation pipeline using the $options parameter. For example, the maxTimeMS option sets a time limit, in milliseconds, for the server to process the operation.
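As an illustration, an options array for aggregate() might be assembled as follows. The helper aggregateOptions() is hypothetical; maxTimeMS and allowDiskUse are option names accepted by aggregate() in the MongoDB PHP library, and the values used here are arbitrary.

```php
<?php
// Hypothetical helper: assemble execution options for aggregate().
function aggregateOptions(int $maxTimeMs, bool $allowDiskUse = false): array
{
    return [
        'maxTimeMS'    => $maxTimeMs,     // server-side time limit in milliseconds
        'allowDiskUse' => $allowDiskUse,  // let large stages write temporary files to disk
    ];
}

$options = aggregateOptions(5000, true);

// With a real collection this would be passed as the second argument:
// $cursor = $collection->aggregate($pipeline, $options);
```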
Performing Aggregations and Transformations:
You can combine multiple pipeline stages to perform complex aggregations and transformations. By chaining the stages together in the desired order, you can create sophisticated data processing pipelines. Each stage processes the data sequentially, with the output of one stage serving as the input for the next stage.
//...
$collection = (new MongoDB\Client)->mydatabase->mycollection;
$category = 'example';
$limit = 10;
$skip = 5;
$pipeline = [
    ['$match' => ['category' => $category]],
    ['$group' => ['_id' => '$category', 'count' => ['$sum' => 1]]],
    ['$sort' => ['count' => -1]],
    ['$skip' => $skip],
    ['$limit' => $limit]
];
$options = [];
$cursor = $collection->aggregate($pipeline, $options);
foreach ($cursor as $document) {
    // Process each document produced by the pipeline
}
//...
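In the above example, we have a collection named mycollection in the mydatabase database. We want to perform the following operations:
1. Filter documents based on the category field value using the $match stage.
2. Group the filtered documents by the category field and calculate the count using the $group stage.
3. Sort the grouped documents in descending order based on the count using the $sort stage.
4. Skip the first 5 documents using the $skip stage.
5. Limit the result to retrieve only 10 documents using the $limit stage.
Note that because $match here selects a single category, the $group stage produces at most one document, so skipping 5 leaves nothing to return. The example shows the order of the stages; in practice you would filter on one field and group by another so that sorting and paging have several groups to work with.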
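A variant of the combined pipeline that yields several groups could look like the sketch below. It filters on a status field and groups by category; the status field, the helper topCategoriesPipeline(), and the secondary sort on _id are assumptions introduced for this example.

```php
<?php
// Hypothetical helper: count documents per category for one status,
// most frequent categories first, with paging.
function topCategoriesPipeline(string $status, int $skip, int $limit): array
{
    return [
        ['$match' => ['status' => $status]],                              // 1. filter
        ['$group' => ['_id' => '$category', 'count' => ['$sum' => 1]]],  // 2. group and count
        ['$sort'  => ['count' => -1, '_id' => 1]],                        // 3. order, ties by category
        ['$skip'  => $skip],                                              // 4. paging offset
        ['$limit' => $limit],                                             // 5. page size
    ];
}

$pipeline = topCategoriesPipeline('active', 0, 10);

// With a real collection:
// foreach ($collection->aggregate($pipeline) as $row) {
//     echo $row['_id'], ': ', $row['count'], "\n";
// }
```

Placing $match first lets the server discard documents before the more expensive grouping step, and it is the position in which the stage can make use of an index.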
The Aggregation Framework in MongoDB provides a flexible and efficient way to perform data analysis and manipulation. It is particularly useful when you need to process large volumes of data, perform complex aggregations, and transform data within the database itself, reducing the need for multiple round trips to the application layer.
For more details, see the official MongoDB documentation.