4- MongoDB aggregation framework (1)
MongoDB
Aggregation Framework
What is the MongoDB aggregation framework?
Why do we need the Aggregation Framework?
• We often want to aggregate data, that is, group or transform it in some way, rather than only
filter for the right documents.
• Aggregation also lets us compute new values.
• With MQL: we can filter and update data.
• With the Aggregation Framework: we can compute and reshape data.
Example
Find all documents that list Wi-Fi among the amenities, returning only the price and address.
The same query is written first with find() and then as an aggregation pipeline:
db.listingsAndReviews.find(
{ amenities : 'Wifi' },
{ price : 1, address : 1, _id : 0 }
)
db.listingsAndReviews.aggregate(
[ { $match : { amenities : 'Wifi' } },
{ $project : { price : 1, address : 1, _id : 0 } } ]
)
Pipeline
[Figure: documents flow through a pipeline of stages, for example $match, $project, $group]
Aggregation Structure and Syntax
Syntax: db.collection.aggregate( [ { <stage1> }, { <stage2> }, ... ], { <options> } )
Example:
db.solarSystem.aggregate( [
{ $match : { atmosphericComposition: { $in : [/O2/] }, meanTemperature: { $gte : -40, $lte :40} } },
{ $project : { _id : 0, name : 1, hasMoons: { $gt : [ '$numberOfMoons', 0 ] } } }
],
{ allowDiskUse : true }
)
• A pipeline is always an array of one or more stages.
• Stages are composed of one or more aggregation operators or expressions.
• Expressions may take a single argument or an array of arguments.
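To make the structure concrete, here is a minimal in-memory sketch in plain JavaScript (runnable with Node.js, no MongoDB needed). It only illustrates the idea that a pipeline is an ordered array of stages, each consuming the previous stage's output. The helpers matchStage, projectStage and runPipeline and the sample documents are hypothetical; they are not part of the MongoDB API, and the server's real execution is more sophisticated.

```javascript
// Hypothetical sample documents (not taken from a real collection).
const listings = [
  { _id: 1, price: 80, address: { city: 'Porto' }, amenities: ['Wifi', 'TV'] },
  { _id: 2, price: 120, address: { city: 'Sydney' }, amenities: ['Kitchen'] },
  { _id: 3, price: 60, address: { city: 'Hanoi' }, amenities: ['Wifi'] },
];

// Simplified stand-in for { $match : { <arrayField> : <value> } }.
const matchStage = (field, value) => (docs) =>
  docs.filter((doc) => Array.isArray(doc[field]) && doc[field].includes(value));

// Simplified stand-in for an inclusion-only $project with _id : 0.
const projectStage = (fields) => (docs) =>
  docs.map((doc) => Object.fromEntries(fields.map((f) => [f, doc[f]])));

// A pipeline is an ordered array of stages; each stage receives the
// output of the stage before it.
const runPipeline = (docs, stages) =>
  stages.reduce((stream, stage) => stage(stream), docs);

const result = runPipeline(listings, [
  matchStage('amenities', 'Wifi'),
  projectStage(['price', 'address']),
]);
console.log(result);
```

Swapping the two stages in this sketch would return nothing, because the projection drops the amenities field before the filter can see it. Stage order matters in real pipelines for the same reason.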
Common Pipeline Stages
Stage      Description
$match     Filters the documents so that only those matching the specified condition(s) pass to the next pipeline stage.
$project   Reshapes each document in the stream, for example by adding new fields or removing existing fields.
$group     Groups the documents of a collection by a specified expression; can be used to compute statistics.
$unwind    Deconstructs an array field from the input documents to output one document for each element.
$lookup    Performs a left outer join to an unsharded collection in the same database to filter in documents from the "joined" collection for processing.
$redact    Restricts the contents of the documents based on information stored in the documents themselves.
$out       Takes the documents returned by the aggregation pipeline and writes them to a specified collection.
$merge     Writes the results of the aggregation pipeline to a specified collection.
$match stage
• Filters the document stream so that only matching documents pass, unmodified, into the next
pipeline stage.
• Place $match as early in the aggregation pipeline as possible.
• $match can be used multiple times in a pipeline.
• $match uses standard MongoDB query operators.
• You cannot use $where within $match.
Example:
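// Students (collection sinhvien) whose name (field ten) is 'Nở'
db.sinhvien.aggregate( [ { $match : { ten : { $eq : 'Nở' } } } ] )
// Same filter, then count the matches into TongSoSV (total number of students)
db.sinhvien.aggregate( [
{ $match : { ten : { $eq : 'Nở' } } },
{ $count : 'TongSoSV' }
])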
db.sinhvien.aggregate( [
{ $match : {
$and : [
{ 'lienLac.email' : 'teo@gmail.com' },
{ _id : { $eq : '57' } }
]
}
}])
$project stage - Shaping documents
Syntax: { $project : { <specification(s)> } }
• With the $project stage we can selectively remove or retain fields, reassign existing field values, and
derive entirely new fields.
Example:
db.solarSystem.aggregate( [ { $project : { _id : 0, name : 1, gravity: 1} } ] )
db.solarSystem.aggregate( [ { $project : { _id : 0, name : 1, 'gravity.value' : 1} } ] )
db.solarSystem.aggregate( [ { $project : { _id : 0, name : 1, surfacegravity: '$gravity.value' } } ] )
db.solarSystem.aggregate( [ { $project : { _id : 0, name : 1, new_surfacegravity: {
$multiply: [
{ $divide: [ '$gravity.value', 10 ] }, 100
]
}}}])
Accumulator Expression with $project stage
• Accumulator expressions within $project operate over an array within the
given document; they do not carry values across documents.
• Some accumulator expressions: $avg, $min, $max, $sum, …
• The example below uses the 'icecream_data' collection in the dbtest database.
Example:
db.icecream_data.aggregate( [ { $project : { max_high: { $max: '$trends.avg_high_tmp' } } } ] )
[ { _id : ObjectId('59bff494f70ff89cacc36f90'), max_high: 87 } ]
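The following plain JavaScript sketch (Node.js, no server) mimics what happens to a single document: a path such as '$trends.avg_high_tmp' through an array of subdocuments resolves to an array of values, and the accumulator reduces that array. The document shape and its numbers are assumptions made for illustration, and resolveArrayPath is a hypothetical helper, not a MongoDB function.

```javascript
// Hypothetical document, shaped the way the path '$trends.avg_high_tmp' suggests.
const doc = {
  _id: 1,
  trends: [
    { month: 'January', avg_high_tmp: 42 },
    { month: 'July', avg_high_tmp: 87 },
    { month: 'October', avg_high_tmp: 65 },
  ],
};

// Collect one sub-field from every element of an array of subdocuments.
const resolveArrayPath = (document, arrayField, subField) =>
  document[arrayField].map((element) => element[subField]);

// Within $project the accumulator only sees this one document's array.
const projected = {
  _id: doc._id,
  max_high: Math.max(...resolveArrayPath(doc, 'trends', 'avg_high_tmp')),
};
console.log(projected);
```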
$group stage
Syntax: { $group : { _id : <expression>, <field1>: { <accumulator1> : <expression1> }, ... } }
Field    Description
_id      Required. If you specify an _id value of null, or any other constant value, the $group stage calculates accumulated values for all the input documents as a whole.
field    Optional. Computed using the accumulator operators.
• Groups input documents by the specified _id expression and for each distinct grouping, outputs a
document.
• The _id field of each output document contains the unique group by value.
• The output documents can also contain computed fields that hold the values of some accumulator
expression.
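As a rough illustration of these rules, the sketch below groups an in-memory array in plain JavaScript (Node.js). It keeps one bucket per distinct key and updates a count (like { $sum : 1 }) and an average (like $avg) as documents arrive. The group function and the sample data are hypothetical, and the averaging is simplified: it assumes every document has a numeric value.

```javascript
// Hypothetical sample documents.
const movies = [
  { title: 'A', year: 1999, metacritic: 70 },
  { title: 'B', year: 1999, metacritic: 90 },
  { title: 'C', year: 2004, metacritic: 60 },
];

// Group by keyOf(doc); accumulate a count and a running total per group.
function group(docs, keyOf, valueOf) {
  const buckets = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    const bucket = buckets.get(key) || { _id: key, count: 0, total: 0 };
    bucket.count += 1;
    bucket.total += valueOf(doc);
    buckets.set(key, bucket);
  }
  // One output document per distinct _id value.
  return [...buckets.values()].map(({ _id, count, total }) => ({
    _id,
    count,
    average: total / count,
  }));
}

// Like _id : '$year'
const byYear = group(movies, (m) => m.year, (m) => m.metacritic);
// Like _id : null, which puts every document in a single group
const overall = group(movies, () => null, (m) => m.metacritic);
console.log(byYear, overall);
```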
Example:
db.movies.aggregate( [ { $group : { _id : '$year', 'numFilmsThisYear' : { $sum: 1 } } } ] )
db.movies.aggregate( [
{ $group : {
_id : '$year' ,
'numFilmsThisYear': { $sum: 1 }
} },
{ $sort: { _id : 1} }
])
• Grouping as before, then sorting in descending order based on the count
db.movies.aggregate( [
{ $group : { _id : '$year', count : { $sum : 1 } } },
{ $sort : { count : -1} }
])
• Grouping on the number of directors a film has, demonstrating that we have to validate types to protect some
expressions
db.movies.aggregate( [
{ $group : {
_id : { numDirectors : { $cond : [ { $isArray : '$directors' }, { $size : '$directors' }, 0 ] } },
numFilms : { $sum : 1 }, averageMetacritic: { $avg : '$metacritic' } } },
{ $sort : { '_id.numDirectors' : -1 } }
])
• Grouping on multiple fields
db.movies.aggregate( [
{ $group :
{ _id : { year : '$year', type : '$type' },
count : { $sum : 1 }, title : { $first : '$title' }
} },
{ $sort : { count : -1 } }
])
• Grouping all documents together. By convention, _id is set to null or an empty string.
db.movies.aggregate( [ { $group : { _id : null, count : { $sum: 1 } } } ] )
$unwind stage
Syntax: { $unwind : <field path> }
• Deconstructs an array field from the input documents to output a document for each element. Each output
document is the input document with the value of the array field replaced by the element.
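A minimal in-memory sketch of this behaviour in plain JavaScript (Node.js) is shown here. The unwind function and the sample documents are hypothetical. It follows the default behaviour as I understand it, where a document whose array is missing, null or empty produces no output; the options of the real stage are not modelled.

```javascript
// Hypothetical sample documents.
const movies = [
  { title: 'A', year: 1999, genres: ['Drama', 'Comedy'] },
  { title: 'B', year: 2004, genres: ['Action'] },
  { title: 'C', year: 2010, genres: [] },
];

// One output document per array element; the array field is replaced
// by that single element. Missing, null or empty arrays yield nothing.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    if (value === undefined || value === null) return [];
    const elements = Array.isArray(value) ? value : [value];
    return elements.map((element) => ({ ...doc, [field]: element }));
  });
}

const unwound = unwind(movies, 'genres');
console.log(unwound);
```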
• How do we group on the year and genres of the movies collection?
db.movies.aggregate( [
{ $unwind : '$genres' },
{ $group : { _id : { year : '$year', genre : '$genres' }, numFilms : { $sum : 1 } } },
{ $sort : { '_id.genre' : 1, numFilms: -1} }
])
• Finding the top rated genres per year from 1990 to 2015...
db.movies.aggregate( [
{ $match : { 'imdb.rating' : { $gt : 0 }, year: { $gte : 1990, $lte : 2015 }, runtime : { $gte : 90 } } },
{ $unwind : '$genres' },
{ $group : { _id : { year : '$year' , genre : '$genres' }, average_rating : { $avg: '$imdb.rating' } } },
{ $sort : { '_id.year' : -1, average_rating : -1 } }
])
Recap on a few things:
• $unwind is meant for array fields. Since MongoDB 3.2, a non-array value that is not null or missing is
treated as a single-element array.
• Using $unwind on large collections with big documents may lead to performance issues.
$lookup stage
Syntax: equality match with a single join condition
{ $lookup : {
from : <collection to join>,
localField : <field from the input documents>,
foreignField : <field from the documents of the 'from' collection>,
as : <output array field>
}
}
• Performs a left outer join to an unsharded collection in the same database to filter in documents from the
'joined' collection for processing.
• To each input document, the $lookup stage adds a new array field whose elements are the matching
documents from the 'joined' collection.
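The left outer join can be pictured with the following plain JavaScript sketch (Node.js, in-memory arrays instead of collections). The lookup function and the sample data are hypothetical; in particular, `from` is an array here rather than a collection name. Note that every input document survives, even when its output array is empty.

```javascript
// Hypothetical sample data.
const orders = [
  { _id: 1, item: 'almonds', qty: 2 },
  { _id: 2, item: 'pecans', qty: 1 },
  { _id: 3, item: 'walnuts', qty: 5 },
];
const inventory = [
  { sku: 'almonds', instock: 120 },
  { sku: 'pecans', instock: 70 },
  { sku: 'almonds', instock: 15 },
];

// Left outer join: keep every input document and add an array field
// holding the matching documents of the joined collection.
function lookup(inputDocs, { from, localField, foreignField, as }) {
  return inputDocs.map((doc) => {
    const local = doc[localField];
    // An array-valued localField matches if any element equals the foreign value.
    const wanted = Array.isArray(local) ? local : [local];
    const matches = from.filter((foreign) => wanted.includes(foreign[foreignField]));
    return { ...doc, [as]: matches };
  });
}

const joined = lookup(orders, {
  from: inventory,
  localField: 'item',
  foreignField: 'sku',
  as: 'stock',
});
console.log(JSON.stringify(joined, null, 2));
```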
Example:
db.air_alliances.aggregate( [
{ $lookup : {
from : 'air_airlines',
localField : 'airlines',
foreignField : 'name',
as : 'airlines' }
}
])
Example:
db.air_alliances.aggregate( [
{ $match : { name : 'SkyTeam' } },
{ $lookup : {
from : 'air_airlines',
localField : 'airlines',
foreignField : 'name',
as : 'airlines' }
}
])
Syntax: correlated subqueries using concise syntax (New in version 5.0)
{ $lookup : {
from : <collection to join>,
localField : <field from local collection's documents>,
foreignField : <field from foreign collection's documents>,
let : { <var_1>: <expression>, …, <var_n>: <expression> },
pipeline : [ <pipeline to run> ],
as : <output array field>
}
}
• let: Optional. Specifies variables to use in the pipeline stages. Use variable expressions to access the
fields of the input documents (the documents entering the $lookup stage) from inside the pipeline.
• pipeline: Determines the resulting documents from the joined collection. To return all documents, specify an
empty pipeline []. The pipeline cannot directly access the input documents' fields. Instead, define variables for
those fields with the let option and reference them in the pipeline stages as $$<variable>.
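How let and pipeline cooperate can be sketched in plain JavaScript (Node.js, in-memory arrays). The function lookupWithPipeline, its letVars and subPipeline parameters, and the sample data are all hypothetical names chosen for illustration. The point is that the sub-pipeline runs over the joined collection and sees the input document only through the variables captured by let.

```javascript
// Hypothetical sample data.
const itemOrders = [
  { _id: 1, item: 'filet', ordered: 30 },
  { _id: 2, item: 'lobster', ordered: 80 },
];
const warehouses = [
  { _id: 1, stock_item: 'filet', warehouse: 'A', instock: 120 },
  { _id: 2, stock_item: 'filet', warehouse: 'B', instock: 20 },
  { _id: 3, stock_item: 'lobster', warehouse: 'A', instock: 40 },
];

// letVars plays the role of `let`: it captures input-document fields as variables.
// subPipeline plays the role of `pipeline`: it runs over the joined collection
// and receives only those variables, never the input document itself.
function lookupWithPipeline(inputDocs, { from, letVars, subPipeline, as }) {
  return inputDocs.map((doc) => {
    const vars = letVars(doc);
    return { ...doc, [as]: subPipeline(from, vars) };
  });
}

const result = lookupWithPipeline(itemOrders, {
  from: warehouses,
  letVars: (order) => ({ order_item: order.item, order_qty: order.ordered }),
  subPipeline: (docs, vars) =>
    docs
      // Stand-in for $match with $expr: same item and enough stock.
      .filter((w) => w.stock_item === vars.order_item && w.instock >= vars.order_qty)
      // Stand-in for an exclusion $project dropping stock_item and _id.
      .map(({ stock_item, _id, ...rest }) => rest),
  as: 'stockdata',
});
console.log(JSON.stringify(result, null, 2));
```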
Examples:
db.air_alliances.aggregate( [
{ $lookup : {
from : 'air_airlines',
localField : 'airlines',
foreignField : 'name',
pipeline: [ { $project : { _id : 0, name : 1, country : 1 } } ],
as : 'airlines' } }
])
Examples:
[Figure: sample documents from the Restaurants and Orders collections]
db.Orders.aggregate( [
{ $lookup : {
from : 'Restaurants',
localField : 'restaurant_name',
foreignField : 'name',
let : { orders_drink : '$drink' },
pipeline : [ {
$match : { $expr : { $in : [ '$$orders_drink', '$beverages' ] } }
} ],
as : 'matches'
}
}])
Example: list the warehouses whose in-stock quantity is greater than or equal to the ordered product quantity.
[Figure: sample documents from the item_orders and warehouses collections]
db.item_orders.aggregate( [
{ $lookup : {
from : 'warehouses',
let : { order_item : '$item', order_qty : '$ordered' },
pipeline : [
{ $match : { $expr:
{ $and: [
{ $eq: [ '$stock_item', '$$order_item' ] },
{ $gte : [ '$instock', '$$order_qty' ] }
] } } },
{ $project : { stock_item: 0, _id : 0 } }
],
as: 'stockdata' }
}])
$redact stage
Syntax: { $redact : <expression> }
Restricts the contents of the documents based on information stored in the documents themselves.
The argument can be any valid expression, as long as it resolves to one of the system variables
$$DESCEND, $$PRUNE, or $$KEEP.
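The three outcomes can be illustrated with a small recursive sketch in plain JavaScript (Node.js). It is a simplified model, not the server's implementation: $$PRUNE drops the current level, $$KEEP keeps it without looking further, and $$DESCEND keeps the current level's fields and re-applies the decision to every embedded document. The redact function, the canSee helper and the employee document are hypothetical.

```javascript
const DESCEND = '$$DESCEND';
const PRUNE = '$$PRUNE';
const KEEP = '$$KEEP';

const isPlainObject = (value) =>
  value !== null && typeof value === 'object' && !Array.isArray(value);

// Apply `decide` at the current level, then (on DESCEND) at every
// embedded document. Returns undefined when the level is pruned.
function redact(doc, decide) {
  const decision = decide(doc);
  if (decision === PRUNE) return undefined;
  if (decision === KEEP) return doc; // keep everything below, unchecked
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    if (isPlainObject(value)) {
      const kept = redact(value, decide);
      if (kept !== undefined) out[key] = kept;
    } else if (Array.isArray(value)) {
      // Embedded documents inside arrays are checked one by one; scalars are kept.
      out[key] = value
        .map((element) => (isPlainObject(element) ? redact(element, decide) : element))
        .filter((element) => element !== undefined);
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Hypothetical document: each level carries its own `acl` array.
const employee = {
  name: 'Ann',
  acl: ['HR', 'Management'],
  salary: { acl: ['Finance'], amount: 5000 },
  reviews: [
    { acl: ['HR'], text: 'Great year' },
    { acl: ['Management'], text: 'Promote' },
  ],
};

// Descend only into levels whose acl contains the given role.
const canSee = (role) => (level) =>
  Array.isArray(level.acl) && level.acl.includes(role) ? DESCEND : PRUNE;

const hrView = redact(employee, canSee('HR'));
const financeView = redact(employee, canSee('Finance'));
console.log(JSON.stringify(hrView, null, 2), financeView);
```

In this sketch the Finance role sees nothing at all, because the top level is pruned before the salary subdocument is ever inspected.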
Examples:
db.employees.aggregate( [ { $match: { employee_ID : '04f28c2a-f288-4194-accc-cfc1b585eee6' } } ] )
[Figure: the matching employee document, annotated with nesting levels 1, 2 and 3]
db.employees.aggregate( [ { $redact : { $cond : [ { $in : ['Finance', '$acl'] }, '$$DESCEND', '$$PRUNE'] } } ] )
db.employees.aggregate( [ { $redact : { $cond : [ { $in : ['Management', '$acl'] }, '$$DESCEND', '$$PRUNE'] } } ] )
db.employees.aggregate( [ { $redact : { $cond : [ { $in : ['HR', '$acl'] }, '$$DESCEND', '$$PRUNE'] } } ] )
$out stage
• Takes the documents returned by the aggregation pipeline and writes them to a specified collection
• The $out stage must be the last stage in the pipeline. The $out operator lets the aggregation framework
return result sets of any size
Syntax: { $out : { db : <output-db> , coll : <output-collection> } }
• The $out operation creates a new collection if one does not already exist.
• If the collection specified by the $out operation already exists, the $out stage atomically replaces the
existing collection with the new results collection
Example:
db.movies.aggregate( [
{ $group : { _id : '$year', count : { $sum : 1 } } },
{ $sort : { count : -1 } },
{ $out : { db : 'reporting' , coll : 'movies' } }
])
$merge stage
Writes the results of the aggregation pipeline to a specified collection. The $merge operator must be the last
stage in the pipeline
• Can output to a collection in the same or different database.
• Creates a new collection if the output collection does not already exist
• Can incorporate results (insert new documents, merge documents, replace documents, keep existing
documents, process documents with a custom update pipeline) into an existing collection.
Syntax:
{ $merge: {
into : <collection> -or- { db : <db>, coll : <collection> },
on : <identifier field> -or- [ <identifier field1>, ...], // Optional
let : <variables>, // Optional
whenMatched : <replace|keepExisting|merge|fail|pipeline>, // Optional
whenNotMatched : <insert|discard|fail> // Optional
}}
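The effect of whenMatched and whenNotMatched can be sketched in plain JavaScript (Node.js), using a Map keyed by the `on` field in place of the output collection. The mergeInto function and the sample data are hypothetical, and the custom update pipeline form of whenMatched is left out of this sketch.

```javascript
// Merge `results` into `target` (a Map keyed by the `on` field).
function mergeInto(target, results, { on = '_id', whenMatched = 'merge', whenNotMatched = 'insert' } = {}) {
  for (const doc of results) {
    const key = doc[on];
    if (target.has(key)) {
      if (whenMatched === 'replace') target.set(key, doc);
      else if (whenMatched === 'merge') target.set(key, { ...target.get(key), ...doc });
      else if (whenMatched === 'fail') throw new Error(`duplicate key: ${key}`);
      // 'keepExisting' leaves the existing document untouched.
    } else {
      if (whenNotMatched === 'insert') target.set(key, doc);
      else if (whenNotMatched === 'fail') throw new Error(`no match for key: ${key}`);
      // 'discard' drops the result document.
    }
  }
  return target;
}

// Hypothetical existing output collection and new aggregation results.
const existing = new Map([[1999, { _id: 1999, count: 10, note: 'old' }]]);
const results = [{ _id: 1999, count: 12 }, { _id: 2004, count: 7 }];

const merged = mergeInto(new Map(existing), results, {
  whenMatched: 'merge',
  whenNotMatched: 'insert',
});
const kept = mergeInto(new Map(existing), results, {
  whenMatched: 'keepExisting',
  whenNotMatched: 'discard',
});
console.log([...merged.values()], [...kept.values()]);
```

With 'merge', fields of the existing document that the new result does not mention (here `note`) survive, which is the main difference from 'replace'.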
Examples:
db.movies.aggregate( [
{ $group : { _id : '$year', count : { $sum : 1 } } },
{ $sort : { count : -1 } },
{ $out : { db : 'reporting', coll : 'movies2' } }
])
db.movies.aggregate( [
{ $group : { _id : '$year', count : { $sum : 1 }, title : { '$first' : '$title' } } },
{ $sort : { count : -1 } },
{ $merge : {
into : { db : 'reporting', coll : 'movies2' },
on : '_id',
whenMatched : 'merge',
whenNotMatched : 'insert' } }
])
Views
• A MongoDB view is a queryable object whose contents are defined by an aggregation pipeline on
other collections or views
• MongoDB does not persist the view contents to disk. A view's content is computed on-demand
when a client queries the view
• You can:
✓ Create a view on a collection of employee data to exclude any personally identifiable information
(PII). Applications can query the view for employee data that does not contain any PII.
✓ Create a view on a collection of collected sensor data to add computed fields and metrics.
Applications can use simple find operations to query the data
✓ …
Create View
Syntax: db.createView( <viewName> , <source> , [<pipeline>] , <options> )
Parameter    Type       Description
<viewName>   String     The name of the view to create.
<source>     String     The name of the source collection or view from which to create the view. You must create views in the same database as the source collection.
<pipeline>   Array      An array that consists of the aggregation pipeline stage(s). The view definition pipeline cannot include the $out or the $merge stage.
<options>    Document   Optional. Additional options for the method.
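The idea that a view stores a definition rather than data can be sketched in plain JavaScript (Node.js, in-memory). The makeView function, the withoutPII stage and the employee records are hypothetical and unrelated to the real db.createView method; the sketch only shows that results are recomputed from the source each time the view is read.

```javascript
// A view keeps only its source and pipeline; nothing is materialised.
function makeView(source, pipeline) {
  return {
    // Results are computed at read time, so they reflect the current source.
    find: () => pipeline.reduce((docs, stage) => stage(docs), source),
  };
}

// Hypothetical source collection.
const employees = [{ name: 'Ann', dept: 'HR', ssn: '111-11-1111' }];

// Stand-in for a $project stage that excludes a private field.
const withoutPII = (docs) => docs.map(({ ssn, ...rest }) => rest);

const publicEmployees = makeView(employees, [withoutPII]);

const before = publicEmployees.find();
employees.push({ name: 'Bob', dept: 'IT', ssn: '222-22-2222' });
const after = publicEmployees.find();
console.log(before, after);
```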