4- MongoDB aggregation framework (1)

The document provides an overview of the MongoDB Aggregation Framework, explaining its purpose, structure, and syntax. It details various stages of the aggregation pipeline such as $match, $project, $group, $unwind, and $lookup, along with examples for each stage. The document emphasizes the importance of the order of operations in the pipeline and includes practical examples to illustrate how to manipulate and aggregate data in MongoDB.

Uploaded by

hao.dep.dzai3105

mongoDB

Department: Software Engineering

Instructor: Trần Thế Trung


Email: tranthetrung@iuh.edu.vn

MongoDB
Aggregation Framework

1. Introduction to the Aggregation Framework.
2. Aggregation structure and syntax.
3. Creating and using Views.

What is the MongoDB aggregation framework?

• In its simplest form, the aggregation framework is just another way to query data in MongoDB.
• Everything we know how to do using the MongoDB Query Language (MQL) can also be done using the aggregation framework.

Why do we need the Aggregation Framework?
• We might want to aggregate our data, that is, group or modify it in some way, instead of always just filtering for the right documents.
• We can also perform calculations using aggregation.
• With MQL, we can filter and update data.
• With the Aggregation Framework, we can compute and reshape data.

Example
Let's find all documents that have Wi-Fi as one of the amenities and include only the price and address in the resulting cursor. The first query uses find(); the second expresses the same query as an aggregation pipeline:

db.listingsAndReviews.find(
{ amenities : 'Wifi' },
{ price : 1, address : 1, _id : 0 }
)

db.listingsAndReviews.aggregate(
[ { $match : { amenities : 'Wifi' } },
{ $project : { price : 1, address : 1, _id : 0 } } ]
)

Pipeline

[Figure: documents flowing through the pipeline stages $match, $project, $group]

• The aggregation framework works as a pipeline, where the order of actions in the pipeline matters.
• In the db.collection.aggregate() method, pipeline stages appear in an
array
db.collection.aggregate( [ { <stage1> }, { <stage2> }, ... ] )
• Each action is executed in the order in which we list it.
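
Why the order matters can be seen in a small plain-JavaScript model of a pipeline, runnable without a MongoDB server. This is only a sketch: match, project, and runPipeline are hypothetical helpers that imitate the stages, and the sample documents are made up.

```javascript
// Each stage is modeled as a function from an array of documents
// to an array of documents (hypothetical helpers, not MongoDB code).
const match = (predicate) => (docs) => docs.filter(predicate);
const project = (fields) => (docs) =>
  docs.map((doc) => Object.fromEntries(fields.map((f) => [f, doc[f]])));

// Run the stages left to right, feeding each output into the next stage.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Made-up sample data for illustration only.
const listings = [
  { name: 'A', price: 80, wifi: true },
  { name: 'B', price: 120, wifi: false },
];

// Filtering first works: the wifi field is still present.
const matchThenProject = runPipeline(listings, [
  match((d) => d.wifi),
  project(['name', 'price']),
]);

// Projecting first drops wifi, so the later filter matches nothing.
const projectThenMatch = runPipeline(listings, [
  project(['name', 'price']),
  match((d) => d.wifi),
]);
```

The same two stages give different results depending on their order, which is the point of the bullets above.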

Aggregation Structure and Syntax
Syntax: db.collection.aggregate( [ { <stage1> }, { <stage2> }, ... ], { <options> } )

• Each stage is a JSON object of key-value pairs.
• Options may be passed in. For example, specifying whether to allow disk use for large aggregations, or to
view the explain plan of the aggregation to see whether it is using indexes.

Example:
db.solarSystem.aggregate( [
{ $match : { atmosphericComposition: { $in : [/O2/] }, meanTemperature: { $gte : -40, $lte :40} } },
{ $project : { _id : 0, name : 1, hasMoons: { $gt : [ '$numberOfMoons', 0 ] } } }
],
{ allowDiskUse : true }
)
• Pipelines are always an array of one or more stages.
• Stages are composed of one or more aggregation operators or expressions.
• Expressions may take a single argument or an array of arguments.

Common Pipeline Stages
Stage      Description
$match     Filters the documents to pass only the documents that match the specified condition(s) to the next pipeline stage.
$project   Reshapes each document in the stream, such as by adding new fields or removing existing fields.
$group     Groups the documents in a collection by a specified expression; can be used for statistics.
$unwind    Deconstructs an array field from the input documents to output a document for each element.
$lookup    Performs a left outer join to an unsharded collection in the same database to filter in documents from the "joined" collection for processing.
$redact    Restricts the contents of the documents based on information stored in the documents themselves.
$out       Takes the documents returned by the aggregation pipeline and writes them to a specified collection.
$merge     Writes the results of the aggregation pipeline to a specified collection.

Self-study: $sort, $limit, $skip, …

$match stage
Syntax: { $match : { <query> } }

• Filters the document stream to allow only matching documents to pass unmodified into the next
pipeline stage.
• Place the $match stage as early in the aggregation pipeline as possible, so that later stages process fewer documents.
• $match can be used multiple times in a pipeline.
• $match uses standard MongoDB query operators.
• You cannot use $where with $match.

$match stage
• Example:
db.sinhvien.aggregate( [ { $match : { ten: { $eq: 'Nở' } } } ] )

db.sinhvien.aggregate( [
{ $match : { ten: { $eq: 'Nở' } } },
{ $count: 'TongSoSV' }
])

db.sinhvien.aggregate( [
{ $match : {
$and : [
{ 'lienLac.email' : 'teo@gmail.com' },
{ _id : { $eq : '57' } }
]
}
}])

$project stage - Shaping documents
Syntax: { $project : { <specification(s)> } }

• With the $project stage we can selectively remove and retain fields, reassign existing field values, and derive entirely new fields.

• $project can be used as many times as required within an aggregation pipeline.

Example:
db.solarSystem.aggregate( [ { $project : { _id : 0, name : 1, gravity: 1} } ] )
db.solarSystem.aggregate( [ { $project : { _id : 0, name : 1, 'gravity.value' : 1} } ] )
db.solarSystem.aggregate( [ { $project : { _id : 0, name : 1, surfacegravity: '$gravity.value' } } ] )
db.solarSystem.aggregate( [ { $project : { _id : 0, name : 1, new_surfacegravity: {
$multiply: [
{ $divide: [ '$gravity.value', 10 ] }, 100
]
}}}])
Accumulator Expressions with the $project stage
• Accumulator expressions within $project work over an array within the given document.
• Some accumulator expressions: $avg, $min, $max, $sum, …
• We're going to explore the 'icecream_data' collection in the dbtest database.

Example:
db.icecream_data.aggregate( [ { $project : { max_high: { $max: '$trends.avg_high_tmp' } } } ] )
Result:
[ { _id : ObjectId('59bff494f70ff89cacc36f90'), max_high: 87 } ]

db.icecream_data.aggregate( [ { $project : { min_low: { $min: '$trends.avg_low_tmp' } } } ] )
Result:
[ { _id : ObjectId('59bff494f70ff89cacc36f90'), min_low: 27 } ]

db.icecream_data.aggregate( [ { $project : { average_sale: { $avg: '$trends.icecream_sales_in_millions' } } } ] )
Result:
[ { _id : ObjectId('59bff494f70ff89cacc36f90'), average_sale: 133.41666666666666 } ]
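
What "working over an array within the given document" means can be sketched in plain JavaScript, without a MongoDB server. The document below is made up (only the field names follow the icecream_data example), and the code is a model of $max, $min, and $avg, not MongoDB's implementation.

```javascript
// Made-up document shaped like the icecream_data example (values invented).
const doc = {
  _id: 1,
  trends: [
    { month: 'January', avg_high_tmp: 42, avg_low_tmp: 27 },
    { month: 'July', avg_high_tmp: 87, avg_low_tmp: 70 },
    { month: 'October', avg_high_tmp: 66, avg_low_tmp: 50 },
  ],
};

// A path such as '$trends.avg_high_tmp' resolves to one value per array element.
const highs = doc.trends.map((t) => t.avg_high_tmp);
const lows = doc.trends.map((t) => t.avg_low_tmp);

// Model of $max, $min and $avg applied to that per-document array.
const projected = {
  _id: doc._id,
  max_high: Math.max(...highs),
  min_low: Math.min(...lows),
  avg_high: highs.reduce((sum, v) => sum + v, 0) / highs.length,
};
```

Each accumulator reduces the array inside one document to a single value; no other documents are involved.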

$group stage
Syntax: { $group : { _id : <expression>, <field1>: { <accumulator1> : <expression1> }, ... } }
Field    Description
_id      Required. If you specify an _id value of null, or any other constant value, the $group stage calculates accumulated values for all the input documents as a whole.
field    Optional. Computed using the accumulator operators.

• Groups input documents by the specified _id expression and for each distinct grouping, outputs a
document.
• The _id field of each output document contains the unique group by value.
• The output documents can also contain computed fields that hold the values of some accumulator
expression.
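
As a rough illustration of these rules, the following plain-JavaScript sketch models grouping by a year field with a { $sum : 1 } counter. The helper groupByYear and the sample documents are hypothetical; MongoDB itself is not involved.

```javascript
// Made-up sample documents; the field names mirror the movies examples.
const movies = [
  { title: 'A', year: 1999 },
  { title: 'B', year: 2001 },
  { title: 'C', year: 1999 },
];

// Model of { $group : { _id : '$year', numFilmsThisYear : { $sum : 1 } } }:
// one output document per distinct year, with a running count.
function groupByYear(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const current = groups.get(doc.year) ?? { _id: doc.year, numFilmsThisYear: 0 };
    current.numFilmsThisYear += 1; // { $sum : 1 } adds 1 per input document
    groups.set(doc.year, current);
  }
  return [...groups.values()];
}

// Model of a following { $sort : { _id : 1 } } stage.
const grouped = groupByYear(movies).sort((a, b) => a._id - b._id);
```

Each distinct _id value yields exactly one output document, and the accumulator field holds the value computed over that group.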

$group stage
Example:
db.movies.aggregate( [ { $group : { _id : '$year', 'numFilmsThisYear' : { $sum: 1 } } } ] )

Note: '$year' accesses the value of the year field.

db.movies.aggregate( [
{ $group : {
_id : '$year' ,
'numFilmsThisYear': { $sum: 1 }
} },
{ $sort: { _id : 1} }
])

$group stage
• Grouping as before, then sorting in descending order based on the count
db.movies.aggregate( [
{ $group : { _id : '$year', count : { $sum : 1 } } },
{ $sort : { count : -1} }
])

• Grouping on the number of directors a film has, demonstrating that we have to validate types to protect some
expressions
db.movies.aggregate( [
{ $group : {
_id : { numDirectors : { $cond : [ { $isArray : '$directors' }, { $size : '$directors' }, 0 ] } },
numFilms : { $sum : 1 }, averageMetacritic: { $avg : '$metacritic' } } },
{ $sort : { '_id.numDirectors' : -1 } }
])
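
The type check matters because $size raises an error when its argument is not an array. A minimal plain-JavaScript sketch of the same guard, runnable without a MongoDB server; numDirectors is a hypothetical helper and the sample documents are made up.

```javascript
// Model of { $cond : [ { $isArray : '$directors' }, { $size : '$directors' }, 0 ] }.
const numDirectors = (doc) =>
  Array.isArray(doc.directors) ? doc.directors.length : 0;

// Documents with an array, a missing field, and a non-array value.
const counts = [
  numDirectors({ title: 'A', directors: ['X', 'Y'] }),
  numDirectors({ title: 'B' }),
  numDirectors({ title: 'C', directors: 'X' }),
];
```

Documents without a usable directors array fall into the 0 group instead of aborting the whole aggregation.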

$group stage
• Grouping on multiple fields
db.movies.aggregate( [
{ $group :
{ _id : { year : '$year', type : '$type' },
count : { $sum : 1 }, title : { $first : '$title' }
} },
{ $sort : { count : -1 } }
])

• Grouping all documents together. By convention, we use null or an empty string as the _id value
db.movies.aggregate( [ { $group : { _id : null, count : { $sum: 1 } } } ] )

• Filtering results to keep only documents with a numeric metacritic value
db.movies.aggregate( [
{ $match : { metacritic : { $gte : 0 } } },
{ $group : { _id : null, averageMetacritic: { $avg: '$metacritic' } } }
])

$unwind stage
Syntax: { $unwind : <field path> }
• Deconstructs an array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element.

Example:

Input documents:
{ Title : 'The Martian' , genres : [ 'Action', 'Adventure', 'Sci-Fi' ] }
{ Title : 'Batman Begins' , genres : [ 'Action' , 'Adventure' ] }

Stage: { $unwind : '$genres' }

Output documents:
{ Title : 'The Martian' , genres : 'Action' }
{ Title : 'The Martian' , genres : 'Adventure' }
{ Title : 'The Martian' , genres : 'Sci-Fi' }
{ Title : 'Batman Begins' , genres : 'Action' }
{ Title : 'Batman Begins' , genres : 'Adventure' }
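
The transformation can be modeled in a few lines of plain JavaScript. This is a sketch only: unwind is a hypothetical helper that imitates the default behavior of the stage, not MongoDB code.

```javascript
// Model of { $unwind : '$<field>' } (hypothetical helper).
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    // A missing or null field produces no output document.
    if (value === undefined || value === null) return [];
    // A non-array value is treated as a single-element array.
    const elements = Array.isArray(value) ? value : [value];
    // One copy of the document per element; an empty array yields nothing.
    return elements.map((element) => ({ ...doc, [field]: element }));
  });
}

const films = [
  { Title: 'The Martian', genres: ['Action', 'Adventure', 'Sci-Fi'] },
  { Title: 'Batman Begins', genres: ['Action', 'Adventure'] },
];

const unwound = unwind(films, 'genres');
```

Two input documents with three and two genres produce five output documents, one per array element.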

$unwind stage
• How can we group on year and genre in the movies collection?
db.movies.aggregate( [
{ $unwind : '$genres' },
{ $group : { _id : {year : '$year', genre : '$genres' }, numFilms : { $sum: 1 } } },
{ $sort : { '_id.genre' : 1, numFilms: -1} }
])

• Finding the top rated genres per year from 1990 to 2015...
db.movies.aggregate( [
{ $match : { 'imdb.rating' : { $gt : 0 }, year: { $gte : 1990, $lte : 2015 }, runtime : { $gte : 90 } } },
{ $unwind : '$genres' },
{ $group : { _id : { year : '$year' , genre : '$genres' }, average_rating : { $avg: '$imdb.rating' } } },
{ $sort : { '_id.year' : -1, average_rating : -1 } }
])

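$unwind stage
Recap on a few things:
• $unwind is meant for array fields. By default, a missing, null, or empty-array field produces no output document, and any other non-array value is treated as a single-element array.
• Using $unwind on large collections with big documents may lead to performance issues.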

$lookup stage
Syntax: equality match with a single join condition
{ $lookup : {
from : <collection to join>,
localField : <field from the input documents>,
foreignField : <field from the documents of the 'from' collection>,
as : <output array field>
}
}

• Performs a left outer join to an unsharded collection in the same database to filter in documents from the
'joined' collection for processing.
• To each input document, the $lookup stage adds a new array field whose elements are the matching
documents from the 'joined' collection.
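
The left outer join behavior can be sketched in plain JavaScript. The lookup helper below is hypothetical and only models the equality-match form; the two "collections" are made-up arrays.

```javascript
// Model of the equality-match form of $lookup as a left outer join
// (hypothetical helper; both "collections" are plain arrays).
function lookup(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map((local) => {
    const localValue = local[localField];
    // When the local field is an array, any of its elements may match.
    const candidates = Array.isArray(localValue) ? localValue : [localValue];
    const matches = foreignDocs.filter((foreign) =>
      candidates.includes(foreign[foreignField])
    );
    // Every input document is kept, even when `matches` is empty.
    return { ...local, [as]: matches };
  });
}

// Made-up sample data for illustration only.
const alliances = [
  { name: 'Alliance One', airlines: ['Air X', 'Air Y'] },
  { name: 'Alliance Two', airlines: ['Air Z'] },
];
const airlines = [
  { name: 'Air X', country: 'Vietnam' },
  { name: 'Air Y', country: 'France' },
];

const joined = lookup(alliances, airlines, {
  localField: 'airlines',
  foreignField: 'name',
  as: 'airlines',
});
```

The second alliance has no matching airline, yet it still appears in the output with an empty array, which is what makes the join "left outer".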
$lookup stage
Example:

[Figure: sample documents from the air_airlines and air_alliances collections]

db.air_alliances.aggregate( [
{ $lookup : {
from : 'air_airlines',
localField : 'airlines',
foreignField : 'name',
as : 'airlines' }
}
])
$lookup stage
Example:

db.air_alliances.aggregate( [
{ $match : { name : 'SkyTeam' } },
{ $lookup : {
from : 'air_airlines',
localField : 'airlines',
foreignField : 'name',
as : 'airlines' }
}
])

$lookup stage
Syntax: correlated subqueries using concise syntax (New in version 5.0)
{ $lookup : {
from : <collection to join>,
localField : <field from local collection's documents>,
foreignField : <field from foreign collection's documents>,
let : { <var_1>: <expression>, …, <var_n>: <expression> },
pipeline : [ <pipeline to run> ],
as : <output array field>
}
}

• let: Optional. Specifies the variables to use in the pipeline stages. Use the variable expressions to access the fields of the documents that are input to the $lookup stage.
• pipeline: Determines the resulting documents from the joined collection. To return all documents, specify an empty pipeline []. The pipeline cannot directly access the fields of the input documents. Instead, define variables for those fields using the let option and then reference the variables (as $$<variable>) in the pipeline stages.

$lookup stage
Examples:
db.air_alliances.aggregate( [
{ $lookup : {
from : 'air_airlines',
localField : 'airlines',
foreignField : 'name',
pipeline: [ { $project : { _id : 0, name : 1, country : 1 } } ],
as : 'airlines' } }
])

$lookup stage
Examples:

[Figure: sample documents from the Restaurants and Orders collections]

$lookup stage
Examples:
db.Orders.aggregate( [
{ $lookup : {
from : 'Restaurants',
localField : 'restaurant_name',
foreignField : 'name',
let : { orders_drink : '$drink' },
pipeline : [ {
$match : { $expr : { $in : [ '$$orders_drink', '$beverages' ] } }
} ],
as : 'matches'
}
}])

$lookup stage
Syntax: perform multiple joins and a correlated subquery with $lookup
{ $lookup : {
from : <joined collection>,
let : { <var_1>: <expression>, …, <var_n>: <expression> },
pipeline : [ <pipeline to run on joined collection> ],
as : <output array field>
}
}

Example: list of warehouses with product quantity greater than or equal to the ordered product quantity.

[Figure: sample documents from the item_orders and warehouses collections]

$lookup stage
Example: list of warehouses with product quantity greater than or equal to the ordered product quantity.
db.item_orders.aggregate( [
{ $lookup : {
from : 'warehouses',
let : { order_item : '$item', order_qty : '$ordered' },
pipeline : [
{ $match : { $expr:
{ $and: [
{ $eq: [ '$stock_item', '$$order_item' ] },
{ $gte : [ '$instock', '$$order_qty' ] }
] } } },
{ $project : { stock_item: 0, _id : 0 } }
],
as: 'stockdata' }
}])
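
To see how the let variables tie each order to the sub-pipeline, here is a plain-JavaScript model of the join above. The sample documents are made up (only the field names follow the example), and the code imitates the stage rather than calling MongoDB.

```javascript
// Made-up sample data; field names follow the example above.
const itemOrders = [
  { item: 'almonds', ordered: 2 },
  { item: 'pecans', ordered: 60 },
];
const warehouses = [
  { stock_item: 'almonds', warehouse: 'A', instock: 120 },
  { stock_item: 'pecans', warehouse: 'A', instock: 40 },
  { stock_item: 'pecans', warehouse: 'B', instock: 80 },
];

const stockJoined = itemOrders.map((order) => {
  // let : { order_item : '$item', order_qty : '$ordered' }
  const vars = { order_item: order.item, order_qty: order.ordered };
  const stockdata = warehouses
    // $match with $expr: compare joined-document fields with the variables.
    .filter((w) => w.stock_item === vars.order_item && w.instock >= vars.order_qty)
    // $project : { stock_item : 0, _id : 0 } drops the listed fields.
    .map(({ stock_item, _id, ...rest }) => rest);
  return { ...order, stockdata };
});
```

The sub-pipeline runs once per order, with the variables bound to that order's fields; this is why the join is called a correlated subquery.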

$redact stage
Syntax: { $redact : <expression> }
Restricts the contents of the documents based on information stored in the documents themselves. The argument can be any valid expression as long as it resolves to the $$DESCEND, $$PRUNE, or $$KEEP system variables.

System Variable   Description
$$DESCEND         $redact returns the fields at the current document level, excluding embedded documents. The expression is then evaluated again for each embedded document.
$$PRUNE           $redact excludes all fields at this current document/embedded document level, without further inspection of any of the excluded fields.
$$KEEP            $redact returns or keeps all fields at this current document/embedded document level, without further inspection of the fields at this level.
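
The level-by-level evaluation can be sketched as a recursive function in plain JavaScript. Everything here is a hypothetical model: redact, isDocument, and canSee are illustrative helpers, the employee document is made up, and the decisions are passed as plain strings instead of system variables.

```javascript
const isDocument = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

// `decide` plays the role of the $redact expression and returns
// 'DESCEND', 'PRUNE' or 'KEEP' for the document level it is given.
function redact(level, decide) {
  const decision = decide(level);
  if (decision === 'PRUNE') return undefined; // drop this whole level
  if (decision === 'KEEP') return level; // keep everything, no inspection
  // DESCEND: keep scalar fields, re-apply the decision to embedded documents.
  const result = {};
  for (const [key, value] of Object.entries(level)) {
    if (Array.isArray(value)) {
      result[key] = value
        .map((item) => (isDocument(item) ? redact(item, decide) : item))
        .filter((item) => item !== undefined);
    } else if (isDocument(value)) {
      const kept = redact(value, decide);
      if (kept !== undefined) result[key] = kept;
    } else {
      result[key] = value;
    }
  }
  return result;
}

// Made-up document with an access-control list (acl) at each level.
const employee = {
  name: 'Example Employee',
  acl: ['HR', 'Management', 'Finance'],
  compensation: {
    acl: ['Management', 'Finance'],
    salary: 1000,
    stock: { acl: ['Finance'], shares: 10 },
  },
};

// Model of { $cond : [ { $in : [ role, '$acl' ] }, '$$DESCEND', '$$PRUNE' ] }.
const canSee = (role) => (level) =>
  Array.isArray(level.acl) && level.acl.includes(role) ? 'DESCEND' : 'PRUNE';

const hrView = redact(employee, canSee('HR'));
const managementView = redact(employee, canSee('Management'));
```

In this model the HR view loses the whole compensation sub-document, while the Management view keeps the salary but not the stock details.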

$redact stage
Examples:
db.employees.aggregate( [ { $match: { employee_ID : '04f28c2a-f288-4194-accc-cfc1b585eee6' } } ] )

[Figure: the matching employee document, with its nested levels annotated as level 1, level 2, and level 3]

$redact stage
Examples:
db.employees.aggregate( [ { $redact : { $cond : [ { $in : ['Finance', '$acl'] }, '$$DESCEND', '$$PRUNE'] } } ] )
db.employees.aggregate( [ { $redact : { $cond : [ { $in : ['Management', '$acl'] }, '$$DESCEND', '$$PRUNE'] } } ] )
db.employees.aggregate( [ { $redact : { $cond : [ { $in : ['HR', '$acl'] }, '$$DESCEND', '$$PRUNE'] } } ] )

$out stage
• Takes the documents returned by the aggregation pipeline and writes them to a specified collection
• The $out stage must be the last stage in the pipeline. The $out operator lets the aggregation framework
return result sets of any size
Syntax: { $out : { db : <output-db> , coll : <output-collection> } }
• The $out operation creates a new collection if one does not already exist.
• If the collection specified by the $out operation already exists, the $out stage atomically replaces the
existing collection with the new results collection
Example:
db.movies.aggregate( [
{ $group : { _id : '$year', count : { $sum : 1 } } },
{ $sort : { count : -1 } },
{ $out : { db : 'reporting' , coll : 'movies' } }
])

$merge stage
Writes the results of the aggregation pipeline to a specified collection. The $merge operator must be the last
stage in the pipeline
• Can output to a collection in the same or different database.
• Creates a new collection if the output collection does not already exist
• Can incorporate results (insert new documents, merge documents, replace documents, keep existing
documents, process documents with a custom update pipeline) into an existing collection.
Syntax:
{ $merge: {
into : <collection> -or- { db : <db>, coll : <collection> },
on : <identifier field> -or- [ <identifier field1>, ...], // Optional
let : <variables>, // Optional
whenMatched : <replace|keepExisting|merge|fail|pipeline>, // Optional
whenNotMatched : <insert|discard|fail> // Optional
}}
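
The effect of whenMatched and whenNotMatched can be modeled in plain JavaScript. This is a sketch under stated assumptions: mergeInto is a hypothetical helper that matches on _id only and covers just the 'merge', 'replace', 'keepExisting', 'insert', and 'discard' actions; the data is made up.

```javascript
// Model of $merge with on : '_id' (hypothetical helper, not MongoDB code).
function mergeInto(target, results, { whenMatched = 'merge', whenNotMatched = 'insert' } = {}) {
  const byId = new Map(target.map((doc) => [doc._id, doc]));
  for (const doc of results) {
    const existing = byId.get(doc._id);
    if (existing === undefined) {
      // 'discard' simply skips documents that have no match.
      if (whenNotMatched === 'insert') byId.set(doc._id, doc);
    } else if (whenMatched === 'merge') {
      byId.set(doc._id, { ...existing, ...doc }); // new fields win, others stay
    } else if (whenMatched === 'replace') {
      byId.set(doc._id, doc);
    } // 'keepExisting' leaves the stored document untouched
  }
  return [...byId.values()];
}

// Made-up existing collection and new pipeline results.
const existingReport = [{ _id: 1999, count: 2 }, { _id: 2001, count: 1 }];
const newResults = [
  { _id: 1999, count: 3, title: 'A' },
  { _id: 2005, count: 4, title: 'D' },
];

const mergedReport = mergeInto(existingReport, newResults);
```

Unlike $out, which replaces the whole target collection, the document for 2001 survives here because no new result touches it.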
$merge stage
Examples:
db.movies.aggregate( [
{ $group : { _id : '$year', count : { $sum : 1 } } },
{ $sort : { count : -1 } },
{ $out : { db : 'reporting', coll : 'movies2' } }
])

db.movies.aggregate( [
{ $group : { _id : '$year', count : { $sum : 1 }, title : { '$first' : '$title' } } },
{ $sort : { count : -1 } },
{ $merge : {
into : { db : 'reporting', coll : 'movies2' },
on : '_id',
whenMatched : 'merge',
whenNotMatched : 'insert' } }
])

Views
• A MongoDB view is a queryable object whose contents are defined by an aggregation pipeline on
other collections or views
• MongoDB does not persist the view contents to disk. A view's content is computed on-demand
when a client queries the view
• You can:
✓ Create a view on a collection of employee data to exclude any private or personal information
(PII). Applications can query the view for employee data that does not contain any PII.
✓ Create a view on a collection of collected sensor data to add computed fields and metrics.
Applications can use simple find operations to query the data
✓ …

Create View
Syntax: db.createView( <viewName> , <source> , [<pipeline>] , <options> )
Parameter    Type       Description
<viewName>   String     The name of the view to create.
<source>     String     The name of the source collection or view from which to create the view. You must create views in the same database as the source collection.
<pipeline>   Array      An array that consists of the aggregation pipeline stage(s). The view definition pipeline cannot include the $out or the $merge stage.
<options>    Document   Optional. Additional options for the method.

Example: create a view in the aggregation DB

db.createView(
'maleEmployees' ,
'employees' ,
[ { $match : { gender : 'male' } }, { $project : { acl : 0, employee_compensation : 0 } } ]
)

Query the view: db.maleEmployees.find()
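
That a view stores only its definition and is computed on demand can be illustrated with a plain-JavaScript model. The createView function below is a hypothetical stand-in, the database object is a plain in-memory structure, and the employee documents are made up.

```javascript
// In-memory stand-in for a database with one collection (made-up data).
const database = {
  employees: [
    { name: 'An', gender: 'male', acl: ['HR'] },
    { name: 'Binh', gender: 'female', acl: ['HR'] },
  ],
};

// Model of a view: only the source name and the pipeline are stored;
// the contents are recomputed every time the view is queried.
function createView(source, pipeline) {
  return {
    find: () => pipeline.reduce((docs, stage) => stage(docs), database[source]),
  };
}

const maleEmployees = createView('employees', [
  (docs) => docs.filter((d) => d.gender === 'male'), // models $match
  (docs) => docs.map(({ acl, ...rest }) => rest), // models $project : { acl : 0 }
]);

const before = maleEmployees.find();
// A later change to the source collection is visible through the view.
database.employees.push({ name: 'Cuong', gender: 'male', acl: [] });
const after = maleEmployees.find();
```

Because nothing is persisted, the second query reflects the newly inserted employee without the view being recreated.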


Questions?