
DPA Lecture 6

The document discusses document databases and MongoDB. It covers key concepts like document structure, embedding, linking and many-to-many relationships in MongoDB. Examples are provided to illustrate one-to-one, one-to-many and many-to-many relationships and when to use embedding vs linking.


L7 Data Processing and Analytics

Lecture 6: Document Databases and MongoDB

Dr Ismail Alarab
ialarab@bournemouth.ac.uk

www.bournemouth.ac.uk
Recap

- Graph Database
- Graph vs Relational
- Neo4J
- Cypher Query
Lecture Outline

➢Key – Value
➢Document Databases
➢MongoDB
Key-Value
Key - Value

● A key-value pair where:
○ Key: usually a string (internally mapped to a number)
○ Value: the data or object that the key is associated with
Example, with keys name and age and values "John" and 35:
{ name: "John", age: 35 }
Key - Value

● The characteristic feature of a key-value store is that it is “simple but quick”
● Data is stored in a simple key-value structure, and the store does not interpret the content of the value part
● Notations
○ (Key, Value) (ordered pair)
○ {Key: Value} (JSON notation, Python notation)
○ Key → Value
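A minimal sketch of the key-value idea, assuming a plain JavaScript `Map` stands in for a real key-value store (the class name `KeyValueStore` and the key `"user:1"` are hypothetical). The store can put, get, and delete by key, but it treats each value as an opaque string and never looks inside it.

```javascript
// Hypothetical in-memory key-value store: keys map to opaque values.
class KeyValueStore {
  constructor() {
    this.data = new Map(); // Map is typically hash-based, so key lookups are fast
  }
  put(key, value) {
    // Serialise the value; the store never inspects what is inside it.
    this.data.set(key, JSON.stringify(value));
  }
  get(key) {
    const raw = this.data.get(key);
    return raw === undefined ? undefined : JSON.parse(raw);
  }
  delete(key) {
    return this.data.delete(key); // true if the key existed
  }
}

const store = new KeyValueStore();
store.put("user:1", { name: "John", age: 35 });
console.log(store.get("user:1").name); // John
store.delete("user:1");
console.log(store.get("user:1")); // undefined
```

The only way to reach a value is through its key; querying "all users aged 35" would require scanning every value, which is exactly what document databases improve on.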
Document Database
Document Database

▪ Document-based data model
▪ Document DBs store data in a semi-structured, nested text format such as XML or JSON documents
▪ A schema-less model lets you represent data with variable properties
▪ Each document is stored in a collection
▪ Collections
o Share a common set of indexes
o Are like tables in relational DBs
o Do not require their documents to have a uniform structure
JSON

▪ JSON: “JavaScript Object Notation”
▪ Easy for humans to read/write, easy for computers to parse/generate
▪ Objects can be nested
▪ Built on
o name/value pairs
o Ordered lists of values
Example:
{
"name": "John",
"age": 30,
"city": "New York"
}
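JSON text can be turned into nested objects and back with the built-in `JSON.parse` and `JSON.stringify` functions. The sketch below extends the slide's example with two illustrative fields (`address` and `languages` are hypothetical) to show a nested object and an ordered list.

```javascript
// A JSON document with name/value pairs, a nested object and an ordered list.
const text = `{
  "name": "John",
  "age": 30,
  "city": "New York",
  "address": { "street": "1 Main St", "zip": "10001" },
  "languages": ["English", "Spanish"]
}`;

const person = JSON.parse(text); // JSON text -> JavaScript object
console.log(person.address.zip); // 10001 (nested object access)
console.log(person.languages[1]); // Spanish (arrays keep their order)

// JavaScript object -> compact JSON text
const compact = JSON.stringify(person);
console.log(compact.startsWith('{"name":"John"')); // true
```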
Example: JSON Schema and JSON

JSON Schema:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"name": { "type": "string" },
"age": { "type": "integer" },
"email": { "type": "string" }
},
"required": ["name", "age", "email"]
}

JSON (a document that satisfies the schema):
{
"name": "Alice",
"age": 25,
"email": "alice@example.com"
}

• Key-value pairs are separated by commas
• The same property keys should be used consistently across a collection to ensure data integrity and consistency
Example On Data Consistency

Assuming the JSON Schema has these properties (from the previous slide):
"properties": {
"name": { "type": "string" },
"age": { "type": "integer" },
"email": { "type": "string" }
},

Consistent data:
{
"name": "Alice",
"age": 25,
"email": "alice@example.com"
}

Inconsistent data ("Firstname" and "EMAIL" do not match the schema keys "name" and "email"):
{
"Firstname": "Alice",
"age": 25,
"EMAIL": "alice@example.com"
}
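To make the consistency point concrete, here is a hand-rolled checker for only the subset of JSON Schema used above (`type` of `string`/`integer`, plus `required`). The function name `validate` is hypothetical; in practice you would use a full JSON Schema validator library, and MongoDB can also enforce a `$jsonSchema` validator on a collection.

```javascript
// Subset of the slide's JSON Schema: property types plus required keys.
const schema = {
  type: "object",
  properties: {
    name: { type: "string" },
    age: { type: "integer" },
    email: { type: "string" },
  },
  required: ["name", "age", "email"],
};

// Hypothetical checker: returns a list of problems (an empty list means valid).
// Keys not listed in "properties" are ignored, as in the slide's schema,
// which does not forbid additional properties.
function validate(doc, schema) {
  const errors = [];
  for (const key of schema.required) {
    if (!(key in doc)) errors.push(`missing required key "${key}"`);
  }
  for (const [key, value] of Object.entries(doc)) {
    const rule = schema.properties[key];
    if (!rule) continue;
    if (rule.type === "integer" && !Number.isInteger(value)) {
      errors.push(`"${key}" should be an integer`);
    } else if (rule.type === "string" && typeof value !== "string") {
      errors.push(`"${key}" should be a string`);
    }
  }
  return errors;
}

const consistent = { name: "Alice", age: 25, email: "alice@example.com" };
const inconsistent = { Firstname: "Alice", age: 25, EMAIL: "alice@example.com" };

console.log(validate(consistent, schema)); // []
console.log(validate(inconsistent, schema)); // reports missing "name" and "email"
```

Because JSON keys are case-sensitive, "EMAIL" is simply a different key from "email", so the required "email" is reported as missing.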
Example

Document Example:
{
"_id": "1",
"name": "John",
"age": 30
}
Collection Example:
[
{
"_id": "1",
"name": "John",
"age": 30
},
{
"_id": "2",
"name": "Amanda",
"age": 25
}
]
MongoDB
MongoDB

• MongoDB is a document-based database.


• Data is stored as BSON (Binary JSON) documents
• Databases and collections are created automatically when data is first inserted into them
• Moving from the relational data model, the key change is replacing the concept of a ‘row’ with a more flexible unit, the ‘document’
• MongoDB is schema-free: the keys used in documents are not predefined or fixed
• When migrating data to MongoDB, issues with new or missing keys can be resolved at the application level rather than by changing a database schema
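One way to resolve missing keys at the application level is to supply defaults when documents are read. The sketch below uses plain JavaScript arrays in place of a collection; the `customers` data, the `phone` field, and the `withDefaults` helper are all hypothetical.

```javascript
// Documents from the same (hypothetical) collection: the second lacks "phone".
const customers = [
  { _id: 1, name: "John", phone: "555-0100" },
  { _id: 2, name: "Amanda" }, // older document, written before "phone" existed
];

// Hypothetical application-level fix: fill in defaults for missing keys on read.
function withDefaults(doc, defaults) {
  return { ...defaults, ...doc }; // keys present in doc override the defaults
}

const normalised = customers.map((c) => withDefaults(c, { phone: null }));
console.log(normalised[1]); // { phone: null, _id: 2, name: 'Amanda' }
```

No schema migration is run; old and new documents coexist, and the application smooths over the difference.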
_id Field

• By default, each document contains an _id field. This field has a number of special characteristics:
– Its value serves as the primary key for the collection.
– Its value is unique, immutable, and may be any non-array type.
– The default data type is ObjectId, which is “small, likely unique, fast to generate and order.”
– Sorting on an ObjectId value is roughly equivalent to sorting on creation time.
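Why sorting by ObjectId roughly sorts by creation time: the first 4 bytes of the 12-byte ObjectId hold a Unix timestamp in seconds. The sketch below decodes that timestamp from a hex string in plain JavaScript. The helper name `objectIdTimestamp` and the two extra ids are hypothetical; in mongosh, `ObjectId(...).getTimestamp()` does the decoding for you.

```javascript
// Hypothetical helper: read the creation time embedded in an ObjectId hex string.
function objectIdTimestamp(hex) {
  // An ObjectId is 12 bytes (24 hex characters). The first 4 bytes (8 hex
  // characters) are a big-endian count of seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// The generated ObjectId from the _id example
console.log(objectIdTimestamp("50691737d386d8fadbd6b01d").toISOString());
// 2012-10-01T04:08:23.000Z

// Because the timestamp comes first, sorting same-length lowercase hex strings
// orders ids by creation second (ids created within the same second are not
// guaranteed to be in creation order). The other two ids are made up.
const ids = ["5069173a00000000000000ff", "50691737d386d8fadbd6b01d", "5069173800000000000000aa"];
const sorted = [...ids].sort();
console.log(sorted.map((h) => objectIdTimestamp(h).toISOString()));
```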
_id Field Example
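Specifying “_id”:
Input: db.products.insertOne( { _id: 100, item: "water", qty: 30 } )
Stored document:
{ "_id" : 100, "item" : "water", "qty" : 30 }

Without specifying “_id”:
Input: db.products.insertOne( { item: "book", qty: 40 } )
Stored document (MongoDB generates the ObjectId, so the value differs on every insert):
{ "_id" : ObjectId("50691737d386d8fadbd6b01d"), "item" : "book", "qty" : 40 }

Note: the older db.products.save() method can also insert documents but is deprecated.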
MongoDB vs SQL

MongoDB | SQL
Stores semi-structured data (NoSQL) | Stores structured data
Document | Tuple/Record
Collection | Table/View
PK: _id field | PK: any attribute(s)
Uniformity not required | Uniform relation schema
Index | Index
Embedded structure | Joins
Shard | Partition

Note: Sharding in MongoDB splits data horizontally, storing smaller subsets across multiple servers.
Mongo Is Schema-Free

• In SQL, the schema exists to meet the requirements of tables and their (sometimes quirky) implementation

• Every “row” in a database “table” is a data structure, much like a “struct” in C or a “class” in Java. A table is then an array (or list) of such data structures

• In MongoDB, we design data in basically the same way we would design a compound data type in JSON
Embedding & Linking
One to One relationship (Example 1)

Linking (two separate documents):
zip = {
_id: 35004,
city: "ACMAR",
loc: [-86, 33],
pop: 6065,
State: "AL"
}
council_person = {
zip_id: 35004,
name: "John Doe",
address: "123 Fake St.",
Phone: 123456
}

Embedding (council person stored inside the zip document):
zip = {
_id: 35004,
city: "ACMAR",
loc: [-86, 33],
pop: 6065,
State: "AL",
council_person: {
zip_id: 35004,
name: "John Doe",
address: "123 Fake St.",
Phone: 123456
}
}
Example 2

MongoDB: The Definitive Guide, by Kristina Chodorow and Mike Dirolf
Published: 9/24/2010
Pages: 216
Language: English
Publisher: O’Reilly Media, CA
One to Many Relationship – Embedding

book = {
title: "MongoDB: The Definitive Guide",
authors: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher: {
name: "O’Reilly Media",
founded: "1980",
location: "CA"
}
}
One to Many Relationship - Linking

publisher = {
_id: "oreilly",
name: "O’Reilly Media",
founded: "1980",
location: "CA"
}
book = {
title: "MongoDB: The Definitive Guide",
authors: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher_id: "oreilly"
}
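With linking, the application usually needs a second lookup to follow `publisher_id`. A plain-JavaScript sketch, assuming both collections are held in in-memory arrays and using a hypothetical `findBookWithPublisher` helper, that rebuilds essentially the embedded shape from the linked data:

```javascript
// Linked data: publishers and books live in separate (in-memory) collections.
const publishers = [
  { _id: "oreilly", name: "O’Reilly Media", founded: "1980", location: "CA" },
];
const books = [
  {
    title: "MongoDB: The Definitive Guide",
    authors: ["Kristina Chodorow", "Mike Dirolf"],
    published_date: new Date("2010-09-24"),
    pages: 216,
    language: "English",
    publisher_id: "oreilly",
  },
];

// Hypothetical helper: two lookups (the book, then its publisher by id),
// mirroring the two queries an application would send to MongoDB.
function findBookWithPublisher(title) {
  const book = books.find((b) => b.title === title);
  if (!book) return null;
  const publisher = publishers.find((p) => p._id === book.publisher_id) ?? null;
  const { publisher_id, ...rest } = book; // drop the reference field
  return { ...rest, publisher }; // publisher embedded, as in the embedding example
}

console.log(findBookWithPublisher("MongoDB: The Definitive Guide").publisher.name);
```

The extra lookup is the price of linking; in exchange, publisher details are stored once and can change without touching every book.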
Linking vs Embedding

• Embedding is a bit like pre-joining data
• Document-level operations are easy for the server to handle
• Embed when the “many” objects always appear with (are viewed in the context of) their parents
• Link when you need more flexibility
Many to Many Relationship

• The relationship can be stored in either of the documents (i.e. embedded in one of them)

• Focus on how the data is accessed and queried

Example

book = {
title: "MongoDB: The Definitive Guide",
authors: [
{ _id: "kchodorow", name: "Kristina Chodorow" },
{ _id: "mdirolf", name: "Mike Dirolf" }
],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English"
}

author = {
_id: "kchodorow",
name: "Kristina Chodorow"
}

Query to find all books by an author's name (a dot-notation key must be quoted):

db.books.find( { "authors.name": "Kristina Chodorow" } )
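The dot-notation filter matches a book if any element of its `authors` array has the given `name`. A small plain-JavaScript emulation of that matching rule (the function `matchesAuthorName` and the second book are hypothetical, and only this one-level case is covered, not MongoDB's full query language):

```javascript
const booksColl = [
  {
    title: "MongoDB: The Definitive Guide",
    authors: [
      { _id: "kchodorow", name: "Kristina Chodorow" },
      { _id: "mdirolf", name: "Mike Dirolf" },
    ],
  },
  // Hypothetical second book, so the filter has something to exclude.
  { title: "Another Book", authors: [{ _id: "someone", name: "Someone Else" }] },
];

// Emulates { "authors.name": name }: true if ANY element of the array matches.
function matchesAuthorName(book, name) {
  return book.authors.some((author) => author.name === name);
}

const result = booksColl.filter((b) => matchesAuthorName(b, "Kristina Chodorow"));
console.log(result.map((b) => b.title)); // [ 'MongoDB: The Definitive Guide' ]
```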
Example 3

• A book can be checked out by only one student at a time
• A student can check out many books
Modelling Checkouts

student = {
_id: "joe",
name: "Joe Bookreader",
join_date: ISODate("2011-10-15"),
address: { ... }
}

book = {
_id: "123456789",
title: "MongoDB: The Definitive Guide",
authors: [ "Kristina Chodorow", "Mike Dirolf" ],
...
}
Modelling Checkouts

student = {
_id: "joe",
name: "Joe Bookreader",
join_date: ISODate("2011-10-15"),
address: { ... },
checked_out: [
{ _id: "123456789", checked_out: "2012-10-15" },
{ _id: "987654321", checked_out: "2012-09-12" },
...
]
}
CRUD: Create Read Update Delete
CRUD

To insert documents into a collection (or make a new collection):
CRUD: Inserting Data

Insert one document

Inserting a document with a field name that is new to the collection is inherently supported by the BSON model.
Also, inserting one vs many documents:
db.<collection>.insertOne(<document>)
db.<collection>.insertMany([<document1>, <document2>, …])
Practical Example With MongoDB

Reader
ReaderID | Name
2468 | Nasr
2512 | Aboubakr

Book
BookID | Title
1002 | Introduction to DBS
1004 | Patterns of enterprise application architecture
1006 | Don Quixote

BookLending
BookID | ReaderID | ReturnDate
1002 | 2468 | 25-10-2016
1006 | 2468 | 27-10-2016
1004 | 2512 | 31-10-2016
Example on Create/Insert: MongoDB Inserting Data
db.BookLending.insertOne({ Book: "Introduction to DBS", Reader: "Nasr", ReturnDate: "25-10-2016" })

db.BookLending.insertOne({ Book: "Don Quixote", Reader: "Nasr", ReturnDate: "27-10-2016" })

db.BookLending.insertOne({ Book: "Patterns of enterprise application architecture", Reader: "Aboubakr", ReturnDate: "31-10-2016" })

Another way of inserting the same data, using a list:

db.BookLending.insertMany([
{ Book: "Introduction to DBS", Reader: "Nasr", ReturnDate: "25-10-2016" },
{ Book: "Don Quixote", Reader: "Nasr", ReturnDate: "27-10-2016" },
{ Book: "Patterns of enterprise application architecture", Reader: "Aboubakr", ReturnDate: "31-10-2016" }
])
CRUD: Querying
CRUD: Querying With “AND” Condition

AND clause in MongoDB: conditions listed in the same filter document are combined with AND.
Example: (AND)
db.BookLending.find({
Book: "Introduction to DBS",
Reader: "Nasr"
})
CRUD: Querying With “OR” Condition

OR clause in MongoDB:
Example: (OR)
db.BookLending.find({
$or: [
{ Book: "Introduction to DBS" },
{ Reader: "Nasr" }
]
})

Example: (several values for one field, using $in)
db.BookLending.find({ ReturnDate: { $in: ["25-10-2016", "27-10-2016"] } })
CRUD: Querying With Include/Exclude Results
Example: (Exclude)
db.BookLending.find(
{ Reader: "Nasr" }, // Match
{ Reader: 0 } // 0 to exclude
)

Example: (Include/Exclude)
db.BookLending.find(
{ Reader: "Nasr" },
{ Book: 1, ReturnDate: 1, _id: 0 } // 1 to include; _id is returned unless excluded
)
Example: More about Querying

Example 1: (returning all book lendings)
db.BookLending.find()

Example 2: (finding book lendings by the reader's name)
db.BookLending.find({ Reader: "Nasr" })

Example 3: (finding book lendings before a date)
db.BookLending.find({ ReturnDate: { $lt: "27-10-2016" } }) // compares strings, not dates

Example 4: (finding book lendings whose title contains a specific word)
db.BookLending.find({ Book: /DBS/ })
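A caution about Example 3: `ReturnDate` is stored as a "DD-MM-YYYY" string, so `$lt` compares strings character by character rather than as dates. The plain-JavaScript comparison below orders these ASCII strings the same way as MongoDB's default binary string comparison; "01-11-2016" is a hypothetical extra date and `toIso` a hypothetical helper. Storing real dates, or ISO "YYYY-MM-DD" strings, avoids the problem.

```javascript
// ReturnDate values from the BookLending documents (DD-MM-YYYY),
// plus one hypothetical later date, 1 November 2016.
const dates = ["25-10-2016", "27-10-2016", "31-10-2016", "01-11-2016"];

// String comparison: "01-11-2016" sorts BEFORE "27-10-2016" because "0" < "2".
const beforeAsStrings = dates.filter((d) => d < "27-10-2016");
console.log(beforeAsStrings); // [ '25-10-2016', '01-11-2016' ]  (wrong as dates)

// Hypothetical fix: convert DD-MM-YYYY to ISO YYYY-MM-DD, which sorts correctly.
function toIso(ddmmyyyy) {
  const [dd, mm, yyyy] = ddmmyyyy.split("-");
  return `${yyyy}-${mm}-${dd}`;
}
const beforeAsIso = dates.filter((d) => toIso(d) < toIso("27-10-2016"));
console.log(beforeAsIso); // [ '25-10-2016' ]
```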
CRUD: Updating

Example: Updating ($set and $unset)

db.BookLending.updateOne(
{ Book: "Introduction to DBS", Reader: "Nasr" }, //Match first
{ $set: { ReturnDate: "30-10-2016" } } // Update the first one matching the criteria
)

db.BookLending.updateMany(
{ Reader: "Nasr" }, // Match first
{ $set: { ReturnDate: "30-10-2016" } } // Update all documents that match the criteria
)

db.BookLending.updateOne(
{ Book: "Introduction to DBS", Reader: "Nasr" },
{ $unset: { ReturnDate: "" } } // Remove a property
)
CRUD: Removal

Also, there are deleteOne and deleteMany:

Example: Removal

db.BookLending.deleteMany({ Reader: "Nasr" }) // Delete all documents that match the criteria

db.BookLending.deleteOne({ Reader: "Nasr" }) // Delete the first document that matches the criteria

db.BookLending.remove({ Reader: "Nasr" }) // Older, deprecated method: deletes all documents that match the criteria


Using the Shell

To check which db you’re using: db
Show all databases: show dbs
Switch db’s/make a new one: use <name>
See what collections exist: show collections
MongoDB Aggregation Framework

● For the usual aggregation operations (sum, avg, etc.)
● A pipeline of operations
● It is possible to:
○ Group by categories
○ Impose conditions
○ Select fields for output
● Somewhat different syntax from the usual CRUD queries
Aggregation Matching

● $match operator:
db.orders.aggregate([{ $match: { status: "A" } }])
● Selects all documents where the status attribute is “A”
● No aggregation yet
● Note that this is equivalent to db.orders.find({ status: "A" })
Aggregation Matching (Example)
db.orders.insertMany([
{ _id: 1, status: "A", amount: 50 },
{ _id: 2, status: "B", amount: 75 },
{ _id: 3, status: "A", amount: 100 },
{ _id: 4, status: "A", amount: 150 },
{ _id: 5, status: "B", amount: 200 }
])
Input code:
db.orders.aggregate([
{
$match: { status: "A" }
}
])
Output:
[
{ _id: 1, status: 'A', amount: 50 },
{ _id: 3, status: 'A', amount: 100 },
{ _id: 4, status: 'A', amount: 150 }
]
Aggregation Grouping With Sum

● $group operator:
db.orders.aggregate([
{
$group: {
_id: "$status", // Group by the status field
total: { $sum: "$amount" } // Calculate the total amount for each status group
}
}
])
● Calculates the sum of the amount attribute per status, shown as “total”.
● The attribute to group by is specified in _id; a combination of grouping fields is also possible.
Aggregation Grouping With Sum (Example)

db.orders.insertMany([
{ _id: 1, status: "A", amount: 50 },
{ _id: 2, status: "B", amount: 75 },
{ _id: 3, status: "A", amount: 100 },
{ _id: 4, status: "A", amount: 150 },
{ _id: 5, status: "B", amount: 200 }
])
Input code:
db.orders.aggregate([{
$group: {
_id: "$status",
total: { $sum: "$amount" }
}
}])
Output:
[
{ _id: 'A', total: 300 },
{ _id: 'B', total: 275 }
]
Aggregation Grouping With Max

● Other aggregation operators are available, such as $max, $min, $avg, etc.

db.orders.aggregate([
{
$group: {
_id: "$status", // Group by the status field
maximum: { $max: "$amount" }
}
}
])
● Finds the maximum of the amount attribute per status, shown as “maximum”.
Aggregation Grouping With Max (Example)
db.orders.insertMany([
{ _id: 1, status: "A", amount: 50 },
{ _id: 2, status: "B", amount: 75 },
{ _id: 3, status: "A", amount: 100 },
{ _id: 4, status: "A", amount: 150 },
{ _id: 5, status: "B", amount: 200 }
])
Input code:
db.orders.aggregate([{
$group: {
_id: "$status",
maximum: { $max: "$amount" }
}
}])
Output:
[
{ _id: 'A', maximum: 150 },
{ _id: 'B', maximum: 200 }
]
Aggregation Grouping

● What about no grouping?
● E.g. finding the maximum amount over all documents
● Use a constant such as _id: 1 (or _id: null) so that all documents fall into one group
Input code:
db.orders.aggregate([
{ $group: {
_id: 1,
maximum: { $max: "$amount" }
}
}
])
Output:
[ { _id: 1, maximum: 200 } ]
Aggregation Grouping With Counts

● Counts (number of occurrences) can be calculated by specifying $sum: 1

Example: how many times does each status appear?

db.orders.aggregate([
{ $group: { _id: "$status", total: { $sum: 1 } } }
])
Aggregation Grouping With Counts (Example)

db.orders.insertMany([
{ _id: 1, status: "A", amount: 50 },
{ _id: 2, status: "B", amount: 75 },
{ _id: 3, status: "A", amount: 100 },
{ _id: 4, status: "A", amount: 150 },
{ _id: 5, status: "B", amount: 200 }
])
Input code:
db.orders.aggregate([{
$group: {
_id: "$status",
total: { $sum: 1 }
}
}])
Output (groups may be returned in any order):
[
{ _id: 'B', total: 2 },
{ _id: 'A', total: 3 }
]
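To see what `$group` computes, here is a plain-JavaScript emulation over the same five `orders` documents. The `groupBy` helper is hypothetical and only mimics `$sum` of `amount`, `$max` of `amount`, and counting with `$sum: 1`; the server itself may return groups in a different order.

```javascript
const orders = [
  { _id: 1, status: "A", amount: 50 },
  { _id: 2, status: "B", amount: 75 },
  { _id: 3, status: "A", amount: 100 },
  { _id: 4, status: "A", amount: 150 },
  { _id: 5, status: "B", amount: 200 },
];

// Hypothetical helper mimicking { $group: { _id: <key>, ... } } with
// total ($sum of amount), maximum ($max of amount) and count ($sum: 1).
function groupBy(docs, keyFn) {
  const groups = new Map(); // Map keeps groups in first-seen order
  for (const doc of docs) {
    const key = keyFn(doc);
    const g = groups.get(key) ?? { _id: key, total: 0, maximum: -Infinity, count: 0 };
    g.total += doc.amount;
    g.maximum = Math.max(g.maximum, doc.amount);
    g.count += 1;
    groups.set(key, g);
  }
  return [...groups.values()];
}

console.log(groupBy(orders, (d) => d.status));
// group "A": total 300, maximum 150, count 3; group "B": total 275, maximum 200, count 2
console.log(groupBy(orders, () => 1)); // a constant key puts every document in one group
```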
Aggregation Projection

● Selecting fields to view in the result
● Similar to the projection argument of find
Assuming the following documents:
db.students.insertMany([
{ "_id": 1, "name": "Alice", "age": 20, "gender": "female", "grade": "A" },
{ "_id": 2, "name": "Bob", "age": 22, "gender": "male", "grade": "A" },
{ "_id": 3, "name": "Charlie", "age": 21, "gender": "male", "grade": "B" }
])

Input code:
db.students.aggregate([
{
$project: {
name: 1,
age: 1,
_id: 0
}
}
])
Output:
[
{ name: 'Alice', age: 20 },
{ name: 'Bob', age: 22 },
{ name: 'Charlie', age: 21 }
]
Aggregation Pipeline: By Example

Assume we have some sample data as:

db.orders.insertMany([
{ "_id": 1, "product": "Phone", "type": "iOS", "quantity": 2, "price": 500 },
{ "_id": 2, "product": "Phone", "type": "iOS", "quantity": 3, "price": 600 },
{ "_id": 3, "product": "Laptop", "type": "Android", "quantity": 1, "price": 1200 },
{ "_id": 4, "product": "Laptop", "type": "Android", "quantity": 2, "price": 1000 },
{ "_id": 5, "product": "Tablet", "type": "iOS", "quantity": 3, "price": 300 }
]);
Exercise:
We are required to write an aggregation pipeline that matches orders with a quantity greater than 1, groups the orders by type (iOS/Android), computes the totalRevenue (quantity × price) of each group, and projects the fields “type” and “totalRevenue”.
Aggregation Pipeline: By Example

Match (filter) orders with more than 1 quantity

Step 1:
db.orders.aggregate([
{
$match: { quantity: { $gt: 1 } }
},
// To be continued
Aggregation Pipeline: By Example

Group by type with totalRevenue

Step 2:

{
$group: {
_id: "$type", // Group by type (iOS/Android)
totalRevenue: { $sum: { $multiply: ["$quantity", "$price"] } }
}
},
Aggregation Pipeline: By Example

Project the relevant fields

Step 3:

{
$project: {
type: "$_id", // Rename _id to type
totalRevenue: 1, // Include the totalRevenue field
_id: 0 // Exclude _id
}
}
])
Aggregation Pipeline: By Example

Sample data:
{ "_id": 1, "product": "Phone", "type": "iOS", "quantity": 2, "price": 500 },
{ "_id": 2, "product": "Phone", "type": "iOS", "quantity": 3, "price": 600 },
{ "_id": 3, "product": "Laptop", "type": "Android", "quantity": 1, "price": 1200 },
{ "_id": 4, "product": "Laptop", "type": "Android", "quantity": 2, "price": 1000 },
{ "_id": 5, "product": "Tablet", "type": "iOS", "quantity": 3, "price": 300 }

Input code:
db.orders.aggregate([
{ $match: { quantity: { $gt: 1 } } },
{ $group: {
_id: "$type",
totalRevenue: { $sum: { $multiply: ["$quantity", "$price"] } }
} },
{ $project: {
type: "$_id",
totalRevenue: 1,
_id: 0
} }
]);

Output:
[
{ totalRevenue: 3700, type: 'iOS' },
{ totalRevenue: 2000, type: 'Android' }
]
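The three stages can be traced by hand in plain JavaScript to check the expected output. This is only an emulation with array methods (variable names are hypothetical), not how the server executes the pipeline.

```javascript
// The five sample documents from the exercise.
const productOrders = [
  { _id: 1, product: "Phone", type: "iOS", quantity: 2, price: 500 },
  { _id: 2, product: "Phone", type: "iOS", quantity: 3, price: 600 },
  { _id: 3, product: "Laptop", type: "Android", quantity: 1, price: 1200 },
  { _id: 4, product: "Laptop", type: "Android", quantity: 2, price: 1000 },
  { _id: 5, product: "Tablet", type: "iOS", quantity: 3, price: 300 },
];

// Stage 1 ($match): keep orders with quantity > 1 (drops _id 3)
const matched = productOrders.filter((o) => o.quantity > 1);

// Stage 2 ($group): sum quantity * price per type
const totals = new Map();
for (const o of matched) {
  totals.set(o.type, (totals.get(o.type) ?? 0) + o.quantity * o.price);
}

// Stage 3 ($project): expose the group key as "type", without _id
const result = [...totals].map(([type, totalRevenue]) => ({ type, totalRevenue }));
console.log(result);
// iOS: 2*500 + 3*600 + 3*300 = 3700; Android: 2*1000 = 2000
```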
Formatting

• You can use .pretty() at the end of your query for better-formatted results.

Example:

db.orders.find({}).pretty()

This query retrieves all documents from the orders collection and displays them in a more structured output.
JOINS

• Use joins only when absolutely necessary
• Joins are possible in the aggregation framework using the $lookup operator
• BUT! Keep in mind that you should use fully self-contained documents as much as possible
• Best practices to reduce $lookup:
https://www.mongodb.com/docs/atlas/schema-suggestions/reduce-lookup-operations/
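For reference, a `$lookup` stage joining each book to its publisher (field names borrowed from the one-to-many linking example; the collection name `publishers` is an assumption) would be `{ $lookup: { from: "publishers", localField: "publisher_id", foreignField: "_id", as: "publisher" } }`. It adds an array field holding every matching document, behaving like a left outer join. The plain-JavaScript emulation below (the `lookup` helper and the unlinked book are hypothetical) covers only this simple equality case:

```javascript
// Hypothetical emulation of $lookup's equality match: for each input document,
// attach an array of foreign documents whose foreignField equals localField.
function lookup(docs, { from, localField, foreignField, as }) {
  return docs.map((doc) => ({
    ...doc,
    [as]: from.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const publishersColl = [{ _id: "oreilly", name: "O’Reilly Media", location: "CA" }];
const linkedBooks = [
  { title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" },
  { title: "Unlinked Book", publisher_id: "unknown" }, // hypothetical, no match
];

const joined = lookup(linkedBooks, {
  from: publishersColl,
  localField: "publisher_id",
  foreignField: "_id",
  as: "publisher",
});
console.log(joined[0].publisher.length); // 1
console.log(joined[1].publisher); // [] (unmatched documents still appear)
```

Note that the joined field is always an array, even for a one-to-one match, which is one reason embedding is often simpler to query.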
Best Practices in Document DBs

Some tips for improving a schema:
• Avoid unbounded arrays
• Remove unnecessary indexes
• Reduce the size of large documents
• Reduce the number of collections
https://www.mongodb.com/docs/atlas/performance-advisor/schema-suggestions/
Summary

● The key-value structure is intuitive for humans, and there are currently many good implementations of it.

● JSON is one of the more popular formats for key-value stores

● Document databases, BSON, the _id field

● MongoDB: document-based data model, example queries

● MongoDB: more feature-packed; suited to a larger class of applications
○ Simple querying with the find operator
○ More complex operations with the aggregation pipeline
Reading

● Wiese, Chapter 6
● MongoDB manual: https://docs.mongodb.com/manual

Homework:
● Configure MongoDB on your machine
○ Either the cloud version (follow the tutorial on Brightspace)
○ Or locally (https://docs.mongodb.com/manual/installation/#mongodb-communityedition-installation-tutorials)
QUESTIONS?
