MongoDB Schema Design Basics
Alvin Richards
alvin@10gen.com
This talk
Part One
‣ Getting a flavor
‣ Creating a Schema
‣ Indexes
‣ Evolving the Schema
Part Two
‣ Single Table Inheritance
‣ Many – Many
‣ Trees
‣ Lists / Queues / Stacks
So why model data?
A brief history of normalization
• 1970 E.F.Codd introduces 1st Normal Form (1NF)
• 1971 E.F.Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
• 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)
• 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)
Goals:
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias towards a particular style of query
* Source: Wikipedia
Relational databases make normalized data look like this
Document databases make normalized data look like this
Some terms before we proceed
RDBMS            Document DBs
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking across documents
Partition        Shard
Partition Key    Shard Key
DB Considerations
How can we manipulate this data?
• Dynamic Queries
• Secondary Indexes
• Atomic Updates
• Map Reduce
Access Patterns?
• Read / Write Ratio
• Types of updates
• Types of queries
• Data life-cycle
Considerations
• No Joins
• Document writes are atomic
Design Session
>post = {author: "kyle",
         date: new Date(),
         text: "My first blog",
         tags: ["mongodb", "intro"]}
>db.posts.save(post)
Find the document
>db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "kyle",
date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
text : "My first blog",
tags : [ "mongodb", "intro" ] }
Notes:
• _id must be unique, but can be any value you like
• MongoDB generates a default _id (an ObjectId) if one is not supplied
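The default _id MongoDB generates is an ObjectId, and its first four bytes hold the creation time in seconds since the Unix epoch. A minimal sketch in plain Node.js (no database involved; `objectIdTimestamp` is a hypothetical helper, not a driver API) that decodes that timestamp from the hex string shown above:

```javascript
// Sketch: recover the creation time embedded in an ObjectId.
// Assumes the usual 24-hex-character string form; the first 8 hex digits
// (4 bytes) are seconds since the Unix epoch.
// objectIdTimestamp is a hypothetical helper name, not a driver API.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex ObjectId");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdTimestamp("4c4ba5c0672c685e5e8aabf3").toISOString());
// 2010-07-25T02:47:28.000Z
```

The decoded time, 02:47:28 UTC on 25 July 2010, is within seconds of the post's date field (19:47:11 PDT on 24 July). In the mongo shell the same value is available through the ObjectId's getTimestamp() method.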
Add an index, find via the index
Secondary index for “author”
>db.posts.ensureIndex({author: 1})
>db.posts.find({author: 'kyle'})
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "kyle",
... }
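Conceptually, a secondary index is a lookup structure from a field value to the documents that contain it, which lets a query on author avoid scanning every document. MongoDB's indexes are B-trees; the in-memory sketch below uses a JavaScript Map as a simplified stand-in, and `buildIndex` / `findByAuthor` are hypothetical names:

```javascript
// In-memory illustration only: a Map stands in for MongoDB's B-tree index.
// buildIndex and findByAuthor are hypothetical helpers.
const posts = [
  { _id: 1, author: "kyle", text: "My first blog" },
  { _id: 2, author: "mike", text: "Hello" },
  { _id: 3, author: "kyle", text: "Second post" },
];

// Index on one field: value -> array of matching _ids.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc._id);
  }
  return index;
}

const byId = new Map(posts.map((p) => [p._id, p]));
const authorIndex = buildIndex(posts, "author");

// Rough equivalent of find({author: ...}) answered from the index.
function findByAuthor(author) {
  return (authorIndex.get(author) || []).map((id) => byId.get(id));
}

console.log(findByAuthor("kyle").map((p) => p._id)); // [ 1, 3 ]
```

A B-tree additionally keeps its keys sorted, which is what lets a real index serve range queries and sorted results; a hash map cannot.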
Verifying indexes exist
>db.system.indexes.find()
// Index on ID
{ name : "_id_",
ns : "test.posts",
key : { "_id" : 1 } }
// Index on author
{ _id : ObjectId("4c4ba6c5672c685e5e8aabf4"),
ns : "test.posts",
key : { "author" : 1 },
name : "author_1" }
Query operators
Conditional operators:
$lt, $lte, $gt, $gte, $ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
Regular expressions:
// posts where author starts with k (case-insensitive)
>db.posts.find({author: /^k/i })
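To make the operator semantics concrete, here is a hypothetical mini-matcher in plain JavaScript that evaluates a small subset of them ($lt, $lte, $gt, $gte, $ne, $in, $nin, $exists) plus regular expressions against ordinary objects. It is an illustration of what the operators mean, not MongoDB's implementation, and `matches` / `matchesValue` are invented names:

```javascript
// Hypothetical mini-matcher for illustration; supports only a subset of
// MongoDB's query operators and ignores array and type-ordering rules.
function matchesValue(value, cond) {
  if (cond instanceof RegExp) return typeof value === "string" && cond.test(value);
  if (cond === null || typeof cond !== "object" || Array.isArray(cond)) {
    return value === cond; // plain equality
  }
  // Every operator in the condition object must hold.
  return Object.entries(cond).every(([op, arg]) => {
    switch (op) {
      case "$lt": return value < arg;
      case "$lte": return value <= arg;
      case "$gt": return value > arg;
      case "$gte": return value >= arg;
      case "$ne": return value !== arg;
      case "$in": return arg.includes(value);
      case "$nin": return !arg.includes(value);
      case "$exists": return (value !== undefined) === arg;
      default: throw new Error("unsupported operator " + op);
    }
  });
}

// A document matches when every field condition in the query holds.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => matchesValue(doc[field], cond));
}

const posts = [
  { author: "kyle", views: 10 },
  { author: "mike", views: 3 },
  { author: "Karl" },
];
console.log(posts.filter((p) => matches(p, { author: /^k/i })).length);             // 2
console.log(posts.filter((p) => matches(p, { views: { $gte: 5 } })).length);        // 1
console.log(posts.filter((p) => matches(p, { views: { $exists: false } })).length); // 1
```

Counting the filtered results plays the role of .count() on a cursor.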
Counting:
// posts written by mike
>db.posts.find({author: "mike"}).count()
Extending the Schema
// create index on nested documents:
>db.posts.ensureIndex({"comments.author": 1})
>db.posts.find({"comments.author": "kyle"})
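The nested query works because a dotted path such as comments.author reaches into each element of the comments array; in the shell the dotted key has to be quoted, since comments.author is not a valid bare property name in a JavaScript object literal. The sketch below models that lookup in plain JavaScript; `valuesAtPath` and `matchesDotted` are hypothetical helpers and the sample comments are invented:

```javascript
// Simplified model of dotted-path matching, for illustration only:
// when the path crosses an array, every element is followed, and an array
// at the end of the path contributes its elements as candidates, so the
// query matches if ANY reached value equals the target.
function valuesAtPath(value, parts) {
  if (parts.length === 0) {
    return Array.isArray(value) ? [value, ...value] : [value];
  }
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAtPath(item, parts));
  }
  if (value === null || typeof value !== "object") return [];
  return valuesAtPath(value[parts[0]], parts.slice(1));
}

function matchesDotted(doc, dottedPath, expected) {
  return valuesAtPath(doc, dottedPath.split(".")).includes(expected);
}

// Hypothetical post with embedded comments.
const post = {
  author: "kyle",
  tags: ["mongodb", "intro"],
  comments: [
    { author: "mike", text: "Nice post" },
    { author: "kyle", text: "Thanks" },
  ],
};

console.log(matchesDotted(post, "comments.author", "kyle"));  // true
console.log(matchesDotted(post, "comments.author", "eliot")); // false
console.log(matchesDotted(post, "tags", "intro"));            // true
```

The same "any element" rule is why the index on "comments.author" holds one entry per comment author rather than one per post.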
>db[res.result].find()
{ _id : "intro", value : { count : 1 } }
{ _id : "mongodb", value : { count : 1 } }
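The { _id, value: { count } } results above have the shape of map-reduce output, which suggests a tag-count job over the posts collection; the job itself is not included in these slides. Assuming that was the job, this plain-JavaScript sketch reproduces it in memory. `mapReduce`, `mapTags`, and `reduceCounts` are hypothetical names:

```javascript
// In-memory sketch of a map/reduce tag count (hypothetical helpers).
// map emits (tag, {count: 1}) for every tag; reduce sums counts per key.
// Unlike the mongo shell, where map reads the document from `this` and
// calls a global emit(), the document and emit are passed in explicitly.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map(doc, emit));
  // As in MongoDB, reduce is skipped for keys that emitted a single value.
  return [...groups.entries()]
    .map(([key, values]) => ({
      _id: key,
      value: values.length === 1 ? values[0] : reduce(key, values),
    }))
    .sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

const mapTags = (doc, emit) =>
  (doc.tags || []).forEach((tag) => emit(tag, { count: 1 }));
const reduceCounts = (key, values) => ({
  count: values.reduce((sum, v) => sum + v.count, 0),
});

const posts = [{ author: "kyle", tags: ["mongodb", "intro"] }];
console.log(JSON.stringify(mapReduce(posts, mapTags, reduceCounts)));
// [{"_id":"intro","value":{"count":1}},{"_id":"mongodb","value":{"count":1}}]
```

Because reduce may be applied to partial results, it has to return the same shape as the values emitted by map, which is why both use { count }.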
Review
So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more
Wordnik
9B records, 100M queries / week, 1.2TB
{
entry : {
header: { id: 0,
headword: "m",
sourceDictionary: "GCide",
textProns : [
{text: "(em)",
seq:0}
],
syllables: [
{id: 0,
text: "m"}
],
sourceDictionary: "1913 Webster",
headWord: "m",
id: 1,
definitions: [
{text: "M, the thirteenth letter..."},
{text: "As a numeral, M stands for 1000"}]
}
}
}
Observations:
- Using Rich Documents works well
- Simplify relations by embedding them
- Iterative development is easy with MongoDB
Single Table Inheritance
>db.shapes.find()
{ _id: ObjectId("..."), type: "circle", area: 3.14, radius: 1}
{ _id: ObjectId("..."), type: "square", area: 4, d: 2}
{ _id: ObjectId("..."), type: "rect", area: 10, length: 5, width: 2}
// create index
>db.shapes.ensureIndex({radius: 1})
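With single table inheritance, one collection holds every shape and the type field acts as the discriminator; each document carries only the fields its shape needs. A small in-memory JavaScript sketch of consuming such a collection (the `area` function is hypothetical; the stored area values agree with the computed ones up to rounding):

```javascript
// In-memory copy of the shapes collection shown above.
const shapes = [
  { type: "circle", area: 3.14, radius: 1 },
  { type: "square", area: 4, d: 2 },
  { type: "rect", area: 10, length: 5, width: 2 },
];

// Hypothetical helper: dispatch on the discriminator field; each branch
// reads only the fields that its shape type stores.
function area(shape) {
  switch (shape.type) {
    case "circle": return Math.PI * shape.radius ** 2;
    case "square": return shape.d ** 2;
    case "rect": return shape.length * shape.width;
    default: throw new Error("unknown shape type: " + shape.type);
  }
}

console.log(shapes.map((s) => area(s).toFixed(2))); // [ '3.14', '4.00', '10.00' ]

// A query on radius only ever selects circles, the one type that has it.
const withRadius = shapes.filter((s) => s.radius !== undefined);
console.log(withRadius.map((s) => s.type)); // [ 'circle' ]
```

In MongoDB, documents without radius still get a null entry in a regular index on radius; a sparse index would contain only the documents that actually have the field.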
One to Many
- Embedded Array / Array Keys
  - $slice operator to return a subset of the array
  - hard to find the latest comments across all documents
- Embedded tree
  - Single document
  - Natural
  - Hard to query
- Normalized (2 collections)
  - Most flexible
  - More queries
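The $slice projection operator is what keeps reading an embedded array cheap: in the shell, db.posts.find({}, {comments: {$slice: -5}}) returns only the last five comments of each post. The in-memory sketch below mirrors its three forms; `applySlice` is a hypothetical helper, not MongoDB code:

```javascript
// Hypothetical helper mirroring the $slice projection forms:
//   n >= 0        -> the first n elements
//   n < 0         -> the last |n| elements
//   [skip, limit] -> skip elements (negative counts from the end), then up to limit
function applySlice(array, spec) {
  if (Array.isArray(spec)) {
    const [skip, limit] = spec;
    const start = skip < 0 ? Math.max(array.length + skip, 0) : skip;
    return array.slice(start, start + limit);
  }
  return spec >= 0 ? array.slice(0, spec) : array.slice(spec);
}

const comments = ["c1", "c2", "c3", "c4", "c5"];
console.log(applySlice(comments, 2));      // [ 'c1', 'c2' ]
console.log(applySlice(comments, -2));     // [ 'c4', 'c5' ]
console.log(applySlice(comments, [1, 2])); // [ 'c2', 'c3' ]
```

This trims one post's array at a time; finding the latest comments across all posts still means reading many documents, which is the trade-off noted in the list.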
Many - Many
Example (relational):
- Products: product_id
- Category: category_id
- Prod_Categories: id, product_id, category_id
Many - Many
products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
name: "Sumatra Dark Roast",
category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
ObjectId("4c4ca25433fb5941681b92af") ]}
categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
name: "Indonesia",
product_ids: [ ObjectId("4c4ca23933fb5941681b912e"),
ObjectId("4c4ca30433fb5941681b9130"),
ObjectId("4c4ca30433fb5941681b913a") ]}
Alternative
products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
name: "Sumatra Dark Roast",
category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
ObjectId("4c4ca25433fb5941681b92af") ]}
categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
name: "Indonesia"}
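With the alternative layout, the link lives only in products.category_ids, so resolving it takes two queries issued by the application: in the shell, roughly db.products.findOne({_id: id}) followed by db.categories.find({_id: {$in: product.category_ids}}). The sketch below plays those two steps out in memory; plain strings stand in for ObjectIds, the helper names are hypothetical, and the "Dark Roast" and "Decaf" categories are invented sample data:

```javascript
// In-memory stand-ins for the two collections (strings replace ObjectIds).
const products = [
  { _id: "p1", name: "Sumatra Dark Roast", category_ids: ["c1", "c2"] },
];
const categories = [
  { _id: "c1", name: "Indonesia" },
  { _id: "c2", name: "Dark Roast" }, // hypothetical sample category
  { _id: "c3", name: "Decaf" },      // hypothetical sample category
];

// Step 1: fetch the product; step 2: fetch its categories by _id ($in-style).
function categoriesOf(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  const wanted = new Set(product.category_ids);
  return categories.filter((c) => wanted.has(c._id));
}

// Reverse direction: products whose category_ids array contains the id,
// like find({category_ids: categoryId}) on the products collection.
function productsIn(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

console.log(categoriesOf("p1").map((c) => c.name)); // [ 'Indonesia', 'Dark Roast' ]
console.log(productsIn("c1").map((p) => p.name));   // [ 'Sumatra Dark Roast' ]
```

Storing the ids on one side only avoids keeping two arrays in sync on every change, at the cost of querying category_ids for the reverse direction.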
Trees
{ comments: [
    { author: "rpb", text: "...",
      replies: [
        { author: "Fred", text: "...",
          replies: [] }
      ]}
]}
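Storing a whole comment thread inside the post makes it natural to fetch in one read, but any question about individual comments has to be answered by walking the tree in application code. A minimal JavaScript sketch that walks the structure above (`walkComments` is a hypothetical helper):

```javascript
// Hypothetical helper: depth-first walk over comments that embed their
// replies, recording each author together with its nesting depth.
function walkComments(comments, depth = 0, out = []) {
  for (const c of comments) {
    out.push({ author: c.author, depth });
    walkComments(c.replies || [], depth + 1, out);
  }
  return out;
}

// The embedded thread from the slide, as a plain object.
const post = {
  comments: [
    { author: "rpb", text: "...",
      replies: [
        { author: "Fred", text: "...", replies: [] },
      ] },
  ],
};

console.log(JSON.stringify(walkComments(post.comments)));
// [{"author":"rpb","depth":0},{"author":"Fred","depth":1}]
```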
Child Links
- Each node contains the _ids of its children
- Can support graphs (multiple parents per child)
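With child links, collecting a whole subtree means following the children arrays one level at a time, roughly one find({_id: {$in: ids}}) per level in a real database. The in-memory sketch below assumes a field named children and reuses the node ids a through g from the array-of-ancestors example; the `seen` set keeps a node that is reachable from several parents from being visited twice:

```javascript
// Hypothetical node data: each node stores the _ids of its children.
const nodes = new Map([
  ["a", { _id: "a", children: ["b", "e"] }],
  ["b", { _id: "b", children: ["c", "d"] }],
  ["c", { _id: "c", children: [] }],
  ["d", { _id: "d", children: ["g"] }],
  ["e", { _id: "e", children: ["f"] }],
  ["f", { _id: "f", children: [] }],
  ["g", { _id: "g", children: [] }],
]);

// Breadth-first expansion; `levels` counts how many batches of child
// lookups were needed below the starting node.
function descendants(rootId) {
  const seen = new Set();
  let frontier = nodes.has(rootId) ? [...nodes.get(rootId).children] : [];
  let levels = 0;
  while (frontier.length > 0) {
    levels++;
    const next = [];
    for (const id of frontier) {
      if (seen.has(id)) continue;
      seen.add(id);
      const node = nodes.get(id);
      if (node) next.push(...node.children);
    }
    frontier = next;
  }
  return { ids: [...seen], levels };
}

console.log(JSON.stringify(descendants("b"))); // {"ids":["c","d","g"],"levels":2}
```

The number of round trips grows with the depth of the tree, which is the cost the array-of-ancestors layout avoids.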
Array of Ancestors
- Store Ancestors of a node
{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }
{ _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" }
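The payoff of storing ancestors is that subtree and path questions need no recursion: in the shell, db.tree.find({ancestors: "b"}) would return every descendant of b and db.tree.find({parent: "b"}) its direct children (the collection name tree is an assumption). The same queries over the nodes above, in plain JavaScript with hypothetical helper names:

```javascript
// In-memory copy of the nodes shown above.
const tree = [
  { _id: "a" },
  { _id: "b", ancestors: ["a"], parent: "a" },
  { _id: "c", ancestors: ["a", "b"], parent: "b" },
  { _id: "d", ancestors: ["a", "b"], parent: "b" },
  { _id: "e", ancestors: ["a"], parent: "a" },
  { _id: "f", ancestors: ["a", "e"], parent: "e" },
  { _id: "g", ancestors: ["a", "b", "d"], parent: "d" },
];

// All descendants: every node whose ancestors array contains the id.
const descendantsOf = (id) =>
  tree.filter((n) => (n.ancestors || []).includes(id)).map((n) => n._id);

// Direct children: nodes whose parent is the id.
const childrenOf = (id) =>
  tree.filter((n) => n.parent === id).map((n) => n._id);

// Path from the root: the stored ancestors followed by the node itself.
function pathTo(id) {
  const node = tree.find((n) => n._id === id);
  return node ? [...(node.ancestors || []), node._id] : [];
}

console.log(descendantsOf("b")); // [ 'c', 'd', 'g' ]
console.log(childrenOf("b"));    // [ 'c', 'd' ]
console.log(pathTo("g"));        // [ 'a', 'b', 'd', 'g' ]
```

The cost moves to writes: moving a subtree means rewriting the ancestors array of every node in it.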
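Queues
// atomically find the highest-priority job that is not in progress,
// mark it as in progress, and return the updated document
job = db.jobs.findAndModify({
  query: {inprogress: false},
  sort: {priority: -1},
  update: {$set: {inprogress: true,
                  started: new Date()}},
  new: true})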
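findAndModify is what makes the claim safe: the query, sort, and update run as a single atomic operation on one document, so two workers cannot both take the same job. The in-memory sketch below mirrors the same steps (pick the highest-priority job that is not in progress, mark it, return the updated job as new: true does). `claimNextJob` is hypothetical, and it is only safe here because plain JavaScript runs it on a single thread:

```javascript
// Hypothetical helper: pick the highest-priority job with
// inprogress === false, mark it as in progress, and return the modified
// job. Returns null when no job matches, like findAndModify without upsert.
function claimNextJob(jobs) {
  let best = null;
  for (const job of jobs) {
    if (job.inprogress === false && (best === null || job.priority > best.priority)) {
      best = job;
    }
  }
  if (best === null) return null;
  best.inprogress = true;
  best.started = new Date();
  return best;
}

const jobs = [
  { _id: 1, priority: 1, inprogress: false },
  { _id: 2, priority: 5, inprogress: false },
  { _id: 3, priority: 9, inprogress: true },
];

console.log(claimNextJob(jobs)._id); // 2
console.log(claimNextJob(jobs)._id); // 1
console.log(claimNextJob(jobs));     // null
```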
Cool Stuff
- Aggregation
- Capped collections
- GridFS
- Geo
Learn More
• Kyle’s presentation + video:
http://www.slideshare.net/kbanker/mongodb-schema-design
http://www.blip.tv/file/3704083
• Dwight’s presentation
http://www.slideshare.net/mongosf/schema-design-with-mongodb-dwight-merriman
• Documentation
Trees: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
Queues: http://www.mongodb.org/display/DOCS/findandmodify+Command
Aggregation: http://www.mongodb.org/display/DOCS/Aggregation
Capped Col.: http://www.mongodb.org/display/DOCS/Capped+Collections
Geo: http://www.mongodb.org/display/DOCS/Geospatial+Indexing
GridFS: http://www.mongodb.org/display/DOCS/GridFS+Specification
Thank You :-)
Download MongoDB
DBRef
- Think URL
- YDSMV: your driver support may vary
Sample Schema:
nr = {note_refs: [{"$ref" : "notes", "$id" : 5}, ... ]}
Dereferencing:
nr.note_refs.forEach(function(r) {
  printjson(db[r.$ref].findOne({_id: r.$id}));
});
BSON
MongoDB stores data internally as BSON (Binary JSON)
Typed
boolean, integer, float, date, string, binary, array...