MongoDB
Overview
MongoDB ("humongous") (= huge; enormous.)
- Scalable, high-performance, open source NoSQL database
5 types of NoSQL databases
- Key/value stores
- Column stores
- Graph
- Multi-model database
- Document store
Languages
The languages officially supported by MongoDB.
- C/C++/C#
- Go
- Erlang
- Java
- JavaScript
- Node.js
- Perl
- PHP
- Python
- Ruby
- Scala
- ...
There are many more languages supported by the Open Source community.
Documentation
- Excellent documentation can be found on the MongoDB site.
Features
- Horizontally Scalable Architectures: Sharding and Replication.
- Journaling.
- Map-reduce.
- Full text index.
- "Stored procedures" simple JavaScript.
Limitations
- Giving up on joins and complex transactions.
Performance
- Native binary protocol between client and server.
- Insert does not (necessarily) wait for the write to the disk.
Auto-sharding
The mongos
process knows which shard holds which part of the data.
Sharding
- mongos = mongo Sharding router process (in front of the mongod daemons)
- Config Server = Where data is located (the address book of the shard)
- We shard specific collections based on an existing unique index on the collection.
- Scatter-gather query.
Picking a Shard Key
- Cardinality
- Write Distribution
- Query Isolation
- Hashed shard key
Replica set - high availability and data duability
- They are called primary and secondaries.
- Provides automatic fail-over.
- Primary
- (Operation log, Op-log used to replicate the data to the secondaries)
If the primary goes off-line, the remaining secondary nodes elect a new primary. When the old primary comes back online it will become another secondary.
Write concern
- 0 - (Unacknoledged)
- 1 - (Acknowledged) by the primary when it has it in its own memory.
- N - Acknowledged by N servers.
- "majority" - the majority of the replicas have the information.
Storage Engines
- MMAPV1 (legacy)
- In-memory
- Encrypted
- WiredTiger
MongoDB Data format
Data is kept in BSON format (Binary JSON) which is JSON with some extensions.
- JSON = JavaScript Object
- PHP array
- Python Dictionary
- Perl Hash
- Ruby Hash
Driver Responsibility
- Authentication and Security
- Python, Perl, etc. conversion to BSON
- Error handling and recovery
- Wire Protocol
- Topology Management
- Connection Pool
First steps
SQL to MongoDB terms
SQL MongoDB
-------- --------
database database
table collection
row BSON document
index index
column not really
join Lookup
embedding
Foreign key Reference
primary key _id field
group by aggregation
Multi-table transaction Single document transaction
Embedded documents
A "document" usually refers to a simple, one-level hash. If one of the values is itself a hash we can say "embeded document". Though in other languages we might just talk about a large, multi-level data-structure, without this whole idea that they are somehow embedded in each other.
GUI client for MongoDB
- GUIs
- Campus
Atlas hosted environment
Command line tools
-
mongod - the server process
-
mongo - the MongoDB shell
-
mongodump - Dump the data into BSON files (creating the dump/ subdirectory).
-
mongoexport - Export MongoDB data to CSV, TSV or JSON files.
-
mongofiles -
-
mongoimport - Import CSV, TSV or JSON data into MongoDB.
-
mongooplog -
-
mongoperf -
-
mongorestore - Import BSON files.
-
mongos -
-
mongosniff -
-
mongostat - A bit like top, showing live performance stats.
-
mongotop - Another top-like tool.
MongoDB on the command line
$ mongo
> help
db.help() help on db methods
db.mycoll.help() help on collection methods
sh.help() sharding helpers
rs.help() replica set helpers
help admin administrative help
help connect connecting to a db help
help keys key shortcuts
help misc misc things to know
help mr mapreduce
show dbs show database names
show collections show collections in current database
show users show users in current database
show profile show most recent system.profile entries
with time >= 1ms
show logs show the accessible logger names
show log [name] prints out the last segment of log in memory,
'global' is default
use db_name set current database
db.foo.find() list objects in collection foo
db.foo.find( { a : 1 } ) list objects in foo where a == 1
it result of the last line evaluated;
use to further iterate
DBQuery.shellBatchSize = x set default number of items to display on shell
exit quit the mongo shell
> exit
bye
$
insert
- insert
- find
> show dbs
...
> use try
switched to db try
> show collections
> db.users.insert({ name: 'Foo' })
> show collections
system.indexes
users
> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
insert more
> db.users.insert({ name: 'Bar' })
> db.users.insert({ name: 'Moo' })
> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{ "_id" : ObjectId("526b9048a444e81f1ca06ba3"), "name" : "Bar" }
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }
Pretty printing in the MongoDB shell
db.collection.find().limit(1).pretty()
Or add DBQuery.prototype._prettyShell = true to ~/.mongorc.js
Even prettier shell.
find
- find
Searching for documents
> db.users.find({ _id: ObjectId("526b9048a444e81f1ca06ba3") })
{ "_id" : ObjectId("526b9048a444e81f1ca06ba3"), "name" : "Bar" }
> db.users.find({ name: 'Foo' })
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
> db.users.find({ name: /o/ })
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }
update a document
- update
> db.users.update({ name : 'Bar' }, { name: 'Zorg'})
> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{ "_id" : ObjectId("526b9048a444e81f1ca06ba3"), "name" : "Zorg" }
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }
This will replace the entire docuent keeping only the _id.
Update modifiers for fields
- $set
- $inc
- $rename
- $setOnInsert
- $unset
update - add values using $set
- $se}
> db.users.update({ name : 'Zorg' }, { $set : { child: 'Jane' } })
> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{
"_id" : ObjectId("526b9048a444e81f1ca06ba3"),
"name" : "Zorg",
"child" : "Jane"
}
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }
update - add array using $set
- $set
> db.users.update({ name : 'Zorg' }, { $set : { kids: [ { name: 'Jane' } ] } })
> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{
"_id" : ObjectId("526b9048a444e81f1ca06ba3"), "name" : "Zorg",
"child" : "Jane",
"kids" : [
{
"name" : "Jane"
}
]
}
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }
remove value from document
- $unset
> db.users.update({ name : 'Zorg' }, { $unset : { child : true } } )
> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{
"_id" : ObjectId("526b9048a444e81f1ca06ba3"),
"name" : "Zorg",
"kids" : [
{
"name" : "Jane"
}
]
}
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }
transaction in a document
> db.users.update({ name : 'Zorg' }, { $set : { kids: [ { name: 'Joe' } ] },
$unset : { kid : '' } } )
> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{ "_id" : ObjectId("526b9048a444e81f1ca06ba3"), "name" : "Zorg",
"kids" : [ { "name" : "Joe" } ] }
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }
push
- $push
> db.users.update({ name : 'Zorg' }, { $push : { kids: { name: 'Marry' } } })
> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{ "_id" : ObjectId("526b9048a444e81f1ca06ba3"), "name" : "Zorg",
"kids" : [ { "name" : "Joe" }, { "name" : "Marry" } ] }
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }
Update modifiers for arrays
- $push
- $pop
- $pull
- $pushAll
- $pullAll
- $addToSet
- $ - The single $ sign used as a placeholder
Update array element
> db.users.update({ name: "Zorg", "kids.name" : 'Marry' },
{ $set : { "kids.$.name" : "Mary" } })
> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{ "_id" : ObjectId("526b9048a444e81f1ca06ba3"), "name" : "Zorg",
"kids" : [ { "name" : "Joe" }, { "name" : "Mary" } ] }
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }
Remove array element
- $pull
> db.users.update({ name: "Zorg" },
{ $pull : { kids : { name : "Joe" } } })
> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{ "_id" : ObjectId("526b9048a444e81f1ca06ba3"), "name" : "Zorg",
"kids" : [ { "name" : "Mary" } ] }
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }
_id is unique in a collection
_id
> db.a.insert({ _id : 'a', value : 1 })
Inserted 1 record(s) in 20ms
> db.a.find()
{
"_id": "a",
"value": 1
}
Fetched 1 record(s) in 1ms
> db.a.insert({ _id : 'a', value : 2 })
E11000 duplicate key error index: test.a.$_id_ dup key: { : "a" }
> db.b.insert({ _id : 'a', value : 2 })
Inserted 1 record(s) in 763ms
> db.a.find()
{
"_id": "a",
"value": 1
}
Fetched 1 record(s) in 1ms
> db.b.find()
{
"_id": "a",
"value": 2
}
Fetched 1 record(s) in 1ms
update and save
-
update
-
save
-
save() replaces a single document (or creates a new one)
-
update() updates a single document (or multiple documents)
save()
- save
> db.config.save({ from : "mongodb@edumaven.com" })
> db.config.find()
{ "_id" : ObjectId("5305c8b37ec4c674d24813a8"), "from" : "mongodb@edumaven.com" }
> db.config.save({ _id : 'from', value : 'mongodb@edumaven.com' })
> db.config.find()
{ "_id" : ObjectId("5305c8b37ec4c674d24813a8"), "from" : "mongodb@edumaven.com" }
{ "_id" : "from", "value" : "mongodb@edumaven.com" }
> db.config.save({ _id : 'from', value : 'foo@bar.com' })
> db.config.find()
{ "_id" : ObjectId("5305c8b37ec4c674d24813a8"), "from" : "mongodb@edumaven.com" }
{ "_id" : "from", "value" : "foo@bar.com" }
Remove (delete) document
db.a.remove({ "_id" : ObjectId("52ef998222e9d7ee82000000") })
Count documents
- db.collection.count()
- db.collection.find().count()
Find deeply
> db.people.insert({ 'name' : 'foo', child : { name : 'bar' } })
Inserted 1 record(s) in 11ms
> db.people.find()
{
"_id": ObjectId("53077243ad9a4f5b93fadc64"),
"name": "foo",
"child": {
"name": "bar"
}
}
Fetched 1 record(s) in 1ms
> db.people.find({ name : 'foo' })
{
"_id": ObjectId("53077243ad9a4f5b93fadc64"),
"name": "foo",
"child": {
"name": "bar"
}
}
Fetched 1 record(s) in 1ms
> db.people.find({ name : 'bar' })
Fetched 0 record(s) in 1ms
> db.people.find({ "child.name" : 'bar' })
{
"_id": ObjectId("53077243ad9a4f5b93fadc64"),
"name": "foo",
"child": {
"name": "bar"
}
}
Fetched 1 record(s) in 1ms
sort()
find() returns a cursor instance - lazy retreival!
push
- $push
db.emails.update({ _id: msg._id },
{$push: {tags : "mongodb"}})
Update modifiers
- $each
- $slice
- $sort
Conditional Operators
-
$in
-
$nin
-
$mod
-
$all
-
$size
-
$exists
-
$type
-
$ne
-
$lt
-
$lte
-
$gt
-
$gte
Indexes
Indexes are per collection
-
B-tree indexes
-
_id is indexed (and unique)
-
Single Field
-
Compund-key indexes
-
Multi-key indexes (indexing each element of an array)
-
Geospatial Index
-
Text Indexes (beta)
If there is an array of values (eg. tags) then the specific document will be indexed by each one of the values in that array. "Fake full-text indexing."
Creat Indexes
db.users.createIndex( {"field" : 1 });
db.users.dropIndex({ "field" : 1 });
db.users.getIndexes
Unique Index
Make sure that two documents cannot contain the same value in the given field.
db.collection.ensureIndex({ field : 1 })
Sparse index
When only some of the documents have a certain field it might be useful to set the index to be sparse. That way only docuemnts that actually have that field will be in the index.
> db.collection.ensureIndex({ field : 1 }, {sparse : true})
Index embedded document (or subkey)
> db.messages.ensureIndex({ "From.address" : 1 })
> db.messages.find({ "From.address" : "foo@bar.com" })
Insert complex data structure
db.users.insert({
"name" : "Foo",
"email" : "foo@company.com",
"technologies" : [
"Perl",
"Python",
"JavaScript",
"SQL",
"NoSQL",
"MongoDB"
],
"a" : {
"b" : {
"c" : 42
}
}
})
Display complex data structure
db.users.find({ name: "Foo"})
{
"_id": ObjectId("59895a5d98392edc6c7074ca"),
"name": "Foo",
"email": "foo@company.com",
"technologies": [
"Perl",
"Python",
"JavaScript",
"SQL",
"NoSQL",
"MongoDB"
],
"a": {
"b": {
"c": 42
}
}
}
Fetched 1 record(s) in 3ms
Rename embedded document
db.users.update({ name: "Foo"}, { $rename: { 'a.b.c': 'a.b.d' } })
db.users.find({ name: "Foo"})
{
"_id": ObjectId("59895a5d98392edc6c7074ca"),
"name": "Foo",
"email": "foo@company.com",
"technologies": [
"Perl",
"Python",
"JavaScript",
"SQL",
"NoSQL",
"MongoDB"
],
"a": {
"b": {
"d": 42
}
}
}
Fetched 1 record(s) in 2ms
Remove ($unset) embedded document
db.users.update({ name: "Foo"}, { $unset: { 'a.b.d' : true } })
Updated 1 existing record(s) in 2ms
WriteResult({
"nMatched": 1,
"nUpserted": 0,
"nModified": 1
})
db.users.find({ name: "Foo"})
{
"_id": ObjectId("59895a5d98392edc6c7074ca"),
"name": "Foo",
"email": "foo@company.com",
"technologies": [
"Perl",
"Python",
"JavaScript",
"SQL",
"NoSQL",
"MongoDB"
],
"a": {
"b": {
}
}
}
Fetched 1 record(s) in 2ms
Change element of array
db.users.update({ name: "Foo"}, { $set: { 'technologies.2': 'JS' } })
Updated 1 existing record(s) in 37ms
WriteResult({
"nMatched": 1,
"nUpserted": 0,
"nModified": 1
})
db.users.find({ name: "Foo"})
{
"_id": ObjectId("59895a5d98392edc6c7074ca"),
"name": "Foo",
"email": "foo@company.com",
"technologies": [
"Perl",
"Python",
"JS",
"SQL",
"NoSQL",
"MongoDB"
],
"a": {
"b": {
}
}
}
Fetched 1 record(s) in 11ms
Add embedded document
db.users.update({ name: "Foo"}, { $set: { 'a.b.x': 19 } })
Updated 1 existing record(s) in 7ms
WriteResult({
"nMatched": 1,
"nUpserted": 0,
"nModified": 1
})
> db.users.find({ name: "Foo"})
{
"_id": ObjectId("59895a5d98392edc6c7074ca"),
"name": "Foo",
"email": "foo@company.com",
"technologies": [
"Perl",
"Python",
"JS",
"SQL",
"NoSQL",
"MongoDB"
],
"a": {
"b": {
"x": 19
}
}
}
Fetched 1 record(s) in 3ms
Remove elements of an array by value
db.users.update({ name: "Foo"}, { $pull: { "technologies": "yy" } })
Remove elements of an array by index (trick)
db.users.update({ name: "Foo"}, { $set: { "technologies.2": null } })
db.users.update({ name: "Foo"}, { $pull: { "technologies": null } })
Append one element to an array
db.users.update({ name: "Foo"}, { $push: { "technologies": "yy" } })
Append multiple elements to an array
db.users.update(
{ name: "Foo"},
{ $push: { "technologies": { $each: ["abc", "def"] } } })
Insert multiple elements in an array
db.users.update(
{ name: "Foo"},
{ $push: { "technologies": { $each: ["one", "two"], $position: 4} } })
Drop collection
> use try
> db.users.drop()
true
Drop database
> use try
> db.dropDatabase()
MongoDB shell tools
- db.serverStatus()
- db.stats()
- db.collection.stats()
- db.printReplicationInfo()
- db.printSlaveReplicationInfo()
- db.collection.find().explain()
Resources
- H Waldo Grunenwald Introduction to MongoDB
- Introduction to NoSQL by Martin Fowler
- The People vs. NoSQL Databases: Panel Discussion Round-table with people from several NoSQL databases.
- Introduction to MongoDB (O'Reilly Webcast) A bit old,(from 2010), but a nice introduction to MongoDB.
- MongoDB Schema Design: How to Think NoSQL (O'Reilly Webcast)
- presentations
- MongoDB for Perl Developers by Ynon Perek
- What is NoSQL
- Book: Martin Fowler: NoSQL distilled
- Book: Domain-Driven design by Eric Evans
- Getting started with MongoDB
- Exploring the Aggregation Framework in MongoDB
- dropping ACID with MongoDB
Projection
Second {} in a find() to limit which fields are going to be returned.
.explain("executionStats").executionStats
Execution
COLLSCAN
IXSCAN
FETCH
SHARD_MERGE
Aggregation
Aggregation framework
- OLTP = Online Transaction Processing (sell tickets, registreations)
- OLAP = Online Analytical Procession (which tickets make the most money, how to predict usage)
A Processing Pipeline
Limit -> Match -> Project -> Group -> Sort -> Out
Pipeline Operators
$match Filter documents
$project Reshape documents
$group Summarize documents
$unwind Expand arrays in documents
$sort Order documents
$limit/$skip Paginate documents
$redact Restrict documents
$geoNear Proximity sort documents
$let, $map Define variables
$out
$lookup
Example: Insert data
use demo
db.simple.insert({ "name" : "foo", "cnt" : 3 })
db.simple.insert({ "name" : "foo", "cnt" : 4 })
db.simple.insert({ "name" : "foo", "cnt" : 5 })
db.simple.insert({ "name" : "bar", "cnt" : 5 })
db.simple.insert({ "name" : "bar", "cnt" : 9 })
Show data
db.simple.find()
{
"_id": ObjectId("5988aeb33b04d96991d11159"),
"name": "foo",
"cnt": 3
}
{
"_id": ObjectId("5988aeb33b04d96991d1115a"),
"name": "foo",
"cnt": 4
}
{
"_id": ObjectId("5988aeb33b04d96991d1115b"),
"name": "foo",
"cnt": 5
}
{
"_id": ObjectId("5988aeb33b04d96991d1115c"),
"name": "bar",
"cnt": 5
}
{
"_id": ObjectId("5988aeb43b04d96991d1115d"),
"name": "bar",
"cnt": 9
}
Fetched 5 record(s) in 4ms
Group by name and sum - $sum
- $group
- $sum
db.simple.aggregate( [ { $group: { _id: "$name", total: { $sum : "$cnt" } } } ])
{
"result": [
{
"_id": "bar",
"total": 14
},
{
"_id": "foo",
"total": 12
}
],
"ok": 1
}
Group by name and caluclate average $avg
- $avg
db.simple.aggregate( [ { $group: { _id: "$name", total: { $avg : "$cnt" } } } ] )
{
"result": [
{
"_id": "bar",
"total": 7
},
{
"_id": "foo",
"total": 4
}
],
"ok": 1
}
Group by name $max, $min
Group by name and count
db.simple.aggregate( [ { $group: { _id: "$name", total: { $sum : 1 } } } ] )
{
"result": [
{
"_id": "bar",
"total": 2
},
{
"_id": "foo",
"total": 3
}
],
"ok": 1
}
All the elements - sum
- null
db.simple.aggregate( [ { $group: { _id: null, total: { $sum : "$cnt" } } } ] )
{
"result": [
{
"_id": null,
"total": 26
}
],
"ok": 1
}
All the elements - sum
We can also use an arbitrary string instead of the null
db.simple.aggregate( [ { $group: { _id: "total", total: { $sum : "$cnt" } } } ])
{
"result": [
{
"_id": "total",
"total": 26
}
],
"ok": 1
}
$match (filter)
- $match
- filter
db.simple.aggregate( [ { $match : { name: "foo" } } ] )
{
"result": [
{
"_id": ObjectId("5988aeb33b04d96991d11159"),
"name": "foo",
"cnt": 3
},
{
"_id": ObjectId("5988aeb33b04d96991d1115a"),
"name": "foo",
"cnt": 4
},
{
"_id": ObjectId("5988aeb33b04d96991d1115b"),
"name": "foo",
"cnt": 5
}
],
"ok": 1
}
$match (filter) and the total
db.simple.aggregate( [
{ $match : { name: "foo" } },
{ $group: { _id: "total", total: { $sum : "$cnt" } } }
] )
{
"result": [
{
"_id": "total",
"total": 12
}
],
"ok": 1
}
$match with $gt
db.simple.aggregate( [
{ $match : { cnt: { $gt: 4 } } },
{ $group: { _id: "total", total: { $sum : "$cnt" } } }
] )
{
"result": [
{
"_id": "total",
"total": 19
}
],
"ok": 1
}
$group with $push
db.simple.aggregate( [ { $group: { _id: "$name", total: { $push : "$cnt" } } } ])
{
"result": [
{
"_id": "bar",
"total": [
5,
9
]
},
{
"_id": "foo",
"total": [
3,
4,
5
]
}
],
"ok": 1
}
New data set
db.scores.insert({ "name": "foo", "scores" : [ 2, 4, 6] })
db.scores.insert({ "name": "bar", "scores" : [ 9, 5, 3, 1] })
New data set
db.scores.find()
{
"_id": ObjectId("5988c63dd474ca272db3fb2e"),
"name": "foo",
"scores": [
2,
4,
6
]
}
{
"_id": ObjectId("5988c676d474ca272db3fb2f"),
"name": "bar",
"scores": [
9,
5,
3,
1
]
}
Fetched 2 record(s) in 21ms
$unwind
- $unwind
`` db.scores.aggregate( [ { $unwind: "$scores" } ] );
{ "result": [ { "_id": ObjectId("5988c63dd474ca272db3fb2e"), "name": "foo", "scores": 2 }, { "_id": ObjectId("5988c63dd474ca272db3fb2e"), "name": "foo", "scores": 4 }, { "_id": ObjectId("5988c63dd474ca272db3fb2e"), "name": "foo", "scores": 6 }, { "_id": ObjectId("5988c676d474ca272db3fb2f"), "name": "bar", "scores": 9 }, { "_id": ObjectId("5988c676d474ca272db3fb2f"), "name": "bar", "scores": 5 }, { "_id": ObjectId("5988c676d474ca272db3fb2f"), "name": "bar", "scores": 3 }, { "_id": ObjectId("5988c676d474ca272db3fb2f"), "name": "bar", "scores": 1 } ], "ok": 1 }
<!-- mdbook-embedify [footer] -->
<style> footer { text-align: center; text-wrap: balance; margin-top: 5rem; display: flex; flex-direction: column; justify-content: center; align-items: center; } footer p { margin: 0; }</style><footer><p>Copyright © 2025 • Created with ❤️ by <a href="https://szabgab.com/">Gábor Szabó</a></p></footer>
$unwind and $group
db.scores.aggregate( [
{ $unwind: "$scores" },
{ $group: { _id: "$name", score: { $sum: "$scores" } } }
] )
{
"result": [
{
"_id": "bar",
"score": 18
},
{
"_id": "foo",
"score": 12
}
],
"ok": 1
}
$unwind and $group and $sort
db.scores.aggregate( [
{ $unwind: "$scores" },
{ $group: { _id: "$name", score: { $sum: "$scores" } } },
{ $sort: { "score" : 1 } }
] )
{
"result": [
{
"_id": "foo",
"score": 12
},
{
"_id": "bar",
"score": 18
}
],
"ok": 1
}
$match
- $match
db.scores.aggregate( [ { $match: { name: "bar" } } ] )
{
"result": [
{
"_id": ObjectId("5988c676d474ca272db3fb2f"),
"name": "bar",
"scores": [
9,
5,
3,
1
]
}
],
"ok": 1
}
$match and $unwind and $group
db.scores.aggregate( [
{ $match: { name: "bar" } },
{ $unwind: "$scores" },
{ $group: { _id: "$name", score: { $sum: "$scores" } } }
] )
{
"result": [
{
"_id": "bar",
"score": 18
}
],
"ok": 1
}
Mailbox analyzing
Email messages
> use mboxer
> db.messages.count()
> db.messages.find().limit(1)
find
> db.messages.find({ 'From.address' : 'foo@bar.com' })
> db.messages.find({ 'From.address' : 'foo@bar.com' })
.sort({ Date : -1 })
.skip(20)
.limit(5)
Find by date
Messages before January 1, 2005
db.messages.find({ Date: {$lt: new Date(2005, 0, 1) }})
.limit(10)
Messages that have CC field
db.messages.find( { 'CC' : { $exists : true } } ).count()
Large messages
db.messages.find({ size : { $gt : 10000000 } }).count()
db.messages.find({ size : { $gt : 10000000 } })
List the 10 biggest messages
db.messages.find( ).sort( { size : -1 } ).limit(10)
The total size of the messages - aggregate
db.messages.aggregate( { $group : { _id : '', size : { $sum : '$size' } } } )
{
"result": [
{
"_id": "",
"size": NumberLong("17318623")
}
],
"ok": 1
}
The total size of the messages - mapReduce
Create a mapping function:
map = function() { emit('msize', this.size); }
Create a reduce function:
red = function(k, v) {
return Array.sum(v);
}
Run the mapReduce:
res = db.messages.mapReduce(map, red, {out: "Result"})
Get the result from the temporary collection called res
.
res.find()
{
"_id": "msize",
"value": 17318623
}
MongoDB with Perl
Simple
use strict;
use warnings;
use 5.010;
use MongoDB ();
use Data::Dumper qw(Dumper);
my $client = MongoDB::MongoClient->new(host => 'localhost', port => 27017);
my $db = $client->get_database( 'example_' . $$ . '_' . time );
my $people_coll = $db->get_collection('people');
$people_coll->insert( {
name => 'First',
});
$people_coll->insert( {
name => 'Second',
});
my $people = $people_coll->find;
while (my $p = $people->next) {
print Dumper $p;
}
#$db->drop;
Update
use strict;
use warnings;
use 5.010;
use MongoDB ();
use Data::Dumper qw(Dumper);
my $client = MongoDB::MongoClient->new(host => 'localhost', port => 27017);
my $db = $client->get_database( 'example_' . $$ . '_' . time );
my $people_coll = $db->get_collection('people');
$people_coll->insert( {
name => 'First',
});
$people_coll->insert( {
name => 'Second',
});
$people_coll->update(
{ name => 'Second'},
{ '$set' => {
phone => '1-123',
},
},
);
my $people = $people_coll->find;
while (my $p = $people->next) {
print Dumper $p;
}
END {
$db->drop if $db;
}
Unique
use strict;
use warnings;
use 5.010;
use MongoDB ();
use Data::Dumper qw(Dumper);
my $client = MongoDB::MongoClient->new(host => 'localhost', port => 27017);
my $db = $client->get_database( 'training' );
$db->drop;
#my @collections = $db->collection_names;
# without sparse we can get one document without the indexed field
# with spars=true we can get more such documents
my $users_coll = $db->get_collection('users');
my $res = $users_coll->ensure_index({ username => 1 }, {
unique => boolean::true,
sparse => boolean::false,
});
#die Dumper $res;
$users_coll->insert({ name => 'Foo', username => 'foo' });
eval {
$users_coll->insert({ name => 'Bar', username => 'foo' });
};
print $@;
eval {
$users_coll->insert({ name => 'Zorg' });
};
print $@;
eval {
$users_coll->insert({ name => 'Zorg2' });
};
print $@;
my $us = $users_coll->find;
while (my $user = $us->next) {
print Dumper $user;
}
Increment
use strict;
use warnings;
use 5.010;
use MongoDB ();
use Data::Dumper qw(Dumper);
my $client = MongoDB::MongoClient->new(host => 'localhost', port => 27017);
my $db = $client->get_database( 'example_' . time );
my $collection = $db->get_collection('counters');
$collection->insert( { _id => 'posts', seq => 0 });
say inc('posts');
say inc('posts');
say inc('posts');
say inc('posts');
say inc('posts');
sub inc {
my ($field) = @_;
$collection->find_and_modify( {
query => { _id => $field },
update => { '$inc' => { seq => 1 } },
new => 1, # to return the new value
# by default it returns the old value
} )->{seq}
}
$db->drop;
A counter with autoincremented field http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
Phonebook
use 5.010;
use Moo;
use MooX::Options;
use MongoDB ();
use Data::Dumper qw(Dumper);
option add => (is => 'ro');
option list => (is => 'ro');
option update => (is => 'ro');
option delete => (is => 'ro');
option name => (is => 'ro', format => 's');
option nickname => (is => 'ro', format => 's');
option phone => (is => 'ro', format => 's');
sub run {
my ($self) = @_;
my $client = MongoDB::MongoClient->new(host => 'localhost', port => 27017);
my $db = $client->get_database( 'phonebook' );
my $people_coll = $db->get_collection('people');
say 'Processing ...';
if ($self->add) {
die "--name is needed" if not $self->name;
die "--phone is needed" if not $self->phone;
$people_coll->insert( {
name => $self->name,
phone => $self->phone,
nickname => $self->nickname,
});
} elsif ($self->list) {
my %query;
if ($self->name) {
$query{name} = $self->name;
}
if ($self->phone) {
$query{phone} = $self->phone;
}
my $people = $people_coll->find(\%query);
while (my $p = $people->next) {
printf "%s %s", $p->{name}, $p->{phone};
if ($p->{nickname}) {
print " ($p->{nickname})";
}
print "\n";
}
} elsif ($self->update) {
die "--name is needed" if not $self->name;
die "--phone is needed" if not $self->phone;
$people_coll->update(
{ name => $self->name },
{ '$set' => { phone => $self->phone }},
{ multiple => 1 },
);
} elsif ($self->delete) {
die "--name is needed" if not $self->name;
$people_coll->remove({ name => $self->name });
} else {
die "Missing -h, --add, --list or --update\n";
}
}
main->new_with_options->run;
MongoDB with Python
Simple
import sys
from pymongo import Connection
from pymongo.errors import ConnectionFailure
def main():
try:
c = Connection(host="localhost", port=27017)
except ConnectionFailure as e:
sys.stderr.write("Could not connect to MongoDB: {}\n".format(e))
exit()
db = c["users"]
student = {
"name" : "Foo Bar",
"email" : "foo@bar.com",
}
db.users.insert(student, w=1)
db.users.insert({ "name" : "other" }, w=1)
for user in db.users.find():
print(user)
main()