Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

MongoDB

Overview

MongoDB ("humongous") (= huge; enormous.)

  • Scalable, high-performance, open source NoSQL database

5 types of NoSQL databases

  • Key/value stores
  • Column stores
  • Graph
  • Multi-model database
  • Document store

Languages

The languages officially supported by MongoDB.

  • C/C++/C#
  • Go
  • Erlang
  • Java
  • JavaScript
  • Node.js
  • Perl
  • PHP
  • Python
  • Ruby
  • Scala
  • ...

There are many more languages supported by the Open Source community.

Documentation

Features

  • Horizontally Scalable Architectures: Sharding and Replication.
  • Journaling.
  • Map-reduce.
  • Full text index.
  • "Stored procedures" simple JavaScript.

Limitations

  • Giving up on joins and complex transactions.

Performance

  • Native binary protocol between client and server.
  • Insert does not (necessarily) wait for the write to the disk.

Auto-sharding

The mongos process knows which shard holds which part of the data.

Sharding

  • mongos = mongo Sharding router process (in front of the mongod daemons)
  • Config Server = Where data is located (the address book of the shard)
  • We shard specific collections based on an existing unique index on the collection.
  • Scatter-gather query.

Picking a Shard Key

  • Cardinality
  • Write Distribution
  • Query Isolation
  • Hashed shard key

Replica set - high availability and data duability

  • They are called primary and secondaries.
  • Provides automatic fail-over.
  • Primary
  • (Operation log, Op-log used to replicate the data to the secondaries)

If the primary goes off-line, the remaining secondary nodes elect a new primary. When the old primary comes back online it will become another secondary.

Write concern

  • 0 - (Unacknoledged)
  • 1 - (Acknowledged) by the primary when it has it in its own memory.
  • N - Acknowledged by N servers.
  • "majority" - the majority of the replicas have the information.

Storage Engines

  • MMAPV1 (legacy)
  • In-memory
  • Encrypted
  • WiredTiger

MongoDB Data format

Data is kept in BSON format (Binary JSON) which is JSON with some extensions.

  • JSON = JavaScript Object
  • PHP array
  • Python Dictionary
  • Perl Hash
  • Ruby Hash

Driver Responsibility

  • Authentication and Security
  • Python, Perl, etc. conversion to BSON
  • Error handling and recovery
  • Wire Protocol
  • Topology Management
  • Connection Pool

First steps

SQL to MongoDB terms

SQL                       MongoDB
--------                  --------
database                  database
table                     collection
row                       BSON document

index                     index
column                    not really
join                      Lookup
                          embedding
Foreign key               Reference
primary key               _id field
group by                  aggregation
Multi-table transaction   Single document transaction

Embedded documents

A "document" usually refers to a simple, one-level hash. If one of the values is itself a hash we can say "embeded document". Though in other languages we might just talk about a large, multi-level data-structure, without this whole idea that they are somehow embedded in each other.

GUI client for MongoDB

Atlas hosted environment

Command line tools

  • mongod - the server process

  • mongo - the MongoDB shell

  • mongodump - Dump the data into BSON files (creating the dump/ subdirectory).

  • mongoexport - Export MongoDB data to CSV, TSV or JSON files.

  • mongofiles -

  • mongoimport - Import CSV, TSV or JSON data into MongoDB.

  • mongooplog -

  • mongoperf -

  • mongorestore - Import BSON files.

  • mongos -

  • mongosniff -

  • mongostat - A bit like top, showing live performance stats.

  • mongotop - Another top-like tool.

MongoDB on the command line

$ mongo
> help
  db.help()                    help on db methods
  db.mycoll.help()             help on collection methods
  sh.help()                    sharding helpers
  rs.help()                    replica set helpers
  help admin                   administrative help
  help connect                 connecting to a db help
  help keys                    key shortcuts
  help misc                    misc things to know
  help mr                      mapreduce

  show dbs                     show database names
  show collections             show collections in current database
  show users                   show users in current database
  show profile                 show most recent system.profile entries
                               with time >= 1ms
  show logs                    show the accessible logger names
  show log [name]              prints out the last segment of log in memory,
                               'global' is default
  use db_name                  set current database
  db.foo.find()                list objects in collection foo
  db.foo.find( { a : 1 } )     list objects in foo where a == 1
  it                           result of the last line evaluated;
                               use to further iterate
  DBQuery.shellBatchSize = x   set default number of items to display on shell
  exit                         quit the mongo shell
> exit
bye
$

insert

  • insert
  • find
> show dbs
...

> use try
switched to db try

> show collections

> db.users.insert({ name: 'Foo' })
 
> show collections
system.indexes
users

> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }

insert more

> db.users.insert({ name: 'Bar' })
> db.users.insert({ name: 'Moo' })
> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{ "_id" : ObjectId("526b9048a444e81f1ca06ba3"), "name" : "Bar" }
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }

Pretty printing in the MongoDB shell

db.collection.find().limit(1).pretty()

Or add DBQuery.prototype._prettyShell = true to ~/.mongorc.js

Even prettier shell.

find

  • find

Searching for documents


> db.users.find({ _id: ObjectId("526b9048a444e81f1ca06ba3") })
{ "_id" : ObjectId("526b9048a444e81f1ca06ba3"), "name" : "Bar" }

> db.users.find({ name: 'Foo' })
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }


> db.users.find({ name: /o/ })
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }

update a document

  • update
> db.users.update({ name : 'Bar' }, { name: 'Zorg'})
> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{ "_id" : ObjectId("526b9048a444e81f1ca06ba3"), "name" : "Zorg" }
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }

This will replace the entire docuent keeping only the _id.

Update modifiers for fields

  • $set
  • $inc
  • $rename
  • $setOnInsert
  • $unset

update - add values using $set

  • $se}
> db.users.update({ name : 'Zorg' }, { $set : { child: 'Jane' } })

> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{
    "_id" : ObjectId("526b9048a444e81f1ca06ba3"),
    "name" : "Zorg",
    "child" : "Jane"
}
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }

update - add array using $set

  • $set
> db.users.update({ name : 'Zorg' }, { $set : { kids: [ { name: 'Jane' } ] } })

> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{
    "_id" : ObjectId("526b9048a444e81f1ca06ba3"), "name" : "Zorg",
    "child" : "Jane", 
    "kids" : [
        {
            "name" : "Jane"
        }
    ]
}
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }

remove value from document

  • $unset
> db.users.update({ name : 'Zorg' }, { $unset : { child : true } } )

> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{
     "_id" : ObjectId("526b9048a444e81f1ca06ba3"),
     "name" : "Zorg",
     "kids" : [
         {
             "name" : "Jane"
         }
      ]
}
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }

transaction in a document

> db.users.update({ name : 'Zorg' }, { $set : { kids: [ { name: 'Joe' } ] }, 
                                       $unset : { kid : '' } } )

> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{ "_id" : ObjectId("526b9048a444e81f1ca06ba3"), "name" : "Zorg",
     "kids" : [ { "name" : "Joe" } ] }
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }


push

  • $push
> db.users.update({ name : 'Zorg' }, { $push : { kids: { name: 'Marry' } } })

> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{ "_id" : ObjectId("526b9048a444e81f1ca06ba3"), "name" : "Zorg",
     "kids" : [ { "name" : "Joe" }, { "name" : "Marry" } ] }
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }

Update modifiers for arrays

  • $push
  • $pop
  • $pull
  • $pushAll
  • $pullAll
  • $addToSet
  • $ - The single $ sign used as a placeholder

Update array element

> db.users.update({ name: "Zorg", "kids.name" : 'Marry' },
         { $set : { "kids.$.name" : "Mary" } })

> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{ "_id" : ObjectId("526b9048a444e81f1ca06ba3"), "name" : "Zorg",
     "kids" : [ { "name" : "Joe" }, { "name" : "Mary" } ] }
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }

Remove array element

  • $pull
> db.users.update({ name: "Zorg" },
         { $pull : { kids : { name : "Joe" } } })

> db.users.find()
{ "_id" : ObjectId("526b8fdba444e81f1ca06ba2"), "name" : "Foo" }
{ "_id" : ObjectId("526b9048a444e81f1ca06ba3"), "name" : "Zorg",
     "kids" : [ { "name" : "Mary" } ] }
{ "_id" : ObjectId("526ba048a444e81f1ca06ba4"), "name" : "Moo" }

_id is unique in a collection

  • _id
> db.a.insert({ _id : 'a', value : 1 })
Inserted 1 record(s) in 20ms

> db.a.find()
{
  "_id": "a",
  "value": 1
}
Fetched 1 record(s) in 1ms

> db.a.insert({ _id : 'a', value : 2 })
E11000 duplicate key error index: test.a.$_id_  dup key: { : "a" }

> db.b.insert({ _id : 'a', value : 2 })
Inserted 1 record(s) in 763ms

> db.a.find()
{
  "_id": "a",
  "value": 1
}
Fetched 1 record(s) in 1ms

> db.b.find()
{
  "_id": "a",
  "value": 2
}
Fetched 1 record(s) in 1ms

update and save

  • update

  • save

  • save() replaces a single document (or creates a new one)

  • update() updates a single document (or multiple documents)

save()

  • save
> db.config.save({ from : "mongodb@edumaven.com" })

> db.config.find()
{ "_id" : ObjectId("5305c8b37ec4c674d24813a8"), "from" : "mongodb@edumaven.com" }

> db.config.save({ _id : 'from', value : 'mongodb@edumaven.com' })

> db.config.find()
{ "_id" : ObjectId("5305c8b37ec4c674d24813a8"), "from" : "mongodb@edumaven.com" }
{ "_id" : "from", "value" : "mongodb@edumaven.com" }

> db.config.save({ _id : 'from', value : 'foo@bar.com' })

> db.config.find()
{ "_id" : ObjectId("5305c8b37ec4c674d24813a8"), "from" : "mongodb@edumaven.com" }
{ "_id" : "from", "value" : "foo@bar.com" }
 

Remove (delete) document

db.a.remove({  "_id" : ObjectId("52ef998222e9d7ee82000000") })

Count documents

  • db.collection.count()
  • db.collection.find().count()

Find deeply

> db.people.insert({ 'name' : 'foo', child : { name : 'bar' } })
Inserted 1 record(s) in 11ms

> db.people.find()
{
  "_id": ObjectId("53077243ad9a4f5b93fadc64"),
  "name": "foo",
  "child": {
    "name": "bar"
  }
}
Fetched 1 record(s) in 1ms

> db.people.find({ name : 'foo' })
{
  "_id": ObjectId("53077243ad9a4f5b93fadc64"),
  "name": "foo",
  "child": {
    "name": "bar"
  }
}
Fetched 1 record(s) in 1ms

> db.people.find({ name : 'bar' })
Fetched 0 record(s) in 1ms

> db.people.find({ "child.name" : 'bar' })
{
  "_id": ObjectId("53077243ad9a4f5b93fadc64"),
  "name": "foo",
  "child": {
    "name": "bar"
  }
}
Fetched 1 record(s) in 1ms

sort()

find() returns a cursor instance - lazy retreival!

push

  • $push
db.emails.update({ _id: msg._id },
                 {$push: {tags : "mongodb"}})

Update modifiers

  • $each
  • $slice
  • $sort

Conditional Operators

  • $in

  • $nin

  • $mod

  • $all

  • $size

  • $exists

  • $type

  • $ne

  • $lt

  • $lte

  • $gt

  • $gte

Indexes

Indexes are per collection

  • B-tree indexes

  • _id is indexed (and unique)

  • Single Field

  • Compund-key indexes

  • Multi-key indexes (indexing each element of an array)

  • Geospatial Index

  • Text Indexes (beta)

If there is an array of values (eg. tags) then the specific document will be indexed by each one of the values in that array. "Fake full-text indexing."

Creat Indexes

db.users.createIndex( {"field" : 1 });
db.users.dropIndex({ "field" : 1 });
db.users.getIndexes

Unique Index

Make sure that two documents cannot contain the same value in the given field.

db.collection.ensureIndex({ field : 1 })

Sparse index

When only some of the documents have a certain field it might be useful to set the index to be sparse. That way only docuemnts that actually have that field will be in the index.

> db.collection.ensureIndex({ field : 1 }, {sparse : true})

Index embedded document (or subkey)

> db.messages.ensureIndex({ "From.address" : 1 })

> db.messages.find({ "From.address" : "foo@bar.com" })

Insert complex data structure

db.users.insert({
    "name" : "Foo",
    "email" : "foo@company.com",
    "technologies" : [
        "Perl",
        "Python",
        "JavaScript",
        "SQL",
        "NoSQL",
        "MongoDB"
    ],
    "a" : {
        "b" : {
            "c" : 42
        }
    }
})

Display complex data structure

db.users.find({ name: "Foo"})

{
  "_id": ObjectId("59895a5d98392edc6c7074ca"),
  "name": "Foo",
  "email": "foo@company.com",
  "technologies": [
    "Perl",
    "Python",
    "JavaScript",
    "SQL",
    "NoSQL",
    "MongoDB"
  ],
  "a": {
    "b": {
      "c": 42
    }
  }
}
Fetched 1 record(s) in 3ms

Rename embedded document

db.users.update({ name: "Foo"}, { $rename: { 'a.b.c': 'a.b.d' } })
db.users.find({ name: "Foo"})
{
  "_id": ObjectId("59895a5d98392edc6c7074ca"),
  "name": "Foo",
  "email": "foo@company.com",
  "technologies": [
    "Perl",
    "Python",
    "JavaScript",
    "SQL",
    "NoSQL",
    "MongoDB"
  ],
  "a": {
    "b": {
      "d": 42
    }
  }
}
Fetched 1 record(s) in 2ms

Remove ($unset) embedded document

db.users.update({ name: "Foo"}, { $unset: { 'a.b.d' : true } })
Updated 1 existing record(s) in 2ms
WriteResult({
  "nMatched": 1,
  "nUpserted": 0,
  "nModified": 1
})
db.users.find({ name: "Foo"})
{
  "_id": ObjectId("59895a5d98392edc6c7074ca"),
  "name": "Foo",
  "email": "foo@company.com",
  "technologies": [
    "Perl",
    "Python",
    "JavaScript",
    "SQL",
    "NoSQL",
    "MongoDB"
  ],
  "a": {
    "b": {

    }
  }
}
Fetched 1 record(s) in 2ms

Change element of array

db.users.update({ name: "Foo"}, { $set: { 'technologies.2': 'JS' } })

Updated 1 existing record(s) in 37ms
WriteResult({
  "nMatched": 1,
  "nUpserted": 0,
  "nModified": 1
})
db.users.find({ name: "Foo"})

{
  "_id": ObjectId("59895a5d98392edc6c7074ca"),
  "name": "Foo",
  "email": "foo@company.com",
  "technologies": [
    "Perl",
    "Python",
    "JS",
    "SQL",
    "NoSQL",
    "MongoDB"
  ],
  "a": {
    "b": {
    }
  }
}
Fetched 1 record(s) in 11ms

Add embedded document

db.users.update({ name: "Foo"}, { $set: { 'a.b.x': 19 } })
Updated 1 existing record(s) in 7ms
WriteResult({
  "nMatched": 1,
  "nUpserted": 0,
  "nModified": 1
})
> db.users.find({ name: "Foo"})
{
  "_id": ObjectId("59895a5d98392edc6c7074ca"),
  "name": "Foo",
  "email": "foo@company.com",
  "technologies": [
    "Perl",
    "Python",
    "JS",
    "SQL",
    "NoSQL",
    "MongoDB"
  ],
  "a": {
    "b": {
      "x": 19
    }
  }
}
Fetched 1 record(s) in 3ms

Remove elements of an array by value

db.users.update({ name: "Foo"}, { $pull: { "technologies": "yy" } })

Remove elements of an array by index (trick)

db.users.update({ name: "Foo"}, { $set: { "technologies.2": null } })
db.users.update({ name: "Foo"}, { $pull: { "technologies": null } })

Append one element to an array

db.users.update({ name: "Foo"}, { $push: { "technologies": "yy" } })

Append multiple elements to an array

db.users.update(
    { name: "Foo"},
    { $push: { "technologies": { $each: ["abc", "def"] } } })

Insert multiple elements in an array

db.users.update(
   { name: "Foo"},
   { $push: { "technologies": { $each: ["one", "two"], $position: 4} } })

Drop collection

> use try
> db.users.drop()
true

Drop database

> use try
> db.dropDatabase()

MongoDB shell tools

  • db.serverStatus()
  • db.stats()
  • db.collection.stats()
  • db.printReplicationInfo()
  • db.printSlaveReplicationInfo()
  • db.collection.find().explain()

Resources

Projection

Second {} in a find() to limit which fields are going to be returned.
.explain("executionStats").executionStats

Execution
COLLSCAN
IXSCAN
FETCH
SHARD_MERGE

Aggregation

Aggregation framework

  • OLTP = Online Transaction Processing (sell tickets, registreations)
  • OLAP = Online Analytical Procession (which tickets make the most money, how to predict usage)

A Processing Pipeline

Limit -> Match -> Project -> Group -> Sort -> Out

Pipeline Operators

$match          Filter documents
$project        Reshape documents
$group          Summarize documents
$unwind         Expand arrays in documents
$sort           Order documents
$limit/$skip    Paginate documents
$redact         Restrict documents
$geoNear        Proximity sort documents
$let, $map      Define variables
$out
$lookup

Example: Insert data

use demo
db.simple.insert({ "name" : "foo", "cnt" : 3 })
db.simple.insert({ "name" : "foo", "cnt" : 4 })
db.simple.insert({ "name" : "foo", "cnt" : 5 })
db.simple.insert({ "name" : "bar", "cnt" : 5 })
db.simple.insert({ "name" : "bar", "cnt" : 9 })

Show data

db.simple.find()

{
  "_id": ObjectId("5988aeb33b04d96991d11159"),
  "name": "foo",
  "cnt": 3
}
{
  "_id": ObjectId("5988aeb33b04d96991d1115a"),
  "name": "foo",
  "cnt": 4
}
{
  "_id": ObjectId("5988aeb33b04d96991d1115b"),
  "name": "foo",
  "cnt": 5
}
{
  "_id": ObjectId("5988aeb33b04d96991d1115c"),
  "name": "bar",
  "cnt": 5
}
{
  "_id": ObjectId("5988aeb43b04d96991d1115d"),
  "name": "bar",
  "cnt": 9
}
Fetched 5 record(s) in 4ms

Group by name and sum - $sum

  • $group
  • $sum
db.simple.aggregate( [ { $group: { _id: "$name", total: { $sum : "$cnt" } } } ])

{
  "result": [
    {
      "_id": "bar",
      "total": 14
    },
    {
      "_id": "foo",
      "total": 12
    }
  ],
  "ok": 1
}

Group by name and caluclate average $avg

  • $avg
db.simple.aggregate( [ { $group: { _id: "$name", total: { $avg : "$cnt" } } } ] )

{
  "result": [
    {
      "_id": "bar",
      "total": 7
    },
    {
      "_id": "foo",
      "total": 4
    }
  ],
  "ok": 1
}

Group by name $max, $min

Group by name and count

db.simple.aggregate( [ { $group: { _id: "$name", total: { $sum : 1 } } } ] )

{
  "result": [
    {
      "_id": "bar",
      "total": 2
    },
    {
      "_id": "foo",
      "total": 3
    }
  ],
  "ok": 1
}

All the elements - sum

  • null
db.simple.aggregate( [ { $group: { _id: null, total: { $sum : "$cnt" } } } ] )

{
  "result": [
    {
      "_id": null,
      "total": 26
    }
  ],
  "ok": 1
}

All the elements - sum

We can also use an arbitrary string instead of the null

db.simple.aggregate( [ { $group: { _id: "total", total: { $sum : "$cnt" } } } ])

{
  "result": [
    {
      "_id": "total",
      "total": 26
    }
  ],
  "ok": 1
}

$match (filter)

  • $match
  • filter
db.simple.aggregate( [ { $match : { name: "foo" } } ] )

{
  "result": [
    {
      "_id": ObjectId("5988aeb33b04d96991d11159"),
      "name": "foo",
      "cnt": 3
    },
    {
      "_id": ObjectId("5988aeb33b04d96991d1115a"),
      "name": "foo",
      "cnt": 4
    },
    {
      "_id": ObjectId("5988aeb33b04d96991d1115b"),
      "name": "foo",
      "cnt": 5
    }
  ],
  "ok": 1
}

$match (filter) and the total

db.simple.aggregate( [
    { $match : { name: "foo" } },
    { $group: { _id: "total", total: { $sum : "$cnt" } } }
] )

{
  "result": [
    {
      "_id": "total",
      "total": 12
    }
  ],
  "ok": 1
}

$match with $gt

db.simple.aggregate( [
    { $match : { cnt: { $gt: 4 } } },
    { $group: { _id: "total", total: { $sum : "$cnt" } } }
] )

{
  "result": [
    {
      "_id": "total",
      "total": 19
    }
  ],
  "ok": 1
}

$group with $push

db.simple.aggregate( [ { $group: { _id: "$name", total: { $push : "$cnt" } } } ])

{
  "result": [
    {
      "_id": "bar",
      "total": [
        5,
        9
      ]
    },
    {
      "_id": "foo",
      "total": [
        3,
        4,
        5
      ]
    }
  ],
  "ok": 1
}

New data set

db.scores.insert({ "name": "foo", "scores" : [ 2, 4, 6] })
db.scores.insert({ "name": "bar", "scores" : [ 9, 5, 3, 1] })

New data set

db.scores.find()

{
  "_id": ObjectId("5988c63dd474ca272db3fb2e"),
  "name": "foo",
  "scores": [
    2,
    4,
    6
  ]
}
{
  "_id": ObjectId("5988c676d474ca272db3fb2f"),
  "name": "bar",
  "scores": [
    9,
    5,
    3,
    1
  ]
}
Fetched 2 record(s) in 21ms

$unwind

  • $unwind

`` db.scores.aggregate( [ { $unwind: "$scores" } ] );

{ "result": [ { "_id": ObjectId("5988c63dd474ca272db3fb2e"), "name": "foo", "scores": 2 }, { "_id": ObjectId("5988c63dd474ca272db3fb2e"), "name": "foo", "scores": 4 }, { "_id": ObjectId("5988c63dd474ca272db3fb2e"), "name": "foo", "scores": 6 }, { "_id": ObjectId("5988c676d474ca272db3fb2f"), "name": "bar", "scores": 9 }, { "_id": ObjectId("5988c676d474ca272db3fb2f"), "name": "bar", "scores": 5 }, { "_id": ObjectId("5988c676d474ca272db3fb2f"), "name": "bar", "scores": 3 }, { "_id": ObjectId("5988c676d474ca272db3fb2f"), "name": "bar", "scores": 1 } ], "ok": 1 }




<!-- mdbook-embedify [footer]  -->
<style> footer { text-align: center; text-wrap: balance; margin-top: 5rem; display: flex; flex-direction: column; justify-content: center; align-items: center; } footer p { margin: 0; }</style><footer><p>Copyright © 2025 • Created with ❤️ by <a href="https://szabgab.com/">Gábor Szabó</a></p></footer>

$unwind and $group

db.scores.aggregate( [
    { $unwind: "$scores" },
    { $group: { _id: "$name", score: { $sum: "$scores" } } }
] )

{
  "result": [
    {
      "_id": "bar",
      "score": 18
    },
    {
      "_id": "foo",
      "score": 12
    }
  ],
  "ok": 1
}

$unwind and $group and $sort

db.scores.aggregate( [
    { $unwind: "$scores" },
    { $group: { _id: "$name", score: { $sum: "$scores" } } }, 
    { $sort: { "score" : 1 } }
] )

{
  "result": [
    {
      "_id": "foo",
      "score": 12
    },
    {
      "_id": "bar",
      "score": 18
    }
  ],
  "ok": 1
}

$match

  • $match
db.scores.aggregate( [ { $match: { name: "bar" } } ] )

{
  "result": [
    {
      "_id": ObjectId("5988c676d474ca272db3fb2f"),
      "name": "bar",
      "scores": [
        9,
        5,
        3,
        1
      ]
    }
  ],
  "ok": 1
}

$match and $unwind and $group

db.scores.aggregate( [
    { $match: { name: "bar" } },
    { $unwind: "$scores" },
    { $group: { _id: "$name", score: { $sum: "$scores" } } }
] )

{
  "result": [
    {
      "_id": "bar",
      "score": 18
    }
  ],
  "ok": 1
}

Mailbox analyzing

Email messages

> use mboxer
> db.messages.count()
> db.messages.find().limit(1)

find

> db.messages.find({ 'From.address' : 'foo@bar.com' })

> db.messages.find({ 'From.address' : 'foo@bar.com' })
         .sort({ Date : -1 })
         .skip(20)
         .limit(5)

Find by date

Messages before January 1, 2005

db.messages.find({ Date: {$lt: new Date(2005, 0, 1) }})
         .limit(10)

Messages that have CC field

db.messages.find( { 'CC' : { $exists : true } } ).count()

Large messages

db.messages.find({ size : { $gt : 10000000 } }).count()
db.messages.find({ size : { $gt : 10000000 } })

List the 10 biggest messages

db.messages.find( ).sort( { size : -1 } ).limit(10)

The total size of the messages - aggregate

db.messages.aggregate( { $group : { _id : '', size : { $sum : '$size' } } } )

{
  "result": [
    {
      "_id": "",
      "size": NumberLong("17318623")
    }
  ],
  "ok": 1
}

The total size of the messages - mapReduce

Create a mapping function:

map = function() { emit('msize', this.size); }

Create a reduce function:

red = function(k, v) {
  return Array.sum(v);
}

Run the mapReduce:

res = db.messages.mapReduce(map, red, {out: "Result"})

Get the result from the temporary collection called res.

res.find()

{
  "_id": "msize",
  "value": 17318623
}

MongoDB with Perl

Simple

use strict;
use warnings;
use 5.010;

use MongoDB ();
use Data::Dumper qw(Dumper);

my $client = MongoDB::MongoClient->new(host => 'localhost', port => 27017);
my $db   = $client->get_database( 'example_' . $$ . '_' . time  );

my $people_coll = $db->get_collection('people');

$people_coll->insert( {
    name => 'First',
});

$people_coll->insert( {
    name => 'Second',
});

my $people = $people_coll->find;
while (my $p = $people->next) {
    print Dumper $p;
}
#$db->drop;

Update

use strict;
use warnings;
use 5.010;

use MongoDB ();
use Data::Dumper qw(Dumper);

my $client = MongoDB::MongoClient->new(host => 'localhost', port => 27017);
my $db   = $client->get_database( 'example_' . $$ . '_' . time  );

my $people_coll = $db->get_collection('people');

$people_coll->insert( {
    name => 'First',
});

$people_coll->insert( {
    name => 'Second',
});

$people_coll->update(
    { name => 'Second'},
    { '$set' => {
            phone => '1-123',
        },
    },
);

my $people = $people_coll->find;
while (my $p = $people->next) {
    print Dumper $p;
}

END {
    $db->drop if $db;
}


Unique

use strict;
use warnings;
use 5.010;

use MongoDB ();
use Data::Dumper qw(Dumper);

my $client = MongoDB::MongoClient->new(host => 'localhost', port => 27017);
my $db   = $client->get_database( 'training' );
$db->drop;
#my @collections = $db->collection_names;


# without sparse we can get one document without the indexed field
# with spars=true we can get more such documents
my $users_coll = $db->get_collection('users');
my $res = $users_coll->ensure_index({ username => 1 }, {
    unique => boolean::true,
    sparse => boolean::false,
});
#die Dumper $res;
$users_coll->insert({ name => 'Foo', username => 'foo' });
eval {
    $users_coll->insert({ name => 'Bar', username => 'foo' });
};
print $@;
eval {
    $users_coll->insert({ name => 'Zorg' });
};
print $@;
eval {
    $users_coll->insert({ name => 'Zorg2' });
};
print $@;


my $us = $users_coll->find;
while  (my $user = $us->next) {
    print Dumper $user;
}



Increment

use strict;
use warnings;
use 5.010;

use MongoDB ();
use Data::Dumper qw(Dumper);

my $client = MongoDB::MongoClient->new(host => 'localhost', port => 27017);
my $db   = $client->get_database( 'example_' . time  );

my $collection = $db->get_collection('counters');

$collection->insert( { _id => 'posts', seq => 0 });
say inc('posts');
say inc('posts');
say inc('posts');
say inc('posts');
say inc('posts');

sub inc {
    my ($field) = @_;
    $collection->find_and_modify( {
        query  => { _id => $field },
        update => { '$inc' => { seq => 1 } },
        new  => 1,  # to return the new value
                    # by default it returns the old value
    } )->{seq}
}

$db->drop;

A counter with autoincremented field http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/

Phonebook

use 5.010;
use Moo;
use MooX::Options;

use MongoDB ();
use Data::Dumper qw(Dumper);

option add   => (is => 'ro');
option list  => (is => 'ro');
option update => (is => 'ro');
option delete => (is => 'ro');
option name  => (is => 'ro', format => 's');
option nickname  => (is => 'ro', format => 's');
option phone => (is => 'ro', format => 's');

sub run {
    my ($self) = @_;

    my $client = MongoDB::MongoClient->new(host => 'localhost', port => 27017);
    my $db   = $client->get_database( 'phonebook' );
    my $people_coll = $db->get_collection('people');

    say 'Processing ...';
    if ($self->add) {
        die "--name is needed" if not $self->name;
        die "--phone is needed" if not $self->phone;
        $people_coll->insert( {
            name => $self->name,
            phone => $self->phone,
            nickname => $self->nickname,
        });
    } elsif ($self->list) {
        my %query;
        if ($self->name) {
            $query{name} = $self->name;
        }
        if ($self->phone) {
            $query{phone} = $self->phone;
        }
        my $people = $people_coll->find(\%query);
        while (my $p = $people->next) {
            printf "%s  %s", $p->{name}, $p->{phone};
            if ($p->{nickname}) {
                print " ($p->{nickname})";
            }
            print "\n";
        }
    } elsif ($self->update) {
        die "--name is needed" if not $self->name;
        die "--phone is needed" if not $self->phone;
        
        $people_coll->update(
            { name => $self->name },
            { '$set' => { phone => $self->phone }},
            { multiple => 1 },
        );
    } elsif ($self->delete) {
        die "--name is needed" if not $self->name;
        $people_coll->remove({ name => $self->name });
    } else {
        die "Missing -h, --add,  --list or --update\n";
    }
}
 
main->new_with_options->run;



MongoDB with Python

Simple

import sys
from pymongo import Connection
from pymongo.errors import ConnectionFailure

def main():

    try:
       c = Connection(host="localhost", port=27017)
    except ConnectionFailure as e:
        sys.stderr.write("Could not connect to MongoDB: {}\n".format(e))
        exit()

    db = c["users"]

    student = {
        "name"  : "Foo Bar",
        "email" : "foo@bar.com",
    }

    db.users.insert(student, w=1)

    db.users.insert({ "name" : "other" }, w=1)

    for user in db.users.find():
        print(user)

main()