Makai's Blog


MongoDB: An Introduction (Part 1)

Document-Oriented Data Stores

A document-oriented data store extends the key-value pair model by providing a structure that the data store understands.[1] Document-oriented data stores are inspired by Lotus Notes[2] and the simplicity of the JavaScript Object Notation (JSON) format. The current leading document-oriented data stores are MongoDB, CouchDB, and Riak.

MongoDB

MongoDB is an open source document-oriented data store that is generally classified as favoring the Consistency and Partition tolerance principles of the CAP theorem. The term ‘Mongo’ comes from ‘humongous,’ as in the amount of data MongoDB allows you to store in its non-relational structure.

The company 10gen actively leads development on MongoDB and coordinates open source contributions. Core MongoDB functionality is written in C++, and official drivers are available for Java, C++, Python, Ruby, Scala, and several other languages.[3] Drivers for Clojure, Groovy, R, Erlang, and many other languages are maintained by the community. 10gen also provides commercial training and services to generate revenue, which is partially reinvested in the data store’s development.[4]

MongoDB should only be run under 64-bit operating systems because of the way it addresses the data store. The limitation stems from MongoDB’s storage implementation using memory mapped files for performance reasons.[5] Running on 32-bit systems will work, but MongoDB will only be able to store about 2.5 gigabytes total - fine for some local development work but not for most production software.

Lingo

There are several terms commonly used in MongoDB literature:

  1. Collection - roughly equivalent to a table in a relational database in that it contains zero to many documents.
  2. Document - roughly equivalent to a row in a relational database in that it contains a logical grouping of data elements. Documents contain key-value pairs that represent stored data in MongoDB.
  3. Schemaless - documents stored in the same MongoDB collection can have varying fields, and the elements within one document are not required to match the structure of another.
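To make the three terms concrete, here is a minimal sketch in plain JavaScript rather than in the MongoDB shell. The array stands in for a collection and each object for a document. The third document and its "tea" and "size" fields are made up for illustration.

```javascript
// Plain JavaScript stand-in for a collection: an array of documents.
// Nothing here talks to MongoDB; the third document is hypothetical.
const orders = [
  { coffee: "Americano", room_for_milk: false, price: 3.95 },
  { coffee: "Latte", price: 4.95, notes: "customer wants room for milk" },
  { tea: "Earl Grey", size: "large" }
];

// The keys each individual document uses, sorted for easy comparison.
const keysPerDocument = orders.map((doc) => Object.keys(doc).sort());

// Every distinct key seen anywhere in the collection.
const allKeys = [...new Set(orders.flatMap((doc) => Object.keys(doc)))].sort();

console.log(keysPerDocument);
console.log(allKeys);
```

No two documents share the same set of keys, yet all three sit in the same collection. That is what "schemaless" means in practice.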

Data Storage Structure

Data are stored and represented as documents in Binary JavaScript Object Notation (BSON).[6] BSON itself is a binary encoding, but documents are written and displayed in a notation that is identical to standard JavaScript Object Notation (JSON) for most structures. For example, here is an order for a single coffee at a cafe:

{
  "_id" : ObjectId("4de2fefcfe376e36c3bc620b"),
  "coffee" : "Americano",
  "room_for_milk" : false,
  "price" : 3.95
}

In the preceding example there are four keys: “_id”, “coffee”, “room_for_milk”, and “price”. Each key has a single corresponding value: ObjectId(“4de2fefcfe376e36c3bc620b”), “Americano”, false, and 3.95, respectively. Each value has a data type. ObjectId(“4de2fefcfe376e36c3bc620b”) is an object identifier that is generated automatically when a document is inserted without an “_id” value. “Americano” is a string. false is a Boolean. 3.95 is a floating-point number (note that floats should not be used to store monetary values in a production setting because they cannot represent most decimal fractions exactly). The four keys and values are wrapped in curly braces and the resulting structure is called a document.
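The warning about floats and money is easy to demonstrate in plain JavaScript, which uses the same 64-bit floating-point representation for numbers. The integer-cents workaround and the formatCents helper below are one common approach, shown here as an assumption rather than a MongoDB recommendation.

```javascript
// Binary floating point cannot represent most decimal fractions exactly,
// so simple decimal arithmetic can produce surprising results.
const sumOfFloats = 0.1 + 0.2;
console.log(sumOfFloats === 0.3); // false

// One common workaround (an assumption, not part of the original example):
// store the price as an integer number of cents.
const priceInCents = 395;
const quantity = 3;
const totalInCents = priceInCents * quantity; // exact integer arithmetic

// Hypothetical helper: convert cents to a dollar string for display only.
function formatCents(cents) {
  return (cents / 100).toFixed(2);
}

console.log(formatCents(totalInCents)); // "11.85"
```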

There are six basic JSON data types as well as several additional data types in MongoDB. The original six JSON data types are:

  1. null - represents both a null value and a nonexistent field
  2. boolean - two values, true and false
  3. numeric - 32-bit integer, 64-bit integer, and 64-bit floating point handled automatically by MongoDB
  4. string - a UTF-8 string of characters
  5. array - lists or sets of values that can be heterogeneous in type
  6. object - a JSON object

MongoDB’s extended types beyond the basic JSON data types are:

  1. embedded document
  2. JavaScript code
  3. minimum value
  4. maximum value
  5. object id
  6. date
  7. regular expression
  8. symbol
  9. binary data
  10. undefined

MongoDB’s schemaless design allows the creation of documents with variable structure. The variable structure works well for rapid prototyping and avoids having to alter tables whenever a new attribute is added. However, the schemaless design also prevents the creation of the constraints that SQL databases use to standardize data.

There is less normalization involved in a typical MongoDB setup because there are no server-side joins.[7] Instead of joining separate relational tables, related data can be embedded directly inside a document.
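The difference between joining and embedding can be sketched in plain JavaScript, with no database involved. The customers data, the customer_id field, and the embedCustomer helper are all hypothetical. They only illustrate the shape of the data in each approach.

```javascript
// Relational-style layout: two separate "tables" linked by customer_id.
// All names and values here are hypothetical.
const customers = [{ id: 1, name: "Ada" }];
const ordersNormalized = [{ customer_id: 1, coffee: "Latte", price: 4.95 }];

// A client-side "join": look up the customer and nest it inside the order,
// producing the kind of document you would store with embedding.
function embedCustomer(order, customerList) {
  const customer = customerList.find((c) => c.id === order.customer_id);
  const { customer_id, ...rest } = order; // drop the foreign key
  return { ...rest, customer: { name: customer.name } };
}

const orderDocument = embedCustomer(ordersNormalized[0], customers);
console.log(JSON.stringify(orderDocument, null, 2));
```

With the customer embedded, everything needed to display the order can be read from a single document. The cost is that the customer's name is duplicated in every order it appears in.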

Inserting

Data manipulation in MongoDB can be performed through the shell and its JavaScript syntax. For example, here is the syntax for inserting a document into the “mydb” collection:

> db.mydb.insert({"coffee" : "Latte", "price" : 4.95, "notes" : "customer wants room for milk"})

Inserts are non-blocking by default and do not wait for a response from the server. You can also specify “safe inserts” that wait for a response value from the server indicating whether the operation was successful or had an error.

Batch inserts are much faster than incremental data insertion. The MongoDB team recommends preallocating space with blank documents when performing numerous inserts of a predefined size.

Querying

10gen also touts MongoDB’s dynamic query language as a core feature that is critical to accelerating the development process. The query language is not SQL; instead, it is based on key and value matching. For example, here is a query to find all the documents with a value of “Latte” for the “coffee” key in the mydb collection:

> db.mydb.find({"coffee" : "Latte"})

The result of this command after executing the insertion from the previous section is:

{ "_id" : ObjectId("4df75a03d30a7515a35f5942"), "coffee" : "Latte", "price" : 4.95, "notes" : "customer wants room for milk" }

Note that querying on keys and values is case sensitive. If you instead used the following command…

> db.mydb.find({"COFFEE" : "latte"})

… the query would return no matching documents.
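The case-sensitive behavior follows from exact key and value matching. The sketch below is a deliberately simplified stand-in written in plain JavaScript. It is not how MongoDB implements queries, and the matches and find functions are hypothetical helpers that only handle flat equality queries.

```javascript
// Simplified stand-in for key/value matching: a document matches when every
// key in the query exists in the document with a strictly equal value.
function matches(doc, query) {
  return Object.keys(query).every(
    (key) =>
      Object.prototype.hasOwnProperty.call(doc, key) && doc[key] === query[key]
  );
}

// Hypothetical find(): return every document that matches the query.
function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

// An in-memory array standing in for the mydb collection.
const mydb = [
  { coffee: "Latte", price: 4.95, notes: "customer wants room for milk" },
  { coffee: "Americano", price: 3.95 }
];

console.log(find(mydb, { coffee: "Latte" }).length); // 1
console.log(find(mydb, { COFFEE: "latte" }).length); // 0
console.log(find(mydb, { coffee: "latte" }).length); // 0
```

Neither the key nor the value is normalized before comparison, so a difference in case on either side is enough to produce no matches.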

That covers MongoDB’s background information, basic inserting, and querying. Next post I’ll cover updating, deleting, capped collections, and a few other things.

[1] http://stackoverflow.com/questions/3046001/what-does-document-oriented-vs-key-value-mean-when-talking-about-mongodb-vs-ca

[2] http://blogs.neotechnology.com/emil/2009/11/nosql-scaling-to-size-and-scaling-to-complexity.html

[3] http://www.mongodb.org/display/DOCS/Drivers

[4] http://www.10gen.com/

[5] http://blog.mongodb.org/post/137788967/32-bit-limitations

[6] http://www.mongodb.org/display/DOCS/BSON

[7] http://www.mongodb.org/display/DOCS/Schema+Design#SchemaDesign-Embedvs.Reference