Playing with mongoDB

Friday, December 25th, 2009

I’ve been hearing about mongoDB and its awesomeness for a long time, so now’s the time to play with it! :)

In this article I’ll try to summarize what mongoDB is, why it’s so cool, and how to start playing with it.

What is mongoDB?

mongoDB is one of the new NoSQL databases. This means it’s not record-oriented like relational databases. Instead, it’s schema-free and collection-oriented. It is also a document database.

So what’s a document? It’s the basic unit of data in mongoDB. A document is simply a set of key-value pairs like this:


{
     name: "John Doe",
     age: 40
}

Note: this notation is JSON-like. Internally, mongoDB stores documents as BSON, a binary serialization of such documents whose name is short for “Binary JSON”.

Back to the document. This “document” is similar to a record in a usual relational database. You can use strings, integers and many other data types, including arrays and other documents. You can also nest documents, for example Person -> Children -> Toys.
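
To make the nesting idea a bit more concrete, here is a small sketch in plain JavaScript (no server needed). The field names and values (children, toys and so on) are made up for this example; they are just one way such a document could look.

```javascript
// A hypothetical "person" document: Person -> Children -> Toys.
// All field names and values here are invented for illustration.
var person = {
    name: "John Doe",
    age: 40,
    children: [                       // an array of embedded documents
        { name: "Anna", toys: [ { name: "Teddy bear" }, { name: "Ball" } ] },
        { name: "Bob",  toys: [ { name: "Robot" } ] }
    ]
};

// Nested values are reached with ordinary JavaScript property access.
var firstToy = person.children[0].toys[0].name;
console.log(firstToy);
```

The whole structure is a single document, so the children and their toys travel together with the person instead of living in separate tables.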

So what’s a table then? Collections act more or less like tables. A collection holds one or more documents.

And ultimately a database is a group of collections. Each collection has a unique name inside a database.

With the document stuff out of the way, let’s see why this database seems so cool.

Why is mongoDB so cool?

Working with a NoSQL database gives you several advantages over a traditional database. Here are some of them:

  • there is no schema. You don’t have to write Rails migrations to create tables and columns; you simply start using collections and fields, and the database creates them on the fly. With the MongoMapper gem, for example, you just declare the keys inside your model, and that’s it;
  • the data is stored in a JSON-like format, which gives both flexibility and simplicity. New data types can be added, depending on how you format your document before saving it into the database;
  • storage of binary files such as videos and photos in the database is possible and, more importantly, efficient.

Basically, mongoDB bridges the gap between key-value stores (which are highly scalable) and traditional RDBMS (which provide structured schemas and powerful queries).

Beyond these things, mongoDB also supports some types of database replication. It offers auto-sharding, a feature that lets you build a large, horizontally scalable database cluster that can incorporate additional machines dynamically. It also supports map-reduce.

Installing mongoDB

Simply download the binaries. Unpack the tar.gz to /usr/local/bin/mongo and add ‘/usr/local/bin/mongo/bin’ to your PATH. Create the directory where the database files will be saved:

mkdir -p /data/db

Start the server with:

mongod

It listens for connections on port 27017 by default. Open another shell and start the mongo shell with:

mongo

Interestingly enough, the mongo shell uses JavaScript as its language :)

No need to create a database

One important note: you never have to create a database or collection explicitly.

The moment you try to access a database or collection, the underlying database and/or collection is created automatically.

Let’s add some data

In your mongo shell, type the following:


> a = { brand: "Toyota", model: "Corolla" };
> b = { name: "John", age: 40 }
> db.things.save(a);
> db.things.save(b);

Now to query all saved documents inside the collection things, run:

> db.things.find();
{ "_id" : ObjectId("4b33dee3844fab562308bb5f"), "brand" : "Toyota", "model" : "Corolla" }
{ "_id" : ObjectId("4b33dee5844fab562308bb60"), "name" : "John", "age" : 40 }

Important things to note:

  • the collection things is created automatically;
  • the documents inside a collection may have different structures; as you can see, these 2 documents have different fields;
  • upon being inserted into the database, objects are assigned an object ID in the field _id;
  • when you run the commands above yourself, the object IDs you get will be different from the ones shown here.
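
As a side note on why the IDs differ from run to run: as far as I know, the first four bytes of an object ID encode its creation time in seconds since the Unix epoch. The sketch below is plain JavaScript and assumes that layout; the helper name objectIdSeconds is made up for this example.

```javascript
// Hypothetical helper: read the first 8 hex characters (4 bytes) of an
// object ID string as a Unix timestamp in seconds.
// Assumption: the ID starts with its creation time, as described above.
function objectIdSeconds(hex) {
    return parseInt(hex.substring(0, 8), 16);
}

// The ID of the first document saved in this article.
var seconds = objectIdSeconds("4b33dee3844fab562308bb5f");
var created = new Date(seconds * 1000);   // Date expects milliseconds
console.log(created.toISOString());
```

For the ID above this yields a date in late December 2009, which is when these examples were typed in.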

Adding more data

You have probably noticed by now that we are using JavaScript inside the mongo client/shell. This means you can use something you already know to interact with mongoDB.


> for( var i = 1; i < 10; i++ ) db.things.save( { x:4, j:i } );
> db.things.find();
{ "_id" : ObjectId("4b33dee3844fab562308bb5f"), "brand" : "Toyota", "model" : "Corolla" }
{ "_id" : ObjectId("4b33dee5844fab562308bb60"), "name" : "John", "age" : 40 }
{ "_id" : ObjectId("4b33df5b844fab562308bb61"), "x" : 4, "j" : 1 }
{ "_id" : ObjectId("4b33df5b844fab562308bb62"), "x" : 4, "j" : 2 }
{ "_id" : ObjectId("4b33df5b844fab562308bb63"), "x" : 4, "j" : 3 }
{ "_id" : ObjectId("4b33df5b844fab562308bb64"), "x" : 4, "j" : 4 }
{ "_id" : ObjectId("4b33df5b844fab562308bb65"), "x" : 4, "j" : 5 }
{ "_id" : ObjectId("4b33df5b844fab562308bb66"), "x" : 4, "j" : 6 }
{ "_id" : ObjectId("4b33df5b844fab562308bb67"), "x" : 4, "j" : 7 }
{ "_id" : ObjectId("4b33df5b844fab562308bb68"), "x" : 4, "j" : 8 }
{ "_id" : ObjectId("4b33df5b844fab562308bb69"), "x" : 4, "j" : 9 }

Iterating the data using the cursor

When we ran db.things.find(); in the last example, the shell automatically showed all data from the collection. But if we assign the result of find() to a variable, we can iterate over the data as we wish:


> var cursor = db.things.find();
> cursor.next()
{
        "_id" : ObjectId("4b33dee3844fab562308bb5f"),
        "brand" : "Toyota",
        "model" : "Corolla"
}
> cursor.next()
{
        "_id" : ObjectId("4b33dee5844fab562308bb60"),
        "name" : "John",
        "age" : 40
}
> cursor.next()
{ "_id" : ObjectId("4b33df5b844fab562308bb61"), "x" : 4, "j" : 1 }
> cursor.next()
{ "_id" : ObjectId("4b33df5b844fab562308bb62"), "x" : 4, "j" : 2 }

Or we can iterate programmatically:


> var cursor = db.things.find();
> while (cursor.hasNext()) {print (tojson(cursor.next())); }
{
	"_id" : ObjectId("4b33dee3844fab562308bb5f"),
	"brand" : "Toyota",
	"model" : "Corolla"
}
{
	"_id" : ObjectId("4b33dee5844fab562308bb60"),
	"name" : "John",
	"age" : 40
}
{ "_id" : ObjectId("4b33df5b844fab562308bb61"), "x" : 4, "j" : 1 }
{ "_id" : ObjectId("4b33df5b844fab562308bb62"), "x" : 4, "j" : 2 }
{ "_id" : ObjectId("4b33df5b844fab562308bb63"), "x" : 4, "j" : 3 }
{ "_id" : ObjectId("4b33df5b844fab562308bb64"), "x" : 4, "j" : 4 }
{ "_id" : ObjectId("4b33df5b844fab562308bb65"), "x" : 4, "j" : 5 }
{ "_id" : ObjectId("4b33df5b844fab562308bb66"), "x" : 4, "j" : 6 }
{ "_id" : ObjectId("4b33df5b844fab562308bb67"), "x" : 4, "j" : 7 }
{ "_id" : ObjectId("4b33df5b844fab562308bb68"), "x" : 4, "j" : 8 }
{ "_id" : ObjectId("4b33df5b844fab562308bb69"), "x" : 4, "j" : 9 }
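
If you want to get a feel for the hasNext()/next() pattern without a running server, here is a toy stand-in in plain JavaScript. The function makeCursor is hypothetical and only mimics the interface over an in-memory array; a real mongoDB cursor fetches its documents from the server.

```javascript
// Hypothetical stand-in for a cursor: walks over an in-memory array.
function makeCursor(docs) {
    var position = 0;                 // index of the next document to return
    return {
        hasNext: function () { return position < docs.length; },
        next: function () {
            if (position >= docs.length) {
                throw new Error("no more documents");
            }
            return docs[position++];
        }
    };
}

var cursor = makeCursor([ { x: 4, j: 1 }, { x: 4, j: 2 }, { x: 4, j: 3 } ]);
var seen = [];
while (cursor.hasNext()) {
    var doc = cursor.next();
    seen.push(doc.j);                 // remember the order we saw them in
    console.log(JSON.stringify(doc));
}
```

Once the loop finishes the cursor is exhausted, so hasNext() returns false and you need a new cursor to iterate again.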

Or use it like an array:


> var cursor = db.things.find();
> cursor[5]
{ "_id" : ObjectId("4b33df5b844fab562308bb64"), "x" : 4, "j" : 4 }

Important note: mongoDB cursors are not snapshots. For instance, if your cursor covers 10 documents and another user removes one of them from the collection while you are still iterating, your cursor may return only 9 documents. You have to use explicit locking to prevent this.

How to query data

Let's see how to query the database for specific things we need to find.

The important thing to know is that queries, in a mongoDB database, are documents themselves. Let's have a look:

SELECT * FROM things WHERE model='Corolla';

will be:


> db.things.find(
     {model:'Corolla'}).forEach(
     function(x) { print (tojson(x));});
{
        "_id" : ObjectId("4b33dee3844fab562308bb5f"),
        "brand" : "Toyota",
        "model" : "Corolla"
}
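
To picture what "a query is a document" means, here is a rough in-memory sketch in plain JavaScript. It is not how mongoDB evaluates queries internally; the helper matches() is a name I made up, and it only handles exact equality on top-level fields.

```javascript
// Sample data mirroring the documents saved earlier (the _id fields are omitted).
var things = [
    { brand: "Toyota", model: "Corolla" },
    { name: "John", age: 40 },
    { x: 4, j: 1 },
    { x: 4, j: 2 }
];

// Hypothetical helper: true when every key in the query document
// has exactly the same value in the candidate document.
function matches(doc, query) {
    return Object.keys(query).every(function (key) {
        return doc[key] === query[key];
    });
}

// Rough equivalent of: SELECT * FROM things WHERE model='Corolla';
var corollas = things.filter(function (doc) {
    return matches(doc, { model: "Corolla" });
});
console.log(JSON.stringify(corollas));
```

The query { model: "Corolla" } is just another object with keys and values, which is why it can be built, stored and passed around like any other document.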

Now, if we want to specify what fields we need, instead of running a "SELECT *" query, mongoDB lets you return "partial documents". To do this, you supply a second argument to the find() method, specifying what elements you need it to return:

SELECT brand FROM things WHERE model='Corolla';


> db.things.find(
     {model:'Corolla'}, {brand:true}).forEach(
     function(x) { print (tojson(x));});
{ "_id" : ObjectId("4b33dee3844fab562308bb5f"), "brand" : "Toyota" }

As you see, the query only returned the field 'brand' along with the object ID.

So how do we limit how many documents the database will return? Use the limit() method:

SELECT j FROM things WHERE x=4 LIMIT 2;


> db.things.find({x:4}, {j:true}).limit(2).forEach(
     function(x) { print (tojson(x));});
{ "_id" : ObjectId("4b33df5b844fab562308bb61"), "j" : 1 }
{ "_id" : ObjectId("4b33df5b844fab562308bb62"), "j" : 2 }
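
The field selection and the limit() call can be pictured with another small in-memory sketch in plain JavaScript. Again, this only imitates the visible behaviour; the helper project() and the short _id values are assumptions made for the example.

```javascript
// Sample documents; the short _id values are placeholders, not real object IDs.
var things = [
    { _id: 1, brand: "Toyota", model: "Corolla" },
    { _id: 2, x: 4, j: 1 },
    { _id: 3, x: 4, j: 2 },
    { _id: 4, x: 4, j: 3 }
];

// Hypothetical helper: copy _id plus every field marked true in `fields`.
function project(doc, fields) {
    var partial = { _id: doc._id };
    Object.keys(fields).forEach(function (key) {
        if (fields[key] && key in doc) {
            partial[key] = doc[key];
        }
    });
    return partial;
}

// Rough equivalent of: SELECT j FROM things WHERE x=4 LIMIT 2;
var result = things
    .filter(function (doc) { return doc.x === 4; })   // WHERE x=4
    .slice(0, 2)                                       // LIMIT 2
    .map(function (doc) { return project(doc, { j: true }); });
console.log(JSON.stringify(result));
```

As in the shell output above, each partial document keeps its _id next to the requested field.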

Or you can use the helper method findOne, if you want to return only 1 document:


> db.things.findOne({x:4})
{ "_id" : ObjectId("4b33df5b844fab562308bb61"), "x" : 4, "j" : 1 }

Note that in this case, all fields will be returned.

Conclusion

mongoDB is easy to learn. Its features are useful for web apps, especially when you need to build something scalable right from the beginning. Being a document database, it offers flexibility that traditional relational databases do not. mongoDB and its friends like CouchDB are definitely worth a look.

In the next article we'll see how to start playing with mongoDB in Rails. Stay tuned! :)