Reasons to use a no-sql database like mongoDB

Saturday, December 26th, 2009

In this article I’ll talk about reasons to use mongoDB and other no-sql databases.

Relational databases are used in many web apps. But when the time comes to scale the app to a few million users, you usually have to make choices about your architecture.

For instance, a common practice to handle high load is to put one or more of the most used tables on separate servers. To use this technique, a developer has to resolve the issue of join queries. How do you query tables that aren’t in the same location and still get high performance?

One solution is to denormalize data, duplicating content in every table that may need it for a query. Think of 2 tables: contacts and users. The tables live on 2 different servers, and your app has a feature that shows all contacts of a specific user. You need a join query, but you can’t use one. So you duplicate some of the fields from the contacts table right inside your users table. For instance, you create a field called contact_names, in which you put only the names of all contacts of that user, separated by commas. It’s an easy way to solve the problem, but it comes at a cost: you have to keep the contacts in sync in every table that knows something about them.
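To make the syncing cost concrete, here is a minimal sketch using Python's built-in sqlite3 module. Two in-memory databases stand in for the two servers, which is an assumption made only so the example runs anywhere. The contact_names column comes from the text above; the other table layouts and function names are hypothetical.

```python
import sqlite3

# Two in-memory databases stand in for two separate servers (an assumption
# for illustration; a real deployment would use network connections).
users_db = sqlite3.connect(":memory:")
contacts_db = sqlite3.connect(":memory:")

users_db.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, contact_names TEXT)"
)
contacts_db.execute(
    "CREATE TABLE contacts (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT)"
)
users_db.execute("INSERT INTO users (id, name, contact_names) VALUES (1, 'alice', '')")


def sync_contact_names(user_id):
    """Rebuild the denormalized contact_names column from the contacts table."""
    rows = contacts_db.execute(
        "SELECT name FROM contacts WHERE user_id = ? ORDER BY id", (user_id,)
    ).fetchall()
    names = ",".join(name for (name,) in rows)
    users_db.execute(
        "UPDATE users SET contact_names = ? WHERE id = ?", (names, user_id)
    )


def add_contact(user_id, name):
    """Every write to contacts must also refresh the duplicated column."""
    contacts_db.execute(
        "INSERT INTO contacts (user_id, name) VALUES (?, ?)", (user_id, name)
    )
    sync_contact_names(user_id)


def contact_names_for(user_id):
    """Read path: a single query against the users server, no join needed."""
    (names,) = users_db.execute(
        "SELECT contact_names FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return names.split(",") if names else []


add_contact(1, "bob")
add_contact(1, "carol")
print(contact_names_for(1))
```

The read is cheap, but every code path that adds, renames or deletes a contact has to remember to call the sync step. A comma-separated field also breaks as soon as a name contains a comma.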

Bottom line? You started developing your app with join queries, but at some point you had to give them up.

So, if using a traditional database forces you to stop using some of its features somewhere down the road, why not start with a kind of database that avoids the things that are not scalable and sustainable in the long run?

In mongoDB, a solution for this problem would be to create a Contacts document and embed it inside the Users document. Each user then has its contacts right there, inside the User record. No need for join queries.
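As a rough sketch of what embedding looks like, the snippet below uses plain Python dicts in place of documents and a list in place of a collection. No database driver is involved, and the field names and the find_user helper are hypothetical.

```python
# A plain list of dicts stands in for a "users" collection (assumption:
# no database driver is used here; field names are hypothetical).
users = [
    {
        "name": "alice",
        "contacts": [
            {"name": "bob", "email": "bob@example.com"},
            {"name": "carol", "email": "carol@example.com"},
        ],
    },
    {"name": "dave", "contacts": []},
]


def find_user(name):
    """Return the first user document with the given name, or None."""
    return next((u for u in users if u["name"] == name), None)


alice = find_user("alice")
# The contacts arrive with the user document: one lookup, no join.
contact_names = [c["name"] for c in alice["contacts"]]
print(contact_names)
```

With a real driver the idea would be the same: fetching the user document returns its contacts in the same round trip.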

However, there are times when you need to have a model that is connected to several others.

For example, let’s say you need to relate Contacts to several models such as Clients, Suppliers and Employees. So you create 4 collections: Clients, Suppliers, Employees and Contacts, and connect them through database references, which act like foreign keys. But this is not the mongoDB way to do things. Performance will be penalized, since each reference has to be resolved with an additional query.

So the general question should always be “Why can’t I embed this document?”. Or even better: “Does this object merit its own collection, or rather should it embed in objects in other collections?”.

There are some general rules on when to embed and when to reference (taken from the mongoDB website):

  • “First class” objects, that are at top level, typically have their own collection;
  • Line item detail objects typically are embedded;
  • Objects which follow an object modelling “contains” relationship should generally be embedded;
  • Many to many relationships are generally by reference;
  • Collections with only a few objects may safely exist as separate collections, as the whole collection is quickly cached in application server memory;
  • Embedded objects are harder to reference than “top level” objects in collections, as you cannot have a DBRef to an embedded object (at least not yet);
  • It is more difficult to get a system-level view for embedded objects. For example, it would be easier to query the top 100 scores across all students if Scores were not embedded;
  • If the amount of data to embed is huge (many megabytes), you may reach the limit on size of a single object;
  • If performance is an issue, embed.
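The rule about system-level views can be illustrated with the scores example from the list. This is only a sketch with plain Python data structures; the student names, the score values and both helper functions are made up for illustration.

```python
import heapq

# Embedded layout: each student document carries its own scores.
students = [
    {"name": "alice", "scores": [91, 78]},
    {"name": "bob", "scores": [85, 99]},
    {"name": "carol", "scores": [60]},
]

# Top-level layout: one flat "scores" collection referencing students by name.
scores = [
    {"student": s["name"], "value": v} for s in students for v in s["scores"]
]


def top_scores_embedded(n):
    """Unwind every student's embedded list before ranking."""
    flat = ((v, s["name"]) for s in students for v in s["scores"])
    return heapq.nlargest(n, flat)


def top_scores_flat(n):
    """Rank the flat collection directly."""
    return heapq.nlargest(n, scores, key=lambda doc: doc["value"])


print(top_scores_embedded(2))
print([(d["value"], d["student"]) for d in top_scores_flat(2)])
```

Both versions return the same ranking, but the embedded layout has to walk through every student document to get there, while the flat collection can be ranked (and, in a database, indexed) on the score value directly.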

The way I see it, you can still have more or less the best of both worlds: the flexibility of documents and the performance of embedded documents. And you still have a way to emulate foreign keys, like a relational database – but not without a penalty on performance. I don’t know how mongoDB and MySQL compare to each other in the long run, for the usual web app. It’d be cool if someone did some benchmarks on this subject.

Read more about mongoDB database schema design.