We are surrounded by documents without realizing their ubiquity. We fill out forms in places like hospitals, car rental offices and job interviews. These paper documents, with their flexible schema, are the real-world equivalent of document-oriented databases. Document-oriented DBs are a vital part of the NoSQL movement, and there are several popular implementations, such as CouchDB and MongoDB. Wikipedia defines MongoDB (from “humongous”) as an open source document-oriented DB. While a traditional RDBMS stores data as rows and columns, MongoDB uses the Binary JSON format (called BSON) to store data as expressive documents. The BSON format allows a dynamic schema and is suited to many general-purpose applications.

Features: MongoDB documents are essentially key-value pairs of data expressed in a JSON-like format. Documents map naturally to the data types of most programming languages, and embedded documents and arrays reduce the need for joins. With the rising popularity of HTML5 and JavaScript frameworks, JSON data from MongoDB can be displayed on the frontend with little or no conversion. The schemaless nature allows flexibility in application design. MongoDB does away with joins and instead relies on de-normalization (aggregating related information into a single document), achieving speed through indexing. MongoDB supports indexes on fields inside embedded documents, which keeps queries on nested data fast. MongoDB is supported in nearly all major programming languages through easy-to-use drivers.
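
To make the idea of embedding concrete, here is a minimal sketch in plain Python (no MongoDB driver or server involved). The blog post document, its field names and the `get_path` helper are all hypothetical; the helper only imitates the dotted-path style that MongoDB uses to address fields inside embedded documents.

```python
import json

# A hypothetical blog post stored as ONE document: the author and the
# comments are embedded instead of living in separate joined tables.
post = {
    "_id": 1,
    "title": "Hello MongoDB",
    "author": {"name": "Alice", "email": "alice@example.com"},
    "tags": ["nosql", "mongodb"],
    "comments": [
        {"user": "bob", "text": "Nice post"},
        {"user": "carol", "text": "Thanks"},
    ],
}


def get_path(document, dotted_path):
    """Follow a dotted path such as 'author.name' through nested dicts.

    Illustrative helper only; it is not part of any MongoDB driver.
    """
    value = document
    for key in dotted_path.split("."):
        value = value[key]
    return value


# Everything needed to render the post is available without a join.
author_name = get_path(post, "author.name")
commenters = [comment["user"] for comment in post["comments"]]

# The same structure serializes directly to JSON for a frontend.
as_json = json.dumps(post)
```

In a relational design the author and the comments would typically live in their own tables and be joined at query time; here a single read returns the whole structure.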

Another killer feature of MongoDB is its built-in support for high availability, replication and load balancing. MongoDB uses sharding to horizontally partition the DB, and new machines can be added to a running DB. GridFS is a specification for storing large files, which allows large objects like videos to be stored efficiently. MongoDB also supports batch processing and aggregation through Map-Reduce, as well as two-dimensional geospatial indexes.
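
The Map-Reduce style of aggregation can be sketched in a few lines of plain Python. This is only an in-memory illustration of the idea, not MongoDB's implementation: in MongoDB the map and reduce functions are written in JavaScript and run on the server. The page-view documents and all function names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical page-view documents, as they might sit in a collection.
page_views = [
    {"url": "/home", "ms": 120},
    {"url": "/about", "ms": 80},
    {"url": "/home", "ms": 100},
]


def map_view(doc):
    """Map step: emit one (key, value) pair per document."""
    yield doc["url"], 1


def reduce_counts(key, values):
    """Reduce step: fold all values emitted for one key into one result."""
    return sum(values)


def map_reduce(documents, map_fn, reduce_fn):
    """Group the emitted values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


views_per_url = map_reduce(page_views, map_view, reduce_counts)
```

Because each key is reduced independently, the same pattern can in principle be spread across several machines, which is what makes it a natural fit for a sharded database.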

MongoDB Data model:

  • A Mongo system (standalone, replicated or sharded) holds a set of databases.
  • A database holds a set of collections.
  • A collection holds a set of documents.
  • A document is a set of fields.
  • A field is a key-value pair.
  • A key is a name (string).
  • A value is:
    • a basic type like string, integer, float, timestamp, binary, etc.,
    • a document, or
    • an array of values
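
The hierarchy above can be mirrored with ordinary Python dictionaries and lists. The sketch below is an assumption-laden illustration rather than how MongoDB stores anything: the database, collection and field names are hypothetical, and treating booleans and `None` as basic types is my reading of the "etc." in the list.

```python
from datetime import datetime, timezone

# One document: a set of fields, each a key-value pair with a string key.
patient_form = {
    "name": "Jane Doe",                                      # string
    "age": 42,                                               # integer
    "temperature_c": 36.6,                                   # float
    "admitted": datetime(2012, 1, 15, tzinfo=timezone.utc),  # timestamp
    "scan": b"\x89PNG",                                      # binary
    "address": {"city": "Pune", "zip": "411001"},            # embedded document
    "allergies": ["penicillin", "pollen"],                   # array of values
}

# system -> databases -> collections -> documents (all names hypothetical)
system = {"hospital_db": {"forms": [patient_form]}}

# Basic value types; bool and None are assumed to fall under "etc.".
BASIC_TYPES = (str, int, float, bool, bytes, datetime, type(None))


def is_valid_value(value):
    """Check a value against the recursive definition in the list above."""
    if isinstance(value, BASIC_TYPES):
        return True
    if isinstance(value, dict):  # a document: string keys, valid values
        return all(
            isinstance(key, str) and is_valid_value(item)
            for key, item in value.items()
        )
    if isinstance(value, list):  # an array of values
        return all(is_valid_value(item) for item in value)
    return False


form_is_valid = is_valid_value(patient_form)
```

The recursion in `is_valid_value` follows the definition directly: a value may itself be a document or an array, so documents can nest to arbitrary depth.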

Use cases:

  • Content management systems: MongoDB’s extensible schema and document-oriented structure make it attractive for CMS.
  • Real time analytics: MongoDB is very good at real time updates, inserts and queries, making it an ideal candidate for real time analytics.
  • High volume traffic: Querying a traditional DB might be too expensive for complex data. Developers can use MongoDB’s flexible schema to custom-design a file system instead of using XML or flat files.
  • Mobile and Gaming: MongoDB’s geospatial indexing makes it useful for mobile and game development.
  • Ecommerce: Many sites use MongoDB in combination with an RDBMS at the core of their infrastructure.
  • Archives and logs: The document-oriented nature of MongoDB is a good match for logging and archiving application data.

MongoDB is the latest buzzword in the DB world, and several startups (e.g. Foursquare) are actively using it. It does have some limitations, however. A single MongoDB document cannot exceed 4MB (later releases raised this limit to 16MB). The recommended approach of de-normalizing everything also sits at odds with the time-tested theory of SQL databases. MongoDB is an interesting product; whether it proves to be a robust solution, only time will tell.