___________________________

WHEN SHALL I USE MONGODB?

Nicolas Herry
___________________________

2017/12/02

1 When shall I use MongoDB?
===========================

When to pick MongoDB over a relational database can be a tricky
question. Luckily, I'm here to help with a quick FAQ. Guaranteed
100% bias-free.

1.1 Does MongoDB guarantee data persistence?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

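MongoDB is not fully ACID-compliant: writes to a single document
are atomic, but there are no multi-document transactions, and
under the default write concern a write can be acknowledged
before it has reached the disk. This means that data can be
lost. You should only pick MongoDB when you intend to store data
you don't mind losing.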

1.2 Does MongoDB scale well?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With MongoDB, data is usually stored denormalised, that is,
duplicated from one document to the next. This means that
indexes, when they are used, cannot be as effective as those of a
relational database. Also, since data is duplicated, databases
tend to occupy more space on disk than with a relational
database.

However, MongoDB scales through sharding, which means that data
is spread across multiple servers to balance the load; each shard
can in turn be run as a replica set to provide some redundancy.
So if scaling is limited on one server, one can always add more
servers to cope with the load.

As for writing, MongoDB offers two approaches. With the MMAPv1
engine, it locks the whole collection while a document is being
saved, disallowing concurrent writes to different documents. This
in part comes from the fact that MongoDB lacks transaction
isolation (ACIDity). The second option involves the new default
engine, WiredTiger, which relies on an optimistic concurrency
model, assuming very low data contention. When a conflict occurs,
transactions are rolled back and need to be run again. This is
fine if transactions deal with different pieces of data, and very
costly when multiple writes happen at the same time on the same
document.
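
To make the cost of the optimistic model concrete, here is a
minimal sketch in plain Python. It does not use MongoDB or
WiredTiger, and it is not how either is implemented: the names
`VersionedStore`, `compare_and_set`, `increment` and `run_writers`
are made up for the illustration. Each write checks that the
document's version has not changed since it was read, and starts
over if it has.

```python
import threading


class VersionedStore:
    """Toy in-memory document store with one version counter per document."""

    def __init__(self):
        self._docs = {}  # key -> (version, value)
        self._lock = threading.Lock()  # only guards the check-and-swap step

    def read(self, key):
        """Return (version, value); a missing document reads as (0, 0)."""
        with self._lock:
            return self._docs.get(key, (0, 0))

    def compare_and_set(self, key, expected_version, new_value):
        """Write only if nobody else wrote since `expected_version`."""
        with self._lock:
            version, _ = self._docs.get(key, (0, 0))
            if version != expected_version:
                return False  # conflict: the caller has to start over
            self._docs[key] = (version + 1, new_value)
            return True


def increment(store, key):
    """Optimistic read-modify-write; returns the number of retries needed."""
    retries = 0
    while True:
        version, value = store.read(key)
        if store.compare_and_set(key, version, value + 1):
            return retries
        retries += 1


def run_writers(store, keys, writes_per_thread=1000):
    """Start one writer thread per entry in `keys`; return total retries."""
    totals = []

    def worker(key):
        totals.append(sum(increment(store, key) for _ in range(writes_per_thread)))

    threads = [threading.Thread(target=worker, args=(k,)) for k in keys]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(totals)
```

Calling `run_writers(store, ["a", "b", "c", "d"])` has every
thread write to its own document, so no write is ever retried.
Calling `run_writers(store, ["x", "x", "x", "x"])` still ends with
the right total, but every conflict along the way is a write that
was done for nothing and had to be redone.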

1.3 Is MongoDB more flexible than old cranky relational databases?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MongoDB gives developers the possibility to store data as
documents, that is, without any defined schema. In a collection,
documents can actually look very different. This means the shape
of the data doesn't have to be anticipated from the beginning,
and fields can be added later. However, as a result, there is no
equivalent of ALTER TABLE to change a collection as a whole in
one go: a multi-document update can apply the same simple change
everywhere, but as soon as the change depends on what each
document currently looks like, each record has to be fetched,
edited, and saved back in the database. On large datasets, this
can take hours. One should also ask whether their scenario really
requires going to production without a clear idea of what the
data looks like.
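
The fetch, edit, save cycle can be sketched in a few lines of
plain Python. This is only an illustration under assumptions: a
list of dictionaries stands in for a collection, and the field
names and the helpers `migrate_document` and `migrate_collection`
are hypothetical. The point is that the migration has to cope
with every shape a document may have taken over time.

```python
def migrate_document(doc):
    """Return a copy of `doc` in the new shape: a single `name` field.

    Older documents may carry `first_name`/`last_name`, or nothing at all.
    """
    new_doc = dict(doc)
    if "name" not in new_doc:
        first = new_doc.pop("first_name", "")
        last = new_doc.pop("last_name", "")
        new_doc["name"] = " ".join(part for part in (first, last) if part)
    return new_doc


def migrate_collection(collection):
    """Fetch, edit and save back every document, one at a time.

    Returns the number of documents that had to be rewritten.
    """
    rewritten = 0
    for index, doc in enumerate(collection):  # fetch
        new_doc = migrate_document(doc)       # edit
        if new_doc != doc:
            collection[index] = new_doc       # save back
            rewritten += 1
    return rewritten


# Four documents from the same "collection", in four different shapes.
people = [
    {"_id": 1, "first_name": "Ada", "last_name": "Lovelace"},
    {"_id": 2, "name": "Grace Hopper"},
    {"_id": 3, "first_name": "Linus"},
    {"_id": 4},
]
```

Running `migrate_collection(people)` walks the whole list, even
though one of the four documents already has the new shape. With
a real server, each step of the loop would be a round trip.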

1.4 Does MongoDB makes the life easier for application developers?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since MongoDB allows application developers to start working
without thinking about the schema or the data itself, things are
indeed easy. However, since the approach revolves only around
data storage and not data itself (shape, type, constraints,
etc.), developers have to write the validation part themselves,
on the application side. This means more code to write, maintain,
and debug.
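
As an idea of what that code looks like, here is a minimal sketch
of application-side validation in plain Python. The function
`validate_user` and its rules (required fields, types, ranges)
are assumptions made up for the example. In a relational database,
the same rules would be declared once, as column types and
constraints.

```python
def validate_user(doc):
    """Return a list of problems found in `doc`; an empty list means valid.

    The rules (required fields, types, ranges) are made up for the example.
    """
    errors = []

    email = doc.get("email")
    if not isinstance(email, str) or "@" not in email:
        errors.append("email: must be a string containing '@'")

    age = doc.get("age")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or not 0 <= age <= 150:
        errors.append("age: must be an integer between 0 and 150")

    # Nothing else stops a typo from becoming a brand new field.
    unknown = set(doc) - {"_id", "email", "age"}
    if unknown:
        errors.append("unexpected fields: " + ", ".join(sorted(unknown)))

    return errors
```

Every code path that writes a user document has to remember to
call `validate_user` first, and every other application sharing
the database needs its own copy of the rules.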

1.5 Is MongoDB the quickest engine with flexible, modern JSON data?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MongoDB makes JSON and binary JSON data its sole data format and
tries to be effective at storing and reading it back. However,
relational databases can also manage JSON and binary JSON data,
and can be pretty efficient with it. Actually, PostgreSQL
outperformed MongoDB in a series of benchmarks. Benchmarks being
benchmarks, one can consider that at the very least, it's not
clear which of MongoDB and PostgreSQL is the more
efficient. However, it is clear that with PostgreSQL, one also
gets to benefit from all the rest it has to offer: ACIDity, data
validation, indexes, etc.

1.6 To sum up
~~~~~~~~~~~~~

One should use MongoDB when they want to store a moderate
quantity of data they don't mind losing, don't know much about
and don't mind locking for basic maintenance tasks. It's stunning
how much the cutting edge of 2017 resembles that of 1997,
when startups were trying to circumvent the shortcomings of
MyISAM by cobbling together piles of Perl code, ending up with
half-working data processing pipelines.