Cassandra vs MongoDB in 2020 is an in-depth comparison of two of the most popular NoSQL databases, which look similar in functionality yet differ greatly under the hood. Both are often described as schema-less and promote a flexible data model, meaning that you can add entities and attributes on the go. And when it comes to distribution, both Cassandra and MongoDB are ready to take on the world by letting you place data in data centers anywhere in the world.
With so much similarity, how does one choose the right database for their application? Read on and see which one of these two databases is the right choice.
Cassandra vs MongoDB: Key differences
Cassandra and MongoDB are both NoSQL databases, a term sometimes expanded as "Not Only SQL". In practice, they are very different in architecture and access. They have very different strengths and value propositions, so any comparison has to be a nuanced one. Let us start with initial requirements and see what each one offers and lacks.
Here are the key differences between the two:
- Cassandra lets you set up multiple masters, whereas a MongoDB replica set allows only one master (the primary).
- Cassandra's data structure is table-like, whereas MongoDB stores JSON-like documents.
- Cassandra has its own query language, CQL, which resembles SQL, whereas MongoDB queries are written as JSON-like fragments.
- Cassandra is more rigid than MongoDB, so if you want greater flexibility, MongoDB is probably a better choice.
High Availability: MongoDB and Cassandra
High availability in MongoDB requires deploying replica sets. A replica set is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability and are the basis for production-ready systems. Maintaining copies of data in different data centers can increase data availability for distributed applications.
Cassandra implements a peer-to-peer model for its distributed database. Each node can serve both reads and writes, and no node is superior to any other. In other words, each node is a master and there are no slaves, whereas a MongoDB replica set has a single master and many slaves. This lets Cassandra provide continuous availability, with a consistency level that can be tuned per operation.
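As a rough illustration of the peer-to-peer idea, the sketch below models a tiny ring in which any node can coordinate a write, and the data lands on the same replicas no matter which node received the request. The class name, the MD5-based placement and the replication factor of 2 are assumptions made for this example; Cassandra's real partitioner and replication strategies are more involved.

```python
import hashlib


class Ring:
    """Toy peer-to-peer cluster: every node can coordinate reads and writes."""

    def __init__(self, node_names, replication_factor=2):
        self.nodes = sorted(node_names)
        self.rf = replication_factor
        # Each node keeps its own local key-value store.
        self.storage = {name: {} for name in self.nodes}

    def replicas_for(self, key):
        """Pick rf consecutive nodes, starting from a hash of the key."""
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.rf)]

    def write(self, coordinator, key, value):
        """Any known node may act as coordinator; it forwards to the replicas."""
        if coordinator not in self.storage:
            raise ValueError(f"unknown node: {coordinator}")
        for replica in self.replicas_for(key):
            self.storage[replica][key] = value

    def read(self, coordinator, key):
        """Any known node may serve a read by asking the replicas."""
        if coordinator not in self.storage:
            raise ValueError(f"unknown node: {coordinator}")
        for replica in self.replicas_for(key):
            if key in self.storage[replica]:
                return self.storage[replica][key]
        return None


ring = Ring(["node-a", "node-b", "node-c"])
ring.write("node-a", "user:1", "Ada")   # node-a coordinates the write
print(ring.read("node-c", "user:1"))    # node-c serves the read: Ada
```

No node in this model has a special role, which is the property the paragraph above describes.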
Write Speed – MongoDB vs Cassandra
Cassandra allows multiple masters to be set up. This provides continuous availability: when a master fails, another master is already available to take over. When it comes to writing data, Cassandra writes in parallel across its nodes. Cassandra also keeps a commit log, which is useful because a node that goes down replays the commit log when it comes back up and restores the writes that were held only in memory.
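The commit log idea can be sketched in a few lines. This is a toy model, not Cassandra's implementation: the `Node` class and its method names are invented for illustration, and a Python list stands in for the log file on disk.

```python
class Node:
    """Toy node with an in-memory table and an append-only commit log."""

    def __init__(self):
        self.commit_log = []   # stands in for durable storage on disk
        self.memtable = {}     # in-memory writes, lost on a crash

    def write(self, key, value):
        # Append to the log first, then apply the write in memory.
        self.commit_log.append((key, value))
        self.memtable[key] = value

    def crash(self):
        # Memory is wiped; the commit log survives.
        self.memtable = {}

    def restart(self):
        # Replay the log in order to rebuild the in-memory state.
        for key, value in self.commit_log:
            self.memtable[key] = value


node = Node()
node.write("sensor-1", 20.5)
node.write("sensor-1", 21.0)
node.write("sensor-2", 18.2)

node.crash()
print(node.memtable)   # {}

node.restart()
print(node.memtable)   # {'sensor-1': 21.0, 'sensor-2': 18.2}
```

Because the log is replayed in write order, the latest value for each key wins after the restart.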
With MongoDB, there can be only one master per replica set. When the master fails, one of the slaves is elected as the new master. Writes are accepted only by the primary, while secondary servers replicate its data and can serve reads. As a result, the write capacity of a replica set is limited by what its single primary can handle, no matter how many secondaries you add.
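The single-primary behaviour can be pictured with a small simulation. The `ReplicaSet` class below is hypothetical and heavily simplified: replication is instantaneous, and the "election" simply promotes the first remaining member, whereas MongoDB runs a real election protocol among the surviving members.

```python
class ReplicaSet:
    """Toy replica set: one primary accepts writes, secondaries copy them."""

    def __init__(self, members):
        self.members = list(members)
        self.primary = self.members[0]
        self.data = {name: [] for name in self.members}

    def write(self, target, doc):
        if target != self.primary:
            raise RuntimeError(f"{target} is not the primary")
        # The primary applies the write, then every live member receives it.
        for name in self.members:
            self.data[name].append(doc)

    def fail(self, name):
        self.members.remove(name)
        if name == self.primary:
            # Simplification: promote the first remaining member.
            self.primary = self.members[0] if self.members else None


rs = ReplicaSet(["mongo-1", "mongo-2", "mongo-3"])
rs.write("mongo-1", {"order": 1})

try:
    rs.write("mongo-2", {"order": 2})   # a secondary rejects writes
except RuntimeError as err:
    print(err)                          # mongo-2 is not the primary

rs.fail("mongo-1")                      # the primary goes down
print(rs.primary)                       # mongo-2
rs.write("mongo-2", {"order": 2})       # the new primary accepts writes
print(rs.data["mongo-3"])               # [{'order': 1}, {'order': 2}]
```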
Query Language difference in MongoDB vs Cassandra
The query language used in Cassandra is CQL, which resembles SQL. With CQL, you can create keyspaces and tables, insert rows and query tables. With relational databases, you normally have both a data definition language (DDL) and a query language. The DDL is used to maintain the database and provides commands to create and alter tables, create indexes and replace objects. CQL combines DDL with a query language, so a single language creates, maintains and queries Cassandra's tables. With CQL 3, you can even query data as JSON. You can also create user-defined functions using CQL.
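To give a feel for how CQL mixes DDL and queries, here is a small Python snippet that holds a few CQL statements as strings. The keyspace, table and index names are made up for this example, and `SELECT JSON` assumes Cassandra 2.2 or later. The snippet only prints the statements; the trailing comment sketches how they might be sent to a cluster with the DataStax Python driver.

```python
# Hypothetical keyspace, table and index names, used only for illustration.
statements = [
    # DDL: create a keyspace and a table.
    "CREATE KEYSPACE IF NOT EXISTS shop "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};",
    "CREATE TABLE IF NOT EXISTS shop.users "
    "(user_id uuid PRIMARY KEY, name text, email text);",
    # DDL: add a secondary index on a non-primary-key column.
    "CREATE INDEX IF NOT EXISTS users_email_idx ON shop.users (email);",
    # Data manipulation: insert a row.
    "INSERT INTO shop.users (user_id, name, email) "
    "VALUES (uuid(), 'Ada', 'ada@example.com');",
    # Query: return each row as a JSON string.
    "SELECT JSON name, email FROM shop.users;",
]

for statement in statements:
    print(statement)

# To run these against a real cluster with the DataStax Python driver
# (pip install cassandra-driver), something like this should work:
#   from cassandra.cluster import Cluster
#   session = Cluster(["127.0.0.1"]).connect()
#   for statement in statements:
#       session.execute(statement)
```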
MongoDB does not use an SQL-like query language. Instead, queries are expressed as JSON-like documents (fragments) that describe the fields and conditions to match.
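To show what such a query fragment looks like, the sketch below evaluates a small subset of MongoDB-style filters against plain Python dictionaries. The `matches` helper is a toy written for this article, not part of MongoDB or its drivers, and it supports only equality, `$gt`, `$lt` and `$in`.

```python
def matches(doc, query):
    """Return True if doc satisfies a small subset of MongoDB-style filters."""
    # Only these operators are supported; anything else raises KeyError.
    operators = {
        "$gt": lambda field, arg: field is not None and field > arg,
        "$lt": lambda field, arg: field is not None and field < arg,
        "$in": lambda field, arg: field in arg,
    }
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$gt": 30}}
            for op, arg in condition.items():
                if not operators[op](value, arg):
                    return False
        elif value != condition:
            # Plain equality, e.g. {"city": "London"}
            return False
    return True


users = [
    {"name": "Ada", "age": 36, "city": "London"},
    {"name": "Linus", "age": 28, "city": "Helsinki"},
    {"name": "Grace", "age": 45, "city": "London"},
]

query = {"city": "London", "age": {"$gt": 40}}
print([u["name"] for u in users if matches(u, query)])   # ['Grace']
```

With the PyMongo driver, a dictionary shaped like `query` would be passed to `collection.find(query)` and evaluated by the server instead.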
Indexes
An index in Cassandra allows access to data through fields that are not part of the primary key. The benefit is fast and efficient searching for rows whose fields match a condition. Cassandra keeps index data in a separate, hidden table; a built-in secondary index covers a single column, and a table may have several such indexes.
An index in MongoDB is a data structure that stores a small part of a collection's data in a form that is easy to traverse, which speeds up searches. Using an index avoids a full collection scan, which is simply a linear search through every document. Take care when choosing fields for indexing: whenever an indexed field is updated, every index that contains that field must be updated too.
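The trade-off between faster lookups and extra work on updates can be sketched as follows. This is a simplified stand-in: a Python dictionary plays the role of the index, and the `FieldIndex` class and `full_scan` function are invented for this example rather than taken from either database.

```python
from collections import defaultdict


def full_scan(docs, field, value):
    """Linear search: every document is inspected."""
    return [d for d in docs if d.get(field) == value]


class FieldIndex:
    """Toy single-field index: maps each value to document positions."""

    def __init__(self, docs, field):
        self.docs = docs
        self.field = field
        self.entries = defaultdict(set)
        for pos, doc in enumerate(docs):
            self.entries[doc.get(field)].add(pos)

    def find(self, value):
        # Jump straight to the matching positions, no scan needed.
        return [self.docs[p] for p in sorted(self.entries.get(value, ()))]

    def update(self, pos, new_value):
        # Changing an indexed field means the index must change too.
        old_value = self.docs[pos].get(self.field)
        self.entries[old_value].discard(pos)
        self.docs[pos][self.field] = new_value
        self.entries[new_value].add(pos)


photos = [
    {"id": 1, "tag": "beach"},
    {"id": 2, "tag": "city"},
    {"id": 3, "tag": "beach"},
]
index = FieldIndex(photos, "tag")
print([p["id"] for p in index.find("beach")])   # [1, 3]

index.update(1, "beach")                        # retag photo 2
print([p["id"] for p in index.find("beach")])   # [1, 2, 3]
print(index.find("city"))                       # []
```

Every additional index on a field adds one more structure that `update` would have to maintain, which is why the choice of indexed fields matters.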
Deployment – MongoDB vs Cassandra
When the development effort is complete, it is time to deploy your application and database. With MongoDB, it starts with deciding whether to host it yourself or use a cloud host. If you choose the latter, numerous cloud hosts offer a pre-installed MongoDB database. ClusterControl, RackSpace, ObjectControl and MongoStitch are all great choices. All provide the fully inclusive services and tools needed to support an enterprise-level application and database.
When planning for a production-ready environment, make sure there is sufficient RAM. MongoDB needs plenty of RAM to hold frequently used collection data and indexes. Without enough memory, performance suffers and searches slow to a crawl.
Sharding is the process of partitioning data across multiple servers. It is not recommended to shard a new MongoDB database until its data sets become very large. Data is balanced across shards automatically.
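A minimal way to picture partitioning is to hash each key and use the result to pick a shard. The sketch below does only that; the `shard_for` function, the SHA-256 hash and the three-shard setup are assumptions for illustration. It does not model MongoDB's own hashing or its balancer, which moves data between shards.

```python
import hashlib
from collections import Counter


def shard_for(key, shard_count):
    """Map a key to a shard number in the range 0..shard_count-1."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


# Distribute 1,000 hypothetical user keys over 3 shards.
keys = [f"user-{i}" for i in range(1000)]
placement = Counter(shard_for(key, 3) for key in keys)

for shard, count in sorted(placement.items()):
    print(f"shard {shard}: {count} keys")
```

The same key always maps to the same shard, so a lookup by key only needs to contact one server.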
Cassandra is a highly available, highly performant NoSQL database, but deploying it can be quite tricky. Before deploying Cassandra, it is vital to understand how it works from an architectural point of view. Deployment requires installing Cassandra on each node, naming the cluster, getting the IP address of each node, deciding which nodes are seed nodes, choosing the snitch and replication strategy, and settling on a naming scheme for the racks (keep it simple, such as RAC1 and RAC2).
Other details involve opening the required ports on each node so that the nodes can communicate with one another. Each node's configuration file must be filled in with appropriate values such as the cluster name, the listen address and the RPC address, to name just a few.
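To make the per-node settings concrete, the sketch below generates them for a hypothetical three-node cluster. The IP addresses, cluster name and rack labels are invented. The keys `cluster_name`, `listen_address`, `rpc_address` and `endpoint_snitch` mirror option names in cassandra.yaml; in a real installation the seed list is nested under `seed_provider`, and the rack is set in cassandra-rackdc.properties rather than in the same file.

```python
def node_settings(cluster_name, node_ip, seed_ips, rack):
    """Build the settings one node would need (flattened for readability)."""
    return {
        "cluster_name": cluster_name,      # must be identical on every node
        "listen_address": node_ip,         # address other nodes connect to
        "rpc_address": node_ip,            # address clients connect to
        "seeds": ",".join(seed_ips),       # same seed list on every node
        "endpoint_snitch": "GossipingPropertyFileSnitch",
        "rack": rack,                      # e.g. RAC1, RAC2
    }


# Hypothetical cluster layout.
nodes = {"10.0.0.1": "RAC1", "10.0.0.2": "RAC1", "10.0.0.3": "RAC2"}
seeds = ["10.0.0.1", "10.0.0.3"]

all_settings = [
    node_settings("demo_cluster", ip, seeds, rack) for ip, rack in nodes.items()
]

for settings in all_settings:
    print(settings["listen_address"], settings["rack"], settings["seeds"])
```

Generating the values from one place helps keep the cluster name and seed list identical across nodes, which is what lets the nodes find one another.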
Use Cases: MongoDB vs Cassandra
At the time of writing, over 38,000 companies use MongoDB as their preferred database. The following use cases are only a sample of how companies use MongoDB or Cassandra.
Netflix processes over 10 million transactions per second. After failing to deliver constant uptime to its subscribers with Oracle, Netflix moved its data to Cassandra. Since switching, Netflix has never been down. Today, Netflix stores all of its non-program data, such as movie ratings, bookmarks and viewing histories, in Cassandra.
Reddit is currently switching from Postgres to Cassandra after scaling to a billion-user community. Reddit uses a key-value store to maintain threads and discussions, and those data stores are still in Postgres.
Lyft used MongoDB to serve its world-class app, but in early 2019 the company began moving away from it. Before then, Lyft loved MongoDB, which gave the app fast, responsive access to its massive data set.
Shutterfly maintains a massive database of 6 billion images and serves almost 10,000 operations per second. It uses MongoDB to store all of its images in a JSON data structure, indexed by tags, photo categories and captions.