MongoDB - Some Thoughts on Sharding

Post date: Jan 19, 2014 9:31:36 AM

Choosing a Sharding Key

It is vital to choose the right sharding key from the beginning, because once chosen it cannot be changed anymore. Since the sharding principle of MongoDB is to assign distinct, non-overlapping ranges of data to each shard, it is important to pick a key that optimally reflects the data growth trends and the associated access patterns of your specific application. The driving thought behind this important decision is to avoid the creation of hot spots within any single shard.

What to consider as a shard key: preferably a compound key based on a coarsely ascending field and a highly searchable field in your application. The searchable field should have decent cardinality. A good example would be a combination of month and user ID.
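
As a rough illustration, sharding a collection on such a compound key could look like the sketch below, using the Python driver (pymongo); the database, collection and host names are made up for the example.

    from pymongo import MongoClient

    # Connect to a mongos router (hostname is hypothetical).
    client = MongoClient("mongodb://mongos1.example.com:27017")

    # Enable sharding for the database, then shard the collection on a
    # compound key: a coarsely ascending month plus a searchable user id.
    client.admin.command("enableSharding", "appdb")
    client.admin.command("shardCollection", "appdb.events",
                         key={"month": 1, "user_id": 1})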

What to avoid as a shard key: low-cardinality fields (those having a small set of possible values), strictly ascending fields (such as classical database sequences or timestamps) and random keys. The rationale behind this recommendation is the potential of such sharding keys to create hot spots, but there might be cases where you can control the amount of data and such keys still make sense.

Are There Any Limits?

When sharding an existing collection that already contains data, you first need to create an index on the sharding key. This index is created automatically only if the collection that you are sharding is new (empty); for a populated collection you have to build it yourself before sharding.
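
A minimal sketch of preparing such an existing collection, reusing the hypothetical names from the earlier example:

    from pymongo import MongoClient, ASCENDING

    client = MongoClient("mongodb://mongos1.example.com:27017")

    # For a collection that already holds data, the shard-key index has to be
    # built before the shardCollection command is issued.
    client.appdb.events.create_index([("month", ASCENDING),
                                      ("user_id", ASCENDING)])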

If your existing collection holds more than 256 GB (sometimes up to 400 GB), the initial sharding process might leave you with a single chunk at first. Only as more documents are added will multiple chunks be created and then migrated, and this will take a while. The exact boundaries for these collection sizes are a function of the document and chunk sizes, respectively.

Scaling Out

A shard represents a collection of servers (a replica set). Scaling a single shard is done at the replica set level, where the number of servers can be increased or decreased as usual for MongoDB replica sets.
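
Adding a whole new shard (a replica set) to the cluster can be sketched as follows; the replica set and host names are hypothetical.

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongos1.example.com:27017")

    # Register a new replica set as an additional shard.
    client.admin.command("addShard",
                         "rs1/node1.example.com:27017,node2.example.com:27017")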

Config servers are best run as a group of three in order to guarantee better availability and failover.

Performance Considerations

Sharding an existing collection involves loading the shard-key index into memory and inspecting the entire collection to be sharded. This puts a strain on resources (CPU and memory), which means that such an operation should be run when there is less activity on the MongoDB cluster.

Balancing shards might put a strain on your RAM, so it is advisable to add shards early and often, continuously, and not when you are already approaching RAM limits. Normally, balancing triggered by shard addition or removal should not affect regular traffic if it is done in a timely manner, on a regular basis. RAM limitations will cause disk seeks on both the application and the balancing process sides, which might increase response times.
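
On MongoDB versions of this era, one rough way to see whether the balancer is currently active is to look at the balancer lock in the config database; a sketch, with the usual caveat that the layout of the config database differs between versions:

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongos1.example.com:27017")

    # The balancer takes a distributed lock named "balancer" while it runs.
    lock = client.config.locks.find_one({"_id": "balancer"})
    print("balancer active:", bool(lock and lock.get("state", 0) > 0))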

Sharding collections of small documents is more expensive and slower than sharding collections of large documents.

Backup & Restore

Taking backups of individual shards and restoring them in the event of specific shard failures might cause missing or duplicate information because of ongoing migrations that might have taken place while the backup was running. To control the process, it is advisable to check the config servers and the associated chunk collections in order to learn the association between shards and their corresponding chunks.
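
The chunk-to-shard association can be read straight from the config database; a sketch, assuming the collection from the earlier examples (field layout varies between MongoDB versions):

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongos1.example.com:27017")

    # Each document in config.chunks describes one chunk: its key range and
    # the shard that currently owns it.
    for chunk in client.config.chunks.find({"ns": "appdb.events"}):
        print(chunk["shard"], chunk["min"], chunk["max"])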

If the whole cluster is to be backed up, the balancer has to be turned off in order to avoid chunk migration, and then an fsync with blocking and snapshotting of the slaves needs to be performed. Restoring from the whole-cluster backup involves unblocking the slaves and restarting the balancer.
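
A rough sketch of that sequence, stopping the balancer through the config.settings document (the approach used on MongoDB versions of this era; newer releases also offer balancerStop/balancerStart commands) and fsync-locking a secondary before the snapshot; host names are hypothetical:

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongos1.example.com:27017")

    # Stop the balancer so no chunk migrations happen during the backup.
    client.config.settings.update_one({"_id": "balancer"},
                                      {"$set": {"stopped": True}}, upsert=True)

    # Flush and lock a secondary (slave) before taking its snapshot.
    secondary = MongoClient("mongodb://shard1-secondary.example.com:27017")
    secondary.admin.command("fsync", lock=True)

    # ... take the filesystem snapshot of the secondary here ...

    # Unlock the secondary (fsyncUnlock on newer servers) and re-enable
    # the balancer afterwards.
    secondary.admin.command("fsyncUnlock")
    client.config.settings.update_one({"_id": "balancer"},
                                      {"$set": {"stopped": False}})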

Another important concern for backup and restore is related to the config servers, which should always be run in a redundant configuration; their backups are usually performed quickly since they do not store large amounts of data.

Some Aspects of Interacting With a Cluster

Counting documents in a sharded collection might not lead to accurate results if this operation is issued during migration of chunks from one shard to another.

Uniqueness can be guaranteed only if all shards are locked when writing, and as a consequence only the shard key can ensure uniqueness. Updating single documents is subject to the same restriction: the sharding key needs to be supplied in order to identify the document that is to be updated.
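
For example, with the compound key from earlier, an update would include both shard key fields in the query so that mongos can route it to a single shard (all values are made up):

    from pymongo import MongoClient

    client = MongoClient("mongodb://mongos1.example.com:27017")

    # The query part carries the full shard key, so mongos can target one shard.
    client.appdb.events.update_one(
        {"month": "2014-01", "user_id": 12345},
        {"$set": {"status": "processed"}})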

Map-Reduce jobs usually run faster than on a single server, but they are not meant for real-time computations.

Asynchronous communication is advisable on both the input and the output streams of a MongoDB cluster, both to smooth traffic bursts and to handle planned or unplanned outages. Amazon SQS and even a MongoDB capped collection (though stored on another MongoDB instance than the cluster it protects) are recommended solutions.
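
A capped collection used as such a buffer might be set up along these lines (names and sizes are illustrative), on a MongoDB instance deliberately separate from the cluster it protects:

    from pymongo import MongoClient

    # A standalone instance that is not part of the sharded cluster.
    buffer_client = MongoClient("mongodb://buffer.example.com:27017")
    bufdb = buffer_client.bufferdb

    # Capped collections have a fixed size and overwrite the oldest entries,
    # which makes them a simple, bounded write buffer.
    if "incoming" not in bufdb.list_collection_names():
        bufdb.create_collection("incoming", capped=True, size=512 * 1024 * 1024)

    bufdb.incoming.insert_one({"user_id": 12345, "action": "signup"})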

Handling Failover

Unavailability of a whole shard causes errors when reading or writing. Handle them gracefully by either returning partial results or performing targeted queries against the shards that are still available. In order to minimize the probability of losing entire shards, set up each shard's replica set in an intelligent manner with failover in mind.

If one config server goes down, even in a redundant setup, the cluster configuration cannot be changed. Although the impact of such an event on normal cluster operation is usually not significant, no configuration changes can be performed at the cluster level while that server is down, and if chunk migrations take place the associated overhead might be noticeable because of the repeated attempts to copy chunks from one shard to another.

The stateless mongos process represents the communication bridge between the application level and a sharded MongoDB cluster. It is advisable to set up redundant mongos processes in order to increase availability and prepare for graceful failover, and, at the application level, to specify a list of mongos instances to connect to so that the language-specific MongoDB driver has multiple chances to handle this failover.
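
With pymongo, for instance, several mongos hosts can be listed in the connection string and the driver will fail over between them (host names are hypothetical):

    from pymongo import MongoClient

    # Listing multiple mongos routers lets the driver switch to another one
    # if the current router becomes unavailable.
    client = MongoClient(
        "mongodb://mongos1.example.com:27017,mongos2.example.com:27017")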