Optimizing MongoDB: Lessons Learned at Localytics

This article is about how to optimize MongoDB. I don’t fully understand what it says because I am not familiar with MongoDB, but it shows that optimization inevitably involves understanding how the underlying abstraction layer is implemented. That is hard to learn, but it is rewarding.

Document Design

In SQL terms, this is table design. The main idea is to reduce document size, because MongoDB stores every field name inside every document. The article suggests shortening field names, storing ids as binary instead of strings, and removing unnecessary indexes. The downside is that the documents become difficult to understand, so we need better documentation and a layer that translates between the short stored names and readable ones.
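A minimal sketch of such a translation layer, in plain Python without a MongoDB driver. The field names and the `FIELD_MAP` table are hypothetical; an application would call `to_storage` before writing a document and `from_storage` after reading one. `id_sizes` compares the 36-character string form of a UUID with its 16-byte binary form, which is the kind of saving the binary-id advice is after.

```python
import uuid

# Hypothetical mapping from readable field names to the short names actually stored.
FIELD_MAP = {"device_id": "d", "event_name": "e", "timestamp": "t"}
REVERSE_MAP = {short: full for full, short in FIELD_MAP.items()}


def to_storage(doc: dict) -> dict:
    """Rename readable fields to their short stored names (unknown fields pass through)."""
    return {FIELD_MAP.get(k, k): v for k, v in doc.items()}


def from_storage(doc: dict) -> dict:
    """Rename short stored fields back to readable names (unknown fields pass through)."""
    return {REVERSE_MAP.get(k, k): v for k, v in doc.items()}


def id_sizes(u: uuid.UUID) -> tuple[int, int]:
    """Return (length of the string form, length of the binary form) of an id."""
    return len(str(u)), len(u.bytes)


if __name__ == "__main__":
    readable = {"device_id": "abc123", "event_name": "open", "timestamp": 1300000000}
    stored = to_storage(readable)
    print(stored)                            # {'d': 'abc123', 'e': 'open', 't': 1300000000}
    print(from_storage(stored) == readable)  # True
    print(id_sizes(uuid.uuid4()))            # (36, 16)
```

Because every readable-to-short name pair lives in one table, that table also doubles as the documentation the short names need.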

Index tricks

The idea is to reduce index size by partitioning the index. This is useful when a field does not have many possible values. A similar technique can be applied in MySQL. It is a trade-off between the space required for the index and the cardinality of the index.
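The notes do not say exactly how the article partitions its indexes, so the following is only a conceptual sketch in plain Python, not MongoDB's index implementation. It assumes a hypothetical low-cardinality field such as a platform name: instead of one index whose entries all repeat the platform, each platform value gets its own small sorted index, and a query only searches the partition it needs.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class PartitionedIndex:
    """Conceptual sketch: one small sorted index per value of a low-cardinality field.

    A single index on (platform, timestamp) repeats the platform in every entry;
    here the platform selects a partition and each entry stores only (timestamp, doc_id).
    """

    def __init__(self) -> None:
        # partition value -> sorted list of (key, doc_id) pairs
        self._parts: dict[str, list[tuple[int, str]]] = defaultdict(list)

    def insert(self, part: str, key: int, doc_id: str) -> None:
        insort(self._parts[part], (key, doc_id))

    def find_range(self, part: str, lo: int, hi: int) -> list[str]:
        """Return doc ids in partition `part` whose key is in [lo, hi)."""
        entries = self._parts.get(part, [])  # .get does not create empty partitions
        # (lo,) sorts before every (lo, doc_id), so bisect_left finds the first key >= lo
        start = bisect_left(entries, (lo,))
        end = bisect_left(entries, (hi,))
        return [doc_id for _, doc_id in entries[start:end]]

    def partition_sizes(self) -> dict[str, int]:
        return {part: len(entries) for part, entries in self._parts.items()}
```

The fewer distinct values the partitioning field has, the fewer partitions there are to manage, which matches the note that the trick suits fields without many possible values.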

Fragmentation

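It is similar to fragmentation in a file system. When we delete a document, it leaves a ‘hole’ on disk. A file system may break a new file into small pieces to fill such holes; MongoDB instead stores each document contiguously, so a hole can only be reused by a document that fits into it. The remaining gaps spread the data over more disk pages, so a read costs more disk operations. MongoDB provides a repair command that rewrites the data files without the holes, but it is very slow.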
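To make the ‘hole’ idea concrete, here is a toy allocator in Python. It is an illustration under simplifying assumptions, not MongoDB's real record allocator: records are stored contiguously, deletes put their space on a free list, an insert takes the first hole that is big enough (otherwise it appends to the end of the file), and the hypothetical `compact` method plays the role of the repair command by rewriting live records back-to-back.

```python
class Extent:
    """Toy storage file: records are contiguous, and deletes leave holes on a free list."""

    def __init__(self) -> None:
        self.end = 0                                    # first unused offset at the end of the file
        self.free: list[tuple[int, int]] = []           # (offset, size) holes left by deletes
        self.records: dict[str, tuple[int, int]] = {}   # doc_id -> (offset, size)

    def insert(self, doc_id: str, size: int) -> int:
        """Place a record in the first hole that fits, else append it; return its offset."""
        for i, (off, hole) in enumerate(self.free):
            if hole >= size:
                self.free.pop(i)
                if hole > size:                         # the unused tail stays behind as a smaller hole
                    self.free.append((off + size, hole - size))
                self.records[doc_id] = (off, size)
                return off
        off = self.end                                  # no hole is big enough
        self.end += size
        self.records[doc_id] = (off, size)
        return off

    def delete(self, doc_id: str) -> None:
        self.free.append(self.records.pop(doc_id))

    def wasted(self) -> int:
        """Bytes sitting in holes: space the file occupies but no live record uses."""
        return sum(size for _, size in self.free)

    def compact(self) -> None:
        """Rewrite live records back-to-back, as a repair/compaction pass would."""
        off = 0
        for doc_id, (_, size) in sorted(self.records.items(), key=lambda kv: kv[1][0]):
            self.records[doc_id] = (off, size)
            off += size
        self.end = off
        self.free = []


if __name__ == "__main__":
    ext = Extent()
    for name in ("a", "b", "c"):
        ext.insert(name, 100)
    ext.delete("b")                 # leaves a 100-byte hole at offset 100
    print(ext.insert("d", 150))     # 300: too big for the hole, appended instead
    print(ext.insert("e", 60))      # 100: fits in the hole, 40 bytes stay wasted
    print(ext.wasted())             # 40
    ext.compact()
    print(ext.end, ext.wasted())    # 410 0
```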

Chunk migration

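Chunks are a logical construct in MongoDB: the documents in one chunk are not necessarily stored in the same place on disk. MongoDB balances data by migrating chunks between shards, so a migration may have to read documents scattered across the disk, which is slow.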
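A chunk can be pictured as a key range with an owner. This toy routing table in Python, with hypothetical class and shard names, shows that ownership is just a mapping from key ranges to shards, independent of where the documents physically sit on disk. It is not how MongoDB actually stores chunk metadata.

```python
from bisect import bisect_right


class ChunkTable:
    """Toy routing table: chunk i covers keys in [lower[i], lower[i + 1]) and lives on shard[i]."""

    def __init__(self, lower_bounds: list[int], shards: list[str]) -> None:
        if list(lower_bounds) != sorted(lower_bounds) or len(lower_bounds) != len(shards):
            raise ValueError("bounds must be sorted and match the number of shards")
        self.lower = list(lower_bounds)
        self.shard = list(shards)

    def chunk_for(self, key: int) -> int:
        i = bisect_right(self.lower, key) - 1
        if i < 0:
            raise KeyError(f"{key} is below the first chunk boundary")
        return i

    def route(self, key: int) -> str:
        return self.shard[self.chunk_for(key)]

    def migrate(self, chunk: int, target: str) -> None:
        """Flip ownership of a chunk. A real migration must first copy every
        document in the range, and those documents may be scattered on disk."""
        self.shard[chunk] = target
```

Flipping ownership is cheap; the expensive part, which this sketch deliberately leaves out, is copying the documents, and that is where scattered storage hurts.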

Using better shard keys can avoid bad migrations. If we know our key distribution, we can pre-create chunks and assign them to shards in advance. One suggestion is to include time in the shard key, but beware of write hotspots.
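A sketch of pre-creating chunks, assuming we have a representative sample of keys: choose boundaries at evenly spaced quantiles so each chunk should receive a similar share of documents, then spread the chunks over the shards. `shard_key` is a hypothetical compound key that includes time (the event hour) but leads with an app id, so concurrent writes do not all land in the newest time range. This is plain Python, not a MongoDB command.

```python
from datetime import datetime


def split_points(sample_keys: list, num_chunks: int) -> list:
    """Choose num_chunks - 1 boundaries at evenly spaced quantiles of a key sample."""
    keys = sorted(sample_keys)
    return [keys[len(keys) * i // num_chunks] for i in range(1, num_chunks)]


def assign_round_robin(num_chunks: int, shards: list[str]) -> list[str]:
    """Spread the pre-created chunks evenly over the shards."""
    return [shards[i % len(shards)] for i in range(num_chunks)]


def shard_key(app_id: str, ts: datetime) -> tuple[str, str]:
    """Hypothetical compound key: app id first, then the hour of the event.

    A key that starts with the time sends every new write to the chunk holding the
    latest range (a write hotspot); leading with app_id spreads concurrent writes.
    """
    return (app_id, ts.strftime("%Y%m%d%H"))


if __name__ == "__main__":
    print(split_points(list(range(100)), 4))        # [25, 50, 75]
    print(assign_round_robin(4, ["s1", "s2"]))      # ['s1', 's2', 's1', 's2']
    print(shard_key("app1", datetime(2011, 5, 1, 13, 45)))  # ('app1', '2011050113')
```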

Hardware optimization

Performance is very poor when the data is not in RAM.

  • Do a read before an update to warm the document into RAM
    • Reads can run in parallel, but an update holds the write lock, so loading the document from disk during the update blocks other writes
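A toy model of why the read-first trick helps, in plain Python with hypothetical names: `update` holds a single write lock, and any document it has to pull from the simulated disk while holding that lock stalls every other writer. Calling `warm` first loads the documents through lock-free reads, so the later updates find them in RAM. Python dicts stand in for disk and RAM; nothing here talks to MongoDB.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyStore:
    """Toy model of a store whose updates hold one global write lock."""

    def __init__(self, disk: dict) -> None:
        self.disk = dict(disk)        # simulated slow storage
        self.ram: dict = {}           # simulated in-memory cache
        self.write_lock = threading.Lock()
        self.faults_under_lock = 0    # disk loads that happened while holding the write lock

    def _load(self, key) -> bool:
        """Bring a document into RAM; return True if it had to come from 'disk'."""
        if key in self.ram:
            return False
        self.ram[key] = self.disk[key]   # stands in for a slow disk read
        return True

    def read(self, key):
        """Reads take no write lock, so many of them can run at once."""
        self._load(key)
        return self.ram[key]

    def warm(self, keys) -> None:
        """Read the documents an upcoming batch of updates will touch, in parallel."""
        with ThreadPoolExecutor(max_workers=4) as pool:
            list(pool.map(self.read, keys))

    def update(self, key, value) -> None:
        with self.write_lock:
            if self._load(key):          # a cache miss here stalls every other writer
                self.faults_under_lock += 1
            self.ram[key] = value
            self.disk[key] = value


if __name__ == "__main__":
    cold = ToyStore({"a": 1, "b": 2})
    cold.update("a", 10)
    cold.update("b", 20)
    print(cold.faults_under_lock)      # 2: both loads happened under the lock

    warmed = ToyStore({"a": 1, "b": 2})
    warmed.warm(["a", "b"])
    warmed.update("a", 10)
    warmed.update("b", 20)
    print(warmed.faults_under_lock)    # 0: the reads already brought the documents into RAM
```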

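Use one shard per core instead of one shard per server

  • Works around the write lock when writes per second matter
    • MongoDB used one big write lock
      The article was written in 2011, so I am not sure this technique is still valid. MongoDB’s write locking has changed a lot since then; for example, the WiredTiger storage engine, the default since MongoDB 3.2, uses document-level concurrency control.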
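A toy Python model of the shard-per-core idea, with hypothetical names: one store and one lock per CPU core, with keys routed by a stable hash, so writers only wait for each other when they hit the same shard. It illustrates the lock structure only; it does not start real `mongod` processes, and because of Python's global interpreter lock the threads will not show a real speedup.

```python
import os
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


class PerCoreShards:
    """Toy model: one store and one write lock per CPU core instead of one global lock."""

    def __init__(self, num_shards: int = 0) -> None:
        n = num_shards or os.cpu_count() or 1      # default to one shard per core
        self.locks = [threading.Lock() for _ in range(n)]
        self.stores: list[dict] = [{} for _ in range(n)]

    def shard_of(self, key: str) -> int:
        # crc32 gives the same answer on every run, unlike Python's salted str hash
        return zlib.crc32(key.encode()) % len(self.stores)

    def write(self, key: str, value) -> None:
        i = self.shard_of(key)
        with self.locks[i]:           # only writers that hit the same shard wait on each other
            self.stores[i][key] = value

    def read(self, key: str):
        return self.stores[self.shard_of(key)].get(key)


if __name__ == "__main__":
    shards = PerCoreShards()
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda i: shards.write(f"k{i}", i), range(1000)))
    print([len(s) for s in shards.stores])   # roughly equal counts per shard
```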

Conclusion

When we need to optimize:

  1. Get a basic understanding of how the abstraction layers are implemented
  2. Search for related resources on optimization
  3. If we cannot understand why an optimization works, our understanding of the abstraction layer is not deep enough; do more research on its implementation
  4. After gaining enough context and knowledge of the abstraction layers, try applying the optimization techniques
  5. Review the performance after optimizing, and go back to step 3 if the result is not good enough

Reference

Optimizing MongoDB: Lessons Learned at Localytics