Project Proposal: Discord’s Storage Handling
Last Updated: November 23rd, 2023
Overview:
Today Discord is one of the most popular and fastest-growing messaging and VoIP platforms on the internet, with over 500 million registered users. An estimated 850 billion messages are sent on the platform daily. These messages range from a handful of private DMs between users to massive servers with dozens of text channels receiving tens of thousands of messages a day. This is a remarkable scale of growth since its release just 8 years ago, far beyond what the initial versions of Discord were built to handle. To support this scale, the platform has undergone multiple mass migrations of its message storage and overall architecture changes since its release in 2015. All of these designs and changes were independently designed and documented by Discord as it navigated the intricacies of data management, learning to perform at a scale comparable to some of the largest data companies in the world while effectively addressing its own specific needs.
Topics:
The goal of this research project is to explore the history of Discord's storage handling, past and present, and to deep-dive into the underlying complexities of databases at scale. I plan to cover its original and past implementations of data storage, most notably MongoDB and Cassandra, and its current architecture (ScyllaDB), the differences between these solutions, and why these database migrations were necessary. I will also cover mass data migration between designs, which allowed Discord to efficiently transition storage for trillions of messages with virtually no downtime. Finally, I will analyze the complex architectural design used by Discord today to better understand how it allows Discord to achieve its immense storage and performance capabilities.
Deliverables:
This proposal was resubmitted on October 29th. As a result, many of the early deliverable due dates have been condensed to make up for lost time and keep the Project Update on schedule for November 6th. All progress updates will be available in the Project Updates tab of the website. A schedule of estimated dates for upcoming deliverables is available in the Project Schedule tab.