[DEC 9] Project Report Released
The Project Report is posted and can be found in the Project Report tab on the website.
[DEC 4] Project Demo Released
The Project Demo is posted and can be found in the Project Demo tab on the website.
[NOV 20] Bi-weekly Project Update
After my mid-term project update, I received feedback to strengthen the relevance of my topic by scoping it more toward multimedia systems. I researched how Discord stores different media files but did not find the answers I was looking for. However, between the limited findings of that research and some investigation of the application itself, I believe I now have a reasonably good understanding of how Discord handles its multimedia storage.
Discord is a known Google Cloud customer, as shown by this case study and by the response from its attachment endpoint. Discord also stated in its "How Discord Stores Billions of Messages" blog post that one of the database requirements that led it to choose Cassandra was that it not be a blob store. It therefore seems reasonable to assume that Discord does not store images directly as blobs in its message database. Furthermore, when examining images and messages within the application, you can copy and view a link to both messages and multimedia files. Links to standard messages have the following format:
discord.com/channels/<STRING OF NUMBERS #1>/<STRING OF NUMBERS #2>
On inspection, these numbers appear to correspond to the two known parts of the key Discord uses to look up messages: the channel ID and the message ID. Attachments such as images and videos, however, have the following format:
cdn.discordapp.com/attachments/<STRING OF NUMBERS #1>/<STRING OF NUMBERS #2>/<FILE NAME>
Unlike messages, attachments are served through a CDN, in this case Cloudflare. As mentioned above, the response from this endpoint also points to Google Cloud. This leads me to believe that Discord's multimedia files are stored and accessed within Google Cloud storage using a similar key, here made up of three parts: the two numeric IDs and the file name.
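To make the inferred three-part layout concrete, the following Python sketch splits an attachment URL into its parts. It is based only on the URL format shown above; the names `AttachmentKey` and `parse_attachment_url` are hypothetical, and treating the first number as a channel ID and the second as an attachment ID is an assumption rather than something Discord documents in the sources discussed here.

```python
from typing import NamedTuple
from urllib.parse import unquote, urlparse


class AttachmentKey(NamedTuple):
    """Hypothetical three-part key inferred from the CDN URL layout."""
    channel_id: int     # first numeric segment (assumed to be the channel ID)
    attachment_id: int  # second numeric segment (assumed to be the attachment ID)
    filename: str       # original file name, percent-decoded


def parse_attachment_url(url: str) -> AttachmentKey:
    """Split a cdn.discordapp.com/attachments/... URL into its three parts."""
    # Add a scheme if missing so urlparse puts the host in netloc, not in path
    if "://" not in url:
        url = "https://" + url
    parsed = urlparse(url)
    if parsed.netloc != "cdn.discordapp.com":
        raise ValueError(f"not a Discord CDN URL: {url}")
    # Only the path carries the key; any query string is ignored
    parts = parsed.path.strip("/").split("/")
    # Expected layout: attachments/<number>/<number>/<file name>
    if len(parts) != 4 or parts[0] != "attachments":
        raise ValueError(f"unexpected attachment path: {parsed.path}")
    return AttachmentKey(int(parts[1]), int(parts[2]), unquote(parts[3]))
```

For example, `parse_attachment_url("cdn.discordapp.com/attachments/123/456/cat.png")` returns `AttachmentKey(channel_id=123, attachment_id=456, filename='cat.png')`, which mirrors how a message is located by its channel ID and message ID, with the file name added as a third part.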
The question remains, though, of how Discord stores and reads these messages. In its blog posts, Discord shows snippets of the CQL statements it uses, including the definition of its messages table, which stores message content as text. When sending a message in Discord, you can enter a direct link to a supported multimedia file type, and rather than displaying the link, the application displays the content itself, such as an image or GIF. This applies both to media hosted on Discord's own storage and CDN and to external files. Discord must therefore parse messages before displaying them in the application so that the link text can be replaced with the file.
Since Discord images are read through text-based URLs, it seems reasonable to conclude that the reverse process is followed when saving messages. Media newly attached to a message is most likely stored separately in Discord's Google Cloud storage, while the saved message contains, as text, the information needed to locate the media file (e.g. a direct link or some form of partition key). When the message is later read from storage as text, this information can be matched back to the corresponding multimedia file, which is then displayed when the message loads in the application.
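The hypothesized save-and-render flow can be sketched in Python as follows. This is not Discord's implementation: `MessageRow`, `object_store`, `render`, `resolve` and the list of media extensions are all hypothetical stand-ins. The message row keeps only text, the media bytes live in a separate store keyed by URL, and rendering scans the text for media links so they can be swapped for the files they point to.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical set of file types rendered inline instead of as plain links
MEDIA_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".mp4")

# An http(s) URL runs until the next whitespace character
URL_PATTERN = re.compile(r"https?://\S+")

# Stand-in for separate media storage (e.g. a cloud bucket), keyed by URL
object_store: Dict[str, bytes] = {}


@dataclass
class MessageRow:
    """A stored message: media is referenced only through URL text."""
    channel_id: int
    message_id: int
    content: str  # plain text, like the text `content` column in the blog's table


def is_media_url(url: str) -> bool:
    # Check the extension of the path only, ignoring any query string
    path = url.split("?", 1)[0].lower()
    return path.endswith(MEDIA_EXTENSIONS)


def render(row: MessageRow) -> List[Tuple[str, str]]:
    """Split stored text into ("text", ...) and ("media", url) segments."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in URL_PATTERN.finditer(row.content):
        url = match.group()
        if not is_media_url(url):
            continue  # ordinary links stay part of the surrounding text
        if match.start() > pos:
            segments.append(("text", row.content[pos:match.start()]))
        segments.append(("media", url))
        pos = match.end()
    if pos < len(row.content):
        segments.append(("text", row.content[pos:]))
    return segments


def resolve(url: str) -> Optional[bytes]:
    """Return bytes for self-hosted media; None means fetch it from the external URL."""
    return object_store.get(url)
```

In this sketch, saving a message puts the uploaded bytes into `object_store` under its CDN URL and writes only the text, including that URL, to the `MessageRow`. On load, `render` yields a `("media", url)` segment for that link and `resolve(url)` returns the stored bytes, while a link to an ordinary web page stays in the text.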
[NOV 6] Mid-Term Project Update
Brief Coverage Since Previous Update:
Because I changed topics, only a few days have passed since my previous update, but I have since made some minor changes to my topic to put more emphasis on the scaling of databases, as demonstrated through Discord's progression, and less emphasis on Discord from a biographical point of view. I did some further research and discovered that ScyllaDB, which Discord currently uses, is also used by many large companies such as Comcast, Epic Games, Zillow, Opera GX and many more. I look forward to further technical research into ScyllaDB's performance to see how it achieves such impressive latency and scalability.
Proposal and Website:
The proposal was updated today (November 6th) to slightly adjust the overall scope of the project. The website has also been aesthetically improved and updated accordingly, primarily on the Home page, which now focuses less on content from the proposal.
Progress With Regards to the Schedule:
Despite changing project topics last week, I am now back on schedule with this Mid-term project update. From this point forward, I expect to be able to meet all of the upcoming deadlines without issue.
Technical Challenges: Met, Solved, Remaining:
One minor technical challenge I encountered with my initial project proposal was that much of what I hoped to research and accomplish had already been covered in the two Discord blog posts from my previous update. As a result, I decided to slightly shift the focus of my project as mentioned above. I have done only limited technical research on my topic as of this update, so I have not yet encountered any technical challenges, though I do not expect many given the abundance of information available online on the topic.
Additional Help Needed If Any:
Overall, I am currently satisfied with the state and scope of my project and do not require any help.
Adjustment Requested to the Proposal:
[UPDATE NOV 20]
I was asked by the professor to strengthen the relevance of my topic by scoping it more toward multimedia systems (i.e. how different media types are stored rather than generic "messages"). I am unsure how much information Discord has made available on the storage of specific media types, but I will try to give a better picture in my update above to the best of my ability.
[NOV 2] Biweekly Update
My primary objective for this project is to find information about Discord's storage architecture. I was uncertain how difficult this would be, but much to my surprise, Discord is an extremely transparent company. The blog section of its website is very well maintained and even has a section dedicated to engineering.
Among these blog posts, I found two that were more informative and helpful than I ever anticipated. The first, from January 2017, and the second, published recently in March 2023, go in depth about Discord's journey of migrating from its previous data storage methodology to a newer and improved data handling solution. Both posts provide insight into many of the problems Discord previously experienced and how it was able to overcome them through different data storage technologies, which provided an excellent baseline for my research. It was particularly interesting to have two blog posts to compare. The first post discusses Discord's short-term and long-term plans for its data handling, while this year's post follows up on that topic very well, discussing many of the related successes and failures experienced over the years.
Having now found ample information on much of Discord's history, technical challenges, and solutions, I hope to do further research to better understand the inner workings of Cassandra and ScyllaDB, learning how ScyllaDB provides Discord such improvements in performance and latency over Cassandra, and likewise how Cassandra improved on MongoDB.
2017 Blog Post - 2023 Blog Post
[OCT 29] Changed Project Topics / New Project Proposal
My original topic idea for the course was The Evolution of Computer Graphical Performance. After considering more closely what I wanted my project to cover, I determined that the scope was too large and that covering it reasonably would be quite difficult. After browsing the ideas of other classmates, I decided to find a new topic with a much narrower scope that would allow me to focus more in depth. I settled on researching how the popular communication platform Discord handles its data storage. As a long-term Discord user, I remembered hearing about Discord making major changes to its systems and decided to research this topic further. The new Project Proposal is now complete and is available under the Project Proposal tab.