Algorithms and Analysis for Large-Scale Cloud Computing Systems
Time: Tuesday and Thursday 10:30 AM - 11:50 AM
Location: GHC 4303
Instructor: Weina Wang weinaw "at" cs "dot" cmu "dot" edu
Office hours: anytime, please send an email first
Prerequisite: undergraduate probability, mathematical maturity in general
Large-scale cloud computing systems are at the heart of today’s emerging technologies. A key challenge in advancing the state-of-the-art is to design systems with stringent performance guarantees such as high-throughput and low-delay. To tackle this challenge, it is important to understand how algorithms in various components of the system have an impact on the performance. This course introduces theories and techniques from applied probability and stochastic processes for building models and analyzing algorithms. The goal is to address questions such as: How should computation tasks be assigned to the large number of servers in a cloud computing system? How should the bandwidth in a datacenter network be allocated to achieve fast data transfer? How should data be distributed and how does that affect the efficiency of computation?
Topics covered include:
- Load-Balancing: Join-the-shortest-queue, power-of-d-choices
- Data transfer in datacenter networks: flow-level modeling, multi-commodity flow algorithms
- Datacenter switch: data packet dynamics, matching algorithms
- Data storage and caching: distributed storage, data locality
- Markov chains, Foster-Lyapunov theorem, queueing theory, mean-field analysis
After successfully completing this course, you will
- gain or improve your skills in stochastic modeling of modern computing systems;
- learn advanced tools for designing and analyzing algorithms for stochastic systems;
- have opportunities to identify and collaborate on new research problems.
- R. Srikant and Lei Ying. Communication networks: an optimization, control, and stochastic networks perspective. Cambridge University Press, 2013.
- Bruce Hajek. Random Processes for Engineers, Cambridge University Press, 2015. (Downloadable from Hajek's website)
- Robert Gallager's course notes. Downloadable on MIT OpenCourseWare. Putting these notes together we actually get his book.
- Mor Harchol-Balter. Performance Modeling and Design of Computer Systems: Queueing Theory in Action. Cambridge University Press, 2013.
9/4 Introduction: A datacenter from the modeling perspective
9/6 Discrete-time Markov chain (DTMC) (References: Chapter 3.3 of Srikant and Ying's book, Chapter 2 & 6 of Hajek's book, Chapter 5 of Gallager's book/notes)
9/11 Foster-Lyapunov stability criterion with application in load-balancing
9/13 Applying Foster-Lyapunov to the giant switch abstraction of datacenter networks
9/18 Moment bounds given by the drift
9/20 The drift method (I)
9/25 The drift method (II)
9/27 The drift method (III)
10/2 Continuous-time Markov chain (CTMC)
10/4 No class
10/9 Extending Foster-Lyapunov and the drift method to CTMC
10/11 Course project presentation
10/16 Course project presentation
10/18 Mean-field analysis: Cavity method and differential equations method
10/23 Mean-field analysis: Kurtz's theorem example
10/25 Mean-field analysis: Kurtz's theorem proof and more examples
10/30 Mean-field analysis: Interchange of limits, asymptotic independence
11/1 Mean-field analysis: asymptotic independence, neural network example
11/6 No class
11/8 Data management in datacenters: Data locality and data transfer
11/13 A connection-level model for data transfer, multi-commodity flow algorithms
11/14 (Makeup class) Open problems in modern large-scale computing systems
11/15 Course project presentation
11/20 Course project presentation
11/27 Course project presentation
11/29 Course project presentation
You will have homework every other week.
Course project: 40%
You will choose a topic of your interest and read up papers in the literature. I will suggest some topics and papers. But you are free to choose any topic that you find particularly interesting as long as it is related to this course. You can work by yourself or together with another student. Your team will submit a brief report every four weeks to document your progress, and a final report at the end of the semester. Of course you are also welcome to have discussions with me anytime. Your team will also give a 40 min presentation to show your findings. You are encouraged to identify and solve problems beyond the existing work in the literature -- there will be bonus in your grade and I am happy work with you if desired to help you submit your work to a conference.
- Academic integrity
- Homework: feel free to discuss with your fellow classmates. But you should write your own answers.
- Project: if you work together as a team of two, please indicate the contribution of each of you in the reports.
- CMU academic integrity policy
- Accommodations for students with disabilities: If you have a disability and require accommodations, please contact Catherine Getchell, Director of Disability Resources, 412-268-6121, email@example.com. If you have an accommodations letter from the Disability Resources office, I encourage you to discuss your accommodations and needs with me as early in the semester as possible. I will work with you to ensure that accommodations are provided as appropriate.
- Student wellness: As a student, you may experience a range of challenges that can interfere with learning, such as strained relationships, increased anxiety, substance use, feeling down, difficulty concentrating and/or lack of motivation. These mental health concerns or stressful events may diminish your academic performance and/or reduce your ability to participate in daily activities. CMU services are available, and treatment does work. You can learn more about confidential mental health services available on campus at: http://www.cmu.edu/counseling/. Support is always available (24/7) from Counseling and Psychological Services: 412-268-2922.