Instructor: Ling Ren
TA: Sourav Das
Lectures: Wed/Fri 12:30 PM - 1:45 PM, 1109 Siebel Center for Computer Science.
Office hours: Wed 1:45 PM - 2:30 PM or by appointment, 4312 Siebel Center for Computer Science.
Fault-tolerant distributed computing is the paradigm where multiple computing nodes or participants jointly carry out a computational task to avoid any single point of failure or trusted central authority. A fault-tolerant distributed algorithm is expected to deliver its intended results or guarantees despite that a number of participants experience failures or behave maliciously. This course covers classic results and recent advances in fault-tolerant distributed algorithms. The course studies important problems in distributed algorithms and cryptography, including broadcast, agreement, state machine replication, blockchains, shared registers, and threshold cryptography; different models and assumptions regarding shared memory vs. message passing, message delivery and delay, fault pattern, cryptography, and setup; optimal fault-tolerant thresholds, efficiency bounds, and impossibility results for different problems under various combinations of settings; common building blocks and algorithm design techniques including cryptographic primitives, leader-based approach, randomization, quorum systems; influential and state-of-the-art algorithms such as Paxos, PBFT, Nakamoto's Bitcoin protocol.
40% Homework
30% Final Exam
30% Project (20% presentation + 10% report)
Each student will complete a course project. You are encouraged to work in teams of two. Project topics from your own research are encouraged as long as they are related to course contents. Alternatively, you can choose to do a literature study project. You are expected to finalize your project topic by March 9 and you are encouraged to discuss project ideas with me early in the semester. You will present your results or teach your surveyed topic (expect 10-15 minutes per person) and submit a written report towards the end of the semester.
Additional important papers
Time, Clocks, and the Ordering of Events in a Distributed System
Impossibility of Distributed Consensus with One Faulty Process
Consensus in the Presence of Partial Synchrony
The State Machine Approach: A Tutorial
Textbooks
Distributed Computing: Fundamentals, Simulations, and Advanced Topics 2nd Edition by Hagit Attiya and Jennifer Welch
Introduction to Reliable and Secure Distributed Programming 2nd Edition by Christian Cachin, Rachid Guerraoui, and Luís Rodrigues
Distributed Algorithms by Nancy Lynch
Online resources