Symbol * indicates hard homeworks (i.e., you can still expect something similar in an exam). Symbols ** and *** indicate really hard, extremely hard homeworks (I will not put something similar in an exam), you should try to solve them just for fun. Homeworks are not graded.
This year 2024/2025 the streaming and recording of lectures is not granted. The regulations of Sapienza explicitly say that lectures are physical and must be done in presence. To help students that work I made available at this link the recording of the lectures of the year 2021/2022.
Material to read: Main Book (1, 2.1 and 2.2. 2.4.1-2.4.2), H. Attiya, J.L. Welch. 2004 (2.1).
Key concepts: System Model, Processes, Uniqueness of the identifies, Local View, Delivery event vs Message creation, Asynchronous System, indistinguishability of the views. Fairness of a scheduler.
Model 2: Failures (slides)
Links and Abstractions: Safety and Liveness. Fair-Lossy (slides)
Links: Stubborn, Perfect and FIFO LINK (slides)
Material to read: Main Book (1, 2.1 and 2.2. 2.4.1-2.4.2), H. Attiya, J.L. Welch. 2004 (2.1).
Key concepts: Failures: Crash Stop vs Byzantine, Abstraction and properties; Safety and Liveness. The Fair-Lossy Link, The Stubborn and The Perfect link
Lecture syllabus:
Links: Perfect Formal Proof and FIFO LINK (slides)
Time 1 : Asynchronous, Synchronous, Partially (or Eventually) synchronous. (slides);
Time 1.2: Importance of time, Physical clocks limitation, UTC, Internal and external synchronization (slides);
Time 1.3: Synchronization in complete graphs: Christian (slides);
Exercises on Links and FIFO Link (slides).
Material to read: H. Attiya, J.L. Welch. 2004 (2.1). Time 1 is in the Main book (2.5); Topics of Time 1.2-1.3 can be found on Maarten van Steen et al. 2008 (Chapter 6.1 - the book is free)
Key concepts: Synchronous and Eventually-synchronous systems, physical clock drift, Internal and External synchronization, Christian and Berkley, NTP (for us it is just a hierarchy of servers that use christian-like approach), Local Skew vs Global Skew.
Optional reads: A detailed description of Berkley is here;
Lecture syllabus:
Time 1.3: Synchronization in complete graphs: Berkley, NTP (slides);
Time 2: Happened-before, Logical (also Scalar) and Vector Clocks. (slides)
Material to read: An alternative explanation of logical and vector clocks is here. Logical and vector clocks can be found on Maarten van Steen et al. 2008 (Chapter 6.2. -the book is free) Chapter 6.2 contains also a description of Lamport's ME. The main book describes also logical clock and happened-before in Chapter 2.5.1 and Chapter 3 Causal Broadcast.
Optional Material: Leslie Lamport on logical clock: https://www.youtube.com/watch?v=gY9VwiPTa60 Relativity and causality violation: https://plus.maths.org/content/violating-causality. Original paper in which Lamport proposed the logical clock: here.
Key concepts: Happened-before relationship, Logical Clock, Vector Clocks, Lamport's ME.
FIFO Link - Algorithm.
Time 2 - An application of logical clocks: Lamport's Mutex. (slides)
Exercises on Links and FIFO Link (slides).
Material to read: Chapter 6.2 of Maarten van Steen et al. 2008 (Chapter 6.2. -the book is free) contains a description of Lamport's ME.
Key concepts: Lamport's ME.
Time 3 - Encapsulating timing assumption with Failures Detectors: The Perfect Failure Detector P (slides).
Time 3 - The models of Fail-stop, Fail-noisy and Fail-silent and their relationship with Synchronous, Eventually Synchronous and Asynchronous. (slides).
Material to read: Failure Detectors and Leader Electors are in Main Book (Chapter 2.6).
Key concepts: Failure Detectors, P (the algorithm exclude on timeout and the round-based one), Round-based computation and its equivalency with synchrony.
Optional Material: The Splendors and Miseries of Rounds, Adam Shimi, https://dl.acm.org/citation.cfm?id=3364635 (access from Sapienza or using the proxy). Original paper of Chandra and Toueg on Failure Detector (here). The round model for synchrony is folklore and can be found in multiple books (e.g., Distributed Computing - H. Attiya, J.L. Welch. 2004 , pp. 12-13)
Please note that the room is different from the usual one (map with the building)
Updated slides: some students pointed out that the slides I showed in the room were slightly different. Please find here the PDF version of the slides actually presented (Time2, Time3)
Time 3 - Encapsulating timing assumption with Failures Detectors: The eventually perfect Failure Detector Diamond P (slides).
2. Time 3 - The leader election (LE) primitives: Perfect LE and Eventual LE (Omega). (slides).
3. Time 3 - The models of Fail-stop, Fail-noisy and Fail-silent and their relationship with Synchronous, Eventually Synchronous and Asynchronous. (slides).
4. Time 3 - Applications of the FD and LE: making lamport's ME Fault-tolerant and a really simple ME algorithm using LE. (slides).
Material to read: Failure Detectors and Leader Electors are in Main Book (Chapter 2.6).
Key concepts: Eventually perfect Failure Detector, LE, and Eventual LE. Fail-stop, Fail-noisy and Fail-silent. Applications of FD and LE.
Broadcast 1- Best Effort Broadcast and Regular Reliable Broadcast (slides)
Broadcast: Uniform Reliable Broadcast (Majority-ack) Impossibility of uniform broadcast with n/2 or more failures (slides)
Material to read: Broadcast is in Main Book (Chapter 3.1-3.4). Complexity measures: messages and communication steps; are in Main Book (Chapter 2.7.4).
Key concepts: BEB Broadcast, Regular Reliable Broadcast: Lazy and Eager, Uniform Realiable Broadcast: specification. Complexity measures: message complexity and communication steps (also called message delays).
Broadcast: Uniform Reliable Broadcast (Majority-ack) Impossibility of uniform broadcast with n/2 or more failures (slides)
Broadcast 2: Uniform Reliable Broadcast (Quorum Based) (slides).
Ordered Broadcasts: FIFO and Causal Broadcast (slides)
Exercises (slide). ME Algorithm with a LE and PF (solution).
Leader Election on a Ring with unkown processes and IDs unkown (exercise Q1 of the slide, original paper: https://dl.acm.org/doi/pdf/10.1145/359104.359108) Algorithm seen in class: pdf.
Material to read: Broadcast is in Main Book (Chapter 3.1-3.4), Quorums in Main Book (Chapter 2.7.3). Ordered Broadcast FIFO and Casual are in Main Book (Chapter 3.9).
Key concepts: The impossibility proof of URB, FIFO Broadcast, Causal Broadcast, Orthogonality of the ordering concept and reliability properties.
Ordered Broadcasts: Causal Broadcast (slides)
Shared Memory: What is Shared Register? Register Semantic, Regular Consistency: (1,N) with fail-stop (read one write all) and fail silent (majority voting) (slide)
Material to read: roadcast is in Main Book (Chapter 3.1-3.4), Quorums in Main Book (Chapter 2.7.3). Ordered Broadcast FIFO and Casual are in Main Book (Chapter 3.9).
Key concepts: The impossibility proof of URB, Causal Broadcast, Orthogonality of the ordering concept and reliability properties.
Material to read: Registers are in Main Book (Chapter 4.1,4.2).
Optional Reads: The book of H. Attiya, J.L. Welch. 2004 has a section on shared memories (Chapters 9.1-3, 10.1-3). The book of Van Steen et al. (Chapter 7) describes a panoply of additional consistency properties.
Key concepts: Waiting Causal Brodacast, Shared Memories, Register definitions, Consistency properties (Regular), Regular register (1,N), Fail-stop.
Regular Consistency: (1,N) with fail-stop (read one write all) and fail silent (majority voting) (slide)
Register Semantic: Sequential and Atomic: (slides)
Material to read: Registers are in Main Book (Chapter 4.1,4.2).
Optional Reads: Interesting paper on consistency properties (paper). Website with a map of consistency properties (map)
Atomic Registers algorithm using regular registers: (1,1)-Atomic Register (slides)
Atomic Registers algorithms for (1,N)-Atomic Register using (1,N Regular) (slides)
Atomic Registers algorithms for (1,N)-Atomic Register using message assing (slides)]
Material to read: Atomic Registers are in Main Book (Chapter 4.3,4.4).
Optional Reads: Interesting paper on consistency properties (paper). Website with a map of consistency properties (map)
Atomic Registers algorithms for (1,N)-Atomic Register using message passing without P (slides)]
Synchronous Consensus 1: Hierarchical Consensus (slide)
Material to read: Consensus in Main Book (Chapter 5.1-5.2).
Optional Reads: The book of H. Attiya, J.L. Welch. 2004 shows a lower bound on the step complexity of regular consensus in synchronous systems (Chapters 5.1.4).
Key concepts: Regular vs Uniform Consensus definitions. Hierarchical Consensus algorithms, uniform and regular variants.
Lecture syllabus.
Synchronous Consensus 1: Hierarchical Consensus (slide)
Synchronous Consensus 2: Uniform Hierarchical Consensus (slide)
Material to read: Consensus in Main Book (Chapter 5.1-5.2).
Optional Reads: The book of H. Attiya, J.L. Welch. 2004 shows a lower bound on the step complexity of regular consensus in synchronous systems (Chapters 5.1.4).
Key concepts: Regular vs Uniform Consensus definitions. Hierarchical Consensus algorithms, uniform and regular variants.
FLP: impossibility of solving consensus in asynchronous systems (slide)
Consensus 3: Paxos consensus algorithm in eventually-synchronous systems: Explanation of paxos following paxos made simple. (slide)
Material to read: Paxos is not in the main book. The lecture will follow the structure of the paper ``Paxos made simple" of Leslie Lamport (link) another good paper is this one (link). The difficulties of implementing paxos are described in this paper (link).
Optional reads: here there is a video of Leslie Lamport explaining Paxos (https://www.youtube.com/watch?v=tw3gsBms-f8). The video is quite difficult, probably is better to also read this paper (link).
Key concepts: Paxos. Nb: People say that Paxos is notoriously difficult to be understood. For the above, I strongly suggest to read Lamport's paper ``Paxos made simple".
Consensus 3-bis: Paxos consensus algorithm in eventually-synchronous systems: Explanation of paxos using the pseducode (from the paper A Simpler Proof for Paxos and Fast Paxos) , examples on paxos, difficulties of implementing paxos on real systems. (slide)
Exercises on paxos (slide)
Total Order Broadcast. (slides)
Lecture syllabus.
Software Replication - Motivations (slides)
Software Replication - Consistency criteria: linearizability (slides)
Software Replication - Passive (also called primary backup) (slides)
Software Replication - Active (slides)
Exercises
Material to read: Replication is not in the main book (the content of the slides is enough).
Key concepts: Primary-backup vs Active-replication, Replicated State Machine.
Additional reads: Rachid Guerraoui and André Schiper: “Fault-Tolerance by Replication in Distributed Systems”. In Proceedings of the 1996 Ada-Europe International Conference on Reliable Software Technologies (Ada-Europe '96).
Software Replication in Eventually- Synchronous systems: Raft (slides).
An analysis of Raft in the real world.
Key concepts: Activate Replication, RAFT
Material to read: For Raft there is the original paper (here).
Optional reads: A recent example (11/2021) of a liveness stall in which a lot of money has been lost due to Raft going south: https://blog.cloudflare.com/a-byzantine-failure-in-the-real-world/
Optional material: Lecture on raft by the authors (youtube)
Students are encouraged to watch the exercise videos below. They should pause each video, try to solve the exercise on their own, and only after spending an adequate amount of effort attempting the solution should they resume the video to see the answer
Exercises: Simulated exam video with the text and the solution: https://drive.google.com/file/d/1hpNHkCq4H5IOxDC9WWKivfcJHn0Ngexj/view?usp=drive_link
Other exercises:
https://drive.google.com/file/d/17X-stfK9l1bH_3ZjHl7P0wZmTWHphgA4/view?usp=drive_link