Week 1. Intro
Sukumar Ghosh, Distributed Systems: An Algorithmic Approach, 2nd Edition, Chapman and Hall/CRC, 2014 (available through Safari, an O'Reilly Media company).
Chapter 1
Library link: https://onesearch.library.northeastern.edu/permalink/f/365rt0/NEU_ALMA51284197480001401
Additional resources
M. van Steen and A.S. Tanenbaum, Distributed Systems, 3rd ed., distributed-systems.net, 2017.
Chapter 1
Online link: https://www.distributed-systems.net/index.php/books/ds3/
Week 2. Concurrency
Processes and threads
Chapters 3.0, 3.1, and 3.3, Maarten van Steen and Andrew S. Tanenbaum, Distributed Systems, 3rd edition, 2017
C++ concurrency
Chapters 1-5, Anthony Williams, C++ Concurrency in Action, 2nd Edition, Manning Publications
This book is available online through the library.
It is also a good general reference for multi-threaded programming in C++.
SEDA: an architecture for well-conditioned, scalable Internet services, ACM Symposium on Operating Systems Principles, 2001, Matt Welsh (UC Berkeley, now at OctoML), David Culler (UC Berkeley) and Eric Brewer (UC Berkeley)
Additional resources
Why threads are a bad idea (for most purposes), USENIX Technical Conference, invited talk, 1995, John Ousterhout (Sun Microsystems Laboratories; now at Stanford)
Why events are a bad idea (for high-concurrency servers), USENIX Workshop on Hot Topics in Operating Systems, 2003, Rob von Behren (UC Berkeley, now at Google), Jeremy Condit (UC Berkeley, now at Google), and Eric Brewer (UC Berkeley)
Week 3 Communications
Network in general
Chapter 1, Larry L. Peterson and Bruce S. Davie, Computer Networks: A Systems Approach, 5th Edition, Morgan Kaufmann, 2011 (available online through Northeastern library)
Socket programming
Beej’s Guide to Network Programming, https://beej.us/guide/bgnet/
RPC
Chapter 6.0-6.11, Kenneth P. Birman, Guide to Reliable Distributed Systems, 1st edition, Springer (available online through Northeastern library)
Andrew D. Birrell and Bruce Jay Nelson, Implementing remote procedure calls, ACM Transactions on Computer Systems, Vol. 2, No. 1, 1984
Logical clocks
Leslie Lamport. 1978. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7 (July 1978)
Additional resources:
If you want to learn more about computer networks, read this book. It is a widely used textbook for computer networks courses and covers a broad range of topics in depth.
Larry L. Peterson and Bruce S. Davie, Computer Networks: A Systems Approach, 5th Edition, Morgan Kaufmann, 2011 (available online through Northeastern library)
Friedemann Mattern, "Virtual Time and Global States of Distributed Systems", Workshop on Parallel and Distributed Algorithms, Elsevier, pp. 215–226, 1989
Colin J. Fidge, "Timestamps in Message-Passing Systems That Preserve the Partial Ordering". 11th Australian Computer Science Conference. pp. 56–66. 1988.
Week 4 Virtualization, containers, cloud, and data centers
Cloud computing
Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. 2010. A view of cloud computing. Commun. ACM 53, 4 (April 2010), 50–58.
Virtualization
Understanding Full Virtualization, Paravirtualization, and Hardware Assist, VMware white paper
Cluster management
Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, and John Wilkes. 2015. Large-scale cluster management at Google with Borg. In Proceedings of the Tenth European Conference on Computer Systems (EuroSys '15)
Additional resources:
Data centers
Luiz André Barroso, Urs Hölzle, and Parthasarathy Ranganathan, The Datacenter as a Computer: Designing Warehouse-Scale Machines, Third Edition, Morgan & Claypool, 2018.
(Available online through Northeastern library)
Virtualization
Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. 2003. Xen and the art of virtualization. In Proceedings of the nineteenth ACM symposium on Operating systems principles
Week 5 Replication
Replication
Chapter 16.0 – 16.5, Sukumar Ghosh, Distributed Systems: An Algorithmic Approach, 2nd Edition (2014).
(Available online through Northeastern library)
Weak consistency
Doug Terry, Replicated Data Consistency Explained Through Baseball, MSR Technical Report, October 2011
Chain-replication
Robbert van Renesse and Fred B. Schneider. 2004. Chain replication for supporting high throughput and availability. In Proceedings of the 6th conference on Symposium on Operating Systems Design and Implementation.
Additional resources
Douglas B. Terry, Vijayan Prabhakaran, Ramakrishna Kotla, Mahesh Balakrishnan, Marcos K. Aguilera, and Hussam Abu-Libdeh. 2013. Consistency-based service level agreements for cloud storage. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Alan Demers, Dan Greene, Carl Hauser, Wes Irish, John Larson, Scott Shenker, Howard Sturgis, Dan Swinehart, and Doug Terry, Epidemic algorithms for replicated database maintenance, in 6th ACM Symposium on Principles of Distributed Computing (PODC), August 1987, pages 1-12.
https://dl.acm.org/doi/10.1145/41840.41841 (Available through Northeastern library: access ACM digital library)
Jeff Terrace and Michael J. Freedman. 2009. Object storage on CRAQ: high-throughput chain replication for read-mostly workloads. In Proceedings of the 2009 conference on USENIX Annual technical conference (USENIX'09). USENIX Association, USA, 11.
Weeks 6-7 Consensus and distributed transactions
Consensus in general
Chapter 13.1 – 13.2, Sukumar Ghosh, Distributed Systems: An Algorithmic Approach, 2nd Edition (2014).
(Available online through Northeastern library)
Paxos
Pages 438-449, M. van Steen and A.S. Tanenbaum, Distributed Systems, 3rd ed., distributed-systems.net, 2017.
(Online link: https://www.distributed-systems.net/index.php/books/ds3/ )
Raft
2PC/3PC
Chapter 10.3.0 - 10.3.2, Kenneth P. Birman, Guide to Reliable Distributed Systems, 1st edition, Springer
(Available online through Northeastern library)
Additional resources:
Leslie Lamport, Paxos Made Simple, ACM SIGACT News 32, 4, 2001
Robbert Van Renesse and Deniz Altinbuken. 2015. Paxos Made Moderately Complex. ACM Comput. Surv. 47, 3, Article 42
Mike Burrows. 2006. The Chubby lock service for loosely-coupled distributed systems. In Proceedings of the 7th symposium on Operating systems design and implementation
Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. 1985. Impossibility of distributed consensus with one faulty process. J. ACM 32, 2 (April 1985)
(Log in to ACM digital library through Northeastern library to access the article)
Week 8 NoSQL databases
CAP theorem
E. Brewer, "CAP twelve years later: How the 'rules' have changed," Computer, vol. 45, no. 2, pp. 23-29, 2012.
NoSQL vs SQL
Google Bigtable
Amazon Dynamo
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007. Dynamo: amazon's highly available key-value store. SIGOPS Oper. Syst. Rev. 41, 6 (December 2007), 205–220.
(Log in to ACM digital library through Northeastern library to access the article)
Additional reading
D. Pritchett. BASE: An ACID Alternative. ACM Queue, July 28, 2008.
Google File System
Week 9 In-memory systems
RAMCloud
John Ousterhout, Arjun Gopalan, Ashish Gupta, Ankita Kejriwal, Collin Lee, Behnam Montazeri, Diego Ongaro, Seo Jin Park, Henry Qin, Mendel Rosenblum, Stephen Rumble, Ryan Stutsman, and Stephen Yang. 2015. The RAMCloud Storage System. ACM Trans. Comput. Syst. 33, 3, Article 7
DMA
Chapter 12.2 I/O Hardware, Abraham Silberschatz, Peter B. Galvin, and Greg Gagne, Operating System Concepts Essentials, 1st ed., Wiley, 2010.
(Available through Northeastern library)
RDMA
FaRM: fast remote memory
Aleksandar Dragojević, Dushyanth Narayanan, Orion Hodson, and Miguel Castro. 2014. FaRM: fast remote memory. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (NSDI’14)
Additional reading:
Week 10 Load balancing, caching, and content delivery networks
Load balancer
Amazon Web Services, Elastic Load Balancing User Guide: https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/elb-ug.pdf
CDN
https://www.imperva.com/learn/performance/cdn-guide/
Facebook's caching systems
Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, and Venkateshwaran Venkataramani. 2013. Scaling Memcache at Facebook. In Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Qi Huang, Ken Birman, Robbert van Renesse, Wyatt Lloyd, Sanjeev Kumar, and Harry C. Li. 2013. An analysis of Facebook photo caching. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Week 11 Data analytics
MapReduce
Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (January 2008), 107–113.
https://research.google/pubs/pub62/
Stream processing
Tyler Akidau, Slava Chernyak, and Reuven Lax, Streaming Systems, O'Reilly Media, Inc., 2018
Chapter 1 and Chapter 10
(Available through NEU library)
Spark
Matei Zaharia, Reynold S. Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, Ali Ghodsi, Joseph Gonzalez, Scott Shenker, and Ion Stoica. 2016. Apache Spark: a unified engine for big data processing. Commun. ACM 59, 11 (November 2016), 56–65.
Additional resources
Spark RDD
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation (NSDI'12).
https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf
Spark streaming
Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. 2013. Discretized streams: fault-tolerant streaming computation at scale. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13). Association for Computing Machinery, New York, NY, USA, 423–438.
https://cs.stanford.edu/~matei/papers/2013/sosp_spark_streaming.pdf
Week 13 Microservices, serverless computing, SDN, and blockchains
Microservices:
Sam Newman, Building Microservices, 2nd Edition, O'Reilly Media
Chapter 1
Serverless computing:
Jason Katzer, Learning Serverless, 2020, O'Reilly Media, Inc.
Chapter 1
Software defined networks:
Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson, Jennifer Rexford, Scott Shenker, and Jonathan Turner. 2008. OpenFlow: enabling innovation in campus networks. SIGCOMM Comput. Commun. Rev. 38, 2 (April 2008), 69–74.
http://ccr.sigcomm.org/online/files/p69-v38n2n-mckeown.pdf
Blockchains:
Satoshi Nakamoto, Bitcoin: A Peer-to-Peer Electronic Cash System, 2008
https://bitcoin.org/bitcoin.pdf
Additional reading:
Microservices
Chris Richardson, Microservices Patterns, 2018, Manning Publications
Chapter 1
SwitchKV
Xiaozhou Li, Raghav Sethi, Michael Kaminsky, David G. Andersen, and Michael J. Freedman. 2016. Be fast, cheap and in control with SwitchKV. In Proceedings of the 13th USENIX Conference on Networked Systems Design and Implementation (NSDI'16). USENIX Association, USA, 31–44.
NetCache
Xin Jin, Xiaozhou Li, Haoyu Zhang, Robert Soulé, Jeongkeun Lee, Nate Foster, Changhoon Kim, and Ion Stoica. 2017. NetCache: Balancing Key-Value Stores with Fast In-Network Caching. In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP '17). Association for Computing Machinery, New York, NY, USA, 121–136.