Class overview
In this course, students will learn the fundamentals behind large-scale systems used for data science. We will cover the issues involved in scaling parallelism up (to many processors) and out (to many nodes) in order to perform fast analyses on large datasets. These issues include locality and data representation, concurrency, and distributed databases and systems. We will explore the details of existing and emerging data science platforms, including MapReduce and data analytics systems such as Hadoop and Apache Spark, as well as systems for machine learning and deep learning.
Class meetings: MoWe 4:00PM - 5:15PM
Sep 6, 2022-Dec 12, 2022
Instructor: Hui Guan
TAs:
Nathan (kwanhong@umass.edu)
Qizheng Yang (qizhengyang@umass.edu)
Prerequisites:
This is an MS-level CS course that is also open to senior undergraduates majoring in CS and to PhD students. A solid systems background and substantial programming experience are required. The programming languages used will include C/C++/Java and Python. You are also expected to have a computer to work on programming assignments, quizzes, and exams.
COMPSCI 377, COMPSCI 445.
Familiarity with Linux/Unix-like systems.
Familiarity with C/C++ or Java, and with Python programming.
Comfort with reading research papers.
Credits: 3
Required Texts: Because this is an emerging topic, we will read and review recent technical papers, which constitute the reading material for the exams. The course slides will be made available before the exams. Here are some general background resources:
Operating Systems: Three Easy Pieces is a classic book on operating systems. It covers a lot of material, but you may want to return to it if some concepts are unclear.
Distributed Systems is another classic reference book, this time on distributed systems. Like the previous book, it covers a lot of material, but you can use it as a reference.
Pointers and Memory is a great write-up describing pointers, memory management (heap, stack), and how to write safe code. It is highly recommended reading, even just to refresh your knowledge.
The Red Book is an organized collection of classic papers covering various aspects of database management system design.
The Deep Learning book is a comprehensive resource for the deep learning field.
Students are expected to attend the lectures and study the material presented during them. They will also have to participate in the following activities:
Reviews: 10%
Homework: 15%
Quizzes: 15%
Projects: 40%
Midterm exam: 20%
The resulting grade on a 0-100 scale will be mapped to a letter grade using the following thresholds:
A: [90,100]
A-: [85,90)
B+: [80,85)
B: [75,80)
B-: [70,75)
C+: [65,70)
C: [60,65)
F: [0,60)
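As a worked illustration of how the category weights and the thresholds above combine, here is a minimal Python sketch. It assumes each category is scored on a 0-100 scale and that no rounding is applied before the thresholds are checked; the syllabus does not specify either point. The function names and the sample scores are hypothetical.

```python
from __future__ import annotations

# Category weights from the syllabus, in percent of the final grade.
WEIGHTS = {
    "reviews": 10,
    "homework": 15,
    "quizzes": 15,
    "projects": 40,
    "midterm": 20,
}
assert sum(WEIGHTS.values()) == 100  # the weights cover the whole grade

# Letter-grade cutoffs from the syllabus as (inclusive lower bound, letter),
# ordered from highest to lowest so the first match wins.
THRESHOLDS = [
    (90, "A"),
    (85, "A-"),
    (80, "B+"),
    (75, "B"),
    (70, "B-"),
    (65, "C+"),
    (60, "C"),
    (0, "F"),
]


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-category scores (assumed 0-100 each) into a 0-100 total."""
    return sum(WEIGHTS[cat] * scores[cat] for cat in WEIGHTS) / 100


def letter_grade(total: float) -> str:
    """Return the first letter whose lower bound the total meets or exceeds."""
    for lower, letter in THRESHOLDS:
        if total >= lower:
            return letter
    raise ValueError("total must be on a 0-100 scale")


if __name__ == "__main__":
    # Hypothetical student; these scores are made up for illustration.
    sample = {"reviews": 100, "homework": 90, "quizzes": 80,
              "projects": 85, "midterm": 70}
    total = weighted_score(sample)
    print(total, letter_grade(total))  # prints: 83.5 B+
```

Because the intervals are closed on the left and open on the right, a total of exactly 90 maps to A while 89.99 maps to A-; checking lower bounds from highest to lowest reproduces that behavior without listing upper bounds.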
The University of Massachusetts Amherst is committed to providing an equal educational opportunity for all students. If you have a documented physical, psychological, or learning disability on file with Disability Services (DS), you may be eligible for reasonable academic accommodations to help you succeed in this course. If you have a documented disability that requires an accommodation, please notify me within the first two weeks of the semester so that we may make appropriate arrangements.
Since the integrity of the academic enterprise of any institution of higher education requires honesty in scholarship and research, academic honesty is required of all students at the University of Massachusetts Amherst. Academic dishonesty is prohibited in all programs of the University. Academic dishonesty includes but is not limited to: cheating, fabrication, plagiarism, and facilitating dishonesty. Appropriate sanctions may be imposed on any student who has committed an act of academic dishonesty. Instructors should take reasonable steps to address academic misconduct. Any person who has reason to believe that a student has committed academic dishonesty should bring such information to the attention of the appropriate course instructor as soon as possible. Instances of academic dishonesty not related to a specific course should be brought to the attention of the appropriate department Head or Chair. Since students are expected to be familiar with this policy and the commonly accepted standards of academic integrity, ignorance of such standards is not normally sufficient evidence of lack of intent (http://www.umass.edu/dean_students/codeofconduct/acadhonesty/).