COMPSCI 532 - Systems for Data Science

Instructor: Peter F. Klemperer, pklemperer@umass.edu

Course Overview 

In this course, students will learn the fundamentals behind large-scale systems in the context of data science. We will cover the issues involved in scaling parallelism up (to many processors) and out (to many nodes) in order to perform fast analyses on large datasets. These issues include locality and data representation, concurrency, distributed databases and systems, and understanding and analyzing performance. We will explore the details of existing and emerging data science platforms, including MapReduce-Hadoop, Spark, and more. This course counts as a CS Elective for the CS Major. Undergraduate Prerequisites: COMPSCI 377 and COMPSCI 445. 3 credits.

Required Text 

See the reading list in the syllabus.