Instructor: Dr. Alexandra (Sasha) Fedorova
Time: TTh 10:30-12
Location: Orchard Commons 3016
According to former Google CEO Eric Schmidt, "every two days now we create as much information as we did from the dawn of civilization up until 2003". The massive amounts of data at our disposal, together with exponential advances in the computing technology that lets us process it, have ushered in the era of "big data". Businesses, the healthcare industry and governments alike are keen to leverage big data to improve their products and services, discover trends in disease proliferation, improve personal health and pursue other goals. Big data is everywhere.
In this class we will learn about the key technologies underlying the processing, storage and analysis of big data. We will begin with a review of core systems and principles in databases and distributed systems. We will then read about and discuss the most prominent big data technologies, such as Google's GFS, Spark, the RAD stack (Kafka, Storm, Druid, ZooKeeper) and many others.
Students must have taken a course on data structures and algorithms and a course on computer systems (e.g., computer architecture or operating systems). Background in distributed systems and databases is recommended but not required.
This course will require students to read research articles selected by the instructor every week. We will follow a "partially flipped" approach to learning. Students will be responsible for reading one or two articles before the day on which those articles are discussed in class, and for taking a short online quiz that tests basic familiarity with the material. The online quiz grades will make up part of the final grade. In class, one student will present a summary of the assigned material, and then we will hold a discussion that delves deeper into the paper, examining the technology, its challenges and the proposed solutions. A significant portion of the course grade will be based on class participation. At the end of the term, students will complete a final project, which may be done in groups. The project may involve evaluating an existing big data system or designing a new one.
Alongside the core subject matter, this course will emphasize good principles of technical writing and presentation.
- 20% micro-quizzes
- 30% in-class and take-home quizzes
- 30% final project (paper + presentation)
- 10% paper presentation
- 10% class participation