BIG DATA
EECE 571B Fall 2015




Instructor: Dr. Alexandra (Sasha) Fedorova

Time: TTh 10:30-12
Location: Orchard Commons 3016

Course description:

According to former Google CEO Eric Schmidt, "every two days now we create as much information as we did from the dawn of civilization up until 2003". Massive amounts of data at our disposal, together with exponential advances in the computing technology that lets us process it, have ushered in the era of "big data". Businesses, the healthcare industry and governments alike are keen on leveraging big data to improve their products and services, to discover trends in disease proliferation, to improve personal health, and to pursue other goals. Big data is everywhere.

In this class we will learn about the key technologies underlying the processing, storage and analysis of big data. We will begin with a review of core systems and principles in the areas of databases and distributed systems. We will then read about and discuss the most prominent big data technologies, such as Google's GFS, Spark, the RAD stack (Kafka, Storm, Druid, ZooKeeper) and many others.

Students must have taken a course on data structures and algorithms and a course on computer systems (e.g., computer architecture or operating systems). A background in distributed systems and databases is recommended but not required.

Course organization:

This course will require that students read research articles selected by the instructor on a weekly basis. We will follow a "partially flipped" approach to learning. Students will be responsible for reading one or two articles before the day on which they are discussed in class, and for taking a short online quiz that tests basic familiarity with the material. The online quiz grades will make up part of the final grade. In class, one student will present a summary of the assigned material, and then we will have a discussion, delving deeper into the paper’s material and working to understand the technology, challenges and solutions. A significant portion of the course grade will be based on class participation. At the end of the term, students will complete a final project, which may be done in groups. The project may involve the evaluation of an existing big data system or the design of a new one.

This course will emphasize good principles for technical writing and presentations in conjunction with the core subject matter.

Grading formula:

  • 20% micro-quizzes
  • 30% in-class and take-home quizzes
  • 30% final project (paper + presentation)
  • 10% paper presentation
  • 10% class participation

Course rules:

  • Carefully read and understand the rules on micro-quizzes. You are responsible for following them. 
  • You are responsible for reading the material assigned for a given day before the class on that day unless otherwise indicated. 
  • Bring a name plate like this to every class.