Algorithms for Data Science


Class overview

Big Data brings us to interesting times and promises to revolutionize our society, from business to government and from healthcare to academia. As we move through this digital age of rapidly growing data, there is an increasing demand for unified toolkits for data processing and analysis. In this course, our main goal is to rigorously study the mathematical foundations of big data processing, develop algorithms, and learn how to analyze them. Specific topics to be covered include (please check the schedule for more details):

  • Clustering

  • Estimating Statistical Properties of Data

  • Data Streaming Algorithms

  • Near Neighbor Search

  • Data Compression

  • Parallel Algorithms

  • Learning Algorithms

  • Randomized Algorithms

Instructor: Barna Saha, Office Hours: After class on Tue.

Co-Instructor: Dominik Kempa (Postdoctoral Researcher), Lecture Responsibility (5 classes): Oct 20th - Nov 3rd, Office Hours: Wednesdays 11:00 am - noon over Zoom.

Supporting Teaching Staff:

Han Feng (Graduate Student Instructor), Office Hours: Thursdays 5:00 - 6:00 pm over Zoom.

Time & Venue: 2:00-3:30 pm Tue/Thur over Zoom. The Zoom link will be emailed to registered students. The first day of class will be August 27th. Students must attend the first day of class to remain enrolled in the course.

Piazza: All class-related discussions will happen through Piazza. Please sign up using the following link.

https://piazza.com/berkeley/fall2020/indeng290

Grading:

  • Homework (about 4) - 50%
    -- Will consist of mathematical problems and programming assignments.

  • Paper Presentation - 30%

  • Class Participation - 20%

Books: We will use reference materials from the following books. The first two can be downloaded for free.

  • Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman and Jeff Ullman.

  • Foundations of Data Science, a book in preparation by Avrim Blum, John Hopcroft and Ravi Kannan

  • [For Probability Review] Probability and Computing: Randomized Algorithms and Probabilistic Analysis by Michael Mitzenmacher and Eli Upfal

Intended Audience: The course is intended for upper-level undergraduate students and graduate students.

Prerequisite: All students need an adequate background in algorithm design and basic probability. If you do not satisfy the prerequisite but want to take the course, please talk to the instructor.