Information
Course: Introduction to Big Data Algorithms
Instructors: Emilio Cruciani, Aditi Dudeja
Teaching Assistant: Antonis Skarlatos
When: Tuesdays 12:00-13:30; Thursdays 10:00-11:00, from 05/03 to 27/06
Where: Room T02
Office Hours: By Appointment (by email)
Description
Modern datasets are massive. With more than 500 million tweets and over 8.5 billion Google searches per day, how can one make sense of all the data that is generated? This course will present an introduction to advanced algorithmic tools for the analysis of such large datasets.
Exams
The exam will be oral. Algorithms and proofs will be asked. Questions similar to the assignments will be asked. Submitting the assignments gives bonus points.
Assignments
Assignments will be given every Tuesday and can be found at this link. If you have answers about them or find mistakes, please email any of the instructors. Collaboration is allowed, but write-up should be done individually. Assignments must be sent before the next Tuesday to Antonis at antonis.skarlatos@plus.ac.at
Syllabus (Tentative)
5/3/2024: Course Introduction: Big Data, Pitfalls; Review of Probability: Probability Space, Union Bound, Independence, Conditional Probability
Slides/Notes: Course Introduction, Review of Probability
References: Chapters 1,2,3,4 [PC]
7/3/2024: Tutorial
12/3/2024: Review of Probability: Law of Total Probability, Random Variables, Expectation, Concentration Bounds (Markov, Chebyshev, Chernoff)
Slides/Notes: Review of Probability; Lecture Notes
References: Chapters 1,2,3,4 [PC]
14/3/2024: Tutorial
19/3/2024: Graph Sparsification: Karger's Min-Cut Algorithm
Slides/Notes: Lecture Notes
21/3/2024: NO LECTURE
EASTER BREAK
9/4/2024: Cut Sparsification: Benczur-Karger's Algorithm
11/4/2024: Tutorial
16/4/2024: Cut Sparsification: Benczur-Karger's Algorithm
18/4/2024: Tutorial
23/4/2024: Finding Similar Items; Preliminaries: Metrics; Euclidean, Jaccard, Hamming, Edit, Cosine distances
References: Chapter 3.1, 3.5 [MMDS]
25/4/2024 (Room T.03): Tutorial
30/4/2024: Hash functions (uniform, k-wise independent, universal); Shingles; Minhashing
References: Section 2.3, 2.3.1 [AMD]; Chapter 3.2, 3.3 [MMDS]
2/5/2024: Tutorial
7/5/2024: Finding Similar Items: Locality Sensitive Hashing
References: Chapter 3.4, 3.6, 3.7 [MMDS]
9/5/2024: NO LECTURE
14/5/2024:
16/5/2024: Tutorial
21/5/2024: NO LECTURE
23/5/2024: Tutorial
28/5/2024:
30/5/2024: NO LECTURE
4/6/2024:
6/6/2024: Tutorial
11/6/2024:
13/6/2024: Tutorial
18/6/2024:
20/6/2024: Tutorial
25/6/2024:
27/6/2024: Tutorial
References
Books:
[PC] Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis by M. Mitzenmacher, and E. Upfal (Cambridge University Press, 2nd edition, 2017)
[MMDS] Mining of Massive Datasets by J. Leskovec, A. Rajaraman, and J. Ullman (Cambridge University Press, 3rd edition, 2019) [website]
[AMD] Algorithms for Massive Data by Nicola Prezza (arXiv, version 12) [arXiv]