DSC 206
Algorithms for Data Science
Tue / Thu: 8am -- 9:20am, FAH 1101
Welcome to DSC 206 in Winter 2024!
Updates:
Homework 3 grades have been published. Download the solution here.
Homework 4 has been updated again: Problem 4 (c) has changed.
Homework 4 has been updated to change the initialization point in Problem 6.
Midterm grades have been published. Download the solution here.
Homework 3 has been released and is due on Feb 29th, 11:59 PM.
Homework 2 grades have been published.
Homework 2 solution has been posted.
Homework 2 has been updated to fix a typo in Problem 2.
Lecture 4 has been updated.
Homework 1 grades have been published and the solution has been posted.
Homework 2 has been released and is due on Feb 8th, 11:59 PM.
Homework 1 has been updated, please download the new file.
Homework 1 has been released and is due on Jan 25th, 11:59 PM.
Instructor:
Prof. Yusu Wang (Office: HDSI 446)
email: yusuwang@ucsd.edu
TA:
Yashi Shukla (email: yshukla@ucsd.edu)
Jesse He (email: jeh020@ucsd.edu)
Libin Zhu (email: l5zhu@ucsd.edu)
Lectures:
Lecture slides will be posted below as well as on Canvas.
Topic 1: Dimensionality Reduction
Lecture 1: Introduction and review of SVD. Lecture notes here.
Lecture 2: Best rank-k approximation via SVD, Power method. Lecture notes here.
Lecture 3: Power method. The HITS algorithm for page search. Lecture notes here.
Lecture 4: Markov chains and the PageRank algorithm. Lecture notes here.
Topic 2: Clustering
Lecture 5: Center-based Clustering. Lecture notes here.
Lecture 6: k-means, hierarchical, and spectral clustering. Lecture notes here.
Lecture 7: More on spectral clustering. Lecture notes here.
Topic 3: Finding Similar Items
Lecture 8: Similarity search: shingling + intro to minHash. Lecture notes here.
Lecture 9: More on minHashing, intro to Locality Sensitive Hashing. Lecture notes here.
Lecture 10: Locality Sensitive Hashing. Lecture notes here.
Topic 4: Data Streaming
Lecture 11: Selection / filtering in a streaming manner. Lecture notes here.
Lecture 12: Counting distinct elements. Lecture notes here.
Lecture 13: Counting distinct elements and heavy hitters. Lecture notes here.
Lecture 14: Finish heavy hitters. Intro to supervised learning. Lecture notes here.
Topic 5: Supervised learning and online learning
Lecture 15: Perceptron algorithm: batch and online versions. Lecture notes here.
Lecture 16: Support Vector Machine (SVM). Lecture notes here.
Topic 6: Optimization
Lecture 17: Convex functions and convex optimization. Lecture notes here.
Lecture 18: Gradient Descent. Lecture notes here.
Lecture 19: Stochastic Gradient Descent. Lecture notes here.
Homeworks:
Homework 1: Download here (updated). Deadline: 01/25/2024, 11:59 PM. Download solution.
Homework 2: Download here (updated). Deadline: 02/08/2024, 11:59 PM. Download solution.
Homework 3: Download here. Deadline: 02/29/2024, 11:59 PM. Download solution.
Homework 4: Download here (updated). Deadline: 03/17/2024, 11:59 PM. Download solution.
Getting Started:
To get started in DSC 206, you'll need to set up accounts on a couple of websites.
Campuswire
We'll be using Campuswire as our course message board. Campuswire is similar to Piazza, but unlike Piazza, it does not sell student data to third parties. You should have received an invitation via email; if not, please contact a course staff member as soon as possible, since all course announcements will be made via Campuswire.
If you have a question about anything to do with the course (you're stuck on a homework problem, want clarification on logistics, or just have a general question about data science), you can make a post on Campuswire. We only ask that, if your question includes part or all of an answer, you make your post private so that others cannot see it. You can also post anonymously if you prefer.
Course staff will regularly check Campuswire and try to answer any questions that you have. You're also encouraged to answer a question asked by another student if you feel that you know the answer.
Gradescope
We'll be using Gradescope for homework submission and grading. Most assignments will be a mixture of math and coding, and the coding parts are usually autograded on Gradescope. You should have received an email invitation for Gradescope; if not, please let us know as soon as possible (preferably via Campuswire).
Canvas
For those of you who prefer it, lecture notes and homework will also be available on the course Canvas page.
Course Materials
No materials are required for this course; we'll use the lectures as the main resource. That said, here are some books that you might find useful.
Avrim Blum, John Hopcroft, and Ravindran Kannan, Foundations of Data Science
Jure Leskovec, Anand Rajaraman, and Jeff Ullman, Mining of Massive Datasets
Course Syllabus
You can download the syllabus here.
Office Hours:
Prof. Wang's office hours: Tuesdays 9:30am -- 10:30am PT in HDSI 446.
Yashi Shukla's office hours: Mondays 12pm -- 1pm PT on Zoom.
Jesse He's office hours: Thursdays 3pm -- 4pm PT in HDSI 432.
Libin Zhu's office hours: Wednesdays 4pm -- 5pm PT in HDSI 343.
Exams:
There will be two exams.
Midterm: Feb 15 (Thu), 2024, in class (no regular class on that day)
Final: March 21, 2024, 8am -- 10am (University Registrar scheduled final exam time).
Homework:
You are not allowed to use LLMs (e.g., ChatGPT) to help with your homework. You may discuss problems with your classmates, but you must write your solutions independently.
Slip Days: Each student has 2 slip days for the entire quarter. Each slip day allows you to submit a homework 24 hours after the deadline. Apart from slip days, no extensions will be granted.
Grading:
Final grades are computed from the following breakdown:
Homework: 32%
Midterm: 30%
Final Exam: 38%
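As a worked illustration of the breakdown above, the sketch below computes a weighted course score. It assumes each component is reported as a percentage from 0 to 100; the helper name `final_score` and the example scores are hypothetical, and this is not the official grading script (letter-grade cutoffs are not specified here). For example, 90% homework, 80% midterm, and 85% final give 0.32 × 90 + 0.30 × 80 + 0.38 × 85 = 85.1.

```python
# Weights from the grading breakdown (they sum to 1.0).
WEIGHTS = {"homework": 0.32, "midterm": 0.30, "final": 0.38}


def final_score(scores: dict[str, float]) -> float:
    """Return the weighted course score, assuming each component is a 0-100 percentage.

    Hypothetical helper for illustration only; not the official grading script.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component(s): {sorted(missing)}")
    # Weighted sum: each component score times its weight.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical example: 90% homework, 80% midterm, 85% final.
    print(round(final_score({"homework": 90, "midterm": 80, "final": 85}), 2))
```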
Announcements:
Tuesday, February 6th: class will be held via Zoom due to weather. Check Campuswire and Canvas for details.