Data Mining

CS 6720, Jan - May, 2020

Computer Science and Engineering

Indian Institute of Technology Madras

Overview

This course will seek to blend two aspects of data mining. On the one hand, we will focus on some established data mining problems and techniques. Currently, I plan to teach link analysis (e.g., PageRank), clustering, and social network analysis. On the other hand, we will also study various computational models, such as the classical RAM model, streaming, Massively Parallel Computation, the k-machine model, and other distributed computing models.

This course will err on the side of studying fewer topics in depth rather than attempting any form of comprehensive coverage. There will be an emphasis on algorithm design and mathematical analysis.

Prerequisites.

You must be comfortable with programming, algorithms, and data structures. Since this course will emphasize mathematical analysis, you must also be comfortable with proof techniques (especially in the context of algorithm design) and probability theory. If in doubt, email me. More details will be presented on the first day of class.

Course Details

Instructor: John Augustine (email: augustine at cse)

Lectures: Room CS34, L Slot (Thursday 2 - 3.15 PM and Friday 3.25 - 4.40 PM). First class meeting is on Thursday Jan 16.

Office Hours:

    • Thursday 3.30 - 4.30 PM, Friday 4.45 - 5.45 PM (basically, an hour right after each lecture).
    • Virtual Tutorial Hours. I will frequently set aside time to answer questions about assignments. During that time, I will make myself available on the instant messaging service discussed below. When possible, I will also make myself physically available at my office.
    • Otherwise, by appointment via email.

Textbooks: We will follow two textbooks. The latest versions are freely available on their respective web pages.

Online Interactions: You have the following options to interact online in this course.

    • Course web page. I will be updating material at regular intervals.
    • Email me directly for things that pertain to you (marks, etc.).
    • Instant Messaging: Post in the room meant for this course, hosted on the Matrix server (address: #2020-CS6720:matrix.org ). The easiest way to do this is to create an account at Riot and send a message to me (address: @j.e.augustine:matrix.org) with your name and roll number. Then, I will add you to the course.
    • Moodle. Assignments will be posted on Moodle, and you will also have to submit them via Moodle. If you are not enrolled in the course on Moodle, please fill in this form: https://forms.gle/oPGqAbVyiFDr3CY68

Tentative Topics

Problems and Techniques

    • Link Analysis (PageRank, etc.)
    • Social Network Analysis
    • Clustering (k-means, etc.)

Computational Models

    • Streaming
    • Distributed Computing

Tentative Grading Scheme

I reserve the right to modify the grading scheme based on final class size and availability of TAs. Update on Feb 6: Quiz 1 is worth 15 (down from 20) and the final is now worth 30 (up from 25).

    • Two quizzes (15 + 20 = 35 marks)
    • Final exam (30 marks)
    • Assignments (5 marks times three = 15 marks)
    • Short quizzes (2.5 marks times (best 8 out of 10) = 20 marks)
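
As a quick check of the arithmetic, the scheme above can be sketched in a few lines of Python. This is only an illustration: the function name `total_marks` and its parameters are hypothetical, and it assumes each component is recorded in marks (quiz 1 out of 15, quiz 2 out of 20, the final out of 30, each assignment out of 5, each short quiz out of 2.5).

```python
def total_marks(quiz1, quiz2, final_exam, assignments, short_quizzes):
    """Combine component marks into a course total (hypothetical helper).

    Assumed scales: quiz1 out of 15, quiz2 out of 20, final_exam out of 30,
    each assignment out of 5, each short quiz out of 2.5.
    """
    # Only the best 8 of the short quizzes count towards the total.
    best_short = sorted(short_quizzes, reverse=True)[:8]
    return quiz1 + quiz2 + final_exam + sum(assignments) + sum(best_short)


# Full marks in every component: 35 + 30 + 15 + 20.
maximum = total_marks(15, 20, 30, [5, 5, 5], [2.5] * 10)
```

With full marks everywhere, the components add up to 100, and missing up to two short quizzes does not lower the total.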
Lectures