DSAA5002
Data Mining and Knowledge Discovery in Data Science (Spring 2024)
Instructor: LI, Jia (office hour: Wed, 3:30 PM - 4:30 PM)
TA: Chen, Xiaolong & Zhao, Haihong & Li, Yuhan
Lecture Time: Mon 3:00 PM - 5:50 PM
Venue: W1 101
Introduction
This course covers basic algorithms in data mining and data science. Topics include data preprocessing, classification (decision tree classifiers, support vector machines, ensembles), clustering (k-means, hierarchical clustering, spectral clustering), anomaly detection, graph analytics, association analysis, PageRank, dimensionality reduction, and the EM algorithm, among others.
Announcements
Feb. 19: The lecture will resume on Feb. 19.
Jan. 22: Hi all.
Grading
Mid-term exam: 50%
Project: 50%
Reference and Handouts
Reference books and courses for extra reading:
[1] Introduction to Data Mining. Pang-Ning Tan, Michael Steinbach, and Vipin Kumar.
[2] Data Mining: Concepts and Techniques. Jiawei Han, Micheline Kamber, and Jian Pei.
[3] Foundations of Data Science. Avrim Blum, John Hopcroft, and Ravindran Kannan.
[4] Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeff Ullman.
[5] CMPSCI 689: Machine Learning. Subhransu Maji.
Handout 1: Introduction & Data Preprocessing
Handout 2: Decision Tree
Handout 3: Linear Models (Andrew Ng's Notes)
Handout 5: K-means
Handout 6: Hierarchical Clustering
Handout 7: Graph Analytics
Handout 8: Spectral Clustering
Handout 9: Apriori
Mid-term Exam (April 1)
Handout 10: Anomaly Detection
Handout 11: PageRank
Handout 12: HITS and SimRank
Handout 13: PCA
Handout 14: EM
Handout 15: HMM
Exercises
Exam
See this as a demo.
Project
Each student chooses a research topic related to the course material, e.g., data preprocessing, classification, clustering, or anomaly detection. The report should follow the ACM format with a strict 6-page limit, including references and appendix; for reference, see https://kdd.org/kdd2021/calls/view/call-for-research-track-papers. Here are some tips:
The report should contain at least an introduction, related work, methodology, and experiments. A theoretical derivation is not required but is encouraged.
Use concise and clear language.
Clearly state how your work differs from previous work.
If you include a theoretical derivation, check your assumptions and make sure they are robust.
Resources
Anomaly detection
Outlier Detection and Description (KDD'21 workshop)
Graph analytics
Tools for large graph mining: structure and diffusion (WWW'08 tutorial)
Graph learning publications