DSAA5002

Data Mining and Knowledge Discovery in Data Science (Spring 2024)

Instructor: LI, Jia (office hour: Wed, 3:30 PM - 4:30 PM)

TAs: Chen, Xiaolong & Zhao, Haihong & Li, Yuhan

Lecture Time: Mon 3:00 PM - 5:50 PM

Venue: W1 101


Introduction

This course covers basic algorithms in data mining and data science, including data preprocessing, classification (decision tree classifiers, support vector machines, ensembles), clustering (k-means, hierarchical clustering, spectral clustering), anomaly detection, graph analytics, association analysis, PageRank, dimensionality reduction, and the EM algorithm.

Announcements

Feb. 19   Lectures will resume on Feb. 19.

Jan. 22   hi all

Grading 

Midterm exam: 50%

Project: 50%

Reference and Handouts

Reference books and courses for extra reading:

[1] Introduction to Data Mining. Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. 

[2] Data Mining: Concepts and Techniques. Jiawei Han, Micheline Kamber, and Jian Pei. 

[3] Foundations of Data Science. Avrim Blum, John Hopcroft, and Ravindran Kannan.

[4] Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeff Ullman.  

[5] CMPSCI 689: Machine Learning. Subhransu Maji.



Exercises

Exercise list1   Solution1

Exercise list2

Exercise list3   Solution3



Exam

See this as a demo.



Project

Each student chooses a research topic related to the course material, e.g., data preprocessing, classification, clustering, or anomaly detection. The report should follow the ACM format with a strict 6-page limit, including references and appendix; see https://kdd.org/kdd2021/calls/view/call-for-research-track-papers for reference. Here are some tips:



Resources


Anomaly detection

Outlier Detection and Description (KDD'21 workshop) 


Graph analytics

Tools for large graph mining: structure and diffusion (WWW'08 tutorial)


Graph learning publications

Graph based deep learning literature