TICS411 - Data Mining

General Information (1/2023)

Classroom: -

Day: Wednesday from 15:30 to 18:10 (Data Mining - Section 2)

Load: 45 hours (6 credits)

Office: UAI/FIC, Santiago, D-322. Instructor: Daniel Leite (daniel.furtado@uai.cl)

Office hours: By email (any day/time) or by Zoom meeting (Fridays from 14:00 to 18:00; email me to schedule)

Course Overview

01 – Introduction to Data Mining

02 – Data types, data quality, preliminary concepts

03 – Data pre-processing: features, similarities, anomalies, overfitting, validation, evaluation, hyper-parameters

04 – Unsupervised learning: crisp, probabilistic and fuzzy clustering methods (KM, FCM, SOM, DBSCAN, …)

05 – Supervised learning: rule-based classifiers, nearest neighbor, naive Bayes, logistic regression, neural networks, SVM, ...

06 – Ensemble learning: bagging, boosting, random forest, ...

07 – Potential topics: PCA, recursion, auto-encoders, online learning, etc.


Objectives of the Course

01 – Introduce foundations and concepts of data mining and machine learning

02 – Become familiar with traditional unsupervised and supervised data mining frameworks

03 – Discuss issues in data pre-processing, parameter optimization, and model generalization and validation (“best practices”)

04 – Be actively involved in two hands-on projects

05 – Solve problems in the context of data-driven modeling for classification and prediction using Python or R


Approximate Timeline

Week 1: Introduction to Data Mining

Week 2: Issues in data pre-processing and attribute selection

Week 3: Exploratory data analysis. Introduction to unsupervised learning

Week 4: Partition-based clustering: K-Means

Week 5: Hard density-based clustering: DBSCAN

Week 6: Hierarchical agglomerative clustering. Validation indices. Topics: Fuzzy C-Means, Mountain and Subtractive clustering

Week 7: Midterm Test (T1)

Week 8: Introduction to supervised learning (predictive modeling)

Week 9: K-Nearest Neighbors, and Naive Bayes classifier

Week 10: Decision Trees

Week 11: Ensemble Methods

Week 12: Multi-Layer Perceptron neural network

Week 13: Logistic Regression or Support Vector Machines

Week 14: Synthesis of the predictive models. Question-and-answer session for the final test

Week 15: Final Test (T2)

Week 16: Exam (E)

Evaluation and Report Due Dates

----

T1: Midterm Test (30%) - April 12, 2023

P1: Project 1 - Unsupervised methods (20%) - April 16, 2023

T2: Final Test (30%) - June 14, 2023

P2: Project 2 - Supervised methods (20%) - June 18, 2023

E: Exam - June 28, 2023 (Not mandatory)

----

After a linear (straightforward) conversion from the 0%-100% range to the 1.0-7.0 scale, the final score is:

FinalScore = 0.3 T1 + 0.2 P1 + 0.3 T2 + 0.2 P2 

Approved if (FinalScore >= 4.0) and (Class Attendance >= 70%)

If (FinalScore < 4.0)

The score on a written Exam (E) covering all course content replaces the lower of the two test scores (T1 or T2). A new FinalScore is then calculated, which must be greater than or equal to 50% for approval

PS. The Exam is not mandatory, and anyone may choose to take it. However, students who take it will necessarily have their final score replaced
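
As a rough illustration of the grading rules above, here is a minimal Python sketch. It assumes the linear conversion maps 0% to 1.0 and 100% to 7.0, so that 50% corresponds to 4.0. All function names are hypothetical, and the sample percentages are made up for the example. This is not the official grade calculator.

```python
def to_grade(percent: float) -> float:
    """Map a 0-100% score onto the 1.0-7.0 scale.

    Assumption: the linear conversion sends 0% to 1.0 and 100% to 7.0.
    """
    return 1.0 + 6.0 * percent / 100.0


def final_score(t1: float, p1: float, t2: float, p2: float) -> float:
    """FinalScore = 0.3 T1 + 0.2 P1 + 0.3 T2 + 0.2 P2 (inputs on the 1.0-7.0 scale)."""
    return 0.3 * t1 + 0.2 * p1 + 0.3 * t2 + 0.2 * p2


def final_score_with_exam(t1: float, p1: float, t2: float, p2: float,
                          exam: float) -> float:
    """Recompute FinalScore with the Exam replacing the lower test score.

    The replacement is unconditional: it applies even if the Exam
    score is lower than the test score it replaces.
    """
    if t1 <= t2:
        t1 = exam
    else:
        t2 = exam
    return final_score(t1, p1, t2, p2)


def is_approved(score: float, attendance_percent: float) -> bool:
    """Approved if FinalScore >= 4.0 and class attendance >= 70%."""
    return score >= 4.0 and attendance_percent >= 70.0


# Made-up example: tests at 40% and 45%, projects at 60% and 55%.
t1, p1, t2, p2 = (to_grade(x) for x in (40, 60, 45, 55))
before = final_score(t1, p1, t2, p2)

# The student takes the Exam and scores 70%; it replaces T1, the lower test.
after = final_score_with_exam(t1, p1, t2, p2, to_grade(70))

print(round(before, 2), is_approved(before, 85))
print(round(after, 2), is_approved(after, 85))
```

In this example the first FinalScore falls below 4.0, so the student would take the Exam. Because the conversion is linear, applying the weights to percentages and converting afterwards gives the same result as converting each score first.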

Main Textbooks

[1] Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar. Introduction to Data Mining. 2nd edition, Pearson, 2019

[2] Charu Aggarwal. Data Mining: The Textbook. Springer, 2015

Other Textbooks

[3] Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2006


Datasets for Downloading

Kaggle, UCI Machine Learning Repository, Google Dataset Search


Supporting Material

Class Notes (available below)


1/2023 (Updated class notes will be posted below gradually throughout the course)

1/2023 - Attachments: General Instructions

DataMining - 00.pdf
TICS 411 - About Project 1.pdf
TICS 411 - About Project 2.pdf

1/2023 - Attachments: Part 1 (Unsupervised Learning: Data Clustering)

DataMining - 01.pdf

1/2023 - Attachments: Part 2 (Supervised Learning: Classification and Prediction)

...


2/2022 (The class notes and tests below will be deleted in a week...)

2/2022 - Attachments: Part 1 (Unsupervised Learning)

DM - 00.pdf
DM - 01.pdf
DM - 02.pdf
DM - 03.pdf
DM - 04.pdf
DM - 05.pdf
DM - 06.pdf
TICS411 - Test 1.pdf
TICS411 - Scores - Test 1.pdf

2/2022 - Attachments: Part 2 (Supervised Learning)

DM - 07.pdf
DM - 08.pdf
DM - 09.pdf
DM - 10.pdf
DM - 11.pdf
TICS411 - Test 2.pdf
TICS 411 - Scores - Test 2.pdf
TICS 411 - Exam.pdf
TICS 411 - Answers.pdf