TICS411 - Data Mining
General Information (1/2023)
Classroom: -
Day: Wednesday from 15:30 to 18:10 (Data Mining - Section 2)
Load: 45 hours (6 credits)
My room: UAI/FIC, Santiago, D-322. Daniel Leite (daniel.furtado@uai.cl)
Assistance from me: by email (any day/time) and by Zoom meeting (Fridays from 14:00 to 18:00; send me an email to schedule)
Course Overview
01 – Introduction to Data Mining
02 – Data types, data quality, preliminary concepts
03 – Data pre-processing: features, similarities, anomalies, overfitting, validation, evaluation, hyper-parameters
04 – Unsupervised learning: crisp, probabilistic and fuzzy clustering methods (KM, FCM, SOM, DBSCAN, …)
05 – Supervised learning: rule-based classifiers, nearest neighbor, naive Bayes, logistic regression, neural networks, SVM, ...
06 – Ensemble learning: bagging, boosting, random forest, ...
07 – Potential topics: PCA, recursion, auto-encoders, online learning, etc.
Objectives of the Course
01 – Introduce foundations and concepts of data mining and machine learning
02 – Students should become familiar with traditional unsupervised and supervised data mining frameworks
03 – Discuss issues in data pre-processing, parameter optimization, and model generalization and validation – “best practices”
04 – Be actively involved in two hands-on projects
05 – Solve problems in the context of data-driven modeling for classification and prediction using Python or R
Approximate Timeline
Week 1: Introduction to Data Mining
Week 2: Issues in data pre-processing and attribute selection
Week 3: Exploratory data analysis. Introduction to unsupervised learning
Week 4: Partition-based clustering: K-Means
Week 5: Hard density-based clustering: DBSCAN
Week 6: Hierarchical agglomerative clustering. Validation indices. Topics: Fuzzy C-Means, Mountain and Subtractive clustering
Week 7: Midterm Test (T1)
Week 8: Introduction to supervised learning (predictive modeling)
Week 9: K-Nearest Neighbors, and Naive Bayes classifier
Week 10: Decision Trees
Week 11: Ensemble Methods
Week 12: Multi-Layer Perceptron neural network
Week 13: Logistic Regression or Support Vector Machines
Week 14: Synthesis of the predictive models. Questions and answers for the final test
Week 15: Final Test (T2)
Week 16: Exam (E)
Evaluation and Report Due Dates
----
T1: Midterm Test (30%) - April 12, 2023
P1: Project 1 - Unsupervised methods (20%) - April 16, 2023
T2: Final Test (30%) - June 14, 2023
P2: Project 2 - Supervised methods (20%) - June 18, 2023
E: Exam - June 28, 2023 (Not mandatory)
----
After a linear (straightforward) conversion from the 0%-100% range to the 1.0-7.0 scale, the final score is:
FinalScore = 0.3 T1 + 0.2 P1 + 0.3 T2 + 0.2 P2
Approved if (FinalScore >= 4.0) and (Class Attendance >= 70%)
If (FinalScore < 4.0)
The score on a Written Exam (E) – covering all course content – replaces the lower of the two test scores (T1 or T2). A new FinalScore is calculated, which should be greater than or equal to 50% (4.0 on the 1.0-7.0 scale) for approval
PS. The Exam is not mandatory. Anyone can decide to take the Exam. However, students who decide to take it will necessarily have their final score replaced
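The grading rules above can be sketched in Python as follows. This is only an illustration, not an official calculator: the function names are hypothetical, the linear conversion is assumed to map 0% to 1.0 and 100% to 7.0, and no rounding is applied, since the rules do not specify any.

```python
# Weights taken from the FinalScore formula above.
WEIGHTS = {"T1": 0.3, "P1": 0.2, "T2": 0.3, "P2": 0.2}


def to_grade(percent):
    """Map a 0-100% score linearly onto the 1.0-7.0 scale (assumed mapping)."""
    return 1.0 + 6.0 * percent / 100.0


def final_score(grades):
    """Weighted sum of T1, P1, T2 and P2, all on the 1.0-7.0 scale."""
    return sum(WEIGHTS[name] * grades[name] for name in WEIGHTS)


def final_score_with_exam(grades, exam):
    """Recompute FinalScore with the Exam replacing the lower of T1 and T2."""
    lower_test = min(("T1", "T2"), key=lambda name: grades[name])
    updated = dict(grades)
    updated[lower_test] = exam  # replaced even if the exam score is lower
    return final_score(updated)


def approved(score, attendance_percent):
    """Approval rule: FinalScore >= 4.0 and class attendance >= 70%."""
    return score >= 4.0 and attendance_percent >= 70.0


if __name__ == "__main__":
    # Hypothetical student: percentages for each evaluation.
    percents = {"T1": 40.0, "P1": 70.0, "T2": 55.0, "P2": 65.0}
    grades = {name: to_grade(p) for name, p in percents.items()}
    score = final_score(grades)
    print(f"FinalScore: {score:.2f}, approved: {approved(score, 80.0)}")
    # If the student takes the Exam and scores 60%:
    new_score = final_score_with_exam(grades, to_grade(60.0))
    print(f"FinalScore after Exam: {new_score:.2f}")
```

Under the assumed conversion, 50% maps to exactly 4.0, which is why the 50% threshold and the 4.0 threshold describe the same approval line.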
Main Textbooks
[1] Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar. Introduction to Data Mining. 2nd edition, Pearson, 2019
[2] Charu Aggarwal. Data Mining: The Textbook. Springer, 2015
Other Textbooks
[3] Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2006
Datasets for Downloading
Kaggle, UCI Machine Learning Repository, Google Dataset Search
Supporting Material
Class Notes (available below)
1/2023 (Updated class notes will gradually be made available below throughout the course)
1/2023 - Attachments: General Instructions
1/2023 - Attachments: Part 1 (Unsupervised Learning: Data Clustering)
1/2023 - Attachments: Part 2 (Supervised Learning: Classification and Prediction)
...
2/2022 (The class notes and tests below will be deleted in a week...)
2/2022 - Attachments: Part 1 (Unsupervised Learning)
2/2022 - Attachments: Part 2 (Supervised Learning)