ESM4111: Data Analytics and Machine Learning (Fall 2022)
Overview
This course provides an elementary introduction to basic concepts and techniques of machine learning and data mining for industrial data analytics. Topics covered include data exploration, classification, cluster analysis, anomaly detection, and association analysis.
* This course is equivalent to "ESM5204: Industrial Artificial Intelligence". Do not sign up if you have already taken that course.
** Prerequisites: Linear Algebra, Applied Statistics II, Data Mining, and Programming for Data Science
*** You must have taken the prerequisite courses (or equivalent) before taking this course.
General Information
Class Time: Thu 15:00~17:45
Location: 26421 (Engineering Building 2)
Language: English (except Q&A)
Instructor
Prof. Seokho Kang
Office: 27408B (Engineering Building 2)
E-mail: s.kang@skku.edu
Office Hours: by appointment via e-mail
Teaching Assistant
Ms. Jisu Yoo
Office: 27407 - Data Mining Lab. (Engineering Building 2)
E-mail: yoo171084@gmail.com
References
Pang-Ning Tan, Michael Steinbach, Anuj Karpatne & Vipin Kumar, Introduction to Data Mining (2nd ed.), Pearson, 2019. (https://www-users.cs.umn.edu/~kumar001/dmbook)
Jiawei Han, Micheline Kamber & Jian Pei, Data Mining: Concepts and Techniques (3rd ed.), Morgan Kaufmann Publishers, 2011. (http://hanj.cs.illinois.edu/bk3)
Evaluation
Attendance (10%)
Assignments (10%)
Presentation (10%)
Mid-term Exam (30%)
Final Exam (40%)
Total (100% + a)
Schedule (Subject to Change)
Syllabus [download]
Lecture Notes + Further Readings (will be updated soon)
* Most of the lecture slides are adapted from Pang-Ning Tan’s materials, which are available at https://www-users.cs.umn.edu/~kumar001/dmbook
** Further readings are not mandatory but strongly recommended.
[9/1] Course Introduction [download]
Liao, S. H., Chu, P. H., & Hsiao, P. Y. (2012). Data mining techniques and applications–A decade review from 2000 to 2011. Expert Systems with Applications, 39(12), 11303-11311.
Wu, X., et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1-37.
[9/8] 1. Data [download]
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157-1182.
Lin, W. C., & Tsai, C. F. (2019). Missing value imputation: a review and analysis of the literature (2006–2017). Artificial Intelligence Review, 1-23.
[9/15] 2. Classification: Basic Concepts [download]
Kuhn, M., & Johnson, K. (2013). Over-fitting and model tuning. In Applied Predictive Modeling (pp. 61-92). Springer, New York, NY.
Domingos, P. (2000). A unified bias-variance decomposition and its applications. In Proceedings of International Conference on Machine Learning (pp. 231-238).
Cawley, G. C., & Talbot, N. L. (2010). On over-fitting in model selection and subsequent selection bias in performance evaluation. Journal of Machine Learning Research, 11(Jul), 2079-2107.
[9/22] 3. Classification: Algorithms - Part 1 [download]
Bottou, L., & Vapnik, V. (1992). Local learning algorithms. Neural Computation, 4(6), 888-900.
Garcia, S., Derrac, J., Cano, J., & Herrera, F. (2012). Prototype selection for nearest neighbor classification: Taxonomy and empirical study. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3), 417-435.
[9/29] 4. Classification: Algorithms - Part 2 [download]
Burges, C. J. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 121-167.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117.
[10/6] 5. Classification: Algorithms - Part 3 [download]
Rokach, L. (2010). Ensemble-based classifiers. Artificial Intelligence Review, 33(1-2), 1-39.
Krawczyk, B. (2016). Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence, 5(4), 221-232.
Kang, S. (2020). Model validation failure in class imbalance problems. Expert Systems with Applications, 113190.
Santos, M. S., Soares, J. P., Abreu, P. H., Araujo, H., & Santos, J. (2018). Cross-validation for imbalanced datasets: Avoiding overoptimistic and overfitting approaches. IEEE Computational Intelligence Magazine, 13(4), 59-76.
[10/13] Review Session (Lectures 1-5, in Korean, optional)
[10/20] Mid-Term Exam
[10/27] 6. Cluster Analysis - Part 1 [download]
Jain, A. K. (2010). Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8), 651-666.
Dhillon, I. S., Guan, Y., & Kulis, B. (2004). Kernel k-means: spectral clustering and normalized cuts. In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 551-556).
[11/3] 7. Cluster Analysis - Part 2 [download]
Kriegel, H. P., Kröger, P., Sander, J., & Zimek, A. (2011). Density‐based clustering. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(3), 231-240.
Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395-416.
Vega-Pons, S., & Ruiz-Shulcloper, J. (2011). A survey of clustering ensemble algorithms. International Journal of Pattern Recognition and Artificial Intelligence, 25(3), 337-372.
[11/10] 8. Anomaly Detection [download]
Tax, D. M. (2001). One-class classification. PhD thesis, Delft University of Technology.
Tax, D. M., & Duin, R. P. (2004). Support vector data description. Machine Learning, 54(1), 45-66.
Ruff, L., et al. (2021). A unifying review of deep and shallow anomaly detection. Proceedings of the IEEE, 109(5), 756-795.
[11/17] 9. Association Analysis [download]
Han, J., Cheng, H., Xin, D., & Yan, X. (2007). Frequent pattern mining: current status and future directions. Data Mining and Knowledge Discovery, 15(1), 55-86.
Geng, L., & Hamilton, H. J. (2006). Interestingness measures for data mining: A survey. ACM Computing Surveys, 38(3), 9-es.
[11/24] Review Session (Lectures 6-9, in Korean, optional)
[12/1] Final Exam
Supplementary Materials
[12/8] Special Topics in Data Mining - Invited Talk by Hyungu Kang (Ph.D. Student, SKKU)
Presentation
Students are required to give a 10-minute paper presentation in English, delivered online via icampus. The objective is to share a journal paper or conference paper (it does not have to be your own) on the application of data mining in the student's own research field. Students are responsible for scheduling their presentation using the schedule form, which will be released on icampus after class in the 2nd week. The presentation topic should be relevant to the lecture topic of the same week. The presentation video and materials should be submitted to the instructor by e-mail at least one day before the presentation. The video will then be uploaded to icampus so that other students can watch and evaluate it as a class activity for that week.
Assignments
All assignments should be submitted to icampus by midnight of the due date. Late submissions will NOT be accepted.
[A1] Self-introduction (due date: 9/15) - icampus
[A2] Paper presentation schedule (due date: 9/15) - icampus
[A3] Research proposal (due date: 12/1) - icampus
Academic Integrity
Students are responsible for maintaining high standards of academic integrity in all of their class activities. Cheating or plagiarism in any form will not be tolerated. Any violation of academic integrity is a serious offense and is therefore subject to an appropriate sanction or penalty.