ESM4111: Data Analytics and Machine Learning (Fall 2022)

Overview

This course provides an elementary introduction to basic concepts and techniques of machine learning and data mining for industrial data analytics. Topics covered include data exploration, classification, cluster analysis, anomaly detection, and association analysis.

* This course is equivalent to "ESM5204: Industrial Artificial Intelligence". Do not sign up if you have already taken it.
** Prerequisites: Linear Algebra, Applied Statistics II, Data Mining, and Programming for Data Science
*** You must have taken the prerequisite courses (or equivalent) before taking this course.

General Information

    • Class Time: Thu 15:00~17:45

    • Location: 26421 (Engineering Building 2)

    • Language: English (except Q&A)

Instructor

    • Prof. Seokho Kang

    • Office: 27408B (Engineering Building 2)

    • E-mail: s.kang@skku.edu

    • Office Hours: by appointment via e-mail

Teaching Assistant

    • Ms. Jisu Yoo

    • Office: 27407 - Data Mining Lab. (Engineering Building 2)

    • E-mail: yoo171084@gmail.com

References

Evaluation

    • Attendance (10%)

    • Assignments (10%)

    • Presentation (10%)

    • Mid-term Exam (30%)

    • Final Exam (40%)

    • Total (100% + α)

Schedule (Subject to Change)

Lecture Notes + Further Readings (will be updated soon)

* Most of the lecture slides are adapted from Pang-Ning Tan’s materials, which are available at https://www-users.cs.umn.edu/~kumar001/dmbook
** Further readings are not mandatory but strongly recommended.

    • [9/1] Course Introduction [download]

        1. Liao, S. H., Chu, P. H., & Hsiao, P. Y. (2012). Data mining techniques and applications–A decade review from 2000 to 2011. Expert Systems with Applications, 39(12), 11303-11311.

        2. Wu, X., et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1-37.

    • [9/8] 1. Data [download]

        1. Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157-1182.

        2. Lin, W. C., & Tsai, C. F. (2019). Missing value imputation: a review and analysis of the literature (2006–2017). Artificial Intelligence Review, 1-23.

    • [9/15] 2. Classification: Basic Concepts [download]

        1. Kuhn, M., & Johnson, K. (2013). Over-fitting and model tuning. In Applied Predictive Modeling (pp. 61-92). Springer, New York, NY.

        2. Domingos, P. (2000). A unified bias-variance decomposition and its applications. In Proceedings of International Conference on Machine Learning (pp. 231-238).

        3. Cawley, G. C., & Talbot, N. L. (2010). On over-fitting in model selection and subsequent selection bias in performance evaluation. Journal of Machine Learning Research, 11(Jul), 2079-2107.

    • [9/22] 3. Classification: Algorithms - Part 1 [download]

        1. Bottou, L., & Vapnik, V. (1992). Local learning algorithms. Neural Computation, 4(6), 888-900.

        2. Garcia, S., Derrac, J., Cano, J., & Herrera, F. (2012). Prototype selection for nearest neighbor classification: Taxonomy and empirical study. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3), 417-435.

    • [9/29] 4. Classification: Algorithms - Part 2 [download]

        1. Burges, C. J. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 121-167.

        2. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.

        3. Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117.

    • [10/6] 5. Classification: Algorithms - Part 3 [download]

        1. Rokach, L. (2010). Ensemble-based classifiers. Artificial Intelligence Review, 33(1-2), 1-39.

        2. Krawczyk, B. (2016). Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence, 5(4), 221-232.

        3. Kang, S. (2020). Model validation failure in class imbalance problems. Expert Systems with Applications, 113190.

        4. Santos, M. S., Soares, J. P., Abreu, P. H., Araujo, H., & Santos, J. (2018). Cross-validation for imbalanced datasets: Avoiding overoptimistic and overfitting approaches. IEEE Computational Intelligence Magazine, 13(4), 59-76.

    • [10/13] Review Session (Lectures 1-5, in Korean, optional)

    • [10/20] Mid-Term Exam

    • [10/27] 6. Cluster Analysis - Part 1 [download]

        1. Jain, A. K. (2010). Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8), 651-666.

        2. Dhillon, I. S., Guan, Y., & Kulis, B. (2004). Kernel k-means: spectral clustering and normalized cuts. In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 551-556).

    • [11/3] 7. Cluster Analysis - Part 2 [download]

        1. Kriegel, H. P., Kröger, P., Sander, J., & Zimek, A. (2011). Density‐based clustering. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(3), 231-240.

        2. Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395-416.

        3. Vega-Pons, S., & Ruiz-Shulcloper, J. (2011). A survey of clustering ensemble algorithms. International Journal of Pattern Recognition and Artificial Intelligence, 25(3), 337-372.

    • [11/10] 8. Anomaly Detection [download]

        1. Tax, D. M. (2001). One-class classification. PhD thesis, Delft University of Technology.

        2. Tax, D. M., & Duin, R. P. (2004). Support vector data description. Machine Learning, 54(1), 45-66.

        3. Ruff, L., et al. (2021). A unifying review of deep and shallow anomaly detection. Proceedings of the IEEE, 109(5), 756-795.

    • [11/17] 9. Association Analysis [download]

        1. Han, J., Cheng, H., Xin, D., & Yan, X. (2007). Frequent pattern mining: current status and future directions. Data Mining and Knowledge Discovery, 15(1), 55-86.

        2. Geng, L., & Hamilton, H. J. (2006). Interestingness measures for data mining: A survey. ACM Computing Surveys, 38(3), 9-es.

    • [11/24] Review Session (Lectures 6-9, in Korean, optional)

    • [12/1] Final Exam

Supplementary Materials

    • [12/8] Special Topics in Data Mining - Invited Talk by Hyungu Kang (Ph.D. Student, SKKU)

Presentation

Students are required to give a 10-minute paper presentation online via icampus (in English). The objective is to share a journal paper or conference proceedings paper (it does not have to be the student's own) devoted to the application of data mining in the student's own research field. Students are responsible for scheduling their presentation using the schedule form, which will be released on icampus after class in the 2nd week. The presentation topic should be relevant to the lecture topic of the same week. The presentation video and materials should be submitted to the instructor via e-mail at least one day before the presentation. The video will be uploaded to icampus so that other students can watch and evaluate it as a class activity for that week.

Assignments

All assignments should be submitted to icampus by midnight of the due date. Late submissions will NOT be accepted.

    • [A1] Self-introduction (due date: 9/15) - icampus

    • [A2] Paper presentation schedule (due date: 9/15) - icampus

    • [A3] Research proposal (due date: 12/1) - icampus

Academic Integrity

Students are responsible for maintaining high standards of academic integrity in all of their class activities. Cheating or plagiarism in any form will not be tolerated. Any violation of academic integrity is a serious offense and is therefore subject to an appropriate sanction or penalty.