This course introduces the basic concepts and techniques of machine learning and data mining for industrial data analytics. Topics covered include data exploration, classification, cluster analysis, anomaly detection, and association analysis.
* This course is equivalent to "ESM5204: Industrial Artificial Intelligence". Do not enroll if you have already taken that course.
** Prerequisites: Linear Algebra, Applied Statistics I & II, Data Mining (or equivalent)
*** You must have completed the prerequisite courses (or their equivalents) before taking this course.
Class Time: Wed 15:00~18:00
Location: online – icampus (https://icampus.skku.edu/)
Language: English (except Q&A)
Prof. Seokho Kang
Office: 27408B (Engineering Building 2)
E-mail: s.kang@skku.edu
Office Hours: by appointment via e-mail
Mr. Hyungu Kang and Ms. Sunbin Lee
Office: 27407 - Data Mining Lab. (Engineering Building 2)
E-mail: via_monoh@naver.com and gozjd1234@naver.com
Pang-Ning Tan, Michael Steinbach, Anuj Karpatne & Vipin Kumar, Introduction to Data Mining (2nd ed.), Pearson, 2019. (https://www-users.cs.umn.edu/~kumar001/dmbook)
Jiawei Han, Micheline Kamber & Jian Pei, Data Mining: Concepts and Techniques (3rd ed.), Morgan Kaufmann Publishers, 2011. (http://hanj.cs.illinois.edu/bk3)
Attendance (10%)
Assignments (10%)
Presentation (10%)
Mid-term Exam (30%)
Final Exam (40%)
Total (100% + a)
Note: ALL classes will be conducted online using icampus.
* Most of the slides are adapted from Pang-Ning Tan’s materials, which are available at https://www-users.cs.umn.edu/~kumar001/dmbook
[9/1] Course Introduction
Liao, S. H., Chu, P. H., & Hsiao, P. Y. (2012). Data mining techniques and applications–A decade review from 2000 to 2011. Expert Systems with Applications, 39(12), 11303-11311.
Wu, X., et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1-37.
[9/8] 1. Data
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157-1182.
Lin, W. C., & Tsai, C. F. (2019). Missing value imputation: a review and analysis of the literature (2006–2017). Artificial Intelligence Review, 1-23.
[9/15] 2. Classification: Basic Concepts
Kuhn, M., & Johnson, K. (2013). Over-fitting and model tuning. In Applied Predictive Modeling (pp. 61-92). Springer, New York, NY.
Domingos, P. (2000). A unified bias-variance decomposition and its applications. In Proceedings of International Conference on Machine Learning (pp. 231-238).
Cawley, G. C., & Talbot, N. L. (2010). On over-fitting in model selection and subsequent selection bias in performance evaluation. Journal of Machine Learning Research, 11(Jul), 2079-2107.
[9/22] 3. Classification: Algorithms - Part 1
Bottou, L., & Vapnik, V. (1992). Local learning algorithms. Neural computation, 4(6), 888-900.
Garcia, S., Derrac, J., Cano, J., & Herrera, F. (2012). Prototype selection for nearest neighbor classification: Taxonomy and empirical study. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3), 417-435.
[9/29] 4. Classification: Algorithms - Part 2
Burges, C. J. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 121-167.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117.
[10/6] 5. Classification: Algorithms - Part 3
Rokach, L. (2010). Ensemble-based classifiers. Artificial Intelligence Review, 33(1-2), 1-39.
Krawczyk, B. (2016). Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence, 5(4), 221-232.
Kang, S. (2020). Model validation failure in class imbalance problems. Expert Systems with Applications, 113190.
Santos, M. S., Soares, J. P., Abreu, P. H., Araujo, H., & Santos, J. (2018). Cross-validation for imbalanced datasets: Avoiding overoptimistic and overfitting approaches. IEEE Computational Intelligence Magazine, 13(4), 59-76.
Hand, D. J. (2006). Classifier technology and the illusion of progress. Statistical Science, 1-14.
[10/13] Review Session (Lectures 1-5, in Korean, optional)
[10/20] Mid-Term Exam (online, icampus)
[10/27] 6. Cluster Analysis - Part 1
Jain, A. K. (2010). Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8), 651-666.
Dhillon, I. S., Guan, Y., & Kulis, B. (2004). Kernel k-means: spectral clustering and normalized cuts. In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data mining (pp. 551-556).
[11/3] 7. Cluster Analysis - Part 2
Kriegel, H. P., Kröger, P., Sander, J., & Zimek, A. (2011). Density‐based clustering. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(3), 231-240.
Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395-416.
Vega-Pons, S., & Ruiz-Shulcloper, J. (2011). A survey of clustering ensemble algorithms. International Journal of Pattern Recognition and Artificial Intelligence, 25(3), 337-372.
[11/10] 8. Anomaly Detection
Tax, D. M. (2001). One-class classification. PhD thesis, Delft University of Technology.
Tax, D. M., & Duin, R. P. (2004). Support vector data description. Machine Learning, 54(1), 45-66.
Ruff, L. et al. (2021). A unifying review of deep and shallow anomaly detection. Proceedings of the IEEE, 109(5), 756-795.
[11/17] 9. Association Analysis
Han, J., Cheng, H., Xin, D., & Yan, X. (2007). Frequent pattern mining: current status and future directions. Data Mining and Knowledge Discovery, 15(1), 55-86.
Geng, L., & Hamilton, H. J. (2006). Interestingness measures for data mining: A survey. ACM Computing Surveys, 38(3), 9-es.
[11/24] Special Topics in Data Mining - Invited Talk by Saerom Park (Assistant Professor, Sungshin Women's University)
[12/1] Review Session (Lectures 6-9, in Korean, optional)
[12/8] Final Exam
Each student will give a 10-minute presentation in class (in English). The objective is to share applications of data mining in each student's own research field. Students must schedule their presentation using the schedule form released on icampus. The presentation topic should be relevant to the lecture topic of the same day. The presentation video and materials should be submitted to the instructor by e-mail at least one day before the presentation.
All assignments must be submitted on icampus by midnight on the due date. Late submissions will NOT be accepted.
[A1] Self-introduction report (due date: 9/15)
[A2] Research proposal (due date: 12/1)
Students are responsible for maintaining high standards of academic integrity in all of their class activities. Cheating or plagiarism in any form will not be tolerated. Any violation of academic integrity is a serious offense and is therefore subject to an appropriate sanction or penalty.