This course provides an elementary introduction to basic concepts and techniques of machine learning and data mining for industrial artificial intelligence. Topics covered include data exploration, classification, association analysis, cluster analysis, and anomaly detection.
* Prerequisites: Linear Algebra, Applied Statistics I & II, Data Mining (or equivalent)
Class Time: Mon 12:00~13:15 and 13:30~14:45
Location: 26419 (Engineering Building 2)
Language: Korean
Prof. Seokho Kang
Office: 27408B (Engineering Building 2)
E-mail: s.kang@skku.edu
Office Hours: Mon 13:30~15:00 or by appointment
Mr. Kyoham Shin and Ms. Suhee Yoon
Office: 27407 - Data Mining Lab. (Engineering Building 2)
E-mail: 21khaiden@gmail.com and suhee.y66@gmail.com
Office Hours: after class
Pang-Ning Tan, Michael Steinbach, Anuj Karpatne & Vipin Kumar, Introduction to Data Mining (2nd ed.), Pearson, 2019. (https://www-users.cs.umn.edu/~kumar001/dmbook)
Jiawei Han, Micheline Kamber & Jian Pei, Data Mining: Concepts and Techniques (3rd ed.), Morgan Kaufmann Publishers, 2011. (http://hanj.cs.illinois.edu/bk3)
Attendance (10%)
Assignments (10%)
Presentation (10%)
Mid-term Exam (35%)
Final Exam (35%)
Total (100% + α)
Syllabus [download] (last updated on 3/24)
Note: ALL classes (except mid-term and final exams) will be conducted online using icampus (https://icampus.skku.edu/)
* Most of the slides are adapted from Pang-Ning Tan’s materials, which are available at https://www-users.cs.umn.edu/~kumar001/dmbook
[3/9] Course Introduction [download] (last updated on 3/9)
Liao, S. H., Chu, P. H., & Hsiao, P. Y. (2012). Data mining techniques and applications–A decade review from 2000 to 2011. Expert Systems with Applications, 39(12), 11303-11311.
Wu, X., et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1-37.
[3/16] 1. Data [download] (last updated on 3/9)
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157-1182.
Lin, W. C., & Tsai, C. F. (2019). Missing value imputation: a review and analysis of the literature (2006–2017). Artificial Intelligence Review, 1-23.
[3/23] 2. Classification: Basic Concepts [download] (last updated on 3/9)
Kuhn, M., & Johnson, K. (2013). Over-fitting and model tuning. In Applied predictive modeling (pp. 61-92). Springer, New York, NY.
Domingos, P. (2000). A unified bias-variance decomposition and its applications. In Proceedings of 17th International Conference on Machine Learning (pp. 231-238).
Cawley, G. C., & Talbot, N. L. (2010). On over-fitting in model selection and subsequent selection bias in performance evaluation. Journal of Machine Learning Research, 11(Jul), 2079-2107.
[3/30] 3. Classification: Algorithms - Part 1 [download]
Bottou, L., & Vapnik, V. (1992). Local learning algorithms. Neural Computation, 4(6), 888-900.
Garcia, S., Derrac, J., Cano, J., & Herrera, F. (2012). Prototype selection for nearest neighbor classification: Taxonomy and empirical study. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3), 417-435.
[4/6] 3. Classification: Algorithms - Part 2 [download]
Burges, C. J. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 121-167.
Smola, A. J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14(3), 199-222.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117.
[4/13] 3. Classification: Algorithms - Part 3 [download]
Rokach, L. (2010). Ensemble-based classifiers. Artificial Intelligence Review, 33(1-2), 1-39.
Krawczyk, B. (2016). Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence, 5(4), 221-232.
Kang, S. (2020). Model validation failure in class imbalance problems. Expert Systems with Applications, 113190.
Hand, D. J. (2006). Classifier technology and the illusion of progress. Statistical Science, 1-14.
[5/4] 4. Cluster Analysis - Part 1 [download]
Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8), 651-666.
Dhillon, I. S., Guan, Y., & Kulis, B. (2004). Kernel k-means: spectral clustering and normalized cuts. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 551-556).
[5/11] 4. Cluster Analysis - Part 2 [download]
Kriegel, H. P., Kröger, P., Sander, J., & Zimek, A. (2011). Density‐based clustering. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(3), 231-240.
Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395-416.
Vega-Pons, S., & Ruiz-Shulcloper, J. (2011). A survey of clustering ensemble algorithms. International Journal of Pattern Recognition and Artificial Intelligence, 25(3), 337-372.
[5/18] 5. Anomaly Detection [download]
Tax, D. M. (2001). One-class classification. PhD thesis, Delft University of Technology.
Tax, D. M., & Duin, R. P. (2004). Support vector data description. Machine Learning, 54(1), 45-66.
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3), 1-58.
Khan, S. S., & Madden, M. G. (2014). One-class classification: taxonomy of study and review of techniques. Knowledge Engineering Review, 29(3), 345-374.
[5/25] 6. Association Analysis [download]
Han, J., Cheng, H., Xin, D., & Yan, X. (2007). Frequent pattern mining: current status and future directions. Data Mining and Knowledge Discovery, 15(1), 55-86.
Geng, L., & Hamilton, H. J. (2006). Interestingness measures for data mining: A survey. ACM Computing Surveys, 38(3), 9-es.
[6/15] Final Exam
[4/20] Paper Presentation I [download]
[4/27] Special Issues in Industrial Artificial Intelligence: Overcoming Data Scarcity
[6/1] Paper Presentation II [download]
[6/8] Invited Talk by Dr. Inbeom Park, Postdoctoral Associate at Seoul National University
Each student will give a 10-min presentation on at least one research paper in class. The objective is to share recent research advances in industrial artificial intelligence, or its applications to real-world problems in industrial engineering fields.
Submit your 10-min presentation video and materials to the instructor and TA at least one day before your presentation (e-mails: s.kang@skku.edu and 21khaiden@gmail.com).
[4/20] Paper Presentation I (12 students): 강현구, 박일하, 박희영, 백승민, 신교함, 이규원, 이해인, 이형준, 전용석, 최예림, 최원화, 한종민
[6/1] Paper Presentation II (12 students): 강명인, 강문식, 김민환, 김재호, 김지수, 김진주, 김한민, 김혜인, 김현찬, 송승현, 심아름, 이세진
All assignments must be submitted to icampus (https://icampus.skku.edu/) by midnight of the due date. Late submissions will NOT be accepted.
[A1] Self-introduction Report (due date: 3/30): Introduce yourself within 1 page. Include the following: your name, student ID, personal history, your research areas/interests, what you expect to learn from this course, suggestions/comments to the instructor, and miscellaneous items.
[A2] Research Proposal (due date: 6/15): Write a research proposal within 3 pages. The topic must be relevant to what you have learned or will learn in this course.
Students are responsible for maintaining high standards of academic integrity in all of their class activities. Cheating or plagiarism in any form will not be tolerated. Any violation of academic integrity is a serious offense and is therefore subject to an appropriate sanction or penalty.