Data science has become an important part of many industries, such as manufacturing, marketing, and finance. This course provides a broad introduction to Python as a programming language for data science. Students will learn and practice implementing data science techniques with Python, including data preprocessing, supervised learning, unsupervised learning, and model selection and evaluation.
* Prerequisites: Python, Linear Algebra, Applied Statistics II, and Data Mining
** You must have taken the prerequisite courses (or equivalent) before taking this course.
Class Time: Friday, 12:00-14:45
Location: 26421 (Engineering Building 2)
Language: Korean
Prof. Seokho Kang
Office: 27408B (Engineering Building 2)
E-mail: s.kang at skku.edu
Office Hours: by appointment
Ms. Jinju Park
Office: 27407 - Data Mining Lab. (Engineering Building 2)
E-mail: apfhsk777 at naver.com
Andreas C. Müller & Sarah Guido, Introduction to Machine Learning with Python: A Guide for Data Scientists, O'Reilly Media, 2016.
Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O'Reilly Media, 2019.
scikit-learn: Machine Learning in Python, https://scikit-learn.org/
Attendance (10%)
Assignments (20%)
Mid-term Exam (30%)
Final Exam (40%)
Total (100% + α)
Syllabus [download]
Course Introduction [download]
1. Supervised Learning - Part 1 (Overview) [download]
1. Supervised Learning - Part 2 (k-NN) [download]
1. Supervised Learning - Part 3 (Linear Models) [download]
1. Supervised Learning - Part 4 (DT, Ensemble) [download]
1. Supervised Learning - Part 5 (SVM) [download]
1. Supervised Learning - Part 6 (ANN, Misc.) [download]
2. Unsupervised Learning - Part 1 (Overview) [download]
2. Unsupervised Learning - Part 2 (PCA) [download]
2. Unsupervised Learning - Part 3 (t-SNE) [download]
2. Unsupervised Learning - Part 4 (k-Means, Hierarchical) [download]
2. Unsupervised Learning - Part 5 (DBSCAN, Misc.) [download]
3. Representing Data and Engineering Features [download]
4. Model Evaluation and Improvement [download]
5. Algorithm Chains and Pipelines [download]
6. ML Project Checklist [download]
Jupyter Notebooks for Lecture Notes [download]
Assignments should be submitted to icampus by midnight on the due date. Late submissions will NOT be accepted.
[A1] Self-Introduction (due date: 3/21) - icampus
[A2] Comparison of Supervised Learning Algorithms (due date: 4/11) - icampus
[A3] Classification with Reject Option (due date: 4/25) - icampus
[A4] Clustering and Feature Engineering (due date: 5/23) - icampus
[A5] AutoML (due date: 6/13) - icampus
Students are responsible for maintaining high standards of academic integrity in all of their class activities. Cheating or plagiarism in any form will not be tolerated. Any violation of academic integrity is a serious offense and is therefore subject to an appropriate sanction or penalty.