Data Mining and Big Data
資料探勘與大數據
Instructor: 徐立群 (LihChyun Shu, shulc@mail.ncku.edu.tw)
Office: Room 63323
Course Description:
In the last few years, the term "big data" has become ubiquitous. IBM characterizes big data using four Vs: volume, velocity, variety, and veracity. These characteristics often make traditional data processing techniques inadequate, and researchers keep identifying new problems that require new solutions. Data mining is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful, and ultimately understandable patterns in the data. While data mining technology has been applied to some big data problems, new challenges continue to emerge and innovative solutions are still being investigated. In this course, students will learn cutting-edge data mining techniques that are applicable to real-world problems. In addition, they will read a few important articles to understand the problems that still need new solutions.
Course Objectives:
Students will have a good understanding of the conceptual foundations of data mining. They will learn to use data mining software to explore a few data sets and try to discover the underlying patterns. Students will learn cutting-edge data mining technologies and learn to apply them to a few practical problems. In addition, we will look into big data problems, such as finding similar items.
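As a small illustration of the "finding similar items" problem mentioned above, the sketch below compares sets with Jaccard similarity and finds the most similar pair by brute force. It is a minimal sketch only: the function names and the sample baskets are assumptions made up for this example, not course material, and comparing every pair is exactly what becomes impractical at big data scale.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined here as 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def most_similar_pair(items: dict) -> tuple:
    """Return (name1, name2, similarity) for the most similar pair, comparing all pairs."""
    return max(
        ((x, y, jaccard(items[x], items[y])) for x, y in combinations(items, 2)),
        key=lambda t: t[2],
    )


# Hypothetical shopping baskets, invented for illustration only.
baskets = {
    "alice": {"milk", "bread", "eggs"},
    "bob": {"milk", "bread", "butter"},
    "carol": {"tea", "rice"},
}

print(most_similar_pair(baskets))
```

The all-pairs comparison grows quadratically with the number of items, which is why large collections call for approximate techniques rather than this direct approach.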
Content Summary:
Introduction to data-analytic thinking
Business problems and data science solutions
Introduction to predictive modeling
Fitting a model to data
Overfitting and its avoidance
Similarity, neighbors, and clusters
Decision analytic thinking: what is a good model?
Explainable artificial intelligence
Visualizing model performance
Evidence and probabilities
Process mining and its applications
Finding similar items
Textbook
Foster Provost and Tom Fawcett. Data Science for Business, O’Reilly, 2013.
References
Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman. Mining of Massive Datasets, 2014.
Eric Siegel. 預測分析時代 (Chinese edition of Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die), 大塊出版社 (Locus Publishing), 2014.
Gareth James et al. An Introduction to Statistical Learning with Applications in Python, Springer, 2023.
Selected articles from major business and IT journals and magazines
Introduction to data-analytic thinking (Chapter 1 of Provost & Fawcett)
Business problems and data science solutions (Chapter 2 of Provost & Fawcett)
Introduction to predictive modeling: From Correlation to Supervised Segmentation (Chapter 3 of Provost & Fawcett)
Decision Analytic Thinking I: What Is a Good Model? (Chapter 7 of Provost & Fawcett)
Visualizing Model Performance (Chapter 8 of Provost & Fawcett)
Representing and Mining Text (Chapter 10 of Provost & Fawcett)
Data visualization / Infographics / Visual Analytics
Evaluation (subject to change)
Homework assignments (40%)
Exam (30%)
Group Term Project (30%)