*Email: The instructor and TAs will not answer questions by email. All questions should be posted on Piazza, either publicly (visible to all students in the class) or privately (visible only to the instructor and TAs).
This course focuses on both concepts and practice. We will introduce (a) the core data mining concepts and (b) practical skills for applying data mining techniques to solve real-world problems.
- Study the major data mining problems as different types of computational tasks (prediction, classification, clustering, etc.) and the algorithms appropriate for addressing these tasks
- Learn how to analyze data through statistical and graphical summarization and through supervised and unsupervised learning algorithms
- Systematically evaluate data mining algorithms and understand how to choose algorithms for different analysis tasks
- Learn how to gather and process raw data into suitable input for a range of data mining algorithms
- Critique the methods and results from a data mining practice
- Design and implement data mining applications using real-world datasets, and evaluate and select proper data mining algorithms to apply to practical scenarios
Students are expected to be familiar with the basics of Linear Algebra, Probability, and Statistics, and should be comfortable with programming. We will use R as the primary analysis platform, so familiarity with R is preferred.
Attendance is mandatory and will be recorded. Arriving late or leaving early without permission will affect your grade. If you must be absent, please contact me in advance. Three or more absences will result in automatic failure of the course, except in extraordinary circumstances.
Grades are based on three major activities listed below. Assignments are due as scheduled, and grades on late work will be decreased by 10% per day late. See the assignment page for more details.
- 20% class participation & reading summary
- 40% homework and midterm
- 40% final project (including 4 milestones)
This course will use materials from several recommended books listed below. The first and third books are available online over the Pitt network; the second book is available online. There will be reading assignments over the course of the semester, and links to electronic copies of these readings will be provided. There are also other recommended books for further reading and for learning R.
- Data Mining and Business Analytics with R, Johannes Ledolter, Wiley, 2013, ISBN: 978-1118447147 (online access via Pitt network) (primary book, hereafter referred to as "DMR")
- Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (2nd ed.), Bing Liu, Springer, 2011, ISBN: 978-3642194597 (available online) (secondary book, hereafter referred to as "WDM")
- Practical Data Science with R, Nina Zumel and John Mount, Manning Publications, 2014, ISBN: 978-1617291562 (online access via Pitt network) (third book, hereafter referred to as "DSR")
See the list of recommended readings.
See the course schedule page.
See the university policies page.