CSCE822 Data Mining and Warehousing
University of South Carolina, CSE
Spring, 2025
Meeting time: Tuesday/Thursday 1:15-2:30 pm
Location: 300 Main St. B112
Instructor: Dr. Jianjun Hu
Research lab: Machine Learning and Evolution Lab
Office hours: Tuesday/Thursday 2:30 to 3:30 pm or by appointment.
Class overview
Data Mining studies algorithms and computational paradigms that allow computers to find patterns and regularities in databases, perform predictions and forecasting, and generally improve their performance through interaction with data. It is currently regarded as the key element of a more general process called Knowledge Discovery that deals with extracting useful knowledge from raw data. The knowledge discovery process includes data selection, cleaning, coding, using different statistical and machine learning techniques, and visualization of the generated structures. The course will cover all these issues and will illustrate the whole process by examples. Special emphasis will be given to machine learning and deep learning methods as they provide the real knowledge discovery tools.
Textbook: Introduction to Data Mining (2nd Edition) by Pang-Ning Tan et al.
ISBN-13: 978-0133128901, ISBN-10: 0133128903 (Amazon link)
Textbook companion website (with slides, code, etc.)
Main course tasks:
Four homework assignments, one midterm, one final project proposal, and one final project report and presentation.
Learning outcomes
After completion of this course the student should be able to:
Understand what data mining is, what kinds of data can be mined, what kinds of patterns can be mined, and what kinds of applications are targeted.
Explain major issues in data mining.
Apply machine learning, pattern recognition, statistics, visualization, algorithms, database technology, and high-performance computing in data mining applications.
Identify what kinds of technologies are used for different applications.
Perform data preprocessing, frequent pattern and association mining, classification, clustering, and outlier detection.
Topics
Classification algorithms (KNN, decision trees, random forest, SVM, neural networks)
Regression analysis
Clustering
Model/performance evaluation
Deep learning
Frequent itemset mining
Time series mining
Text mining (NLP)
Graph mining
Graph neural networks
Grading
Your grade is determined by the following four parts:
Assignments: 35% of your grade will be determined by assignments throughout the semester.
Midterm: 30%
Final project: 30%
Attendance: 5%
Grades are on the following fixed scale:
A [90 – 100]
B+ [86 – 90)
B [75 – 86)
C+ [70 – 75)
C [60 – 70)
D+ [55 – 60)
D [40 – 55)
F [0 – 40)
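As an illustration of how the weights and the fixed scale combine, the following minimal Python sketch computes a weighted course score and maps it to a letter grade. It assumes each of the four parts is scored on a 0 to 100 scale, which the syllabus does not state explicitly; the names WEIGHTS, SCALE, weighted_score, and letter_grade are hypothetical and chosen only for this example, and the sample scores are made up.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {
    "assignments": 0.35,
    "midterm": 0.30,
    "final_project": 0.30,
    "attendance": 0.05,
}

# (inclusive lower bound, letter), ordered from highest to lowest.
SCALE = [
    (90, "A"), (86, "B+"), (75, "B"), (70, "C+"),
    (60, "C"), (55, "D+"), (40, "D"), (0, "F"),
]


def weighted_score(scores):
    """Combine per-part scores (assumed to be on a 0-100 scale) into one score."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)


def letter_grade(score):
    """Map a 0-100 score to a letter using the fixed scale."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower, letter in SCALE:
        if score >= lower:
            return letter
    raise ValueError("unreachable: the scale covers 0 to 100")


if __name__ == "__main__":
    # Made-up scores for illustration only.
    example = {"assignments": 92, "midterm": 84, "final_project": 88, "attendance": 100}
    total = weighted_score(example)
    print(f"{total:.1f} -> {letter_grade(total)}")
```

Because each interval is closed on the left and open on the right, checking lower bounds from highest to lowest is enough to find the matching letter.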