# CSCE822 Data Mining and Warehousing

University of South Carolina, CSE

Fall, 2023

Meeting time: Tuesday/Thursday 1:15-2:30 pm. Location: Swearingen Bld. 2A11

Instructor: Dr. Jianjun Hu. Research lab: Machine Learning and Evolution Lab

Office hours: Tuesday/Thursday 2:30 to 3:20 pm or by appointment.

Class overview

Data Mining studies algorithms and computational paradigms that allow computers to find patterns and regularities in databases, perform predictions and forecasting, and generally improve their performance through interaction with data. It is currently regarded as the key element of a more general process called Knowledge Discovery that deals with extracting useful knowledge from raw data. The knowledge discovery process includes data selection, cleaning, coding, using different statistical and machine learning techniques, and visualization of the generated structures. The course will cover all these issues and will illustrate the whole process by examples. Special emphasis will be given to machine learning and deep learning methods as they provide the real knowledge discovery tools.

Textbook: Introduction to Data Mining (2nd Edition) by Pang-Ning Tan et al.

ISBN-13: 978-0133128901; ISBN-10: 0133128903 (Amazon link)

Textbook companion website (with slides, code, etc.)

Main course tasks:

Four homework assignments, one midterm, one final project proposal, and one final project report and presentation.

Learning outcomes

After completion of this course the student should be able to:

Understand what data mining is, what kinds of data can be mined, what kinds of patterns can be mined, and what kinds of applications are targeted.

Explain the major issues in data mining.

Apply machine learning, pattern recognition, statistics, visualization, algorithms, database technology, and high-performance computing in data mining applications.

Identify which technologies are used for different applications.

Carry out data preprocessing, frequent pattern and association mining, classification, clustering, and outlier detection.

Topics

Classification algorithms (KNN, decision trees, random forest, SVM, neural networks)

Regression analysis

Clustering

Model/performance evaluation

Deep learning

Frequent itemset mining

Time series mining

Text mining (NLP)

Graph mining

Graph neural networks

Grading

Your grade is determined by the following four parts:

Assignments: 35% (homework assigned throughout the semester)

Midterm: 30%

Final project: 30%

Attendance: 5%

Grades are on the following fixed scale:

A [90 – 100]

B+ [86 – 90)

B [75 – 86)

C+ [70 – 75)

C [60 – 70)

D+ [55 – 60)

D [40 – 55)

F [0 – 40)
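As an illustration only, the weights and fixed scale above can be combined into a final letter grade as in the short Python sketch below. It assumes each component is scored from 0 to 100 and that no rounding is applied before the scale is looked up; the syllabus does not state either point. The names `WEIGHTS`, `SCALE`, `final_score`, and `letter_grade` and the example scores are hypothetical.

```python
# Component weights from the Grading section, as fractions of the final grade.
WEIGHTS = {
    "assignments": 0.35,
    "midterm": 0.30,
    "final_project": 0.30,
    "attendance": 0.05,
}

# Lower bound of each band on the fixed scale, highest first.
# Each band is half-open: [lower bound, next higher lower bound).
SCALE = [
    (90, "A"), (86, "B+"), (75, "B"), (70, "C+"),
    (60, "C"), (55, "D+"), (40, "D"), (0, "F"),
]


def final_score(scores):
    """Weighted sum of component scores (assumed to be on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(score):
    """Map a 0-100 score to a letter using the fixed scale."""
    for lower, letter in SCALE:
        if score >= lower:
            return letter
    raise ValueError("score must be between 0 and 100")


# Hypothetical student, used only to show the arithmetic.
example = {"assignments": 92, "midterm": 81, "final_project": 88, "attendance": 100}
total = final_score(example)
print(round(total, 2), letter_grade(total))
```

For the hypothetical scores shown, the weighted sum is 32.2 + 24.3 + 26.4 + 5.0 = 87.9, which falls in the B+ band [86 – 90).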