Syllabus
This course, 'Introduction to Data Mining,' spans 16 weeks and aims to give students a comprehensive understanding of how to extract valuable insights from large datasets. It covers topics ranging from fundamental concepts to advanced data mining techniques. Students will learn to identify patterns, analyze real-time data streams, understand link analysis, and apply clustering techniques. The course also examines specific applications such as web advertising strategies, recommendation systems, and social network analysis. Designed for students with a basic background in mathematics and programming, the course combines theory with practical assignments, preparing students to tackle real-world data mining challenges effectively.
2024S courses (TAs: Song Kim, Dahee Kim, Hyewon Kim)
Midterm (statistics), Final (statistics)
2025S courses (TAs: Eunyeong Sim, Euibin Bae)
Midterm (statistics), Final (statistics), Final solution
2025S course outcomes:
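(k,o)-core: Identifying Cohesive Subgraphs in Multi-Order Graphs, Minseok Kim et al., Database Research 2025 (2nd round) [paper]
LLAMA: Large-Scale Community Analysis Based on Leiden-LPA, Hyeongmin Son et al., Korea Database Conference 2025
An Entropy-Weighted Adaptive Label Propagation Algorithm for Detecting Spatially Proximate Communities in Location-Based Social Networks, Doyeol Oh et al., Korea Database Conference 2025
WSCAN++: A Weight-Based Structural Clustering Algorithm, Seungchan Choi et al., Korea Database Conference 2025
Community Detection in Social Networks Based on Preference Locality, Deokhyun Kim et al., Database Research 2025 (3rd round)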
Tentative Outlines for the Courses
Introduction to Data Mining - Introduction to the fundamental concepts, techniques, and applications of data mining. This includes understanding the process of discovering patterns and knowledge from large datasets.
Readings:
Data - Examination of different types of data, data quality, and pre-processing techniques such as cleaning, integration, reduction, and transformation.
Readings:
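Two common transformation steps, min-max scaling to [0, 1] and z-score standardization, can be sketched in Java, the language used in the recitations. The class and method names (`Normalization`, `minMax`, `zScore`) are illustrative and not taken from the course materials; using the population standard deviation and mapping a constant column to zeros are assumptions of this sketch.

```java
import java.util.Arrays;

/** Min-max and z-score normalization, two common data transformation steps (sketch). */
class Normalization {

    /** Rescales values linearly into [0, 1]; a constant column maps to all zeros. */
    static double[] minMax(double[] x) {
        double min = Arrays.stream(x).min().orElse(0.0);
        double max = Arrays.stream(x).max().orElse(0.0);
        double range = max - min;
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = range == 0.0 ? 0.0 : (x[i] - min) / range;
        }
        return out;
    }

    /** Centers values at mean 0 and divides by the population standard deviation. */
    static double[] zScore(double[] x) {
        double mean = Arrays.stream(x).average().orElse(0.0);
        double var = 0.0;
        for (double v : x) var += (v - mean) * (v - mean);
        double sd = Math.sqrt(var / x.length);
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = sd == 0.0 ? 0.0 : (x[i] - mean) / sd;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] ages = {20, 30, 40, 60};
        System.out.println(Arrays.toString(minMax(ages))); // [0.0, 0.25, 0.5, 1.0]
        System.out.println(Arrays.toString(zScore(ages)));
    }
}
```

Min-max scaling preserves the shape of the distribution but is sensitive to outliers; z-scores are less affected by a single extreme value at the ends of the range.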
Frequent pattern mining - Techniques for identifying frequent patterns, associations, and correlations within datasets. This includes algorithms like Apriori, FP-Growth, and ECLAT.
Readings:
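The level-wise idea behind Apriori (count candidates, keep the frequent ones, join them into larger candidates, and prune any candidate with an infrequent subset) can be sketched as below. This is a minimal illustration, not the course's reference implementation: items are assumed to be integer ids, support is an absolute count, and the names `Apriori`, `frequentItemsets`, and `nextCandidates` are hypothetical.

```java
import java.util.*;

/** Level-wise Apriori sketch over transactions of integer item ids. */
class Apriori {

    /** Maps each frequent itemset (a sorted list) to its support count. */
    static Map<List<Integer>, Integer> frequentItemsets(List<Set<Integer>> transactions, int minSupport) {
        Map<List<Integer>, Integer> result = new HashMap<>();
        // Level 1: every distinct item is a candidate 1-itemset.
        Set<List<Integer>> candidates = new HashSet<>();
        for (Set<Integer> t : transactions)
            for (int item : t) candidates.add(List.of(item));

        while (!candidates.isEmpty()) {
            // One pass over the data counts the support of every candidate at this level.
            Map<List<Integer>, Integer> frequent = new HashMap<>();
            for (List<Integer> c : candidates) {
                int count = 0;
                for (Set<Integer> t : transactions)
                    if (t.containsAll(c)) count++;
                if (count >= minSupport) frequent.put(c, count);
            }
            result.putAll(frequent);
            candidates = nextCandidates(frequent.keySet());
        }
        return result;
    }

    /** Joins frequent k-itemsets sharing their first k-1 items, then prunes by the Apriori property. */
    static Set<List<Integer>> nextCandidates(Set<List<Integer>> frequentK) {
        Set<List<Integer>> next = new HashSet<>();
        for (List<Integer> a : frequentK) {
            for (List<Integer> b : frequentK) {
                int k = a.size();
                // Join step: identical prefixes, and a's last item strictly smaller than b's.
                if (!a.subList(0, k - 1).equals(b.subList(0, k - 1)) || a.get(k - 1) >= b.get(k - 1)) continue;
                List<Integer> cand = new ArrayList<>(a);
                cand.add(b.get(k - 1));
                // Prune step: every k-subset of a frequent (k+1)-itemset must itself be frequent.
                boolean keep = true;
                for (int i = 0; i < cand.size() && keep; i++) {
                    List<Integer> subset = new ArrayList<>(cand);
                    subset.remove(i); // remove(int index), not remove(Object)
                    keep = frequentK.contains(subset);
                }
                if (keep) next.add(List.copyOf(cand));
            }
        }
        return next;
    }

    public static void main(String[] args) {
        List<Set<Integer>> db = List.of(
                Set.of(1, 2, 3), Set.of(1, 2), Set.of(1, 3), Set.of(1, 2, 3), Set.of(2, 4));
        System.out.println(frequentItemsets(db, 2));
    }
}
```

With a minimum support of 2, item 4 is discarded after the first pass, so no candidate containing it is ever generated; FP-Growth and ECLAT avoid the repeated database scans this sketch performs.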
Classification - Methods for classifying data into predefined categories. This includes various algorithms like decision trees, Naive Bayes, support vector machines, and neural networks.
Readings:
Introduction to Data Mining - Ch 3, 4
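Of the listed classifiers, Naive Bayes is the shortest to sketch. The version below assumes categorical features given as strings and uses Laplace (add-one) smoothing; the class name, the string-key encoding of counts, and the toy weather data are all illustrative assumptions.

```java
import java.util.*;

/** Categorical Naive Bayes with Laplace (add-one) smoothing; a teaching sketch. */
class NaiveBayes {
    private final Map<String, Integer> classCounts = new HashMap<>();
    // Counts keyed by "class|featureIndex=value" (assumes labels and values contain no '|' or '=').
    private final Map<String, Integer> featureCounts = new HashMap<>();
    private final List<Set<String>> featureValues = new ArrayList<>();
    private int total = 0;

    void train(String[][] rows, String[] labels) {
        for (int r = 0; r < rows.length; r++) {
            String y = labels[r];
            classCounts.merge(y, 1, Integer::sum);
            total++;
            for (int f = 0; f < rows[r].length; f++) {
                while (featureValues.size() <= f) featureValues.add(new HashSet<>());
                featureValues.get(f).add(rows[r][f]);
                featureCounts.merge(y + "|" + f + "=" + rows[r][f], 1, Integer::sum);
            }
        }
    }

    /** Picks the class maximizing log P(y) + sum over f of log P(x_f | y); logs avoid underflow. */
    String predict(String[] x) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> e : classCounts.entrySet()) {
            String y = e.getKey();
            int ny = e.getValue();
            double score = Math.log((double) ny / total);
            for (int f = 0; f < x.length; f++) {
                int nxy = featureCounts.getOrDefault(y + "|" + f + "=" + x[f], 0);
                // Laplace smoothing: add 1 to each count so an unseen value never gets probability 0.
                score += Math.log((nxy + 1.0) / (ny + featureValues.get(f).size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = y;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"sunny", "hot"}, {"sunny", "mild"}, {"rainy", "mild"}, {"rainy", "cool"}, {"rainy", "cool"}
        };
        String[] play = {"no", "no", "yes", "yes", "yes"};
        NaiveBayes nb = new NaiveBayes();
        nb.train(rows, play);
        System.out.println(nb.predict(new String[] {"rainy", "cool"})); // yes
        System.out.println(nb.predict(new String[] {"sunny", "hot"}));  // no
    }
}
```

For `{"rainy", "cool"}` the unnormalized scores are 0.6 × 0.8 × 0.5 = 0.24 for "yes" against 0.4 × 0.25 × 0.2 = 0.02 for "no", which is why "yes" wins.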
Clustering - Techniques for grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. Algorithms include k-means, hierarchical clustering, CURE, and DBSCAN.
Readings:
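Lloyd's algorithm for k-means alternates an assignment step and an update step until no point changes cluster. The sketch below is restricted to 2-D points and takes the initial centroids from the caller, so the run is deterministic; both restrictions, and the name `KMeans.cluster`, are assumptions made for illustration (practical code usually seeds with k-means++).

```java
import java.util.Arrays;

/** Lloyd's k-means on 2-D points, seeded with caller-chosen initial centroids (sketch). */
class KMeans {

    static int[] cluster(double[][] points, double[][] initialCentroids, int maxIter) {
        int k = initialCentroids.length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = initialCentroids[c].clone();
        int[] assign = new int[points.length];
        Arrays.fill(assign, -1);

        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: each point joins its nearest centroid (squared Euclidean distance).
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dx = points[i][0] - centroids[c][0];
                    double dy = points[i][1] - centroids[c][1];
                    double d = dx * dx + dy * dy;
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) break; // converged: no point switched clusters
            // Update step: move each centroid to the mean of its assigned points.
            double[][] sum = new double[k][2];
            int[] count = new int[k];
            for (int i = 0; i < points.length; i++) {
                sum[assign[i]][0] += points[i][0];
                sum[assign[i]][1] += points[i][1];
                count[assign[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (count[c] > 0) { // an empty cluster keeps its previous centroid
                    centroids[c][0] = sum[c][0] / count[c];
                    centroids[c][1] = sum[c][1] / count[c];
                }
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}};
        double[][] init = {{0, 0}, {1, 0}}; // deliberately poor start: both seeds in one group
        System.out.println(Arrays.toString(cluster(pts, init, 100))); // [0, 0, 0, 1, 1, 1]
    }
}
```

Even from the poor start, point (1, 0) is first pulled into cluster 1 and then moves back to cluster 0 once the second centroid drifts toward the far group.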
Hashing - Use of hashing techniques for efficient data retrieval, including hash functions, hash tables, and applications in data mining.
Readings:
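A standard application of hashing in data mining is MinHash, which compresses each set into a short signature whose agreement rate estimates Jaccard similarity. In the sketch below, each hash function is the SplitMix64 finalizer applied to the element plus a per-function random salt; that choice of hash family, and the names `MinHash`, `signature`, and `estimate`, are assumptions, since any well-mixed family of hash functions would serve.

```java
import java.util.*;

/** MinHash signatures for estimating Jaccard similarity between sets of integers (sketch). */
class MinHash {
    private final long[] salts; // one random salt per hash function

    MinHash(int numHashes, long seed) {
        Random rng = new Random(seed);
        salts = new long[numHashes];
        for (int i = 0; i < numHashes; i++) salts[i] = rng.nextLong();
    }

    /** SplitMix64 finalizer: a bijective 64-bit mixer, used here as the base hash function. */
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Entry i of the signature is the minimum of h_i(x) = mix(x + salt_i) over the set. */
    long[] signature(Set<Integer> set) {
        long[] sig = new long[salts.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (int x : set)
            for (int i = 0; i < salts.length; i++)
                sig[i] = Math.min(sig[i], mix(x + salts[i]));
        return sig;
    }

    /** The fraction of positions where two signatures agree estimates their Jaccard similarity. */
    static double estimate(long[] s1, long[] s2) {
        int agree = 0;
        for (int i = 0; i < s1.length; i++) if (s1[i] == s2[i]) agree++;
        return (double) agree / s1.length;
    }

    /** Exact Jaccard similarity |A ∩ B| / |A ∪ B|, for comparison. */
    static double exactJaccard(Set<Integer> a, Set<Integer> b) {
        Set<Integer> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 1.0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        Set<Integer> a = new HashSet<>(), b = new HashSet<>();
        for (int i = 0; i < 100; i++) a.add(i);
        for (int i = 50; i < 150; i++) b.add(i);
        MinHash mh = new MinHash(200, 42L);
        System.out.println("exact = " + exactJaccard(a, b)
                + ", estimate = " + estimate(mh.signature(a), mh.signature(b)));
    }
}
```

The estimate's standard error shrinks roughly as 1/sqrt(number of hash functions), so 200 functions give a typical error of a few hundredths; banding these signatures is the basis of locality-sensitive hashing.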
Social network analysis - Analysis of social networks to understand the relationships and interactions among social entities. Topics include community detection, influence propagation, and network metrics.
Readings:
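Community detection methods such as Louvain and Leiden optimize modularity, a network metric that compares the edges inside each community with the number expected at random. Writing m for the number of edges, L_c for the edges inside community c, and d_c for the total degree of c's vertices, modularity is Q = sum over c of [L_c / m - (d_c / 2m)^2]. A minimal sketch for undirected, unweighted graphs follows; the edge-list input format and the name `Modularity.modularity` are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Newman-Girvan modularity of a vertex partition of an undirected, unweighted graph (sketch). */
class Modularity {

    /** edges: each row {u, v} is one undirected edge; community[v] is v's community id. */
    static double modularity(int[][] edges, int[] community) {
        int m = edges.length;
        Map<Integer, Integer> inside = new HashMap<>();    // L_c: edges with both ends in c
        Map<Integer, Integer> degreeSum = new HashMap<>(); // d_c: total degree of c's vertices
        for (int[] e : edges) {
            int cu = community[e[0]], cv = community[e[1]];
            degreeSum.merge(cu, 1, Integer::sum);
            degreeSum.merge(cv, 1, Integer::sum);
            if (cu == cv) inside.merge(cu, 1, Integer::sum);
        }
        double q = 0.0;
        for (Map.Entry<Integer, Integer> entry : degreeSum.entrySet()) {
            double lc = inside.getOrDefault(entry.getKey(), 0);
            double share = entry.getValue() / (2.0 * m);
            q += lc / m - share * share;
        }
        return q;
    }

    public static void main(String[] args) {
        // Two triangles {0,1,2} and {3,4,5} joined by the bridge edge 2-3.
        int[][] edges = {{0, 1}, {1, 2}, {0, 2}, {3, 4}, {4, 5}, {3, 5}, {2, 3}};
        System.out.println(modularity(edges, new int[] {0, 0, 0, 1, 1, 1})); // about 0.357, i.e. 5/14
        System.out.println(modularity(edges, new int[] {0, 0, 0, 0, 0, 0})); // 0.0
    }
}
```

Splitting at the bridge gives Q = 2 × (3/7 - 1/4) = 5/14, while putting everything in one community always gives Q = 0, so the split is the better partition by this measure.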
Link analysis - Study of techniques for analyzing the connections and relationships between nodes in a network. This includes algorithms like PageRank and HITS.
Readings:
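PageRank can be computed by power iteration: each page passes a fraction beta of its rank along its out-links, and every page also receives an equal share of the remaining teleport probability. The sketch assumes an adjacency-list input, a fixed iteration count rather than a convergence test, and that dead ends (pages with no out-links) spread their rank uniformly; `PageRank.pageRank` is a hypothetical name.

```java
import java.util.Arrays;

/** PageRank by power iteration with damping factor beta (teleport probability 1 - beta). */
class PageRank {

    /** out[v] lists v's out-links; dead ends spread their rank uniformly over all pages. */
    static double[] pageRank(int[][] out, double beta, int iterations) {
        int n = out.length;
        double[] r = new double[n];
        Arrays.fill(r, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double danglingMass = 0.0;
            for (int v = 0; v < n; v++) {
                if (out[v].length == 0) { danglingMass += r[v]; continue; }
                double share = r[v] / out[v].length;
                for (int w : out[v]) next[w] += beta * share;
            }
            // Teleport term plus redistributed dead-end rank keeps the vector summing to 1.
            for (int v = 0; v < n; v++) next[v] += (1.0 - beta) / n + beta * danglingMass / n;
            r = next;
        }
        return r;
    }

    public static void main(String[] args) {
        int[][] out = {{1, 2}, {2}, {0}}; // 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
        System.out.println(Arrays.toString(pageRank(out, 0.85, 100))); // roughly [0.388, 0.215, 0.397]
    }
}
```

Page 2 ranks highest because it collects links from both other pages, while page 1 receives only half of page 0's rank.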
Stream data analysis - Techniques for analyzing data that arrives in a continuous stream. This includes methods for processing, summarizing, and querying stream data in real time.
Readings:
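A basic stream-summarization technique is reservoir sampling (Algorithm R), which keeps a uniform random sample of k items from a stream whose length is not known in advance, using O(k) memory. In this sketch the stream is modeled as a Java `Iterator` and its length is assumed to fit in an `int`; the class name is illustrative.

```java
import java.util.*;

/** Reservoir sampling (Algorithm R): a uniform sample of k items from a stream of unknown length. */
class Reservoir {

    static <T> List<T> sample(Iterator<T> stream, int k, Random rng) {
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item); // the first k items fill the reservoir
            } else {
                // Keep the seen-th item with probability k / seen, evicting a uniformly random slot.
                int j = rng.nextInt(seen);
                if (j < k) reservoir.set(j, item);
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        Iterator<Integer> stream = java.util.stream.IntStream.range(0, 1000).boxed().iterator();
        System.out.println(sample(stream, 10, new Random(7)));
    }
}
```

By induction, after n items every item seen so far is in the reservoir with probability k/n, so the sample stays uniform no matter when the stream stops.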
Recommendation systems - Methods for predicting user preferences and suggesting relevant items, including collaborative filtering, content-based filtering, and hybrid approaches, and for evaluating recommendation quality with metrics such as precision and recall.
Readings:
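The evaluation metrics named above are usually computed on the top k items of a ranked recommendation list: precision@k is the fraction of those k items that are relevant, and recall@k is the fraction of all relevant items that appear among them. The sketch assumes string item ids and divides precision by k even when the list is shorter; the names are hypothetical.

```java
import java.util.*;

/** Precision@k and recall@k for a ranked recommendation list against a set of relevant items. */
class RankingMetrics {

    /** Fraction of the top-k recommendations that are relevant (denominator k). */
    static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        if (k <= 0) return 0.0;
        return (double) hits(ranked, relevant, k) / k;
    }

    /** Fraction of all relevant items that appear in the top k. */
    static double recallAtK(List<String> ranked, Set<String> relevant, int k) {
        if (relevant.isEmpty()) return 0.0;
        return (double) hits(ranked, relevant, k) / relevant.size();
    }

    /** Counts relevant items among the first k positions of the ranking. */
    private static int hits(List<String> ranked, Set<String> relevant, int k) {
        int h = 0;
        for (int i = 0; i < Math.min(k, ranked.size()); i++)
            if (relevant.contains(ranked.get(i))) h++;
        return h;
    }

    public static void main(String[] args) {
        List<String> ranked = List.of("A", "B", "C", "D", "E");
        Set<String> relevant = Set.of("A", "C", "F", "G");
        System.out.println(precisionAtK(ranked, relevant, 3)); // 2 of the 3 recommended items are relevant
        System.out.println(recallAtK(ranked, relevant, 3));    // 0.5
    }
}
```

Raising k can only keep or raise recall, while precision typically falls, which is why the two metrics are reported together.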
Recitation
Java and Eclipse tutorial - Introduction to Java programming and the Eclipse integrated development environment. This includes setting up the environment, basic programming constructs, and developing simple applications.
References:
https://www.eclipse.org/
Overleaf tutorial - Introduction to Overleaf, an online LaTeX editor. This includes creating, editing, and collaborating on LaTeX documents for academic writing.
References:
Summary of data structures & algorithms - Overview of fundamental data structures and algorithms, their properties, and applications.
References:
Data structures & Algorithms in Java - Ch 1-14
Summary of probability, statistics, and linear algebra - Introduction to the basic probability, statistics, and linear algebra used in the data mining course.
Polynomial time reduction & NP-hardness - Concepts of polynomial-time reduction, NP-completeness, and NP-hardness.
Readings:
Introduction to Algorithms - Ch 34
Algorithm Design - Ch 8
Approximation algorithm - Study of approximation algorithms for NP-hard problems, including PTAS and FPTAS, and their performance guarantees.
Readings:
Introduction to Algorithms - Ch 35
Algorithm Design - Ch 11
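A classic example of a performance guarantee is the matching-based 2-approximation for minimum vertex cover: scan the edges and, whenever an edge is still uncovered, add both endpoints. The edges chosen this way form a matching, any cover must contain at least one endpoint of each matched edge, so the optimum has at least as many vertices as the matching has edges, and the returned cover has exactly twice that many. The edge-array input and the name `VertexCover.approxCover` are assumptions of this sketch.

```java
import java.util.*;

/** Matching-based 2-approximation for minimum vertex cover (sketch). */
class VertexCover {

    /** Adds both endpoints of every still-uncovered edge; those edges form a maximal matching. */
    static Set<Integer> approxCover(int[][] edges) {
        Set<Integer> cover = new HashSet<>();
        for (int[] e : edges) {
            if (!cover.contains(e[0]) && !cover.contains(e[1])) {
                cover.add(e[0]);
                cover.add(e[1]);
            }
        }
        return cover;
    }

    public static void main(String[] args) {
        int[][] path = {{0, 1}, {1, 2}, {2, 3}};
        System.out.println(approxCover(path).size()); // 4 (an optimal cover, {1, 2}, has 2 vertices)
    }
}
```

The path example meets the bound exactly, which shows the factor 2 in the analysis cannot be improved for this algorithm.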
Flow - Network flow problems, such as maximum flow and minimum cut, and the algorithms that solve them.
Readings:
Algorithm Design - Ch 7
References:
The Algorithm Design Manual - Ch 8
Introduction to Algorithms - Ch 24
Algorithms by Jeff Erickson - Ch 10
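The Edmonds-Karp variant of Ford-Fulkerson repeatedly finds a shortest augmenting path by breadth-first search in the residual graph and pushes its bottleneck capacity; when no path remains, the flow is maximum. The sketch below uses an adjacency matrix of integer capacities, which is an assumption suited to small examples, and the class name is illustrative.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Edmonds-Karp: Ford-Fulkerson with augmenting paths found by breadth-first search. */
class MaxFlow {

    /** cap[u][v] is the capacity of edge u->v (0 if absent); cap itself is not modified. */
    static int maxFlow(int[][] cap, int s, int t) {
        if (s == t) throw new IllegalArgumentException("source and sink must differ");
        int n = cap.length;
        int[][] residual = new int[n][];
        for (int i = 0; i < n; i++) residual[i] = cap[i].clone();
        int flow = 0;
        while (true) {
            // BFS over edges with positive residual capacity finds a shortest s-t path.
            int[] parent = new int[n];
            Arrays.fill(parent, -1);
            parent[s] = s;
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(s);
            while (!queue.isEmpty() && parent[t] == -1) {
                int u = queue.poll();
                for (int v = 0; v < n; v++) {
                    if (parent[v] == -1 && residual[u][v] > 0) {
                        parent[v] = u;
                        queue.add(v);
                    }
                }
            }
            if (parent[t] == -1) return flow; // no augmenting path: the flow is maximum
            // The path's bottleneck is its smallest residual capacity.
            int bottleneck = Integer.MAX_VALUE;
            for (int v = t; v != s; v = parent[v])
                bottleneck = Math.min(bottleneck, residual[parent[v]][v]);
            // Push the bottleneck: shrink forward residuals, grow reverse ones.
            for (int v = t; v != s; v = parent[v]) {
                residual[parent[v]][v] -= bottleneck;
                residual[v][parent[v]] += bottleneck;
            }
            flow += bottleneck;
        }
    }

    public static void main(String[] args) {
        // Vertices: 0 = s, 1..4 = v1..v4, 5 = t.
        int[][] cap = new int[6][6];
        cap[0][1] = 16; cap[0][2] = 13; cap[1][3] = 12; cap[2][1] = 4; cap[2][4] = 14;
        cap[3][2] = 9;  cap[3][5] = 20; cap[4][3] = 7;  cap[4][5] = 4;
        System.out.println(maxFlow(cap, 0, 5)); // 23
    }
}
```

The answer matches the cut separating {s, v1, v2, v4} from {v3, t}, whose forward edges v1->v3, v4->v3, and v4->t have capacity 12 + 7 + 4 = 23, illustrating max-flow min-cut duality.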
Tutorials
LaTeX: http://www.ctan.org/tex-archive/info/lshort/english/lshort.pdf
Overleaf: https://ko.overleaf.com/learn/latex/Learn_LaTeX_in_30_minutes
Install Eclipse: https://www.educative.io/answers/how-to-install-eclipse-ide
Install Java 11: https://drive.google.com/file/d/1J7M4Aud_7MocSNa6qAaq7dsHQPkmALdO/view?usp=drive_link
With highest honors
2024S: Donggyu Lee
2025S: Woungjae Choo