Big Data and Knowledge Management Systems (빅데이터 및 지식 관리 시스템)

Spring 2021 (Tue/Thu 11:00-12:15)

Instructor: Prof. SangKyun Cha (chask@snu.ac.kr)

TAs: Joohyun Lee (jhlee@kdb.snu.ac.kr), Jaehwan Lim (amethyst@snu.ac.kr)

Course Overview

Summary

This course covers the modeling and management of various types of big data and knowledge from the perspective of the data-driven service life cycle. Students learn modern technologies for the ingestion, storage, distribution, and processing of big data in parallel and distributed cloud computing environments. In the final project, groups of students collect real-world big data and build a data science analysis using relational and graph databases. Students report their own solutions to data science problems and demonstrate the feasibility of those solutions.

  • Historical development of data models and data management systems

  • Structured data management and relational data model

  • Relational data storage, metadata management, and query processing

  • Transaction management and database recovery management

  • Spatio-temporal data, graph data, semi-structured and unstructured data, and knowledge structures

  • Parallel and distributed big data systems, complex analytics processing, and machine learning on cloud infrastructure

  • Big data and model life cycle management


Grading scheme

Attendance 5%

Assignment 30%

Midterm exam 25%

Final exam 25%

Project 15%


Content

  1. Introduction

              • Real-world data and knowledge structures, and their life cycle

              • Management of Structured and Unstructured Data

              • Query, Complex Analytics, and Machine Learning

              • Transactions, Concurrency Control, Logging, and Recovery

              • Hardware Development and Real-World Demands Driving Technology Paradigm Changes: Relational DBMS, In-Memory Platform, Ambient AI

  2. First Order Logic, PROLOG

              • First Order Logic Fundamentals, Translating Knowledge into Logic

              • Resolution Algorithm, Unification

              • PROLOG Syntax, PROLOG Tutorial (Tower of Hanoi, RDBMS, Graph)

  3. Natural Language Processing in PROLOG

              • Debugging and Tracing in Prolog

              • Natural Language Analysis in Prolog

  4. Frequent Graph Structures and Applications, Graph Storage and Operators - Neo4j

  5. Neo4j in Depth

  6. Spatial/Temporal Types in Neo4j

  7. Relational DBMS

              • Relational Algebra and Query Language

              • Physical Tables: Row vs Column Stores, Indexes

              • Why We Need Virtual Tables (Views)

  8. SQL: Data Definition Language, Data Manipulation Language, Data Control Language; Transactions: ACID Properties, Isolation Levels

  9. Index as Redundant Structures for Fast Access: B+-Tree

  10. Logical and Physical Query Plan, Relational Query Optimization

  11. In-Memory Data Management Architecture, Column Store

  12. Dictionary Encoding for Compression in Column Store

  13. In-Memory Complex Analytical Query Processing

  14. Transaction Processing, Concurrency Control - Locking and Latching, Multi-Version Concurrency Control (MVCC), Index Concurrency Control (with OLFIT)

  15. Logging and Recovery, and Replication for High Availability

  16. Distributed Systems for Scalability: Shared-Nothing Partitioning, Distributed Transaction and Query Processing, Distributed Computing for Cloud and Edge Systems

  17. Data Cleansing, ETL (Extract-Transform-Load)

  18. Distributed Workflow Management and Long Transactions