Multi-core Programming (CPH351)

Course Description

Parallel computing is a form of computation that uses multiple computing resources to solve a problem. Today, parallel computing is widely employed in various fields (e.g., simulation, graphics, and AI) to improve performance in terms of speed or accuracy. Among the various parallel computing architectures, multi-core CPUs and GPUs are the most commonly employed computing resources. This course aims to convey the power of parallel computing and to teach the basic programming skills needed to develop parallel programs. During the 16-week course, students will learn:

  • Concepts of parallel computing and how to design parallel algorithms

  • OpenMP - A programming interface for utilizing multi-core CPUs (MIMD architecture)

  • CUDA - A programming interface for utilizing an NVIDIA GPU (many-core SIMT architecture)

Students will also take part in two team projects whose goal is to improve the performance of an application by using multi-core CPUs and/or GPUs.

Class overview (PDF / Video)

    • Schedule (This is a flipped-learning course)

      • Lecture: Online (EL)

    • Problem solving: Thu. 11:00~13:00 / #125, 2nd Eng. Building

        • Catch-up class (with TA): Tue. 09:00~11:00 (#125)

    • Instructor: Duksu Kim (bluekds at koreatech.ac.kr / #435, 2nd Eng. Building)

      • Office hour: Tue. and Thu. 14:00~16:00 (#435)

    • Teaching Assistant: JaeHong Lee (oorange31 at koreatech.ac.kr / #331, 2nd Eng. Building)

      • Office hour: Tue. 15:00~17:00 (#331)

    • Course git repository [link]

Prerequisite/Requirements

    • (Required) C Programming

    • (Recommended) System Programming, Data structure

    • (Required) PC or Laptop with a multi-core CPU

    • (Recommended) PC or Laptop with an NVIDIA GPU

      • We will lend a development kit (e.g., a Jetson kit) for CUDA if you need one.

      • However, you need to prepare a monitor and a keyboard/mouse yourself to use it.

Textbooks

    • [Main textbook] Lecture notes on this page

    • (OpenMP) Using OpenMP, B. Chapman et al. (The MIT Press) [link]

    • (OpenMP) An Introduction to Parallel Programming, Peter Pacheco (MK) [Eng] [Kor]

    • (CUDA) CUDA C Programming guide (NVIDIA) [link]

    • (CUDA) Professional CUDA C Programming, John Cheng, Max Grossman, Ty McKercher (Wrox) [link]

Setup CUDA Dev. environments

    • Windows Dev. environments [Doc(Kor)][Video]

    • Linux(Ubuntu) Dev. environments on Jetson Kit [Doc(Kor)]

    • Google Colab [Doc(Kor)][Video] (For students who do not have an NVIDIA GPU)

FAQ page [link]

    • This page may not work with Internet Explorer 10 or older versions (Recommended browser: Chrome)

Lecture Slides and Videos (in Korean)

  • The lecture slides for each week will be uploaded at the beginning of the week, and the lecture video will be uploaded on EL.

    • Some figures and sample code come from the reference textbooks.

    • At the end of each week, the lecture video will also be posted on this homepage. (* For your attendance check, you must watch the video on EL.)

Lecture 1. Why Parallel Computing?

    • What is parallel computing, and why do we use it?

    • Why parallel programming?

Contents

Lecture 2. Introduction to Parallel Computing

    • Flynn’s Taxonomy (SIMD and MIMD)

    • Nondeterminism

    • Performance of Parallel Computing

Contents
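
As a preview of the performance part, the standard measures are speedup and Amdahl's law. A sketch of the usual formulation (here S is the serial fraction of the work and p is the number of cores):

    \text{Speedup} = \frac{T_{\text{serial}}}{T_{\text{parallel}}}, \qquad
    \text{Speedup}_{\text{Amdahl}}(p) = \frac{1}{S + (1 - S)/p} \le \frac{1}{S}

For example, even with a serial fraction of only S = 0.1, the speedup is capped at 10x no matter how many cores are used.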

Lecture 3. OpenMP Overview

    • What is OpenMP?

    • Hello OpenMP!

Contents
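
A minimal "Hello OpenMP!" sketch in C (illustrative; not the official lab code):

    // hello_omp.c - compile with: gcc -fopenmp hello_omp.c -o hello_omp
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        #pragma omp parallel                      // fork a team of threads
        {
            int tid = omp_get_thread_num();       // this thread's id
            int nth = omp_get_num_threads();      // size of the team
            printf("Hello OpenMP! (thread %d of %d)\n", tid, nth);
        }                                         // implicit barrier, then join
        return 0;
    }

The order of the printed lines varies between runs, which previews the nondeterminism discussed in Lecture 2.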

[OpenMP] Lab 1 (3/26)

    • Hello OpenMP!

    • Encrypted Image

      • Files for Lab 1-2 [link]

Contents

Lecture 4. Introduction to OpenMP (Part1)

    • Parallel construct

    • Work-sharing construct

    • Scope of variables
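
A small sketch of these three Lecture 4 constructs together (illustrative; names are my own):

    // lec4.c - compile with: gcc -fopenmp lec4.c -o lec4
    #include <stdio.h>

    #define N 8

    int main(void) {
        int a[N];
        int scale = 2;
        // parallel + for work-sharing; default(none) forces explicit scoping
        #pragma omp parallel for default(none) shared(a) firstprivate(scale)
        for (int i = 0; i < N; i++)   // the loop variable i is automatically private
            a[i] = scale * i;         // iterations are divided among the threads
        for (int i = 0; i < N; i++)
            printf("a[%d] = %d\n", i, a[i]);
        return 0;
    }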

[OpenMP] Lab 2 (4/2)

    • Matrix Vector Multiplication

      • Files for Lab 2-1 [link]

    • Trapezoidal Rule

Contents

Lecture 5. Introduction to OpenMP (Part2)

    • Synchronization Constructs

    • Locks

Contents
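
A sketch of the two Lecture 5 tools (illustrative): the critical construct and the explicit lock API, each guarding a shared counter.

    // lec5.c - compile with: gcc -fopenmp lec5.c -o lec5
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int counter = 0;
        omp_lock_t lock;
        omp_init_lock(&lock);

        #pragma omp parallel num_threads(4)
        {
            #pragma omp critical      // one thread at a time
            counter++;

            omp_set_lock(&lock);      // same effect via the explicit lock API
            counter++;
            omp_unset_lock(&lock);
        }

        omp_destroy_lock(&lock);
        printf("counter = %d\n", counter);   // 8 with 4 threads
        return 0;
    }

Without the critical section or the lock, the two increments would race and the final count would be unpredictable.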

[OpenMP] Lab 3 (4/9)

    • Histogram

      • Ver. 1, 2, 3

Contents

Lecture 6. Introduction to OpenMP (Part3)

    • Reduction clause

    • Scheduling clauses

    • Nested parallelism
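
A sketch combining the reduction and scheduling clauses of Lecture 6 (illustrative):

    // lec6.c - compile with: gcc -fopenmp lec6.c -o lec6
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        double sum = 0.0;
        // each thread accumulates a private partial sum, and the partials are
        // combined at the end; chunks of 1000 iterations are handed out
        // dynamically to whichever thread is free
        #pragma omp parallel for reduction(+:sum) schedule(dynamic, 1000)
        for (int i = 0; i < N; i++)
            sum += 1.0 / (i + 1);
        printf("sum = %f\n", sum);
        return 0;
    }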

Lecture 7. Introduction to GPGPU/CUDA

    • Heterogeneous Parallel Computing with CUDA

Contents

  • Slides (Eng)

  • Video

    • Part 1 (HPC Lab. Winter school w4-1)

    • Part 2 (HPC Lab. Winter school w4-2)

[CUDA] Lab 0 - CUDA Dev. environment setup

Lecture 8. CUDA Thread and Execution Model

    • Hello CUDA!

    • Basic workflow of a CUDA program

    • CUDA Thread Hierarchy & Organizing threads
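
A minimal "Hello CUDA!" sketch (illustrative; not the official lab code) showing a kernel launch and the thread hierarchy:

    // hello_cuda.cu - compile with: nvcc hello_cuda.cu -o hello_cuda
    #include <cstdio>

    __global__ void helloCUDA(void) {
        // global thread index: grid -> block -> thread
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        printf("Hello CUDA! (thread %d)\n", tid);
    }

    int main(void) {
        helloCUDA<<<2, 4>>>();       // 2 blocks x 4 threads = 8 threads
        cudaDeviceSynchronize();     // wait for the kernel (and its printf) to finish
        return 0;
    }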

[CUDA] Lab 4 (5/7)

    • Vector sum

    • Matrix Addition

Contents

Lecture 9. CUDA Memory Model

    • CUDA memory hierarchy

    • Memory model & Performance

    • Using shared memory

Contents
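
A sketch of using shared memory (illustrative): a block-wide sum that stages data in fast on-chip shared memory before combining it.

    // blocksum.cu - compile with: nvcc blocksum.cu -o blocksum
    #include <cstdio>

    #define BLOCK 256

    __global__ void blockSum(const float *in, float *out) {
        __shared__ float buf[BLOCK];             // visible to all threads in a block
        int tid = threadIdx.x;
        buf[tid] = in[blockIdx.x * BLOCK + tid];
        __syncthreads();                         // all loads done before anyone reads

        for (int s = BLOCK / 2; s > 0; s >>= 1) {   // tree reduction in shared memory
            if (tid < s) buf[tid] += buf[tid + s];
            __syncthreads();
        }
        if (tid == 0) out[blockIdx.x] = buf[0];  // one partial sum per block
    }

    int main(void) {
        float *in, *out;
        cudaMallocManaged(&in, BLOCK * sizeof(float));
        cudaMallocManaged(&out, sizeof(float));
        for (int i = 0; i < BLOCK; i++) in[i] = 1.0f;
        blockSum<<<1, BLOCK>>>(in, out);
        cudaDeviceSynchronize();
        printf("sum = %f\n", out[0]);            // expect 256
        cudaFree(in); cudaFree(out);
        return 0;
    }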

[CUDA] Lab 5 (5/21)

    • Matrix Multiplication

Contents

Lecture 10. Maximizing Memory Throughput

    • Global memory

      • Aligned memory access

      • Coalesced memory access

    • Shared memory

      • Bank conflict
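
A sketch contrasting coalesced and strided global-memory access (illustrative): neighboring threads touching neighboring addresses coalesce into a few memory transactions, while a large stride does not.

    // coalesce.cu - compile with: nvcc coalesce.cu -o coalesce
    #include <cstdio>

    __global__ void copyCoalesced(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];          // thread i -> element i: coalesced
    }

    __global__ void copyStrided(const float *in, float *out, int n, int stride) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = (i * stride) % n;           // scattered addresses: poor coalescing
        if (i < n) out[j] = in[j];
    }

    int main(void) {
        const int n = 1 << 20;
        float *in, *out;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&out, n * sizeof(float));
        copyCoalesced<<<(n + 255) / 256, 256>>>(in, out, n);
        copyStrided<<<(n + 255) / 256, 256>>>(in, out, n, 32);
        cudaDeviceSynchronize();            // compare the two with Nsight/nvprof
        cudaFree(in); cudaFree(out);
        return 0;
    }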

[CUDA] Lab 6 (5/28)

    • Optimizing Matrix Multiplication

Contents

The HPC Lab. is recruiting undergraduate research students.

If you would like to study and do research on high-performance computing, VR/AR, AI/autonomous driving, and related topics with us, please get in touch :)

See here for details.

Lecture 11. Synchronization & Concurrent execution

    • Synchronization

    • CUDA stream

    • Concurrent execution

      • Hiding data transfer overhead

    • CUDA Event
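
A sketch of hiding data-transfer overhead with CUDA streams (illustrative): two chunks are copied and processed in separate streams so the copy of one chunk overlaps the kernel working on the other. Asynchronous copies require pinned host memory.

    // streams.cu - compile with: nvcc streams.cu -o streams
    #include <cstdio>

    __global__ void scale(float *d, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= 2.0f;
    }

    int main(void) {
        const int n = 1 << 20, chunk = n / 2;
        float *h, *d;
        cudaMallocHost(&h, n * sizeof(float));   // pinned host memory
        cudaMalloc(&d, n * sizeof(float));

        cudaStream_t s[2];
        for (int k = 0; k < 2; k++) cudaStreamCreate(&s[k]);

        for (int k = 0; k < 2; k++) {            // copy-in / compute / copy-out per stream
            int off = k * chunk;
            cudaMemcpyAsync(d + off, h + off, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, s[k]);
            scale<<<(chunk + 255) / 256, 256, 0, s[k]>>>(d + off, chunk);
            cudaMemcpyAsync(h + off, d + off, chunk * sizeof(float),
                            cudaMemcpyDeviceToHost, s[k]);
        }
        cudaDeviceSynchronize();                 // wait for both streams

        for (int k = 0; k < 2; k++) cudaStreamDestroy(s[k]);
        cudaFreeHost(h); cudaFree(d);
        return 0;
    }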

[CUDA] Lab 7 (6/4)

    • Trapezoidal Rule on GPU

Contents

Lecture 12. Get More Power!

    • Multi-GPUs

    • Heterogeneous Computing
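
A sketch of driving multiple GPUs (illustrative): one device per data chunk, selected with cudaSetDevice. Kernel launches are asynchronous, so the GPUs run concurrently.

    // multigpu.cu - compile with: nvcc multigpu.cu -o multigpu
    #include <cstdio>

    __global__ void fill(float *d, int n, float v) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] = v;
    }

    int main(void) {
        int ndev = 0;
        cudaGetDeviceCount(&ndev);
        if (ndev > 16) ndev = 16;             // this sketch assumes at most 16 GPUs
        printf("GPUs found: %d\n", ndev);

        const int n = 1 << 20;
        float *d[16];
        for (int dev = 0; dev < ndev; dev++) {
            cudaSetDevice(dev);               // subsequent calls target this GPU
            cudaMalloc(&d[dev], n * sizeof(float));
            fill<<<(n + 255) / 256, 256>>>(d[dev], n, (float)dev);
        }
        for (int dev = 0; dev < ndev; dev++) {
            cudaSetDevice(dev);
            cudaDeviceSynchronize();          // wait for each device, then clean up
            cudaFree(d[dev]);
        }
        return 0;
    }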

The final exam (6/16) - Good Luck!

Extra Lectures

    • Precision issues in floating-point operations on CUDA/GPUs

    • Nsight Debugger and Profiler for CUDA
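
The precision topic can be previewed in two lines of C: float addition is not associative, so a parallel reduction, which reorders additions, can give a slightly different result from the serial sum (a minimal illustration):

    #include <stdio.h>

    int main(void) {
        float big = 1.0e8f, one = 1.0f;
        float left  = (big + one) - big;   // one is absorbed into big: prints 0.0
        float right = (big - big) + one;   // prints 1.0
        printf("left = %f, right = %f\n", left, right);
        return 0;
    }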