Multi-core Programming (CPH351)

Course Description

Parallel computing is a form of computation that uses multiple computing resources to solve a problem. Today, parallel computing is widely employed in various fields (e.g., simulation, graphics, and AI) to improve performance in terms of speed or accuracy. Among the various parallel computing architectures, multi-core CPUs and GPUs are the most commonly employed computing resources. This course aims to convey the power of parallel computing and to teach the basic programming skills needed to develop parallel programs. During the 16-week course, students will learn:

  • Concepts of parallel computing and how to design parallel algorithms

  • OpenMP - A programming interface for utilizing multi-core CPUs (MIMD architecture)

  • CUDA - A programming interface for utilizing an NVIDIA GPU (many-core SIMT architecture)

Students will also take part in two team projects whose goal is to improve the performance of an application by using multi-core CPUs and/or GPUs.

Class overview (PDF / Video)

    • Schedule (This is a flipped-learning course)

      • Lecture: Online (EL)

    • Problem solving: Thu. 11:00~13:00 / #125, 2nd Eng. Building

        • Catch-up class (with TA): Tue. 09:00~11:00 (#125)

    • Instructor: Duksu Kim (bluekds at koreatech.ac.kr / #435, 2nd Eng. Building)

      • Office hour: Tue. and Thu. 14:00~16:00 (#435)

    • Teaching Assistant: JaeHong Lee (oorange31 at koreatech.ac.kr / #331, 2nd Eng. Building)

      • Office hour: Tue. 15:00~17:00 (#331)

    • Course git repository [link]

Prerequisite/Requirements

    • (Required) C Programming

    • (Recommended) System Programming, Data structure

    • (Required) PC or Laptop with a multi-core CPU

    • (Recommended) PC or Laptop with an NVIDIA GPU

      • We will lend a development kit (e.g., a Jetson kit) for CUDA if you need one.

      • However, you need to prepare a monitor and a keyboard/mouse yourself to use it.

Textbooks

    • [Main textbook] Lecture notes on this page

    • (OpenMP) Using OpenMP, B. Chapman et al. (The MIT Press) [link]

    • (OpenMP) An Introduction to Parallel Programming, Peter Pacheco (MK) [Eng] [Kor]

    • (CUDA) CUDA C Programming guide (NVIDIA) [link]

    • (CUDA) Professional CUDA C Programming, John Cheng, Max Grossman, Ty McKercher (Wrox) [link]

Setup CUDA Dev. environments

    • Windows Dev. environments [Doc(Kor)][Video]

    • Linux(Ubuntu) Dev. environments on Jetson Kit [Doc(Kor)]

    • Google Colab [Doc(Kor)][Video] (For students who do not have an NVIDIA GPU)

FAQ page [link]

    • This page may not work with Internet Explorer 10 or older versions (Recommended browser: Chrome)

Lecture Slides and Videos (in Korean)

  • The lecture slides for each week will be uploaded at the beginning of the week, and the lecture video will be uploaded on EL.

    • Some figures and sample code come from the reference textbooks.

    • At the end of each week, the lecture video will also be posted on this homepage. (* For your attendance check, you must watch the video on EL.)

Lecture 1. Why Parallel Computing?

    • What is parallel computing, and why do we use it?

    • Why parallel programming?

Contents

Lecture 2. Introduction to Parallel Computing

    • Flynn’s Taxonomy (SIMD and MIMD)

    • Nondeterminism

    • Performance of Parallel Computing

Contents
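
As a preview of the performance part, the standard measures are speedup and Amdahl's law. A sketch of the usual formulation (here S is the serial fraction of the work and p is the number of cores):

    \text{Speedup} = \frac{T_{\text{serial}}}{T_{\text{parallel}}}, \qquad
    \text{Speedup}_{\text{Amdahl}}(p) = \frac{1}{S + (1 - S)/p} \le \frac{1}{S}

For example, even with a serial fraction of only S = 0.1, the speedup is capped at 10x no matter how many cores are used.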

Lecture 3. OpenMP Overview

    • What is OpenMP?

    • Hello OpenMP!

Contents
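
A minimal "Hello OpenMP!" sketch in C (illustrative; not the official lab code):

    // hello_omp.c - compile with: gcc -fopenmp hello_omp.c -o hello_omp
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        #pragma omp parallel                      // fork a team of threads
        {
            int tid = omp_get_thread_num();       // this thread's id
            int nth = omp_get_num_threads();      // size of the team
            printf("Hello OpenMP! (thread %d of %d)\n", tid, nth);
        }                                         // implicit barrier, then join
        return 0;
    }

The order of the printed lines varies between runs, which previews the nondeterminism discussed in Lecture 2.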

[OpenMP] Lab 1 (3/26)

    • Hello OpenMP!

    • Encrypted Image

      • Files for Lab 1-2 [link]

Contents

Lecture 4. Introduction to OpenMP (Part1)

    • Parallel construct

    • Work-sharing construct

    • Scope of variables
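
A small sketch of these three Lecture 4 constructs together (illustrative; names are my own):

    // lec4.c - compile with: gcc -fopenmp lec4.c -o lec4
    #include <stdio.h>

    #define N 8

    int main(void) {
        int a[N];
        int scale = 2;
        // parallel + for work-sharing; default(none) forces explicit scoping
        #pragma omp parallel for default(none) shared(a) firstprivate(scale)
        for (int i = 0; i < N; i++)   // the loop variable i is automatically private
            a[i] = scale * i;         // iterations are divided among the threads
        for (int i = 0; i < N; i++)
            printf("a[%d] = %d\n", i, a[i]);
        return 0;
    }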

[OpenMP] Lab 2 (4/2)

    • Matrix Vector Multiplication

      • Files for Lab 2-1 [link]

    • Trapezoidal Rule

Contents

Lecture 5. Introduction to OpenMP (Part2)

    • Synchronization Constructs

    • Locks

Contents
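
A sketch of the two Lecture 5 tools (illustrative): the critical construct and the explicit lock API, each guarding a shared counter.

    // lec5.c - compile with: gcc -fopenmp lec5.c -o lec5
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int counter = 0;
        omp_lock_t lock;
        omp_init_lock(&lock);

        #pragma omp parallel num_threads(4)
        {
            #pragma omp critical      // one thread at a time
            counter++;

            omp_set_lock(&lock);      // same effect via the explicit lock API
            counter++;
            omp_unset_lock(&lock);
        }

        omp_destroy_lock(&lock);
        printf("counter = %d\n", counter);   // 8 with 4 threads
        return 0;
    }

Without the critical section or the lock, the two increments would race and the final count would be unpredictable.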

[OpenMP] Lab 3 (4/9)

    • Histogram

      • Ver. 1, 2, 3

Contents

Lecture 6. Introduction to OpenMP (Part3)

    • Reduction clause

    • Scheduling clauses

    • Nested parallelism
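
A sketch combining the reduction and scheduling clauses of Lecture 6 (illustrative):

    // lec6.c - compile with: gcc -fopenmp lec6.c -o lec6
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        double sum = 0.0;
        // each thread accumulates a private partial sum, and the partials are
        // combined at the end; chunks of 1000 iterations are handed out
        // dynamically to whichever thread is free
        #pragma omp parallel for reduction(+:sum) schedule(dynamic, 1000)
        for (int i = 0; i < N; i++)
            sum += 1.0 / (i + 1);
        printf("sum = %f\n", sum);
        return 0;
    }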

Lecture 7. Introduction to GPGPU/CUDA

    • Heterogeneous Parallel Computing with CUDA

Contents

  • Slides (Eng)

  • Video

    • Part 1 (HPC Lab. Winter school w4-1)

    • Part 2 (HPC Lab. Winter school w4-2)

[CUDA] Lab 0 - CUDA Dev. environment setup

Lecture 8. CUDA Thread and Execution Model

    • Hello CUDA!

    • Basic workflow of a CUDA program

    • CUDA Thread Hierarchy & Organizing threads
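
A minimal "Hello CUDA!" sketch (illustrative; not the official lab code) showing a kernel launch and the thread hierarchy:

    // hello_cuda.cu - compile with: nvcc hello_cuda.cu -o hello_cuda
    #include <cstdio>

    __global__ void helloCUDA(void) {
        // global thread index: grid -> block -> thread
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        printf("Hello CUDA! (thread %d)\n", tid);
    }

    int main(void) {
        helloCUDA<<<2, 4>>>();       // 2 blocks x 4 threads = 8 threads
        cudaDeviceSynchronize();     // wait for the kernel (and its printf) to finish
        return 0;
    }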

[CUDA] Lab 4 (5/7)

    • Vector sum

    • Matrix Addition

Contents

Lecture 9. CUDA Memory Model

    • CUDA memory hierarchy

    • Memory model & Performance

    • Using shared memory

Contents
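
A sketch of using shared memory (illustrative): a block-wide sum that stages data in fast on-chip shared memory before combining it.

    // blocksum.cu - compile with: nvcc blocksum.cu -o blocksum
    #include <cstdio>

    #define BLOCK 256

    __global__ void blockSum(const float *in, float *out) {
        __shared__ float buf[BLOCK];             // visible to all threads in a block
        int tid = threadIdx.x;
        buf[tid] = in[blockIdx.x * BLOCK + tid];
        __syncthreads();                         // all loads done before anyone reads

        for (int s = BLOCK / 2; s > 0; s >>= 1) {   // tree reduction in shared memory
            if (tid < s) buf[tid] += buf[tid + s];
            __syncthreads();
        }
        if (tid == 0) out[blockIdx.x] = buf[0];  // one partial sum per block
    }

    int main(void) {
        float *in, *out;
        cudaMallocManaged(&in, BLOCK * sizeof(float));
        cudaMallocManaged(&out, sizeof(float));
        for (int i = 0; i < BLOCK; i++) in[i] = 1.0f;
        blockSum<<<1, BLOCK>>>(in, out);
        cudaDeviceSynchronize();
        printf("sum = %f\n", out[0]);            // expect 256
        cudaFree(in); cudaFree(out);
        return 0;
    }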

[CUDA] Lab 5 (5/21)

    • Matrix Multiplication

Contents

Lecture 10. Maximizing Memory Throughput

    • Global memory

      • Aligned memory access

      • Coalesced memory access

    • Shared memory

      • Bank conflict
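
A sketch contrasting coalesced and strided global-memory access (illustrative): neighboring threads touching neighboring addresses coalesce into a few memory transactions, while a large stride does not.

    // coalesce.cu - compile with: nvcc coalesce.cu -o coalesce
    #include <cstdio>

    __global__ void copyCoalesced(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];          // thread i -> element i: coalesced
    }

    __global__ void copyStrided(const float *in, float *out, int n, int stride) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = (i * stride) % n;           // scattered addresses: poor coalescing
        if (i < n) out[j] = in[j];
    }

    int main(void) {
        const int n = 1 << 20;
        float *in, *out;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&out, n * sizeof(float));
        copyCoalesced<<<(n + 255) / 256, 256>>>(in, out, n);
        copyStrided<<<(n + 255) / 256, 256>>>(in, out, n, 32);
        cudaDeviceSynchronize();            // compare the two with Nsight/nvprof
        cudaFree(in); cudaFree(out);
        return 0;
    }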

[CUDA] Lab 6 (5/28)

    • Optimizing Matrix Multiplication

Contents

The HPC Lab. is recruiting undergraduate research students.

If you would like to study and do research on high-performance computing, VR/AR, AI/autonomous driving, and related topics with us, please get in touch :)

See here for details.

Lecture 11. Synchronization & Concurrent execution

    • Synchronization

    • CUDA stream

    • Concurrent execution

      • Hiding data transfer overhead

    • CUDA Event
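
A sketch of hiding data-transfer overhead with CUDA streams (illustrative): two chunks are copied and processed in separate streams so the copy of one chunk overlaps the kernel working on the other. Asynchronous copies require pinned host memory.

    // streams.cu - compile with: nvcc streams.cu -o streams
    #include <cstdio>

    __global__ void scale(float *d, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= 2.0f;
    }

    int main(void) {
        const int n = 1 << 20, chunk = n / 2;
        float *h, *d;
        cudaMallocHost(&h, n * sizeof(float));   // pinned host memory
        cudaMalloc(&d, n * sizeof(float));

        cudaStream_t s[2];
        for (int k = 0; k < 2; k++) cudaStreamCreate(&s[k]);

        for (int k = 0; k < 2; k++) {            // copy-in / compute / copy-out per stream
            int off = k * chunk;
            cudaMemcpyAsync(d + off, h + off, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, s[k]);
            scale<<<(chunk + 255) / 256, 256, 0, s[k]>>>(d + off, chunk);
            cudaMemcpyAsync(h + off, d + off, chunk * sizeof(float),
                            cudaMemcpyDeviceToHost, s[k]);
        }
        cudaDeviceSynchronize();                 // wait for both streams

        for (int k = 0; k < 2; k++) cudaStreamDestroy(s[k]);
        cudaFreeHost(h); cudaFree(d);
        return 0;
    }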

[CUDA] Lab 7 (6/4)

    • Trapezoidal Rule on GPU

Contents

Lecture 12. Get More Power!

    • Multi-GPUs

    • Heterogeneous Computing
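
A sketch of driving multiple GPUs (illustrative): one device per data chunk, selected with cudaSetDevice. Kernel launches are asynchronous, so the GPUs run concurrently.

    // multigpu.cu - compile with: nvcc multigpu.cu -o multigpu
    #include <cstdio>

    __global__ void fill(float *d, int n, float v) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] = v;
    }

    int main(void) {
        int ndev = 0;
        cudaGetDeviceCount(&ndev);
        if (ndev > 16) ndev = 16;             // this sketch assumes at most 16 GPUs
        printf("GPUs found: %d\n", ndev);

        const int n = 1 << 20;
        float *d[16];
        for (int dev = 0; dev < ndev; dev++) {
            cudaSetDevice(dev);               // subsequent calls target this GPU
            cudaMalloc(&d[dev], n * sizeof(float));
            fill<<<(n + 255) / 256, 256>>>(d[dev], n, (float)dev);
        }
        for (int dev = 0; dev < ndev; dev++) {
            cudaSetDevice(dev);
            cudaDeviceSynchronize();          // wait for each device, then clean up
            cudaFree(d[dev]);
        }
        return 0;
    }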

The final exam (6/16) - Good Luck!

Extra Lectures

    • Precision issues in floating-point operations on CUDA/GPUs

    • Nsight Debugger and Profiler for CUDA
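
The precision topic can be previewed in two lines of C: float addition is not associative, so a parallel reduction, which reorders additions, can give a slightly different result from the serial sum (a minimal illustration):

    #include <stdio.h>

    int main(void) {
        float big = 1.0e8f, one = 1.0f;
        float left  = (big + one) - big;   // one is absorbed into big: prints 0.0
        float right = (big - big) + one;   // prints 1.0
        printf("left = %f, right = %f\n", left, right);
        return 0;
    }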