Advanced Parallel Computing (240199)

Classes(Fall, 2020) - Class overview (PDF)

Online class
Instructor: Duksu Kim (bluekds at koreatech.ac.kr / #435, 2nd Eng. Building)
Office hours: Mon. 14:00-16:00

Prerequisite

(Required) C Programming
(Strongly Recommended) Multi-core Programming (Undergraduate level)
(Recommended) System Programming, Data structure
(Required) PC or laptop with a multi-core CPU / (Recommended) PC or laptop with an NVIDIA GPU
- We can lend you a development kit (e.g., a Jetson kit) for CUDA if you need one
- However, you must provide your own monitor, keyboard, and mouse to use it

Textbooks

Lecture notes in this page
Lecture videos for the multi-core programming class (CPH351) [link]
- Sample codes are available at the git repository [link]
References
- (OpenMP) Using OpenMP, B. Chapman et al. (The MIT Press) [link]
- (CUDA) CUDA C Programming Guide (NVIDIA) [link]
- (CUDA) Professional CUDA C Programming, John Cheng, Max Grossman, Ty McKercher (Wrox) [link]

Setup CUDA Dev. environments

- Windows Dev. environments [Kor]
- Linux(Ubuntu) Dev. environments on Jetson Kit [Kor]
- Troubleshooting
  - Q. My laptop has an NVIDIA GPU, but CUDA does not work properly
    - A. Check whether your laptop uses a hybrid GPU system (e.g., Intel HD Graphics + NVIDIA GPU)
      - If it does, disabling the Intel GPU in the device manager of your OS may fix the problem

Lecture Notes and Videos

Lecture 1. Parallel Processing Overview (9/2)

- What is and Why Parallel Computing
- Parallel Program Performance
- Parallel Program Design
- Parallel Processing Hardware
- Heterogeneous Computing

- Lecture slides

Lecture 2-3. OpenMP Overview (9/9, 9/16)

- OpenMP introduction
- Parallel construct
- Work-sharing construct
- Scope of Variables
- Synchronization construct & Locks
- Nested parallelism

- Lecture slides
- Related videos

Lecture 4. CUDA Overview I (9/23)

- Introduction to GPGPU
- Hello CUDA
- Basic Workflow of CUDA
- CUDA Thread Hierarchy
- Organizing Threads

- Lecture slides
- Related videos

Lecture 5. CUDA Overview II (9/30)

- CUDA Execution Model
- CUDA Memory Model & Performance
- Using Shared Memory
- Maximizing Memory Throughput

- Lecture slides
- Related videos

Lecture 6. CUDA Overview III (10/7)

- Synchronization
- CUDA Stream & Concurrent Execution
- CUDA Event
- Multi-GPUs and Heterogeneous Computing

- Lecture slides
- Related videos

Paper Seminar (10/14)

- Efficient Implementation of Strassen's Algorithm for Memory Allocation using AVX Intrinsic on Multi-core Architecture [paper]
  - Nwe Zin Oo and Panyayot Chaikan, ITC-CSCC 2019
  - Presented by Homin Kang (Slides) (Video)
- CUKNN: A parallel implementation of K-nearest neighbor on CUDA-enabled GPU [paper]
  - Liang et al., IEEE Youth Conference on Information, Computing and Telecommunication 2009
  - Presented by Yonggyu Kim (Slides) (Video)
- Fast Parallel Training of Neural Language Models [paper]
  - Xiao et al., IJCAI 2017
  - Presented by Jin-Hwan Kim (Slides) (Video)

Paper Seminar (10/21)

- Fast Filtering of LiDAR Point Cloud in Urban Areas Based on Scan Line Segmentation and GPU Acceleration [paper]
  - Hu et al., IEEE Geoscience and Remote Sensing Letters, 2012
  - Presented by Jae-Min Sa (Slides) (Video)
- Energy-efficient execution of data-parallel application on heterogeneous mobile platforms [paper]
  - Prakash et al., ICCD, 2015
  - Presented by Euihyeok Lee (Slides) (Video)
- Safety view management for augmented reality based on MapReduce strategy on multi-core processors [paper]
  - Hsiao-Chien Tsai, ITST 2013
  - Presented by Juhwan Lee (Slides) (Video)
- Real-Time Face Detection and Tracking Utilising OpenMP and ROS [paper]
  - Tusa et al., Asia-Pacific Conference on Computer Aided System Engineering 2015
  - Presented by Ye-Chan Choi (Slides) (Video)

[Project] Proposal presentation (10/28)

Paper Seminar (11/04)

- M-DTM: Migration-based dynamic thermal management for heterogeneous mobile multi-core processors [paper]
  - Kim et al., Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015
  - Presented by Sang-Won Hwang (Slides) (Video)
- Parallel Processing for Data Deduplication [paper]
  - P. Sobe et al., PARS-Mitteilungen, 2015
  - Presented by In-Chul Hwang (Slides) (Video)
- Robust Dynamic Resource Allocation via Probabilistic Task Pruning in Heterogeneous Computing Systems [paper]
  - Gentry et al., IPDPS, 2019
  - Presented by Homin Kang (Slides) (Video)
- Parallel K Nearest Neighbor Matching for 3D Reconstruction [paper]
  - Cao et al., IEEE Access, 2019
  - Presented by Yonggyu Kim (Slides) (Video)

Paper Seminar (11/11)

- Parallel Scheduled Sampling [paper]
  - Duckworth et al., arXiv:1906.04331, Jun 2019
  - Presented by Jin-Hwan Kim (Slides) (Video)
- Neural Network Implementation using CUDA and OpenMP [paper]
  - Jang et al., Digital Image Computing: Techniques and Applications, 2008
  - Presented by Jae-Min Sa (Slides) (Video)
- SandTrap: Tracking information flows on demand with parallel permissions [paper]
  - Razeen et al., MobiSys, 2018
  - Presented by Euihyeok Lee (Slides) (Video)

[Project] Midterm presentation (11/18)

Paper Seminar (11/25)

- CPU and GPU Parallel Processing for Mobile Augmented Reality [paper]
  - Baek et al., International Congress on Image and Signal Processing (CISP), 2013
  - Presented by Juhwan Lee (Slides) (Video)
- Light Field Depth Estimation on Off-the-Shelf Mobile GPU [paper]
  - Ivan et al., CVPR Workshops, 2018
  - Presented by Ye-Chan Choi (Slides) (Video)
- DeepSense: A GPU-based Deep Convolutional Neural Network Framework on Commodity Mobile Devices [paper]
  - Huynh et al., Workshop on Wearable Systems and Applications, 2016
  - Presented by Sang-Won Hwang (Slides) (Video)

12/02 - No class

[Project] Final presentation (12/09, 12/16)

Paper Seminar (Pending)

- Performance and Scalability of GPU-based Convolutional Neural Networks [paper]
  - Strigl et al., IEEE Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010
  - Presented by Joon-Ho Park (Slides) (Video)
- Flexible, High Performance Convolutional Neural Networks for Image Classification [paper]
  - Ciresan et al., Twenty-Second International Joint Conference on Artificial Intelligence, 2011
  - Presented by Joon-Ho Park (Slides) (Video)
