Advanced Parallel Computing (240199)

Classes (Fall 2020) - Class overview (PDF)

  • Online class

  • Instructor: Duksu Kim (bluekds at koreatech.ac.kr / #435, 2nd Eng. Building)

  • Office hours: Mon. 14:00-16:00

Prerequisites

  • (Required) C Programming

  • (Strongly Recommended) Multi-core Programming (Undergraduate level)

  • (Recommended) System Programming, Data Structures

  • (Required) PC or laptop with a multi-core CPU / (Recommended) PC or laptop with an NVIDIA GPU

    • We will lend you a CUDA development kit (e.g., a Jetson kit) if you need one

    • However, you must provide your own monitor, keyboard, and mouse to use it

Textbooks

  • Lecture notes on this page

  • Lecture videos for the multi-core programming class (CPH351) [link]

    • Sample code is available in the Git repository [link]

  • References

    • (OpenMP) Using OpenMP, B. Chapman et al. (The MIT Press) [link]

    • (CUDA) CUDA C Programming Guide (NVIDIA) [link]

    • (CUDA) Professional CUDA C Programming, John Cheng, Max Grossman, Ty McKercher (Wrox) [link]

Setting up CUDA Dev. environments

    • Windows Dev. environments [Kor]

    • Linux (Ubuntu) Dev. environments on Jetson Kit [Kor]

    • Troubleshooting

      • Q. My laptop has an NVIDIA GPU, but CUDA does not work properly

        • A. Check whether your laptop uses a hybrid GPU system (e.g., Intel HD Graphics + NVIDIA GPU)

          • If it does, disabling the Intel GPU in your OS's device manager may fix the problem

Lecture Notes and Videos

Lecture 1. Parallel Processing Overview (9/2)

    • What Is Parallel Computing, and Why?

    • Parallel Program Performance

    • Parallel Program Design

    • Parallel Processing Hardware

    • Heterogeneous Computing

Lecture 2-3. OpenMP Overview (9/9, 9/16)

    • OpenMP introduction

    • Parallel construct

    • Work-sharing construct

    • Scope of Variables

    • Synchronization construct & Locks

    • Nested parallelism

Lecture 4. CUDA Overview I (9/23)

    • Introduction to GPGPU

    • Hello CUDA

    • Basic Workflow of CUDA

    • CUDA Thread Hierarchy

    • Organizing Threads

Lecture 5. CUDA Overview II (9/30)

    • CUDA Execution Model

    • CUDA Memory Model & Performance

    • Using Shared Memory

    • Maximizing Memory Throughput

Lecture 6. CUDA Overview III (10/7)

    • Synchronization

    • CUDA Stream & Concurrent Execution

    • CUDA Event

    • Multi-GPUs and Heterogeneous Computing

Contents

Paper Seminar (10/14)

    • Efficient Implementation of Strassen's Algorithm for Memory Allocation using AVX Intrinsic on Multi-core Architecture [paper]

      • Nwe Zin Oo and Panyayot Chaikan, ITC-CSCC 2019

      • Presented by Homin Kang (Slides) (Video)

    • CUKNN: A parallel implementation of K-nearest neighbor on CUDA-enabled GPU [paper]

      • Liang et al., IEEE Youth Conference on Information, Computing and Telecommunication, 2009

      • Presented by Yonggyu Kim (Slides) (Video)

    • Fast Parallel Training of Neural Language Models [paper]

      • Xiao et al., IJCAI, 2017

      • Presented by Jin-Hwan Kim (Slides) (Video)

Paper Seminar (10/21)

    • Fast Filtering of LiDAR Point Cloud in Urban Areas Based on Scan Line Segmentation and GPU Acceleration [paper]

      • Hu et al., IEEE Geoscience and Remote Sensing Letters, 2012

      • Presented by Jae-Min Sa (Slides) (Video)

    • Energy-efficient execution of data-parallel application on heterogeneous mobile platforms [paper]

      • Prakash et al., ICCD, 2015

      • Presented by Euihyeok Lee (Slides) (Video)

    • Safety view management for augmented reality based on MapReduce strategy on multi-core processors [paper]

      • Hsiao-Chien Tsai, ITST 2013

      • Presented by Juhwan Lee (Slides) (Video)

    • Real-Time Face Detection and Tracking Utilising OpenMP and ROS [paper]

      • Tusa et al., Asia-Pacific Conference on Computer Aided System Engineering 2015

      • Presented by Ye-Chan Choi (Slides) (Video)

[Project] Proposal presentation (10/28)

Paper Seminar (11/04)

    • M-DTM: Migration-based dynamic thermal management for heterogeneous mobile multi-core processors [paper]

      • Kim et al., Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015

      • Presented by Sang-Won Hwang (Slides) (Video)

    • Parallel Processing for Data Deduplication [paper]

      • Sobe et al., PARS-Mitteilungen, 2015

      • Presented by In-Chul Hwang (Slides) (Video)

    • Robust Dynamic Resource Allocation via Probabilistic Task Pruning in Heterogeneous Computing Systems [paper]

      • Gentry et al., IPDPS, 2019

      • Presented by Homin Kang (Slides) (Video)

    • Parallel K Nearest Neighbor Matching for 3D Reconstruction [paper]

      • Cao et al., IEEE Access, 2019

      • Presented by Yonggyu Kim (Slides) (Video)

Paper Seminar (11/11)

    • Parallel Scheduled Sampling [paper]

      • Duckworth et al., arXiv:1906.04331, Jun 2019

      • Presented by Jin-Hwan Kim (Slides) (Video)

    • Neural Network Implementation using CUDA and OpenMP [paper]

      • Jang et al., Digital Image Computing: Techniques and Applications, 2008

      • Presented by Jae-Min Sa (Slides) (Video)

    • SandTrap: Tracking information flows on demand with parallel permissions [paper]

      • Razeen et al., MobiSys, 2018

      • Presented by Euihyeok Lee (Slides) (Video)

[Project] Midterm presentation (11/18)

Paper Seminar (11/25)

    • CPU and GPU Parallel Processing for Mobile Augmented Reality [paper]

      • Baek et al., International Congress on Image and Signal Processing (CISP), 2013

      • Presented by Juhwan Lee (Slides) (Video)

    • Light Field Depth Estimation on Off-the-Shelf Mobile GPU [paper]

      • Ivan et al., CVPR Workshops, 2018

      • Presented by Ye-Chan Choi (Slides) (Video)

    • DeepSense: A GPU-based Deep Convolutional Neural Network Framework on Commodity Mobile Devices [paper]

      • Huynh et al., Workshop on Wearable Systems and Applications, 2016

      • Presented by Sang-Won Hwang (Slides) (Video)

12/02 - No class

[Project] Final presentation (12/09, 12/16)

Paper Seminar (Pending)

    • Performance and Scalability of GPU-based Convolutional Neural Networks [paper]

      • Strigl et al., IEEE Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010

      • Presented by Joon-Ho Park (Slides) (Video)

    • Flexible, High Performance Convolutional Neural Networks for Image Classification [paper]

      • Ciresan et al., Twenty-Second International Joint Conference on Artificial Intelligence, 2011

      • Presented by Joon-Ho Park (Slides) (Video)