Special Topic in Computer Architecture

"Parallel Computing and Acceleration for HPC"

* All programming will be tested on a Linux-based computing platform (Raspberry Pi and Jetson TK1).

(Week 1: 9 March)

1-1 Conventional Computer Architecture and Parallel Computing Basics

(Week 2: 16 March)

1-2 Optimal Manycore Computing: A Performance/Energy Analysis and Optimization of Multi-Core Architectures with Voltage Scaling Techniques

(Paper PDF) (PPT PDF)

(Week 3: 23 March)

2. CPU:: Parallel Computing Programming - pthread & CPU Vector Processing

* pthread Programming

- Linux pthread Instructions

- Multi-threaded Matrix Multiplication
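
As an illustration of the multi-threaded matrix multiplication topic, here is a minimal sketch, assuming square row-major matrices stored in flat arrays and a simple split of the output rows into one contiguous band per thread. The names `band_t`, `multiply_band`, `matmul_pthread`, and `MAX_THREADS` are hypothetical and are not taken from the course material.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 8   /* hypothetical upper bound on worker threads */

/* Work description handed to each thread: a contiguous band of rows. */
typedef struct {
    const double *a;    /* n x n input, row-major */
    const double *b;    /* n x n input, row-major */
    double *c;          /* n x n output, row-major */
    int n;
    int row_begin;      /* first row of the band (inclusive) */
    int row_end;        /* last row of the band (exclusive) */
} band_t;

/* Thread entry point: computes the rows of C assigned to this band. */
static void *multiply_band(void *arg)
{
    const band_t *w = arg;
    for (int i = w->row_begin; i < w->row_end; i++) {
        for (int j = 0; j < w->n; j++) {
            double sum = 0.0;
            for (int k = 0; k < w->n; k++)
                sum += w->a[i * w->n + k] * w->b[k * w->n + j];
            w->c[i * w->n + j] = sum;
        }
    }
    return NULL;
}

/* C = A * B using num_threads threads. Returns 0 on success, -1 if a
   thread could not be created. Threads write disjoint rows of C, so no
   locking is needed. */
int matmul_pthread(const double *a, const double *b, double *c,
                   int n, int num_threads)
{
    assert(n > 0 && num_threads > 0 && num_threads <= MAX_THREADS);

    pthread_t tid[MAX_THREADS];
    band_t work[MAX_THREADS];
    int rows_per_thread = (n + num_threads - 1) / num_threads; /* ceiling */
    int started = 0;
    int status = 0;

    for (int t = 0; t < num_threads; t++) {
        int begin = t * rows_per_thread;
        int end = begin + rows_per_thread;
        if (begin > n) begin = n;   /* surplus threads get an empty band */
        if (end > n) end = n;
        work[t] = (band_t){ .a = a, .b = b, .c = c, .n = n,
                            .row_begin = begin, .row_end = end };
        if (pthread_create(&tid[t], NULL, multiply_band, &work[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    /* Wait for every thread that was actually started. */
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    return status;
}
```

A caller would pass two `n * n` arrays and an output buffer, for example `matmul_pthread(a, b, c, n, 4)`, and build with `gcc -pthread`. Splitting by rows keeps each thread's writes separate; other partitionings (by column or by tile) are equally valid choices.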

* Vector Processing (30 March)

- Crunching Numbers with AVX and AVX2 (Intel)

LINK : Cornell's Vectorization

- ARM Vector Instructions

NEON

The ARM® NEON™ general-purpose SIMD engine efficiently processes current and future multimedia formats, enhancing the user experience.

NEON technology can accelerate multimedia and signal processing algorithms such as video encode/decode, 2D/3D graphics, gaming, audio and speech processing, image processing, telephony, and sound synthesis, delivering at least 3x the performance of ARMv5 and at least 2x the performance of ARMv6 SIMD.
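
To make the SIMD idea concrete, the following is a minimal sketch of element-wise vector addition using NEON intrinsics from `arm_neon.h`, assuming 32-bit floats and a compiler that defines `__ARM_NEON` when NEON is enabled. The function name `add_f32` is hypothetical. On targets without NEON, the same function falls back to a plain scalar loop, so the code still compiles and produces the same results.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] = x[i] + y[i] for i in [0, n). */
void add_f32(const float *x, const float *y, float *out, size_t n)
{
    assert(x != NULL && y != NULL && out != NULL);
    size_t i = 0;

#if defined(__ARM_NEON)
    /* Process four floats per iteration in one 128-bit register. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);       /* load 4 floats */
        float32x4_t vy = vld1q_f32(y + i);
        vst1q_f32(out + i, vaddq_f32(vx, vy));   /* add and store */
    }
#endif

    /* Scalar tail for leftover elements; on non-NEON targets this
       loop handles the entire array. */
    for (; i < n; i++)
        out[i] = x[i] + y[i];
}
```

With GCC on 32-bit ARM, NEON typically has to be enabled explicitly (for example with `-mfpu=neon`); on AArch64 it is available by default.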

LINK: ARM Vector Instructions List

LINK: SIMD Assembly Tutorial: ARM NEON

LINK : ARM NEON Instruction Set and Why You Should Care

LINK : Using your C compiler to exploit NEON™ Advanced SIMD

LINK : Programming with vector instructions (Intel Family, MMX, SSE, AVX)

Demo: Lab Exercise

(Week 4: 30 March)

3. CPU:: Parallel Computing Programming - OpenMP

https://www.youtube.com/watch?v=6FMn7M5jxrM

https://www.youtube.com/playlist?list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG
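
For a first taste of OpenMP, here is a minimal sketch of a parallel dot product, assuming a compiler with OpenMP support (for GCC, the `-fopenmp` flag). The function name `dot_omp` is hypothetical. If OpenMP is not enabled, the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product with loop iterations divided among OpenMP threads. */
double dot_omp(const double *x, const double *y, int n)
{
    assert(n >= 0);
    double sum = 0.0;

    /* reduction(+:sum) gives each thread a private partial sum and
       combines the partial sums when the loop ends, which avoids a
       data race on the shared variable. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];

    return sum;
}
```

The number of threads can be controlled at run time through the `OMP_NUM_THREADS` environment variable. Because floating-point addition is not associative, the result may differ in the last bits from a serial sum when the inputs are not exactly representable.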

Demo: Lab Exercise

(Week 5: 6/27 April)

4-1. GPU:: Parallel Computing Programming - CUDA Basics and GPU Architecture

4-2. GPU:: Parallel Computing Programming - CUDA Optimization

Demo: Lab Exercise

(Week 6: -)

4-3. GPU:: Parallel Computing Programming - CUDA Enabled Embedded Computing

Demo: Lab Exercise

(Week 7: 4 May)

5. CPU/GPU/FPGA:: Parallel Computing Programming - OpenCL 1

* Vector Addition detailed explanation

[ref] OpenCL setting for Visual Studio (LINK)

- Bristol OpenCL

(Optional)

Opt-1 FPGA:: Hardware Acceleration - FPGA Basics

Opt-2 FPGA:: Hardware Acceleration - Verilog HDL

Demo: Lab Exercise

Opt-3 FPGA:: Hardware Acceleration - Simple MIPS Processor Design

Opt-4 FPGA:: Hardware Acceleration - Manycore MIPS Processor Design

Demo: Lab Exercise

* Paper Survey and Presentation

(Week 7: 18 May)

AYINEBYONA ELIAB: Solving a Big-Data Problem with GPU: The Network Traffic Analysis

DAM MINH TUNG (OpenCL for FPGA Acceleration) : Using OpenCL to Rapidly Prototype FPGA Designs

NGUYEN VAN TOAN (OpenCL for FPGA Acceleration): Characterization of OpenCL on a Scalable FPGA Architecture

Implementing FPGA Design with the OpenCL Standard PDF

(Week 8: 25 May)

김원표: Generative Adversarial Networks

박승주: Performance Analysis of GPU-based Convolutional Neural Networks

(Week 9: 1 June)

엄주성

(Week 9: 11 May): No class, since some students have to attend their project meeting.

(Week 9: 8 June)

신지훈

전한재

우윤희

김용휘: GAMT: A Fast and Scalable IP Lookup Engine for GPU-based Software Routers

* Project Assignment

- OpenCL for FPGA Acceleration (Toan, Dam)

- Machine Learning (김원표 or 박승주)

- Vision and/or Graphics Acceleration (전한재, 신지훈): NEON Vector Processing Survey + Matrix-Vector Multiplication with NEON

- Eliab, 엄주성, 우윤희 ???