Special Topic in Computer Architecture:
"Parallel Computing and Acceleration for HPC"
* All programming assignments will be tested on a Linux-based computing platform (Raspberry Pi and Jetson TK1).
(Week 1: 9 March)
1-1 Conventional Computer Architecture and Parallel Computing Basics
(Week 2: 16 March)
1-2 Optimal Manycore Computing: A Performance/Energy Analysis and Optimization of Multi-Core Architectures with Voltage Scaling Techniques
(Paper PDF) (PPT PDF)
(Week 3: 23 March)
2. CPU:: Parallel Computing Programming - pthread & CPU Vector Processing
* pthread Programming
- Linux pthread Instructions
- Multi-threaded Matrix Multiplication
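
As a rough illustration of the multi-threaded matrix multiplication topic, the sketch below splits the rows of the result matrix across POSIX threads. It is not the course's reference solution: the names `matmul_parallel`, `task_t`, `worker`, and the `MAX_THREADS` limit are assumptions made for this example, and the matrices are assumed to be square, row-major arrays of `double`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 8   /* illustrative upper bound, not a pthread limit */

/* Work description for one thread (hypothetical helper type). */
typedef struct {
    const double *a, *b;   /* inputs, n x n, row-major */
    double *c;             /* output, n x n, row-major */
    int n;
    int row_begin, row_end; /* half-open range of rows this thread computes */
} task_t;

/* Thread entry point: computes rows [row_begin, row_end) of c = a * b. */
static void *worker(void *arg)
{
    task_t *t = arg;
    for (int i = t->row_begin; i < t->row_end; i++) {
        for (int j = 0; j < t->n; j++) {
            double sum = 0.0;
            for (int k = 0; k < t->n; k++)
                sum += t->a[i * t->n + k] * t->b[k * t->n + j];
            t->c[i * t->n + j] = sum;
        }
    }
    return NULL;
}

/* Returns 0 on success, or the pthread_create error code on failure. */
int matmul_parallel(const double *a, const double *b, double *c,
                    int n, int nthreads)
{
    assert(a != NULL && b != NULL && c != NULL && n > 0);

    pthread_t tid[MAX_THREADS];
    task_t task[MAX_THREADS];
    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    if (nthreads > n) nthreads = n;   /* never more threads than rows */

    int started = 0, rc = 0;
    for (int t = 0; t < nthreads; t++) {
        task[t] = (task_t){ a, b, c, n,
                            t * n / nthreads, (t + 1) * n / nthreads };
        rc = pthread_create(&tid[t], NULL, worker, &task[t]);
        if (rc != 0)
            break;
        started++;
    }
    /* Wait for every thread that was actually started. */
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    return rc;
}
```

Each thread writes a disjoint set of output rows, so no mutex is needed. A call such as `matmul_parallel(a, b, c, n, 4)` would use four threads; with GCC the file is typically built with the `-pthread` option.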
* Vector Processing (30 March)
- Crunching Numbers with AVX and AVX2 (Intel)
LINK : Cornell's Vectorization
- ARM Vector Instructions
NEON
The ARM® NEON™ general-purpose SIMD engine efficiently processes current and future multimedia formats, enhancing the user experience.
NEON technology can accelerate multimedia and signal processing algorithms (video encode/decode, 2D/3D graphics, gaming, audio and speech processing, image processing, telephony, and sound synthesis) to at least 3x the performance of ARMv5 and at least 2x the performance of ARMv6 SIMD.
LINK: ARM Vector Instructions List
LINK: SIMD Assembly Tutorial: ARM NEON
LINK : ARM NEON Instruction Set and Why You Should Care
LINK : Using your C compiler to exploit NEON™ Advanced SIMD
LINK : Programming with vector instructions (Intel Family, MMX, SSE, AVX)
Demo: Lab Exercise
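
To make the vector processing topics above more concrete, here is a minimal sketch of element-wise float addition using compiler intrinsics. It assumes the compiler defines `__ARM_NEON` when NEON is enabled (for example on the Raspberry Pi or Jetson TK1) and `__AVX__` when AVX is enabled on Intel CPUs, and it falls back to plain scalar C otherwise. The function name `vec_add` is chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>     /* NEON intrinsics: 128-bit registers, 4 floats */
#elif defined(__AVX__)
#include <immintrin.h>    /* AVX intrinsics: 256-bit registers, 8 floats */
#endif

/* out[i] = a[i] + b[i] for i in [0, n). */
void vec_add(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;

#if defined(__ARM_NEON)
    /* Process 4 floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#elif defined(__AVX__)
    /* Process 8 floats per iteration; unaligned loads and stores. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
#endif

    /* Scalar tail (and the whole loop when no SIMD path is compiled in). */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

The scalar tail loop handles lengths that are not a multiple of the vector width, so the function gives the same result on every path. With GCC, the AVX path is typically enabled with `-mavx`, and the NEON path on 32-bit ARM with `-mfpu=neon`.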
(Week 4: 30 March)
3. CPU:: Parallel Computing Programming - OpenMP
https://www.youtube.com/watch?v=6FMn7M5jxrM
https://www.youtube.com/playlist?list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG
Demo: Lab Exercise
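
As a small sketch of the OpenMP programming style covered in this part, the function below computes a dot product with a `parallel for` loop and a `reduction` clause. The name `dot_omp` is made up for this example, and the code is an illustration under these assumptions rather than material taken from the linked lectures.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of x and y, each of length n. */
double dot_omp(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;

    /* Each thread accumulates a private copy of sum;
       OpenMP combines the copies with + when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += x[i] * y[i];

    return sum;
}
```

With GCC the pragma takes effect when the file is compiled with `-fopenmp`; without that option the pragma is ignored and the loop runs serially with the same result. Without the `reduction` clause, the concurrent updates to `sum` would be a data race.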
(Week 5: 6/27 April)
4-1. GPU:: Parallel Computing Programming - CUDA Basics and GPU Architecture
4-2. GPU:: Parallel Computing Programming - CUDA Optimization
Demo: Lab Exercise
(Week 6: -)
4-3. GPU:: Parallel Computing Programming - CUDA Enabled Embedded Computing
Demo: Lab Exercise
(Week 7: 4 May)
5. CPU/GPU/FPGA:: Parallel Computing Programming - OpenCL 1
* Vector Addition: detailed explanation
[ref] OpenCL setting for Visual Studio (LINK)
- Bristol OpenCL
(Optional)
Opt-1 FPGA:: Hardware Acceleration - FPGA Basics
Opt-2 FPGA:: Hardware Acceleration - Verilog HDL
Demo: Lab Exercise
Opt-3 FPGA:: Hardware Acceleration - Simple MIPS Processor Design
Opt-4 FPGA:: Hardware Acceleration - Manycore MIPS Processor Design
Demo: Lab Exercise
* Paper Survey and Presentation
(Week 7: 18 May)
AYINEBYONA ELIAB: Solving a Big-Data Problem with GPU: The Network Traffic Analysis
DAM MINH TUNG (OpenCL for FPGA Acceleration) : Using OpenCL to Rapidly Prototype FPGA Designs
NGUYEN VAN TOAN (OpenCL for FPGA Acceleration): Characterization of OpenCL on a Scalable FPGA Architecture
Implementing FPGA Design with the OpenCL Standard PDF
(Week 8: 25 May)
김원표: Generative Adversarial Networks
박승주: Performance Analysis of GPU-based Convolutional Neural Networks
(Week 9: 1 June)
엄주성
(Week 9: 11 May): No class, since some of the students have to attend their project meeting.
(Week 9: 8 June)
신지훈
전한재
우윤희
김용휘: GAMT: A Fast and Scalable IP Lookup Engine for GPU-based Software Routers
* Project Assignment
- OpenCL for FPGA Acceleration (Toan, Dam)
- Machine Learning (김원표 or 박승주)
- Vision and/or Graphics Acceleration (전한재, 신지훈): NEON Vector Processing Survey + Matrix-Vector Multiplication with NEON
- Eliab, 엄주성, 우윤희 ???