Teaching

Courses taught by Dr Sparsh Mittal


In Spring (Jan-May) 2019, Dr Sparsh Mittal taught the course "Hardware Architectures for Deep Learning". Its contents are listed below.


Overview and motivation for designing hardware accelerators for deep learning


Background: 

Approximate computing and storage

Roofline model (see the worked example at the end of this list)

Cache tiling (blocking)

GPU architecture, CUDA programming, and understanding shared/global memory bottlenecks in GPUs

FPGA architecture

Matrix multiplication using systolic arrays

3D/2.5D DRAM for high memory bandwidth

DRAM architecture
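The roofline model above reduces to one formula: attainable performance = min(peak compute, arithmetic intensity × peak memory bandwidth). As a minimal illustration, here is a Python sketch that computes the arithmetic intensity of a square matrix multiplication and the resulting roofline bound; the peak compute and bandwidth figures are placeholders, not measurements of any particular device.

```python
# Roofline-model sketch: attainable performance is capped by either peak compute
# or memory bandwidth times arithmetic intensity (FLOPs per byte of DRAM traffic).
# The peak numbers below are hypothetical, chosen only for illustration.

def roofline_bound(flops, bytes_moved, peak_gflops, peak_gbps):
    """Return (attainable GFLOP/s, arithmetic intensity) for a kernel."""
    intensity = flops / bytes_moved
    return min(peak_gflops, intensity * peak_gbps), intensity

# Example: C = A x B with n x n single-precision matrices (4 bytes per element).
n = 1024
flops = 2 * n ** 3                 # one multiply and one add per inner-product step
bytes_moved = 3 * n * n * 4        # read A, read B, write C (ideal on-chip reuse)

bound, intensity = roofline_bound(flops, bytes_moved, peak_gflops=10_000, peak_gbps=900)
print(f"arithmetic intensity = {intensity:.1f} FLOPs/byte; roofline bound = {bound:.0f} GFLOP/s")
```

With good cache tiling (blocking), the actual DRAM traffic approaches the ideal byte count used above and the kernel moves toward the compute-bound region of the roofline; without tiling, the traffic and hence the bound are much worse.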


Deep learning: 

Deep learning on FPGAs

Case study of Microsoft's Brainwave

Deep learning on embedded systems (especially NVIDIA's Jetson platform)

Deep learning on edge devices (smartphones). Review of "Machine Learning at Facebook: Understanding Inference at the Edge".

Study of Google's Tensor Processing Unit

Memristor-based accelerators for deep learning

Intel's Xeon Phi architecture and deep learning on the Xeon Phi

Convolution strategies: direct, FFT-based, Winograd-based, and matrix-multiplication-based (see the im2col sketch at the end of this list). Review of "Performance Analysis of GPU-based Convolutional Neural Networks"

Addressing memory bottleneck during DNN training. Review of "vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design"

Hardware-aware pruning of DNNs. Review of "Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism."

Distributed training of DNNs. Review of "Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes"

Review of "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding" (a small pruning and weight-sharing sketch appears at the end of this list)

Hardware/system challenges in autonomous driving. Review of "The Architectural Implications of Autonomous Driving: Constraints and Acceleration".

Neural branch predictor. Review of "Using Branch Predictors to Predict Brain Activity in Brain-Machine Implants"

Data compression and its use for addressing the memory bottleneck in deep learning

Comparison of memory technologies (SRAM, DRAM, eDRAM, STT-RAM, PCM, Flash) and their suitability for designing memory elements in DNN accelerators
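To make the matrix-multiplication-based convolution strategy concrete, here is a minimal NumPy sketch of im2col lowering, in which a convolution layer is rewritten as a single GEMM. The names, shapes, stride-1/no-padding setup, and random data are illustrative choices, not any particular library's implementation.

```python
import numpy as np

def im2col(x, kh, kw):
    """Lower a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix (stride 1, no padding)."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    row = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # One (channel, kernel-offset) pair fills one row: all output positions at once.
                cols[row] = x[ci, i:i + out_h, j:j + out_w].reshape(-1)
                row += 1
    return cols, out_h, out_w

def conv2d_as_matmul(x, weights):
    """Express convolution as one GEMM: (K, C*kh*kw) @ (C*kh*kw, out_h*out_w)."""
    k, c, kh, kw = weights.shape
    cols, out_h, out_w = im2col(x, kh, kw)
    return (weights.reshape(k, -1) @ cols).reshape(k, out_h, out_w)

# Quick shape check on random data: 3-channel 8x8 input, four 3x3 filters.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8)).astype(np.float32)
w = rng.standard_normal((4, 3, 3, 3)).astype(np.float32)
print(conv2d_as_matmul(x, w).shape)   # (4, 6, 6)
```

This lowering is what lets convolution layers reuse highly tuned GEMM libraries and systolic-array hardware such as the TPU's matrix unit; the FFT- and Winograd-based strategies instead trade extra transforms for fewer multiplications.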
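For the pruning and Deep Compression topics above, here is a minimal sketch of magnitude pruning followed by k-means weight sharing. This is not the full Deep Compression pipeline (which also retrains the network and Huffman-codes the cluster indices); the sparsity level, cluster count, and initialization are illustrative choices only.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) > threshold
    return w * mask, mask

def kmeans_quantize(w, n_clusters=16, iters=20):
    """Share weights via 1-D k-means: each nonzero weight is replaced by its cluster centroid."""
    nz = w[w != 0]
    centroids = np.linspace(nz.min(), nz.max(), n_clusters)   # simple linear initialization
    for _ in range(iters):
        assign = np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            if np.any(assign == c):
                centroids[c] = nz[assign == c].mean()
    assign = np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)
    q = w.copy()
    q[w != 0] = centroids[assign]
    return q, centroids

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
pruned, mask = magnitude_prune(w, sparsity=0.7)
quantized, codebook = kmeans_quantize(pruned)
print(f"kept {mask.mean():.0%} of weights, {len(codebook)} shared values")
```

After such pruning and weight sharing, only the sparse indices and the small codebook need to be stored, which is what shrinks the model enough to fit in on-chip memory of an accelerator.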