This page lists the topics planned for the upcoming lectures (gathered from several sources) along with recommended reading. It may help students who want to look up the material before each lecture.
Lecture 1 covered an introduction to parallel programming and introduced the concepts of SIMD, SIMT, and SPMD. We also covered ROCm, several HIP programming concepts, and a few ROCm/HIP libraries. (chapter 1, HIP textbook)
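For reference, here is a minimal sketch of the kernel-launch style covered in Lecture 1 (the kernel and variable names are illustrative, not from the lecture):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Each thread handles one element: the SPMD model in practice.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    hipMallocManaged(&a, n * sizeof(float));  // managed memory keeps the sketch short
    hipMallocManaged(&b, n * sizeof(float));
    hipMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vecAdd<<<dim3((n + 255) / 256), dim3(256)>>>(a, b, c, n);
    hipDeviceSynchronize();
    printf("c[0] = %f\n", c[0]);  // expect 3.0

    hipFree(a); hipFree(b); hipFree(c);
    return 0;
}
```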
Lecture 2 will start with a review of the previous lecture, abbreviations, and Q/A, followed by more examples of HIP programming and the different types of memory on a GPU, and will end with a discussion of the AMD GPU architecture. (chapter 2, HIP textbook)
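To make the memory spaces concrete before the lecture, here is a minimal kernel sketch (the names are illustrative, and it assumes a single block of 256 threads) showing where global memory, LDS/shared memory, registers, and constant memory appear in HIP code:

```cpp
#include <hip/hip_runtime.h>

__constant__ float scale;  // constant memory: read-only on the device, set via hipMemcpyToSymbol

// Assumes a single block of exactly 256 threads.
__global__ void memorySpaces(const float* in, float* out) {
    __shared__ float tile[256];     // LDS / shared memory: visible to the whole block
    float r = in[threadIdx.x];      // 'in' is global memory; 'r' lives in a register
    tile[threadIdx.x] = r * scale;  // register -> LDS
    __syncthreads();                // make the LDS writes visible block-wide
    out[threadIdx.x] = tile[255 - threadIdx.x];  // read another thread's value from LDS
}
```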
Lecture 3 will start with a review of the previous lecture. We will then venture into assembly and some performance considerations, and cover profiling tools including Omniperf and Rocprof. (chapters 3 & 4, HIP textbook)
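As a small preview of the performance-measurement theme, kernel runtime can be bracketed with HIP events; below is a sketch with a placeholder kernel (`myKernel` and the launch shape are ours):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void myKernel() { /* placeholder workload */ }

int main() {
    hipEvent_t start, stop;
    hipEventCreate(&start);
    hipEventCreate(&stop);

    hipEventRecord(start, 0);               // enqueue a timestamp before the kernel
    myKernel<<<dim3(1024), dim3(256)>>>();
    hipEventRecord(stop, 0);                // and one after it
    hipEventSynchronize(stop);              // wait until the stop event has completed

    float ms = 0.0f;
    hipEventElapsedTime(&ms, start, stop);  // elapsed GPU time in milliseconds
    printf("kernel took %.3f ms\n", ms);

    hipEventDestroy(start);
    hipEventDestroy(stop);
    return 0;
}
```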
Lecture 4 will start with a review of the previous lecture. We will then venture into reduction trees, prefix scans, atomic operations, and histogramming. (chapter 9, CUDA textbook)
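Here is a minimal sketch of the tree-reduction pattern we will build on (illustrative names; it assumes a power-of-two block size and a zero-initialized output):

```cpp
#include <hip/hip_runtime.h>

#define BLOCK 256

// Each block reduces BLOCK elements to one partial sum using a tree in LDS,
// then one atomic per block combines the partials -- the same pattern that
// underlies histogramming. '*out' must be zero-initialized before the launch.
__global__ void reduceSum(const float* in, float* out, int n) {
    __shared__ float sdata[BLOCK];
    int i = blockIdx.x * BLOCK + threadIdx.x;
    sdata[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (int s = BLOCK / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            sdata[threadIdx.x] += sdata[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        atomicAdd(out, sdata[0]);  // combine per-block partial sums
}
```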
Lecture 5 is for the most part a review lecture. We will start with a review of the previous lectures and then move on to naive matrix multiplication, followed by tiled matrix multiplication. (chapter 5, CUDA textbook)
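For orientation, here is a sketch of the naive kernel (illustrative names), with one thread per output element; the tiled version stages sub-blocks of A and B in LDS to reduce the global-memory traffic this version generates:

```cpp
#include <hip/hip_runtime.h>

// Naive N x N matrix multiply: one thread per output element, every
// operand fetched straight from global memory.
__global__ void matmulNaive(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)
            acc += A[row * N + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}
```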
Lecture 6 will look into making better use of LDS, avoiding LDS bank conflicts, matrix transpose, and matrix convolution (naive and tiled). (chapters 6 & 8, CUDA textbook)
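A sketch of the padded-LDS transpose pattern we will discuss (illustrative names; it assumes TILE matches the launch's block dimensions):

```cpp
#include <hip/hip_runtime.h>

#define TILE 32

// Tiled transpose: read a TILE x TILE block with coalesced accesses, write it
// out transposed. The +1 pad shifts each row into a different LDS bank, so the
// column-wise reads below do not serialize on bank conflicts.
__global__ void transposeTiled(const float* in, float* out, int N) {
    __shared__ float tile[TILE][TILE + 1];  // +1 padding avoids bank conflicts
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < N && y < N)
        tile[threadIdx.y][threadIdx.x] = in[y * N + x];
    __syncthreads();

    // Swap the block indices so both global reads and writes stay coalesced.
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < N && y < N)
        out[y * N + x] = tile[threadIdx.x][threadIdx.y];
}
```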
Lecture 7 will start with a review of convolutions and then go into a case study on genomics applications on GPUs.
Lecture 8 will give you a peek into a case study on machine learning.
Lecture 9 will cover parallel sparse methods, graphs and streams. (chapter 10, CUDA textbook)
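As a preview of the streams topic, here is a minimal sketch that splits work across two streams so copies and kernels can overlap (names and sizes are illustrative):

```cpp
#include <hip/hip_runtime.h>

__global__ void process(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20, half = n / 2;
    float *h, *d;
    hipHostMalloc(&h, n * sizeof(float));  // pinned host memory enables true async copies
    hipMalloc(&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    hipStream_t s[2];
    hipStreamCreate(&s[0]);
    hipStreamCreate(&s[1]);

    // Split the data in two so stream 1's copies can overlap stream 0's kernel.
    for (int k = 0; k < 2; ++k) {
        float* hp = h + k * half;
        float* dp = d + k * half;
        hipMemcpyAsync(dp, hp, half * sizeof(float), hipMemcpyHostToDevice, s[k]);
        process<<<dim3((half + 255) / 256), dim3(256), 0, s[k]>>>(dp, half);
        hipMemcpyAsync(hp, dp, half * sizeof(float), hipMemcpyDeviceToHost, s[k]);
    }
    hipDeviceSynchronize();  // wait for both streams to drain

    hipStreamDestroy(s[0]); hipStreamDestroy(s[1]);
    hipHostFree(h); hipFree(d);
    return 0;
}
```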
Lecture 10 will be class project presentations, followed by Q/A.