The tutorial material covers three main topics:
Introduction to CUDA Python with Numba
Begin working with the Numba compiler and CUDA programming in Python
Use Numba decorators to GPU-accelerate numerical Python functions
Optimize host-to-device and device-to-host memory transfers
CUDA kernels in Python with Numba
Learn CUDA’s parallel thread hierarchy and how it broadens the range of problems that can be parallelized
Launch massively parallel custom CUDA kernels on the GPU
Utilize CUDA atomic operations to avoid race conditions during parallel execution
Multi-dimensional grids and shared memory for CUDA Python with Numba
Implement GPU-accelerated Monte Carlo methods
Learn multidimensional grid creation and how to work in parallel on 2D matrices
Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices
(Note: because this is only a half-day tutorial, the third topic is unlikely to be covered during the session. The material will, however, remain available for self-study afterward.)
The content for this tutorial has been developed by the NVIDIA Deep Learning Institute.
Fact Sheet: Fundamentals of Accelerated Computing with CUDA Python
Jupyter Notebook (topic 1)
Slides (topic 2)
Jupyter Notebook (topic 3)