
2018 NCTS Short Course on

Scientific Computing and Machine Learning on Multi- and Manycore Architectures

NCTS Mathematics Division 2018 Short Course: Scientific Computing and Machine Learning on Multicore Computers

Instructor

Hartwig Anzt, Karlsruhe Institute of Technology, Germany, and University of Tennessee, USA
http://www.icl.utk.edu/~hanzt/index_kit.html

Short Bio: My research focus is on developing and optimizing numerical methods for efficient high-performance computing. In particular, I am interested in sparse linear algebra, iterative and asynchronous methods, Krylov solvers, and preconditioning. The approach I take is based on the idea of reformulating problems as fixed-point problems to allow for higher levels of parallelism. Implementations of these fixed-point methods typically make heavy use of (data-parallel) batched routines and possess relaxed synchronization requirements. I also work on fault tolerance, energy efficiency, and multi- and manycore (GPU) computing.

Time

2018/1/15-19 and 1/22-25 (weekdays, 9:30-12:20 and 13:40-15:30, with a 10-minute break after every 50-minute lecture)

2018/1/30 (Tuesday, 15:30-18:00: project presentations, followed by a pizza dinner)

Place

Room 440, Astro-Mathematics Building, NTU (台灣大學 天文數學館 440室)

Course Overview

This course covers the fundamentals of designing and implementing numerical linear algebra operations and algorithms on modern multi- and manycore architectures. New trends in the direction of Machine Learning/Deep Learning will also be covered. The course bridges the mathematical theory of linear solvers, iteration methods, and preconditioning with programming aspects such as MPI, OpenMP, and CUDA. The course is split into six parts:

  • Part I will start with an overview of HPC and current trends in high-end computing systems and environments, and continue with an introduction to common architecture designs and programming methodologies.
  • Part II covers programming techniques for multi- and manycore architectures. In particular, we will learn OpenMP and CUDA programming. Beyond that, we will look at distributed memory systems and MPI. A side aspect here is performance modeling and the introduction of tools for assessing the efficiency of parallel implementations.
  • Part III will cover central routines and algorithms needed for scientific computing: BLAS, sparse matrix-vector products (SpMV), linear solvers based on factorization and inversion routines, and singular value decompositions. Sparse solvers will also be addressed.
  • In Part IV we focus on batched routines, as they are becoming increasingly popular and relevant. We look into batched BLAS efforts, hardware-specific optimizations, and sophisticated algorithms composed of batched routines.
  • Part V of the course is devoted to Machine Learning in the widest sense. In particular, we will look at the concept of Deep Neural Networks (DNNs), available ML libraries, and the performance of convolution kernels.
  • Part VI contains the presentation of the student projects. Each student chooses a topic in agreement with the lecturer and prepares a presentation along with a class project paper. This project accounts for 60% of the course grade. Although students are encouraged to come up with their own project ideas, a list of possible topics will be distributed at the beginning of the course.

Each day consists of two blocks: a morning block (3 hours) and an afternoon block (2 hours). While the morning block covers the more challenging theoretical background, the afternoon block aims at student involvement through practical examples, small exercises, and discussions. Students will have to complete 8 homework assignments covering the distinct topics.

Prerequisites

  • Knowledge of linear algebra and numerics (vectors, matrices, factorizations, iterative methods, convergence, residuals, errors, etc.)
  • Fundamentals of programming (C/C++, compiling and running programs)

Grading

  • The grade is based on homework (40%) and a final project (60%), where the technical content, the class presentation, and the project paper are all taken into account.

Textbook

  • The Sourcebook of Parallel Computing, Edited by Jack Dongarra, Ian Foster, Geoffrey Fox, William Gropp, Ken Kennedy, Linda Torczon, Andy White, October 2002, 760 pages, ISBN 1-55860-871-0, Morgan Kaufmann Publishers.

Further reading

  • Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, by Richard Barrett et al., SIAM, Philadelphia, 1994.
  • Introduction to High-Performance Scientific Computing, by Victor Eijkhout with Edmond Chow and Robert van de Geijn, February 2010.
  • Introduction to High-Performance Computing for Scientists and Engineers, by Georg Hager and Gerhard Wellein, CRC Press, 2010.

Detailed course content day-by-day

Organizers

Tsung-Min Huang, National Taiwan Normal University

Wen-Wei Lin, National Chiao Tung University

Yu-Chen Shu, National Cheng Kung University

Weichung Wang, National Taiwan University

Contact person: Ms. 游墨霏 (02-3366-8814, murphyyu@ncts.ntu.edu.tw)