2017-c-summer

National Center for Theoretical Sciences (NCTS), Mathematics Division: 2017 Summer Course

Scientific Computing on Supercomputers

Monday, June 26 to Friday, July 7, 2017. Room 301, Astronomy-Mathematics Building, National Taiwan University

Latest News

[2017/07/06] Announcement: After lunch, please put your lunch boxes in the trash bins in Room 302 and sort the waste properly.
[2017/07/04] Slides on generating keys with PuTTY can be viewed here.
[2017/07/04] Part C. Course materials can be downloaded from http://nkl.cc.u-tokyo.ac.jp/NTU2017S/
[2017/07/04] Part C. Please submit questions at https://app.sli.do/event/jow9bgh5/ask
[2017/06/29] Part B. Codes can be downloaded here: Codes Directory
[2017/06/29] Part B. Please submit questions at https://app.sli.do/event/jow9bgh5/ask
[2017/06/29] Part B. Slides can be downloaded here: Slides Directory
[2017/06/28] Day 3 FVM and PCG videos: https://youtu.be/eKcZ1CLq_Ks (speaker's screen) https://youtu.be/D8B2X_ACz5Y (classroom view)
[2017/06/28] Day 3 afternoon code can be downloaded here: https://drive.google.com/open?id=0B6vku8qPKNhdeFlYRVFoR3dDRlE
[2017/06/28] Local students in Taiwan who wish to apply for a travel or accommodation subsidy should fill out the form by 17:00 on June 28, 2017. Students whose attendance is 90% or higher and whose applications are approved will receive a partial subsidy.
[2017/06/28] Day 3 sparse videos: https://youtu.be/oxXjrYveGfk (speaker's screen) https://youtu.be/ujM8sN_VirM (classroom view)
[2017/06/28] Complete example programs: https://github.com/wlab-pro/scsc17summer/releases
[2017/06/28] Part A. Day 3 slides can be downloaded here: Version Control, Doxygen & CMake, FVM and PCG
[2017/06/28] Part A. Please submit questions at https://app.sli.do/event/jow9bgh5/ask
[2017/06/28] Day 3 Git, Doxygen, CMake videos: https://youtu.be/vUkb6O80KE4 (speaker's screen) https://youtu.be/0fRFjGkSC3k (classroom view)
[2017/06/27] Day 2 MAGMA videos: https://youtu.be/WRZI-obBADQ (speaker's screen), https://youtu.be/pcs4PdNWyZg (classroom view)
[2017/06/27] Day 2 MKL videos: https://youtu.be/1V67qATgyA4 (speaker's screen), https://youtu.be/W1JXIF0vYFU (classroom view)
[2017/06/27] Part A. Day 2 slides can be downloaded here: Introduction to Linear System Solvers, Intel® Math Kernel Library, GPU and MAGMA
[2017/06/26] Day 1 videos: https://youtu.be/s3n8ZTG8f8I (speaker's screen) https://youtu.be/R3NuB8vOoT4 (classroom view)
[2017/06/26] Example programs: https://github.com/wlab-pro/scsc17summer0/releases
[2017/06/26] Part A. Day 1 slides can be downloaded here: Overview, Linux & Shell & Editor, Graph Laplacian and Surface Parameterization, Makefile and Matrix
[2017/06/19] Part C. Course materials can be downloaded from http://nkl.cc.u-tokyo.ac.jp/NTU2017S/
[2017/06/15] Admission notices sent
[2017/05/22] National Taiwan University maps: PDF file, Google Maps version

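Schedule and Speakers

Part A. Hands-on Mathematical Software Tools
- 6/26, 6/27, 6/28 (9:10-16:20)
- 蔡宇翔, 張大衛, 楊慕, 陳彥禎, 吳亭慧, 廖為謙 (Institute of Applied Mathematical Sciences, National Taiwan University), 蘇逸鎮 (Department of Mathematics, National Central University)
Part B. CUDA GPU Programming
- 6/29, 6/30, 7/3 (9:10-16:20)
- 郭芳安 (National Center for High-performance Computing)
Part C. Advanced Course on Multi-Threaded Parallel Programming using OpenMP/OpenACC for Multicore/Manycore Systems
- 7/4, 7/5, 7/6, 7/7 (9:10-17:00, taught in English)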
- Kengo Nakajima and Tetsuya Hoshino (Information Technology Center, The University of Tokyo, Japan)

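Registration

Registration closes on June 12, 2017. Admission notices will be sent on June 15.

Registration fee payment: Ms. 黃芝鈞, Room 203, Astronomy-Mathematics Building, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei; 02-3366-8813; zhijun@ncts.ntu.edu.tw.
A registration fee of NT$1,000 must be paid by June 12, 2017 to complete registration. Registrations that remain unpaid after the deadline will be cancelled.
Participants whose attendance is 90% or higher will receive a full refund on July 7, 2017, together with a certificate of attendance.
Because seating is limited, the organizers reserve the right to accept or decline registrations. If a registration cannot be accepted, the fee will be refunded in full.
Lunch is provided during the course.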

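Course Introduction

Effective use of high-performance computing makes it possible both to solve large-scale scientific and engineering problems and to develop important industrial application software. This two-week intensive course introduces (A) hands-on mathematical software tools, (B) CUDA GPU programming, and (C) OpenMP/OpenACC programming for CPU/GPU. It offers a rare opportunity to learn state-of-the-art high-performance computing languages and development environments on Reedbush, the supercomputer at the University of Tokyo. Faculty, graduate students, and undergraduates are welcome to register. Basic proficiency in the C programming language is recommended.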

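References

Large linear systems and eigenvalue problems: online course, Department of Mathematics, National Taiwan Normal University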

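Part A. Hands-on Mathematical Software Tools

In this course, we use interesting application problems to guide participants, hands-on, through mathematical libraries and related software development tools and into the world of high-performance computing.

Course Schedule

Instructors

蔡宇翔, 張大衛, 楊慕, 陳彥禎, 吳亭慧, 廖為謙 (Institute of Applied Mathematical Sciences, National Taiwan University)

蘇逸鎮 (Department of Mathematics, National Central University)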

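Part B. CUDA GPU Programming

In recent years, processors have generally moved toward multicore architectures. Constrained by the x86 architecture and by memory bandwidth, conventional CPUs cannot yet add enough cores to deliver more than 1 TFLOPS on a single chip. General-purpose graphics processing units (GPGPUs) with around 2,000 cores were introduced in 2012, and their theoretical peak double-precision performance was later raised to 1.2 TFLOPS. Applications built on them span many fields, and many commercial software packages now support CUDA, chiefly in animation rendering, computational biology, mechanics simulation, chemical analysis, and financial computing. Current GPGPUs offer a multi-level cache hierarchy and high-bandwidth memory, which gives users more freedom to optimize their applications. The precision problems for which GPGPUs were once criticized were resolved with the Fermi architecture. In the TOP500 list published in June 2014, 2 of the top 10 supercomputers used GPGPU architectures to process large volumes of data: those at Oak Ridge National Laboratory (ORNL) in the United States and at the Swiss National Supercomputing Centre (CSCS), which provide researchers and developers with petaflop-class computing capability.

This course covers hands-on CUDA programming and the setup of a Linux GPGPU parallel computing environment. Participants need a basic knowledge of C and of working on Linux. Prior experience with parallel computing, such as OpenMP or MPI, is a plus. Through explanations of the GPU/CPU/memory architecture and hands-on examples, participants will learn how to work with CUDA, and the parallel algorithms demonstrated in class will make it easier to move from sequential to parallel computing. The course mainly covers CUDA environment setup, an overview of the SDK, and hands-on examples.

Classes are held in a Linux environment, where participants write simple CUDA programs. Each session ends with time for discussion and practice, and participants may also raise their own CUDA questions in class.

Course Schedule

Instructors

郭芳安 (National Center for High-performance Computing)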

Part C. Advanced Course on Multi-Threaded Parallel Programming using OpenMP/OpenACC for Multicore/Manycore Systems

Schedule and Contents

July 4, 2017 (T)
- Introduction
- Finite-Volume Method (FVM)
- Login to Reedbush System
- OpenMP
- Reordering (1/2)
July 5, 2017 (W)
- Reordering (2/2)
- Parallel FVM by OpenMP
July 6, 2017 (Th)
- Introduction to GPU Programming
- OpenACC (1/2)
July 7, 2017 (F)
- OpenACC (2/2)
- Parallel FVM by OpenACC
- Exercises

Overview

To make full use of modern supercomputer systems with multicore/manycore architectures, hybrid parallel programming that combines message passing and multithreading is essential. While MPI is widely used for message passing, OpenMP for CPUs and OpenACC for GPUs are the most popular approaches to multithreading on multicore/manycore clusters. In this 4-day course, we focus on optimizing single-node performance using OpenMP and OpenACC for CPU and GPU. We “parallelize” a finite-volume method (FVM) code with Krylov iterative solvers for Poisson’s equation on the Reedbush supercomputer at the University of Tokyo, which has a peak performance of 1.93 PF (http://www.cc.u-tokyo.ac.jp/system/reedbush/index-e.html) and consists of the most recent CPUs (Intel Xeon E5-2695 v4 (Broadwell-EP)) and GPUs (NVIDIA Tesla P100 (Pascal)).
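As a small illustration of the single-node multithreading described above, the following is a minimal sketch of a threaded dot product, the kind of reduction kernel that appears in Krylov iterative solvers. It is not taken from the course materials. The function name `dot_product` is hypothetical, and the code assumes a compiler with OpenMP enabled (for example with `-fopenmp`). Without OpenMP the pragma is ignored and the loop runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: dot product of two vectors of length n.
   With OpenMP enabled, each thread accumulates a private partial sum
   and the reduction clause combines them at the end of the loop. */
double dot_product(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Each loop iteration is independent apart from the accumulation into `sum`, which is why a reduction clause is enough here and no reordering is needed.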

It is assumed that all participants have already completed “Toward Petascale High-Speed Computing (February 2016)” or the “NCTS Mathematics Division Short Course on High-Performance Computing (February 2017)”, or that they have equivalent knowledge and experience in numerical analysis and in parallel programming with MPI and OpenMP.

In the winter schools in February 2016 and February 2017, the target application was a 3D FVM code for Poisson’s equation solved by the Conjugate Gradient (CG) iterative method with a very simple Point Jacobi preconditioner. This time our target is the same FVM code, but the linear equations are solved by ICCG (the CG iterative method with Incomplete Cholesky preconditioning), which is more complicated, more powerful, and widely used in practical applications. Because ICCG includes “data dependency”, where writing and reading of the same data in memory could occur simultaneously, parallelization using OpenMP/OpenACC is not straightforward. We need a certain kind of reordering in order to extract parallelism. In this 4-day course, lectures and exercises on the following topics will be provided:

Overview of Finite-Volume Method (FVM)
Krylov Iterative Method, Preconditioning
Implementation of the Program
Introduction to OpenMP/OpenACC
Reordering/Coloring Method
Parallel FVM by OpenMP/OpenACC
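
The data dependency mentioned in the overview, and the way coloring removes it, can be sketched on a much smaller problem. The example below is not the course's ICCG code. As a stand-in it uses a Gauss-Seidel sweep, which has the same kind of loop-carried dependency, on a 1D grid with two colors (red-black ordering). The function name `rb_gauss_seidel_sweep` and the 1D model problem are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical sketch: one red-black Gauss-Seidel sweep for the 1D
   model equation 2*u[i] - u[i-1] - u[i+1] = f[i], with u[0] and u[n-1]
   held fixed as boundary values. In natural ordering u[i] needs the
   freshly updated u[i-1], so the loop cannot be threaded. After
   coloring, points of one color depend only on points of the other
   color, so each color loop has no loop-carried dependency. */
void rb_gauss_seidel_sweep(double *u, const double *f, int n)
{
    assert(n >= 3);
    for (int color = 1; color <= 2; ++color) {
        /* color 1: odd interior indices, color 2: even interior indices */
        int start = (color == 1) ? 1 : 2;
        #pragma omp parallel for
        for (int i = start; i < n - 1; i += 2) {
            u[i] = 0.5 * (f[i] + u[i - 1] + u[i + 1]);
        }
    }
}
```

For example, with `n = 5`, `f` all zero, and `u = {0, 0, 0, 0, 4}`, one sweep updates the odd points first (`u[1] = 0`, `u[3] = 2`) and then the even point (`u[2] = 1`). The colors are processed one after the other, and only the loop inside each color runs in parallel.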

Prerequisites

Completion of one of the following two short courses:
- Toward Petascale High-Speed Computing (February 2016) https://sites.google.com/site/school4scicomp/previous/2016-a-winter
- NCTS Mathematics Division Short Course on High-Performance Computing (February 2017) https://sites.google.com/site/school4scicomp/previous/2017-b-spring
- (or equivalent knowledge and experience in numerical analysis (FVM, preconditioned iterative solvers) and parallel programming using OpenMP and MPI)
- The June 28 class will review this material.
Experience with Unix/Linux (vi or emacs)
Fundamental numerical algorithms (Gaussian Elimination, LU Factorization, Jacobi/Gauss-Seidel/SOR Iterative Solvers, Conjugate Gradient Method (CG))
Experience with the SSH public key authentication method

Course Handouts

http://nkl.cc.u-tokyo.ac.jp/NTU2017S/
http://nkl.cc.u-tokyo.ac.jp/17s/ (Lectures at the University of Tokyo (OpenMP only))
http://nkl.cc.u-tokyo.ac.jp/NTU2015/
http://nkl.cc.u-tokyo.ac.jp/files/multicore-c.tar
http://nkl.cc.u-tokyo.ac.jp/files/multicore-f.tar

Preparation

Windows
- Cygwin with gcc/gfortran and OpenSSH
  - Please make sure to install gcc (C) or gfortran (Fortran) in “Devel”, and OpenSSH in “Net”
  - ParaView
MacOS, UNIX/Linux
- ParaView
Cygwin: https://www.cygwin.com/
ParaView: http://www.paraview.org/

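Instructors

Professor Kengo Nakajima (Professor, Supercomputing Research Division, Information Technology Center, The University of Tokyo)

Professor Tetsuya Hoshino (Assistant Professor, Supercomputing Research Division, Information Technology Center, The University of Tokyo)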

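Organizers and Contact

Organizer: National Center for Theoretical Sciences, Mathematics Division

Co-organizers: Information Technology Center, The University of Tokyo; Department of Mathematics and Institute of Applied Mathematical Sciences, National Taiwan University; Taiwan Society for Industrial and Applied Mathematics, GPU and High-Performance Computing Activity Group

Hosts: 王偉仲 (Institute of Applied Mathematical Sciences, National Taiwan University), 林文偉 (Department of Applied Mathematics, National Chiao Tung University), 舒宇宸 (Department of Mathematics, National Cheng Kung University), 黃楓南 (Department of Mathematics, National Central University), 黃聰明 (Department of Mathematics, National Taiwan Normal University)

Contact: Ms. 陸 (02-3366-8811, risalu@ncts.ntu.edu.tw)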
