
Second-Order Optimization in Large-Scale Deep Learning

July 1-4, 2019 (9:00-12:00)

Room 202, Astro-Mathematics Building, National Taiwan University


Registration

Please register online (https://forms.gle/t11TfSZTKP7mZFFRA) before 12:00 noon, June 28, 2019. Seats are limited; admitted applicants will receive email notification.

Overview

Deep neural networks are usually trained with first-order stochastic gradient descent. In this short course, we investigate the possibility of using second-order optimization methods such as natural gradient descent, the generalized Gauss-Newton method, and Newton's method. We will discuss the similarities and differences among these second-order methods through the relationship between curvature (the Riemannian geometry of the parameter space) and covariance (the Bayesian statistics of the parameter distribution). The course also includes a hands-on component in which demonstrations are given using a PyTorch implementation. We will further discuss fast approximation techniques and their distributed parallel implementation.
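To give a flavor of the hands-on sessions, the following is a minimal sketch of natural gradient descent with a diagonal empirical Fisher approximation, written in PyTorch since the demonstrations use a PyTorch implementation. The toy model, data, and hyperparameters are illustrative assumptions, not course material; the full Fisher matrix is intractable for deep networks, and the course covers much richer structured approximations than the diagonal used here.

    import torch

    torch.manual_seed(0)

    # Toy regression problem and a small MLP (illustrative assumptions).
    X = torch.randn(256, 10)
    y = X @ torch.randn(10, 1) + 0.1 * torch.randn(256, 1)
    model = torch.nn.Sequential(
        torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
    )
    loss_fn = torch.nn.MSELoss()

    # The damping is deliberately large: the one-batch diagonal Fisher
    # estimate is crude, and (F + damping*I)^-1 must stay well conditioned.
    lr, damping = 0.05, 1e-1

    for step in range(101):
        model.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        with torch.no_grad():
            for p in model.parameters():
                # Diagonal empirical Fisher estimate for this batch: F ~ g*g.
                fisher_diag = p.grad * p.grad
                # Natural gradient step: p <- p - lr * (F + damping*I)^-1 * g
                p -= lr * p.grad / (fisher_diag + damping)
        if step % 20 == 0:
            print(f"step {step:3d}  loss {loss.item():.4f}")

The damping term plays the same stabilizing role as Tikhonov regularization in damped Newton methods: it bounds the step size where the curvature estimate is near zero.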

Schedule

  • 7/1 (Mon) Introduction to optimization methods for deep learning (1.5 hours) + hands-on (1.5 hours)
  • 7/2 (Tue) Second-order optimization methods (1.5 hours) + hands-on (1.5 hours)
  • 7/3 (Wed) Fast approximations for second-order methods (1.5 hours) + hands-on (1.5 hours)
  • 7/4 (Thu) Distributed parallel training with second-order methods (1.5 hours) + hands-on (1.5 hours)

Instructor

Prof. Rio Yokota

Tokyo Institute of Technology

Rio Yokota is an Associate Professor at the Global Scientific Information and Computing Center (GSIC), Tokyo Institute of Technology. He was a Research Scientist at the Extreme Computing Research Center (ECRC), KAUST, from September 2011 to March 2015, where he worked on fast multipole methods (FMM), their application to sparse-matrix preconditioners, and their implementation on large-scale heterogeneous architectures. During his PhD in Mechanical Engineering at Keio University, he worked on implementing fast multipole methods on special-purpose machines such as MDGRAPE-3, and then on GPUs after CUDA was released. During a postdoc at the University of Bristol, he continued to work on FMM and was part of the team that won the 2009 Gordon Bell Prize in the price/performance category using 760 GPUs. During a postdoc with Lorena Barba at Boston University, he developed an open-source parallel FMM code, ExaFMM. He now runs this code on full nodes of the TSUBAME 2.5 and K computer in Japan, as well as on large Blue Gene and Cray machines, for applications in molecular dynamics and sparse preconditioners. Rio is a member of ACM and SIAM.

Organizer: Mathematics Division, National Center for Theoretical Sciences (NCTS), Ministry of Science and Technology

Co-organizers: Information Technology Center, The University of Tokyo; MOST Joint Research Center for AI Technology and All Vista Healthcare; Taiwan Society for Industrial and Applied Mathematics; Department of Mathematics and Institute of Applied Mathematical Sciences, National Taiwan University

Coordinators: Weichung Wang (Institute of Applied Mathematical Sciences, National Taiwan University), Wen-Wei Lin (Department of Applied Mathematics, National Chiao Tung University), Yu-Chen Shu (Department of Mathematics, National Cheng Kung University), Tsung-Ming Huang (Department of Mathematics, National Taiwan Normal University)

Contact: Ms. Claire Liu (劉馥瑤), 02-3366-8819, claireliu@ncts.ntu.edu.tw