Schedule

08:00 - 08:30 Introduction (M. Ozay):

Presentation file.

In this introductory lecture, we will briefly review the recent success of normalization methods for training deep architectures in computer vision and machine learning. We will use recent results to motivate the following theoretical questions:

  • Why do many feature/weight/gradient normalization methods seem to give comparably good results?
  • What are the geometric and statistical properties of normalization methods?
  • Can we solve the exploding and vanishing gradient problems using normalization methods?
  • How do normalization methods affect the convergence of CNNs to local minima?
  • How do simple re-parameterization tricks help with optimization on normalized kernels? (See the sketch after this list.)
  • How can we implement normalization methods efficiently in large networks with small batch sizes?
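
The re-parameterization question can be made concrete right away: the best-known such trick is weight normalization (Salimans & Kingma, 2016), which splits a kernel into a direction and a learnable scale. Below is a minimal NumPy sketch of the forward pass of a weight-normalized linear layer; the function name and shapes are illustrative, not taken from any particular library.

```python
import numpy as np

def weight_norm_forward(v, g, x):
    """Linear layer with weight normalization: w = g * v / ||v||.

    v : unnormalized direction parameters, shape (out, in)
    g : learnable per-output scale, shape (out,)
    x : input mini-batch, shape (batch, in)
    """
    # Normalize each row of v to unit length, then rescale by g, so the
    # optimizer sees direction and magnitude as decoupled parameters.
    w = g[:, None] * v / np.linalg.norm(v, axis=1, keepdims=True)
    return x @ w.T

rng = np.random.default_rng(0)
out = weight_norm_forward(rng.normal(size=(4, 3)), np.ones(4),
                          rng.normal(size=(2, 3)))
print(out.shape)  # (2, 4)
```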

08:30 - 09:30 Normalization Techniques: Motivation, Methods and Analysis (L. Huang):

Presentation file.

Motivation for normalization methods in deep neural networks from the perspective of optimizing network parameters. An overview of state-of-the-art feature, weight and gradient normalization methods, such as instance and batch normalization of features, and block orthogonal weight and gradient normalization. Employment of normalization methods in large-scale networks with small batch sizes.
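
As a concrete anchor for this overview, batch normalization standardizes each channel with mini-batch statistics and then applies a learnable affine transform. A minimal NumPy sketch of the training-mode forward pass for the fully-connected case (function name and shapes are illustrative; running statistics for inference are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization of features, training mode.

    x     : activations, shape (batch, channels)
    gamma : learnable per-channel scale, shape (channels,)
    beta  : learnable per-channel shift, shape (channels,)
    """
    mu = x.mean(axis=0)                    # per-channel batch mean
    var = x.var(axis=0)                    # per-channel batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # standardized activations
    return gamma * x_hat + beta            # learnable affine transform
```

Because mu and var are estimated over the batch axis, the estimates degrade as the batch shrinks, which is precisely the small-batch regime this session addresses.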

09:30 - 10:15 Applying Normalization Methods for Computer Vision Tasks in Practice (L. Huang):

Presentation file.

Practical tips and tricks to implement and use the aforementioned normalization methods for various computer vision tasks, such as image style transfer and the training of GANs.
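
For example, instance normalization is a standard choice in style-transfer networks because its statistics are computed per sample and per channel, so each image is stylized independently of the rest of the mini-batch. A minimal NumPy sketch (NCHW layout and names are illustrative assumptions):

```python
import numpy as np

def instance_norm(x, gamma, beta, eps=1e-5):
    """Instance normalization over spatial dimensions.

    x     : feature maps, shape (batch, channels, height, width)
    gamma : learnable per-channel scale, shape (channels,)
    beta  : learnable per-channel shift, shape (channels,)
    """
    # Per-sample, per-channel statistics: independent of batch size,
    # so the layer behaves identically with a batch of one.
    mu = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]
```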

10:15 - 10:30 Coffee Break

10:30 - 12:15 Mathematical Foundations, Theoretical Results and Challenges (M. Ozay):

Presentation file.

Basics of differential geometry and Riemannian manifolds. Fundamentals of optimization on matrix manifolds.

Geometric and statistical relationships between batch, weight and gradient normalization methods. Development of new normalization methods by solving unconstrained optimization problems on Riemannian manifolds. Training deep neural networks (DNNs) by stochastically sampling different normalization methods across input-output channels. Convergence analysis of optimization methods for training DNNs with normalization methods.
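
To give a flavor of the machinery: a unit-norm constraint on a kernel w turns its training into optimization on the sphere S^{n-1}. One step of Riemannian gradient descent projects the Euclidean gradient of the loss f onto the tangent space at w_t and retracts the update back onto the manifold. A sketch (η is the step size; the normalization-based retraction shown here is one illustrative choice among several):

```latex
\begin{aligned}
  \tilde{g}_t &= \nabla f(w_t) - \langle \nabla f(w_t), w_t \rangle\, w_t
      && \text{(tangent-space projection at } w_t \in S^{n-1}\text{)}\\[2pt]
  w_{t+1} &= \frac{w_t - \eta\, \tilde{g}_t}{\lVert w_t - \eta\, \tilde{g}_t \rVert_2}
      && \text{(retraction back onto the sphere)}
\end{aligned}
```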

12:15 - 12:30 Questions and Discussion