Normalization Methods for Training Deep Neural Networks: Theory and Practice

Organizers: M. Ozay and L. Huang

September 8th, Morning Session

Feature, weight (kernel), and gradient normalization methods have been used as building blocks of deep neural networks (DNNs). However, our understanding of the theoretical foundations of these methods, and of the mathematical reasons for their success, remains elusive. In addition, employing theoretically grounded methods in real-world computer vision tasks, using various large-scale DNNs such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) with small batch sizes, remains a challenge.
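For concreteness, the sketch below shows a standard batch normalization transform for fully connected features: each feature is standardized using mini-batch statistics and then rescaled and shifted by learnable parameters. The NumPy code, function name, and shapes are illustrative assumptions rather than material from the tutorial.

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        # x: (batch, features); gamma, beta: learnable per-feature scale and shift.
        mean = x.mean(axis=0)                    # per-feature mean over the mini-batch
        var = x.var(axis=0)                      # per-feature variance over the mini-batch
        x_hat = (x - mean) / np.sqrt(var + eps)  # standardize each feature
        return gamma * x_hat + beta              # learnable affine transform

    # Example: a mini-batch of 4 samples with 3 features.
    x = np.random.randn(4, 3)
    y = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))

With small batch sizes, the mini-batch estimates of the mean and variance become noisy, which is one source of the practical difficulties mentioned above.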

To this end, this tutorial will first review recent work that provides a mathematical justification for the geometric and statistical properties of different normalization methods applied to different ensembles of input-output channels. The theoretical analysis presented exploits mathematical tools that can guide researchers in developing novel normalization methods and improve our understanding of their theoretical foundations. In addition, we will consider practical methods for implementing particular normalization schemes, such as batch normalization and block orthogonal weight and gradient normalization, in CNNs and RNNs trained with small batch sizes, in the context of important vision applications.
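As a simple illustration of weight (kernel) normalization with an orthogonality constraint, the sketch below projects a weight matrix onto the nearest row-orthonormal matrix via its singular value decomposition. This is only a minimal example of the general idea under assumed shapes; it is not the block orthogonal method covered in the tutorial.

    import numpy as np

    def orthogonalize(w):
        # Project w (out_features x in_features, out <= in) onto the nearest
        # matrix with orthonormal rows: if w = U S V^T, the projection is U V^T.
        u, _, vt = np.linalg.svd(w, full_matrices=False)
        return u @ vt

    # Example: 16 output units, each with a 64-dimensional (flattened) kernel.
    w = np.random.randn(16, 64)
    w_orth = orthogonalize(w)
    # w_orth @ w_orth.T equals the 16x16 identity up to numerical error.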

The tutorial will assume no particular background beyond a basic working knowledge of computer vision and machine learning. All the necessary notions and mathematical foundations will be described. Our target audience is graduate students, researchers, and practitioners who work on the development of novel deep learning algorithms and/or their application to practical problems in computer vision and machine learning. For intermediate- and advanced-level researchers, we will present a theoretical analysis of normalization methods and the mathematical tools used to develop new ones.