1. Yann LeCun, Ido Kanter, Sara A. Solla. Second Order Properties of Error Surfaces: Learning Time and Generalization. NIPS 1990.
2. Yann LeCun, Léon Bottou, Genevieve B. Orr, Klaus-Robert Müller. Efficient BackProp. Neural Networks: Tricks of the Trade, 1998.
3. Simon Wiesler, Hermann Ney. A Convergence Analysis of Log-Linear Training. NIPS 2011.
4. James Martens, Roger Grosse. Optimizing Neural Networks with Kronecker-factored Approximate Curvature. ICML 2015
1. Guillaume Desjardins, Karen Simonyan, Razvan Pascanu, Koray Kavukcuoglu. Natural Neural Networks. NIPS 2015.
2. Grégoire Montavon and Klaus-Robert Müller. Deep Boltzmann Machines and the Centering Trick. Neural Networks: Tricks of the Trade, 2012
3. Simon Wiesler, Alexander Richard, Ralf Schlüter, Hermann Ney. Mean-Normalized Stochastic Gradient for Large-Scale Deep Learning. ICASSP 2014.
4. Jan Melchior, Asja Fischer, Laurenz Wiskott. How to Center Deep Boltzmann Machines. JMLR 2016.
5. Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ICML 2015.
6. Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton. Layer Normalization. arXiv:1607.06450, 2016.
7. Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky. Instance Normalization: The Missing Ingredient for Fast Stylization. arXiv:1607.08022, 2016.
8. Mengye Ren, Renjie Liao, Raquel Urtasun, Fabian H. Sinz, Richard S. Zemel. Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes. ICLR 2017.
9. Yuxin Wu and Kaiming He. Group Normalization. ECCV 2018
10. Qianli Liao, Kenji Kawaguchi and Tomaso Poggio. Streaming Normalization: Towards Simpler and More Biologically-plausible Normalizations for Online and Recurrent Learning. arXiv:1610.06160, 2016.
11. Sergey Ioffe. Batch Renormalization: Towards Reducing Minibatch Dependence in Batch Normalized Models. NIPS 2017
12. Guangrun Wang, Jiefeng Peng, Ping Luo, Xinjiang Wang and Liang Lin. Batch Kalman Normalization: Towards Training Deep Neural Networks with Micro-batches. arXiv:1802.03133, 2018
13. Ping Luo, Jiamin Ren and Zhanglin Peng. Differentiable Learning-to-Normalize via Switchable Normalization. arXiv:1806.10779, 2018.
14. Elad Hoffer, Ron Banner, Itay Golan and Daniel Soudry. Norm matters: efficient and accurate normalization schemes in deep networks. arXiv:1803.01814, 2018
15. Shuang Wu, Guoqi Li, Lei Deng, Liu Liu, Yuan Xie, and Luping Shi. L1-Norm Batch Normalization for Efficient Training of Deep Neural Networks. arXiv:1802.09769, 2018
16. Günter Klambauer, Thomas Unterthiner, Andreas Mayr, Sepp Hochreiter. Self-Normalizing Neural Networks. NIPS 2017.
17. Guillaume Desjardins, Karen Simonyan, Razvan Pascanu, Koray Kavukcuoglu. Natural Neural Networks. NIPS 2015.
18. Ping Luo. Learning Deep Architectures via Generalized Whitened Neural Networks. ICML 2017
19. Lei Huang, Dawei Yang, Bo Lang, Jia Deng. Decorrelated Batch Normalization. CVPR 2018.
1. Devansh Arpit, Yingbo Zhou, Bhargava U. Kota, Venu Govindaraju. Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks. ICML 2016.
2. Alexander Shekhovtsov and Boris Flach. Normalization of Neural Networks using Analytic Variance Propagation. arXiv:1803.10560, 2018
3. Wenling Shang, Justin Chiu, Kihyuk Sohn. Exploring Normalization in Deep Residual Networks. AAAI 2017.
4. Tim Salimans, Diederik P. Kingma. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks. NIPS 2016.
5. Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. AISTATS 2010.
6. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. ICCV 2015
7. Lei Huang, Xianglong Liu, Yang Liu, Bo Lang, Dacheng Tao. Centered Weight Normalization in Accelerating Training of Deep Neural Networks. ICCV 2017.
8. Andrew M. Saxe, James L. McClelland, Surya Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. ICLR 2014
9. Dmytro Mishkin and Jiri Matas. All You Need Is a Good Init. ICLR 2016.
10. Lei Huang, Xianglong Liu, Bo Lang, Adams Wei Yu, Bo Li. Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent Stiefel Manifolds in Deep Neural Networks. AAAI 2018.
11. Mete Ozay and Takayuki Okatani. Optimization on Submanifolds of Convolution Kernels in CNNs. arXiv:1610.07008, 2016
12. Minhyung Cho and Jaehyung Lee. Riemannian approach to batch normalization. NIPS 2017
13. Lei Huang, Xianglong Liu, Bo Lang, Bo Li. Projection Based Weight Normalization for Deep Neural Networks. arXiv:1710.02338, 2017.
14. Nathan Srebro and Adi Shraibman. Rank, Trace-Norm and Max-Norm. COLT 2005.
15. Behnam Neyshabur, Ruslan Salakhutdinov, Nathan Srebro. Path-SGD: Path-Normalized Optimization in Deep Neural Networks. NIPS 2015
16. Kui Jia, Dacheng Tao, Shenghua Gao, and Xiangmin Xu. Improving Training of Deep Neural Networks via Singular Value Bounding. CVPR 2017.
17. Chunjie Luo, Jianfeng Zhan, Lei Wang, Qiang Yang. Cosine Normalization: Using Cosine Similarity Instead of Dot Product in Neural Networks. arXiv:1702.05870, 2017
1. Ping Luo. EigenNet: Towards Fast and Structural Learning of Deep Neural Networks. IJCAI 2017.
2. Adams Wei Yu, Lei Huang, Qihang Lin, Ruslan Salakhutdinov, and Jaime G. Carbonell. Block-Normalized Gradient Method: An Empirical Study for Training Deep Neural Network. arXiv:1707.04822, 2017.
3. Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, Andrew Rabinovich. GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks, ICML 2018.
1. Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre, Aaron Courville. Recurrent Batch Normalization. ICLR 2017.
2. César Laurent, Gabriel Pereyra, Philemon Brakel, Ying Zhang, Yoshua Bengio. Batch Normalized Recurrent Neural Networks. ICASSP 2016.
3. Martín Arjovsky, Amar Shah, and Yoshua Bengio. Unitary Evolution Recurrent Neural Networks. ICML 2016.
4. Scott Wisdom, Thomas Powers, John Hershey, Jonathan Le Roux, and Les Atlas. Full-Capacity Unitary Recurrent Neural Networks. NIPS 2016.
5. Zakaria Mhammedi, Andrew D. Hellicar, Ashfaqur Rahman, and James Bailey. Efficient Orthogonal Parametrization of Recurrent Neural Networks Using Householder Reflections. ICML 2017.
1. Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky. Instance Normalization: The Missing Ingredient for Fast Stylization. arXiv:1607.08022, 2016.
2. Leon A. Gatys, Alexander S. Ecker, Matthias Bethge. Image Style Transfer Using Convolutional Neural Networks. CVPR 2016.
3. Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. ECCV 2016.
4. Vincent Dumoulin, Jonathon Shlens, Manjunath Kudlur. A Learned Representation for Artistic Style. ICLR 2017
5. Xun Huang and Serge Belongie. Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization. ICCV 2017
6. Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, Ming-Hsuan Yang. Universal Style Transfer via Feature Transforms. NIPS 2017.
1. Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. Generative Adversarial Networks. NIPS 2014.
2. Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen. Improved Techniques for Training GANs. NIPS 2016
3. Sitao Xiang and Hao Li. On the Effects of Batch and Weight Normalization in Generative Adversarial Networks. arXiv:1704.03971, 2017.
4. Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida. Spectral Normalization for Generative Adversarial Networks. ICLR 2018
5. Aliaksandr Siarohin, Enver Sangineto, Nicu Sebe. Whitening and Coloring Batch Transform for GANs. arXiv:1806.00420, 2018.
1. Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, Nicolas Usunier. Parseval Networks: Improving Robustness to Adversarial Examples. ICML 2017
2. Alexander G. de G. Matthews, Mark Rowland, Jiri Hron, Richard E. Turner, Zoubin Ghahramani. Gaussian Process Behaviour in Wide Deep Neural Networks. ICLR 2018.
3. Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein. Deep Neural Networks as Gaussian Processes. ICLR 2018.
4. S. Oymak. Learning Compact Neural Networks with Regularization. ICML 2018.
5. K. Zhong, Z. Song, P. Jain, P. L. Bartlett, I. S. Dhillon. Recovery Guarantees for One-hidden-layer Neural Networks. ICML 2017.
6. R. Ge, J. D. Lee, T. Ma. Learning One-hidden-layer Neural Networks with Landscape Design. ICLR 2018.
7. Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry. How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift). arXiv:1805.11604, 2018.
8. Yuxin Wu and Kaiming He. Group Normalization. arXiv:1803.08494, 2018.
9. B. Neyshabur, Z. Li, S. Bhojanapalli, Y. LeCun, N. Srebro. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks. arXiv:1805.12076, 2018.
10. Peter L. Bartlett and Shahar Mendelson. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results. Journal of Machine Learning Research, 3(Nov):463–482, 2002.
11. Peter L. Bartlett, Dylan J. Foster, and Matus J. Telgarsky. Spectrally-Normalized Margin Bounds for Neural Networks. NIPS 2017.
12. Noah Golowich, Alexander Rakhlin, and Ohad Shamir. Size-Independent Sample Complexity of Neural Networks. arXiv:1712.06541, 2017.
13. Nick Harvey, Chris Liaw, and Abbas Mehrabian. Nearly-Tight VC-Dimension Bounds for Piecewise Linear Neural Networks. arXiv:1703.02930, 2017.
14. Behnam Neyshabur, Ryota Tomioka, and Nathan Srebro. Norm-Based Capacity Control in Neural Networks. COLT 2015.
15. Behnam Neyshabur, Srinadh Bhojanapalli, and Nathan Srebro. A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks. ICLR 2018.
16. Devansh Arpit, Yingbo Zhou, Bhargava U. Kota, Venu Govindaraju. Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks. ICML 2016.
17. Xavier Glorot, Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. AISTATS 2010.
18. Mete Ozay and Takayuki Okatani. Training CNNs with Normalized Kernels. AAAI 2018.
19. Lei Huang, Xianglong Liu, Bo Lang, Adams Wei Yu, Bo Li. Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent Stiefel Manifolds in Deep Neural Networks. AAAI 2018.
20. Lei Huang, Xianglong Liu, Bo Lang, Bo Li. Projection Based Weight Normalization for Deep Neural Networks. ICCV 2017.
21. Ping Luo, Jiamin Ren, Zhanglin Peng. Differentiable Learning-to-Normalize via Switchable Normalization. arXiv:1806.10779, 2018.
22. Mete Ozay and Takayuki Okatani. Optimization on Product Submanifolds of Convolution Kernels. arXiv preprint, 2017.
23. G. Cybenko. Approximation by Superpositions of a Sigmoidal Function. Mathematics of Control, Signals, and Systems, 2(4):303–314, 1989.
24. Kurt Hornik. Approximation Capabilities of Multilayer Feedforward Networks. Neural Networks, 4(2):251–257, 1991.
25. S. Liang, R. Srikant. Why Deep Neural Networks for Function Approximation? ICLR 2017.
26. H. N. Mhaskar and T. Poggio. Deep vs. Shallow Networks: An Approximation Theory Perspective. Analysis and Applications, 2016.
27. Uri Shaham, Alexander Cloninger, Ronald R. Coifman. Provable Approximation Properties for Deep Neural Networks. Applied and Computational Harmonic Analysis, 44(3):537–557, 2018.
28. Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, Liwei Wang. The Expressive Power of Neural Networks: A View from the Width. NIPS 2017.
29. Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein. On the Expressive Power of Deep Neural Networks. ICML 2017.
30. Y. Ollivier. A Visual Introduction to Riemannian Curvatures and Some Discrete Generalizations. In Analysis and Geometry of Metric Measure Spaces: Lecture Notes of the 50th Séminaire de Mathématiques Supérieures (SMS), Montréal, 2011, Galia Dafni, Robert McCann, Alina Stancu, eds. AMS, 2013.
31. T. Miyato, T. Kataoka, M. Koyama, Y. Yoshida. Spectral Normalization for Generative Adversarial Networks. ICLR 2018.
32. K. Jia, D. Tao, S. Gao, and X. Xu. Improving Training of Deep Neural Networks via Singular Value Bounding. CVPR 2017.