Oct 1 (1:45-5:30pm in MSB1147): Opening event.
1:45pm: Reception
2-2:50 Talk by Jorge Nocedal (Northwestern). Title: Nonlinear Optimization Methods for Machine Learning.
Abstract: Most high-dimensional nonconvex optimization problems cannot be solved to optimality. However, deep neural networks have a benign geometry that allows stochastic optimization methods to find acceptable solutions. There are, nevertheless, many open questions concerning the optimization process, including trade-offs between parallelism and the predictive ability of solutions, as well as the choice of a metric with the right statistical properties. In this talk we discuss classical and new optimization methods in light of these observations, and conclude with some open questions.
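For readers who want a concrete reference point for the stochastic optimization methods the talk discusses, the sketch below shows a minimal mini-batch SGD loop on a toy least-squares problem. The model, step size, and batch size are illustrative choices only and are not taken from the talk.

```python
import numpy as np

# Minimal mini-batch SGD on a toy linear least-squares problem.
# All names and hyperparameters here are illustrative, not from the talk.
rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
step_size, batch_size = 0.1, 32
for it in range(2000):
    idx = rng.integers(0, n, size=batch_size)              # sample a mini-batch
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size    # stochastic gradient estimate
    w -= step_size * grad                                   # SGD update

print("parameter error:", np.linalg.norm(w - w_true))
```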
Short bio: Jorge Nocedal is the Walter P. Murphy Professor in the Department of Industrial Engineering and Management Sciences at Northwestern University. His research is in optimization, both deterministic and stochastic, with an emphasis on very large-scale problems. His current work is driven by applications in machine learning. He is a SIAM Fellow, and was awarded the 2012 George B. Dantzig Prize as well as the 2017 Von Neumann Theory Prize for contributions to the theory and algorithms of nonlinear optimization.
3-3:50 Talk by Andrea Montanari (Stanford). Title: A Mean Field View of the Landscape of Two-Layer Neural Networks.
Abstract: Multi-layer neural networks are among the most powerful models in machine learning, yet the fundamental reasons for this success defy mathematical understanding. Learning a neural network requires optimizing a non-convex, high-dimensional objective (risk function), a problem which is usually attacked using stochastic gradient descent (SGD). Does SGD converge to a global optimum of the risk or only to a local optimum? In the first case, does this happen because local minima are absent, or because SGD somehow avoids them? In the second case, why do local minima reached by SGD have good generalization properties?
We consider a simple case, namely two-layer neural networks, and prove that, in a suitable scaling limit, the SGD dynamics is captured by a certain non-linear partial differential equation (PDE) that we call distributional dynamics (DD). We then consider several specific examples, and show how DD can be used to prove convergence of SGD to networks with nearly ideal generalization error. This description allows us to 'average out' some of the complexities of the landscape of neural networks, and can be used to prove a general convergence result for noisy SGD.
[Based on joint work with Song Mei and Phan-Minh Nguyen]
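For orientation, a schematic version of the mean-field picture is sketched below, in notation that may differ from the speaker's: the network output is an average over N units, and in the large-N limit the empirical distribution of the unit parameters evolves according to a nonlinear transport-type PDE. Time-rescaling factors and regularity assumptions are omitted.

```latex
% Schematic mean-field description (notation may differ from the talk).
% Two-layer network as an average over N units with parameters \theta_i:
\[
  \hat{y}(x;\boldsymbol{\theta}) \;=\; \frac{1}{N}\sum_{i=1}^{N} \sigma_*(x;\theta_i).
\]
% As N grows, the empirical distribution \rho_t of the parameters \theta_i under SGD
% is described by a distributional dynamics of transport type,
\[
  \partial_t \rho_t \;=\; \nabla_\theta \!\cdot\! \bigl( \rho_t \, \nabla_\theta \Psi(\theta;\rho_t) \bigr),
  \qquad
  \Psi(\theta;\rho) \;=\; V(\theta) + \int U(\theta,\theta')\,\rho(\mathrm{d}\theta'),
\]
% where V and U are data expectations encoding the interaction of a single unit
% with the labels and with the other units, respectively.
```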
Short bio: Andrea Montanari received a Laurea degree in Physics in 1997 and a Ph.D. in Theoretical Physics in 2001 (both from Scuola Normale Superiore in Pisa, Italy). He has been a postdoctoral fellow at the Laboratoire de Physique Théorique de l'Ecole Normale Supérieure (LPTENS), Paris, France, and at the Mathematical Sciences Research Institute, Berkeley, USA. Since 2002 he has been Chargé de Recherche (with the Centre National de la Recherche Scientifique, CNRS) at LPTENS. In September 2006 he joined Stanford University as a faculty member, and since 2015 he has been a Full Professor in the Departments of Electrical Engineering and Statistics.
He was co-awarded the ACM SIGMETRICS best paper award in 2008. He received the CNRS bronze medal for theoretical physics in 2006, the National Science Foundation CAREER award in 2008, the Okawa Foundation Research Grant in 2013, and the Applied Probability Society Best Publication Award in 2015. He was an Information Theory Society Distinguished Lecturer for 2015-2016. In 2016 he received the James L. Massey Research & Teaching Award of the Information Theory Society for young scholars. In 2018 he was an invited sectional speaker at the International Congress of Mathematicians.
3:50-4:30 Coffee break
4:30-5:30 Talk by Joel Tropp (Caltech). Title: Applied Random Matrix Theory.
Abstract: Random matrices now play a role in many areas of theoretical, applied, and computational mathematics. Therefore, it is desirable to have tools for studying random matrices that are flexible, easy to use, and powerful. Over the last fifteen years, researchers have developed a remarkable family of results, called matrix concentration inequalities, that balance these criteria. This talk offers an invitation to the field of matrix concentration inequalities and their applications.
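As one representative example of the matrix concentration inequalities the talk surveys, a standard formulation of the matrix Bernstein inequality (with the constants as they usually appear in the literature) reads roughly as follows.

```latex
% Matrix Bernstein inequality (a standard formulation; constants as usually stated).
% Let X_1,\dots,X_n be independent d_1 \times d_2 random matrices with
% \mathbb{E} X_k = 0 and \|X_k\| \le L almost surely, and set S = \sum_k X_k.
\[
  v(S) \;=\; \max\Bigl\{ \bigl\|\textstyle\sum_k \mathbb{E}[X_k X_k^{*}]\bigr\|,\;
                          \bigl\|\textstyle\sum_k \mathbb{E}[X_k^{*} X_k]\bigr\| \Bigr\}.
\]
% Then the spectral norm of S concentrates:
\[
  \mathbb{P}\bigl( \|S\| \ge t \bigr) \;\le\; (d_1+d_2)\,
  \exp\!\Bigl( \frac{-t^2/2}{\,v(S) + L t/3\,} \Bigr),
  \qquad
  \mathbb{E}\,\|S\| \;\le\; \sqrt{2\,v(S)\,\log(d_1+d_2)} \;+\; \tfrac{L}{3}\,\log(d_1+d_2).
\]
```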
Short bio: Joel A. Tropp is the Steele Family Professor of Applied & Computational Mathematics at the California Institute of Technology. He earned the Ph.D. degree in Computational Applied Mathematics from the University of Texas at Austin in 2004. His research centers on data science, applied mathematics, numerical algorithms, and random matrix theory. Prof. Tropp won a PECASE in 2008, and he has received society best paper awards from SIAM in 2010, EUSIPCO in 2011, and IMA in 2015. He has also been recognized as a Highly Cited Researcher in Computer Science each year from 2014 to 2017.