Foundations of Data Science - Virtual Talk Series

... on the "Theory of Large ML Models."

Mikhail Belkin (UC San Diego)

Wednesday, May 15

2pm Pacific Time

Mikhail Belkin is a Professor at the Halicioglu Data Science Institute and the Department of Computer Science and Engineering at UCSD, and an Amazon Scholar. Prior to that he was a Professor in the Department of Computer Science and Engineering and the Department of Statistics at the Ohio State University. He received his Ph.D. from the Department of Mathematics at the University of Chicago (advised by Partha Niyogi). His research interests are broadly in the theory and applications of machine learning, deep learning, and data analysis. Some of his well-known work includes the widely used Laplacian Eigenmaps, Graph Regularization, and Manifold Regularization algorithms, which brought ideas from classical differential geometry and spectral graph theory to data science. His more recent work has been concerned with understanding the remarkable mathematical and statistical phenomena observed in deep learning. This empirical evidence has necessitated revisiting some of the classical concepts in statistics and optimization, including the basic notion of overfitting. One of his key findings is the "double descent" risk curve, which extends the textbook U-shaped bias-variance trade-off curve beyond the point of interpolation. His recent work focuses on understanding feature learning and over-parameterization in deep learning. Mikhail Belkin is an ACM Fellow and a recipient of an NSF CAREER Award and a number of best paper and other awards. He has served on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence and the Journal of Machine Learning Research. He is the editor-in-chief of the SIAM Journal on Mathematics of Data Science (SIMODS).

Title: The puzzle of dimensionality and feature learning in neural networks and kernel machines

Abstract: Remarkable progress in AI has far surpassed the expectations of just a few years ago. At their core, modern models, such as transformers, implement traditional statistical models -- high-order Markov chains. Nevertheless, it is not generally possible to estimate Markov models of that order given any possible amount of data. Therefore, these methods must implicitly exploit low-dimensional structures present in data. Furthermore, these structures must be reflected in the high-dimensional internal parameter spaces of the models. Thus, to build a fundamental understanding of modern AI, it is necessary to identify and analyze these latent low-dimensional structures. In this talk, I will discuss how deep neural networks of various architectures learn low-dimensional features and how the lessons of deep learning can be incorporated into non-backpropagation-based algorithms that we call Recursive Feature Machines. I will provide a number of experimental results on different types of data, as well as some connections to classical sparse learning methods, such as Iteratively Reweighted Least Squares.
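
For readers unfamiliar with the idea sketched in the abstract, the following is a minimal, illustrative sketch (not the speaker's implementation) of a Recursive Feature Machine: a kernel ridge regressor whose Mahalanobis-type metric M is re-estimated each round from the average gradient outer product (AGOP) of the fitted predictor, so that directions along which the target actually varies are amplified. The Laplace kernel, the bandwidth, and the regularization value are assumptions chosen for illustration, not details from the talk.

# Minimal Recursive Feature Machine sketch (assumed Laplace kernel + ridge regression).
import numpy as np

def laplace_kernel(X, Z, M, bandwidth=10.0):
    # K(x, z) = exp(-||x - z||_M / bandwidth), where ||v||_M^2 = v^T M v.
    dists_sq = ((X @ M) * X).sum(1)[:, None] + ((Z @ M) * Z).sum(1)[None, :] - 2 * (X @ M) @ Z.T
    dists = np.sqrt(np.clip(dists_sq, 0.0, None))
    return np.exp(-dists / bandwidth), dists

def rfm(X, y, n_iters=5, reg=1e-3, bandwidth=10.0):
    n, d = X.shape
    M = np.eye(d)  # start from an ordinary (isotropic) kernel machine
    for _ in range(n_iters):
        K, dists = laplace_kernel(X, X, M, bandwidth)
        alpha = np.linalg.solve(K + reg * np.eye(n), y)  # kernel ridge regression
        # AGOP: average over training points of grad f(x_i) grad f(x_i)^T,
        # where f(x) = sum_j alpha_j K(x, x_j).
        grads = np.zeros((n, d))
        for i in range(n):
            diff = (X[i] - X) @ M  # M (x_i - x_j) for all j (M symmetric)
            w = alpha * K[i] / (bandwidth * np.clip(dists[i], 1e-12, None))
            grads[i] = -(w[:, None] * diff).sum(axis=0)
        M = grads.T @ grads / n
        M /= np.trace(M) / d + 1e-12  # keep the scale of M stable across rounds
    K, _ = laplace_kernel(X, X, M, bandwidth)
    alpha = np.linalg.solve(K + reg * np.eye(n), y)  # refit with the final metric
    return M, alpha

# Toy check: the target depends only on the first coordinate, so the learned
# metric should concentrate on that direction (a low-dimensional feature).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sin(2.0 * X[:, 0])
M, _ = rfm(X, y)
print(np.round(np.diag(M), 3))

In this toy run the diagonal of M becomes dominated by the first coordinate, which is the kind of learned low-dimensional feature structure the abstract refers to.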