Friday, July 18: Meeting Rooms 118-120
All times are local to Vancouver, Canada (Pacific Daylight Time)
Poster size for workshops:
Portrait orientation, 24" W x 36" H (61 cm W x 91.5 cm H).
*Note: the poster size for the main conference is different.
Abstract: We probe the local geometry of loss landscapes arising from high-dimensional Gaussian mixture classification tasks, as seen from a typical training trajectory. We study this by proving limits for the empirical spectral distribution, and the locations of outliers, of the empirical Hessian at all points in parameter space. Notably, this limiting spectral theory depends on the point in parameter space only through $O(1)$-many summary statistics of the parameter; these summary statistics also follow an autonomous evolution under training by stochastic gradient descent. This allows us to investigate the limiting spectral theory along the limiting training trajectory, where we find interesting phenomena such as the splitting and emergence of outliers over the course of training. Based on joint works with G. Ben Arous, J. Huang, and A. Jagannath.
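For readers who want to experiment with the phenomenon described above, the following is a minimal illustrative sketch, not the construction from the talk: it trains a logistic classifier on a two-component Gaussian mixture with single-sample SGD and prints the top of the empirical Hessian spectrum along the trajectory, where an outlier eigenvalue can separate from the bulk as training progresses. The mixture means, loss, dimensions, and step size are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 2000                       # dimension and sample size (illustrative)
mu = np.ones(d) / np.sqrt(d)           # assumed class-mean direction

# Two-component Gaussian mixture: x ~ N(y * mu, I), label y in {-1, +1}
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu[None, :] + rng.standard_normal((n, d))

def hessian_spectrum(w):
    """Eigenvalues of the empirical Hessian of the logistic loss at w."""
    p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid(w . x_i) for each sample
    curvature = p * (1.0 - p)          # per-sample curvature weight
    H = (X.T * curvature) @ X / n
    return np.linalg.eigvalsh(H)       # sorted ascending

w = rng.standard_normal(d) / np.sqrt(d)
lr = 0.5
for step in range(2001):
    i = rng.integers(n)                # single-sample SGD on the logistic loss
    margin = y[i] * (X[i] @ w)
    w += lr * y[i] * X[i] / (1.0 + np.exp(margin))
    if step % 500 == 0:
        ev = hessian_spectrum(w)
        print(f"step {step:5d}  largest eigenvalue {ev[-1]:.3f}  next {ev[-2]:.3f}")
```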
TBD
Best Papers of HiLD Awards
TBD
Abstract: Computational capabilities in large AI systems emerge from the collective dynamics and interactions of basic neural elements under simple optimization rules. What are the fundamental principles driving this? How do interactions, dynamics, and data structure conspire to yield these abilities? We’ll revisit a classic setting - that of learning word embeddings (such as word2vec) from natural language. We’ll show how the dynamics of such algorithms can be distilled into a simpler solvable theory. Such algorithms give rise to a well-known linear computational structure in their learned representations that enables analogical reasoning, one of the simplest examples of an “emergent” capability. We propose a solvable microscopic theory of how word correlations in language can arise from factorization in latent space and drive the emergence of this linear computational structure.
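As a concrete point of reference for the linear ("analogy") structure mentioned above, here is a minimal hedged sketch, not the proposed theory: word vectors are generated as a linear image of two hand-coded latent attributes, so analogies resolve exactly by vector arithmetic; embeddings learned from real co-occurrence statistics only approximate this. The vocabulary, attributes, and embedding dimension are assumptions made for illustration.

```python
import numpy as np

# Hand-coded latent attributes (gender, royalty); purely illustrative and not
# learned from any corpus. Real word2vec vectors come from co-occurrence data.
words = ["king", "queen", "man", "woman", "prince", "princess"]
latent = np.array([
    [+1.0, 1.0],   # king:     male,   royal
    [-1.0, 1.0],   # queen:    female, royal
    [+1.0, 0.0],   # man:      male
    [-1.0, 0.0],   # woman:    female
    [+1.0, 0.7],   # prince:   male,   partly royal
    [-1.0, 0.7],   # princess: female, partly royal
])

rng = np.random.default_rng(1)
proj = rng.standard_normal((2, 50))        # random linear map into a 50-d space
E = latent @ proj                          # embeddings with exact linear structure

def analogy(a, b, c):
    """Word closest to vec(b) - vec(a) + vec(c), excluding the three inputs."""
    idx = {w: i for i, w in enumerate(words)}
    target = E[idx[b]] - E[idx[a]] + E[idx[c]]
    sims = (E @ target) / (np.linalg.norm(E, axis=1) * np.linalg.norm(target))
    for w in (a, b, c):
        sims[idx[w]] = -np.inf
    return words[int(np.argmax(sims))]

print(analogy("man", "king", "woman"))     # prints "queen"
```

Here the analogy "man : king :: woman : ?" resolves to "queen" by construction; the talk asks how such linear structure can emerge from training dynamics on natural language.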
Abstract: TBD
Abstract: TBD
TBD
Abstract: In many applications of statistical estimation via sampling, one may wish to sample from a high-dimensional target distribution that is adaptively evolving to the samples already seen. We study an example of such dynamics, given by a Langevin diffusion for posterior sampling in a Bayesian linear regression model with i.i.d. regression design, whose prior continuously adapts to the Langevin trajectory via a maximum marginal-likelihood scheme. Using techniques of dynamical mean-field theory (DMFT), we provide a precise characterization of a high-dimensional asymptotic limit for the joint evolution of the prior parameter and law of the Langevin sample. We then carry out an analysis of the equations that describe this DMFT limit, under conditions of approximate time-translation-invariance which include, in particular, settings where the posterior law satisfies a log-Sobolev inequality. In such settings, we show that this adaptive Langevin trajectory converges on a dimension-independent time horizon to an equilibrium state that is characterized by a system of replica-symmetric fixed-point equations, and the associated prior parameter converges to a critical point of a replica-symmetric limit for the model free energy. We explore the nature of the free energy landscape and its critical points in a few simple examples, where such critical points may or may not be unique.
This is joint work with Justin Ko, Bruno Loureiro, Yue M. Lu, and Yandi Shen.
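To make the adaptive-prior dynamics concrete, the following is a toy sketch under stated assumptions, not the paper's DMFT analysis: an unadjusted Langevin discretization samples the posterior of a Bayesian linear regression with Gaussian prior N(0, tau^2 I), while tau^2 is pulled toward the running second moment of the sample as a crude stochastic stand-in for a maximum marginal-likelihood update. The step size, adaptation rate, noise level, and Gaussian prior family are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 300, 100
sigma = 0.5                                    # known noise level (assumed)
beta_true = rng.normal(0.0, 1.0, size=d)       # ground-truth coefficients
X = rng.standard_normal((n, d)) / np.sqrt(d)   # i.i.d. design, scaled
y = X @ beta_true + sigma * rng.standard_normal(n)

eta = 1e-3      # Langevin step size (assumption)
alpha = 1e-2    # prior adaptation rate (assumption)
tau2 = 1.0      # adaptive prior variance, updated along the trajectory
b = np.zeros(d) # Langevin state

for t in range(20001):
    # Unadjusted Langevin step targeting the current posterior
    grad = -X.T @ (y - X @ b) / sigma**2 + b / tau2
    b += -eta * grad + np.sqrt(2.0 * eta) * rng.standard_normal(d)
    # Crude empirical-Bayes update: pull the prior variance toward the running
    # second moment of the sample (a stand-in for maximum marginal likelihood,
    # not the scheme analyzed in the talk).
    tau2 = (1.0 - alpha) * tau2 + alpha * float(np.mean(b**2))
    if t % 5000 == 0:
        err = np.linalg.norm(b - beta_true) / np.sqrt(d)
        print(f"t={t:6d}  tau^2={tau2:.3f}  normalized error={err:.3f}")
```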
Best Papers of HiLD Awards: TBD