Day 1 - Talks

Florent Krzakala

On statistical physics and inference problems

Heuristic tools from statistical physics, in particular the replica method, have long been used to locate phase transitions and to compute the optimal learning and generalisation errors in many machine learning tasks. This field is currently witnessing an impressive revival. In this talk, we provide a rigorous justification of these approaches for high-dimensional generalized linear models (used in signal processing, statistical inference, machine learning, communication theory and other fields) and discuss computational-to-statistical gaps, where learning is possible but computationally hard.

Andrea Pagnani

Expectation propagation: applications and theory

Underdetermined systems of linear equations are ubiquitous in many fields, from biology to information technology. In recent years, different approximation schemes have been proposed to analyze the space of solutions. Expectation Propagation (EP) is a theoretical framework that is particularly well suited to realistic models of the prior probability distribution. I will discuss three relevant applications of the method: (i) sampling metabolic networks, (ii) tomography, (iii) inference of brain activity from EEG data.

Planted ensembles of mean-field spin-glass models form a bridge between the statistical mechanics of disordered systems and Bayesian inference problems. In particular, random graphs generated from the Stochastic Block Model possess a hidden community structure and constitute a natural testbed for graph inference algorithms. This model exhibits phase transitions, both for the information-theoretically optimal estimation of the underlying communities and for the accuracy of efficient estimation algorithms. One of these phase transitions corresponds to the Kesten-Stigum threshold for an associated tree reconstruction problem, or to the de Almeida-Thouless instability of the corresponding spin glass. In this talk I will present systematic moment expansions around this bifurcation that shed new light on these problems, in particular for the reconstruction of the 4-state Potts model and of the asymmetric Ising model.
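
As an illustration of the Kesten-Stigum threshold mentioned above, here is a minimal sketch for the simplest setting only: a sparse, symmetric Stochastic Block Model with q equal-size groups, where two nodes are connected with probability c_in/N inside a group and c_out/N between groups. This parametrisation and the function names are assumptions made for the example, not taken from the talk. In that setting the threshold is usually written as |c_in - c_out| = q sqrt(c), with c the average degree.

```python
import math


def mean_degree(c_in, c_out, q):
    """Average degree c of a symmetric SBM with q equal-size groups (assumed setting)."""
    return (c_in + (q - 1) * c_out) / q


def above_kesten_stigum(c_in, c_out, q):
    """Return True if |c_in - c_out| > q * sqrt(c), i.e. above the Kesten-Stigum threshold."""
    c = mean_degree(c_in, c_out, q)
    return abs(c_in - c_out) > q * math.sqrt(c)


# Two groups with average degree 3: strong versus weak community structure.
print(above_kesten_stigum(5, 1, 2))  # True:  4 > 2 * sqrt(3)
print(above_kesten_stigum(4, 2, 2))  # False: 2 < 2 * sqrt(3)
```

The asymmetric cases discussed in the talk are not covered by this simple criterion.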

In the course of evolution, proteins undergo substantial changes in their amino-acid sequences, while conserving their three-dimensional fold and their biological functionality. Modern sequencing techniques provide us with increasingly large families of proteins of variable sequence but almost constant global properties. The (inverse) statistical mechanics of complex disordered systems offers a natural framework to build data-driven statistical models of sequence variability, and to relate them to protein structure and function. However, the choice of model is far from obvious. Models treating each amino-acid position independently are among the most successful in sequence bioinformatics, but miss important information contained in the original data. I will give an overview of the surprising efficiency of pairwise models in extracting information from sequences, and even in designing new, artificial but fully functional proteins. However, even these models have important limitations, which opens interesting questions about statistical models of correlated, noisy and incomplete data.

The study of ecosystems formed by a large number of interacting species provides an interesting application of spin-glass physics. The model I will focus on is defined by Lotka-Volterra equations with symmetric random interactions. The theoretical analysis, confirmed by our numerical studies, shows that for strong and heterogeneous interactions the system displays multiple equilibria, all of which are marginally stable. This multiple-equilibria regime is analogous to a critical spin-glass phase. These properties make it possible to derive general identities between the ecosystem diversity and single-species responses, which generalise and saturate May’s bound. Moreover, they provide a new perspective on why many systems in several different fields appear to be poised at the edge of stability. In particular, I will discuss new experimental ways to probe spin-glass criticality and marginal stability in ecosystems. I will also discuss the generalisation to non-symmetric interactions.
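
To make the model concrete, here is a minimal simulation sketch of Lotka-Volterra dynamics with symmetric random interactions. Several choices are assumptions made for the example and not taken from the talk: unit growth rates and carrying capacities, Gaussian couplings with mean mu/S and standard deviation sigma/sqrt(S), a plain Euler integrator, and a weak-interaction parameter setting where a single equilibrium is expected.

```python
import math
import random


def symmetric_interactions(S, mu, sigma, rng):
    """Symmetric couplings a[i][j] = a[j][i], mean mu/S, std sigma/sqrt(S), zero diagonal."""
    a = [[0.0] * S for _ in range(S)]
    for i in range(S):
        for j in range(i + 1, S):
            a[i][j] = a[j][i] = rng.gauss(mu / S, sigma / math.sqrt(S))
    return a


def simulate(a, n0, dt=0.02, steps=5000):
    """Euler integration of dN_i/dt = N_i * (1 - N_i - sum_j a_ij N_j)."""
    n = list(n0)
    S = len(n)
    for _ in range(steps):
        growth = [1.0 - n[i] - sum(a[i][j] * n[j] for j in range(S)) for i in range(S)]
        # Clip at zero so abundances never become negative.
        n = [max(0.0, n[i] * (1.0 + dt * growth[i])) for i in range(S)]
    return n


rng = random.Random(0)
S = 10
a = symmetric_interactions(S, mu=2.0, sigma=0.3, rng=rng)
abundances = simulate(a, [rng.uniform(0.1, 1.0) for _ in range(S)])
print(abundances)
```

Exploring the multiple-equilibria regime described above would require larger S and stronger heterogeneity (larger sigma) than in this toy run.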

We plan to review recent progress in Euclidean random optimization problems, more precisely the matching, 2-matching and traveling salesman problems, in one and two dimensions, for both the monopartite and bipartite cases.
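
As a small illustration of the one-dimensional bipartite matching problem, the sketch below relies on a standard fact for convex costs such as |x - y|^p with p > 1: the optimal assignment pairs the k-th smallest point of one set with the k-th smallest point of the other. The uniform random points, the exponent p = 2 and the function names are assumptions made for the example; the brute-force check is only feasible for very small instances.

```python
import itertools
import random


def ordered_matching_cost(reds, blues, p=2):
    """Cost of pairing the k-th smallest red point with the k-th smallest blue point."""
    return sum(abs(r - b) ** p for r, b in zip(sorted(reds), sorted(blues)))


def brute_force_cost(reds, blues, p=2):
    """Minimum cost over all N! assignments (small N only)."""
    return min(
        sum(abs(r - b) ** p for r, b in zip(reds, perm))
        for perm in itertools.permutations(blues)
    )


rng = random.Random(1)
reds = [rng.random() for _ in range(6)]
blues = [rng.random() for _ in range(6)]

# The two costs agree up to floating-point rounding.
print(ordered_matching_cost(reds, blues), brute_force_cost(reds, blues))
```

In two dimensions no such ordering exists, and a general assignment solver is needed instead.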

An approximate, semi-analytic method for resampling in Lasso, the l1-penalized linear regression, is developed. The average over the resampled datasets is computed directly, without repeated numerical sampling, which yields an inference free of the statistical fluctuations due to finite sampling and significantly reduces the computational time. The proposed method is employed to implement bootstrapped Lasso (Bolasso) and stability selection, both of which are variable selection methods that combine resampling with Lasso, and it resolves their disadvantage in computational cost. To examine the accuracy and efficiency of the approximation, numerical experiments were carried out on simulated datasets. Moreover, an application to a real-world dataset, the wine quality dataset, is presented. To process such real-world datasets, an objective criterion for determining the relevance of selected variables is also introduced, based on the addition of noise variables and resampling.
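
For context, the sketch below shows the naive numerical resampling that Bolasso and stability selection normally rely on. It is the costly baseline, not the semi-analytic method of the talk. The coordinate-descent solver, the simulated data, the penalty value and all function names are assumptions made for the example.

```python
import random


def soft_threshold(x, lam):
    """Soft-thresholding operator: shrink x towards zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_cd(X, y, lam, sweeps=50):
    """Coordinate descent for (1/2n) * ||y - Xw||^2 + lam * ||w||_1."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    r = list(y)  # residual y - Xw, kept up to date
    for _ in range(sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue
            rho = sum(c * ri for c, ri in zip(col, r)) / n + z * w[j]
            new_w = soft_threshold(rho, lam) / z
            delta = new_w - w[j]
            if delta != 0.0:
                r = [ri - c * delta for ri, c in zip(r, col)]
                w[j] = new_w
    return w


def selection_frequencies(X, y, lam, n_boot=30, seed=0):
    """Fraction of bootstrap replicates in which each variable gets a nonzero weight."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        w = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            counts[j] += w[j] != 0.0
    return [c / n_boot for c in counts]


# Simulated data: only variables 0 and 3 carry signal.
rng = random.Random(42)
n, p = 80, 5
true_w = [2.0, 0.0, 0.0, -3.0, 0.0]
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(x * w for x, w in zip(row, true_w)) + rng.gauss(0, 0.1) for row in X]
freq = selection_frequencies(X, y, lam=0.3)
print(freq)
```

Bolasso keeps the variables selected in all (or nearly all) replicates, while stability selection thresholds the selection frequency. The original stability selection procedure uses subsamples of half the data rather than bootstrap samples, so this sketch is a simplification. Every replicate needs a full Lasso fit, which is the cost the semi-analytic approach aims to avoid.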

Continuous attractor neural networks (CANNs) are conceptually important in theoretical neuroscience, as they provide mechanisms for the coding of collective coordinates by noisy neural populations. I will describe some recent theoretical developments regarding CANNs, which rely heavily on methods from disordered systems and random matrix theory. Connections with experiments will be discussed.