Monday, September 5, 2:00pm (online)
Blair Bilodeau (University of Toronto)
Minimax Rates for Conditional Density Estimation via Empirical Entropy
We consider the task of estimating a conditional density using i.i.d. samples from a joint distribution, which is a fundamental problem with applications in both classification and uncertainty quantification for regression. For joint density estimation, minimax rates have been characterized for general density classes in terms of uniform (metric) entropy, a well-studied notion of statistical capacity. When applying these results to conditional density estimation, the use of uniform entropy—which is infinite when the covariate space is unbounded and suffers from the curse of dimensionality—can lead to suboptimal rates. Consequently, minimax rates for conditional density estimation cannot be characterized using these classical results. We resolve this problem for well-specified models, obtaining matching (within logarithmic factors) upper and lower bounds on the minimax Kullback–Leibler risk in terms of the empirical Hellinger entropy for the conditional density class. The use of empirical entropy allows us to appeal to concentration arguments based on local Rademacher complexity, which—in contrast to uniform entropy—leads to matching rates for large, potentially nonparametric classes and captures the correct dependence on the complexity of the covariate space. Our results require only that the conditional densities are bounded above, and do not require that they are bounded below or otherwise satisfy any tail conditions. Based on https://arxiv.org/abs/2109.10461.
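For readers new to the setting, the minimax risk in question can be written schematically as follows (the notation is chosen here for illustration and may differ from the paper's):

    \[
      \mathcal{R}_n(\mathcal{F}) \;=\; \inf_{\hat f}\; \sup_{f \in \mathcal{F}}\; \mathbb{E}\Big[\mathrm{KL}\big(f(\cdot \mid X)\,\big\|\,\hat f(\cdot \mid X)\big)\Big],
    \]

where the infimum runs over all estimators $\hat f$ built from $n$ i.i.d. pairs $(X_i, Y_i)$ and $\mathcal{F}$ is the conditional density class; the talk characterizes this risk, up to logarithmic factors, through the empirical Hellinger entropy of $\mathcal{F}$.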
Monday, September 19, 2:00pm (online)
Kaizheng Wang (Columbia University)
Adaptive and Robust Multi-task Learning
In this talk, we study the multi-task learning problem that aims to simultaneously analyze multiple datasets collected from different sources and learn one model for each of them. We propose a family of adaptive methods that automatically utilize possible similarities among those tasks while carefully handling their differences. We derive sharp statistical guarantees for the methods and prove their robustness against outlier tasks. Numerical experiments on synthetic and real datasets demonstrate the efficacy of our new methods.
Monday, September 26, 3:45pm (in person, room 200)
Eddie Aamari (Sorbonne Université)
Statistical Query Complexity of Manifold Estimation
The statistical query (SQ) framework replaces the usual access to samples from a distribution with access to adversarially perturbed expected values of functions chosen interactively by the learner. This framework provides a natural estimation complexity measure, enforces robustness through adversariality, and is closely related to differential privacy. In this talk, we study the SQ complexity of estimating d-dimensional submanifolds in R^n. We propose a purely geometric algorithm called Manifold Propagation, which reduces the problem to three local geometric routines: projection, tangent space estimation, and point detection. We then provide constructions of these geometric routines in the SQ framework. Given an adversarial STAT(τ) oracle and a target precision ε=Ω(τ^{2/(d+1)}) for the Hausdorff distance, the resulting SQ manifold reconstruction algorithm has query complexity O(n polylog(n)ε^{-d/2}), which is proved to be nearly optimal. In the process, we will present low-rank matrix completion results for SQs and lower bounds for (randomized) SQ estimators in general metric spaces.
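For context, the SQ access model can be illustrated with a minimal Python sketch (all names and the Monte Carlo shortcut are illustrative, not taken from the paper): the learner never sees samples, only answers to expectation queries that an adversary may perturb by up to τ.

    import numpy as np

    class StatOracle:
        # Toy STAT(tau) oracle: answers expectation queries up to an adversarial
        # perturbation of size at most tau (simulated here by a random shift).
        def __init__(self, sampler, tau, rng=None):
            self.sampler = sampler          # draws points from the unknown distribution
            self.tau = tau
            self.rng = rng or np.random.default_rng(0)

        def query(self, f, n_mc=20000):
            x = self.sampler(n_mc)          # true expectation approximated by Monte Carlo
            mean = np.mean([f(xi) for xi in x])
            return mean + self.rng.uniform(-self.tau, self.tau)

    # Example: points on the unit circle in R^2 (a 1-dimensional submanifold)
    def circle_sampler(n, rng=np.random.default_rng(1)):
        theta = rng.uniform(0, 2 * np.pi, size=n)
        return np.stack([np.cos(theta), np.sin(theta)], axis=1)

    oracle = StatOracle(circle_sampler, tau=0.01)
    print(oracle.query(lambda x: x[0]))     # close to 0, up to tau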
Monday, October 3, 2:00pm (in person)
Sayan Mukherjee (Duke & Leipzig University)
Modeling Shapes and Fields
We will consider modeling shapes and fields via topological and lifted-topological transforms. Specifically, we show how the Euler Characteristic Transform and the Lifted Euler Characteristic Transform can be used in practice for statistical analysis of shape and field data. The Lifted Euler Characteristic is an alternative to the Euler calculus developed by Ghrist and Baryshnikov for real-valued functions. We also describe a moduli space of shapes for which we can provide a complexity metric for the shapes. We also provide a sheaf-theoretic construction of shape space that does not require diffeomorphisms or correspondence. A direct result of this sheaf-theoretic construction is that, in three dimensions, for meshes, 0-dimensional homology is enough to characterize the shape. We will also discuss Gaussian processes on fiber bundles and applications to evolutionary questions about shapes.
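As a concrete illustration of the Euler Characteristic Transform, here is a minimal sketch for a triangle mesh: for a chosen direction, one records the Euler characteristic (vertices minus edges plus faces) of the sublevel sets of the height function. Function and variable names are illustrative, not taken from the talk.

    import numpy as np

    def euler_characteristic_curve(verts, edges, faces, direction, thresholds):
        # Each simplex enters the sublevel set once all of its vertices have
        # height <= t; chi(t) = #vertices - #edges + #faces present at level t.
        h = verts @ direction
        he = h[np.asarray(edges)].max(axis=1)
        hf = h[np.asarray(faces)].max(axis=1)
        return np.array([(h <= t).sum() - (he <= t).sum() + (hf <= t).sum()
                         for t in thresholds])

    # Example: a single triangle swept along the x-axis
    verts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    edges = [[0, 1], [0, 2], [1, 2]]
    faces = [[0, 1, 2]]
    print(euler_characteristic_curve(verts, edges, faces,
                                     np.array([1.0, 0.0]),
                                     thresholds=[-0.5, 0.0, 0.5, 1.0]))   # -> [0 1 1 1]

The full transform collects such curves over all directions.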
Monday, October 10, 2:00pm (in person)
Lu Yu (CREST, ENSAE)
Mirror Descent Strikes Again: Optimal Stochastic Convex Optimization under Infinite Noise Variance
We study stochastic convex optimization under infinite noise variance. Specifically, when the stochastic gradient is unbiased and has uniformly bounded (1 + κ)-th moment, for some κ ∈ (0, 1], we quantify the convergence rate of the Stochastic Mirror Descent algorithm with a particular class of uniformly convex mirror maps, in terms of the number of iterations, dimensionality and related geometric parameters of the optimization problem. Interestingly, this algorithm does not require any explicit gradient clipping or normalization, which have been extensively used in several recent empirical and theoretical works. We complement our convergence results with information-theoretic lower bounds showing that no other algorithm using only stochastic first-order oracles can achieve improved rates. Our results have several interesting consequences for devising online/streaming stochastic approximation algorithms for problems arising in robust statistics and machine learning.
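A minimal sketch of this family of algorithms, with the potential psi(x) = (1/q)*||x||_q^q as a stand-in for the uniformly convex mirror map (the specific map, step size, and averaging used in the paper may differ):

    import numpy as np

    def stochastic_mirror_descent(grad_oracle, x0, steps, eta, q=1.5):
        # Mirror map psi(x) = (1/q) * ||x||_q^q, q in (1, 2]; no clipping/normalization.
        nabla_psi = lambda x: np.sign(x) * np.abs(x) ** (q - 1)
        nabla_psi_inv = lambda y: np.sign(y) * np.abs(y) ** (1.0 / (q - 1))
        x = np.array(x0, dtype=float)
        avg = np.zeros_like(x)
        for t in range(1, steps + 1):
            g = grad_oracle(x)                              # unbiased, possibly heavy-tailed
            x = nabla_psi_inv(nabla_psi(x) - eta * g)       # mirror step
            avg += (x - avg) / t                            # running average of iterates
        return avg

    # Toy problem: minimize 0.5 * ||x - 1||^2 with infinite-variance gradient noise
    rng = np.random.default_rng(0)
    noise = lambda: rng.pareto(1.5, size=10) - rng.pareto(1.5, size=10)   # mean 0, infinite variance
    oracle = lambda x: (x - np.ones(10)) + noise()
    print(stochastic_mirror_descent(oracle, np.zeros(10), steps=20000, eta=0.01))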
November 2022: Short course by Lukas Steinberger (University of Vienna)
Monday 7 --- 9:00am-12:15pm, room 2045
Thursday 10 --- 10:00am-12:15pm, room 2045
Monday 14 --- 9:00am-12:15pm, room 2045
Thursday 17 --- 10:00am-12:15pm, room 2033
Statistics of Differential Privacy
With the advent of big data and powerful machine learning methods, traditional approaches to data privacy protection, such as aggregation and anonymization, have reached their limits. At the turn of the century, a new notion of data privacy protection called 'differential privacy' (DP) emerged that is inherently statistical and admits a rigorous mathematical definition and analysis. Despite its increasing popularity in recent years, statisticians are only beginning to explore its full potential and its limitations for statistical inference. In this course we give an introduction to the rapidly growing field of differential privacy from the perspective of mathematical statistics and formally study trade-offs between privacy protection and data utility for several parametric and nonparametric estimation problems. After clarifying basic concepts and looking at first examples of differentially private data release mechanisms, we will mainly focus on the mathematical techniques required to establish statistical optimality properties. We will also encounter several open problems and possible directions for future research.
In particular, we will cover the following topics: General introduction and properties of DP data release mechanisms; Basic principles of mechanism design; Central vs. local paradigm of DP; Strong data processing inequalities; Theory of minimaxity with local DP; Towards a theory of efficiency with local DP (time permitting). The prerequisites for this course are real analysis and probability theory. Familiarity with basic concepts of mathematical statistics will be very helpful but not strictly necessary.
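To illustrate the kind of data release mechanism discussed above, the classical Laplace mechanism releases a real-valued statistic under ε-DP by adding noise scaled to the statistic's sensitivity (a standard textbook example, not necessarily one used in the course):

    import numpy as np

    def laplace_mechanism(value, sensitivity, epsilon, rng=None):
        # Adding Laplace(sensitivity / epsilon) noise yields epsilon-differential
        # privacy for a statistic whose value changes by at most `sensitivity`
        # when one individual's record changes.
        rng = rng or np.random.default_rng()
        return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    # Example: privately release the mean of n values known to lie in [0, 1];
    # changing one record moves the mean by at most 1/n.
    data = np.random.default_rng(0).uniform(size=1000)
    print(laplace_mechanism(data.mean(), sensitivity=1.0 / len(data), epsilon=0.5))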
Monday, November 21, 2:00pm (in person)
Fanny Yang (ETH Zurich)
How the strength of the inductive bias affects the generalization performance of interpolators
Interpolating models have recently gained popularity in the statistical learning community due to common practices in modern machine learning: complex models achieve good generalization performance despite interpolating high-dimensional training data. In this talk, we prove generalization bounds for high-dimensional linear models that interpolate noisy data generated by a sparse ground truth. In particular, we first show that minimum-l1-norm interpolators achieve high-dimensional asymptotic consistency at a logarithmic rate. Further, as opposed to the regularized or noiseless case, for min-lp-norm interpolators with 1<p<2 we surprisingly obtain polynomial rates. Our results suggest a new trade-off for interpolating models: a stronger inductive bias encourages a simpler structure better aligned with the ground truth at the cost of an increased variance. We finally discuss our latest results, where we show that this phenomenon also holds for nonlinear models.
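For concreteness, a minimum-l1-norm interpolator of the kind analyzed in the talk can be computed via the standard basis-pursuit linear program; the sketch below (illustrative parameters, not the paper's experiments) fits noisy high-dimensional data exactly while staying close to a sparse ground truth:

    import numpy as np
    from scipy.optimize import linprog

    def min_l1_interpolator(X, y):
        # min ||b||_1 subject to X b = y, written with b = u - v and u, v >= 0.
        n, d = X.shape
        res = linprog(c=np.ones(2 * d), A_eq=np.hstack([X, -X]), b_eq=y,
                      bounds=[(0, None)] * (2 * d), method="highs")
        return res.x[:d] - res.x[d:]

    rng = np.random.default_rng(0)
    n, d = 50, 500
    X = rng.standard_normal((n, d))
    beta = np.zeros(d); beta[:3] = 1.0                       # sparse ground truth
    y = X @ beta + 0.1 * rng.standard_normal(n)              # noisy observations
    b_hat = min_l1_interpolator(X, y)
    print(np.max(np.abs(X @ b_hat - y)), np.linalg.norm(b_hat - beta))   # ~0 fit error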
Monday, November 28, 2:00pm (in person)
Angelika Rohde (University of Freiburg)
Sharp adaptive similarity testing with pathwise stability for ergodic diffusions
Within the nonparametric diffusion model, we develop a multiple test to infer about similarity of an unknown drift $b$ to some reference drift $b_0$: At prescribed significance, we simultaneously identify those regions where violation from similarity occurs, without a priori knowledge of their number, size and location. This test is shown to be minimax-optimal and adaptive. At the same time, the procedure is robust under small deviation from Brownian motion as the driving noise process. A detailed investigation for fractional driving noise, which is neither a semimartingale nor a Markov process, is provided for Hurst indices close to the Brownian motion case.
Monday, January 16, 2:00pm
Marylou Gabrié (École polytechnique)
Opportunities and Challenges in Enhancing Sampling with Learning
Deep generative models parametrize very flexible families of distributions able to fit complicated datasets of images or text. In effect, these models provide independent samples from complex high-dimensional distributions at negligible cost. On the other hand, sampling exactly from a target distribution, such as a Bayesian posterior, is typically challenging: because of dimensionality, multi-modality, ill-conditioning, or a combination of these. In this talk, I will review recent works trying to enhance traditional inference and sampling algorithms with learning. In particular, I will present flowMC, an adaptive MCMC sampler augmented with normalizing flows, along with first applications and remaining challenges.
Monday, January 23, 2:00pm
Lénaïc Chizat (EPFL)
Grid-free, Debiased, Stable Entropic Wasserstein Barycenters
The notion of "Wasserstein barycenter" is a natural way to define the average of a family of probability measures, but it suffers from high computational and statistical complexity, particularly in high dimension. In this talk, we introduce a formulation of entropy-regularized Wasserstein barycenters that enjoys favorable optimization, approximation, and statistical properties. This barycenter is defined as the unique probability measure that minimizes the sum of entropic optimal transport costs with respect to a family of given probability measures, plus an entropy term. We show that: (i) this notion of barycenter lends itself naturally to a grid-free optimization algorithm which, in the mean-field limit, converges globally at an exponential rate; (ii) it is debiased, in the sense that it is a better approximation of the unregularized Wasserstein barycenter than the naive entropic Wasserstein barycenter; and (iii) it can be estimated at the parametric rate: given n samples from each of the probability measures, the barycenter converges to the population barycenter at a rate n^{-1/2}, as measured in relative entropy.
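In symbols, the object described above can be written as follows (the weights and the precise form of the entropy term are notational choices here, not necessarily those of the talk):

    \[
      \mu^\star \;=\; \operatorname*{arg\,min}_{\mu \in \mathcal{P}(\mathbb{R}^d)} \;\sum_{k=1}^{K} \lambda_k \, \mathrm{OT}_\varepsilon(\mu, \nu_k) \;+\; \lambda\, \mathrm{H}(\mu),
    \]

where $\nu_1, \dots, \nu_K$ are the given probability measures, $\mathrm{OT}_\varepsilon$ is the entropic optimal transport cost, and $\mathrm{H}$ is the additional entropy term whose presence yields the properties listed in (i)-(iii).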
Monday, January 30, 2:00pm
Eftychia Solea (ENSAI)
High-dimensional Nonparametric Functional Graphical Models via the Functional Additive Partial Correlation Operator
This article develops a novel approach for estimating a high-dimensional and nonparametric graphical model for functional data. Our approach is built on a new linear operator, the functional additive partial correlation operator, which extends the partial correlation matrix to both the nonparametric and functional settings. We show that its nonzero elements can be used to characterize the graph, and we employ sparse regression techniques for graph estimation. Moreover, the method does not rely on any distributional assumptions and does not require the computation of multi-dimensional kernels, thus avoiding the curse of dimensionality. We establish both estimation consistency and graph selection consistency of the proposed estimator, while allowing the number of nodes to grow with the increasing sample size. Through simulation studies, we demonstrate that our method performs better than existing methods in cases where the Gaussian or Gaussian copula assumption does not hold. We also demonstrate the performance of the proposed method by a study of an electroencephalography data set to construct a brain network.
Monday, February 6, 2:00pm
Francesca Crucinio (ENSAE)
A divide and conquer sequential Monte Carlo approach to high dimensional filtering
We propose a divide-and-conquer approach to filtering which decomposes the state variable into low-dimensional components to which standard particle filtering tools can be successfully applied and recursively merges them to recover the full filtering distribution. It is less dependent upon factorization of transition densities and observation likelihoods than competing approaches and can be applied to a broader class of models. Performance is compared with state-of-the-art methods on a benchmark problem and it is demonstrated that the proposed method is broadly comparable in settings in which those methods are applicable, and that it can be applied in settings in which they cannot.
Monday, March 6, 2:00pm
Austin Stromme (MIT)
Statistical optimal transport: barycenters and entropic regularization
Optimal transport (OT) is a popular paradigm for comparing and interpolating datasets using a minimum energy criterion, and gives rise to a Riemannian-like geometry on probability distributions. In this talk, we'll study statistical and connected algorithmic questions relating to some of the most promising facets of OT: barycenters, OT on Gaussians, entropic regularization, and transfer learning. Throughout, we'll emphasize dimension-free rates of estimation and practical algorithms, as well as the geometry and optimization-centric viewpoint that yields such results.
Monday, March 20, 2:00pm
Guillaume Maillard (Université du Luxembourg)
Robust density estimation in total variation under a shape constraint
We solve the problem of estimating the distribution of presumed i.i.d. observations for the total variation loss, under the assumption that the underlying density satisfies a shape constraint of the following type: monotonicity on an interval, unimodality, convexity/concavity on an interval, or log-concavity. Our estimator is well-defined even in cases where the MLE isn't, such as for unimodal densities with unknown mode. Moreover, it possesses similar optimality properties, with regard to some global rates of convergence, as the MLE does when it exists. Like the MLE, it also enjoys some adaptation properties with respect to some specific target densities in the model, for which our estimator is proven to converge at a parametric rate. Unlike the MLE in general, our estimator is robust, not only to model mis-specification, but also to contamination, to the presence of outliers in the dataset, and to violations of the equidistribution assumption. Our main result on the risk of the estimator takes the form of an exponential deviation inequality which is non-asymptotic and involves explicit numerical constants. We deduce from it several global rates of convergence, including some bounds for the minimax L1 risks over sets of convex or monotone densities. These bounds derive from specific results on the approximation of densities which are monotone, convex, concave or log-concave. Such results may be of independent interest. The presentation is based on joint work with Yannick Baraud and Hélène Halconruy.
Wednesday, March 22, 4:00pm
Simon Mauras (Tel Aviv University)
Truthful Matching with Online Items and Offline Agents
We study truthful mechanisms for welfare maximization in online bipartite matching. In our setting, every buyer is associated with a (possibly private) desired set of items, and has a private value for being assigned an item in her desired set. Unlike most online matching settings, where agents arrive online, in our setting the items arrive online in an adversarial order while the buyers are present for the entire duration of the process. This poses a significant challenge to the design of truthful mechanisms, due to the ability of buyers to strategize over future rounds. We provide an almost full picture of the competitive ratios in different scenarios, including myopic vs. non-myopic agents, tardy vs. prompt payments, and private vs. public desired sets. Among other results, we identify the frontier for which the celebrated e/(e−1) competitive ratio for the vertex-weighted online matching of Karp, Vazirani and Vazirani extends to truthful agents and online items. Preprint available at https://arxiv.org/abs/2211.02004
Monday, March 27, 2:00pm
Radu-Alexandru Dragomir (EPFL)
Optimization methods for quartic problems
Many tasks in signal processing and learning require minimizing quartic polynomials, such as matrix factorization, phase retrieval and sensor network localization. In this talk, I will give an overview of different gradient-based methods for this problem, including mirror descent and a new homogenized gradient scheme. Additionally, I will show that the performance can be improved by using a preconditioner based on the Lewis leverage scores.
Thursday, March 30, 5:00pm
Dmitrii Ostrovskii (University of Southern California)
Fast and optimal algorithm for online portfolio, and beyond
In his seminal 1991 paper, Thomas M. Cover introduced a simple and elegant mathematical model for stock trading, which later came to be known as online portfolio selection (OPS). This model is specified with only two integer parameters: the number of assets d and the time horizon T. In each round, the trader selects a portfolio, i.e., a distribution of the current capital over the set of d assets; then, the adversary generates a vector of returns (i.e., relative prices of the assets), and the trader’s capital is multiplied by the “aggregated return”. Despite its simplicity, this model captures two key properties of the stock market: (i) the market “plays against” the trader; (ii) money accumulates multiplicatively. In the 30 years that followed, the OPS model has received a great deal of attention from the learning theory, information theory, and quantitative finance communities.
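A minimal sketch of the bookkeeping in this model (this illustrates the setting only, not Cover's algorithm): the trader's log-wealth is the sum of the logarithms of the per-round aggregated returns, and regret is measured against the best fixed portfolio in hindsight.

    import numpy as np

    def log_wealth(portfolios, returns):
        # Round t: capital is multiplied by <x_t, r_t>, where x_t lies in the
        # simplex over d assets and r_t > 0 is the vector of relative prices.
        return float(np.sum(np.log(np.einsum("td,td->t", portfolios, returns))))

    # Toy instance: d = 2 assets, T = 3 rounds, uniform rebalancing each round
    returns = np.array([[1.1, 0.9], [0.8, 1.2], [1.0, 1.05]])
    uniform = np.full_like(returns, 0.5)
    print(log_wealth(uniform, returns))
    # Regret = best fixed portfolio's log-wealth minus the trader's log-wealth;
    # Cover's Universal Portfolios keeps this at O(d log T).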
In the same paper, Cover also proposed an algorithm, termed Universal Portfolios, that admitted a strong performance guarantee: a regret of O(d log(T)) against the best portfolio in hindsight, without any restrictions on returns or portfolios. This guarantee was later shown to be worst-case optimal, and no other algorithm attaining it has been found to date. Unfortunately, exact computation of a universal portfolio amounts to averaging over a log-concave distribution, which is a challenging task. Addressing this, Kalai and Vempala (2002) achieved a running time of O(d^4 T^14) per round via sampling techniques. However, such a running time essentially prohibits problems of nontrivial size, yet it has remained the state of the art, so the problem of finding an optimal and practical OPS algorithm was left open.
In this talk, after discussing some of the arising technical challenges, I shall present a fast and optimal OPS algorithm that combines regret optimality with a runtime of O(d^2 T), thus dramatically improving on the state of the art. As we shall see, its motivation and analysis are closely related to establishing a sharp bound on the accuracy of the Laplace approximation for a log-concave distribution with polyhedral support, which is a result of independent interest. Finally, I will briefly discuss possible extensions of these ideas beyond the OPS context, in particular to quantum state tomography.
Monday, April 3, 2:00pm
Anru Zhang (Duke University)
Tensor Learning in 2020s: Methodology, Theory, and Applications
The analysis of tensor data, i.e., arrays with multiple directions, has become an active research topic in the era of big data. Datasets in the form of tensors arise from a wide range of scientific applications. Tensor methods also provide unique perspectives to many high-dimensional problems, where the observations are not necessarily tensors. Problems in high-dimensional tensors generally possess distinct characteristics that pose great challenges to the data science community. In this talk, we discuss several recent advances in tensor learning and their applications in genomics and computational imaging. We also illustrate how we develop statistically optimal methods and computationally efficient algorithms that interact with the modern theories of computation, high-dimensional statistics, and non-convex optimization.
Wednesday, April 5, 9:45am
Claire Boyer (Sorbonne Université)
Is interpolation benign for random forest regression?
Statistical wisdom suggests that very complex models, interpolating training data, will be poor at predicting unseen examples. Yet, this aphorism has been recently challenged by the identification of benign overfitting regimes, especially studied in the case of parametric models: generalization capabilities may be preserved despite high model complexity. While it is widely known that fully-grown decision trees interpolate and, in turn, have bad predictive performance, the same behavior has yet to be analyzed for Random Forests (RF). In this paper, we study the trade-off between interpolation and consistency for several types of RF algorithms. Theoretically, we prove that interpolation regimes and consistency cannot be achieved simultaneously for several non-adaptive RF. Since adaptivity seems to be the cornerstone to bring together interpolation and consistency, we study interpolating Median RF, which are proved to be consistent in the interpolating regime. This is the first result conciliating interpolation and consistency for RF, highlighting that the averaging effect introduced by feature randomization is a key mechanism, sufficient to ensure consistency in the interpolation regime and beyond. Numerical experiments show that Breiman's RF are consistent while exactly interpolating, when no bootstrap step is involved. We theoretically control the size of the interpolation area, which converges fast enough to zero, giving a necessary condition for exact interpolation and consistency to occur in conjunction.
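The observation above that fully-grown trees interpolate their noisy training data is easy to check in practice; a minimal scikit-learn illustration (not the Median RF studied in the talk):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(size=(200, 2))
    y = np.sin(4 * X[:, 0]) + 0.5 * rng.standard_normal(200)    # noisy targets
    tree = DecisionTreeRegressor(random_state=0).fit(X, y)      # no depth limit: grown to pure leaves
    print(np.allclose(tree.predict(X), y))                      # True: the tree interpolates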
Thursday, April 6, 2:15pm
Matus Telgarsky (University of Illinois Urbana-Champaign)
Searching for the implicit bias of deep learning
What makes deep learning special — why is it effective in so many settings where other models fail? This talk will present recent progress from three perspectives. The first result is approximation-theoretic: deep networks can easily represent phenomena that require exponentially-sized shallow networks, decision trees, and other classical models. Secondly, I will show that their statistical generalization ability — namely, their ability to perform well on unseen testing data — is correlated with their prediction margins, a classical notion of confidence. Finally, comprising the majority of the talk, I will discuss the interaction of the preceding two perspectives with optimization: specifically, how standard descent methods are implicitly biased towards models with good generalization. Here I will present two approaches: the strong implicit bias, which studies convergence to specific well-structured objects, and the weak implicit bias, which merely ensures certain good properties eventually hold, but has a more flexible proof technique.
Tuesday, May 9, 10:30am
Steffen Grünewalder (Newcastle University)
Compressed Empirical Measures
I will discuss some of my recent results on compressing the empirical measure in the context of finite dimensional reproducing kernel Hilbert spaces (RKHSs). The aim is to significantly reduce the size of the sample while preserving minimax optimal rates of convergence. Such a reduction in size is of crucial importance when working with kernel methods in the context of large-scale data since kernel methods scale poorly with the sample size. In the RKHS context, an embedding of the empirical measure is contained in a convex set within an RKHS and can be approximated by using convex optimization techniques. Such an approximation gives rise to a small core-set of data points. A key quantity that controls the size of such a core-set is the size of the largest ball that fits within the convex set and which is centred at the embedding of the empirical measure. I will give an overview of how high probability lower bounds on the size of such a ball can be derived before discussing how the approach can be adapted to standard problems such as non-linear regression. (The talk will be based on an extended version of https://arxiv.org/pdf/2204.08847.pdf).
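A rough sketch of the kind of construction alluded to above: Frank-Wolfe over the simplex of weights on the sample points greedily builds a sparse convex combination whose kernel mean embedding is close to that of the empirical measure, and its support acts as a core-set. The kernel, step size, and stopping rule below are illustrative choices, not those of the paper.

    import numpy as np

    def fw_coreset(X, n_iter=50, bandwidth=1.0):
        # Minimize 0.5 * ||mu_w - mu_n||_H^2 over weights w in the simplex,
        # where mu_w = sum_i w_i k(x_i, .) and mu_n is the empirical embedding.
        n = len(X)
        sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        K = np.exp(-sq / (2 * bandwidth ** 2))       # Gaussian Gram matrix
        u = np.full(n, 1.0 / n)                      # empirical (uniform) weights
        w = np.zeros(n); w[0] = 1.0                  # start from a single atom
        for t in range(n_iter):
            grad = K @ (w - u)
            j = int(np.argmin(grad))                 # Frank-Wolfe vertex = one data point
            gamma = 2.0 / (t + 3)
            w = (1 - gamma) * w
            w[j] += gamma
        support = np.flatnonzero(w > 1e-12)
        err2 = float((w - u) @ K @ (w - u))          # squared RKHS distance to mu_n
        return support, w, err2

    X = np.random.default_rng(0).standard_normal((500, 3))
    support, w, err2 = fw_coreset(X)
    print(len(support), err2)                        # small core-set, small embedding error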
Monday, May 22, 2:00pm
Jordan Serres (ENSAE)
Stein's method for stability estimates of the Poincaré constant
The Poincaré inequality governs the exponential convergence rate of algorithms such as Langevin dynamics. Natural questions are then how the Poincaré constant changes when the dynamics is perturbed, and when this constant is minimal under certain constraints. In this talk, I will present some results of this kind in the context of Markov diffusions. Their proof is based in particular on Stein's method for general one-dimensional distributions.
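For reference, the Poincaré inequality for a probability measure $\mu$ and its role for Langevin dynamics read as follows (standard facts recalled for context, not results of the talk):

    \[
      \operatorname{Var}_\mu(f) \;\le\; C_P(\mu) \int |\nabla f|^2 \, d\mu \qquad \text{for all smooth } f,
    \]

and for the Langevin diffusion with invariant measure $\mu$ one has $\operatorname{Var}_\mu(P_t f) \le e^{-2t/C_P(\mu)}\, \operatorname{Var}_\mu(f)$, so a smaller Poincaré constant means faster exponential convergence.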
Monday, June 5, 2:00pm
Zijian Guo (Rutgers University)
Doubly Debiased Lasso: High-Dimensional Inference under Hidden Confounding
Inferring causal relationships or related associations from observational data can be invalidated by the existence of hidden confounding. We focus on a high-dimensional linear regression setting where the measured covariates are affected by hidden confounding, and we propose the Doubly Debiased Lasso estimator for individual components of the regression coefficient vector. The proposed method simultaneously corrects both the bias due to the estimation of high-dimensional parameters and the bias caused by the hidden confounding. We establish its asymptotic normality and also prove that it is efficient in the Gauss-Markov sense. The validity of our methodology relies on a dense confounding assumption, i.e., that every confounding variable affects many covariates. The finite-sample performance is illustrated with an extensive simulation study and a genomic application.
This is based on joint work with Domagoj Ćevid and Peter Bühlmann.
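Schematically, the hidden confounding setting can be written as a pair of linear models (the notation is chosen here for illustration):

    \[
      Y = X\beta + H\delta + e, \qquad X = H\Psi + E,
    \]

where the columns of $H$ are unobserved confounders affecting both the covariates and the response; the dense confounding assumption mentioned above means that each confounder loads on many covariates, i.e., the rows of $\Psi$ have many non-negligible entries.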
Monday, June 12, 2:00pm
Nikita Zhivotovskiy (UC Berkeley)
Optimal PAC Bounds without Uniform Convergence
In statistical learning theory, the problem of determining sample complexity of realizable binary classification for VC classes was a longstanding challenge. Notable advancements by Simon and Hanneke established sharp upper bounds, but their argument's reliance on the uniform convergence principle curtailed its broader applicability to learning settings like multiclass classification. In this presentation, we will discuss a new technique to resolve this limitation and introduce optimal high probability risk bounds within a framework that surpasses uniform convergence constraints. Beyond binary classification, we will also delve into applications in scenarios where uniform convergence is notably sub-optimal. For multiclass classification, we will prove an optimal risk bound that scales with the one-inclusion hypergraph density of the class, effectively addressing the sub-optimality in the analysis by Daniely and Shalev-Shwartz. Additionally, for realizable bounded regression with absolute loss, we will derive an optimal risk bound based on a revised version of the scale-sensitive dimension, thus refining the results of Bartlett and Long. This talk is based on the joint work with Ishaq Aden-Ali, Yeshwanth Cherapanamjeri, and Abhishek Shetty.