8 April 2024
Speaker: Maximilian Wiesmann (Max Planck Institute for Mathematics in the Sciences)
Title: Adversarial Regularization Regimes in Classification Tasks
Abstract: In this talk we demonstrate the possibility of a trend reversal in binary classification tasks between the dataset and a classification score obtained from a trained model. This trend reversal occurs for certain choices of the regularization parameter for model training, namely, if the parameter is contained in what we call the adversarial regularization regime. For ridge regression, we give necessary and sufficient algebraic conditions on the dataset for the existence of an adversarial regularization regime. Moreover, our results provide a data science practitioner with a hands-on tool to avoid hyperparameter choices suffering from trend reversal. We furthermore present numerical results on adversarial regularization regimes for logistic regression. Finally, we draw connections to datasets exhibiting Simpson’s paradox, providing a natural source of adversarial datasets.
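A numerical sketch of the phenomenon on a toy Simpson's-paradox-style dataset (the data, the parameter grid and all names below are illustrative and not from the talk): within each group the label trends upward in the feature x, the aggregate trend is downward, and the sign of the ridge coefficient on x flips once the regularization parameter is large enough.

# Toy illustration: scan ridge regularization strengths and flag those for which
# the sign of the fitted coefficient on x contradicts the within-group (partial) trend.
import numpy as np

rng = np.random.default_rng(0)
n = 200
g = np.repeat([0.0, 1.0], n // 2)                    # group indicator
x = rng.normal(4.0 * g, 1.0)                         # group 1 has larger x on average
y = ((x - 4.0 * g) - 1.5 * g + 0.5 * rng.normal(size=n) > 0).astype(float)

Z = np.column_stack([x, g])
Zc = Z - Z.mean(axis=0)                              # center features and labels
yc = y - y.mean()
within_trend = np.sign(np.linalg.lstsq(Zc, yc, rcond=None)[0][0])   # OLS partial trend in x
for lam in [0.01, 1.0, 100.0, 1e3, 1e4, 1e5]:
    w = np.linalg.solve(Zc.T @ Zc + lam * np.eye(2), Zc.T @ yc)     # ridge solution
    flag = "  <- trend reversal" if np.sign(w[0]) != within_trend else ""
    print(f"lambda={lam:9.2f}   coefficient on x = {w[0]:+.4f}{flag}")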
15 January 2024
Speaker: Mona Azadkia (The London School of Economics and Political Science)
Title: A Simple Measure of Conditional Dependence
Abstract: In this talk, we introduce a coefficient of conditional dependence between two random variables Y and Z given a set of other variables X_1,..., X_p, based on an i.i.d. sample. The coefficient has a long list of desirable properties, the most important of which is that under absolutely no distributional assumptions, it converges to a limit in [0,1], where the limit is 0 if and only if Y and Z are conditionally independent given X_1,..., X_p, and is 1 if and only if Y is equal to a measurable function of Z given X_1,..., X_p. Moreover, it has a natural interpretation as a nonlinear generalization of the familiar partial R^2 statistic for measuring conditional dependence by regression. Using this statistic, we devise a new variable selection algorithm called Feature Ordering by Conditional Independence (FOCI), which is model-free, has no tuning parameters, and is provably consistent under sparsity assumptions. A number of applications to synthetic and real datasets are worked out.
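A rough sketch of a nearest-neighbour, rank-based estimate of the coefficient described above, simplified to continuous data without ties (the estimator of the talk, including its tie handling, is as in the underlying paper):

# Sketch of a rank/nearest-neighbour estimate of the conditional dependence of Y on Z given X.
# Assumptions of this sketch: continuous data (no ties), Euclidean nearest neighbours.
import numpy as np

def nn_index(A):
    # index of the nearest neighbour of each row of A, excluding the row itself
    d = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d.argmin(axis=1)

def conditional_dependence(Y, Z, X):
    R = np.argsort(np.argsort(Y)) + 1              # ranks of Y, assuming no ties
    N = nn_index(X)                                # nearest neighbour in X only
    M = nn_index(np.column_stack([X, Z]))          # nearest neighbour in (X, Z)
    num = np.sum(np.minimum(R, R[M]) - np.minimum(R, R[N]))
    den = np.sum(R - np.minimum(R, R[N]))
    return num / den

# Toy example: Y depends on Z beyond X, so the estimate should be clearly positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
Z = rng.normal(size=(500, 1))
Y = X[:, 0] + Z[:, 0] ** 2 + 0.1 * rng.normal(size=500)
print(conditional_dependence(Y, Z, X))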
20 November 2023
Speaker: Junhyung Park (Max Planck Institute for Intelligent Systems)
Title: A Measure-Theoretic Axiomatisation of Causality
Abstract: Causality is a central concept in a wide range of research areas, yet there is still no universally agreed axiomatisation of causality. We view causality both as an extension of probability theory and as a study of what happens when one intervenes on a system, and argue in favour of taking Kolmogorov's measure-theoretic axiomatisation of probability as the starting point towards an axiomatisation of causality. To that end, we propose the notion of a causal space, consisting of a probability space along with a collection of transition probability kernels, called causal kernels, that encode the causal information of the space. Our proposed framework is not only rigorously grounded in measure theory, but it also sheds light on long-standing limitations of existing frameworks including, for example, cycles, latent variables and stochastic processes.
6 November 2023
Speaker: Christof Schötz (Potsdam Institute for Climate Impact Research)
Title: Learning Ordinary Differential Equations from Noisy Data
Abstract: We consider the task of learning the right-hand side function f of an ordinary differential equation u'=f(u) from observed (potentially noisy) data. Only nonrestrictive smoothness assumptions are imposed on f, so that any successful estimation procedure must be nonparametric. This problem is approached from two sides: statistical theory and a simulation study. In the theoretical part, for different setups, we present estimators that achieve the optimal minimax convergence rate; that is, we show matching upper and lower bounds on the mean squared error for estimating f. In the simulation study, different approaches are applied to different (low-dimensional) dynamical systems, including chaotic and non-chaotic ones, and their performances are compared.
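One simple baseline for this task, shown purely for illustration and not necessarily among the estimators analysed in the talk: difference the noisy trajectory to obtain crude derivative estimates and regress them nonparametrically on the observed states.

# Baseline for estimating f in u' = f(u) from a noisy trajectory:
# numerical derivatives plus a Nadaraya-Watson regression in state space.
import numpy as np

def estimate_rhs(t, u_obs, bandwidth=0.2):
    du = np.gradient(u_obs, t)                               # crude derivative estimates
    def f_hat(u):
        w = np.exp(-0.5 * ((u - u_obs) / bandwidth) ** 2)    # Gaussian weights in u
        return np.sum(w * du) / np.sum(w)
    return np.vectorize(f_hat)

# Toy example: logistic growth u' = u(1 - u), observed with additive noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 400)
u_true = 1 / (1 + 9 * np.exp(-t))                            # solution with u(0) = 0.1
u_obs = u_true + 0.01 * rng.normal(size=t.size)
f_hat = estimate_rhs(t, u_obs)
print(f_hat(0.5), "vs true value", 0.5 * (1 - 0.5))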
16 October 2023
Speaker: Ernesto De Vito (University of Genova)
Title: Understanding Neural Networks with Reproducing Kernel Banach Spaces
Abstract: Characterizing the function spaces corresponding to neural networks can provide a way to understand their properties. The talk is devoted to showing how the theory of reproducing kernel Banach spaces can be used for this characterization. In particular, I will show a representer theorem for a class of reproducing kernel Banach spaces, which includes one hidden layer neural networks of possibly infinite width. Furthermore, I will prove that, for a suitable class of ReLU activation functions, the norm in the corresponding reproducing kernel Banach space can be characterized in terms of the inverse Radon transform of a bounded real measure.
The talk is based on joint work with F. Bartolucci, L. Rosasco and S. Vigogna.
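For orientation, a standard way to write the objects in question (the precise spaces and assumptions are in the paper): an infinite-width one-hidden-layer network is $f_\mu(x) = \int \sigma(\langle w, x\rangle + b)\, d\mu(w,b)$ for a signed measure $\mu$ over the hidden-layer parameters, and the associated reproducing kernel Banach space norm is $\|f\| = \inf\{\|\mu\|_{TV} : f = f_\mu\}$, the least total variation over all measures representing $f$.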
11 August 2023
Speaker: Galen Reeves (Duke University)
Title: Inference from heterogeneous pairwise data
Abstract: High-dimensional inference problems involving heterogeneous pairwise observations arise in a variety of applications, including covariance estimation, clustering, and community detection. In this talk I will present a unified approach for the analysis of these problems that yields exact formulas for both the fundamental and algorithmic limits. The high-level idea is to model the observations using a linear Gaussian channel whose input is the tensor product of the latent variables. The limits of this general model are then described by a finite-dimensional variational formula, which provides a decoupling between the prior information about the latent variables (usually a product measure) and the specific structure of the observations. I will also discuss some recent results on computationally efficient methods based on approximate message passing.
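As a concrete instance of the general model (notation mine; the talk treats a broader family of heterogeneous observations): in the symmetric rank-one case one observes $Y = \sqrt{\lambda/n}\, x x^{\top} + W$, where the vector $x$ collects the latent variables, $W$ is symmetric Gaussian noise, and heterogeneity corresponds to different signal-to-noise levels on different blocks of entries.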
10 July 2023
Speaker: Michael Leung (UC Santa Cruz)
Title: Unconfoundedness with Network Interference
Abstract: This paper studies nonparametric estimation of treatment and spillover effects using observational data from a single large network. We consider a model of network interference that allows for peer influence in selection into treatment or outcomes but requires influence to decay with network distance. In this setting, the network and covariates of all units can be potential sources of confounding, in contrast to existing work that assumes confounding is limited to a known, low-dimensional function of these objects. To estimate the first-stage nuisance functions of the doubly robust estimator, we propose to use graph neural networks, which are designed to approximate functions of graph-structured inputs. Under our model of interference, we derive primitive conditions for a network analog of approximate sparsity, which provides justification for the use of shallow architectures.
This is joint work with Pantelis Loupos.
29 June 2023
Speaker: Björn Andres (TU Dresden)
Title: Two generalizations of a multicut problem
Abstract: The NP-hard minimum cost multicut problem is a fundamental mathematical abstraction of the task of clustering in the absence of prior knowledge on the number or size of clusters. This talk is about two generalizations of the problem, two techniques for addressing hardness and some applications. First are cubic correlation clustering and partial optimality. Cubic correlation clustering is a problem whose feasible solutions are the partitions of a finite set and whose objective function attributes costs to triples of its elements. Partial optimality is a technique for solving parts of the problem optimally and efficiently, despite its NP-hardness. Second are the lifted multicut problem and its polyhedral geometry. The lifted multicut problem is a problem whose feasible solutions are the decompositions of a graph and whose objective function is defined with respect to the transitive closure of the graph. We discuss some properties of lifted multicut polytopes and one newly found property of the comprehensively studied clique partitioning polytope.
Stein, Di Gregorio and A. Partial Optimality in Cubic Correlation Clustering. International Conference on Machine Learning (ICML) 2023 (to appear). https://arxiv.org/pdf/2302.04694
A., Di Gregorio, Irmai and Lange. A Polyhedral Study of Lifted Multicuts. Discrete Optimization 47:100757, 2023. https://doi.org/10.1016/j.disopt.2022.100757
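For reference, the minimum cost multicut problem of the abstract can be written as the integer program $\min_{x \in \{0,1\}^E} \sum_{e \in E} c_e x_e$ subject to $x_e \le \sum_{e' \in C \setminus \{e\}} x_{e'}$ for every cycle $C$ of the graph $G = (V, E)$ and every edge $e \in C$; the feasible vectors are exactly the indicators of edge sets cut by a decomposition of $G$.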
15 May 2023
Speaker: Shuangning Li (Harvard)
Title: Random Graph Asymptotics for Treatment Effect Estimation under Network Interference
Abstract: The network interference model for causal inference places all experimental units at the vertices of an undirected exposure graph, such that treatment assigned to one unit may affect the outcome of another unit if and only if these two units are connected by an edge. This model has recently gained popularity as a means of incorporating interference effects into the Neyman–Rubin potential outcomes framework, and several authors have considered estimation of various causal targets, including the direct and indirect effects of treatment. In this paper, we consider large-sample asymptotics for treatment effect estimation under network interference in a setting where the exposure graph is a random draw from a graphon. When targeting the direct effect, we show that, in our setting, popular estimators are considerably more accurate than existing results suggest, and provide a central limit theorem in terms of moments of the graphon. Meanwhile, when targeting the indirect effect, we leverage our generative assumptions to propose a consistent estimator in a setting where no other consistent estimators are currently available. We also show how our results can be used to conduct a practical assessment of the sensitivity of randomized study inference to potential interference effects. Overall, our results highlight the promise of random graph asymptotics in understanding the practicality and limits of causal inference under network interference.
This is joint work with Stefan Wager.
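For context, "a random draw from a graphon" means, in standard notation, that the edges of the exposure graph are drawn independently as $A_{ij} \sim \mathrm{Bernoulli}(W(U_i, U_j))$ for $i < j$, with $U_1, \dots, U_n$ i.i.d. uniform on $[0,1]$ and $W : [0,1]^2 \to [0,1]$ symmetric (possibly with an additional sparsity scaling); the central limit theorem mentioned above is expressed through moments of $W$.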
24 April 2023
Speaker: Nicole Mücke (TU Braunschweig)
Title: How much Overparameterization is sufficient to learn Neural Networks?
Abstract: Bounding the amount of overparameterization and the sample complexity sufficient to learn neural networks has recently attracted considerable interest. In this talk, I will review existing results and present refined bounds on the test error for learning shallow neural networks with gradient descent in the lazy training regime. I show that the eigenvalue decay of the neural tangent kernel and the smoothness of the objective are crucial for a better understanding of how many neurons are sufficient for optimal generalization bounds.
This is joint work with Mike Nguyen (TU Braunschweig).
17 April 2023
Speaker: Yunan Yang (ETH Zürich)
Title: Optimal Transport for Inverse Problems and the Implicit Regularization
Abstract: Optimal transport has been an interesting topic of mathematical analysis since Monge (1781). The problem's close connections with differential geometry and kinetic descriptions were discovered within the past century, and the seminal work of Kantorovich (1942) showed its power to solve real-world problems. Recently, we proposed the quadratic Wasserstein distance from optimal transport theory for inverse problems, tackling the classical least-squares method's longstanding difficulties such as nonconvexity and noise sensitivity. The work was soon adopted in the oil industry. Going further, we have found that the advantage of changing the data misfit extends to a broader class of data-fitting problems, by examining the preconditioning and "implicit" regularization effects of different mathematical metrics when they serve as the objective function in optimization, as the likelihood function in Bayesian inference, and as the metric space for numerical solutions to PDEs.
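For concreteness (standard definitions): the quadratic Wasserstein distance between probability densities $f$ and $g$ on $\mathbb{R}^d$ is $W_2^2(f,g) = \inf_{T_{\#} f = g} \int_{\mathbb{R}^d} |x - T(x)|^2 f(x)\, dx$, and in the inverse-problem setting it replaces the least-squares misfit $\|f - g\|_{L^2}^2$ between simulated and observed data, after a normalization that turns the signals into probability densities.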
30 January 2023
Speaker: Max von Renesse (Leipzig University)
Title: The Sinkhorn Algorithm for Unbalanced Optimal Transport
Abstract: We present a Schrödinger problem for unbalanced optimal transport and an iterative scaling algorithm for the associated relaxed optimal transport problem, which allows for mass creation and annihilation. Based on joint work with Bernd Sturmfels, Simon Telen, and Francois-Xavier Vialard.
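A minimal sketch of the kind of iterative scaling the abstract refers to, in the entropic unbalanced setting where the marginal constraints are relaxed to Kullback-Leibler penalties (the parameters, notation and toy data below are illustrative; the Schrödinger-problem formulation of the talk is more general):

# Sinkhorn-type scaling for entropic unbalanced optimal transport: the KL-relaxed
# marginals allow total mass to be created or destroyed.
import numpy as np

def unbalanced_scaling(a, b, C, eps=0.05, rho=1.0, iters=500):
    # a, b: nonnegative weights (possibly of different total mass); C: cost matrix;
    # eps: entropic regularization; rho: strength of the KL marginal penalty.
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    p = rho / (rho + eps)                          # exponent from the KL relaxation
    for _ in range(iters):
        u = (a / (K @ v)) ** p
        v = (b / (K.T @ u)) ** p
    return u[:, None] * K * v[None, :]             # transport plan; marginals only approximate a, b

# Toy example: two discrete measures on the line with unequal total mass.
x = np.linspace(0, 1, 30)
y = np.linspace(0, 1, 40)
a = np.full(30, 1.0 / 30)                          # total mass 1.0
b = np.full(40, 1.5 / 40)                          # total mass 1.5
C = (x[:, None] - y[None, :]) ** 2
P = unbalanced_scaling(a, b, C)
print("mass actually transported:", P.sum())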
16 January 2023
Speaker: Arthur Jacot (NYU)
Title: Implicit Bias of Large Depth Networks: the Bottleneck Rank
Abstract: Several neural network models are known to be biased towards some notion of sparsity: minimizing rank in linear networks or minimizing the effective number of neurons in the hidden layer of a shallow neural network. I will argue that the correct notion of sparsity for large depth DNNs is the so-called Bottleneck (BN) rank of a (piecewise linear) function $f$, which is the smallest integer $k$ such that there is a factorization $f=g\circ h$ with inner dimension $k$.
First, the representation cost of DNNs converges, as the depth goes to infinity, to the BN rank over a large family of functions. Second, for sufficiently large depths, the global minima of the $L_{2}$-regularized loss of DNNs are approximately BN-rank 1, in the sense that there is a hidden layer whose representation of the data is approximately one dimensional. When fitting a true function with BN-rank $k$, the global minimizers recover the true rank if $k=1$. If $k>1$, results suggest that the true rank is recovered for intermediate depths.
BN-rank recovery implies that autoencoders are naturally denoising, and that the class boundaries of a DNN classifier have certain topological properties (such as the absence of tripoints when $k=1$). Both of these phenomena are observed empirically in large-depth networks.
19 December 2022
Speaker: Michael Muehlebach (MPI for Intelligent Systems, Tuebingen)
Title: Learning-friendly constrained optimization
Abstract: My talk will be divided into two parts. The first part introduces a class of first-order methods for constrained optimization, which arises from analogies to non-smooth dynamical systems. The key underlying idea is to express constraints in terms of velocities instead of positions, which has the algorithmic consequence that optimizations over feasible sets at each iteration are replaced with optimizations over local, sparse convex approximations. The result is a simplified suite of algorithms and an expanded range of possible applications in machine learning.
In the second part of the talk, I will present an analysis of stochastic gradient algorithms for finite-sum minimax optimization. This part will highlight that random reshuffling, which shuffles the data at every epoch, leads to faster convergence rates than uniform sampling, where the data points are sampled with replacement. This can be attributed to a variance reduction caused by the without-replacement sampling, and provides a rigorous explanation for an important and widely-used heuristic.
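A minimal illustration of the two sampling schemes compared in the second part, on a plain finite-sum least-squares problem rather than a minimax one (all constants and names below are illustrative):

# Random reshuffling (a fresh permutation each epoch, every point used exactly once)
# versus uniform with-replacement sampling, for SGD on a least-squares finite sum.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]

def run(sampler, epochs=50, lr=0.002):
    x = np.zeros(d)
    for epoch in range(epochs):
        for i in sampler(epoch):
            g = (A[i] @ x - b[i]) * A[i]           # stochastic gradient of 0.5 * (a_i x - b_i)^2
            x -= lr * g
    return np.linalg.norm(x - x_star)

reshuffle = lambda epoch: rng.permutation(n)        # without replacement, per epoch
with_repl = lambda epoch: rng.integers(0, n, n)     # with replacement
print("random reshuffling, distance to minimizer:", run(reshuffle))
print("with replacement,   distance to minimizer:", run(with_repl))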
05 December 2022
Speaker: Jonas Krampe (Cornell University)
Title: Inference for High-Dimensional Time Series
Abstract: In this talk, I will present inference results that can help to analyze the dependence structure of high-dimensional time series. In the first part, I will present inference results for sparse high-dimensional vector autoregressive models and their relation to detecting Granger causality. In the second part, I will focus on the frequency domain of time series. Parameters like the spectral density matrix and its inverse, the coherence or the partial coherence, encode comprehensively the complex linear relations between the component processes of the multivariate system. I present inference procedures for such parameters in a high-dimensional time series setup. Towards this goal, we first focus on the derivation of consistent estimators of the coherence and, more importantly, of the partial coherence which possess manageable limiting distributions that are suitable for testing purposes. Statistical tests of the hypothesis that the maximum over frequencies of the coherence, respectively of the partial coherence, does not exceed a prespecified threshold value are developed. This approach allows for testing hypotheses for individual coherences and/or partial coherences as well as for multiple testing of large sets of such parameters. In the latter case, a consistent procedure to control the false discovery rate is developed.
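For reference (standard definitions, in generic notation): if $f(\omega)$ is the spectral density matrix of the multivariate process, the coherence between components $j$ and $k$ is $R_{jk}(\omega) = f_{jk}(\omega)/\sqrt{f_{jj}(\omega) f_{kk}(\omega)}$, while the partial coherence is formed from the inverse $g(\omega) = f(\omega)^{-1}$ as $\rho_{jk}(\omega) = -g_{jk}(\omega)/\sqrt{g_{jj}(\omega) g_{kk}(\omega)}$; partial coherence vanishing at all frequencies corresponds to the absence of a direct linear relation between components $j$ and $k$ given the remaining components.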
21 November 2022
Speaker: Keith Levin (University of Wisconsin-Madison)
Title: Bootstrapping Network Data: Conditional versus Marginal Distributions
Abstract: In network analysis, one frequently must perform inference based upon only one sampled network. This poses a challenge for bootstrap-based approaches, which typically require an iid sample. A class of network models called latent space models overcome these difficulties by generating a network based on unobserved geometric structure, but this raises the question of whether inference in such models should be conducted by conditioning on this latent structure or by marginalizing over it. We develop bootstrap schemes for both cases, i.e., conditional and marginal bootstrap methods for network data. We establish bootstrap validity for both schemes for a broad class of network statistics, including modularity, which has not previously been addressed within the network bootstrap literature. Our experiments include simulated data as well as a thorough exploration of a data set arising from Microsoft Bing search data.
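A rough sketch of the two flavours under a random dot product graph assumption: estimate latent positions by adjacency spectral embedding, then either redraw edges given the estimated positions (conditional) or resample the positions first (marginal). The construction below is illustrative and not necessarily the scheme analysed in the talk.

# Conditional versus marginal bootstrap for a network with latent (dot product) structure.
import numpy as np

def spectral_embedding(A, d):
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]           # top components by |eigenvalue|
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))   # estimated latent positions

def bootstrap_network(A, rng, d=2, marginal=False):
    X = spectral_embedding(A, d)
    if marginal:                                       # marginal: resample positions as well
        X = X[rng.integers(0, X.shape[0], X.shape[0])]
    P = np.clip(X @ X.T, 0.0, 1.0)                     # edge probabilities given positions
    U = rng.random(P.shape)
    B = (np.triu(U, 1) < np.triu(P, 1)).astype(int)
    return B + B.T                                     # symmetric bootstrap adjacency matrix

# Toy example: two-block stochastic block model; bootstrap the mean degree.
rng = np.random.default_rng(0)
n = 200
z = np.repeat([0, 1], n // 2)
P = np.where(z[:, None] == z[None, :], 0.20, 0.05)
A = (rng.random((n, n)) < P).astype(int)
A = np.triu(A, 1); A = A + A.T
for m in (False, True):
    print("marginal" if m else "conditional", bootstrap_network(A, rng, marginal=m).sum(1).mean())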
7 November 2022
Speaker: Sophie Langer (University of Twente)
Title: On the rate of convergence of deep neural networks
Abstract: Although the application of deep neural networks to real-world problems has become ubiquitous, the question of why they are so effective has not yet been satisfactorily answered. However, some progress has been made in establishing an underlying mathematical foundation. This talk surveys results on statistical risk bounds of deep neural networks. In particular, we focus on the question of when neural networks bypass the curse of dimensionality. Here we discuss results for vanilla feedforward and convolutional neural networks as well as regression and classification settings.
17 October 2022
Speaker: Ziwei Ji (Google Research)
Title: Agnostic Learnability of Halfspaces via Logistic Loss
Abstract: We investigate approximation guarantees provided by logistic regression for the fundamental problem of agnostic learning of homogeneous halfspaces. Previously, for a certain broad class of "well-behaved" distributions, Diakonikolas et al. (2020) proved an $\tilde{\Omega}(\mathrm{OPT})$ lower bound, while Frei et al. (2021) proved an $\tilde{O}(\sqrt{\mathrm{OPT}})$ upper bound, where $\mathrm{OPT}$ denotes the best zero-one/misclassification risk of a homogeneous halfspace. In this paper, we close this gap by constructing a well-behaved distribution on which the global minimizer of the logistic risk only achieves $\Omega(\sqrt{\mathrm{OPT}})$ misclassification risk, matching the upper bound of Frei et al. (2021). On the other hand, we also show that if we impose a radial-Lipschitzness condition in addition to well-behavedness on the distribution, logistic regression on a ball of bounded radius reaches $\tilde{O}(\mathrm{OPT})$ misclassification risk. Our techniques also show that for any well-behaved distribution, regardless of radial Lipschitzness, we can overcome the lower bound for logistic loss simply at the cost of one additional convex optimization step involving the hinge loss, and attain $\tilde{O}(\mathrm{OPT})$ misclassification risk. Finally, we will briefly discuss our results on shallow ReLU networks, where we show that logistic regression can even attain the optimal Bayes risk, and the necessary iteration, sample, and architectural complexities of our analysis all scale naturally with a certain complexity measure of the true conditional model.