Seminar Series on the Mathematics of Data Science

Department of Applied Mathematics 

With the MDS Seminar Series, we would like to launch a lecture series in which researchers from the University of Twente as well as external researchers present their current work in the mathematics of data science. The aim is to become better acquainted with the research of other groups and disciplines. The series offers an opportunity for regular exchange as well as a basis for possible collaborations.

Format

Seminars are held on campus and via Teams. They take place every fortnight on Mondays at 4 p.m. unless otherwise stated (see the program below for dates and rooms).

If you want to receive Outlook invites to the seminars, please send an e-mail to Mariëlle Slotboom.

Upcoming seminars

Mo Jan 30   (Carré 2H) Mengwu Guo

Data-driven model reduction through probabilistic machine learning

Efficient and credible multi-query, real-time simulations constitute a critical enabling factor for digital twinning, and data-driven reduced-order modeling is a natural choice for achieving this goal. This talk will discuss two probabilistic methods for the learning of reduced-order dynamics, in which a significantly reduced dimensionality of dynamical systems guarantees improved efficiency, and the endowed uncertainty quantification certifies computational credibility.

 

The first method is the Bayesian reduced-order operator inference, a non-intrusive approach that inherits the formulation structure of projection-based reduced-state governing equations yet does not require access to the full-order solvers. The reduced-order operators are estimated using Bayesian inference with Gaussian priors, and two fundamentally different strategies for defining the likelihood will be discussed. Recovered as posterior Gaussian distributions conditioned on projected state data, the reduced-order operators probabilistically describe a low-dimensional dynamical system for the predominant latent states, and provide a naturally embedded Tikhonov regularization together with a quantification of modeling uncertainties.

 

The second method employs deep kernel learning — a probabilistic deep learning tool that integrates neural networks into manifold Gaussian processes — for the data-driven discovery of low-dimensional latent dynamics from high-dimensional measurements given by noise-corrupted images. This tool is used both for nonlinear dimensionality reduction and for the representation of the reduced-order dynamics. Numerical results have shown the effectiveness of deep kernel learning for denoising and uncertainty quantification throughout the model reduction process.
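For readers who want to try the core idea, the following minimal sketch (our own illustration on synthetic data, not the speaker's implementation) shows the non-Bayesian counterpart of the first method: project snapshots onto a POD basis and fit a linear reduced-order operator by ridge regression, which coincides with the Bayesian posterior mean under an isotropic Gaussian prior.

# Minimal operator-inference sketch (illustration only): learn dz/dt ≈ A z
# from snapshot data, with Tikhonov/ridge regularization (Gaussian-prior MAP).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 200))        # hypothetical snapshots: time x full state
dt = 1e-2

# POD basis from the SVD of the centred snapshot matrix
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
r = 5
basis = Vt[:r].T                            # (200, r)

Z = X @ basis                               # reduced states, (1000, r)
dZ = np.gradient(Z, dt, axis=0)             # finite-difference time derivatives

lam = 1e-3                                  # Tikhonov parameter (prior precision)
A = np.linalg.solve(Z.T @ Z + lam * np.eye(r), Z.T @ dZ).T   # reduced operator, (r, r)
print("learned reduced operator shape:", A.shape)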


Mo Feb 13   (RA2503)         Nicole Mücke


Learning linear operators: Infinite-dimensional regression as a well-behaved non-compact inverse problem

We consider the problem of learning a linear operator θ between two Hilbert spaces from empirical observations, which we interpret as least squares regression in infinite dimensions. We show that this goal can be reformulated as an inverse problem for θ with the undesirable feature that its forward operator is generally non-compact (even if θ is assumed to be compact or of p-Schatten class). However, we prove that, in terms of spectral properties and regularisation theory, this inverse problem is equivalent to the known compact inverse problem associated with scalar response regression. Our framework allows for the elegant derivation of dimension-free rates for generic learning algorithms under Hölder-type source conditions. The proofs rely on the combination of techniques from kernel regression with recent results on concentration of measure for sub-exponential Hilbertian random variables. The obtained rates hold for a variety of practically-relevant scenarios in functional regression as well as nonlinear regression with operator-valued kernels and match those of classical kernel regression with scalar response. 

Mo Feb 27  (RA2503)    Juntong Chen


Robust estimation of a regression function in exponential families

In this talk, our aim is to give a unified treatment of the problem of estimating a regression function in one-parameter exponential families. Moreover, we want to go beyond the common assumption that the true distribution of the data belongs exactly to the statistical model we consider. Our estimation methodology is based on Rho-estimation. Strategies based on a single model and on model selection will be discussed, where each model is assumed to be a VC-subgraph class. We present non-asymptotic risk bounds for the resulting estimators and explain their robustness with respect to data contamination, the presence of outliers and model misspecification. To remedy the curse of dimensionality, we consider estimation under structural assumptions on the regression functions. We consider specific models in these cases and derive VC-dimension bounds for them. Combining these with existing approximation results, we show that, under a suitable parametrization of the exponential family, the rates of convergence we obtain coincide with those derived in the Gaussian regression setting under the same structural assumptions. At the end of the talk, we present a simulation study comparing the performance of Rho-estimators to the maximum likelihood estimator and to median-based estimators.

Mo March 6 (RA2503) Hanyuan Hang


Bagged k-Distance for Mode-Based Clustering Using the Probability of Localized Level Sets

We propose an ensemble learning algorithm named bagged k-distance for mode-based clustering (BDMBC) by putting forward a new measurement called the probability of localized level sets (PLLS), which enables us to find all clusters for varying densities with a global threshold. On the theoretical side, we show that with a properly chosen number of nearest neighbors k_D in the bagged k-distance, the sub-sample size s, the bagging rounds B, and the number of nearest neighbors k_L for the localized level sets, BDMBC can achieve optimal convergence rates for mode estimation. It turns out that with a relatively small B, the sub-sample size s can be much smaller than the number of training data n at each bagging round, and the number of nearest neighbors k_D can be reduced simultaneously. Moreover, we establish optimal convergence results for the level set estimation of the PLLS in terms of Hausdorff distance, which reveals that BDMBC can find localized level sets for varying densities and thus enjoys local adaptivity. On the practical side, we conduct numerical experiments to empirically verify the effectiveness of BDMBC for mode estimation and level set estimation, which demonstrates the promising accuracy and efficiency of our proposed algorithm.
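As a rough illustration of the bagged k-distance ingredient alone (with hypothetical parameter values; the PLLS construction and the full BDMBC algorithm are not reproduced here), one can average the k_D-th nearest-neighbour distance over B subsamples and read it as an inverse density surrogate:

# Bagged k-distance sketch (illustration only, not the full BDMBC algorithm):
# average the k_D-th nearest-neighbour distance over B subsamples of size s.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(5, 0.3, (100, 2))])  # toy data

k_D, s, B = 3, 200, 20                      # hypothetical parameter choices
bagged = np.zeros(len(X))
for _ in range(B):
    sub = X[rng.choice(len(X), size=s, replace=False)]
    nn = NearestNeighbors(n_neighbors=k_D).fit(sub)
    dist, _ = nn.kneighbors(X)              # distances to the k_D nearest subsample points
    bagged += dist[:, -1] / B               # k_D-th neighbour distance, averaged over rounds

# small bagged k-distance <=> high local density; modes sit at local minima of `bagged`
print("estimated mode:", X[np.argmin(bagged)])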

Mo March 20 (RA2503) Johannes Schmidt-Hieber


Statistical learning in biological neural networks

Compared to artificial neural networks (ANNs), the brain learns faster, generalizes better to new situations and consumes much less energy. ANNs are motivated by the functioning of the brain but differ in several crucial aspects. In particular, it is biologically implausible that the learning of the brain is based on gradient descent. In this talk we look at the brain as a statistical method for supervised learning. The main contribution is to relate the local updating rule of the connection parameters in biological neural networks (BNNs) to a zero-order optimization method. 

The talk is based on arxiv:2301.11777.

Mo April 3   (RA2503)       Martin Holler    




Generator-network-based regularization in imaging – a function space perspective

Classical regularization approaches for inverse problems in imaging are often built on a natural modeling of images in function space. The most famous example here is probably the Total Variation functional and the corresponding space of functions of Bounded Variation. This perspective on images has the advantage that questions such as well-posedness of the regularization approach or regularity of recovered images can be answered independently of particular discretizations. With the rise of neural-network- and learning-based regularization approaches for imaging, classical methods are increasingly being replaced by new techniques. This also raises new challenges, as the mathematical understanding of neural-network- or learning-based regularization approaches, in particular in view of function space modeling, still lags significantly behind what is known in the classical theory.

The purpose of this talk is to address these challenges from two perspectives. First, we introduce a variational model for image regularization that follows the same principles as the famous deep image prior – but within a simpler, function-space modeling. The latter not only yields results comparable to those of the deep image prior with a reduced number of parameters, but also enables us to obtain precise statements about the regularity of images that can be recovered.

In the second part of the talk, we will discuss a general regularity analysis for images generated by convolutional-neural-network architectures. As we will show, depending on the type of weight penalization used during training and on the network depth, the regularity of images obtained with such approaches does not match the classical modeling of images as piecewise smooth functions. As a practical consequence, this suggests refraining from plain L2 regularization of the network weights when the network output is an image.




Mo April 17 (RA2503) Jim Portegies              


Diffusion Variational Autoencoders and small-time asymptotics of the entropy of the heat kernel on a Riemannian manifold

Variational Autoencoders are unsupervised machine learning algorithms that encode data points to points in a so-called latent space. It is often hoped that the encoding carries meaningful information about the dataset, which is sometimes formulated as the requirement that it disentangles latent factors. We introduced the Diffusion Variational Autoencoder to facilitate this disentanglement in the case where the latent space has the structure of a Riemannian manifold. An important element of the algorithm is an efficient approximation of the entropy of the heat kernel on a Riemannian manifold. We will discuss recent work in which we derived general asymptotic expansions for this entropy. This is joint work with Luis Pérez Rey, Vlado Menkovski and Mahefa Ravelonanosy.

Mo May 1 (RA4336) Daniel Walter


Towards optimal sensor placement for inverse problems in spaces of measures

In this talk, the inverse problem of estimating an unknown finite number of, e.g., acoustic point sources from few measurements contaminated by Gaussian noise is considered. We present a statistical framework for this type of problem based on a Tikhonov-type point estimator involving total variation regularization, as well as a suitable mean-squared error based on the Hellinger-Kantorovich distance to the ground truth.

Based on careful linearization arguments, an upper bound on the latter is derived, which, first, yields asymptotic convergence results for the small-variance case and, second, can be used as a design criterion in optimal sensor placement for sparse inverse problems.

Mo May 22 (RA4336) Martin Wahl


A kernel-based analysis of Laplacian eigenmaps

Laplacian eigenmaps and diffusion maps are nonlinear dimensionality reduction methods that use the eigenvalues and eigenvectors of normalized graph Laplacians. From a mathematical perspective, the main problem is to understand these empirical Laplacians as spectral approximations of the underlying Laplace-Beltrami operator. In this talk, we study Laplacian eigenmaps through the lens of kernel PCA. This leads to novel points of view and allows us to leverage results for empirical covariance operators in infinite dimensions.
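To make the connection concrete, the following toy comparison (using scikit-learn with arbitrary parameters; not part of the speaker's analysis) embeds the same data set with a graph-Laplacian-based spectral embedding and with kernel PCA:

# Side-by-side embedding of the same data with Laplacian eigenmaps (spectral
# embedding of a graph Laplacian) and kernel PCA (eigendecomposition of a
# centred kernel matrix). Illustration only; parameters are arbitrary.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import SpectralEmbedding
from sklearn.decomposition import KernelPCA

X, _ = make_swiss_roll(n_samples=1500, random_state=0)

laplacian_emb = SpectralEmbedding(n_components=2, n_neighbors=10,
                                  random_state=0).fit_transform(X)
kpca_emb = KernelPCA(n_components=2, kernel="rbf", gamma=0.01).fit_transform(X)

print(laplacian_emb.shape, kpca_emb.shape)   # (1500, 2) (1500, 2)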


Mo June 12 (RA4237) Hongwei Wen




Medians of Forests for Robust Density Estimation

We propose an ensemble learning algorithm called medians of forests for robust density estimation (MFRDE), which achieves robustness against outliers through a pointwise median operation on forest density estimators fitted on subsampled datasets. Compared to robust kernel-based methods, the local property of MFRDE enables us to choose larger subsampling sizes, sacrificing less accuracy for density estimation while achieving robustness. On the theoretical side, by introducing new concepts on local outliers, we show that even if the number of outliers reaches a certain polynomial order in the sample size, MFRDE is able to achieve almost the same convergence rate as the same algorithm on uncontaminated data, whereas robust kernel-based methods fail. On the practical side, real data experiments show that MFRDE outperforms existing robust kernel-based methods. Moreover, we apply MFRDE to anomaly detection to showcase a further application.

Mo Jun 26 (RA4336) Steffen Dereich


On the existence of optimal shallow networks

In this talk we discuss the existence of global minima in optimisation problems over shallow neural networks. More explicitly, the function class over which we minimise is the family of all functions that can be expressed as artificial residual or feedforward neural networks with one hidden layer featuring a specified number of neurons with ReLU (or Leaky ReLU) activation. We give existence results. Moreover, we provide counterexamples that illustrate the relevance of the assumptions imposed in the theorems.


Mo Jul 3 (RA2334) Dongwei Ye


Non-intrusive reduced-order modelling with surface registration

Computational fluid dynamics is a common tool in cardiovascular science and engineering to simulate, predict and study hemodynamics in arteries. Owing to the complexity and scale of cardiovascular flow problems, especially for patient-specific cases, model evaluation can be computationally expensive. The domains for these hemodynamic simulations are either generated synthetically, based on a simplification of the anatomical geometry, or segmented directly from clinical image data. Anatomical geometries of the same kind, such as segments of arteries and organs, are similar in general shape but differ in detail. In this work, a data-driven surrogate model is proposed for the efficient prediction of blood flow simulations on similar but distinct domains. The proposed surrogate model leverages surface registration to parameterise those similar but distinct shapes and to formulate the corresponding hemodynamic information into geometry-informed snapshots via the diffeomorphism constructed between the reference domain and the target domain. A non-intrusive reduced-order model for the geometrical parameters is subsequently constructed using proper orthogonal decomposition, and a radial basis function interpolator is trained to predict the reduced coefficients of the reduced-order model from the reduced coefficients of the geometrical parameters of the shape. Two examples of blood flowing through a stenosis and a bifurcation are presented and analysed. The proposed surrogate model demonstrates its accuracy and efficiency in hemodynamics prediction and shows its potential application toward real-time simulation or many-query scenarios.
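A minimal sketch of the non-intrusive pipeline described above, on synthetic data and with the surface-registration step omitted (our own illustration, not the speaker's code): proper orthogonal decomposition of a snapshot matrix followed by a radial basis function interpolator from geometric parameters to reduced coefficients.

# POD + RBF surrogate sketch (illustration only; registration step omitted):
# map geometric parameters -> reduced POD coefficients -> reconstructed field.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
params = rng.uniform(0, 1, (40, 3))                       # hypothetical shape parameters (40 cases, 3 dims)
snapshots = np.sin(params @ rng.normal(size=(3, 500)))    # toy "hemodynamic" fields (40, 500)

# proper orthogonal decomposition of the centred snapshot matrix
mean = snapshots.mean(axis=0)
_, _, Vt = np.linalg.svd(snapshots - mean, full_matrices=False)
r = 5
coeffs = (snapshots - mean) @ Vt[:r].T                    # reduced coefficients, (40, r)

# RBF interpolator from geometric parameters to reduced coefficients
rbf = RBFInterpolator(params, coeffs, kernel="thin_plate_spline")

new_param = np.array([[0.2, 0.5, 0.8]])
prediction = mean + rbf(new_param) @ Vt[:r]               # predicted full field, (1, 500)
print(prediction.shape)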


Mo Sep 4 (RA3237) Gabriel Clara


Dropout Regularization Versus \ell_2-Penalization in the Linear Model

We investigate the statistical behavior of gradient descent iterates with dropout in the linear regression model. In particular, non-asymptotic bounds for expectations and covariance matrices of the iterates are derived. In contrast with the widely cited connection between dropout and \ell_2-regularization in expectation, the results indicate a much more subtle relationship, owing to interactions between the gradient descent dynamics and the additional randomness induced by dropout.


This talk is based on joint work with Sophie Langer and Johannes Schmidt-Hieber.
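A tiny simulation in the spirit of the talk (our own toy sketch, not the authors' code): run gradient descent with dropout in a linear model and compare the averaged iterates with the explicit \ell_2-type estimator that dropout is often identified with.

# Gradient descent with dropout in the linear model vs. the explicit
# l2-penalized (ridge-type) estimator it is often identified with. Toy sketch only.
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 500, 10, 0.8                       # p = dropout retention probability
X = rng.standard_normal((n, d))
beta_true = rng.standard_normal(d)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta, lr = np.zeros(d), 1e-3
iterates = []
for _ in range(20000):
    mask = rng.binomial(1, p, size=d)        # Bernoulli(p) dropout mask
    resid = X @ (mask * beta) - y
    grad = mask * (X.T @ resid) / n          # gradient of the dropout loss for this mask
    beta -= lr * grad
    iterates.append(beta.copy())

# estimator minimizing the expected dropout objective:
# ||y - p*X*beta||^2 + p(1-p) * sum_j ||X_j||^2 beta_j^2
lam = (1 - p) / p * np.diag(np.diag(X.T @ X))
ridge = np.linalg.solve(X.T @ X + lam, X.T @ y) / p
print(np.mean(iterates[-5000:], axis=0) - ridge)   # the gap the talk quantifies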

Mo Sep 18  (RA4334) Bernhard Stankewitz 


Early stopping for iterative estimation procedures

Increasingly high-dimensional data sets require that estimation methods not only satisfy statistical guarantees but also remain computationally feasible. In modern statistics and machine learning, early stopping of algorithms has been identified as one of the tools to address such issues. In this context, we present recent results both on the implicit regularization effect of early stopping and on the potential of data-driven early stopping rules to perform adaptive estimation in a sequential, computationally efficient manner.

Mo Oct 02 (HB 2A)   Jonathan Chirinos Rodriguez


A supervised learning approach to regularization of inverse problems

Selecting the best regularization parameter in inverse problems is a classical and yet challenging problem. In the past years, data-driven techniques have gained popularity because, while circumventing some limitations from classical approaches, they seem to perform better in practice. In this talk, we propose and study a statistical learning approach based on empirical risk minimization. Our main contribution is a theoretical analysis, showing that, provided with enough data, this approach can reach sharp rates. Our analysis draws ideas and techniques from statistical machine learning and regularization theory for ill-posed inverse problems. Finally, we present some numerical simulations that corroborate and illustrate the theoretical findings.  
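To make the setting concrete, here is a minimal toy sketch (entirely illustrative, with a hypothetical forward operator) of choosing a Tikhonov regularization parameter by empirical risk minimization over supervised pairs of ground-truth signals and noisy observations:

# Toy sketch: pick the Tikhonov parameter by empirical risk minimization
# over supervised (observation, ground truth) pairs. Illustration only.
import numpy as np

rng = np.random.default_rng(0)
d = 50
A = rng.standard_normal((d, d)) / np.sqrt(d)        # hypothetical forward operator

def tikhonov(y, lam):
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)

# training pairs (x_i ground truth, y_i = A x_i + noise)
xs = rng.standard_normal((100, d))
ys = xs @ A.T + 0.05 * rng.standard_normal((100, d))

lams = np.logspace(-4, 1, 20)
risks = [np.mean([np.sum((tikhonov(y, lam) - x) ** 2) for x, y in zip(xs, ys)])
         for lam in lams]
lam_hat = lams[int(np.argmin(risks))]                # empirical risk minimizer
print("selected regularization parameter:", lam_hat)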

Mo Oct 23 (SP 1) Ingo Steinwart


TBA

TBA

Mo Nov 6 (RA2237)   Wouter Koolen


Towards Characterizing the First-order Query Complexity of Learning (Approximate) Nash Equilibria in Zero-sum Matrix Games 

In this talk we will look at the complexity of computing mixed strategy equilibria for zero-sum matrix games. We will study the query complexity, asking how much of the matrix has to be read in order to output an approximate mixed equilibrium. We will review the standard first-order query model and discuss the upper bounds by Rakhlin and Sridharan (2013). We will then turn to new lower bounds. We will talk about:


* How this question is delicate (and many lower bound techniques do not apply), by demonstrating that one query suffices when the entries of the payoff matrix are guaranteed to be rational.


* A lower bound showing that for exact equilibria the entire matrix needs to be read (this takes K queries).


* New techniques giving the first (yet weak) lower bounds in the approximate case. 


No prior knowledge about games or optimization will be assumed. All are welcome! 

Mo Nov 20 (CR 3D) Francesca Bartolucci


On the variational optimality of neural networks: reproducing kernel Banach spaces and representer theorems for neural networks

Neural networks define functions by composing linear and nonlinear maps in a multi-layer architecture. The functional-analytic study of the function spaces defined by neural networks is a line of research that has recently attracted a lot of attention. Studying the spaces of such functions can provide a new perspective to understand the corresponding learning models and their inductive bias.  We show that neural networks define suitable reproducing kernel Banach spaces. These spaces are equipped with norms that enforce a form of sparsity, enabling them to adapt to potential latent structures within the input data and their representations. In particular, leveraging the theory of reproducing kernel Banach spaces, combined with variational results, we derive representer theorems that justify the finite architectures commonly employed in applications.


Mo Dec 4 (CR 3D) Alessandro Scagliotti


AutoencODEs: an extension of NeurODEs for width-varying Neural Networks

In 2017, it was observed that Residual Neural Networks (ResNets) can be studied as discretizations of continuous-time control systems, which are often called NeurODEs. In recent years, Control Theory has been fruitfully applied to study the properties of existing networks, and to develop new ones. Since the dimension of the phase space of a NeurODE is constant, NeurODEs have so far not been used to model Deep Learning architectures where the dimensions of the inputs and the outputs vary along the layers. In particular, this is the case for Autoencoders, where the dimension of the data is compressed during the encoding phase and increases again during the decoding phase. In our work, we model a continuous-time Autoencoder, which we call AutoencODE, and we extend to this case the mean-field control framework already developed for classical NeurODEs. Moreover, we tackle the case of low Tikhonov regularization, resulting in possibly non-convex landscapes of the cost functional. It turns out that most of the results holding globally in the case of high Tikhonov regularization can be recovered in regions where the loss is locally convex.


Mo Dec 11 (CR 3D) Scott Pesme


Saddle-to-Saddle Dynamics in Diagonal Linear Networks

When training neural networks with gradient methods using a small initialisation of the weights, strange types of learning curves appear: the training process makes very little progress for some time, followed by a sharp transition where a new “feature” is suddenly learnt. This behaviour is usually referred to as incremental learning. In this talk, I will show that we can fully describe this phenomenon when considering a toy network architecture. In this simplified setting, we can prove that the gradient flow trajectory jumps from a saddle of the training loss to another. Each visited saddle as well as the jump times can be computed through a recursive algorithm reminiscent of the Homotopy algorithm used for finding the Lasso path.
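A compact simulation of the toy setting (our own sketch with arbitrary constants): gradient descent on a diagonal linear network beta = u * v with a small initialisation on a sparse, noiseless regression problem, where the coordinates of the solution are learnt one after another.

# Diagonal linear network beta = u * v trained by gradient descent from a tiny
# initialisation; the loss plateaus and then drops as coordinates are
# "switched on" one after another (saddle-to-saddle / incremental learning).
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[:3] = [3.0, -2.0, 1.0]             # sparse ground truth
y = X @ beta_star

alpha = 1e-6                                 # small initialisation scale
u = alpha * np.ones(d)
v = np.zeros(d)
lr = 1e-2
for t in range(3001):
    resid = X @ (u * v) - y
    grad = X.T @ resid / n
    u, v = u - lr * grad * v, v - lr * grad * u
    if t % 300 == 0:
        print(f"iter {t:5d}  loss {np.mean(resid**2):.3e}  beta {np.round(u * v, 2)[:4]}")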

Mo Jan 29 2024 (RA 2334) Janusz Meylahn

Quantifying the likelihood of collusion by provably convergent reinforcement learning 

Recent advances in decentralized multiagent reinforcement learning (MARL) have led to the development of algorithms that are provably convergent in a variety of Markov game subclasses. One of these is the Decentralized Q-learning (DQ) algorithm by Arslan and Yüksel (2017) which is provably convergent in weakly acyclic games. In this talk, I will present a new characterization of weak acyclicity and use it to show that the prisoner's dilemma with a memory of one period is weakly acyclic. This new characterization naturally leads to an identification of the basins of attraction of all possible strategy equilibria of the DQ algorithm. Since only a subset of strategy equilibria leads to robust collusion, we can use this to quantify the likelihood of observing algorithmic collusion. Time permitting, I will discuss the effect that fluctuations in the learning process and the addition of a third intermediate action to the prisoner's dilemma have on the likelihood of collusion. 

Mo Feb 5 2024 (RA 2503) Rianne de Heide 


E is the new P

Over the last decade there has been much attention in the media to the fact that many scientific results are not reproducible; this is widely acknowledged especially in medicine and psychology. Part of the problem is due to the mathematics used for hypothesis testing. The standard methodology is p-value based null hypothesis significance testing, despite a myriad of problems surrounding it. We present the E-value, a notion of evidence which overcomes some of these issues. On January 24 we presented our paper "Safe Testing" at the discussion meeting of the Royal Statistical Society in London.
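As a one-line illustration of the idea (a standard textbook example, not the construction from the paper): a likelihood ratio is an e-variable, its expectation under the null is at most one, and Markov's inequality turns a large observed value into a test with type I error control.

# E-value toy example: test H0 "coin is fair" with the likelihood ratio for a
# fixed alternative p = 0.7. Under H0 the expectation of E is 1, so by Markov's
# inequality P_H0(E >= 1/alpha) <= alpha.
import numpy as np

rng = np.random.default_rng(0)
flips = rng.binomial(1, 0.7, size=100)            # data actually drawn from the alternative

p_alt, p_null = 0.7, 0.5
e_value = np.prod(np.where(flips == 1, p_alt / p_null, (1 - p_alt) / (1 - p_null)))

alpha = 0.05
print(f"E = {e_value:.2e}; reject H0 at level {alpha}: {e_value >= 1 / alpha}")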


Mo Feb 5 2024  (RA 2503) Claudia Strauch 


Learning to reflect: On data-driven approaches to stochastic optimal control

Reinforcement learning (RL) and stochastic control share the common goal of finding optimal strategies in uncertain environments. RL, which is a subfield of machine learning, has gained popularity in recent years due to its ability to learn optimal behaviours through trial and error. While RL algorithms are actively used in a wide range of domains, formulating theoretical guarantees and interpretable models is a major challenge. In contrast, the mathematical branch of stochastic control provides theoretical solutions to optimal control problems in many scenarios, but their practicability suffers from the standard assumption of known dynamics of the underlying stochastic process.

To overcome this limitation, we propose purely data-driven strategies for stochastic control, which we investigate for ergodic impulse and singular control problems in the context of continuous diffusion processes. In particular, we describe the specific statistical challenges arising in the stochastic control set-up. The exploration vs. exploitation dilemma, which is familiar from RL, plays an essential role in the considerations, and we present some concentration results that allow us to deal with it. Finally, we show how these insights can be translated into regret convergence rates of polynomial order for the considered control problems.


Mo Feb 12  2024 (RA 2503) Paul Catala


An Approximate Joint Diagonalization Algorithm for Off-the-Grid Sparse Recovery


Many problems in imaging and data science require reconstructing, from partial observations, highly concentrated signals such as pointwise sources or contour lines. This work introduces a novel algorithm for recovering measures supported on such structured domains, given a finite number of their moments. Our approach is based on the traditional singular value decomposition methodology of subspace methods, but lifts their restriction to the framework of Dirac masses, and is able to recover geometrically faithful discrete approximations of measures with density. The crucial step consists in the approximate joint diagonalization of a few non-commuting matrices, which we perform using a quasi-Newton algorithm. Experiments show that our method performs well, not only in the setting of well-separated Dirac masses, as predicted by the standard theory of the truncated moment problem, but also in the case of continuous measures, which is not covered by theoretical guarantees and where usual methods empirically fail. We illustrate its applicability in optimal transport problems, where the coupling measure is often localized on the graph of some function.

Mo Feb  19 2024  (RA 2503) Richard Samworth

15:00 - 16:00 hrs

Isotonic subgroup selection

Given a sample of covariate-response pairs, we consider the subgroup selection problem of identifying a subset of the covariate domain where the regression function exceeds a pre-determined threshold. We introduce a computationally-feasible approach for subgroup selection in the context of multivariate isotonic regression based on martingale tests and multiple testing procedures for logically-structured hypotheses. Our proposed procedure satisfies a non-asymptotic, uniform Type I error rate guarantee with power that attains the minimax optimal rate up to poly-logarithmic factors. Extensions cover classification, isotonic quantile regression and heterogeneous treatment effect settings.

Mo Feb  19 2024  (RA 2503) Nicolas Schreuder

16:00 - 17:00 hrs

Fairness in machine learning: a study of the Demographic Parity constraint

In various domains, statistical algorithms trained on personal data take pivotal decisions which influence our lives on a daily basis. Recent studies show that a naive use of these algorithms in sensitive domains may lead to unfair and discriminating decisions, often inheriting or even amplifying biases present in data. In the first part of the talk, I will introduce and discuss the question of fairness in machine learning through concrete examples of biases coming from the data and/or from the algorithms. In a second part, I will demonstrate how statistical learning theory can help us better understand and overcome some of those biases. In particular, I will present a selection of recent results from two of my papers on the Demographic Parity constraint:

- A minimax framework for quantifying risk-fairness trade-off in regression (with E. Chzhen), Ann. Statist. 50(4): 2416-2442 (Aug. 2022). DOI: 10.1214/22-AOS2198;

- Fair learning with Wasserstein barycenters for non-decomposable performance measures (with S. Gaucher and E. Chzhen), AISTATS 2023.


Tue Mar 5 2024  (HB 2A) Merle Behr


Provable Boolean interaction recovery from tree ensemble obtained via random forests

Random Forests (RFs) are at the cutting edge of supervised machine learning in terms of prediction performance, especially in genomics. Iterative RFs (iRFs) use a tree ensemble from iteratively modified RFs to obtain predictive and stable nonlinear or Boolean interactions of features. They have shown great promise for Boolean biological interaction discovery that is central to advancing functional genomics and precision medicine. However, theoretical studies into how tree-based methods discover Boolean feature interactions are missing. Inspired by the thresholding behavior in many biological processes, we first introduce a discontinuous nonlinear regression model, called the “Locally Spiky Sparse” (LSS) model. Specifically, the LSS model assumes that the regression function is a linear combination of piecewise constant Boolean interaction terms. Given an RF tree ensemble, we define a quantity called “Depth-Weighted Prevalence” (DWP) for a set of signed features S. Intuitively speaking, DWP(S) measures how frequently features in S appear together in an RF tree ensemble. We prove that, with high probability, DWP(S) attains a universal upper bound that does not involve any model coefficients, if and only if S corresponds to a union of Boolean interactions under the LSS model. Consequently, we show that a theoretically tractable version of the iRF procedure, called LSSFind, yields consistent interaction discovery under the LSS model as the sample size goes to infinity. Finally, simulation results show that LSSFind recovers the interactions under the LSS model, even when some assumptions are violated.

Reference: https://www.pnas.org/doi/10.1073/pnas.2118636119

Co-authors: Yu Wang, Xiao Li, and Bin Yu (UC Berkeley)


Mo Mar 11 2024   (CR 2L) Yi Yu 


Federated Transfer Learning with Differential Privacy

Federated learning is gaining increasing popularity, with data heterogeneity and privacy being two prominent challenges. In this paper, we address both issues within a federated transfer learning framework, aiming to enhance learning on a target data set by leveraging information from multiple heterogeneous source data sets while adhering to privacy constraints. We rigorously formulate the notion of federated differential privacy, which offers privacy guarantee for each data set without assuming a trusted central server.  Under this privacy constraint, we study three classical statistical problems, including univariate mean estimation, low-dimensional linear regression, and high-dimensional linear regression. By investigating the minimax rates and identifying the costs of privacy for these problems, we show that federated differential privacy is an intermediate privacy model between the well established local and central models of differential privacy. Our analyses incorporate data heterogeneity and privacy, highlighting the fundamental costs of both in federated learning and underscoring the benefit of knowledge transfer across data sets.  This is joint work with Mengchu Li (Warwick), Ye Tian (Columbia) and Yang Feng (NYU).
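As a purely illustrative sketch of the kind of protocol such constraints allow (not the estimators or the privacy model analysed in the paper): each site releases only a noised, clipped local mean, and the server aggregates the releases.

# Toy sketch of univariate mean estimation when each site may only release a
# differentially private summary (Gaussian mechanism on the clipped local mean).
# Illustration only; this is not the estimator or privacy model of the paper.
import numpy as np

rng = np.random.default_rng(0)
sites = [rng.normal(loc=mu, scale=1.0, size=200) for mu in (0.9, 1.0, 1.1, 2.0)]

eps, delta, clip = 1.0, 1e-5, 3.0
private_means = []
for data in sites:
    local_mean = np.mean(np.clip(data, -clip, clip))
    sensitivity = 2 * clip / len(data)               # changing one record moves the mean by at most this
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps   # Gaussian mechanism
    private_means.append(local_mean + rng.normal(0, sigma))

print("aggregated estimate:", np.mean(private_means))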


Mo Mar 18  2024  (CR 2L) Gautam Pai


Optimal Transport on the Lie Group of Roto-Translations

The roto-translation group SE(2) has been of active interest in image analysis due to methods that lift the image data to multi-orientation representations defined on this Lie group. This has led to impactful applications of crossing-preserving flows for image de-noising, geodesic tracking, and roto-translation equivariant deep learning.

In this talk, I will present a computational framework for optimal transportation over Lie groups, with a special focus on SE(2). I will describe several theoretical aspects such as the non-optimality of group actions as transport maps, invariance and equivariance of optimal transport, and the quality of the entropic-regularized optimal transport plan using geodesic distance approximations.

Finally, I will illustrate a Sinkhorn-like algorithm that can be efficiently implemented using fast and accurate distance approximations of the Lie group and GPU-friendly group convolutions. We report valuable advancements in the experiments on 1) image barycenters, 2) interpolation of planar orientation fields, and 3) Wasserstein gradient flows on SE(2). We observe that our framework of lifting images to SE(2) and optimal transport with left-invariant anisotropic metrics leads to equivariant transport along dominant contours and salient line structures in the image and leads to meaningful interpolations compared to their counterparts on R^2.

*Joint work with Daan Bon, Gijs Bellaard, Olga Mula and Remco Duits from CASA – TU/e. Preprint: https://arxiv.org/abs/2402.15322



Mo Mar 18 2024   (CR 2L) Jiaqi Li


L^2 inference for change points in high-dimensional time series via a Two-Way MOSUM

We propose an inference method for detecting multiple change points in high-dimensional time series, targeting dense or spatially clustered signals. Our method aggregates moving sum (MOSUM) statistics cross-sectionally by an L^2-norm and maximizes them over time. We further introduce a novel Two-Way MOSUM, which utilizes spatial-temporal moving regions to search for breaks, with the added advantage of enhancing testing power when breaks occur in only a few groups. The limiting distribution of an L^2-aggregated statistic is established for testing break existence by extending a high-dimensional Gaussian approximation theorem to spatial-temporal non-stationary processes. Simulation studies exhibit promising performance of our test in detecting non-sparse weak signals. Two applications on equity returns and COVID-19 cases in the United States show the real-world relevance of our algorithms. The R package “L2hdchange” is available on CRAN.
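The following toy computation (our own sketch with arbitrary constants) shows the basic L^2-aggregated MOSUM statistic for locating a single change in a high-dimensional mean:

# L2-aggregated MOSUM sketch: contrast of moving averages before and after each
# time point, aggregated over coordinates by the squared Euclidean norm.
import numpy as np

rng = np.random.default_rng(0)
n, p, h = 400, 50, 40                    # sample size, dimension, window (bandwidth)
X = rng.standard_normal((n, p))
X[250:, :10] += 0.8                      # dense mean shift at t = 250 in 10 coordinates

stats = np.array([
    np.sum((X[t:t + h].mean(axis=0) - X[t - h:t].mean(axis=0)) ** 2)
    for t in range(h, n - h)
])
t_hat = h + int(np.argmax(stats))        # estimated change point location
print("estimated change point:", t_hat)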


Mo Mar 25 2024 (CR 2L) Shayan Hundrieser


Empirical Optimal Transport: Convergence Rates and Lower Complexity Adaptation

The theory of optimal transport (OT) offers versatile tools for the comparison of probability measures in a geometrically faithful way. In statistical contexts, transport based methodology often relies on estimation of the OT cost through an empirical plug-in approach, which raises questions about its accuracy. The convergence behavior of the empirical OT cost for increasing sample size is dictated by various aspects. These include the intrinsic dimension of the population measures, their concentration, as well as the regularity of the ground cost function. Remarkably, under distinct population measures with different intrinsic dimensions, the convergence rate for the empirical OT cost adapts to the population measures in the most favorable way, being determined by the lower dimensional measure. This phenomenon represents a hallmark feature of empirical optimal transport and is termed "lower complexity adaptation". The talk is based on joint work with Thomas Staudt and Axel Munk.
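A quick numerical illustration of the empirical plug-in approach (our own sketch using the POT library; not the speaker's experiments):

# Empirical plug-in estimation of the squared-Euclidean OT cost between two
# distributions, for growing sample size. Requires the POT package ("pip install pot").
import numpy as np
import ot

rng = np.random.default_rng(0)
for n in (50, 200, 800):
    xs = rng.normal(0.0, 1.0, size=(n, 2))            # samples from measure mu
    xt = rng.normal(1.0, 1.0, size=(n, 2))            # samples from measure nu
    M = ot.dist(xs, xt)                                # pairwise squared Euclidean costs
    a = b = np.full(n, 1.0 / n)                        # uniform empirical weights
    print(n, ot.emd2(a, b, M))                         # empirical OT cost (true value: 2.0)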





Mo Apr 15 2024 (CR 2L) Silke Glas


Model Reduction on Manifolds: a differential geometric framework

Using nonlinear projections and preserving structure in model order reduction (MOR) are currently active research fields. In this paper, we provide a novel differential geometric framework for model reduction on smooth manifolds, which emphasizes the geometric nature of the objects involved. The crucial ingredient is the construction of an embedding for the low-dimensional submanifold and a compatible reduction map, for which we discuss several options. Our general framework allows capturing and generalizing several existing MOR techniques, such as structure preservation for Lagrangian or Hamiltonian dynamics, and using nonlinear projections that are, for instance, relevant in transport-dominated problems. The joint abstraction can be used to derive shared theoretical properties for different methods, such as an exact reproduction result. To connect our framework to existing work in the field, we demonstrate that various techniques for data-driven construction of nonlinear projections can be included in our framework.


Mo Apr 29 2024 (HB 2A) Patrick Forré


On how to incorporate spacetime symmetries into your neural networks

Many problems in science, like particle physics, electrodynamics, medical imaging, protein engineering, etc., stay unchanged under transformations of the underlying space or spacetime. Deep neural networks that process data from those fields could benefit in terms of data efficiency, parameter complexity and generalization capabilities, if they already incorporated such space (time) symmetries from the start. In this talk we show how this can be achieved for big classes of neural network architectures, like variants of multilayer perceptrons, message passing and convolutional neural networks, etc.




Mo May 13  2024 (RA2503) Robert Beinert


TBA

TBA


Mo Jun 3 2024  Marco Avella Medina

TBA

TBA


Mo Jun 3 2024  Anna Shalova

Choosing the right noise: regularization properties of noise injection

We study the limiting dynamics of noisy gradient descent systems in the overparameterized regime. In this regime the set of global minimizers of the loss is large, and when initialized in a neighbourhood of this zero-loss set, a noisy gradient descent algorithm slowly evolves along it. In some cases this slow evolution has been related to better generalisation properties. We give an explicit characterization of this evolution for a broad class of noisy gradient descent systems. Our results show that the structure of the noise affects not just the form of the limiting process, but also the time scale at which the evolution takes place. We apply our theory to Dropout, label noise and classical SGD (minibatching) noise. We show that dropout and label noise models evolve on two different time scales. At the same time, classical SGD yields a trivial evolution on both mentioned time scales, implying that additional noise is required for regularization.


Mo Jun 10  2024 Mathias Trabs

TBA

TBA


Organizers

Assistant Professor

Mathematics of Imaging and AI group, University of Twente

m.c.carioni(at)utwente.nl

Assistant Professor

Statistics research group, University of Twente

s.langer(at)utwente.nl