Statistics in Data-Centric Engineering (S-DCE)
The Alan Turing Institute, London
The S-DCE seminar series is a weekly online seminar of The Data-Centric Engineering Programme at The Alan Turing Institute. Talks are usually at 11am on Wednesdays, though when we have speakers from abroad we occasionally meet earlier or later to accommodate time differences. We combine internal speakers from the group at the ATI with invited speakers from all over the world, who most commonly present their recent research but occasionally give a broader survey of a topic. (We used to call ourselves a reading group, but a seminar series is more what we've become.) Talks cover a variety of subjects, ranging from theoretical statistics and methodological developments to engineering applications of machine learning. The group is open to everyone. Please contact the organisers if you would like to join our mailing list, to which we send the link for each online talk.
Past talks are archived below and, further back, at the group's old site https://dce-rg.github.io/
8 June 11:00
Alexander Terenin (University of Cambridge)
Non-Euclidean Matérn Gaussian Processes
In recent years, the machine learning community has become increasingly interested in learning in settings where data lives in non-Euclidean spaces, for instance in applications to physics and engineering, or other settings where it is important that symmetries are enforced. In this talk, we will develop a class of Gaussian process models defined on Riemannian manifolds and graphs, and show how to effectively perform all computations needed to train these models using standard automatic-differentiation-based methods. This gives an effective framework to deploy data-efficient interactive decision-making systems such as Bayesian optimization to settings with symmetries and invariances.
1 June 11:00
Siu Lun Chau (University of Oxford)
Deconditional Downscaling with Gaussian Processes
Refining low-resolution (LR) spatial fields with high-resolution (HR) information, often known as statistical downscaling, is challenging as the diversity of spatial datasets often prevents direct matching of observations. Yet, when LR samples are modeled as aggregate conditional means of HR samples with respect to a mediating variable that is globally observed, the recovery of the underlying fine-grained field can be framed as taking an "inverse" of the conditional expectation, namely a deconditioning problem. In this work, we propose a Bayesian formulation of deconditioning which naturally recovers the initial reproducing kernel Hilbert space formulation from Hsu and Ramos (2019). We extend deconditioning to a downscaling setup and devise an efficient conditional mean embedding estimator for multiresolution data. By treating conditional expectations as inter-domain features of the underlying field, a posterior for the latent field can be established as a solution to the deconditioning problem. Furthermore, we show that this solution can be viewed as a two-stage vector-valued kernel ridge regressor with a minimax optimal convergence rate under mild assumptions. Lastly, we demonstrate its proficiency in a synthetic and a real-world atmospheric field downscaling problem, showing substantial improvements over existing methods.
25 May 11:00
Veit D. Wild (University of Oxford)
18 May 11:00
Guanyang Wang (Rutgers University, New Brunswick)
Unbiased Multilevel Monte Carlo methods for intractable distributions: MLMC meets MCMC
Constructing unbiased estimators from MCMC outputs has recently attracted much attention in the statistics and machine learning communities. However, the existing unbiased MCMC framework only works when the quantity of interest is an expectation. In this work, we propose unbiased estimators for functions of expectations. Our idea is based on combining unbiased MCMC and MLMC methods. We prove theoretical properties of our estimator and illustrate it on several examples, including estimating the ratio of normalizing constants and nested expectations. This is joint work with Tianze Wang.
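The randomised multilevel ingredient of such estimators can be illustrated without MCMC. The sketch below is a minimal single-term MLMC estimator of f(E[X]) in the spirit of Blanchet and Glynn's multilevel randomisation, assuming exact i.i.d. sampling is available; the combination with unbiased MCMC discussed in the talk is not shown, and the target `f(m) = m**2`, the sampler `draw` and the level ratio `r` are illustrative choices.

```python
import random
import statistics


def f(m):
    """Non-linear function of an expectation; here f(m) = m**2 (illustrative)."""
    return m * m


def draw(rng):
    """Hypothetical i.i.d. sampler for X ~ N(1, 1), so the target f(E[X]) is 1."""
    return rng.gauss(1.0, 1.0)


def single_term_mlmc(rng, r=2 ** -1.5):
    """One draw of a randomised (single-term) MLMC estimator of f(E[X]).

    The level N >= 1 is geometric with P(N = n) = (1 - r) * r**(n - 1).
    Level n compares f at the mean of 2**n samples with the average of f at
    the means of its two halves; dividing by P(N = n) removes the bias.
    """
    base = f(draw(rng))  # level-0 term: f at a single-sample mean
    n = 1
    while rng.random() < r:
        n += 1
    p_n = (1.0 - r) * r ** (n - 1)
    half = 2 ** (n - 1)
    xs = [draw(rng) for _ in range(2 * half)]
    mean_a = statistics.fmean(xs[:half])
    mean_b = statistics.fmean(xs[half:])
    mean_all = 0.5 * (mean_a + mean_b)
    delta = f(mean_all) - 0.5 * (f(mean_a) + f(mean_b))
    return base + delta / p_n


rng = random.Random(0)
estimates = [single_term_mlmc(rng) for _ in range(20000)]
print(statistics.fmean(estimates))  # should be close to f(E[X]) = 1
```

Each single draw is unbiased for f(E[X]), so independent draws can be averaged or run in parallel; the geometric level distribution is chosen so that both the variance and the expected cost stay finite for this smooth f.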
11 May 11:00
Yuchen Zhu (University College London)
Relaxing Observability Conditions in Causal Inference
Causal inference is necessary in many social science domains for understanding the effects of interventions, such as a new drug or a change in educational policy. A fundamental obstacle to consistently estimating such effects is the existence of latent variables. Practitioners often have to deal with such latent variables through observed covariates, which can be seen as imperfect records of the latent variable. Moreover, whereas traditional statistical methods often offer consistency guarantees at the cost of restrictive modelling assumptions, kernel methods are a flexible approach to nonparametric estimation for which guarantees can still be achieved. With these goals in mind, in this talk I will describe ways to formalise the problem and outline kernel-based methods to solve it.
4 May 11:00
Binxin Ru (University of Oxford)
Bayesian Optimisation for Neural Architecture Search
Bayesian optimisation (BO) has been widely used for hyperparameter optimisation but its application in neural architecture search (NAS) is limited due to the non-continuous, high-dimensional and graph-like search spaces. This talk will cover two novel methods to enable effective application of BO on NAS: 1) integrating the Weisfeiler-Lehman graph kernel into a Gaussian process surrogate to naturally handle the graph nature of architectures in a highly data-efficient manner and also afford interpretability by discovering useful network features and their corresponding impact on the network performance and 2) recasting NAS as a problem of finding the optimal network generator instead of a single optimal architecture so as to significantly reduce the search dimension, making NAS amenable to BO.
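For readers unfamiliar with the Weisfeiler-Lehman graph kernel, the sketch below computes the WL subtree kernel between two small labelled graphs: node labels are repeatedly refined by appending the sorted labels of their neighbours, and the kernel sums the dot products of the label histograms across iterations. It is a minimal pure-Python illustration, assuming undirected graphs stored as adjacency dicts with labels standing in for operation types; it is not the GP surrogate or the architecture encoding used in the talk.

```python
from collections import Counter


def wl_features(adjacency, labels, iterations):
    """Weisfeiler-Lehman subtree features: one label histogram per iteration.

    adjacency: dict node -> list of neighbouring nodes
    labels: dict node -> initial label (all of one comparable type, e.g. str)
    """
    current = dict(labels)
    histograms = [Counter(current.values())]
    for _ in range(iterations):
        # New label = (own label, sorted multiset of neighbour labels).
        current = {
            v: (current[v], tuple(sorted(current[u] for u in adjacency[v])))
            for v in adjacency
        }
        histograms.append(Counter(current.values()))
    return histograms


def wl_kernel(graph_a, graph_b, iterations=2):
    """Sum over iterations of the dot products of the label histograms."""
    feats_a = wl_features(*graph_a, iterations)
    feats_b = wl_features(*graph_b, iterations)
    return sum(
        sum(count * hb[label] for label, count in ha.items())
        for ha, hb in zip(feats_a, feats_b)
    )


# Each graph is (adjacency, labels); "conv" is a hypothetical operation label.
triangle = ({0: [1, 2], 1: [0, 2], 2: [0, 1]}, {0: "conv", 1: "conv", 2: "conv"})
path = ({0: [1], 1: [0, 2], 2: [1]}, {0: "conv", 1: "conv", 2: "conv"})
print(wl_kernel(triangle, path, iterations=1))  # 12
```

The two graphs share the same initial label histogram (contributing 3 * 3 = 9) but differ after one refinement step, where only the path's middle node matches the triangle's nodes (contributing 3 * 1 = 3).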
16 Mar 11:00
Alessandro Rudi (INRIA & ENS)
Representing non-negative functions, with applications in non-convex optimization, probability representation and beyond
Many problems in applied mathematics are expressed naturally in terms of non-negative functions. While linear models are well suited to represent functions with output in R, being at the same time very expressive and flexible, the situation is different for non-negative functions, where existing models lack one of these good properties. In this talk we present a rather flexible and expressive model for non-negative functions. We will show direct applications in probability representation and non-convex optimization. In particular, the model allows us to derive an algorithm for non-convex optimization that is adaptive to the degree of differentiability of the objective function and achieves optimal rates of convergence. Finally, we show how to apply the same technique to other interesting problems in applied mathematics that can be easily expressed in terms of inequalities.
9 Mar 11:00
Kamyar Azizzadenesheli (Purdue)
Neural Operators: Learn to Solve Partial Differential Equations
Traditional deep neural networks are maps between finite-dimensional spaces and hence are not suitable for modeling phenomena such as those arising from the solution of partial differential equations (PDEs). We introduce neural operators that can learn operators, which are maps between infinite-dimensional spaces. By framing neural operators as non-linear compositions of kernel integrations, we establish that they are universal approximators of operators. They are independent of the resolution or grid of the training data and allow for zero-shot generalization to higher-resolution evaluations. We find that neural operators can solve turbulent fluid flow, the seismic wave equation, CO2 storage and many more hard problems with a 100000x speedup compared to numerical solvers. I will outline several applications where neural operators have shown orders-of-magnitude speedups.
2 Mar 11:00
Tim Wolock (Imperial)
Evaluating distributional regression strategies for modelling self-reported sexual age-mixing
Predicting complex data with parsimonious and interpretable models is a persistent challenge in applied statistics. By combining distributional regression with flexible probability distributions, we can use simple linear models to fit datasets that conventional regression models would predict poorly. In this work, we built the four-parameter sinh-arcsinh distribution into a distributional regression framework to predict self-reported sexual partner age distribution data. These data measure the rate of sexual partnership formation across ages and are an important input to epidemiological models of HIV. To validate our approach, we conducted two model comparison studies on three geographically diverse datasets. In this talk, I will introduce the sinh-arcsinh distribution and provide an overview of the fundamentals of distributional regression, including a brief demonstration of how we have implemented our model in brms. I will then describe the design and results of our two model comparison studies. Finally, I will discuss how the framework we have proposed could be extended with well-known hierarchical modelling tools and how distributional regression methods could be applied more broadly.
Paper link: https://elifesciences.org/articles/68318
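The sinh-arcsinh family can be written down in a few lines. The sketch below uses the Jones and Pewsey (2009) parameterisation, in which a standard normal variate is pushed through a sinh-arcsinh transform and then shifted and scaled; the argument names `loc`, `scale`, `skew` and `tail` are illustrative, and the parameterisation in the talk's implementation may differ.

```python
import math
import random


def shash_pdf(y, loc=0.0, scale=1.0, skew=0.0, tail=1.0):
    """Sinh-arcsinh density in the Jones and Pewsey (2009) form.

    If Z ~ N(0, 1), then X = sinh((asinh(Z) + skew) / tail) and Y = loc + scale * X.
    skew (epsilon) controls asymmetry; tail (delta > 0) controls tail weight,
    with tail < 1 giving heavier and tail > 1 lighter tails than the normal.
    """
    x = (y - loc) / scale
    s = math.sinh(tail * math.asinh(x) - skew)
    c = math.sqrt(1.0 + s * s)  # equals cosh(tail * asinh(x) - skew)
    return (tail * c / (scale * math.sqrt(2.0 * math.pi * (1.0 + x * x)))
            * math.exp(-0.5 * s * s))


def shash_sample(rng, loc=0.0, scale=1.0, skew=0.0, tail=1.0):
    """Draw one sample by transforming a standard normal variate."""
    z = rng.gauss(0.0, 1.0)
    return loc + scale * math.sinh((math.asinh(z) + skew) / tail)


rng = random.Random(0)
draws = [shash_sample(rng, loc=1.0, scale=2.0, skew=0.5, tail=0.8) for _ in range(5)]
print(shash_pdf(0.0))  # with skew = 0 and tail = 1 this is the standard normal density
```

With `skew = 0` and `tail = 1` the transform is the identity, so the family nests the normal distribution; in a distributional regression each of the four parameters could then be given its own linear predictor.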
23 Feb 11:00
Samuel Livingstone (UCL)
The Barker proposal and other locally-balanced Markov chain Monte Carlo algorithms
I will introduce a class of π-reversible Markov processes termed ‘locally-balanced’. Any member of the class can be used to design Metropolis-Hastings algorithms. I will discuss a couple of prominent members of the class: one is in fact the well-known Metropolis-adjusted Langevin algorithm, and another is an approach that we call the ‘Barker proposal’, which is inspired by Barker’s alternative acceptance rate within the Metropolis-Hastings algorithm. I will explore the pros and cons of each algorithm through some theory and examples, before discussing how to choose an optimal algorithm within the locally-balanced class. This is based on joint work with Giacomo Zanella, Jure Vogrinc and Max Hird.
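The Barker proposal itself is short to implement. The sketch below is a minimal coordinate-wise version: each coordinate draws a symmetric Gaussian increment and keeps or flips its sign with a probability that depends on the local gradient, followed by a Metropolis-Hastings correction. It assumes a differentiable log-density supplied through the hypothetical callables `log_pi` and `grad_log_pi`, with a fixed step size and no adaptation.

```python
import math
import random


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def barker_mcmc(log_pi, grad_log_pi, x0, n_iter, step, rng):
    """Metropolis-Hastings with a coordinate-wise Barker proposal.

    For each coordinate i: draw z_i ~ N(0, step^2), keep its sign with
    probability 1 / (1 + exp(-z_i * g_i)) (g = gradient of log pi at x),
    otherwise flip it, and move y_i = x_i + b_i * z_i.
    """
    x = list(x0)
    lp_x, g_x = log_pi(x), grad_log_pi(x)
    samples, accepted = [], 0
    for _ in range(n_iter):
        y = []
        for xi, gi in zip(x, g_x):
            z = rng.gauss(0.0, step)
            keep = rng.random() < math.exp(-softplus(-z * gi))  # sigmoid(z * gi)
            y.append(xi + (z if keep else -z))
        lp_y, g_y = log_pi(y), grad_log_pi(y)
        # log of pi(y) Q(y, x) / (pi(x) Q(x, y)); the symmetric N(0, step^2)
        # factors cancel, leaving only the sigmoid "skew" terms.
        log_ratio = lp_y - lp_x
        for xi, yi, gxi, gyi in zip(x, y, g_x, g_y):
            log_ratio += softplus(-(yi - xi) * gxi) - softplus(-(xi - yi) * gyi)
        if math.log(1.0 - rng.random()) < log_ratio:
            x, lp_x, g_x = y, lp_y, g_y
            accepted += 1
        samples.append(list(x))
    return samples, accepted / n_iter


# Example target: a standard normal in two dimensions, started away from the mode.
rng = random.Random(1)
samples, rate = barker_mcmc(
    log_pi=lambda x: -0.5 * sum(v * v for v in x),
    grad_log_pi=lambda x: [-v for v in x],
    x0=[3.0, -3.0], n_iter=20000, step=1.5, rng=rng,
)
```

Because the proposal density is 2 * N(y_i - x_i; 0, step^2) * sigmoid((y_i - x_i) * g_i(x)) per coordinate, the Gaussian factors cancel in the acceptance ratio and only the sigmoid terms need to be evaluated.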
16 Feb 14:30
NOTE DIFFERENT TIME
Matthew Reimherr (Penn State)
Pure Differential Privacy in Functional Data Analysis
We consider the problem of achieving pure differential privacy in the context of functional data analysis, or more general nonparametric statistics, where the summary of interest can naturally be viewed as an element of a function space. In this talk I will give a brief overview and motivation for differential privacy before delving into the challenges that arise in the sanitization of an infinite dimensional summary. I will present a new mechanism, called the Independent Component Laplace Process, for achieving privacy, followed by applications to mean function estimation and nonparametric density estimation.
9 Feb 11:00
Athénaïs Gautier (Bern)
The Spatial Logistic Gaussian Process, and how estimating spatially dependent distributions can accelerate Bayesian inference
When studying natural or artificial systems, it is common for the response of interest not to be fully determined by the system parameters x, but rather to be random and to follow a probability distribution that depends on x. In this talk we aim to show that it is possible to estimate the underlying field based only on a finite number of observations, and that the associated uncertainty quantification can be highly instrumental for Bayesian inversion. The approach that we investigate here generalizes to spatial contexts a class of non-parametric Bayesian density models based on logistic Gaussian processes, and allows modelling (probability) density-valued fields with complex dependences on x while accommodating heterogeneous sample sizes. The main strength of the Spatial Logistic Gaussian Process (SLGP) is that it draws its flexibility from an underlying Gaussian process, allowing one to incorporate knowledge and structural information within the model while conserving the non-parametric nature of the latter. The considered models allow, for instance, (approximate) posterior simulation of probability density functions as well as joint prediction of multiple moments or other functionals of target distributions. We propose an implementation of the SLGP and investigate ways of using the proposed class of models to speed up Approximate Bayesian Computation (ABC) methods.
2 Feb 11:00
Valentin De Bortoli (Oxford)
Diffusion Schrödinger Bridge with Applications to Score-Based Generative Modeling
Progressively applying Gaussian noise transforms complex data distributions into approximately Gaussian ones. Reversing this dynamic defines a generative model. When the forward noising process is given by a Stochastic Differential Equation (SDE), Song et al. (2021) demonstrate how the time-inhomogeneous drift of the associated reverse-time SDE may be estimated using score-matching. A limitation of this approach is that the forward-time SDE must be run for a sufficiently long time for the final distribution to be approximately Gaussian. In contrast, solving the Schrödinger Bridge problem (SB), i.e. an entropy-regularized optimal transport problem on path spaces, yields diffusions which generate samples from the data distribution in finite time. We present Diffusion SB (DSB), an original approximation of the Iterative Proportional Fitting (IPF) procedure to solve the SB problem, and provide theoretical analysis along with generative modeling experiments. The first DSB iteration recovers the methodology proposed by Song et al. (2021), with the flexibility of using shorter time intervals, as subsequent DSB iterations reduce the discrepancy between the final-time marginal of the forward (resp. backward) SDE with respect to the prior (resp. data) distribution. Beyond generative modeling, DSB offers a widely applicable computational optimal transport tool as the continuous state-space analogue of the popular Sinkhorn algorithm (Cuturi, 2013).
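Since DSB is presented as a continuous state-space analogue of the Sinkhorn algorithm, it may help to recall the discrete algorithm. The sketch below is the classic Sinkhorn iteration for entropy-regularised optimal transport between two discrete distributions; it is not DSB, and the point locations, squared-distance cost and regularisation `eps` are illustrative assumptions.

```python
import math


def sinkhorn(a, b, cost, eps, n_iter=500):
    """Entropy-regularised optimal transport between discrete measures a and b.

    Alternately rescales the rows and columns of the Gibbs kernel
    K_ij = exp(-cost_ij / eps) so that the plan P = diag(u) K diag(v)
    matches the marginals a and b (iterative proportional fitting).
    """
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * len(a)
    v = [1.0] * len(b)
    for _ in range(n_iter):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(len(b)))
             for i in range(len(a))]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(len(a)))
             for j in range(len(b))]
    return [[u[i] * kernel[i][j] * v[j] for j in range(len(b))]
            for i in range(len(a))]


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.5, 1.5, 2.5, 3.5]
a = [0.25] * 4
b = [0.25] * 4
cost = [[(x - y) ** 2 for y in ys] for x in xs]
plan = sinkhorn(a, b, cost, eps=0.5)
```

Each half-step is a proportional-fitting projection onto one marginal constraint, which is the discrete counterpart of the alternating forward and backward fits in IPF.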
26 Jan 11:00
Harita Dellaporta (Warwick)
Robust Bayesian Inference for Simulator-based Models via the MMD Posterior Bootstrap
Simulator-based models are models for which the likelihood is intractable but simulation of synthetic data is possible. They are often used to describe complex real-world phenomena, and as such can often be misspecified in practice. Unfortunately, existing Bayesian approaches for simulators are known to perform poorly in those cases. In this paper, we propose a novel algorithm based on the posterior bootstrap and maximum mean discrepancy estimators. This leads to a highly-parallelisable Bayesian inference algorithm with strong robustness properties. This is demonstrated through an in-depth theoretical study which includes generalisation bounds and proofs of frequentist consistency and robustness of our posterior. The approach is then assessed on a range of examples including a g-and-k distribution and a toggle-switch model.
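The discrepancy at the heart of this approach can be estimated directly from samples. The sketch below is the standard unbiased U-statistic estimator of the squared MMD with a Gaussian kernel in one dimension, used to compare observed data against output from a hypothetical location-model simulator `simulate`; the posterior bootstrap step of the talk is not shown, and the kernel, lengthscale and simulator are illustrative assumptions.

```python
import math
import random


def gaussian_kernel(x, y, lengthscale):
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def mmd_squared_unbiased(xs, ys, lengthscale=1.0):
    """Unbiased U-statistic estimate of MMD^2 between two 1-D samples."""
    m, n = len(xs), len(ys)
    kxx = sum(gaussian_kernel(xs[i], xs[j], lengthscale)
              for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    kyy = sum(gaussian_kernel(ys[i], ys[j], lengthscale)
              for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    kxy = sum(gaussian_kernel(x, y, lengthscale) for x in xs for y in ys) / (m * n)
    return kxx + kyy - 2.0 * kxy


def simulate(theta, n, rng):
    """Hypothetical simulator: theta plus standard normal noise."""
    return [theta + rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(0)
observed = simulate(0.0, 200, rng)
mmd_near = mmd_squared_unbiased(observed, simulate(0.0, 200, rng))
mmd_far = mmd_squared_unbiased(observed, simulate(2.0, 200, rng))
```

Only kernel evaluations on samples are needed, so the simulator's likelihood never has to be evaluated; this is what makes MMD-based criteria usable for simulator-based models.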
15 Dec 11:00
Amy Parkes (Southampton)
As machine learning technology improves, it is increasingly relied upon when making significant decisions which require a high level of trust. Accuracy and interpretability are paramount for trust in regression methods, which comprise a large portion of the field. To apply these methods with confidence there needs to be a certainty that they have modelled the ground truth of a dataset: the correct input-output relationships. Conventional regression error measures, however, do not ensure that the correct relationships are modelled, as they only require accurate point predictions to assign low error to a method. A case study of power prediction for merchant vessels is used to illustrate the problem, where accurate prediction and correct modelling of input-output relationships are required, although there is limited understanding of these input-output relationships. A new error measure, the Mean Fit to Median Error, is presented which ensures networks approximate the conditional averages and is applicable to any dataset. Networks reporting low Mean Fit to Median errors model more consistent and correct input-output relationships and are robust to areas of sparse data.
17 Nov 11:00
David Bossens (Southampton)
Traditionally, reinforcement learning is considered within the Markov Decision Process (MDP) framework. This presentation discusses challenges that arise when applying reinforcement learning in unknown long-term environments, including exploration, long-term dependencies, task sequences, and long-term safety constraints. The presentation then proposes solutions that go significantly beyond the traditional MDP framework, including self-improvement, lifelong reinforcement learning, and constrained MDPs.
10 Nov 11:00
Rachel Prudden (Met Office & Exeter)
Gaussian random fields are a commonly used method in spatial statistics. I will give an overview of how they can be applied to problems involving multiple spatial scales, such as super-resolution, and discuss extensions to non-Gaussian data.
03 Nov 11:00
Jonathan Schmidt (Tübingen)
Mechanistic models with differential equations are a key component of scientific applications of machine learning. Inference in such models is usually computationally demanding, because it involves repeatedly solving the differential equation. The main problem here is that the numerical solver is hard to combine with standard inference techniques. Recent work in probabilistic numerics has developed a new class of solvers for ordinary differential equations (ODEs) that phrase the solution process directly in terms of Bayesian filtering. Here we show that this allows such methods to be combined very directly, with conceptual and numerical ease, with latent force models in the ODE itself. It then becomes possible to perform approximate Bayesian inference on the latent force as well as the ODE solution in a single, linear-complexity pass of an extended Kalman filter/smoother, that is, at the cost of computing a single ODE solution. We demonstrate the expressiveness and performance of the algorithm by training, among others, a non-parametric SIRD model on data from the COVID-19 outbreak.
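The idea of phrasing ODE solving as Bayesian filtering can be illustrated on a scalar problem. The sketch below is a minimal filtering-based solver, assuming an autonomous scalar ODE y' = f(y), a once-integrated Wiener process prior on (y, y') with a fixed diffusion `sigma2`, and a zeroth-order linearisation of the observation y' - f(y) = 0; it omits the latent force model, the smoothing pass and any calibration discussed in the talk, and the function name `filter_ode_solve` is hypothetical.

```python
import math


def filter_ode_solve(f, y0, t_end, h, sigma2=1.0):
    """Probabilistic ODE solver for a scalar ODE y' = f(y) via Kalman filtering.

    Prior: once-integrated Wiener process on the state (y, y').
    Observation: the residual y' - f(y) is observed to be exactly 0, with f
    evaluated at the predicted mean and its Jacobian ignored (zeroth order).
    Returns the time grid, filtering means of y, and their variances.
    """
    m = [y0, f(y0)]                 # mean of (y, y')
    P = [[0.0, 0.0], [0.0, 0.0]]    # initial value known exactly
    q = [[sigma2 * h ** 3 / 3.0, sigma2 * h ** 2 / 2.0],
         [sigma2 * h ** 2 / 2.0, sigma2 * h]]
    ts, means, variances = [0.0], [y0], [0.0]
    n_steps = int(round(t_end / h))
    for k in range(1, n_steps + 1):
        # Predict with transition A = [[1, h], [0, 1]]: P- = A P A^T + Q.
        mp = [m[0] + h * m[1], m[1]]
        p00 = P[0][0] + 2 * h * P[0][1] + h * h * P[1][1] + q[0][0]
        p01 = P[0][1] + h * P[1][1] + q[0][1]
        p11 = P[1][1] + q[1][1]
        # Update on the derivative component: H = [0, 1], noiseless.
        residual = mp[1] - f(mp[0])
        s = p11
        gain = [p01 / s, p11 / s]
        m = [mp[0] - gain[0] * residual, mp[1] - gain[1] * residual]
        P = [[p00 - gain[0] * gain[0] * s, p01 - gain[0] * gain[1] * s],
             [p01 - gain[1] * gain[0] * s, p11 - gain[1] * gain[1] * s]]
        ts.append(k * h)
        means.append(m[0])
        variances.append(P[0][0])
    return ts, means, variances


ts, means, variances = filter_ode_solve(lambda y: -y, 1.0, 1.0, 0.01)
print(means[-1], math.exp(-1.0))  # filtering mean versus the exact solution
```

Each step costs one evaluation of f and a constant amount of linear algebra, so the whole solve is a single linear-complexity filtering pass; the returned variances are the solver's own estimate of its numerical uncertainty.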
20 Oct 11:00
George Wynne (Imperial)
Maximum Mean Discrepancy (MMD) and Kernel Stein Discrepancy (KSD) are two kernel-based non-parametric methods for forming a discrepancy between probability measures. Their study has been a very active area in statistical machine learning and, increasingly, in computational statistics. The idea behind both methodologies is to use kernels to construct easy-to-estimate discrepancies, which can then be used in a wide range of tasks, such as two-sample testing, goodness-of-fit testing, parameter inference, measure transport and MCMC output quality assessment, to name but a few. So far, though, MMD has enjoyed much wider theoretical investigation than KSD, mostly because the KSD formulation is somewhat more complicated. The aim of this talk is to outline how MMD and KSD are actually more closely related than one might think. This relationship can then be leveraged to provide conditions under which KSD can separate measures when the base space is a general separable Hilbert space. This generality encompasses distributions over function spaces, which will be used in numerical examples.
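To make the Stein discrepancy concrete, the sketch below estimates the squared KSD in one dimension with the Langevin Stein operator applied to a Gaussian RBF base kernel, which requires only the score (gradient of the log-density) of the target. It is a minimal illustration, assuming a standard normal target and real-valued samples; it does not cover the separable Hilbert space setting of the talk.

```python
import math
import random


def stein_kernel(x, y, score, lengthscale):
    """Langevin Stein kernel u_p(x, y) built from a 1-D Gaussian RBF base kernel."""
    d = x - y
    k = math.exp(-d * d / (2.0 * lengthscale ** 2))
    dk_dx = -d / lengthscale ** 2 * k
    dk_dy = d / lengthscale ** 2 * k
    d2k_dxdy = (1.0 / lengthscale ** 2 - d * d / lengthscale ** 4) * k
    return (score(x) * score(y) * k + score(x) * dk_dy
            + score(y) * dk_dx + d2k_dxdy)


def ksd_squared(samples, score, lengthscale=1.0):
    """U-statistic estimate of KSD^2 for samples against a target with known score."""
    n = len(samples)
    total = sum(stein_kernel(samples[i], samples[j], score, lengthscale)
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


rng = random.Random(0)
score_std_normal = lambda x: -x  # d/dx log N(x; 0, 1)
good = [rng.gauss(0.0, 1.0) for _ in range(300)]
bad = [rng.gauss(1.5, 1.0) for _ in range(300)]
ksd_good = ksd_squared(good, score_std_normal)
ksd_bad = ksd_squared(bad, score_std_normal)
```

Unlike the MMD, no samples from the target are needed: the target enters only through its score, so an unnormalised density suffices.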
13 Oct 11:00
Joel Dyer (Oxford)
Simulation models of scientific interest often lack a tractable likelihood function, precluding standard likelihood-based statistical inference. As a result, likelihood-free approaches have emerged in recent decades as a means of performing statistical inference for such models, which typically involve comparing simulated and observed data in some fashion. An example is approximate Bayesian computation, in which the pertinence of parameter settings is assessed by some meaningful notion of distance between the simulated and observed data. Time-series data is a particular challenge in this respect, often being high-dimensional and complex in structure. In this talk, we will discuss the use of path signatures as a means of performing likelihood-free inference with time-series simulators. We will first discuss the problem of likelihood-free inference for simulation models and the properties of the path signature. We will then discuss their use in traditional approaches to likelihood-free inference, such as approximate Bayesian computation, and in more recently developed approaches based on the likelihood-ratio trick. In each case, we will present experimental results and discuss some of the properties of path signatures which make them a desirable tool for learning with time-series data.
29 Sept 14:30
28 July 11:00
Johanna Meier (Hannover)
Intractable generative models are models for which the likelihood is unavailable but sampling is possible. Most approaches to parameter inference in this setting require the computation of some discrepancy between the data and the generative model. This is for example the case for minimum distance estimation and approximate Bayesian computation. These approaches require sampling a high number of realisations from the model for different parameter values, which can be a significant challenge when simulating is an expensive operation. In this paper, we propose to enhance this approach by enforcing "sample diversity" in simulations of our models. This will be implemented through the use of quasi-Monte Carlo (QMC) point sets. Our key results are sample complexity bounds which demonstrate that, under smoothness conditions on the generator, QMC can significantly reduce the number of samples required to obtain a given level of accuracy when using three of the most common discrepancies: the maximum mean discrepancy, the Wasserstein distance, and the Sinkhorn divergence. This is complemented by a simulation study which highlights that improved accuracy is sometimes also possible in settings which are not covered by the theory.
21 July 11:00
Takuo Matsubara (Newcastle & ATI)
7 July 11:00
Toni Karvonen (ATI)
30 June 11:00
Juan Kuntz Nussio (Warwick)
23 June 11:00
16 June 11:00
09 June 11:00
Maud Lemercier (Warwick)
02 June 11:00
Lorenzo Pacchiardi (Oxford)
26 May 11:00
Hans Kersting (INRIA Paris)
Uncertainty-Aware Numerical Solutions of ODEs by Bayesian Filtering
12 May 11:00
Christian Fröhlich (University of Tübingen, Germany)
Bayesian Quadrature on Riemannian Data Manifolds
Riemannian manifolds provide a principled way to model nonlinear geometric structure inherent in data. A Riemannian metric on said manifolds determines geometry-aware shortest paths and provides the means to define statistical models accordingly. However, these operations are typically computationally demanding. To ease this computational burden, we advocate probabilistic numerical methods for Riemannian statistics. In particular, we focus on Bayesian quadrature (BQ) to numerically compute integrals over normal laws on Riemannian manifolds learned from data. In this task, each function evaluation relies on the solution of an expensive initial value problem. We show that by leveraging both prior knowledge and an active exploration scheme, BQ significantly reduces the number of required evaluations and thus outperforms Monte Carlo methods on a wide range of integration problems. As a concrete application, we highlight the merits of adopting Riemannian geometry with our proposed framework on a nonlinear dataset from molecular dynamics.
28 Apr 11:00
The prior distribution on parameters of a likelihood is the usual starting point for Bayesian uncertainty quantification. In this paper, we present a different perspective. Given a finite data sample of size n from an infinite population, we focus on the missing remainder of the population as the source of statistical uncertainty, with the parameter of interest being known precisely given the entire population. We argue that the foundation of Bayesian inference is to assign a predictive distribution on the remainder of the population conditional on the observed sample, which then induces a distribution on the parameter of interest. Demonstrating an application of martingales, Doob shows that choosing the Bayesian predictive distribution returns the conventional posterior as the distribution of the parameter. Taking this as our cue, we relax the predictive machine, avoiding the need for the predictive to be derived solely from the usual prior-to-posterior-to-predictive density formula. We introduce the martingale posterior distribution, which returns Bayesian uncertainty directly on any statistic of interest without the need for the likelihood and prior, and this distribution can be sampled through a computational scheme we name predictive resampling. To that end, we introduce new predictive methodologies for multivariate density estimation, regression and classification that build upon recent work on bivariate copulas.
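Predictive resampling is easy to sketch once a predictive rule is fixed. The example below assumes the simplest rule, drawing each imputed observation from the empirical distribution of all values seen so far (a Pólya-urn predictive), which in the limit corresponds to the Bayesian bootstrap; the copula-based predictives introduced in the talk are not shown, and the truncation `n_future` stands in for the infinite remainder of the population.

```python
import random
import statistics


def predictive_resampling(data, n_future, n_draws, statistic, rng):
    """Martingale-posterior draws via predictive resampling.

    Each draw imputes n_future further observations one at a time from the
    current predictive, updates it, and evaluates the statistic on the
    completed sample. Here the predictive is the empirical distribution of all
    values seen so far (a Polya-urn scheme), an illustrative choice.
    """
    draws = []
    for _ in range(n_draws):
        population = list(data)
        for _ in range(n_future):
            population.append(rng.choice(population))
        draws.append(statistic(population))
    return draws


rng = random.Random(42)
data = [float(v) for v in range(1, 11)]
posterior = predictive_resampling(data, n_future=1000, n_draws=500,
                                  statistic=statistics.fmean, rng=rng)
print(statistics.fmean(posterior), statistics.stdev(posterior))
```

No likelihood or prior appears anywhere: uncertainty about the statistic comes entirely from not knowing the rest of the population, which is the perspective described in the abstract.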
21 Apr 11:00
Computing the expectation of some kernel function is ubiquitous in machine learning, from the classical theory of support vector machines, to exploiting kernel embeddings of distributions in applications ranging from probabilistic modeling, statistical inference, causal discovery, and deep learning. In all these scenarios, we tend to resort to Monte Carlo estimates as expectations of kernels are intractable in general. In this work, we characterize the conditions under which we can compute expected kernels exactly and efficiently, by leveraging recent advances in probabilistic circuit representations. We first construct a circuit representation for kernels and propose an approach to such tractable computation. We then demonstrate possible advancements for kernel embedding frameworks by exploiting tractable expected kernels to derive new algorithms for two challenging scenarios: 1) reasoning under missing data with kernel support vector regressors; 2) devising a collapsed black-box importance sampling scheme. Finally, we empirically evaluate both algorithms and show that they outperform standard baselines on a variety of datasets.
31 Mar - 14 Apr
24 Mar 11:00
Kernelized Stein discrepancy (KSD), though extensively used in goodness-of-fit tests and model learning, suffers from the curse of dimensionality. We address this issue by proposing the sliced Stein discrepancy and its scalable and kernelized variants, which employ kernel-based test functions defined on the optimal one-dimensional projections instead of the full input in high dimensions. When applied to goodness-of-fit tests, extensive experiments show that the proposed discrepancy significantly outperforms KSD and various baselines in high dimensions. For model learning, we show its advantages over existing Stein discrepancy baselines by training an independent component analysis model. We further propose a novel particle inference method called sliced Stein variational gradient descent (S-SVGD), which alleviates the mode-collapse issue of SVGD in training variational autoencoders.
10 Mar 11:00
This article focuses on numerical issues in maximum likelihood parameter estimation for Gaussian process regression (GPR). It investigates the origin of these numerical issues and provides simple but effective improvement strategies. Although this work targets a basic problem, a host of studies, particularly in the Bayesian optimization literature, rely on off-the-shelf GPR implementations. For the conclusions of these studies to be reliable and reproducible, robust GPR implementations are critical.
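A classic source of such numerical issues is a Gram matrix that is numerically singular, for example when two inputs coincide. The sketch below, a pure-Python Cholesky factorisation, shows the failure and the common generic remedy of adding a small diagonal jitter (nugget); the jitter value is an arbitrary illustrative choice, and this is not necessarily one of the improvement strategies proposed in the talk.

```python
import math


def cholesky(matrix):
    """Lower-triangular Cholesky factor; raises ValueError if not positive definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not numerically positive definite")
                lower[i][i] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return lower


def rbf_gram(xs, lengthscale, variance=1.0, jitter=0.0):
    """Squared-exponential Gram matrix with an optional diagonal jitter (nugget)."""
    return [[variance * math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))
             + (jitter if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]


xs = [0.3, 0.3, 1.2]  # the first two inputs coincide
try:
    cholesky(rbf_gram(xs, lengthscale=1.0))
    failed = False
except ValueError:
    failed = True  # the duplicate input makes the Gram matrix singular

lower = cholesky(rbf_gram(xs, lengthscale=1.0, jitter=1e-6))
log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(len(xs)))
```

The log-determinant computed from the Cholesky factor is the term of the GP log marginal likelihood that breaks first in such cases; the jitter trades a small modelling change (extra observation noise) for a factorisation that succeeds.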
03 Mar 11:00
Zhuo Sun (University College London, UK)
Amortized Bayesian Prototype Meta-learning: A new probabilistic meta-learning approach to few-shot image classification
Probabilistic meta-learning methods have recently achieved impressive success in few-shot image classification. However, they introduce a huge number of random variables for neural network weights and thus severe computational and inferential challenges. In this paper, we propose a novel probabilistic meta-learning method called amortized Bayesian prototype meta-learning. In contrast to previous methods, we introduce only a small number of random variables for latent class prototypes rather than a huge number for network weights; we learn to learn the posterior distributions of these latent prototypes in an amortized inference way with no need for an extra amortization network, so that we can easily approximate their posteriors conditional on a few labeled samples at both the meta-training and meta-testing stages. The proposed method can be trained end-to-end without any pre-training. Compared with other probabilistic meta-learning methods, our proposed approach is more interpretable with far fewer random variables, while still being able to achieve competitive performance for few-shot image classification problems on various benchmark datasets. Its excellent robustness and predictive uncertainty are also demonstrated through ablation studies.
24 Feb 11:00
Deep generative models have shown great success when it comes to fitting probabilistic models to complex data. Applications range from computer vision and speech to biogenetic and climate science. Such data are often naturally described on Riemannian manifolds such as spheres, tori, and hyperbolic spaces. Additionally, even when the data live in a Euclidean space, they may have a latent non-Euclidean geometry. Yet most deep generative models implicitly assume a flat geometry, making them either misspecified or potentially ill-suited to these situations. To tackle such issues, we introduce Poincaré Variational Auto-Encoders and Riemannian Continuous Normalizing Flows, which respectively model data with underlying hierarchical structure and parametrise probability measures on smooth manifolds.
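For context on the hyperbolic geometry behind Poincaré VAEs, the sketch below computes the geodesic distance in the Poincaré ball model. Distances grow rapidly as points approach the boundary of the ball, which is one reason this geometry is often used for tree-like, hierarchical data. It illustrates the distance formula only, not the VAE, and the example points are arbitrary.

```python
import math


def squared_norm(v):
    return sum(c * c for c in v)


def poincare_distance(x, y):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    diff = squared_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - squared_norm(x)) * (1.0 - squared_norm(y))
    return math.acosh(1.0 + 2.0 * diff / denom)


origin = [0.0, 0.0]
print(poincare_distance(origin, [0.5, 0.0]))      # equals 2 * atanh(0.5) = log(3)
print(poincare_distance([0.9, 0.0], [0.0, 0.9]))  # far larger than the Euclidean 1.27
```

Two points near the boundary are much further apart than their Euclidean separation suggests, so the ball has "room" for exponentially many well-separated leaves, much like a tree.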
17 Feb 11:00
The Bayesian treatment of neural networks dictates that a prior distribution is specified over their weight and bias parameters. This poses a challenge because modern neural networks are characterized by a large number of parameters, and the choice of these priors has an uncontrolled effect on the induced functional prior, which is the distribution of the functions obtained by sampling the parameters from their prior distribution. We argue that this is a hugely limiting aspect of Bayesian deep learning, and this work tackles this limitation in a practical and effective way. Our proposal is to reason in terms of functional priors, which are easier to elicit, and to "tune" the priors of neural network parameters in a way that they reflect such functional priors. Gaussian processes offer a rigorous framework to define prior distributions over functions, and we propose a novel and robust framework to match their prior with the functional prior of neural networks based on the minimization of their Wasserstein distance. We provide extensive experimental evidence that coupling these priors with scalable Markov chain Monte Carlo sampling offers systematically large performance improvements over alternative choices of priors and state-of-the-art approximate Bayesian deep learning approaches. We consider this work a considerable step towards making the long-standing challenge of carrying out a fully Bayesian treatment of neural networks, including convolutional neural networks, a concrete possibility.
03 Feb 14:00
Jean Honorio (Purdue University, US)
Theoretical Foundations of Combinatorial Problems in Machine Learning
Structured prediction can be thought of as a simultaneous prediction of multiple labels. This is often done by maximizing a score function on the space of labels, which decomposes as a sum of pairwise and unary potentials. The above is naturally modeled with a graph, where edges and vertices are related to pairwise and unary potentials, respectively. We consider the generative process proposed by Globerson et al. 2015, and apply it to general connected graphs. We analyze the structural conditions of the graph that allow for the exact recovery of the labels. Our results show that exact recovery is possible and achievable in polynomial time for a large class of graphs. In particular, we show that graphs that are bad expanders can be exactly recovered by adding small edge perturbations coming from the Erdős-Rényi model.
We also extend our results to account for fairness. In contrast to the known trade-offs between fairness and model performance, the addition of the fairness constraint improves the probability of exact recovery. We effectively explain this phenomenon and empirically show how graphs with poor expansion properties, such as grids, are now able to achieve exact recovery with high probability.
The two results above serve as a gentle introduction to a unifying framework, which uses the power of convex relaxations, Karush-Kuhn-Tucker conditions, primal-dual certificates and concentration inequalities. This framework has allowed us to produce novel algorithms for several NP-hard combinatorial problems, such as learning Bayesian networks, graphical games, learning and inference in structured prediction, and community detection.
27 Jan 11:00
Deep Gaussian processes with importance weighted variational inference are a powerful model and inference scheme which can represent complex, non-Gaussian marginal distributions while maintaining many of the advantages of standard GPs. However, we highlight a potential shortcoming of this approach: the signal-to-noise ratio of the gradient estimates of specific variational parameters can degrade during training, leading to a poorer variational approximation and thus worse predictive performance. In this talk I will give background information on deep Gaussian processes and importance weighted variational inference, and discuss why we might be interested in them. I will then present our investigation into the degraded signal-to-noise ratio during training, providing both theoretical and empirical evidence of the issue, and demonstrating how we can solve it.
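The importance weighted bound used in this inference scheme is simple to estimate. The sketch below computes it for a toy one-dimensional latent variable model (an illustrative assumption, not a deep GP), using a numerically stable log-mean-exp of the importance weights; when the proposal equals the exact posterior, every weight equals p(x) and the bound is tight.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def importance_weighted_bound(x, q_mean, q_var, num_samples, rng):
    """Monte Carlo estimate of the importance-weighted bound on log p(x).

    Toy model (an assumption for illustration): z ~ N(0, 1), x | z ~ N(z, 1),
    with Gaussian proposal q(z) = N(q_mean, q_var).
    """
    log_w = []
    for _ in range(num_samples):
        z = rng.gauss(q_mean, math.sqrt(q_var))
        log_joint = log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x, z, 1.0)
        log_w.append(log_joint - log_normal_pdf(z, q_mean, q_var))
    # log-mean-exp, shifted by the maximum for numerical stability
    top = max(log_w)
    return top + math.log(sum(math.exp(v - top) for v in log_w) / num_samples)


rng = random.Random(0)
x = 1.3
exact = log_normal_pdf(x, 0.0, 2.0)  # marginal: x ~ N(0, 2)
tight = importance_weighted_bound(x, x / 2.0, 0.5, 10, rng)  # exact posterior as q
loose = importance_weighted_bound(x, 0.0, 1.0, 100, rng)     # prior as q
```

The gradients of such bounds with respect to the proposal's parameters are themselves Monte Carlo estimates; their signal-to-noise ratio is the quantity whose degradation during training the talk investigates.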
20 Jan 11:00
Calibrating stochastic radio channel models to new measurement data is challenging when the likelihood function is intractable. The standard approach to this problem involves sophisticated algorithms for extraction and clustering of multipath components, following which, point estimates of the model parameters can be obtained using specialized estimators. We propose a likelihood-free calibration method using approximate Bayesian computation. The method is based on the maximum mean discrepancy, which is a notion of distance between probability distributions. Our method not only bypasses the need to implement any high-resolution or clustering algorithm, but is also automatic in that it does not require any additional input or manual pre-processing from the user. It also has the advantage of returning an entire posterior distribution on the value of the parameters, rather than a simple point estimate. We evaluate the performance of the proposed method by fitting two different stochastic channel models, namely the Saleh-Valenzuela model and the propagation graph model, to both simulated and measured data. The proposed method is able to estimate the parameters of both models accurately in simulations, as well as when applied to 60 GHz indoor measurement data.