Presentations are 5 minutes long, with 3 more minutes for Q&A. Below you can find the abstracts for the talks.
I will introduce some challenges in the analysis of extreme values and discuss some new ways to tackle them.
Continuous-time interaction data, which records both the interacting nodes and the occurrence time of each interaction, demands modelling capabilities for any node at any continuous time point. Stochastic block model-based Hawkes processes (HPs) are common choices for such data: they use Hawkes processes to model the interactions between nodal pairs, and the static output of a stochastic block model to describe the complex background environment (i.e., exogenous effects). However, the constant exogenous effects used in these approaches prevent the models from describing the complex, time-evolving environment in which the interactions occur. In this paper, we propose Hawkes-Dirichlet Propagated Processes (Hawkes-DirPPs) to learn time-evolving exogenous effects in HPs for continuous-time interaction data. Hawkes-DirPPs assume each node has a piecewise-constant community membership distribution function, which is updated through a Dirichlet distribution whenever the node initiates interactions with others. These membership distribution functions are then combined with a community compatibility matrix to define a time-evolving exogenous rate function in the HPs. We develop a mechanism that propagates latent counts backward and samples random variables forward in continuous time, enabling efficient Gibbs sampling for Hawkes-DirPPs. The merits of Hawkes-DirPPs include describing time-evolving exogenous effects, modelling complex dependencies between nodes, and efficient Gibbs sampling. These advantages have been validated through link prediction tasks and extensive latent variable visualisations.
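The conditional intensity at the heart of this construction can be illustrated with a minimal sketch. The exponential excitation kernel, the parameter values, and the function names below are illustrative assumptions, not the paper's exact specification; the piecewise-constant `mu` stands in for the time-evolving exogenous rate:

```python
import numpy as np

def hawkes_intensity(t, events, mu_fn, alpha=0.5, beta=1.0):
    """Conditional intensity lambda(t) = mu(t) + sum_{t_i < t} alpha * exp(-beta * (t - t_i)).

    mu_fn is a (possibly piecewise-constant) exogenous rate function,
    standing in for the time-evolving exogenous effect described above.
    """
    past = np.asarray([ti for ti in events if ti < t])
    return mu_fn(t) + alpha * np.exp(-beta * (t - past)).sum()

# Piecewise-constant exogenous rate: 0.2 before t = 5, 0.8 after (illustrative values,
# mimicking a membership-distribution update at t = 5).
mu = lambda t: 0.2 if t < 5.0 else 0.8

events = [1.0, 4.0, 6.0]        # observed interaction times
lam = hawkes_intensity(7.0, events, mu)
```

Before any event has occurred the intensity equals the exogenous rate alone; after events, the self-exciting term adds a decaying contribution on top of it.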
Many proposals are now available to model complex data, in particular thanks to recent advances in computational methodologies and algorithms that make it possible to work with complicated likelihood functions in a reasonable amount of time. However, it is in general difficult to analyse data characterised by complicated forms of dependence. Copula models have been introduced as probabilistic tools to describe a multivariate random vector via its marginal distributions and a copula function which captures the dependence structure among the vector components. This relies on Sklar's theorem, which states that any d-dimensional absolutely continuous density can be uniquely represented as the product of the marginal distributions and the copula function. Bayesian methods for analysing copula models tend to be computationally intensive or to rely on the choice of a particular copula function, in particular because methods of model selection are not yet fully developed in this setting. We will present a general method to estimate specific quantities of interest of a generic copula by adopting an approximate Bayesian approach based on an approximation of the likelihood function. Our approach is general, in the sense that it can be adapted to both parametric and non-parametric modelling of the marginal distributions, and can be generalised in the presence of covariates. It also allows one to avoid specifying the copula function. The class of algorithms proposed allows the researcher to model the joint distribution of a random vector in two separate steps: first the marginal distributions and then a copula function which captures the dependence structure among the vector components.
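The two-step idea — marginals first, dependence second — can be sketched in a few lines. Here the marginals are handled non-parametrically by rank transforms (pseudo-observations), and the copula-level quantity of interest is Spearman's rho, which depends on the copula alone; no copula function is ever specified. The data-generating process is purely illustrative:

```python
import numpy as np

def pseudo_observations(x):
    """Rank-transform one margin to (0, 1): u_i = rank(x_i) / (n + 1)."""
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / (n + 1)

rng = np.random.default_rng(0)
n = 2000
z = rng.standard_normal(n)
x = z + 0.3 * rng.standard_normal(n)            # margin 1: Gaussian
y = np.exp(z + 0.3 * rng.standard_normal(n))    # margin 2: heavily skewed

# Step 1: strip away the marginal distributions.
u, v = pseudo_observations(x), pseudo_observations(y)

# Step 2: a copula-only dependence summary, Spearman's rho = 12 E[UV] - 3.
rho_s = 12 * np.mean(u * v) - 3
```

Because `rho_s` is computed from ranks only, it is unchanged by any monotone transformation of the margins — exactly the separation of marginals and dependence that Sklar's theorem provides.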
We propose a computationally efficient framework for fitting generalized linear models and performing variable selection through a distributed system. The dataset is partitioned into numerous subsets, and maximum likelihood estimation is performed on each group. The resulting estimates are combined to form a pseudo-likelihood approximation to the full likelihood function, which can be linked to a penalization. A final estimation is conducted via a fast approximate inference method. We illustrate our preliminary investigations on prominent generalized linear models and penalization priors.
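A minimal sketch of the split-and-combine step, using the Gaussian linear model — the one GLM where information-weighted combination of subset estimates happens to reproduce the full-data MLE exactly; for other GLMs such a combination is only an approximation. All names and values below are illustrative, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, K = 1200, 3, 4
X = rng.standard_normal((n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 0.1 * rng.standard_normal(n)

# Partition into K subsets, compute each subset's MLE (here OLS, the Gaussian GLM),
# then combine the estimates weighted by their per-subset information matrices.
num, den = np.zeros(p), np.zeros((p, p))
for Xk, yk in zip(np.array_split(X, K), np.array_split(y, K)):
    info = Xk.T @ Xk                        # per-subset observed information
    bk = np.linalg.solve(info, Xk.T @ yk)   # per-subset estimate
    num += info @ bk
    den += info
beta_comb = np.linalg.solve(den, num)       # combined estimate

beta_full = np.linalg.solve(X.T @ X, X.T @ y)  # full-data MLE, for comparison
```

In the Gaussian case `info @ bk` collapses to `Xk.T @ yk`, so the combination recovers the full-data estimator exactly; the interest of the framework is that the same recipe still gives a good pseudo-likelihood approximation for non-Gaussian GLMs.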
This is joint work with Dr Matias Quiroz from the University of Technology Sydney and Feng Li from the Central University of Finance and Economics.
The response of groundwater levels to varying climate conditions is often analysed with physically-based models of differential equations. However, these models require detailed (and difficult to obtain) knowledge of subsurface structures and processes. The data available for the current project consist of numerous concurrent time series with varying degrees of irregularity in the timing of measurements, limited knowledge of the underlying physical system and processes, and predictors that are not all well measured; these physically-based methods are therefore not feasible. Performing data-driven exploratory analysis and prediction of the multiple time series together may bypass the need for this expensive understanding of the physical system. I will discuss a two-step analysis in which the time series are, first, clustered into groups that exhibit similar patterns over the duration of the study using an unsupervised neural network that is resilient to missing data; and, secondly, fed into a supervised neural network for prediction, in which the cluster membership of each time series becomes a predictor. The prevalent patterns observed in the clusters indicate that the groundwater in these areas is responding similarly to the unspecified external forcings.
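The two-step structure can be sketched with stand-ins for each step — a plain k-means in place of the unsupervised neural network, applied to toy series rather than groundwater data. Everything below is an illustrative assumption, not the talk's actual pipeline:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Minimal k-means over whole time series (stand-in for the unsupervised step)."""
    # Deterministic initialisation: spread initial centers across the rows.
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels

# Toy series: two groups with opposite seasonal patterns plus noise.
rng = np.random.default_rng(2)
t = np.linspace(0, 4 * np.pi, 60)
group_a = np.sin(t) + 0.1 * rng.standard_normal((10, 60))
group_b = -np.sin(t) + 0.1 * rng.standard_normal((10, 60))
series = np.vstack([group_a, group_b])

labels = kmeans(series, k=2)
# Step 2 (not shown): feed each series plus its one-hot cluster label
# into a supervised predictor.
```

The cluster label recovered in step one becomes an extra input feature in step two, letting one predictive model borrow strength across series that behave alike.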
From the book jacket:
"Ael t'Rllaillieu is a noble – and dangerous – Romulan commander. But when the Romulans kidnap Vulcans to genetically harness their mind power, Ael decides on treason. Captain Kirk, her old enemy, joins her in a secret pact to destroy the research laboratory and free the captive Vulcans. When the Romulans discover their plan, the Neutral Zone seethes with schemes and counter-schemes, sabotage and war!"
Like interstellar diplomacy, mathematical modeling cannot take appearances as truths. A model which cannot accurately answer our questions may have already betrayed us before a single datum is collected. In this talk we shall tap into some Vulcan mind power, and show how to use analytical methods in detecting model treachery before it can destroy our mission.
Spatio-temporal models are widely used in many research areas, including ecology. The proliferation of in-situ sensors in river networks is allowing space-time modelling for water quality monitoring in near real-time. In this work, we introduce a new family of additive spatio-temporal models, in which spatial dependence is established based on stream distance while temporal autocorrelation is accounted for using a vector autoregression approach. We illustrate these techniques using a case study of water temperature in the northwestern part of the United States.
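One plausible minimal form of such a model — an illustrative sketch only, not necessarily the authors' exact specification — writes the response $y_{s,t}$ at stream location $s$ and time $t$ as

```latex
y_{s,t} = \beta_0 + \sum_{j} f_j(x_{j,s,t}) + \delta_{s,t},
\qquad
\boldsymbol{\delta}_t = \Phi \, \boldsymbol{\delta}_{t-1} + \boldsymbol{\eta}_t,
\qquad
\operatorname{Cov}(\eta_{s,t}, \eta_{s',t}) = C\!\big(d(s, s')\big),
```

where the $f_j$ are smooth additive terms for the covariates, $\Phi$ drives the vector autoregression over time, and $C(\cdot)$ is a covariance function of the stream distance $d(s, s')$ between locations.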
Mechanistic models are useful for investigating cause-effect relationships and making predictions or forecasts in any field of science, but in ecology an added complication is that the data are often very noisy, so finding narrowly-constrained estimates of model parameters from the data is challenging. Bayesian inference approaches, which instead estimate probability distributions for the model parameters, can explicitly account for this issue, and the resulting distributions can be used to provide probabilistic predictions or forecasts - a topic of growing interest in ecology. In this (brief!) talk, I give a quick overview of how Bayesian inference can be applied to mechanistic models (focusing on ecological dynamics governed by Lotka-Volterra equations as a case study), as well as the types of insights that can be gained from this approach.
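For reference, the classical Lotka-Volterra predator-prey equations, with prey abundance $x$ and predator abundance $y$, are

```latex
\frac{dx}{dt} = \alpha x - \beta x y,
\qquad
\frac{dy}{dt} = \delta x y - \gamma y,
```

where $\alpha, \beta, \gamma, \delta > 0$ are the mechanistic parameters whose posterior distributions Bayesian inference would target.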
Design of experiments is critical for any experimental study, yet it does not appear to receive as much attention in statistical research as perhaps it ought to. In this brief 5-minute talk, I'll present a software framework for specifying the design of comparative experiments, which I am developing and implementing as an R package called edibble.
A monoid is a semigroup with identity. It is known that every finite Markov chain can be expressed as a random walk on a monoid, although the representation is not unique. Moreover, if the monoid has a certain special structure, then the eigenvalues of the transition matrix can be obtained by studying the representation theory of the corresponding monoid. We apply this machinery to study the eigenvalues of the transition matrix of a class of non-local random walks on $Z^n_q$, for arbitrary integer $q\geq 1$.
A new paradigm in time series forecasting has recently emerged, simply named "Global Models". Global models impose the seemingly counterintuitive constraint that all time series in a given set of interest follow the exact same generating process, even when working with datasets containing thousands of time series. A single predictive function for all time series is picked from a large class (such as large autoregressive linear models, neural networks or decision trees) by fitting it to the data in a completely data-driven fashion. We will show that this paradigm can represent complex phenomena with the same accuracy as classical forecasting methods that model each series individually, and provide some theoretical background on the statistical trade-off between classical and global methodologies. Global models exhibit superior accuracy in many practical scenarios, and have recently been applied to predicting the main COVID-19 variables (cases, deaths, etc.). The COVID-19 example will be used to illustrate and motivate the methodology.
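The core idea — one autoregressive function fitted jointly to every series in the set — can be sketched with a linear global model on simulated series. All names and values are illustrative:

```python
import numpy as np

def ar_features(series_set, p=3):
    """Stack lagged windows from every series into one pooled design matrix."""
    X, y = [], []
    for s in series_set:
        for i in range(p, len(s)):
            X.append(s[i - p:i])   # last p observations
            y.append(s[i])         # next observation
    return np.array(X), np.array(y)

rng = np.random.default_rng(3)
# 50 short series from the same AR(1)-style process: the "global" assumption holds.
series_set = []
for _ in range(50):
    s = [rng.standard_normal()]
    for _ in range(30):
        s.append(0.8 * s[-1] + 0.2 * rng.standard_normal())
    series_set.append(np.array(s))

X, y = ar_features(series_set)
# One linear autoregressive model fitted jointly to all series at once.
coef = np.linalg.lstsq(X, y, rcond=None)[0]
```

Each individual series here is too short to estimate its own model reliably; pooling all of them behind a single predictive function is what makes the lag-1 coefficient recoverable — the statistical trade-off the talk discusses.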
During an epidemic or pandemic, an important quantity to infer is Reff, the effective reproduction number. Many factors complicate this, one of which is that populations are not homogeneous. To address this, we perform inference at two levels: within-household and between-household. The within-household inference uses data from the First Few Hundred (FF100 or FFX) infected households; the between-household inference uses data on each time a new household is infected. I will very briefly discuss the methods used, as well as some lessons learned during the COVID-19 pandemic.