Student Presentations

You can find the schedule and abstracts below. The appropriate Zoom link can be found in the schedule.

Monday 1st of November

Chair: Behrooz Niknami

Tuesday 2nd of November

Chair: Prosha Rahman

Monday

3:30 PM

Achini Erandi - The University of Melbourne

Finding minimum staffing requirement of a blood donor centre

Australian Red Cross Lifeblood collects blood almost entirely from non-remunerated voluntary donors. It therefore aims to improve donor satisfaction by reducing waiting times while optimising staff hours. Aligning donor arrivals, staff capacity, and shifts is a key step towards reducing waiting times. We start by developing a simulation model that captures almost all of the uncertainty in the donation process and use it to compute the average waiting time. Our objective is to implement a method for determining the optimal staff roster based on the predicted staffing demand via two phases. First, we establish minimum staffing requirements to ensure that the system's predicted average waiting time does not exceed a certain limit. In the second phase, we find an optimal staff roster that meets the minimum staffing requirements. I shall present how we apply the simulated annealing algorithm to find the minimum staffing requirements and use sensitivity analysis to determine the key parameters.
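For readers unfamiliar with the optimisation step, a generic simulated annealing loop has the following shape (an illustrative sketch only; the toy cost function and neighbourhood move below are placeholders, not the authors' staffing model):

```python
import math
import random

random.seed(1)

def simulated_annealing(cost, neighbour, x0, temp=1.0, cooling=0.95, iters=500):
    # Accept any improvement; accept a worse candidate with probability
    # exp(-delta / temperature), then cool the temperature geometrically.
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    for _ in range(iters):
        y = neighbour(x)
        fy = cost(y)
        if fy <= fx or random.random() < math.exp(-(fy - fx) / temp):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        temp *= cooling
    return best, fbest

# Toy example: search for the integer staff level minimising a convex cost.
level, value = simulated_annealing(
    cost=lambda s: (s - 7) ** 2,
    neighbour=lambda s: max(0, s + random.choice([-1, 1])),
    x0=20,
)
```

In the staffing setting the cost would instead be the simulated average waiting time penalised by staff hours, and a neighbour would perturb one shift's staffing level.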

3:40 PM

Wala Areed - Queensland University of Technology

Spatial Statistical Machine Learning Models to Assess Health Vulnerabilities in Children

The health and development of children during their first year of school is known to impact their social, emotional, and academic capabilities throughout and beyond primary education. Physical health, motor development, social and emotional well-being, learning styles, language and communication, cognitive skills, and general knowledge are all considered to be important aspects of a child's health and development. Understanding the factors that affect health vulnerabilities among children is crucial in helping hospitals and managers shape policy. The aim of this study is to investigate the relationships between health vulnerabilities among children in Queensland and educational factors using different spatial statistical machine learning methods. In 2018, around 26% of children in Queensland were developmentally vulnerable in at least one domain, while preschool attendance was around 75.4%, the lowest percentage in all of Australia.

3:50 PM

Fan Cheng - Monash University

Manifold learning with approximate nearest neighbors

Analyzing high-dimensional data with manifold learning algorithms often requires searching for the nearest neighbors of all observations. This presents a computational bottleneck when data size is large or when observations lie in more general metric spaces, such as statistical manifolds. We resolve this problem by proposing a broad range of approximate nearest neighbor (ANN) methods to be used within manifold learning. The novelty of our evaluation of ANN algorithms is the manifold learning setting, with algorithms compared on the basis of embedding accuracy. A second novel contribution is to use ANN for statistical manifolds by exploiting the connection between the Hellinger/total variation distance for discrete distributions and the L2/L1 norm. A thorough empirical investigation of the benchmark MNIST dataset shows that ANN algorithms substantially improve computational time with little to no loss in the accuracy of the manifold learning embedding. The result is robust to different manifold learning algorithms, different approximate nearest neighbor algorithms, and different measures of embedding accuracy. The proposed method is applied to learning statistical manifolds of electricity usage. This application demonstrates how underlying structures in high-dimensional data, including anomalies, can be visualized and identified, in a way that is scalable to large datasets.
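The distance connection exploited here can be written down directly: for discrete distributions, the Hellinger distance is a scaled L2 norm of square-root-transformed probability vectors, and total variation is half the L1 norm, so standard Euclidean/Manhattan ANN indexes apply after a simple transform (a minimal sketch, not the authors' code):

```python
import numpy as np

def hellinger(p, q):
    # H(p, q) = (1/sqrt(2)) * ||sqrt(p) - sqrt(q)||_2, so an L2 ANN index
    # built over the sqrt-transformed vectors recovers Hellinger neighbours.
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

def total_variation(p, q):
    # TV(p, q) = (1/2) * ||p - q||_1, i.e. an L1 ANN search with no transform.
    return 0.5 * np.abs(p - q).sum()

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.4, 0.3])
```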

4:00 PM

Chenchen Xing - The University of Melbourne

Pricing for perishable goods in a queueing system

We introduce a pricing model for a monopoly retailer selling perishable goods to strategic customers. Customers arrive according to a Poisson process and, upon arrival, decide whether to purchase perishable goods or leave based on a reward-cost structure. The system is formulated as a level-dependent quasi-birth-and-death process, and its steady-state probabilities are derived using matrix-geometric methods.

The interaction between customers and the retailer is modeled as a Stackelberg game in which the retailer moves first by assigning different prices to fresh and non-fresh goods in order to maximise total expected revenue. Customers then follow individual equilibrium behaviours that maximise their expected net payoffs. I will present numerical experiments examining the system's sensitivity to several key parameters.

4:10 PM

Laurence Davie - Queensland University of Technology

Bayesian Detectability of Airborne Induced Polarisation using Reversible Jump Sequential Monte Carlo

Detection of induced polarisation (IP) effects in airborne electromagnetic (AEM) measurements does not yet have an established methodology. This contribution develops a Bayesian approach to the IP-detectability problem using decoupled transdimensional layered models, and applies an approach novel to geophysics whereby transdimensional proposals are used within the embarrassingly parallelisable and robust static Sequential Monte Carlo (SMC) class of algorithms for the simultaneous inference of parameters and models. We refer to this algorithm as Reversible Jump Sequential Monte Carlo (RJSMC). The statistical methodological contributions to the algorithm account for adaptivity considerations across multiple models and proposal types, especially surrounding particle impoverishment in unlikely models. Methodological contributions to solid Earth geophysics include the decoupled model approach and the proposal of a statistic that uses posterior model odds for IP detectability. A case study is included investigating the detectability of IP effects in AEM data at a broad scale.

4:20 PM

Wathsala Karunarathne - The University of Melbourne

Scheduling Customers When the System Accepts Random Arrivals

We study a single server system with a finite buffer size that accepts scheduled and random customers. Our objective is to determine the optimal schedule for N customers so that the total expected cost of customers' waiting and that of the server idling is minimised, whilst the system's revenue is maximised. We generate results for different numbers of customers for systems with different parameters. Moreover, we investigate the system's performance for different objectives and give a better insight into how the optimal inter-arrival times and the optimal costs vary with different parameters.

4:30 PM

Jacob Priddle - Queensland University of Technology

A Bayesian Personalised Risk Model for Liver Fluke Infection

Precise and focused risk assessments of liver fluke infection in cattle can be used to increase awareness and promote management uptake. However, accurate estimation of the risk of liver fluke infection in cattle is challenging. Liver fluke disease in adult cattle is typically subclinical -- meaning infected animals often have no visible symptoms. This is exacerbated by the complex liver fluke lifecycle, which is highly sensitive to climate conditions and requires the presence of the intermediate snail host. The aim of this work was to create a predictive modelling tool for the location-specific risk of infection with respect to climate conditions. The study utilised abattoir data on over 4 million cattle processed between 2016 and 2020 at Teys' abattoirs. A binary indicator variable for liver fluke infection (liver fluke or no liver fluke) was observed at processing for each animal, with no further information as to when the infection may have occurred. By viewing the data within a survival analysis framework, we propose a Bayesian proportional odds model to predict the risk of infection depending on the age and location of the animal. The model updates the estimated risk of infection according to location-specific and time-varying covariates. The model output was used to create personalised risk profiles for each postcode. The work in this study can be used to help inform farmers about the risk of infection on their property.

4:40 PM

Ryan Kelly - Queensland University of Technology

Implementing Bayesian Synthetic Likelihood within the Engine for Likelihood-Free Inference

Bayesian Synthetic Likelihood (BSL) is a statistical method for inferring the parameters of simulation-based models with an intractable likelihood function. Writing accessible, high-quality code for computational statistics is vital to research in fields that rely on advanced statistical methods. ELFI (Engine for Likelihood-Free Inference) is an international software project in which the community is implementing likelihood-free methods in Python to make them more accessible to practitioners. This talk will give an overview of how these BSL methods were implemented within ELFI, with example applications in ecology and biology.

4:50 PM

Prosha Rahman - UNSW Sydney

Linear regression in the big data context

Big data samples are convenient and large, but typically suffer from coverage issues. We can "fill in" the uncovered responses through non-parametric means, thereby creating a pseudo-census. Models, and their associated estimators, can then be fitted to the pseudo-census data. In this seminar we utilise the KNN algorithm to impute the uncovered responses and show its effects in the multivariate linear regression setting.
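As a rough illustration of the approach (a hypothetical sketch with made-up data, not the seminar's implementation), uncovered responses can be imputed with a KNN mean and the linear regression then fitted to the resulting pseudo-census:

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_impute(X_cov, y_cov, X_unc, k=5):
    # Impute each uncovered response with the mean response of its
    # k nearest covered neighbours (Euclidean distance).
    y_hat = np.empty(len(X_unc))
    for i, x in enumerate(X_unc):
        d = np.linalg.norm(X_cov - x, axis=1)
        y_hat[i] = y_cov[np.argsort(d)[:k]].mean()
    return y_hat

# Made-up covered sample and uncovered units from y = 1 + 2*x1 - x2 + noise.
X_cov = rng.normal(size=(200, 2))
y_cov = 1.0 + 2.0 * X_cov[:, 0] - X_cov[:, 1] + rng.normal(0, 0.1, 200)
X_unc = rng.normal(size=(50, 2))

# Pseudo-census: covered responses plus KNN-imputed ones.
X_all = np.vstack([X_cov, X_unc])
y_all = np.concatenate([y_cov, knn_impute(X_cov, y_cov, X_unc)])

# Fit the linear regression to the pseudo-census by least squares.
design = np.column_stack([np.ones(len(X_all)), X_all])
beta, *_ = np.linalg.lstsq(design, y_all, rcond=None)
```

The fitted coefficients approximate the true (intercept, slope) values despite the imputation step; studying that approximation error in the multivariate setting is the subject of the talk.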

Tuesday

11:30 AM

Aishwarya Bhaskaran - University of Technology Sydney

Usable and precise asymptotics for generalized linear mixed model analysis

Despite the large volume of research concerning generalized linear mixed models, there is very little theory concerning the statistical properties of maximum likelihood estimators of these models.

In this talk, an overview of how we derive precise asymptotic results that are directly usable for confidence intervals and Wald hypothesis tests for likelihood-based generalized linear mixed model analysis will be presented. The essence of our approach is to derive the exact leading term behaviour of the Fisher information matrix when both the number of groups and number of observations within each group diverge. This leads to asymptotic normality results with simple studentizable forms.

11:40 AM

Behrooz Niknami - The University of Melbourne

Tractable Performance Measures for Stochastic Matching Models

Stochastic Matching Models are ledgers that track and match items submitted over time. Matches are based on items' compatibility and a predetermined priority discipline. Depending on these settings, matching models can describe a diverse range of phenomena in business or healthcare, such as the double auctions underlying stock markets or organ donation registers. We will explore how a novel reversibility argument, first proposed by Adan et al. (2018), can be used to find tractable performance measures for these systems. We will then explain how these could be used to optimise performance by, for instance, minimising congestion.

11:50 AM

Trevor Matthews - The University of Adelaide

Estimating Ambulance Drive Time

SA Ambulance Service has a multi-tiered series of ambulance resources that the Ambulance Dispatcher must allocate to incoming requests for service. My research aims to explore predictive modelling to optimise the selection of the right clinical resource based on historical data. Proving that this approach is "better" than the current response process requires some form of analysis.

Theoretical analysis of the benefit of a proposed change to dispatch rules in an emergency service is complicated by multiple periodic functions of case frequency, the requirement to model travel times within the service period, and the geo-spatial influences involved. This, coupled with the risk-averse nature of these services with respect to changing established practice, may require real-world simulation (or the development of a 'sim-city') of the environment to enable comparison between the current dispatch policies and the proposed changes, and a resulting cost/benefit analysis, in a manner acceptable to the service provider.

Simulating the real-world movement of resources in this environment is not as simple as using online mapping tools. One example of the difficulty is that online estimates of travel time are not realistic compared to observed real-world emergency (lights-and-sirens) driving. This talk will discuss the operational research of interest to the researcher, and a model developed to provide a reasonable estimate of response time under emergency driving rules.

12:00 PM

Jamie Hogg - Queensland University of Technology

Modelling cancer risk factors and incidence in Australia

There is significant evidence of spatial variation in the incidence of cancer in Australia. However, we are yet to understand the mechanism behind this variation. A common hypothesis is that the spatial variation of cancer incidence may be explained by the spatial variation in the prevalence of common cancer risk factors, such as obesity, smoking, poor diet and insufficient physical activity. To begin to understand the relationship between cancer incidence and risk factors, we must model the relationships at the small area level. The Australian Cancer Atlas presents smoothed SA2-level cancer incidence data, but we do not have access to the corresponding SA2-level cancer risk factor data.

In this talk I will give further details of this gap and introduce the key project goals which aim to tackle it. Finally, I will discuss some of the initial data issues.

12:10 PM

Virginia He - University of Technology Sydney

Bayesian Generalized Additive Models Selection Including a Fast Variational Option

Large and complex data sets will continue to grow due to ongoing technological advancements. For flexible and interpretable regression models for such data, a problem of paramount importance is choosing among the candidate predictors and categorising their effects as zero, linear or non-linear. Using paradigms such as spike-and-slab (group) least absolute shrinkage and selection operator priors, Markov chain Monte Carlo and mean field variational Bayes, we derive scalable algorithms for reliable three-category generalized additive model selection. An R package named gamselBayes accompanies our methodology.

12:20 PM

Owen Forbes - Queensland University of Technology

Extending Bayesian Model Averaging methodology for use with multiple clustering methods

A variety of methods have been developed within the ensemble and consensus clustering literature to combine inference across multiple sets of unsupervised clustering results. The approach of reporting results selected from one 'best' model out of several candidate clustering models ignores the uncertainty that arises from model selection, and results in inference that is sensitive to the particular model and parameters chosen, especially for small-sample data. Bayesian model averaging (BMA) is a popular approach for combining results across multiple models that offers some attractive benefits in this setting, including intuitive probabilistic interpretation of an overall cluster structure integrated across multiple sets of clustering results, with quantification of model-based uncertainty.

Previous application of BMA for clustering has been developed in the context of finite mixture models, using the Bayesian Information Criterion (BIC) to approximate model evidence for weighted averaging of results across selected models. In this work we propose an extension to BMA methodology to enable weighted model averaging across results from multiple clustering algorithms, using a combination of clustering internal validation criteria in place of the BIC to weight results from each model. We present results exploring the utility of this approach with a case study applying BMA across results from several popular unsupervised clustering algorithms, to identify robust subgroups of individuals based on electroencephalography (EEG) data. We also use simulated clustering datasets to explore the utility of this technique to identify robust integrated clusters.

12:30 PM

Vektor Dewanto - The University of Queensland

Reinforcement learning from transient states: a discounting-free approach

Typical reinforcement learning relies on a discount factor and assumes environments whose states are all recurrent (visited infinitely often). We aim to refine this tradition by proposing a new policy gradient method capable of finding a policy that is nearly optimal not only in recurrent states but also in transient ones. It is also discounting-free, as it optimises Veinott's objective (which originated in dynamic programming). Experimental results provide insights into the fundamental mechanisms of our proposal.

12:40 PM

Raiha Browning - Queensland University of Technology

A Bayesian trans-dimensional approach to model Hawkes processes in discrete time

Hawkes processes are a form of self-exciting process, where events in the process have the ability to trigger future events. They have been considered for numerous applications, including neuroscience, infectious disease, seismology, and terrorism. While these self-exciting processes have a simple formulation, they are able to model incredibly complex phenomena. Traditionally Hawkes processes are a continuous-time process, however, we enable these models to be applied to a wider range of problems by considering a discrete-time variant of Hawkes processes. This analysis presents a novel discrete-time Hawkes process, utilising Bayesian trans-dimensional techniques, in particular a flexible random histogram prior, to estimate the excitation patterns of the process. We illustrate the utility of this model through a substantive case study.
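To make the model class concrete (a minimal simulation sketch; the kernel and parameter values below are illustrative, not those of the talk), a discrete-time Hawkes process draws Poisson counts whose intensity is a baseline plus a weighted sum of recent past counts:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_discrete_hawkes(mu, kernel, T):
    # y_t ~ Poisson(lambda_t) with
    # lambda_t = mu + sum_{d=1}^{len(kernel)} kernel[d-1] * y_{t-d},
    # so each past event raises the intensity of nearby future steps.
    y = np.zeros(T, dtype=np.int64)
    for t in range(T):
        lam = mu + sum(kernel[d - 1] * y[t - d]
                       for d in range(1, min(t, len(kernel)) + 1))
        y[t] = rng.poisson(lam)
    return y

# Decaying excitation kernel; total weight below 1 keeps the process stable.
counts = simulate_discrete_hawkes(mu=0.5, kernel=[0.3, 0.15, 0.05], T=200)
```

The talk's trans-dimensional approach instead estimates the kernel itself via a random histogram prior, rather than fixing its shape as done here.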

12:50 PM

Punya Alahakoon - The University of Melbourne

Maritime and quarantine in Australia during the flu pandemic of 1918-19: perspectives based on mathematical modelling

The influenza pandemic of 1918-19 was one of the most devastating pandemics of the 20th century, killing an estimated 50-100 million people worldwide. In Australia, the death toll was estimated to be 15,000.

In late 1918, when the severity of the disease was apparent, the Australian Quarantine Service was established as the first line of defence to prevent the virus from reaching Australia. Vessels that had travelled overseas and interstate were intercepted, and people were examined and quarantined for a few days. Some of these vessels carried the infection throughout their voyage, and cases were still prevalent by the time the ship arrived at a quarantine station. I focus on records regarding the influenza outbreaks submitted to the Quarantine Service by medical officers on board and by quarantine staff. In this talk, I will show how data pre-processing, mathematical modelling, and statistical inference can be used to understand the dynamics of influenza in contained environments such as ships.