We are living in a time of unprecedented change, from the climate to the healthcare system. In the environment, this takes the form of ecological change: a warming planet and intensifying ecosystem disturbance, from wildfire to drought to flooding. At the same time, we now collect and have access to ecological and environmental data at a scale never seen before. We are privileged to be able to contribute solutions that reduce and tackle these problems. Advances in machine learning (ML) and artificial intelligence (AI) present an opportunity to build better tools to help address some of the world's most pressing challenges and deliver positive social impact.
Learn more about our research in applying ML to medical and environmental challenges alongside our exciting research in the area of fundamental machine learning.
Acknowledgement: We would like to thank our collaborators and funders, who make these projects possible.
PhD Student: Ben Halstead. Pollution from wood burners has serious health implications for residents of rural towns, even in developed countries. Monitoring the level of airborne particulate matter, PM2.5, in these areas often requires making inferences about missing or corrupted readings. Air quality inference in these cases poses two key challenges. Firstly, air quality displays non-linear spatio-temporal relationships that depend on many factors. Secondly, these factors can evolve over time, changing the distribution of the data. For example, a change in wind direction can have a large impact on which neighboring sensors are most relevant to inference. Methods that incorporate environmental factors to capture these changes, e.g. weather, traffic and points of interest, have found success in urban environments. However, many locations have access to few, if any, of these features, so inference methods must employ alternative approaches to detect and adapt to changes. We propose a data stream based system, called AirStream, that infers missing PM2.5 levels and is able to detect and adapt to changes in unknown features. We deployed our approach on two air quality studies in New Zealand rural towns, and also tested it on a Beijing benchmark data set. We found gains in inference performance when comparing AirStream against seven baseline methods. We further investigated the relationship between the changes we detected and changes in underlying weather conditions, and discovered a strong predictive link between the state of our system and current meteorological conditions. This project is part of Royal Society Marsden Fast-Start. Supervisors: Assoc Prof Yun Sing Koh, Dr Pat Riddle, Prof Mykola Pechenizkiy (TU/e Eindhoven), Prof Albert Bifet (Waikato).
Keywords: Air Pollution, Data Stream Mining, Continual Learning
In partnership with Dr Guy Coulson and Gustavo Olivares | NIWA
PhD Student: Olivier Graffeuille. The monitoring of water quality is an important field with impacts on local ecosystems, aquaculture and human health. An efficient way of monitoring water quality is to estimate the concentration of water constituents using remote sensing data, such as satellite data. However, this task is difficult due to (1) the limited labels available to train models, (2) its ill-posed nature, whereby different combinations of water constituents can produce the same optical signal, and (3) the limited transferability of models between water bodies with different characteristics. Our research aims to develop machine learning techniques to overcome these challenges. This project is part of the MBIE Taiao Programme, https://taiao.ai/. Supervisors: Assoc Prof Yun Sing Koh, Dr Jorg Wicker, Dr Moritz K Lehmann (Xerra, Waikato).
Keywords: Water Quality, Semi-supervised learning, Transfer Learning
PhD Student: Ocean Wu (Current), MS Data Science: Johnson Zhou (2021). Postdoc: Thomas Lacombe (2019). Many applications deal with data streams. Data streams can be perceived as a continuous sequence of data instances, often arriving at a high rate. In data streams, the underlying data distribution may change over time, causing decay in the predictive ability of the machine learning models. This phenomenon is known as concept drift.
Moreover, previously seen concepts commonly recur in real-world data streams, a phenomenon known as recurrent concept drift. If a concept reappears, for example a particular weather pattern, previously learnt classifiers can be reused, which can improve the performance of the learning algorithm.
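The idea of reusing a previously learnt classifier when a concept recurs can be illustrated with a small sketch. This is a toy example under stated assumptions, not the Scikit-ika API or the method used in this project: `MajorityModel`, `run_stream`, the window size and both thresholds are hypothetical names and values chosen for illustration. Drift is flagged when accuracy over a sliding window falls below a threshold; the stored models are then scored on the most recent labels, and the best one is reused if it fits well enough.

```python
from collections import Counter, deque


class MajorityModel:
    """Toy classifier (hypothetical): predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, y):
        self.counts[y] += 1

    def predict(self):
        return self.counts.most_common(1)[0][0] if self.counts else None


def accuracy(model, labels):
    """Fraction of the given labels that the model would predict correctly."""
    if not labels:
        return 0.0
    pred = model.predict()
    return sum(1 for y in labels if y == pred) / len(labels)


def run_stream(labels, window=20, drift_threshold=0.5, reuse_threshold=0.8):
    """Process a label stream, keeping a pool of models for recurring concepts."""
    pool = [MajorityModel()]
    active = pool[0]
    hits = deque(maxlen=window)          # 1 if the active model was correct
    recent = deque(maxlen=window // 2)   # latest labels, used to score old models
    log = []                             # (time step, "new" or "reuse")

    for t, y in enumerate(labels):
        # Test-then-train: predict first, then learn from the true label.
        hits.append(1 if active.predict() == y else 0)
        recent.append(y)
        active.learn(y)

        # Drift is suspected when windowed accuracy drops below the threshold.
        if len(hits) == window and sum(hits) / window < drift_threshold:
            candidates = [m for m in pool if m is not active]
            best = max(candidates, key=lambda m: accuracy(m, recent), default=None)
            if best is not None and accuracy(best, recent) >= reuse_threshold:
                active = best            # recurring concept: reuse a stored model
                log.append((t, "reuse"))
            else:
                active = MajorityModel() # unseen concept: start a new model
                pool.append(active)
                log.append((t, "new"))
            hits.clear()

    return pool, log


# Concept A (label 0), then concept B (label 1), then concept A again.
pool, log = run_stream([0] * 100 + [1] * 100 + [0] * 100)
print(log)        # one "new" event for B, one "reuse" event when A returns
print(len(pool))  # two models are stored in total
```

In this sketch the second appearance of concept A does not trigger training from scratch; the model learnt during the first 100 instances is selected again. A real system would use a proper drift detector and richer models than a majority-label predictor.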
Scikit-ika is an open-source implementation of methods for handling recurrent concept drifts. It continuously models evolving data streams, providing accurate predictions in real time, using probabilistic networks and meta-information to proactively predict a change in the data stream. The code developed for this project is available on GitHub and, as stated in the initial proposal, released as part of an open-source Python library: https://scikit-ika.github.io/. This project is funded by ONRG Global. Supervisors: Assoc Prof Yun Sing Koh, Prof Gillian Dobbie.
PhD students: Callum Cory (Current). The research aims to develop cutting-edge graph analytics approaches for modelling and analysing brain networks. Existing research has demonstrated the capabilities of network-based methods in understanding brains; however, graph analytics for brain networks is still in its infancy. The research will develop novel graph-based deep learning models. Compared to shallow models, deep models are more effective in capturing the highly non-linear structures in networks and modelling subtle characteristics. Supervisors: Assoc Prof Yun Sing Koh, Assoc Prof Kelly Ke (NTU, Singapore), Dr Miao Qiao, Dr Diana-Bernavidas-Prado.
MSc student: Hamish Huggard (2020). In many machine learning applications, the relationship being modelled may change over time, a phenomenon called concept drift. Most existing approaches to handling concept drift have assumed an artificially narrow specification of the problem. In this project we explore new approaches to concept drift and introduce several new algorithms to help bridge academic concept drift research and real data science applications. As a motivating example for our investigation, we consider a medical clinic where a decision support system helps clinicians triage patients referred by GPs. As the triage policy evolves, the decision support system should be able to detect that its model has become outdated, and signal to a human expert that it requires retraining. This project is funded by Precision Driven Health. Supervisors: Assoc Prof Yun Sing Koh, Prof Gillian Dobbie.
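To make the "detect that the model is outdated and signal for retraining" step concrete, the sketch below monitors a model's running error rate in the style of the Drift Detection Method (DDM). This is a swapped-in, well-known baseline used only for illustration; it is not one of the algorithms developed in this project. The class name `DDMStyleMonitor`, the status strings and the default levels are assumptions.

```python
import math


class DDMStyleMonitor:
    """Error-rate monitor in the style of DDM (illustrative, hypothetical class)."""

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.n = 0                # instances seen so far
        self.errors = 0           # wrong predictions so far
        self.p_min = math.inf     # error rate at the best point seen
        self.s_min = math.inf     # its standard deviation

    def update(self, error):
        """Record one outcome (True = wrong prediction) and return a status."""
        self.n += 1
        self.errors += int(error)
        if self.n < self.min_instances:
            return "ok"           # too few instances for a stable estimate

        p = self.errors / self.n              # running error rate
        s = math.sqrt(p * (1 - p) / self.n)   # binomial standard deviation

        # Remember the lowest error level observed so far.
        if p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = p, s

        # A large rise above that level suggests the model is outdated.
        if p + s > self.p_min + self.drift_level * self.s_min:
            return "retrain"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "ok"


# Simulated outcomes: 10% errors while the policy is stable, then every
# prediction is wrong once the (hypothetical) triage policy has changed.
outcomes = [i % 10 == 0 for i in range(200)] + [True] * 100
monitor = DDMStyleMonitor()
for i, wrong in enumerate(outcomes):
    if monitor.update(wrong) == "retrain":
        print(f"Instance {i}: model looks outdated, ask an expert to retrain")
        break
```

In a clinical setting the "retrain" status would be routed to a human expert instead of triggering automatic retraining, which matches the motivating example above. After retraining, a fresh monitor would be created so that the statistics of the old model do not carry over.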