We are looking into improving gaze estimation on smartphones using their front camera. We are designing dynamic, personalised calibration techniques that adjust the prediction algorithm as environmental conditions and holding postures change. We are also interested in designing different types of gaze interfaces and investigating their usability on smartphones to support real-time applications.
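As a minimal sketch of what a lightweight, personalised re-calibration step could look like, the snippet below fits an affine correction on top of a generic gaze estimator's raw predictions whenever a few fresh calibration points are collected (for example, after a change of holding posture). The function names and the affine form are illustrative assumptions, not the actual technique we deploy.

```python
# Sketch only: re-fit a small affine correction from a handful of on-screen
# calibration taps, then apply it to subsequent raw gaze predictions.
import numpy as np

def fit_affine_correction(raw_points, true_points):
    """Least-squares affine map from raw gaze predictions to calibration targets.

    raw_points, true_points: (N, 2) arrays of screen coordinates collected
    during a brief calibration phase (e.g. the user taps N on-screen dots).
    """
    X = np.hstack([raw_points, np.ones((len(raw_points), 1))])  # add bias term
    W, *_ = np.linalg.lstsq(X, true_points, rcond=None)         # (3, 2) affine parameters
    return W

def apply_correction(raw_point, W):
    """Correct a single raw gaze prediction with the fitted affine map."""
    x = np.append(raw_point, 1.0)
    return x @ W

# Example: re-calibrate with five tapped dots, then correct a new prediction.
raw = np.array([[0.12, 0.20], [0.80, 0.22], [0.50, 0.55], [0.15, 0.85], [0.83, 0.88]])
true = np.array([[0.10, 0.18], [0.82, 0.20], [0.50, 0.52], [0.12, 0.88], [0.85, 0.90]])
W = fit_affine_correction(raw, true)
print(apply_correction(np.array([0.45, 0.60]), W))
```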
Sensor-based human activity recognition (HAR) is having a significant impact on a wide range of applications in smart cities, smart homes, and personal healthcare. Wide deployment of HAR systems, however, often faces the challenge of annotation scarcity. To tackle this problem, we have proposed several unsupervised domain adaptation techniques in which activity knowledge from a well-annotated domain is transferred to a new, unlabelled domain.
ContrasGAN uses bi-directional generative adversarial networks for heterogeneous feature transfer and contrastive learning to capture distinctive features between classes. We evaluate ContrasGAN on three commonly used HAR datasets under conditions of cross-body, cross-user, and cross-sensor transfer learning. Experimental results show that ContrasGAN outperforms a number of state-of-the-art techniques on all these tasks, with relatively low computational cost.
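The class-level contrastive objective mentioned above can be illustrated with a standard supervised contrastive loss: embeddings of samples from the same activity class are pulled together and those of different classes are pushed apart. This is a generic sketch of such a loss, not ContrasGAN's exact formulation, and the GAN components are omitted.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss.

    features: (N, D) embeddings; labels: (N,) activity class ids.
    """
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / temperature                          # (N, N) scaled cosine similarities
    n = z.size(0)
    mask_self = torch.eye(n, dtype=torch.bool, device=z.device)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~mask_self   # same-class pairs
    sim = sim.masked_fill(mask_self, float('-inf'))        # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(mask_self, 0.0)        # avoid -inf * 0 on the diagonal
    pos_counts = pos.sum(1)
    valid = pos_counts > 0                                  # anchors with at least one positive
    loss = -(log_prob * pos.float()).sum(1)[valid] / pos_counts[valid]
    return loss.mean()

# Example: 8 embeddings from 3 activity classes.
feats = torch.randn(8, 32)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(supervised_contrastive_loss(feats, labels))
```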
Shift-GAN integrates a bidirectional GAN with kernel mean matching in a novel way to learn intrinsic, robust feature transfer between highly heterogeneous domains. It outperforms ten state-of-the-art domain adaptation techniques across a large number of human activity recognition tasks.
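Kernel mean matching re-weights source samples so that their mean in a kernel feature space matches that of the target domain. Below is a simplified sketch (the function names are ours): it solves an unconstrained version of the problem and clips the weights, whereas standard KMM solves a box-constrained quadratic programme.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kmm_weights(Xs, Xt, gamma=1.0, reg=1e-3):
    """Estimate per-sample weights for source data Xs so that the weighted
    source feature mean matches the target mean of Xt in an RBF kernel space.
    Simplified sketch: unconstrained solve plus clipping, not the full QP."""
    ns, nt = len(Xs), len(Xt)
    K = rbf_kernel(Xs, Xs, gamma)                                # (ns, ns)
    kappa = rbf_kernel(Xs, Xt, gamma).sum(axis=1) * (ns / nt)    # (ns,)
    beta = np.linalg.solve(K + reg * np.eye(ns), kappa)
    return np.clip(beta, 0.0, None)

# Example: weight 50 source samples towards a shifted target distribution.
Xs = np.random.randn(50, 6)
Xt = np.random.randn(40, 6) + 0.5
print(kmm_weights(Xs, Xt)[:5])
```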
Assume that we have two users living in two residential settings equipped with different sensors, and that each user's data has been annotated with a different set of activities. XLearn aims to combine their sensor data and activity annotations so that all of these activities can be recognised for both users.
UDAR combines knowledge- and data-driven techniques to achieve coarse- and fine-grained feature alignment.
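One common, data-driven way to realise fine-grained feature alignment is to minimise a distribution discrepancy such as Maximum Mean Discrepancy (MMD) between source and target features; the sketch below is a generic illustration of that idea, not necessarily UDAR's exact formulation.

```python
import torch

def mmd(source_feats, target_feats, gamma=1.0):
    """RBF-kernel MMD between source and target feature batches; minimising it
    during training pulls the two feature distributions together."""
    def k(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-gamma * d2)
    return (k(source_feats, source_feats).mean()
            + k(target_feats, target_feats).mean()
            - 2 * k(source_feats, target_feats).mean())

# Example: discrepancy between two batches of 128-d features.
print(mmd(torch.randn(32, 128), torch.randn(32, 128) + 0.3))
```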
We study continual learning: continuously learning new tasks from new data while preserving the knowledge learned from previous tasks.
HAR-GAN is a continual learning technique for HAR. It requires neither prior knowledge of what the new activity classes might be nor storage of historical data: instead, it leverages a GAN to generate sensor data for the previously learned activities. We have evaluated HAR-GAN on four third-party, public datasets collected with binary sensors and accelerometers. Our extensive empirical results demonstrate the effectiveness of HAR-GAN in continual activity recognition and shed light on future challenges.
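The core idea of generative replay can be sketched as follows: a previously trained conditional generator synthesises sensor windows for the already-learned classes, and these are mixed with the new task's data when the classifier is updated. The `generator` and `classifier` callables and the parameter names here are placeholders; HAR-GAN's actual architecture and losses are described in the paper.

```python
import torch

def build_replay_batch(generator, old_class_ids, samples_per_class=32, latent_dim=64):
    """Synthesise sensor windows for already-learned activity classes with a
    previously trained conditional generator, so no raw historical data is stored."""
    zs, ys = [], []
    for c in old_class_ids:
        zs.append(torch.randn(samples_per_class, latent_dim))
        ys.append(torch.full((samples_per_class,), c, dtype=torch.long))
    z, y = torch.cat(zs), torch.cat(ys)
    with torch.no_grad():
        x_replay = generator(z, y)          # synthetic sensor data for old classes
    return x_replay, y

def continual_step(classifier, generator, new_x, new_y, old_class_ids, optimiser, loss_fn):
    """One update on the new task: mix replayed old-class data with the new
    task's data, then train the classifier on the combined batch."""
    x_rep, y_rep = build_replay_batch(generator, old_class_ids)
    x = torch.cat([new_x, x_rep])
    y = torch.cat([new_y, y_rep])
    optimiser.zero_grad()
    loss = loss_fn(classifier(x), y)
    loss.backward()
    optimiser.step()
    return loss.item()
```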
Emotion understanding represents a core aspect of human communication. Our social behaviours are closely linked to expressing our emotions and understanding others’ emotional and mental states through social signals. We are exploring different bio-inspired models for multisensory integration in emotion recognition; that is, different ways of integrating visual and audio signals for predicting human emotions.
We have proposed three multisensory integration models based on different pathways of multisensory integration in the brain: integration by convergence, early cross-modal enhancement, and integration through neural synchrony. The proposed models are designed and implemented using third-generation neural networks, namely Spiking Neural Networks (SNNs).
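As a toy illustration of the convergence pathway, the snippet below simulates a single leaky integrate-and-fire neuron that receives both audio and visual spike trains; when the combined evidence pushes its membrane potential over threshold, it fires. The parameters and simplified neuron dynamics are illustrative assumptions, not the implemented models.

```python
import numpy as np

def lif_convergence(audio_spikes, visual_spikes, w_a=0.6, w_v=0.6,
                    tau=20.0, v_thresh=1.0, v_reset=0.0, dt=1.0):
    """Leaky integrate-and-fire neuron receiving converging audio and visual
    spike trains (arrays of 0/1 per time step); returns its output spike train."""
    v, out = 0.0, []
    for a, s in zip(audio_spikes, visual_spikes):
        v += dt * (-v / tau) + w_a * a + w_v * s   # leak plus weighted input spikes
        if v >= v_thresh:
            out.append(1)
            v = v_reset
        else:
            out.append(0)
    return np.array(out)

# Example: the neuron tends to fire when audio and visual spikes coincide.
audio = np.random.binomial(1, 0.2, size=200)
visual = np.random.binomial(1, 0.2, size=200)
print(lif_convergence(audio, visual).sum(), "output spikes")
```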
Synch-Graph is a novel bio-inspired approach based on neural synchrony in audio-visual multisensory integration in the brain. We model multisensory interaction using spiking neural networks (SNN) and explore the use of Graph Convolutional Networks (GCN) to represent and learn neural synchrony patterns.
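A minimal example of the graph-convolution step is shown below: nodes could correspond to spiking units, edge weights to a pairwise synchrony measure between their spike trains, and a Kipf-and-Welling-style GCN layer then produces node embeddings. This is a generic sketch, not Synch-Graph's exact architecture.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution layer: symmetrically normalised adjacency times
    node features, followed by a linear map and a ReLU."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, A, X):
        A_hat = A + torch.eye(A.size(0))              # add self-loops
        d = A_hat.sum(1)
        D_inv_sqrt = torch.diag(d.pow(-0.5))
        return torch.relu(self.lin(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X))

# Toy synchrony graph: 8 nodes, symmetric edge weights, 16-d node features.
A = torch.rand(8, 8)
A = (A + A.t()) / 2
X = torch.rand(8, 16)
layer = SimpleGCNLayer(16, 32)
H = layer(A, X)                                       # (8, 32) node embeddings
print(H.shape)
```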
Audio sensing can contribute to daily activity recognition by detecting the use of appliances such as coffee machines or microwaves. It also helps to identify environmental context by detecting ambient sound. We are designing algorithms to classify multiple sound sources and to learn temporal patterns in acoustic signals such as animal calls.
We explore different approaches to multi-sound classification and propose a stacked classifier based on recent advances in deep learning. We evaluate the proposed approach in a comprehensive set of experiments on both sound-effect and real-world datasets. The results demonstrate that our approach can robustly identify each sound category among mixed acoustic signals, without any prior knowledge of the number and signature of sounds in the mixture.
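One way to classify several co-occurring sounds without knowing how many are present is to treat the task as multi-label classification, with an independent sigmoid output per sound category and a detection threshold. The toy network below illustrates that idea only; the stacked classifier proposed in our work is more elaborate.

```python
import torch
import torch.nn as nn

class MultiLabelSoundNet(nn.Module):
    """Toy multi-label classifier over log-mel spectrograms: one sigmoid per
    sound category, so several categories can be active in one mixed recording."""
    def __init__(self, n_classes, n_mels=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):                     # x: (batch, 1, n_mels, time)
        h = self.conv(x).flatten(1)
        return torch.sigmoid(self.head(h))    # independent per-class probabilities

model = MultiLabelSoundNet(n_classes=10)
probs = model(torch.randn(4, 1, 64, 128))     # (4, 10) per-category probabilities
present = probs > 0.5                         # categories detected in each clip
print(present)
```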
A convolutional recurrent neural network (CRNN) is proposed to learn temporal correlations between gibbon call syllables.
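A generic CRNN of the kind described can be sketched as follows: convolutional layers extract local spectro-temporal features from a spectrogram, and a recurrent layer then models correlations across time, for example between successive call syllables. The layer sizes below are illustrative assumptions, not the configuration used in our study.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Convolutional front-end over a spectrogram followed by a GRU that models
    temporal correlations between successive frames (e.g. call syllables)."""
    def __init__(self, n_mels=64, n_classes=2, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        self.gru = nn.GRU(32 * (n_mels // 4), hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                       # x: (batch, 1, n_mels, time)
        h = self.conv(x)                        # (batch, 32, n_mels // 4, time)
        h = h.permute(0, 3, 1, 2).flatten(2)    # (batch, time, 32 * n_mels // 4)
        out, _ = self.gru(h)
        return self.fc(out[:, -1])              # per-clip class scores

model = CRNN()
scores = model(torch.randn(2, 1, 64, 100))      # (2, 2) class scores
print(scores.shape)
```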