Research
Research Overview
More details will be updated soon!
An emerging research direction in AI is the design of systems that take inspiration from biological brains. With architectures constantly growing in size, training a single Artificial Neural Network (ANN) today consumes a prohibitive amount of energy. In contrast, despite drawing only about 12 W, the human brain exhibits impressive capabilities such as life-long learning.
❓ How can the human brain perform general and complex tasks?
Neurons in the human brain process and communicate information using sparse spiking signals over time. By taking a probabilistic, Bayesian perspective, biologically inspired Spiking Neural Networks (SNNs) can exhibit learning mechanisms similar to those at work in the brain, enabling efficient online inference and learning.
Taking this probabilistic approach, our work [SPM19] frames probabilistic SNNs as probabilistic graphical models, covering probabilistic models, learning rules, and applications. The probabilistic framework can leverage multiple random samples during both learning and inference [TNNLS22], enabling online, local training and robustifying decisions by quantifying their uncertainty, features that deterministic models cannot provide. Taking the Bayesian approach further, we propose to equip each synaptic weight with a probability distribution [FCN22], capturing the epistemic uncertainty induced by limited knowledge of the data.
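As a rough, self-contained sketch (not the exact model of [SPM19]/[FCN22]; all names and hyperparameters below are illustrative assumptions), a mean-field Gaussian distribution over synaptic weights can be sampled to drive a Bernoulli-spiking neuron, with Monte Carlo averaging over weight samples marginalizing out the epistemic uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean-field Gaussian posterior over 8 synaptic weights:
# each weight w_i ~ N(mu_i, sigma_i^2), capturing epistemic uncertainty.
mu = rng.normal(0.0, 0.5, size=8)       # posterior means
rho = np.full(8, -2.0)                  # sigma = softplus(rho) keeps sigma > 0
sigma = np.log1p(np.exp(rho))

def spike_probability(pre_spikes, w):
    """Probabilistic neuron: spike probability = sigmoid of membrane potential."""
    u = pre_spikes @ w                  # membrane potential
    return 1.0 / (1.0 + np.exp(-u))

pre = rng.integers(0, 2, size=8).astype(float)  # binary pre-synaptic spikes

# Multiple random weight samples, as the framework allows during both
# learning and inference; the average marginalizes out weight uncertainty.
samples = [spike_probability(pre, mu + sigma * rng.standard_normal(8))
           for _ in range(100)]
p_spike = float(np.mean(samples))
post_spike = rng.random() < p_spike     # Bernoulli spike emission
```

Because the output is a probability rather than a hard activation, the spread of `samples` directly exposes how uncertain the neuron's decision is.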
Probabilistic deep generative models have emerged as a pivotal area in machine learning to generate data samples from complex distributions and learn meaningful representations from unlabelled data in various applications such as audio, images, video, and text.
Two common approaches, variational autoencoders (VAEs) and generative adversarial networks (GANs), have demonstrated remarkable capabilities in generating realistic data, but each has limitations: (i) overly simplified posteriors, (ii) training instability, and (iii) a failure to balance meaningful latent representations with inference quality.
We propose the Multi-Adversarial Autoencoder (MAAE) to address these issues. MAAE builds on adversarial autoencoders (AAE), a hybrid VAE/GAN architecture, by employing multiple discriminators, each trained independently to play the adversarial game against the encoder. Replacing the explicit KL penalty of the VAE with this aggregated adversarial signal makes model training smoother.
Compared to various VAE-based models, our method yields a latent space that matches the prior more closely, indicating better inference quality. At the same time, MAAE achieves a lower error rate on a semi-supervised task applied to the learned latent vectors, showing that it extracts meaningful and informative representations from the inputs.
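A minimal sketch of the multi-discriminator idea, assuming for illustration that each discriminator is a tiny logistic model over latent codes (the actual MAAE discriminators are neural networks, and all names here are placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
latent_dim, n_disc = 2, 3

# Hypothetical: discriminator k scores a code as D_k(z) = sigmoid(z @ w_k + b_k),
# trained independently to tell prior samples from encoder outputs.
Ws = rng.normal(size=(n_disc, latent_dim))
bs = rng.normal(size=n_disc)

def d_scores(z):
    """Per-discriminator probability that each code z came from the prior."""
    logits = z @ Ws.T + bs
    return 1.0 / (1.0 + np.exp(-logits))        # shape (batch, n_disc)

def encoder_adv_loss(z_encoded, eps=1e-8):
    """Encoder regularizer that stands in for the VAE KL term: the encoder
    tries to fool every discriminator; averaging over several independent
    critics smooths the training signal."""
    s = d_scores(z_encoded)
    return float(np.mean(-np.log(s + eps)))     # mean over batch and critics

z = rng.normal(size=(4, latent_dim))            # stand-in for encoder outputs
loss = encoder_adv_loss(z)
```

The design point is the aggregation: no single critic's gradient dominates, which is what makes the adversarial replacement of the KL penalty better behaved.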
Graph Neural Network (GNN)-based Learning and Inference
LinkFND: Simple Framework for False Negative Detection in Recommendation Tasks with Graph Contrastive Learning
The main goal of a recommendation system is to provide accurate and diverse items to users. Recommendation datasets usually carry complex relational information between users and items, which can be effectively represented as a bipartite graph.
Graph neural network (GNN)-based collaborative filtering can fully exploit this relational information, learning generalized representations of users and items by aggregating neighborhood embeddings in a multi-hop manner; similarity scores between the resulting user and item embeddings then yield accurate and diverse recommendations.
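The aggregation step can be sketched on a toy bipartite graph, assuming for illustration a LightGCN-style propagation with symmetric degree normalization (the specific rule used in our models may differ):

```python
import numpy as np

# Toy bipartite interaction matrix R: 3 users x 4 items (1 = interaction).
R = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]], dtype=float)

# Symmetric normalization R_hat = D_u^{-1/2} R D_i^{-1/2}.
du = R.sum(1, keepdims=True)        # user degrees, shape (3, 1)
di = R.sum(0, keepdims=True)        # item degrees, shape (1, 4)
R_hat = R / np.sqrt(du) / np.sqrt(di)

d = 8
rng = np.random.default_rng(2)
U = rng.normal(size=(3, d))         # user embeddings (layer 0)
V = rng.normal(size=(4, d))         # item embeddings (layer 0)

# One propagation hop: users aggregate their item neighbors and vice versa;
# stacking more hops brings in multi-hop neighborhood information.
U1 = R_hat @ V
V1 = R_hat.T @ U

# Final embeddings average the layers; recommendation scores are inner products.
Uf, Vf = (U + U1) / 2, (V + V1) / 2
scores = Uf @ Vf.T                  # (3 users x 4 items) score matrix
```

Ranking each user's row of `scores` over non-interacted items produces the recommendation list.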
StyleBoost: A Study of Personalizing Text-to-Image Generation in Any Style using DreamBooth
In recent years, remarkable progress has been made in the field of text-to-image generation, with models like Stable Diffusion showcasing the ability to create visual images from natural language prompts. Moreover, the demand for personalized text-to-image models has grown substantially.
To meet this need, our work focuses on refining pre-trained text-to-image models to generate images that reflect distinct "art styles" described in text. Our method, called StyleBoost, relies on a curated dataset of 15-20 images representing the target style we fine-tune toward, together with a set of auxiliary images that boost the personalization.
The key idea is to establish a robust connection between a unique token identifier (referring to the target style) and a diverse range of stylistic attributes within the visual domain. This allows the creation of images that align with specific styles as prompted by text, enhancing personalization and style adaptation in the generation process.
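A hedged sketch of the kind of objective involved: a DreamBooth-style combination of a loss on the target-style images with a weighted loss on the auxiliary images. The function name, the weight `lam`, and the numbers are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def styleboost_step_loss(loss_style_batch, loss_aux_batch, lam=0.5):
    """Hypothetical combined objective: a denoising loss on the 15-20
    target-style images plus a lambda-weighted term on auxiliary images,
    which keeps the unique style token from overfitting to the small set."""
    return float(np.mean(loss_style_batch) + lam * np.mean(loss_aux_batch))

# Stand-in per-image losses (e.g., diffusion denoising MSEs).
style_losses = np.array([0.30, 0.25, 0.28])
aux_losses = np.array([0.40, 0.35])
total = styleboost_step_loss(style_losses, aux_losses)
```

At sampling time the unique identifier is simply placed in the prompt (e.g., "a castle painted in [V] style", where "[V]" is the rare token bound during fine-tuning).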
Sound-based Sleep Pattern Analysis using Deep Learning
Sound-based sleep staging by exploiting real-world unlabeled data [ICLR workshop 2023]
Sleep staging using end-to-end deep learning model based on nocturnal sound for mobile devices [NSS22]
Real-time detection of sleep apnea based on breathing sounds and prediction reinforcement using home noises [JMIR23]
Prediction of sleep stages via deep learning using smartphone audio recordings in home environments [JMIR23]
SimFLE: Simple Facial Landmark Encoding for Self-Supervised Facial Expression Recognition in the Wild
Recent FER-W methods have focused on supervised learning, which requires a large amount of labeled training data; however, the visual complexity and inherent ambiguity of facial expressions make it difficult to curate large-scale labeled facial image datasets.
Therefore, we explore a method to train accurate FER-W models in a self-supervised way. Specifically, we combine contrastive learning (CL) and masked image modeling (MIM) to enhance the model's attentiveness toward facial landmarks.
With semantic masking, the FaceMAE module estimates facial landmark regions at the patch level and encodes effective representations from them. Our SimFLE is more attentive to facial landmarks than the supervised baseline and other self-supervised methods, which leads to higher downstream performance.
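Patch-level semantic masking can be illustrated as follows; the grid size, mask ratio, and the use of random scores in place of learned landmark-importance estimates are all assumptions for this sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical landmark-importance scores for a 14x14 grid of image patches
# (in the real model these would be estimated, not drawn at random).
importance = rng.random((14, 14))

def semantic_mask(scores, mask_ratio=0.4):
    """Patch-level semantic masking: hide the patches with the highest
    estimated landmark importance, so the masked-image-modeling objective
    must reconstruct exactly the landmark regions."""
    flat = scores.ravel()
    k = int(mask_ratio * flat.size)
    idx = np.argsort(flat)[-k:]          # top-k most 'landmark-like' patches
    mask = np.zeros(flat.size, dtype=bool)
    mask[idx] = True
    return mask.reshape(scores.shape)

mask = semantic_mask(importance)         # True = patch hidden from the encoder
```

Masking the informative patches, rather than uniformly random ones, is what pushes the encoder's attention toward landmarks.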
Efficient and Lightweight AI Models and Methods
coming soon!