Continual Generalized Category Discovery (CGCD) faces a critical challenge: incrementally learning new classes from unlabeled data streams while preserving knowledge of old classes. Existing methods struggle with catastrophic forgetting, especially when unlabeled data mixes known and novel categories. We address this by analyzing CGCD’s forgetting dynamics through a Bayesian lens, revealing that covariance misalignment between old and new classes drives performance degradation. Building on this insight, we propose VB-CGCD, a novel framework that integrates variational inference with covariance-aware nearest-class-mean classification. VB-CGCD adaptively aligns class distributions while suppressing pseudo-label noise via stochastic variational updates.
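The covariance-aware nearest-class-mean idea can be illustrated with a Mahalanobis-distance classifier: each class is summarized by its mean, and samples are assigned to the class whose mean is nearest under the metric induced by an estimated covariance. This is a minimal sketch under that assumption only; the function names are hypothetical and it does not reproduce VB-CGCD's variational updates.

```python
import numpy as np

def fit_class_gaussians(X, y):
    # Estimate per-class means and a shared, shrunken covariance.
    classes = np.unique(y)
    means = {int(c): X[y == c].mean(axis=0) for c in classes}
    cov = np.cov(X.T) + 1e-3 * np.eye(X.shape[1])  # shrinkage for stability
    prec = np.linalg.inv(cov)
    return means, prec

def mahalanobis_ncm_predict(X, means, prec):
    # Assign each sample to the class whose mean is nearest under
    # the Mahalanobis metric d(x, mu) = (x - mu)^T P (x - mu).
    classes = list(means)
    dists = np.stack([
        np.einsum('ij,jk,ik->i', X - means[c], prec, X - means[c])
        for c in classes
    ], axis=1)
    return np.array(classes)[dists.argmin(axis=1)]
```

A shared covariance keeps the classifier cheap to update incrementally; per-class covariances would be the natural refinement when enough samples per class are available.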
ICLR 2025
One-Shot Federated Learning with Tsetlin Machine (FedTMOS) is a novel data-free one-shot federated learning (OFL) framework that eliminates server-side training, addressing key limitations of existing OFL methods. Unlike approaches that rely on knowledge distillation or model fusion, FedTMOS leverages the low complexity and class-adaptive nature of the Tsetlin Machine to reassign class-specific weights through inter-class maximization. This results in balanced server models with minimal communication and latency. Experiments show that FedTMOS outperforms state-of-the-art baselines by over 7%, reduces communication costs by 2.3×, and lowers server latency by 75×, making it an efficient and practical OFL solution.
Respiratory audio, such as coughing and breathing sounds, holds significant potential for healthcare applications but suffers from limited labeled data for model development. To address this, the OPERA framework introduces the first open respiratory acoustic foundation model system, using large-scale unlabeled data (~136K samples, 400+ hours) to pretrain generalizable models. OPERA includes a benchmark of 19 downstream health tasks, where its models outperform existing audio-pretrained models on 16 tasks and generalize well to new datasets and modalities. This work demonstrates the strong promise of foundation models for respiratory health and provides an open resource to advance research in this field.
EWSN 2024
Adaptive Tsetlin Machine (AdaTM) is the first end-to-end continual learning solution relying solely on propositional logic operations, making it well suited to edge computing devices. AdaTM dynamically expands its model architecture to accommodate new learning tasks. Furthermore, we implemented a class-balanced memory buffer and optimal-state selection techniques to combat knowledge fading, and introduced a clause-confidence-score-based pruning strategy for scalability. Importantly, AdaTM adapts without the computationally expensive recalibrations commonly associated with neural networks, leading to high efficiency gains. This adaptability and efficiency set AdaTM apart, making it particularly well suited for resource-constrained real-world settings such as edge devices and on-device learning.
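A class-balanced memory buffer of the kind mentioned above can be sketched as a rehearsal store whose capacity is split evenly across the classes seen so far; this is an illustrative sketch under that assumption, with hypothetical names, not AdaTM's actual buffer.

```python
import random

class ClassBalancedBuffer:
    """Rehearsal buffer that splits capacity evenly across observed classes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = {}  # label -> list of stored exemplars

    def add(self, x, label):
        self.slots.setdefault(label, []).append(x)
        self._rebalance()

    def _rebalance(self):
        # Evenly divide capacity; keep at least one exemplar per class.
        per_class = max(1, self.capacity // len(self.slots))
        for label, items in self.slots.items():
            if len(items) > per_class:
                # Drop the oldest exemplars beyond the per-class quota.
                self.slots[label] = items[-per_class:]

    def sample(self, k, rng=random):
        pool = [x for items in self.slots.values() for x in items]
        return rng.sample(pool, min(k, len(pool)))
```

Keeping the newest exemplars per class is the simplest eviction rule; reservoir sampling per class would give an unbiased alternative at the same cost.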
On-device training is essential for user personalization and privacy. With the pervasiveness of IoT devices and microcontroller units (MCUs), this task becomes more challenging due to constrained memory and compute resources and the limited availability of labeled user data. Nonetheless, prior works neglect the data scarcity issue, require excessively long training times (e.g., a few hours), or induce substantial accuracy loss (≥10%). TinyTrain is an on-device training approach that drastically reduces training time by selectively updating parts of the model and explicitly coping with data scarcity. TinyTrain introduces a task-adaptive sparse-update method that dynamically selects which layers and channels to update based on a multi-objective criterion that jointly captures the user data and the memory and compute capabilities of the target device, leading to high accuracy on unseen tasks with a reduced computation and memory footprint.
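One simple way to realize such a multi-objective selection is a greedy knapsack-style pass: rank layers by importance per unit cost and keep adding them until the memory and compute budgets are exhausted. The sketch below is a hypothetical illustration of that idea, not TinyTrain's actual criterion.

```python
import numpy as np

def select_layers(importance, mem_cost, flop_cost, mem_budget, flop_budget):
    """Greedy multi-objective layer selection (illustrative sketch).

    importance: per-layer gradient-based importance scores on user data
    mem_cost / flop_cost: per-layer cost of updating that layer
    Picks layers with the best importance-per-cost ratio while both
    the memory and compute budgets are respected.
    """
    score = importance / (mem_cost + flop_cost)  # joint cost proxy
    order = np.argsort(-score)                   # best ratio first
    chosen, mem, flops = [], 0.0, 0.0
    for i in order:
        if mem + mem_cost[i] <= mem_budget and flops + flop_cost[i] <= flop_budget:
            chosen.append(int(i))
            mem += mem_cost[i]
            flops += flop_cost[i]
    return chosen
```

In practice the importance scores would be recomputed per task from a few labeled user samples, which is what makes the selection task-adaptive.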
Long-term respiratory illnesses like Chronic Obstructive Pulmonary Disease (COPD) and asthma are commonly diagnosed with the gold-standard spirometry, a lung health test that requires specialized equipment and trained healthcare experts, making it expensive and difficult to scale. Moreover, blowing into a spirometer can be quite hard for people suffering from pulmonary illnesses. To overcome these limitations, we introduce MMLung, an approach that estimates lung function from multiple audio signals by combining multiple tasks and modalities captured with a smartphone microphone. Our proposed approach achieves a best mean absolute percentage error (MAPE) of 1.3% on a cohort of 40 participants. Compared to previously reported performance (5%–10% MAPE) on smartphone-based lung health estimation, MMLung shows that practical lung health estimation is viable by combining multiple tasks across multiple modalities.
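For reference, the MAPE figures quoted above follow the standard definition: the mean of the absolute relative errors between predicted and true lung-function values, expressed as a percentage.

```python
import numpy as np

def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent.
    # Assumes y_true contains no zeros (true for spirometry measures).
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)
```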
LifeLearner is a hardware-aware continual learning (CL) system designed for resource-constrained embedded and IoT devices, enabling real-time adaptation in applications like user personalization and household robotics. It tackles challenges such as limited labeled data, memory, and compute by combining meta-learning with rehearsal strategies to preserve accuracy. LifeLearner also employs a mix of lossless and lossy compression to significantly reduce memory and storage needs. Optimized for hardware-specific constraints, it lowers latency and energy consumption. The system achieves accuracy within 2.8% of an Oracle baseline while reducing memory usage by 178.7× and cutting latency and energy by over 80%. It has been successfully deployed on edge devices and microcontrollers, demonstrating its practical efficiency and scalability.
YONO is a novel system designed to enable efficient multi-task learning on resource-constrained microcontrollers (MCUs) by compressing and executing multiple deep neural network models in-memory. Unlike most embedded deep learning solutions that focus on single-task performance, YONO addresses the multi-task demands of IoT devices by using product quantization (PQ) to store model weights in shared codebooks and introducing new network optimization strategies to maximize compression while preserving accuracy. Its online component allows real-time model execution and switching without external storage, achieving compression rates up to 12.37× with minimal accuracy loss, and significantly reducing latency and energy usage by over 93%. YONO demonstrates strong generalizability across different architectures and unseen datasets, making it a promising solution for scalable, multi-task IoT applications.
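Product quantization, the compression mechanism YONO builds on, splits weight vectors into sub-spaces and replaces each sub-vector with the index of its nearest codebook centroid. The sketch below illustrates the generic PQ idea with a tiny k-means; it is not YONO's exact scheme, and the function names are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # Tiny k-means for learning one sub-space codebook.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        codes = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        for c in range(k):
            if np.any(codes == c):
                centroids[c] = X[codes == c].mean(axis=0)
    # Final assignment against the converged centroids.
    codes = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
    return centroids, codes

def pq_compress(W, n_sub=2, n_codes=8):
    # Split each row of W into n_sub sub-vectors; learn one codebook
    # per sub-space and store only the per-row code indices.
    d = W.shape[1] // n_sub
    books, codes = [], []
    for s in range(n_sub):
        cb, cd = kmeans(W[:, s * d:(s + 1) * d], n_codes)
        books.append(cb)
        codes.append(cd)
    return books, codes

def pq_decompress(books, codes):
    # Reconstruct weights by looking up each code in its codebook.
    return np.hstack([books[s][codes[s]] for s in range(len(books))])
```

Sharing codebooks across multiple models, as YONO does, amortizes the codebook storage so that each additional model costs only its code indices.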
SonicASL is a real-time gesture recognition system that recognizes sign language gestures on the fly, leveraging front-facing microphones and speakers added to commodity earphones worn by someone facing the person making the gestures. In a user study (N=8), we evaluate the recognition performance of various sign language gestures at both the word and sentence levels. Given 42 frequently used individual words and 30 meaningful sentences, SonicASL achieves accuracies of 93.8% and 90.6% for word-level and sentence-level recognition, respectively. The proposed system is tested in two real-world scenarios: indoor (apartment, office, and corridor) and outdoor (sidewalk) environments with pedestrians walking nearby. The results show that our system provides users with an effective gesture recognition tool with high reliability against environmental factors such as ambient noise and nearby pedestrians.