Distributed Artificial Intelligence Framework
How can we jointly train a machine learning model across a distributed network while keeping the data private and secure? Our research builds efficient and scalable frameworks to address this problem. These frameworks keep both the data and the model information-theoretically private, while enabling efficient parallelization of training across distributed data owners/workers and guaranteeing fast convergence. To this end, we are researching the following areas (a toy sketch of one privacy primitive follows the list):
Federated Learning
Continual Learning
Secure Multiparty Computation (MPC), Differential Privacy
Optimization Theory
Information / Coding Theory
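To make the privacy goal concrete, here is a minimal sketch (not our actual framework) of secure aggregation via additive pairwise masking, one MPC-style primitive often used in federated learning: each pair of workers agrees on a random mask that one adds and the other subtracts, so the masks cancel in the sum and the server learns only the aggregate update, never an individual worker's update. All sizes and values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
num_workers, dim = 4, 8

# Hypothetical local model updates (in practice, gradients of a shared model).
updates = [rng.normal(size=dim) for _ in range(num_workers)]

# Each pair (i, j), i < j, agrees on a random mask m_ij;
# worker i adds it and worker j subtracts it.
masks = {(i, j): rng.normal(size=dim)
         for i in range(num_workers) for j in range(i + 1, num_workers)}

masked = []
for w in range(num_workers):
    x = updates[w].copy()
    for (i, j), m in masks.items():
        if w == i:
            x += m
        elif w == j:
            x -= m
    masked.append(x)

# The server averages the masked updates; the masks cancel exactly.
aggregate = np.mean(masked, axis=0)
assert np.allclose(aggregate, np.mean(updates, axis=0))
```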
Federated Learning in the Age of Foundation Models (FMs)
Foundation Models (FMs) such as GPT, LLaMA, and Sora have demonstrated remarkable success in a wide range of applications, driven by their ability to leverage vast amounts of data for pre-training as well as fine-tuning (or adaptation) for target tasks. However, optimizing FMs often requires access to sensitive data, raising privacy concerns and limiting their applicability in certain domains. To address these challenges, we leverage federated learning and distributed AI frameworks to enable privacy-preserving, collaborative learning across multiple data owners. We research the potential benefits and challenges of integrating FL into FMs, including (a sketch of one such technique follows the list):
Parameter-efficient fine-tuning (PEFT)
Privacy leakage in FMs and privacy protection techniques
Model compression, quantization
Knowledge distillation
Generic and Personalized Models
Multi-modality
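As one illustration, below is a minimal numpy sketch of LoRA-style low-rank adaptation, one concrete PEFT method (the dimensions, names, and initialization are illustrative, not our production setup): the frozen pre-trained weight stays on-device, and only the small low-rank factors are trained and exchanged in federated rounds, which is what makes FM fine-tuning communication-feasible.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 64, 64, 4, 8.0

W = rng.normal(size=(d_out, d_in))        # frozen pre-trained weight
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))               # trainable up-projection (zero init)

def forward(x):
    # Effective weight is W + (alpha / rank) * B @ A, but we never
    # materialize it: the adapter path is a cheap low-rank detour.
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.normal(size=d_in)
y = forward(x)

# Communication saving: clients ship only A and B, never W.
full, adapter = W.size, A.size + B.size
print(f"adapter params: {adapter} vs full: {full} ({adapter / full:.1%})")
```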
Federated Learning in Low Earth Orbit (LEO) Satellite Networks
Large-scale deployments of low Earth orbit (LEO) satellites collect massive amounts of Earth imagery and sensor data, which can empower machine learning (ML) to address global challenges such as real-time disaster navigation and mitigation. However, it is often infeasible to download all the high-resolution images and train ML models on the ground because of limited downlink bandwidth, sparse connectivity, and regulatory constraints on imagery resolution. To address these challenges, we leverage Federated Learning (FL), in which ground stations and satellites collaboratively train a global ML model without the satellites sharing the images they capture. We are making the following contributions (a toy illustration of the connectivity constraint follows the list):
Identify the fundamental challenges in applying FL to satellite constellations
Formulate an optimization problem to maximize the model convergence rate
Propose a federated learning framework for time-varying and deterministic topologies
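The toy simulation below illustrates the core difficulty, not our proposed framework: satellites can only upload during their visibility windows, so the ground station aggregates whatever arrives each round and down-weights stale updates. The orbit model, staleness rule, and learning step here are all stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_sats, num_rounds = 8, 3, 6

global_model = np.zeros(dim)
local_models = [global_model.copy() for _ in range(num_sats)]
last_sync = [0] * num_sats

def visible(sat, t):
    # Stand-in for orbital mechanics: satellite `sat` contacts the
    # ground station once every (sat + 2) time steps.
    return t % (sat + 2) == 0

for t in range(1, num_rounds + 1):
    contributions, weights = [], []
    for s in range(num_sats):
        # Local training step on on-board imagery (simulated as noise here).
        local_models[s] -= 0.1 * rng.normal(size=dim)
        if visible(s, t):
            staleness = t - last_sync[s]
            weights.append(1.0 / (1.0 + staleness))  # down-weight stale updates
            contributions.append(local_models[s])
            last_sync[s] = t
    if contributions:
        w = np.array(weights) / np.sum(weights)
        global_model = sum(wi * ci for wi, ci in zip(w, contributions))
        for s in range(num_sats):
            if visible(s, t):  # only satellites in contact can download
                local_models[s] = global_model.copy()
```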
On-device Learning for Mobile/Edge Devices
Machine learning has driven significant improvements in many applications, including computer vision (CV) and natural language processing (NLP), thanks to large volumes of training data combined with increased computing resources. Recently, on-device learning has emerged as a new paradigm for making edge devices "smarter" and more efficient by observing changes in the collected data and self-adjusting/reconfiguring the device's operating model. However, these edge devices still suffer from limited computing resources. To address this challenge, we are studying the following (a minimal compression example follows the list):
Model compression,
Transfer learning,
Split learning,
Multi-task learning.
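As a minimal example of the first item on the list, here is a sketch of post-training weight quantization, assuming a simple symmetric per-tensor int8 scheme (the layer shape and the quantization recipe are illustrative): float32 weights are mapped to int8 with one scale factor, shrinking memory roughly 4x at some accuracy cost, which is often what makes a model fit on an edge device.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # hypothetical layer weights

scale = np.max(np.abs(w)) / 127.0              # symmetric per-tensor scale
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale  # reconstructed at inference time

err = np.max(np.abs(w - w_dequant))
print(f"memory: {w.nbytes} -> {w_int8.nbytes} bytes, max abs error {err:.4f}")
```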
We plan to extend the application of on-device learning from 5G/6G cellular systems to small language models (SLMs), and eventually to large language models (LLMs).