Multimodal Orchestration of Artificial Intelligence (MOAI) Lab 🗿
The MOAI Lab aims to build foundation models that orchestrate multiple modalities, including vision, language, and physical interaction. Our research spans two main areas: 1) AI Foundation, developing core deep learning algorithms, and 2) AI Application, applying foundation models to solve real-world problems. Some might say MOAI stands for “Mo’s AI Lab”.
We are currently focusing on the research directions listed below. If you are interested in any of these projects, please contact us about an internship.
1. Enhancing multimodal models by incorporating structure
- Multimodal models grounded in mid-level representations (follow-up of CAST and SHED)
- Multimodal models for handling ad hoc categories (follow-up of OAK)
- Exploring other priors in domains such as language, audio, and robotics
2. Extending multimodal models to vertical domains
- Multimodal understanding and generation for specialist domains (follow-up of S-CLIP)
- Measuring uncertainty in multimodal domains (follow-up of CSI and RoPAWS)
- Identifying and mitigating biases in multimodal models (follow-up of B2T)