Artificial Intelligence (AI) refers to the capability of machines to perform tasks that typically require human intelligence, such as learning, reasoning, perception, interaction, and creative content generation. From personalized virtual assistants to self-driving vehicles, AI is reshaping how we live and work.
AI Technology Landscape
What is Multimodal AI?
Multimodal AI refers to machine learning models capable of processing and integrating information from multiple modalities, or types of data. These modalities can include text, images, audio, video, and other forms of sensory input.
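One common way to integrate modalities is late fusion: each modality is encoded separately and the resulting feature vectors are combined into a single joint representation. The sketch below is a minimal, hypothetical illustration of that idea; the toy vectors stand in for the outputs of real text and image encoders, which are not specified here.

```python
import math

def l2_normalize(vec):
    """Scale a feature vector to unit length so no modality dominates the fusion."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def fuse_modalities(text_emb, image_emb):
    """Late fusion: normalize each modality's embedding, then concatenate them."""
    return l2_normalize(text_emb) + l2_normalize(image_emb)

# Toy embeddings standing in for encoder outputs (purely illustrative values).
text_emb = [3.0, 4.0]
image_emb = [0.0, 5.0, 12.0]

joint = fuse_modalities(text_emb, image_emb)
print(len(joint))  # 5
```

A downstream classifier or retrieval model would then operate on the joint vector; richer fusion schemes (cross-attention, early fusion) follow the same principle of mapping modalities into a shared space.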
At the Multimodal AI Lab, we advance AI through three core research themes, each focusing on a unique modality or functional approach:
Spatial Intelligence and Generative Network (SIGN): Computer Vision
Image processing and computer vision
Detection and tracking
Classification and recognition
Waveform Analysis and Vocal Engineering (WAVE): Acoustic Signal Processing
Underwater acoustic signal processing
Speech processing
Seismic signal processing
Audio and waveform signal processing
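A basic building block shared by these waveform tasks is frequency analysis: decomposing a signal into frequency components and identifying the strongest one. As a minimal, hypothetical sketch (a naive DFT on a synthetic tone; real pipelines would use an FFT library):

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Naive DFT: return the frequency (Hz) of the strongest non-DC bin."""
    n = len(samples)
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2):
        # Correlate the signal with a complex sinusoid at bin k.
        s = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(s) > best_mag:
            best_k, best_mag = k, abs(s)
    return best_k * sample_rate / n

# 64 samples of a 1 kHz tone sampled at 8 kHz (bin resolution: 125 Hz).
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(64)]
print(dominant_frequency(tone, rate))  # 1000.0
```

The same spectral view underlies speech features, sonar returns, and seismic traces alike, which is why these tasks sit together under one theme.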
Agentic Understanding and Retrieval Architecture (AURA): Large Language Model
Large Language Models
LLM-based virtual assistant
Agentic AI
Personalized human-like agents
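At its core, an agentic system runs a loop: a planner (in practice an LLM) chooses an action, a tool executes it, and the observation is fed back until the goal is met. The sketch below is a toy illustration of that loop with a rule-based stub standing in for the LLM; the planner, tool names, and `run_agent` helper are all hypothetical.

```python
def calculator(expr: str) -> str:
    """Toy tool: evaluate an arithmetic expression (demo only, not hardened)."""
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def stub_planner(goal, observations):
    """Stand-in for an LLM call: pick a tool, or answer once a result is observed."""
    if observations:
        return ("final_answer", observations[-1])
    return ("calculator", goal)

def run_agent(goal, max_steps=5):
    """Agent loop: plan -> act -> observe, repeated until a final answer."""
    observations = []
    for _ in range(max_steps):
        action, arg = stub_planner(goal, observations)
        if action == "final_answer":
            return arg
        observations.append(TOOLS[action](arg))
    return None

print(run_agent("6 * 7"))  # 42
```

Replacing the stub with an actual LLM call, and the single tool with retrieval, memory, and user-profile tools, yields the personalized assistants described above.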
AI is revolutionizing our world, and Multimodal AI brings us closer to truly intelligent, human-like systems. At our lab, SIGN, WAVE, and AURA represent how we fuse visual, auditory, and cognitive modalities into impactful, robust, and trustworthy AI applications. We’re building the future, one that’s more perceptive, adaptive, and responsibly intelligent.