Multi-modal LLM


Large Language Models (LLMs) are AI systems trained on large-scale text data to understand, generate, and reason over language, and they are increasingly being extended to multimodal settings that combine text with images and other inputs. Our lab studies LLMs not only as language generators, but as reasoning engines that can be adapted for practical decision-making and embodied AI tasks.

One line of our work investigates efficient model merging, where a top-performing multilingual LLM is combined with a Korean-specialized model using the DARE (Drop And REscale) technique to improve reasoning ability without full retraining. This approach showed consistent gains across benchmarks, including strong improvements on reasoning-heavy evaluations such as GSM8K and MT-Bench.
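The core idea of DARE can be sketched on a single weight tensor: the specialized model's delta from the base is randomly sparsified, the surviving entries are rescaled to keep the delta unbiased, and the result is added back to the base weights. The function and parameter names below are illustrative, not the lab's actual implementation:

```python
import numpy as np

def dare_merge(base, specialist, drop_rate=0.9, seed=0):
    """Minimal sketch of DARE (Drop And REscale) for one weight tensor.

    The specialist's delta from the base model is sparsified by randomly
    dropping entries with probability `drop_rate`; the survivors are
    rescaled by 1 / (1 - drop_rate) so the delta stays unbiased in
    expectation, then added back to the base weights.
    """
    rng = np.random.default_rng(seed)
    delta = specialist - base                        # task vector
    keep_mask = rng.random(delta.shape) >= drop_rate # keep ~(1 - drop_rate)
    rescaled = delta * keep_mask / (1.0 - drop_rate) # unbiased rescale
    return base + rescaled

# Toy example: "models" represented by single tensors.
base = np.zeros((4, 4))
specialist = np.ones((4, 4))
merged = dare_merge(base, specialist, drop_rate=0.5)
```

With `drop_rate=0.5` and a delta of all ones, each merged entry is either 0 (dropped) or 2 (kept and rescaled), averaging back to the original delta of 1 in expectation; this is why the merged model can retain the specialist's ability despite heavy sparsification.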

We also study multimodal LLMs for disaster-response applications, in which the model interprets aerial imagery to detect survivors and hazards and to plan safe paths for unmanned ground vehicles (UGVs). By leveraging few-shot in-context learning and chain-of-thought prompting, our system improves detection and path planning without requiring additional fine-tuning.
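The prompting strategy above can be sketched as plain prompt assembly: worked scene analyses serve as few-shot examples, and a "let's think step by step" cue elicits chain-of-thought reasoning before the final answer. The example scenes, grid coordinates, and helper name are hypothetical placeholders, not the lab's actual prompts:

```python
# Hypothetical worked example pairing a scene with step-by-step reasoning.
FEW_SHOT_EXAMPLES = [
    {
        "scene": "Collapsed building at grid (3, 7); two heat signatures nearby.",
        "reasoning": (
            "Heat signatures near rubble suggest survivors; debris blocks the "
            "direct route, so the UGV should approach from the east."
        ),
        "answer": "Survivors at (3, 7); route: (0, 0) -> (3, 2) -> (3, 6).",
    },
]

def build_prompt(scene_description):
    """Assemble a few-shot chain-of-thought prompt for a new aerial scene."""
    parts = [
        "You analyze aerial imagery to locate survivors and hazards "
        "and to plan safe paths for an unmanned ground vehicle."
    ]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(f"Scene: {ex['scene']}")
        parts.append(f"Let's think step by step: {ex['reasoning']}")
        parts.append(f"Answer: {ex['answer']}")
    # The trailing cue prompts the model to reason before answering.
    parts.append(f"Scene: {scene_description}")
    parts.append("Let's think step by step:")
    return "\n".join(parts)

prompt = build_prompt("Flooded road at grid (5, 1); heat signature on a rooftop.")
```

Because the adaptation happens entirely in the prompt, new scene types or vehicle constraints can be supported by editing the examples rather than fine-tuning the model.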

Overall, our lab explores how LLMs and multimodal LLMs can become more efficient, adaptive, and useful for real-world reasoning and autonomous decision-making systems.