Multi-modal LLM


Large Language Models (LLMs) are AI systems trained on large-scale text data to understand, generate, and reason over language, and they are increasingly being extended to multimodal settings that combine text with images and other inputs. Our lab studies LLMs not only as language generators, but as reasoning engines that can be adapted for practical decision-making and embodied AI tasks.

One line of our work investigates efficient model merging, where a top-performing multilingual LLM is combined with a Korean-specialized model using the DARE (Drop And REscale) technique to improve reasoning ability without full retraining. This approach showed consistent gains across benchmarks, including strong improvements on reasoning-heavy evaluations such as GSM8K and MT-Bench.
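The core idea of DARE can be sketched on a single weight tensor: the specialized model's delta from the base is randomly sparsified, the surviving entries are rescaled to keep the delta unbiased, and the result is added back to the base weights. The function and parameter names below are illustrative, not the lab's actual implementation:

```python
import numpy as np

def dare_merge(base, specialist, drop_rate=0.9, seed=0):
    """Minimal sketch of DARE (Drop And REscale) for one weight tensor.

    The specialist's delta from the base model is sparsified by randomly
    dropping entries with probability `drop_rate`; the survivors are
    rescaled by 1 / (1 - drop_rate) so the delta stays unbiased in
    expectation, then added back to the base weights.
    """
    rng = np.random.default_rng(seed)
    delta = specialist - base                        # task vector
    keep_mask = rng.random(delta.shape) >= drop_rate # keep ~(1 - drop_rate)
    rescaled = delta * keep_mask / (1.0 - drop_rate) # unbiased rescale
    return base + rescaled

# Toy example: "models" represented by single tensors.
base = np.zeros((4, 4))
specialist = np.ones((4, 4))
merged = dare_merge(base, specialist, drop_rate=0.5)
```

With `drop_rate=0.5` and a delta of all ones, each merged entry is either 0 (dropped) or 2 (kept and rescaled), averaging back to the original delta of 1 in expectation; this is why the merged model can retain the specialist's ability despite heavy sparsification.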

We also study multimodal LLMs for disaster-response applications, in which the model interprets aerial imagery to detect survivors and hazards and to plan safe paths for unmanned ground vehicles (UGVs). By leveraging few-shot in-context learning and chain-of-thought prompting, our system improves detection and path planning without requiring additional fine-tuning.
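The prompting strategy above can be sketched as plain prompt assembly: worked scene analyses serve as few-shot examples, and a "let's think step by step" cue elicits chain-of-thought reasoning before the final answer. The example scenes, grid coordinates, and helper name are hypothetical placeholders, not the lab's actual prompts:

```python
# Hypothetical worked example pairing a scene with step-by-step reasoning.
FEW_SHOT_EXAMPLES = [
    {
        "scene": "Collapsed building at grid (3, 7); two heat signatures nearby.",
        "reasoning": (
            "Heat signatures near rubble suggest survivors; debris blocks the "
            "direct route, so the UGV should approach from the east."
        ),
        "answer": "Survivors at (3, 7); route: (0, 0) -> (3, 2) -> (3, 6).",
    },
]

def build_prompt(scene_description):
    """Assemble a few-shot chain-of-thought prompt for a new aerial scene."""
    parts = [
        "You analyze aerial imagery to locate survivors and hazards "
        "and to plan safe paths for an unmanned ground vehicle."
    ]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(f"Scene: {ex['scene']}")
        parts.append(f"Let's think step by step: {ex['reasoning']}")
        parts.append(f"Answer: {ex['answer']}")
    # The trailing cue prompts the model to reason before answering.
    parts.append(f"Scene: {scene_description}")
    parts.append("Let's think step by step:")
    return "\n".join(parts)

prompt = build_prompt("Flooded road at grid (5, 1); heat signature on a rooftop.")
```

Because the adaptation happens entirely in the prompt, new scene types or vehicle constraints can be supported by editing the examples rather than fine-tuning the model.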

Overall, our lab explores how LLMs and multimodal LLMs can become more efficient, adaptive, and useful for real-world reasoning and autonomous decision-making systems.