Ph.D. Candidate Mar. 2020 - Present
Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST) Daejeon, South Korea
- Advisor: Prof. Yong Man Ro
B.S. Mar. 2014 - Feb. 2020
Electrical and Electronic Engineering, Yonsei University Seoul, South Korea
Machine Learning
- Deep Learning
- Computer Vision
- Object Detection
- Multimodal Learning
- Multimodal Large Language Model
I currently focus on extending the capabilities of Large Language Models (LLMs) to multimodal applications.
Specifically, my research aims to make Multimodal Large Language Models (MLLMs) more robust and to improve their object detection performance in challenging environments.
1. ReFoCUS: Reinforcement-guided Frame Optimization for Contextual Understanding
Hosu Lee*, Junho Kim*, Hyunjun Kim, Yong Man Ro (* Equal Contribution)
Under Review
2. DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes
Sungjune Park*, Hyunjun Kim*, Junho Kim, Seongho Kim, and Yong Man Ro (* Equal Contribution)
Under Review
3. Language-guided Learning for Object Detection Tackling Multiple Variations in Aerial Images
Sungjune Park, Hyunjun Kim, Beomchan Park, and Yong Man Ro
Under Review
4. Look Every Frame All at Once: Video-Mamba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing
Hosu Lee*, Junho Kim*, Hyunjun Kim, Yong Man Ro (* Equal Contribution)
Under Review
5. SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis
Junho Kim*, Hyunjun Kim*, Hosu Lee, and Yong Man Ro (* Equal Contribution)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
6. Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo, Chae Won Kim, Hyunjun Kim, Hyeongseop Rha, Seunghee Han, Wen-Huang Cheng, Yong Man Ro
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 39(9), 9472-9480, 2025
7. CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models
Junho Kim*, Hyunjun Kim*, Hosu Lee, and Yong Man Ro (* Equal Contribution)
Advances in Neural Information Processing Systems (NeurIPS), 2024
8. Weather-Aware Drone-View Object Detection via Environmental Context Understanding
Hyunjun Kim, Dahye Lee, Sungjune Park, and Yong Man Ro
IEEE International Conference on Image Processing (ICIP), 2024
9. Integrating Language-Derived Appearance Elements with Visual Cues in Pedestrian Detection
Sungjune Park*, Hyunjun Kim*, and Yong Man Ro (* Equal Contribution)
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2024
10. Robust Pedestrian Detection via Constructing Versatile Pedestrian Knowledge Bank
Sungjune Park*, Hyunjun Kim*, and Yong Man Ro (* Equal Contribution)
Pattern Recognition (PR), 2024
11. Speaker-adaptive Lip Reading with User-dependent Padding
Minsu Kim, Hyunjun Kim, and Yong Man Ro
European Conference on Computer Vision (ECCV), 2022