Hyunwoo Kim

김현우

📘 CV

I am currently an M.S. student under the supervision of Professor Yukyung Choi at the Robotics and Computer Vision Lab, Sejong University.

My research aims to advance multimodal video understanding for real-world use by developing learning frameworks that remain robust under limited supervision and unconstrained user interactions.

I focus on building models that learn meaningful temporal and semantic representations from visual, audio, and language signals, even when precise annotations are unavailable or when inputs deviate from predefined formats.

- Weakly-supervised temporal action localization: I study methods that leverage only weak video-level labels to localize actions in time, enabling scalable learning without costly frame-level annotations while maintaining robustness to noisy and ambiguous visual evidence.
- Audio-Visual Question Answering (AVQA): I investigate question-aware multimodal reasoning that remains robust to diverse and previously unseen rephrasings of questions, aiming to move beyond template-specific behavior toward more flexible and generalizable real-world question answering.

Education & Career

Internship
- NAVER Cloud, Seongnam, Korea (Feb. 2025 - Aug. 2025)
  - Vision Understanding (Multimodal LLM) / HyperCLOVA X
M.S.
- Sejong University, Seoul, Korea (Mar. 2024 - Present)
  - Dept. of Intelligent Mechatronics Engineering (Advisor: Yukyung Choi)
B.S.
- Sejong University, Seoul, Korea (Mar. 2018 - Feb. 2024)
  - Dept. of Intelligent Mechatronics Engineering (Advisor: Yukyung Choi)

Publications

[Under Review]
Question Decomposition for Adaptive Spatio-Temporal Reasoning in
Audio-Visual Question Answering

Hyunwoo Kim, Intaek Shin and Yukyung Choi

[arXiv 2025]
HyperCLOVA X THINK Technical Report

NAVER Cloud HyperCLOVA X Team (Core contributor)
[Paper] [Tech Blog]

[Pattern Recognition 2025]
Enhancing Visual Representation of Untrimmed Videos
by Counteracting Visuality-Threatening Content

Gwangjin Lee, Won Jo, Hyunwoo Kim and Yukyung Choi
[Paper] [Github]

[ACM MM 2024]
Probabilistic Vision-Language Representation for
Weakly Supervised Temporal Action Localization

Geuntaek Lim, Hyunwoo Kim, Joonsoo Kim and Yukyung Choi
[Paper] [Github]

[AAAI 2024]
VVS: Video-to-Video Retrieval with Irrelevant Frame Suppression

Won Jo, Geuntaek Lim, Gwangjin Lee, Hyunwoo Kim, Byungsoo Ko and Yukyung Choi
[Paper] [Github]

Projects

Development of AI Solution Technology Using Smart Vision
- Feb. 2024 -
Development of MPEG CDVA-based Multi-Machine Vision Task Software
- Electronics and Telecommunications Research Institute (ETRI), Apr. 2023 - Nov. 2023
Development of MPEG CDVA-based Video Search Software
- Electronics and Telecommunications Research Institute (ETRI), Apr. 2022 - Nov. 2022

Patents

Video Feature Descriptor Generation Method and Apparatus
- 동영상 특징 서술자 생성 방법 및 그 장치 (10-2860121), Registered, Sep. 2025
Method and Apparatus for Temporal Action Localization via Probabilistic Video Feature Description
- 동영상의 확률적 특징 기반 행동 구간 탐지 방법 및 그 장치 (10-2024-0125424), Filed, Sep. 2024
Method and Apparatus of Extracting Video Feature Descriptors With Distractor Frame Suppression
- 방해자 프레임 억제를 적용한 비디오 특징 추출 방법 및 그 장치 (10-2023-0177999), Filed, Dec. 2023
Video-Level Descriptors Extraction Method and Apparatus for Content-Based Video Retrieval
- 동영상 검색을 위한 동영상 특징 서술자 추출 방법 및 그 장치 (10-2022-0152725), Filed, Nov. 2022

Video Feature Descriptor Generation Method
- Oct. 2025
