International Conference

2026

"Adaptive Auxiliary Prompt Blending for Target-Faithful Diffusion Generation"

Kwanyoung Lee, SeungJu Cha, Yebin Ahn, Hyunwoo Oh, Sungho Koh, Dong-Jin Kim

IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2026. (25.42% accept rate)

Received Honorable Mention, 32nd Samsung Humantech Paper Awards

"SAIL: Similarity-Aware Guidance and Inter-Caption Augmentation-based Learning for Weakly-Supervised Dense Video Captioning"

Ye-Chan Kim, SeungJu Cha, Si-Woo Kim, MinJu Jeon, HyunGee Kim, Dong-Jin Kim

IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2026. (25.42% accept rate)

[PDF]

Also presented at "Workshop on Synthetic Data for Computer Vision" in conjunction with CVPR 2026.

"Follow the Saliency: Supervised Saliency for Retrieval-augmented Dense Video Captioning"

Seung hee Choi, MinJu Jeon, Hyunwoo Oh, Jihwan Lee, Dong-Jin Kim

IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2026. (25.42% accept rate)

[PDF] [code]

"ADAPT: Attention Driven Adaptive Prompt Scheduling and InTerpolating Orthogonal Complements for Rare Concepts Generation"

Kwanyoung Lee, Hyunwoo Oh, SeungJu Cha, Sungho Koh, Dong-Jin Kim

IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) Findings, 2026.

[PDF]

2025

"ScaleDiff: Higher-Resolution Image Synthesis via Efficient and Model-Agnostic Diffusion"

Sungho Koh, SeungJu Cha, Hyunwoo Oh, Kwanyoung Lee, Dong-Jin Kim

Neural Information Processing Systems (NeurIPS), 2025. (24.52% accept rate)

[PDF] [code]

"Sali4Vid: Saliency-Aware Video Reweighting and Adaptive Caption Retrieval for Dense Video Captioning"

MinJu Jeon, Si-Woo Kim, Ye-Chan Kim, HyunGee Kim, Dong-Jin Kim

International Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025. (long, main) (22.16% accept rate)

[PDF]

Also presented at "Workshop on Multi-Modal Reasoning for Agentic Intelligence" in conjunction with ICCV 2025.

"SIDA: Synthetic Image Driven Zero-shot Domain Adaptation"

Ye-Chan Kim, SeungJu Cha, Si-Woo Kim, Taewhan Kim, Dong-Jin Kim

ACM International Conference on Multimedia (MM), 2025. (23.45% accept rate) (Oral)

[PDF]

Also presented at "Workshop on Curated Data for Efficient Learning" in conjunction with ICCV 2025.

"SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning"

Si-Woo Kim, MinJu Jeon, Ye-Chan Kim, Soeun Lee, Taewhan Kim, Dong-Jin Kim

ACM International Conference on Multimedia (MM), 2025. (23.45% accept rate) (Oral)

[PDF] [code]

Also presented at "Workshop on Curated Data for Efficient Learning" in conjunction with ICCV 2025.

"CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image Generation"

Hyunwoo Oh, SeungJu Cha, Kwanyoung Lee, Si-Woo Kim, Dong-Jin Kim

ACM International Conference on Multimedia (MM), 2025. (23.45% accept rate) (Oral)

[PDF]

"VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness"

SeungJu Cha, Kwanyoung Lee, Ye-Chan Kim, Hyunwoo Oh, Dong-Jin Kim

IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2025. (22.1% accept rate)

[PDF] [code]

Also presented at "Workshop on AI for Creative Visual Content Generation Editing and Understanding" in conjunction with CVPR 2025.

"ViPCap: Retrieval Text-based Visual Prompts for Lightweight Image Captioning"

Taewhan Kim, Soeun Lee, Si-Woo Kim, Dong-Jin Kim

AAAI Conference on Artificial Intelligence (AAAI), 2025. (23.4% accept rate)

[PDF]

Also presented at "Workshop on Adaptive Foundation Models" in conjunction with NeurIPS 2024.

2024

"IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning"

{Soeun Lee*, Si-Woo Kim*}, Taewhan Kim, Dong-Jin Kim (* Co-first authors)

International Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024. (long, main) (20.8% accept rate)

[PDF] [code]

Also presented at "Workshop on Adaptive Foundation Models" and "Workshop on Video-Language Models" in conjunction with NeurIPS 2024.

"Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality"

Youngtaek Oh, Jae Won Cho, Dong-Jin Kim, In So Kweon, Junmo Kim

International Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024. (long, main) (20.8% accept rate)

[PDF]

~ 2023

"Technical Report of NICE Challenge at CVPR 2023: Retrieval-based Data Discovery and Fusion for Zero-shot Image Captioning"

Youngtaek Oh, Jae Won Cho, Dong-Jin Kim, In So Kweon, Junmo Kim

preprint, 2023.

[PDF] [code]

2nd place in the NICE Challenge at CVPR 2023

"Generative Bias for Robust Visual Question Answering"

Jae Won Cho, Dong-Jin Kim, Hyeonggon Ryu, and In So Kweon

IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2023. (25.78% accept rate)

[PDF] [code]

- Received Bronze Prize, 28th Samsung Humantech Paper Awards (Top 2.8%)
- Received Excellent Paper Award, IW-FCV 2023
- Also presented at "Workshop on Open-Domain Reasoning Under Multi-Modal Settings" in conjunction with CVPR 2023

"Self-Sufficient Framework for Continuous Sign Language Recognition"

YeongJun Jang, Youngtaek Oh, Jae Won Cho, Myungchul Kim, Dong-Jin Kim, In So Kweon, and Joon Son Chung

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.

[PDF] [Project]

"Signing Outside the Studio: Benchmarking Background Robustness for Continuous Sign Language Recognition"

YeongJun Jang, Youngtaek Oh, Jae Won Cho, Dong-Jin Kim, Joon Son Chung, and In So Kweon

British Machine Vision Conference (BMVC), 2022.

[PDF] [Project] [code]

"DASO: Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced Semi-Supervised Learning"

Youngtaek Oh, Dong-Jin Kim, and In So Kweon.

IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2022. (25.3% accept rate)

[PDF] [Project] [code]

Also presented at "Workshop on Learning with Limited Labelled Data for Image and Video Understanding" in conjunction with CVPR 2022.

"Single-Modal Entropy based Active Learning for Visual Question Answering"

{Dong-Jin Kim*, Jae Won Cho*}, Jinsoo Choi, Yunjae Jung, and In So Kweon (* Co-first authors)

British Machine Vision Conference (BMVC), 2021.

[PDF]

"LabOR: Labeling Only if Required for Domain Adaptive Semantic Segmentation"

Inkyu Shin, Dong-Jin Kim, Jae Won Cho, Sanghyun Woo, KwanYong Park, and In So Kweon

IEEE International Conference on Computer Vision (ICCV), 2021. (Oral) (3% accept rate)

[PDF]

Winner of Qualcomm Innovation Fellowship 2021.

"Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation"

Jae Won Cho, Dong-Jin Kim, Yunjae Jung, Jinsoo Choi, and In So Kweon

IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) Multimodal Learning and Applications Workshop, 2021.

[PDF]

Also presented at "Visual Question Answering Workshop" and "VizWiz Grand Challenge Workshop" in conjunction with CVPR 2021.

"Detecting Human-Object Interactions with Action Co-occurrence Priors"

Dong-Jin Kim, Xiao Sun, Jinsoo Choi, Stephen Lin, and In So Kweon

European Conference on Computer Vision (ECCV), 2020. (27% accept rate)

[PDF] [Project] [code] [Slides] [Video] [Poster]

Received Silver Prize, 26th Samsung Humantech Paper Awards (Top 1.6%)
Also presented at "The 2nd workshop on Video Turing Test: Toward Human-Level Video Story Understanding" in conjunction with ECCV 2020.

"Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach"

Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, and In So Kweon.

International Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019. (23.8% accept rate)

[PDF] [Project] [Slides] [Poster]

Also presented at "Language&Vision" and "Visual Question Answering and Dialog" Workshops in conjunction with CVPR 2019, and "CLVL: 3rd Workshop on Closing the Loop Between Vision and Language" in conjunction with ICCV 2019.

"Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning"

Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, and In So Kweon.

IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2019. (25.2% accept rate)

[PDF] [Project] [Dataset] [code] [Slides] [Poster]

Extension of this work received Qualcomm Innovation Award 2019.
Also presented at "Language&Vision" and "Visual Question Answering and Dialog" Workshops in conjunction with CVPR 2019.

"Disjoint Multi-task Learning between Heterogeneous Human-centric Tasks"

Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, Youngjin Yoon, and In So Kweon.

IEEE Winter Conference on Applications of Computer Vision (WACV), 2018. (Oral)

[PDF]
