"ScaleDiff: Higher-Resolution Image Synthesis via Efficient and Model-Agnostic Diffusion"
Sungho Koh, SeungJu Cha, Hyunwoo Oh, Kwanyoung Lee, Dong-Jin Kim
Neural Information Processing Systems (NeurIPS), 2025. (24.52% accept rate)
[PDF]
"Sali4Vid: Saliency-Aware Video Reweighting and Adaptive Caption Retrieval for Dense Video Captioning"
MinJu Jeon, Si-Woo Kim, Ye-Chan Kim, HyunGee Kim, Dong-Jin Kim
International Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025. (long, main) (22.16% accept rate)
[PDF]
"SIDA: Synthetic Image Driven Zero-shot Domain Adaptation"
Ye-Chan Kim, SeungJu Cha, Si-Woo Kim, Taewhan Kim, Dong-Jin Kim
ACM International Conference on Multimedia (MM), 2025. (??% accept rate)
[PDF]
Also presented at "Workshop on Curated Data for Efficient Learning" in conjunction with ICCV 2025.
"SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning"
Si-Woo Kim, MinJu Jeon, Ye-Chan Kim, Soeun Lee, Taewhan Kim, Dong-Jin Kim
ACM International Conference on Multimedia (MM), 2025. (??% accept rate)
[PDF]
Also presented at "Workshop on Curated Data for Efficient Learning" in conjunction with ICCV 2025.
"CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image Generation"
Hyunwoo Oh, SeungJu Cha, Kwanyoung Lee, Si-Woo Kim, Dong-Jin Kim
ACM International Conference on Multimedia (MM), 2025. (??% accept rate)
[PDF]
"VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness"
SeungJu Cha, Kwanyoung Lee, Ye-Chan Kim, Hyunwoo Oh, Dong-Jin Kim
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. (22.1% accept rate)
Also presented at "Workshop on AI for Creative Visual Content Generation Editing and Understanding" in conjunction with CVPR 2025.
"ViPCap: Retrieval Text-based Visual Prompts for Lightweight Image Captioning"
Taewhan Kim, Soeun Lee, Si-Woo Kim, Dong-Jin Kim
AAAI Conference on Artificial Intelligence (AAAI), 2025. (23.4% accept rate)
[PDF]
Also presented at "Workshop on Adaptive Foundation Models" in conjunction with NeurIPS 2024.
"IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning"
{Soeun Lee*, Si-Woo Kim*}, Taewhan Kim, Dong-Jin Kim (* Co-first authors)
International Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024. (long, main) (20.8% accept rate)
Also presented at "Workshop on Adaptive Foundation Models" and "Workshop on Video-Language Models" in conjunction with NeurIPS 2024.
"Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality"
Youngtaek Oh, Jae Won Cho, Dong-Jin Kim, In So Kweon, Junmo Kim
International Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024. (long, main) (20.8% accept rate)
[PDF]
"Generative Bias for Robust Visual Question Answering"
Jae Won Cho, Dong-Jin Kim, Hyeonggon Ryu, and In So Kweon
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. (25.78% accept rate)
Received Bronze Prize, 28th Samsung Humantech Paper Awards (Top 2.8%)
Received Excellent Paper Award, IW-FCV 2023
Also presented at "Workshop on Open-Domain Reasoning Under Multi-Modal Settings" in conjunction with CVPR 2023.
"DASO: Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced Semi-Supervised Learning"
Youngtaek Oh, Dong-Jin Kim, and In So Kweon
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. (25.3% accept rate)
Also presented at "Workshop on Learning with Limited Labelled Data for Image and Video Understanding" in conjunction with CVPR 2022.
"Single-Modal Entropy based Active Learning for Visual Question Answering"
{Dong-Jin Kim*, Jae Won Cho*}, Jinsoo Choi, Yunjae Jung, and In So Kweon (* Co-first authors)
British Machine Vision Conference (BMVC), 2021.
[PDF]
"LabOR: Labeling Only if Required for Domain Adaptive Semantic Segmentation"
Inkyu Shin, Dong-Jin Kim, Jae Won Cho, Sanghyun Woo, KwanYong Park, and In So Kweon
IEEE International Conference on Computer Vision (ICCV), 2021. (Oral) (3% accept rate)
[PDF]
Winner of Qualcomm Innovation Fellowship 2021.
"Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation"
Jae Won Cho, Dong-Jin Kim, Yunjae Jung, Jinsoo Choi, and In So Kweon
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Multimodal Learning and Applications Workshop, 2021.
[PDF]
Also presented at "Visual Question Answering Workshop" and "VizWiz Grand Challenge Workshop" in conjunction with CVPR 2021.
"Detecting Human-Object Interactions with Action Co-occurrence Priors"
Dong-Jin Kim, Xiao Sun, Jinsoo Choi, Stephen Lin, and In So Kweon
European Conference on Computer Vision (ECCV), 2020. (27% accept rate)
[PDF] [Project] [Code] [Slides] [Video] [Poster]
Received Silver Prize, 26th Samsung Humantech Paper Awards (Top 1.6%)
Also presented at "The 2nd workshop on Video Turing Test: Toward Human-Level Video Story Understanding" in conjunction with ECCV 2020.
"Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach"
Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, and In So Kweon
International Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019. (23.8% accept rate)
[PDF] [Project] [Slides] [Poster]
Also presented at "Language&Vision" and "Visual Question Answering and Dialog" Workshops in conjunction with CVPR 2019, and "CLVL: 3rd Workshop on Closing the Loop Between Vision and Language" in conjunction with ICCV 2019.
"Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning"
Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, and In So Kweon
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. (25.2% accept rate)
[PDF] [Project] [Dataset] [Code] [Slides] [Poster]
Extension of this work received Qualcomm Innovation Award 2019.
Also presented at "Language&Vision" and "Visual Question Answering and Dialog" Workshops in conjunction with CVPR 2019.
"Disjoint Multi-task Learning between Heterogeneous Human-centric Tasks"
Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, Youngjin Yoon, and In So Kweon
IEEE Winter Conference on Applications of Computer Vision (WACV), 2018. (Oral)
[PDF]