Jaehyung Kim

I am a postdoctoral fellow at the Language Technology Institute, Carnegie Mellon University (CMU), working with Yiming Yang. Before joining CMU, I earned my Ph.D. at KAIST under the supervision of Prof. Jinwoo Shin. I also worked closely with Prof. Dongyeop Kang at the University of Minnesota. During my Ph.D., I had the chance to intern or collaborate with various industry and academic labs, including Naver AI, Meta AI, and SAIT. I am a recipient of Qualcomm Innovation Fellowship Korea (2021, 2022) from three of my papers. I also receive Silver prize from Samsung Humantech Paper Awards.

Contact: wogudehowl@gmail.com (or jaehyun4@andrew.cmu.edu)
Google Scholar, CV (update on 24.02)

🔥 News

Sep 2024: I will be joining the Department of Artificial Intelligence at Yonsei University as an Assistant Professor!
Jul 2024: P2T is accepted to COLM 2024!
Jan 2024: Two papers (SuRe and HOMER) are accepted to ICLR 2024, including 1 first-authored paper (SuRe)!
Jan 2024: I started my post-doc at CMU-LTI!
Dec 2023: Meta-Crafting is accepted to AAAI 2024!
Oct 2023: I successfully defended my PhD thesis!
Oct 2023: RoAST is accepted to EMNLP 2023!
Jul 2023: I will attend ACL 2023 and ICML 2023 in person to present 2 first-authored papers (infoVerse and Prefer-to-Classify). Please let me know if you want to meet or chat there!

📑 Research

My research goal is to enhance machine learning (ML) and NLP frameworks to be more accurate and reliable in real-world scenarios, via data-centric approaches. Recently, I’ve been mostly interested in large language models (LLMs), especially in their alignment, adaptation (e.g., personalization), and development. While my recent focus has mainly been on NLP and LLMs, I’m also interested in improving ML frameworks in other domains.

(C: Conference, W: Workshop, P: Preprint, *: Equal contribution)

2024
[P8] Learning to Correct for QA Reasoning with Black-box LLMs [pdf][code]

Jaehyung Kim, Dongyoung Kim, Yiming Yang

Arxiv Preprint 2024

[P7] Few-shot Personalization of LLMs with Mis-aligned Responses [pdf][code]

Jaehyung Kim, Yiming Yang

Arxiv Preprint 2024

[P6] Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [pdf][code]

Jaehyun Nam, Kyuyoung Kim, Seunghyuk Oh, Jihoon Tack, Jaehyung Kim, Jinwoo Shin

Arxiv Preprint 2024

[P5] Aligning Large Language Models with Self-generated Preference Data [pdf][code]

Dongyoung Kim, Kimin Lee, Jinwoo Shin, Jaehyung Kim

Arxiv Preprint 2024

[P4] Online Adaptation of Language Models with a Memory of Amortized Contexts [pdf][code]

Jihoon Tack, Jaehyung Kim, Eric Mitchell, Jinwoo Shin, Yee Whye Teh, and Jonathan Richard Schwarz

Arxiv Preprint 2024

[P3] SelectLLM: Can LLMs Select Important Instructions to Annotate? [pdf][code]

Ritik Sachin Parkar*, Jaehyung Kim*, Jong Inn Park, and Dongyeop Kang

Arxiv Preprint 2024

[P2] Under the Surface: Tracking the Artifactuality of LLM-Generated Data [pdf][code][project]

Debarati Das*, Karin De Langis*, Anna Martin*, Jaehyung Kim*, Minhwa Lee*, Zae Myung Kim*, Shirley Hayati, Risako Owan, Bin Hu, Ritik Parkar, Ryan Koo, Jonginn Park, Aahan Tyagi, Libby Ferland, Sanjali Roy, Vincent Liu, and Dongyeop Kang

Arxiv Preprint 2024

[W2] Meta-Crafting: Improved Detection of Out-of-distributed Texts via Crafting Metadata Space

Ryan Koo, Yekyung Kim, Dongyeop Kang, and Jaehyung Kim

AAAI Conference on Artificial Intelligence (AAAI) Student Abstract and Poster Program 2024

[C13, W1] Tabular Transfer Learning via Prompting LLMs

Jaehyun Nam, Woomin Song, Seong Hyeon Park, Jihoon Tack, Sukmin Yun, Jaehyung Kim, Kyu Hwan Oh, and Jinwoo Shin
Conference on Language Modeling (COLM) 2024

[C12] Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [pdf][code]

Woomin Song*, Seunghyuk Oh*, Sangwoo Mo, Jaehyung Kim, Sukmin Yun, Jung-Woo Ha, and Jinwoo Shin
International Conference on Learning Representations (ICLR) 2024

[C11] SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs [pdf][code][slide][poster]

Jaehyung Kim, Jaehyun Nam, Sangwoo Mo, Jongjin Park, Sang-Woo Lee, Minjoon Seo, Jung-Woo Ha, and Jinwoo Shin
International Conference on Learning Representations (ICLR) 2024

2023

[W1] Semi-supervised Tabular Classification via In-context Learning of Large Language Models [pdf]

Jaehyun Nam, Woomin Song, Seong Hyeon Park, Jihoon Tack, Sukmin Yun, Jaehyung Kim, and Jinwoo Shin
ICML Workshop on Efficient Systems for Foundation Models (ES-FoMo) 2023

[C10] RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training [pdf][code][slide][poster]

Jaehyung Kim, Yuning Mao, Rui Hou, Hanchao Yu, Davis Liang, Pascale Fung, Qifan Wang, Fuli Feng, Lifu Huang, and Madian Khabsa
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023

[C9] A Universal Framework for Dataset Characterization with Multidimensional Meta-information [pdf][code][slide][poster]

Jaehyung Kim, Yekyung Kim, Karin Johanna Denton de Langis, Jinwoo Shin, and Dongyeop Kang
Annual Meeting of the Association for Computational Linguistics (ACL) 2023

[C8] Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning [pdf][code][slide][poster]

Jaehyung Kim, Jinwoo Shin, and Dongyeop Kang

International Conference on Machine Learning (ICML) 2023

[C7] Everyone’s Voice Matters: Quantifying Annotation Disagreement Using Demographic Information [pdf][code]

Ruyuan Wan, Jaehyung Kim, and Dongyeop Kang

AAAI Conference on Artificial Intelligence (AAAI) 2023 (Oral Presentation)

~ 2022

[C6] Time Is MattEr: Temporal Self-supervision for Video Transformers [pdf][code]

Sukmin Yun, Jaehyung Kim, Dongyoon Han, Hwanjun Song, Jung-Woo Ha, and Jinwoo Shin

International Conference on Machine Learning (ICML) 2022
Winners, Qualcomm Innovation Fellowship Korea 2022

[C5] Patch-level Representation Learning for Self-supervised Vision Transformers [pdf][code]

Sukmin Yun, Hankook Lee, Jaehyung Kim, and Jinwoo Shin

Conference on Computer Vision and Pattern Recognition (CVPR) 2022 (Oral Presentation)

[C4] What Makes Better Augmentation Strategies? Augment Difficult but Not too Different [pdf][code][slide][poster]

Jaehyung Kim, Dongyeop Kang, Sungsoo Ahn, and Jinwoo Shin

International Conference on Learning Representations (ICLR) 2022
Silver Prize, Samsung Humantech Paper Awards 2021
Winners, Qualcomm Innovation Fellowship Korea 2022

[C3] Spread Spurious Attribute: Improving Worst-group Accuracy with Spurious Attribute Estimation [pdf]

Junhyun Nam, Jaehyung Kim, Jaeho Lee, and Jinwoo Shin
International Conference on Learning Representations (ICLR) 2022

[C2] Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning [pdf][code][slide][poster]

Jaehyung Kim, Youngbum Hur, Sejun Park, Eunho Yang, Sung Ju Hwang, and Jinwoo Shin
Neural Information Processing Systems (NeurIPS) 2020
Winners, Qualcomm Innovation Fellowship Korea 2021

[C1] M2m: Imbalanced Classification via Major-to-minor Translation [pdf][code][slide]

Jaehyung Kim*, Jongheon Jeong*, and Jinwoo Shin
Conference on Computer Vision and Pattern Recognition (CVPR) 2020

[P1] Simplified Stochastic Feedforward Neural Networks [pdf]

Kimin Lee, Jaehyung Kim, Song Chong, and Jinwoo Shin
ArXiv Preprint 2017

💻 Work Experiences

Collaborator, Naver AI, Jeongja, South Korea - with Sang-Woo Lee, Minjoon Seo, and Jung-Woo Ha (Nov. 2022 - May. 2023)

Research Intern, Meta AI, Seattle, WA - with Madian Khabsa (May. 2022 - Aug. 2022)

Visiting Student, University of Minnesota, Minneapolis, MN - with Dongyeop Kang (Feb. 2022 - May. 2022)

Visiting Student, Samsung Advanced Institute of Technology, Suwon, South Korea - with Eunho Yang and Sung Ju Hwang (Jan. 2020 - Mar. 2020)

🏆 Awards

Winner ($5,000), Qualcomm Innovation Fellowship Korea 2022
Silver Prize ($7,000), Samsung Humantech Paper Awards 2021
Winner ($5,000), Qualcomm Innovation Fellowship Korea 2021

3rd place, ILSVRC (ImageNet Challenge) on Object Detection 2017

👨‍🎓 Academic Services

Conference Reviewer

AAAI Conference on Artificial Intelligence (AAAI): 2021, 2022, 2023

Association for Computational Linguistics (ACL) Rolling Review: 2022, 2023, 2024
Conference on Computer Vision and Pattern Recognition (CVPR): 2023
Conference on Empirical Methods in Natural Language Processing (EMNLP): 2022, 2023

International Conference on Learning Representations (ICLR): 2022, 2023, 2024
International Conference on Machine Learning (ICML): 2021, 2022, 2023, 2024
Neural Information Processing Systems (NeurIPS): 2021, 2022, 2023, 2024

Journal Reviewer

Transactions on Machine Learning Research (TMLR)

IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

🎤 Invited Talks

Designing New Effective Inference Algorithms for Large Language Models

Amazon Artificial General Intelligence (AGI) (Mar. 2024)
Ulsan National Institute of Science and Technology (UNIST) (May. 2024)

Deep Learning with Imbalanced Datasets

Samsung Electronics Data & Information Technology Center (Oct. 2021)

Multi-aspect Analysis on Data Informativeness

Summer 2021 Presentation Minnesota NLP Group (Aug. 2021)

Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning

NeurIPS 2020 Social ML in Korea (Dec. 2020)