Jaehyung Kim
I am a postdoctoral fellow at the Language Technologies Institute, Carnegie Mellon University (CMU), working with Yiming Yang. Before joining CMU, I earned my Ph.D. at KAIST under the supervision of Prof. Jinwoo Shin. I also worked closely with Prof. Dongyeop Kang at the University of Minnesota. During my Ph.D., I had the chance to intern at or collaborate with various industry and academic labs, including Naver AI, Meta AI, and SAIT. I am a recipient of the Qualcomm Innovation Fellowship Korea (2021, 2022) for three of my papers. I also received a Silver Prize at the Samsung Humantech Paper Awards.
Contact: wogudehowl@gmail.com (or jaehyun4@andrew.cmu.edu)
Google Scholar, CV (updated on 24.02)
🔥 News
Jan 2024: Two papers (SuRe and HOMER) are accepted to ICLR 2024, including 1 first-authored paper (SuRe)!
Jan 2024: I started my post-doc at CMU-LTI!
Dec 2023: Meta-Crafting is accepted to AAAI 2024!
Oct 2023: I successfully defended my PhD thesis!
Oct 2023: RoAST is accepted to EMNLP 2023!
Jul 2023: I will attend ACL 2023 and ICML 2023 in person to present 2 first-authored papers (infoVerse and Prefer-to-Classify). Please let me know if you want to meet or chat there!
Jun 2023: SPROUT is accepted to ICML 2023 Workshop on Efficient Systems for Foundation Models (ES-FoMo)!
📑 Research
My research goal is to make ML/NLP frameworks more reliable under challenging yet realistic scenarios. Recently, I’m mostly interested in large language models (LLMs), especially their alignment and development. Previously, I pursued this goal by designing improved algorithms from a data-centric perspective (e.g., long-tailed distributions, limited data, bias, distribution shift, etc.) and with human-in-the-loop pipelines (e.g., learning from human feedback). While my recent focus has mainly been within NLP, I’m also interested in improving ML frameworks in other domains.
(C: Conference, W: Workshop, P: Preprint, *: Equal contribution)
Jihoon Tack, Jaehyung Kim, Eric Mitchell, Jinwoo Shin, Yee Whye Teh, and Jonathan Richard Schwarz
arXiv Preprint 2024
Ritik Sachin Parkar*, Jaehyung Kim*, Jong Inn Park, and Dongyeop Kang
arXiv Preprint 2024
[P2] Under the Surface: Tracking the Artifactuality of LLM-Generated Data [pdf][code][project]
Debarati Das*, Karin De Langis*, Anna Martin*, Jaehyung Kim*, Minhwa Lee*, Zae Myung Kim*, Shirley Hayati, Risako Owan, Bin Hu, Ritik Parkar, Ryan Koo, Jonginn Park, Aahan Tyagi, Libby Ferland, Sanjali Roy, Vincent Liu, and Dongyeop Kang
arXiv Preprint 2024
[W2] Meta-Crafting: Improved Detection of Out-of-distributed Texts via Crafting Metadata Space
Ryan Koo, Yekyung Kim, Dongyeop Kang, and Jaehyung Kim
AAAI Conference on Artificial Intelligence (AAAI) Student Abstract and Poster Program 2024 (to appear)
[C12] Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [pdf][code]
Woomin Song*, Seunghyuk Oh*, Sangwoo Mo, Jaehyung Kim, Sukmin Yun, Jung-Woo Ha, and Jinwoo Shin
International Conference on Learning Representations (ICLR) 2024 (to appear)
[C11] SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs [pdf][code][slide][poster]
Jaehyung Kim, Jaehyun Nam, Sangwoo Mo, Jongjin Park, Sang-Woo Lee, Minjoon Seo, Jung-Woo Ha, and Jinwoo Shin
International Conference on Learning Representations (ICLR) 2024 (to appear)
2023
[W1] Semi-supervised Tabular Classification via In-context Learning of Large Language Models [pdf]
Jaehyun Nam, Woomin Song, Seong Hyeon Park, Jihoon Tack, Sukmin Yun, Jaehyung Kim, and Jinwoo Shin
ICML Workshop on Efficient Systems for Foundation Models (ES-FoMo) 2023
[C10] RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training [pdf][code][slide][poster]
Jaehyung Kim, Yuning Mao, Rui Hou, Hanchao Yu, Davis Liang, Pascale Fung, Qifan Wang, Fuli Feng, Lifu Huang, and Madian Khabsa
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023
[C9] A Universal Framework for Dataset Characterization with Multidimensional Meta-information [pdf][code][slide][poster]
Jaehyung Kim, Yekyung Kim, Karin Johanna Denton de Langis, Jinwoo Shin, and Dongyeop Kang
Annual Meeting of the Association for Computational Linguistics (ACL) 2023
[C8] Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning [pdf][code][slide][poster]
Jaehyung Kim, Jinwoo Shin, and Dongyeop Kang
Ruyuan Wan, Jaehyung Kim, and Dongyeop Kang
AAAI Conference on Artificial Intelligence (AAAI) 2023 (Oral Presentation)
2022
Sukmin Yun, Jaehyung Kim, Dongyoon Han, Hwanjun Song, Jung-Woo Ha, and Jinwoo Shin
Winners, Qualcomm Innovation Fellowship Korea 2022
Sukmin Yun, Hankook Lee, Jaehyung Kim, and Jinwoo Shin
Conference on Computer Vision and Pattern Recognition (CVPR) 2022 (Oral Presentation)
Jaehyung Kim, Dongyeop Kang, Sungsoo Ahn, and Jinwoo Shin
International Conference on Learning Representations (ICLR) 2022
Silver Prize, Samsung Humantech Paper Awards 2021
Winners, Qualcomm Innovation Fellowship Korea 2022
[C3] Spread Spurious Attribute: Improving Worst-group Accuracy with Spurious Attribute Estimation [pdf]
Junhyun Nam, Jaehyung Kim, Jaeho Lee, and Jinwoo Shin
International Conference on Learning Representations (ICLR) 2022
~2020
Jaehyung Kim, Youngbum Hur, Sejun Park, Eunho Yang, Sung Ju Hwang, and Jinwoo Shin
Jaehyung Kim*, Jongheon Jeong*, and Jinwoo Shin
Conference on Computer Vision and Pattern Recognition (CVPR) 2020
Kimin Lee, Jaehyung Kim, Song Chong, and Jinwoo Shin
arXiv Preprint 2017
💻 Work Experiences
Collaborator, Naver AI, Jeongja, South Korea - with Sang-Woo Lee, Minjoon Seo, and Jung-Woo Ha (Nov. 2022 - May 2023)
Research Intern, Meta AI, Seattle, WA - with Madian Khabsa (May 2022 - Aug. 2022)
Visiting Student, University of Minnesota, Minneapolis, MN - with Dongyeop Kang (Feb. 2022 - May 2022)
Visiting Student, Samsung Advanced Institute of Technology, Suwon, South Korea - with Eunho Yang and Sung Ju Hwang (Jan. 2020 - Mar. 2020)
👨‍🎓 Academic Services
Conference Reviewer
AAAI Conference on Artificial Intelligence (AAAI): 2021, 2022, 2023
Association for Computational Linguistics (ACL) Rolling Review: 2022, 2023
Conference on Computer Vision and Pattern Recognition (CVPR): 2023
Conference on Empirical Methods in Natural Language Processing (EMNLP): 2022, 2023
International Conference on Learning Representations (ICLR): 2022, 2023, 2024
International Conference on Machine Learning (ICML): 2021, 2022, 2023, 2024
Neural Information Processing Systems (NeurIPS): 2021, 2022, 2023
Journal Reviewer
Transactions on Machine Learning Research (TMLR)
IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
🎤 Invited Talks
Designing New Effective Inference Algorithms for Large Language Models
Amazon Artificial General Intelligence (AGI) (Mar. 2024)
Deep Learning with Imbalanced Datasets
Samsung Electronics Data & Information Technology Center (Oct. 2021)
Multi-aspect Analysis on Data Informativeness
Summer 2021 Presentation, Minnesota NLP Group (Aug. 2021)
Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning
NeurIPS 2020 Social ML in Korea (Dec. 2020)