Jaehyung Kim
I am a postdoctoral fellow at the Language Technologies Institute, Carnegie Mellon University (CMU), working with Yiming Yang. Before joining CMU, I earned my Ph.D. at KAIST under the supervision of Prof. Jinwoo Shin. I also worked closely with Prof. Dongyeop Kang at the University of Minnesota. During my Ph.D., I had the chance to intern or collaborate with various industry and academic labs, including Naver AI, Meta AI, and SAIT. I am a recipient of the Qualcomm Innovation Fellowship Korea (2021, 2022) for three of my papers. I also received a Silver Prize from the Samsung Humantech Paper Awards.
Contact: wogudehowl@gmail.com (or jaehyun4@andrew.cmu.edu)
Google Scholar, CV (updated on 24.02)
🔥 News
Sep 2024: I will be joining the Department of Artificial Intelligence at Yonsei University as an Assistant Professor!
Jul 2024: P2T is accepted to COLM 2024!
Jan 2024: Two papers (SuRe and HOMER) are accepted to ICLR 2024, including 1 first-authored paper (SuRe)!
Jan 2024: I started my post-doc at CMU-LTI!
Dec 2023: Meta-Crafting is accepted to AAAI 2024!
Oct 2023: I successfully defended my PhD thesis!
Oct 2023: RoAST is accepted to EMNLP 2023!
Jul 2023: I will attend ACL 2023 and ICML 2023 in person to present 2 first-authored papers (infoVerse and Prefer-to-Classify). Please let me know if you want to meet or chat there!
📑 Research
My research goal is to enhance machine learning (ML) and NLP frameworks to be more accurate and reliable in real-world scenarios, via data-centric approaches. Recently, I’ve been mostly interested in large language models (LLMs), especially in their alignment, adaptation (e.g., personalization), and development. While my recent focus has mainly been on NLP and LLMs, I’m also interested in improving ML frameworks in other domains.
(C: Conference, W: Workshop, P: Preprint, *: Equal contribution)
Jaehyung Kim, Dongyoung Kim, Yiming Yang
Arxiv Preprint 2024
Jaehyung Kim, Yiming Yang
Arxiv Preprint 2024
[P6] Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [pdf][code]
Jaehyun Nam, Kyuyoung Kim, Seunghyuk Oh, Jihoon Tack, Jaehyung Kim, Jinwoo Shin
Arxiv Preprint 2024
Dongyoung Kim, Kimin Lee, Jinwoo Shin, Jaehyung Kim
Arxiv Preprint 2024
Jihoon Tack, Jaehyung Kim, Eric Mitchell, Jinwoo Shin, Yee Whye Teh, and Jonathan Richard Schwarz
Arxiv Preprint 2024
Ritik Sachin Parkar*, Jaehyung Kim*, Jong Inn Park, and Dongyeop Kang
Arxiv Preprint 2024
[P2] Under the Surface: Tracking the Artifactuality of LLM-Generated Data [pdf][code][project]
Debarati Das*, Karin De Langis*, Anna Martin*, Jaehyung Kim*, Minhwa Lee*, Zae Myung Kim*, Shirley Hayati, Risako Owan, Bin Hu, Ritik Parkar, Ryan Koo, Jonginn Park, Aahan Tyagi, Libby Ferland, Sanjali Roy, Vincent Liu, and Dongyeop Kang
Arxiv Preprint 2024
[W2] Meta-Crafting: Improved Detection of Out-of-distributed Texts via Crafting Metadata Space
Ryan Koo, Yekyung Kim, Dongyeop Kang, and Jaehyung Kim
AAAI Conference on Artificial Intelligence (AAAI) Student Abstract and Poster Program 2024
[C13, W1] Tabular Transfer Learning via Prompting LLMs
Jaehyun Nam, Woomin Song, Seong Hyeon Park, Jihoon Tack, Sukmin Yun, Jaehyung Kim, Kyu Hwan Oh, and Jinwoo Shin
Conference on Language Modeling (COLM) 2024
[C12] Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [pdf][code]
Woomin Song*, Seunghyuk Oh*, Sangwoo Mo, Jaehyung Kim, Sukmin Yun, Jung-Woo Ha, and Jinwoo Shin
International Conference on Learning Representations (ICLR) 2024
[C11] SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs [pdf][code][slide][poster]
Jaehyung Kim, Jaehyun Nam, Sangwoo Mo, Jongjin Park, Sang-Woo Lee, Minjoon Seo, Jung-Woo Ha, and Jinwoo Shin
International Conference on Learning Representations (ICLR) 2024
2023
[W1] Semi-supervised Tabular Classification via In-context Learning of Large Language Models [pdf]
Jaehyun Nam, Woomin Song, Seong Hyeon Park, Jihoon Tack, Sukmin Yun, Jaehyung Kim, and Jinwoo Shin
ICML Workshop on Efficient Systems for Foundation Models (ES-FoMo) 2023
[C10] RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training [pdf][code][slide][poster]
Jaehyung Kim, Yuning Mao, Rui Hou, Hanchao Yu, Davis Liang, Pascale Fung, Qifan Wang, Fuli Feng, Lifu Huang, and Madian Khabsa
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023
[C9] A Universal Framework for Dataset Characterization with Multidimensional Meta-information [pdf][code][slide][poster]
Jaehyung Kim, Yekyung Kim, Karin Johanna Denton de Langis, Jinwoo Shin, and Dongyeop Kang
Annual Meeting of the Association for Computational Linguistics (ACL) 2023
[C8] Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning [pdf][code][slide][poster]
Jaehyung Kim, Jinwoo Shin, and Dongyeop Kang
Ruyuan Wan, Jaehyung Kim, and Dongyeop Kang
AAAI Conference on Artificial Intelligence (AAAI) 2023 (Oral Presentation)
~ 2022
Sukmin Yun, Jaehyung Kim, Dongyoon Han, Hwanjun Song, Jung-Woo Ha, and Jinwoo Shin
Winners, Qualcomm Innovation Fellowship Korea 2022
Sukmin Yun, Hankook Lee, Jaehyung Kim, and Jinwoo Shin
Conference on Computer Vision and Pattern Recognition (CVPR) 2022 (Oral Presentation)
Jaehyung Kim, Dongyeop Kang, Sungsoo Ahn, and Jinwoo Shin
International Conference on Learning Representations (ICLR) 2022
Silver Prize, Samsung Humantech Paper Awards 2021
Winners, Qualcomm Innovation Fellowship Korea 2022
[C3] Spread Spurious Attribute: Improving Worst-group Accuracy with Spurious Attribute Estimation [pdf]
Junhyun Nam, Jaehyung Kim, Jaeho Lee, and Jinwoo Shin
International Conference on Learning Representations (ICLR) 2022
Jaehyung Kim, Youngbum Hur, Sejun Park, Eunho Yang, Sung Ju Hwang, and Jinwoo Shin
Jaehyung Kim*, Jongheon Jeong*, and Jinwoo Shin
Conference on Computer Vision and Pattern Recognition (CVPR) 2020
Kimin Lee, Jaehyung Kim, Song Chong, and Jinwoo Shin
ArXiv Preprint 2017
💻 Work Experience
Collaborator, Naver AI, Jeongja, South Korea - with Sang-Woo Lee, Minjoon Seo, and Jung-Woo Ha (Nov. 2022 - May 2023)
Research Intern, Meta AI, Seattle, WA - with Madian Khabsa (May 2022 - Aug. 2022)
Visiting Student, University of Minnesota, Minneapolis, MN - with Dongyeop Kang (Feb. 2022 - May 2022)
Visiting Student, Samsung Advanced Institute of Technology, Suwon, South Korea - with Eunho Yang and Sung Ju Hwang (Jan. 2020 - Mar. 2020)
👨‍🎓 Academic Services
Conference Reviewer
AAAI Conference on Artificial Intelligence (AAAI): 2021, 2022, 2023
Association for Computational Linguistics (ACL) Rolling Review: 2022, 2023, 2024
Conference on Computer Vision and Pattern Recognition (CVPR): 2023
Conference on Empirical Methods in Natural Language Processing (EMNLP): 2022, 2023
International Conference on Learning Representations (ICLR): 2022, 2023, 2024
International Conference on Machine Learning (ICML): 2021, 2022, 2023, 2024
Neural Information Processing Systems (NeurIPS): 2021, 2022, 2023, 2024
Journal Reviewer
Transactions on Machine Learning Research (TMLR)
IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
🎤 Invited Talks
Designing New Effective Inference Algorithms for Large Language Models
Amazon Artificial General Intelligence (AGI) (Mar. 2024)
Ulsan National Institute of Science and Technology (UNIST) (May 2024)
Deep Learning with Imbalanced Datasets
Samsung Electronics Data & Information Technology Center (Oct. 2021)
Multi-aspect Analysis on Data Informativeness
Minnesota NLP Group, Summer 2021 Presentation (Aug. 2021)
Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning
NeurIPS 2020 Social: ML in Korea (Dec. 2020)