Jaehyung Kim

I am a postdoctoral fellow at the Language Technologies Institute, Carnegie Mellon University (CMU), working with Yiming Yang. Before joining CMU, I earned my Ph.D. at KAIST under the supervision of Prof. Jinwoo Shin, and also worked closely with Prof. Dongyeop Kang at the University of Minnesota. During my Ph.D., I had the chance to intern at or collaborate with various industry and academic labs, including Naver AI, Meta AI, and SAIT. I am a recipient of the Qualcomm Innovation Fellowship Korea (2021, 2022), awarded for three of my papers, and I also received the Silver Prize at the Samsung Humantech Paper Awards.

🔥 News 

📑 Research 

My research goal is to make machine learning (ML) and NLP frameworks more accurate and reliable in real-world scenarios through data-centric approaches. Recently, I have been most interested in large language models (LLMs), especially their alignment, adaptation (e.g., personalization), and development. While my recent focus has mainly been on NLP and LLMs, I am also interested in improving ML frameworks in other domains.

(C: Conference, W: Workshop, P: Preprint, *: Equal contribution)

[P6] Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [pdf][code]

[P5] Aligning Large Language Models with Self-generated Preference Data [pdf][code]

[P4] Online Adaptation of Language Models with a Memory of Amortized Contexts [pdf][code]

[P3] SelectLLM: Can LLMs Select Important Instructions to Annotate? [pdf][code]

[P2] Under the Surface: Tracking the Artifactuality of LLM-Generated Data [pdf][code][project]

[W2] Meta-Crafting: Improved Detection of Out-of-distributed Texts via Crafting Metadata Space

[C12] Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [pdf][code]

[C11] SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs [pdf][code][slide][poster]
[W1] Semi-supervised Tabular Classification via In-context Learning of Large Language Models [pdf]

[C10] RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training [pdf][code][slide][poster]

[C9] A Universal Framework for Dataset Characterization with Multidimensional Meta-information [pdf][code][slide][poster]

[C8] Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning [pdf][code][slide][poster]

[C7] Everyone’s Voice Matters: Quantifying Annotation Disagreement Using Demographic Information [pdf][code]
[C6] Time Is MattEr: Temporal Self-supervision for Video Transformers [pdf][code]

[C5] Patch-level Representation Learning for Self-supervised Vision Transformers [pdf][code]

[C4] What Makes Better Augmentation Strategies? Augment Difficult but Not too Different [pdf][code][slide][poster]

[C3] Spread Spurious Attribute: Improving Worst-group Accuracy with Spurious Attribute Estimation [pdf]
[C2] Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning [pdf][code][slide][poster]

[C1] M2m: Imbalanced Classification via Major-to-minor Translation [pdf][code][slide]

[P1] Simplified Stochastic Feedforward Neural Networks [pdf]

💻 Work Experiences

Collaborator, Naver AI, Jeongja, South Korea - with Sang-Woo Lee, Minjoon Seo, and Jung-Woo Ha (Nov. 2022 - May 2023)

Research Intern, Meta AI, Seattle, WA - with Madian Khabsa (May 2022 - Aug. 2022)

Visiting Student, University of Minnesota, Minneapolis, MN - with Dongyeop Kang (Feb. 2022 - May 2022)

Visiting Student, Samsung Advanced Institute of Technology, Suwon, South Korea - with Eunho Yang and Sung Ju Hwang (Jan. 2020 - Mar. 2020)

🏆 Awards

👨‍🎓 Academic Services

Conference Reviewer

Journal Reviewer

🎤 Invited Talks

Designing New Effective Inference Algorithms for Large Language Models

Deep Learning with Imbalanced Datasets

Multi-aspect Analysis on Data Informativeness

Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning