Jaehyung Kim

I am a postdoctoral fellow at the Language Technologies Institute, Carnegie Mellon University (CMU), working with Yiming Yang. Before joining CMU, I earned my Ph.D. at KAIST under the supervision of Prof. Jinwoo Shin, and also worked closely with Prof. Dongyeop Kang at the University of Minnesota. During my Ph.D., I had the chance to intern at or collaborate with various industry and academic labs, including Naver AI, Meta AI, and SAIT. I am a recipient of the Qualcomm Innovation Fellowship Korea (2021, 2022), awarded for three of my papers, and I also received the Silver Prize at the Samsung Humantech Paper Awards.

🔥 News 

📑 Research 

My research goal is to make machine learning (ML) and NLP frameworks more accurate and reliable in real-world scenarios through data-centric approaches. Recently, I have been most interested in large language models (LLMs), especially their alignment, adaptation (e.g., personalization), and development. While my recent focus has mainly been on NLP and LLMs, I am also interested in improving ML frameworks in other domains.

(C: Conference, W: Workshop, P: Preprint, *: Equal contribution)

[P6] Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [pdf][code]

[P5] Aligning Large Language Models with Self-generated Preference Data [pdf][code]

[P4] Online Adaptation of Language Models with a Memory of Amortized Contexts [pdf][code]

[P3] SelectLLM: Can LLMs Select Important Instructions to Annotate? [pdf][code]

[P2] Under the Surface: Tracking the Artifactuality of LLM-Generated Data [pdf][code][project]

[W2] Meta-Crafting: Improved Detection of Out-of-distributed Texts via Crafting Metadata Space

[C12] Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [pdf][code]

[C11] SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs [pdf][code][slide][poster]
[W1] Semi-supervised Tabular Classification via In-context Learning of Large Language Models [pdf]

[C10] RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training [pdf][code][slide][poster]

[C9] A Universal Framework for Dataset Characterization with Multidimensional Meta-information [pdf][code][slide][poster]

[C8] Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning [pdf][code][slide][poster]

[C7] Everyone’s Voice Matters: Quantifying Annotation Disagreement Using Demographic Information [pdf][code]
[C6] Time Is MattEr: Temporal Self-supervision for Video Transformers [pdf][code]

[C5] Patch-level Representation Learning for Self-supervised Vision Transformers [pdf][code]

[C4] What Makes Better Augmentation Strategies? Augment Difficult but Not too Different [pdf][code][slide][poster]

[C3] Spread Spurious Attribute: Improving Worst-group Accuracy with Spurious Attribute Estimation [pdf]
[C2] Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning [pdf][code][slide][poster]

[C1] M2m: Imbalanced Classification via Major-to-minor Translation [pdf][code][slide]

[P1] Simplified Stochastic Feedforward Neural Networks [pdf]

💻 Work Experiences

Collaborator, Naver AI, Jeongja, South Korea - with Sang-Woo Lee, Minjoon Seo, and Jung-Woo Ha (Nov. 2022 - May 2023)

Research Intern, Meta AI, Seattle, WA - with Madian Khabsa (May 2022 - Aug. 2022)

Visiting Student, University of Minnesota, Minneapolis, MN - with Dongyeop Kang (Feb. 2022 - May 2022)

Visiting Student, Samsung Advanced Institute of Technology, Suwon, South Korea - with Eunho Yang and Sung Ju Hwang (Jan. 2020 - Mar. 2020)

🏆 Awards

👨‍🎓 Academic Services

Conference Reviewer

Journal Reviewer

🎤 Invited Talks

Designing New Effective Inference Algorithms for Large Language Models

Deep Learning with Imbalanced Datasets

Multi-aspect Analysis on Data Informativeness

Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning