Welcome to the Natural Language Processing (NLP) Lab at Hanyang University.
We study a wide range of problems and approaches concerning natural language, chiefly based on machine learning and AI technologies.
We are looking for MS/Ph.D. students (and interns) who are self-motivated and passionate about doing research in NLP.
Please submit your information on this page if you are interested in applying for our lab.
News!
(25/08/21) Four papers—three in the Main Track and one in the Findings Track—have been accepted at EMNLP 2025. This marks a new record for the number of papers we are presenting at a single conference! 🎉 Congrats to all of our students who contributed as authors!
(25/08/21) Changhyeon has graduated with his Master's degree. Wish him all the best!
(25/08/04) Honored to announce that HYU NLP will participate in the consortium led by Naver Cloud to conduct research and development for the National AI Foundation Model Project (독자 AI 파운데이션 모델 프로젝트).
(25/05/16) Three papers—two in the Main Track and one in the Industry Track—have been accepted at ACL 2025. One main paper is the result of a collaboration with SNU, while the other stems from our internal project. Also pleased to have our first Industry Track paper, produced in collaboration with Hyundai Engineering and Jenti AI. Congratulations to our students Seunghee, Changhyeon, and Minseo!
(25/02/27) Honored to share that HYU NLP will receive a three-year grant (Outstanding Young Scientist Grants) from the NRF for the project titled "Conversational Agents with Hyper Long-Term Memory"!
(25/02/21) Deokyeong and Kang Min have graduated with their Master's degrees. Wish them all the best!
(24/09/24) Two papers have been accepted at HCLT 2024. Congrats to Deokyeong, Changhyeon, and Seung Hee! Also honored to share that both papers have received the Best Paper Award (최우수논문상) from the conference!
(24/09/20) Three papers—one Main and two Findings—have been accepted at EMNLP 2024. Two of them are the result of collaboration with SNU and Naver. The other is the outcome of an internal project. Congratulations to Deokyeong and Ki Jung!
Recent Publications
Abstract
Knowledge conflict often arises in retrieval-augmented generation (RAG) systems, where retrieved documents may be inconsistent with one another or contradict the model's parametric knowledge. Existing benchmarks for investigating the phenomenon have notable limitations, including a narrow focus on the question answering setup, heavy reliance on entity substitution techniques, and a restricted range of conflict types. To address these issues, we propose a knowledge graph (KG)-based framework that generates varied and subtle conflicts between two similar yet distinct contexts, while ensuring interpretability through the explicit relational structure of KGs. Experimental results on our benchmark, MAGIC, provide intriguing insights into the inner workings of LLMs regarding knowledge conflict: both open-source and proprietary models struggle with conflict detection, especially when multi-hop reasoning is required, and often fail to pinpoint the exact source of contradictions. Finally, we present in-depth analyses that serve as a foundation for improving LLMs in integrating diverse, sometimes even conflicting, information.
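For readers new to the area, the toy sketch below illustrates the general idea of deriving a conflicting context pair from a knowledge graph: perturb one triple in a small subgraph and verbalize both versions, so the two contexts disagree on exactly one interpretable fact. This is a deliberate simplification; the conflicts in MAGIC are subtler and can require multi-hop reasoning to detect, and every name here (the toy KG, TAIL_CANDIDATES, make_conflict_pair) is a hypothetical illustration rather than the paper's pipeline.

```python
import random

# A toy knowledge graph of (head, relation, tail) triples; illustrative only.
KG = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "field", "physics"),
    ("Warsaw", "capital_of", "Poland"),
]

# Type-consistent alternatives used to perturb a triple's tail entity.
TAIL_CANDIDATES = {"born_in": ["Warsaw", "Paris", "Vienna"]}

def verbalize(triple):
    head, relation, tail = triple
    return f"{head} {relation.replace('_', ' ')} {tail}."

def make_conflict_pair(kg, relation="born_in"):
    """Return two similar contexts that disagree on exactly one triple."""
    original = next(t for t in kg if t[1] == relation)
    alt_tail = random.choice(
        [c for c in TAIL_CANDIDATES[relation] if c != original[2]]
    )
    perturbed = (original[0], original[1], alt_tail)
    context_a = " ".join(verbalize(t) for t in kg)
    context_b = " ".join(
        verbalize(perturbed if t == original else t) for t in kg
    )
    # The explicit triples make the source of the conflict interpretable.
    return context_a, context_b, (original, perturbed)
```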
Abstract
Large language models often retain unintended content, prompting growing interest in knowledge unlearning. Recent approaches emphasize localized unlearning, which restricts parameter updates to specific regions in an effort to remove target knowledge while preserving unrelated general knowledge. However, their effectiveness remains uncertain due to the lack of robust and thorough evaluation of the trade-off between the competing goals of unlearning. In this paper, we begin by revisiting existing localized unlearning approaches. We then conduct controlled experiments to rigorously evaluate whether local parameter updates causally contribute to unlearning. Our findings reveal that the set of parameters that must be modified for effective unlearning is not strictly determined, challenging the core assumption of localized unlearning that parameter locality is inherently indicative of effective knowledge removal.
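As background on the setup being evaluated, the sketch below shows what a single localized unlearning step typically looks like: gradient ascent on a forget-set batch, with updates restricted by a pre-selected 0/1 parameter mask. It assumes a Hugging Face-style model whose forward pass returns a `.loss`, and how `masks` is chosen is precisely the assumption the paper probes; this is a minimal illustration, not the method studied.

```python
import torch

def localized_unlearning_step(model, batch, masks, lr=1e-5):
    """One gradient-ascent step on a forget-set batch, restricted to a
    pre-selected subset of parameters (the 'localized' region).

    `masks` maps parameter names to 0/1 tensors of matching shape;
    entries outside the mask are left untouched.
    """
    model.train()
    outputs = model(**batch)
    # Negate the language-modeling loss so the update *removes* knowledge.
    loss = -outputs.loss
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for name, param in model.named_parameters():
            if param.grad is None or name not in masks:
                continue
            # Apply the update only inside the localized region.
            param -= lr * param.grad * masks[name]
    return loss.item()
```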
Abstract
Real-world decision-making often requires integrating and reasoning over information from multiple modalities. While recent multimodal large language models (MLLMs) have shown promise in such tasks, their ability to perform multi-hop reasoning across diverse sources remains insufficiently evaluated. Existing benchmarks, such as MMQA, face challenges due to (1) data contamination and (2) a lack of complex queries that necessitate operations across more than two modalities, hindering accurate performance assessment. To address this, we present Financial Cross-Modal Multi-Hop Reasoning (FCMR), a benchmark created to analyze the reasoning capabilities of MLLMs by requiring them to combine information from textual reports, tables, and charts within the financial domain. FCMR is categorized into three difficulty levels (Easy, Medium, and Hard), facilitating a step-by-step evaluation. In particular, problems at the Hard level require precise cross-modal three-hop reasoning and are designed so that no single modality can be disregarded. Experiments on this new benchmark reveal that even state-of-the-art MLLMs struggle, with the best-performing model (Claude 3.5 Sonnet) achieving only 30.4% accuracy on the most challenging tier. We also conduct analyses to provide insights into the inner workings of the models, including the discovery of a critical bottleneck in the information retrieval phase.
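To make the tiered evaluation concrete, a per-tier scoring loop might look like the sketch below. The dataset fields, the exact-match scoring rule, and the `ask_model` wrapper are assumptions for illustration, not the released FCMR harness.

```python
from collections import defaultdict

def evaluate_by_tier(examples, ask_model):
    """Compute accuracy per difficulty tier (Easy / Medium / Hard).

    Each example is assumed to carry a question, the three modality
    inputs (text report, table, chart image), a gold answer, and a tier
    label; `ask_model` is any MLLM wrapper that returns a string answer.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for ex in examples:
        pred = ask_model(
            question=ex["question"],
            text=ex["report"],
            table=ex["table"],
            chart=ex["chart_image"],
        )
        total[ex["tier"]] += 1
        # Exact match after normalization; a real harness may use a
        # more forgiving matcher.
        if pred.strip().lower() == ex["answer"].strip().lower():
            correct[ex["tier"]] += 1
    return {tier: correct[tier] / total[tier] for tier in total}
```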