Hybrid Post-training for Creative Writing: Integrating GDPO and Expert Opinion Distillation
Yuan Yuan*, Bowen Jiang*, Ziyi Liu, Zhuoqun Hao
Work in Progress
We developed a hybrid post-training framework to enhance LLM creative writing by combining Group reward-Decoupled Normalization Policy Optimization (GDPO) with Supervised Fine-Tuning (SFT) on expert feedback. We implemented a multi-reward RL pipeline using different writing criteria to optimize the model via GDPO, enabling the model to balance diverse stylistic and structural reward signals. Then, we integrated LLM expert opinions into model parameters using SFT, transitioning from purely numeric reward-based optimization to a more nuanced, qualitative-informed distillation of expert critiques.
Bowen Jiang, Yuan Yuan, Zhuoqun Hao, Zhangchen Xu, Anvesh Rao Vijjini, Ziyi Liu, Radha Poovendran, Dan Roth, Camillo J. Taylor, Sihao Chen.
Under Review at ARR January
We built a synthetic dataset to evaluate and improve LLMs’ ability to detect implicit user preferences in long-term conversations; using Reinforcement Fine-Tuning (RFT), we designed reward signals that encourage models to infer subtle, unspoken preferences and adapt responses accordingly. Our results show that RFT-trained models achieve stronger personalization and alignment with evolving user needs, advancing beyond memory evaluation toward more adaptive, context-aware AI assistants.
Young-Min Cho*, Yuan Yuan*, Sharath Chandra Guntuku, Lyle Ungar
Under Review at ARR January
We studied how defining personas with style features (e.g., “empathetic”) shapes agent behavior and whether traits interact across dimensions. Our experiments test whether adjusting empathy meaningfully shifts responses and explore cross-trait effects, such as whether higher empathy decreases helpfulness or whether conciseness and informativeness are inherently coupled. This work advances understanding of how personality traits influence conversational dynamics and how agents can better align with human perceptions of persona.
Sunny Rai, Jeffrey Cho, Yuan Yuan, Neil Sehgal, Sharath Chandra Guntuku, Lyle Ungar
Work in Progress
We investigated cross-cultural style variations between Chinese and American advice-seeking platforms (Zhihu and Reddit) by developing a question-matching pipeline using M3 embeddings and GPT-4o-mini to align semantically similar queries across languages. This work advanced our understanding of LLMs’ (GPT, Qwen) ability to replicate cultural nuances through multilingual advice generation and culturally informed prompting, by analyzing dimensions of style and human preferences.
AgentWorld: A Multi-Agent Research Platform for Collaborative AI Systems
Raphael Zhu, Jeffrey Cho, Yusen Zhang, Wenliang Zheng, Yuan Yuan, Jin Mo Yang, Chi Wang, Rui Zhang
Work in Progress
We developed AgentWorld, an open-source research platform built on the Kaetram engine and designed to study multi-agent coordination within a persistent 2D multiplayer environment. The project addresses the need for a high-fidelity sandbox where AI agents can move beyond simple logic to perform complex, long-horizon tasks such as crafting, trading, and combat. By providing a rich observation and action space via a unified API, AgentWorld enables the transition from isolated LLM benchmarking to dynamic, multi-agent behavioral research, offering significant implications for understanding how autonomous agents collaborate or compete in resource-constrained environments.
Yuan Yuan*, Muyu He*, Adil Shahid, Jiani Huang, Ziyang Li, Li Zhang
EMNLP 2025 Main
We collected and cleaned a detective text game, Ace Attorney, to construct a long-context reasoning benchmark (over 50k tokens per question); our initial findings revealed that most prominent LLMs could not answer these questions properly (accuracy below 0.4).
Bowen Jiang*, Yuan Yuan*, Xinyi Bai, Zhuoqun Hao, Alyson Yin, Yaojie Hu, Wenyu Liao, Lyle Ungar, Camillo J. Taylor
EMNLP 2025 Findings
We presented a new text rendering and editing algorithm for diffusion models that improves text generation and allows users to specify fonts to generate in images; our approach preserves font features by using a segmentation model and additional image filtering, eliminating the need for any ground-truth font labels.
Bowen Jiang, Zhuoqun Hao, Young-Min Cho, Bryan Li, Yuan Yuan, Sihao Chen, Lyle Ungar, Camillo J. Taylor, Dan Roth
COLM 2025 Main
We introduce PersonaMem, a personalization benchmark that features scalable and persona-oriented multi-session user-LLM conversations, as well as fine-grained in-situ user query types designed to evaluate LLM capabilities in memorizing, tracking, and incorporating users’ dynamic profiles into personalized responses across diverse scenarios.
The paradigm of retrieval-augmented generation (RAG) helps mitigate hallucinations of large language models (LLMs). However, RAG also introduces biases contained within the retrieved documents. These biases can be amplified in scenarios which are multilingual and culturally sensitive, such as territorial disputes. In this paper, we introduce BordIRLines, a benchmark consisting of 720 territorial dispute queries paired with 14k Wikipedia documents across 49 languages. To evaluate LLMs' cross-lingual robustness for this task, we formalize several modes for multilingual retrieval. Our experiments on several LLMs reveal that retrieving multilingual documents best improves response consistency and decreases geopolitical bias over using purely in-language documents, showing how incorporating diverse perspectives improves robustness. Also, querying in low-resource languages displays a much wider variance in the linguistic distribution of response citations. Our further experiments and case studies investigate how cross-lingual RAG is affected by aspects from IR to document contents. We release our benchmark and code to support further research towards ensuring equitable information access across languages at this https URL.
Bowen Jiang, Yangxinyu Xie, Xiaomeng Wang, Yuan Yuan, Zhuoqun Hao, Xinyi Bai, Weijie J. Su, Camillo J. Taylor, Tanwi Mallick
NAACL 2025 main
Unlike reasoning, which aims to draw conclusions from premises, rationality ensures that those conclusions are reliably consistent, have an orderability of preference, and are aligned with evidence from various sources and logical principles. This survey is the first to comprehensively explore the notion of rationality in language and multimodal agents, inspired by cognitive science.