Bowen Jiang (Lauren)

she/her

Ph.D. Candidate at University of Pennsylvania

Research Intern at Microsoft

Lauren is a Ph.D. candidate in Computer and Information Science at the University of Pennsylvania, fortunate to be advised by Prof. Camillo J. Taylor and to collaborate with Prof. Dan Roth. She is also a research intern at Microsoft. She has practical expertise in LLM post-training via reinforcement fine-tuning, personalization and user understanding, multimodality, large-scale data synthesis, and social intelligence.

She received her bachelor's degree with highest honors in Electrical and Computer Engineering from the University of Illinois Urbana-Champaign in 2021, under the invaluable guidance of Prof. Yoram Bresler and Prof. Sanmi Koyejo. She was awarded the Bronze Tablet, which recognizes the top three percent of her graduating class, and was a visiting student at Argonne National Laboratory, mentored by Dr. Tanwi Mallick.

Updates

⭐ [09-2025] I am on the job market seeking research opportunities in industry for 2026, including intern and full-time positions.
🚨 [09-2025] PersonaMem-v2: Implicit Persona is now available on HuggingFace! It is a cutting-edge LLM personalization benchmark that focuses on implicit user preferences in long conversations. It attracts 1000+ downloads per month!
🚀 [07-2025] We have a new journal paper accepted at Nature npj Climate Action, titled "MARSHA: Multi-Agent RAG System for Hazard Adaptation" to support LLMs for scientific research in critical domains.
👩🏻‍💻 [05-2025] I will be joining Microsoft Office of Applied Research in Redmond, WA as a research intern, working on fundamental reinforcement finetuning algorithms and collaboration environments.
🚨 [04-2025] We released a new LLM personalization benchmark named PersonaMem, accepted at COLM 2025, featuring 180+ persona-oriented multi-session user-model conversations, dynamic user preference updates, and long contexts of up to 1M tokens. We experimented with GPT-4.1/4.5, o4-mini, o1, Gemini-2, DeepSeek-R1-671B, LLaMA-4, Claude-3.7, and other SOTA models.
🎤 [03-2025] I’ll be giving a talk on LLM personalization at MASC-SLL 2025, hosted at Penn State University.
🚀 [09-2024] We have a new publication at EMNLP 2024 titled "A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners". This work proposes the concept of "Token Bias" to question the generalization of reasoning in LLMs.

🙆‍♀️ [09-2024] I am going to give a talk at the Penn Wharton AI & Analytics Initiative's Research & Education Symposium.

Education

University of Pennsylvania

Ph.D. in Computer and Information Science, August 2021 - Present

M.S. in Computer and Information Science, August 2021 - December 2023

Advisor: Prof. Camillo J. Taylor - Director of the GRASP Lab; Associate Dean of School of Engineering and Applied Science; Raymond S Markowitz President’s Distinguished Professor

GPA 3.99/4.00

University of Illinois Urbana-Champaign

B.S. in Electrical Engineering with Highest Honors and Bronze Tablet award, August 2017 - May 2021

Minor in Computer Science

Minor in Mathematics

GPA 3.99/4.00, Major GPA 4.00/4.00

Columbia University in the City of New York

Summer program for high school students, July - August 2016

Digital Filmmaking - From Initial Concept to Final Edit

Work Experience

Microsoft Corporation

Research Intern, part-time, August 2025 - December 2025

Office of Applied Research, Redmond, WA (Remote)

Mentor: Dr. Sihao Chen

Manager: Dr. Longqi Yang

Post-training, reinforcement learning, social intelligence, and collaborative environments.

Microsoft Corporation

Research Intern, full-time, May 2025 - August 2025

Office of Applied Research, Redmond, WA

Mentor: Dr. Sihao Chen

Manager: Dr. Longqi Yang

Post-training, reinforcement learning, social intelligence, and collaborative environments.

Argonne National Laboratory

Visiting Student, part-time, September 2024 - May 2025

Mathematics and Computer Science Division, Lemont, IL (Remote)

Mentor: Dr. Tanwi Mallick

LLMs for scientific applications, multimodality, and multi-agent systems.

University of Pennsylvania

Teaching Assistant for Prof. Mark Yatskar, Prof. Osbert Bastani, Prof. Surbhi Goel, and Prof. Eric Wong

CIS 5190 Applied Machine Learning, September 2023 - May 2024

Selected Research

Please refer to my Google Scholar page for a complete list of publications.

To be released soon.

Generalized Conversational Self-Play through Reinforcement Learning towards Social Intelligence

This work explores how AI agents can learn the dynamics of real-world social interactions and group utilities, where conversations usually evolve over long horizons and involve many participants. Through scalable environment simulations that mirror real-world collaboration and competition, it introduces a generalizable reinforcement fine-tuning framework that enables agents to develop strategic planning and adaptive communication skills over multiple dialogue turns. The approach allows agents to navigate dynamic, complex interactions autonomously without human supervision.

To be released soon.

🤗 Huggingface (1000+ downloads per month) 🐰 Cite Our Work

ImplicitPersona: Towards Personalized Intelligence by Learning Implicit User Personas via Reinforcement Learning

This work is also known as PersonaMem-v2.

AI cannot always satisfy every user, but personalization offers a path towards pluralistic alignment. In the real world, users rarely state their preferences explicitly to chatbots, so PersonaMem-v2: Implicit Persona focuses on preferences revealed indirectly through long, cross-scenario conversations. For instance, a user may unintentionally mention a food preference in the body of an email while only asking the chatbot to improve its writing. How well can chatbots catch such implicit personas?

@article{jiang2025know,

title={Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale},

author={Jiang, Bowen and Hao, Zhuoqun and Cho, Young-Min and Li, Bryan and Yuan, Yuan and Chen, Sihao and Ungar, Lyle and Taylor, Camillo J and Roth, Dan},

journal={arXiv preprint arXiv:2504.14225},

year={2025}

}

🚀 Project Page 📖 Paper 👩🏻‍💻 Github 🍠 RedNote

🤗 Huggingface 🐰 Cite Our Work

Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale

Bowen Jiang, Zhuoqun Hao, Young-Min Cho, Bryan Li, Yuan Yuan, Sihao Chen, Lyle Ungar, Camillo J. Taylor, Dan Roth

University of Pennsylvania & Microsoft

COLM 2025

This work is also known as PersonaMem.

A short version has been accepted at the 12th Mid-Atlantic Student Colloquium on Speech, Language and Learning (MASC-SLL 2025) [Oral]

We introduce PersonaMem, a personalization benchmark that features scalable and persona-oriented multi-session user-LLM conversations, as well as fine-grained in-situ user query types designed to evaluate LLM capabilities in memorizing, tracking, and incorporating users’ dynamic profiles into personalized responses across diverse scenarios.

@article{jiang2025know,

title={Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale},

author={Jiang, Bowen and Hao, Zhuoqun and Cho, Young-Min and Li, Bryan and Yuan, Yuan and Chen, Sihao and Ungar, Lyle and Taylor, Camillo J and Roth, Dan},

journal={arXiv preprint arXiv:2504.14225},

year={2025}

}

📖 Paper 🐰 Cite Our Work

Can LLMs Grasp Implicit Cultural Values? Benchmarking LLMs' Metacognitive Cultural Intelligence with CQ-Bench

Ziyi Liu, Priyanka Dey, Jen-tse Huang, Zhenyu Zhao, Bowen Jiang, Rahul Gupta, Yang Liu, Jieyu Zhao

University of Southern California & Amazon AGI & Johns Hopkins University & University of Pennsylvania

Under Review

CQ-Bench puts LLMs’ cultural intelligence to the test, gauging whether they can read between the lines of global conversations. Despite near-human performance in explicit value recognition, models still stumble on subtle attitude detection. With just 500 culturally nuanced samples, even smaller models can outperform larger ones through reinforcement fine-tuning.

@article{liu2025can,

title={Can LLMs Grasp Implicit Cultural Values? Benchmarking LLMs' Metacognitive Cultural Intelligence with CQ-Bench},

author={Liu, Ziyi and Dey, Priyanka and Zhao, Zhenyu and Huang, Jen-tse and Gupta, Rahul and Liu, Yang and Zhao, Jieyu},

journal={arXiv preprint arXiv:2504.01127},

year={2025}

}

📖 Paper 👩‍🏫 Poster 🎬 Short Video 👩🏻‍💻 Github 🍠 RedNote 🐰 Cite Our Work

A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners

Bowen Jiang, Yangxinyu Xie, Zhuoqun Hao, Xiaomeng Wang, Tanwi Mallick, Weijie J. Su, Camillo J. Taylor, Dan Roth

University of Pennsylvania & Argonne National Laboratory

EMNLP 2024 Main

A short version has also been accepted to the NeurIPS 2024 Workshop on Statistical Foundations of LLMs and Foundation Models, EMNLP 2024 GenBench Workshop & ICML 2024 Workshop on LLMs and Cognition.

We propose a new perspective for evaluating LLMs' logical reasoning abilities beyond accuracy benchmarks. Our findings reveal that LLMs primarily rely on token biases and superficial patterns rather than true reasoning. Using a hypothesis testing approach with statistical guarantees, we highlight the need for caution when interpreting their generalization in reasoning tasks.

@article{jiang2024peek, title={A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners}, author={Jiang, Bowen and Xie, Yangxinyu and Hao, Zhuoqun and Wang, Xiaomeng and Mallick, Tanwi and Su, Weijie J and Taylor, Camillo J and Roth, Dan}, journal={arXiv preprint arXiv:2406.11050}, year={2024}}

📖 Paper 📖 Arxiv 🐰 Cite Our Work

MARSHA: Multi-Agent RAG System for Hazard Adaptation

Yangxinyu Xie, Bowen Jiang, Tanwi Mallick, Joshua David Bergerson, John K. Hutchison, Duane R. Verner, Jordan Branham, M. Ross Alexander, Robert B. Ross, Yan Feng, Leslie-Anne Levy, Weijie Su, Camillo J. Taylor

Argonne National Laboratory & University of Pennsylvania

Nature - npj Climate Action

A short version titled "WildfireGPT: Tailored Large Language Model for Wildfire Analysis" has been accepted to the NeurIPS 2024 Workshop on Tackling Climate Change with Machine Learning & EMNLP 2024 Workshop on NLP for Positive Impact

We propose a Retrieval-Augmented Generation (RAG)-based multi-agent LLM system to support analysis and decision-making in the context of natural hazards and extreme weather events. As a proof of concept, we present WildfireGPT, a specialized system focused on wildfire scenarios. The architecture employs a user-centered, multi-agent design to deliver tailored risk insights across diverse stakeholder groups.

@article{xie2025rag,

title={A rag-based multi-agent llm system for natural hazard resilience and adaptation},

author={Xie, Yangxinyu and Jiang, Bowen and Mallick, Tanwi and Bergerson, Joshua David and Hutchison, John K and Verner, Duane R and Branham, Jordan and Alexander, M Ross and Ross, Robert B and Feng, Yan and others},

journal={arXiv preprint arXiv:2504.17200},

year={2025}

}

@article{xie2024wildfiregpt,

title={WildfireGPT: Tailored Large Language Model for Wildfire Analysis},

journal={arXiv preprint arXiv:2402.07877},

year={2024}

}

📖 Paper 👩🏻‍💻 Github

🤗 Huggingface 🐰 Cite Our Work

GeoGrid-Bench: Can Foundation Models Understand Multimodal Gridded Geo-Spatial Data?

Bowen Jiang, Yangxinyu Xie, Xiaomeng Wang, Jiashu He, Joshua Bergerson, John K. Hutchison, Jordan Branham, Camillo J. Taylor, Tanwi Mallick

Argonne National Laboratory & University of Pennsylvania

Under Review

We present GeoGrid-Bench, a benchmark designed to evaluate the ability of foundation models to understand geo-spatial data in the grid structure. Geo-spatial datasets pose distinct challenges due to their dense numerical values, strong spatial and temporal dependencies, and unique multimodal representations including tabular data, heatmaps, and geographic visualizations. To assess how foundation models can support scientific research in this domain, GeoGrid-Bench features large-scale, real-world data covering 16 climate variables across 150 locations and extended time frames. The benchmark includes approximately 3,200 question-answer pairs, systematically generated from 8 domain expert-curated templates to reflect practical tasks encountered by human scientists.

@article{jiang2025geogrid, title={GeoGrid-Bench: Can Foundation Models Understand Multimodal Gridded Geo-Spatial Data?}, author={Jiang, Bowen and Xie, Yangxinyu and Wang, Xiaomeng and He, Jiashu and Bergerson, Joshua and Hutchison, John K and Branham, Jordan and Taylor, Camillo J and Mallick, Tanwi}, journal={arXiv preprint arXiv:2505.10714}, year={2025}}

📖 Paper 👩🏻‍💻 Github 🐰 Cite Our Work

ControlText: Unlocking Controllable Fonts in Multilingual Text Rendering without Font Annotations

Bowen Jiang, Yuan Yuan, Xinyi Bai, Zhuoqun Hao, Alyson Yin, Yaojie Hu, Wenyu Liao, Lyle Ungar, Camillo J. Taylor

University of Pennsylvania & Cornell University & University of California Irvine

EMNLP 2025 Findings

This work demonstrates that diffusion models can achieve font-controllable multilingual text rendering using just raw images without font label annotations. By integrating a conditional diffusion model with a text segmentation model, the method captures font styles in pixel space in a self-supervised manner, allowing user-specified font customization without ground-truth labels.

@article{jiang2025controltext,

title={ControlText: Unlocking Controllable Fonts in Multilingual Text Rendering without Font Annotations},

author={Jiang, Bowen and Yuan, Yuan and Bai, Xinyi and Hao, Zhuoqun and Yin, Alyson and Hu, Yaojie and Liao, Wenyu and Ungar, Lyle and Taylor, Camillo J},

journal={arXiv preprint arXiv:2502.10999},

year={2025}

}

📖 Paper 👩🏻‍💻 Github 🐰 Cite Our Work

Towards Rationality in Language and Multimodal Agents: A Survey

Bowen Jiang, Yangxinyu Xie, Xiaomeng Wang, Yuan Yuan, Zhuoqun Hao, Xinyi Bai, Weijie J. Su, Camillo J. Taylor, Tanwi Mallick

University of Pennsylvania & Cornell University & Argonne National Laboratory

NAACL 2025 Main

A short version titled "Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey" has been accepted to the ICML 2024 Workshop on LLMs and Cognition.

Unlike reasoning that aims to draw conclusions from premises, rationality ensures that those conclusions are reliably consistent, have an orderability of preference, and are aligned with evidence from various sources and logical principles. This survey is the first to comprehensively explore the notion of rationality in language and multimodal agents, analyzing how designs in existing agents and agent systems contribute to advancing certain key axioms of rationality.

@article{jiang2024towards,

title={Towards Rationality in Language and Multimodal Agents: A Survey},

author={Jiang, Bowen and Xie, Yangxinyu and Wang, Xiaomeng and Yuan, Yuan and Hao, Zhuoqun and Bai, Xinyi and Su, Weijie J and Taylor, Camillo J and Mallick, Tanwi},

journal={arXiv preprint arXiv:2406.00252},

year={2024}

}

@article{jiang2024multi,

title={Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey},

author={Jiang, Bowen and Xie, Yangxinyu and Wang, Xiaomeng and Su, Weijie J and Taylor, Camillo J and Mallick, Tanwi},

journal={arXiv preprint arXiv:2406.00252},

year={2024}

}

🚀 Project Page 📖 Paper 🐰 Cite Our Work

Vysics: Object Reconstruction Under Occlusion by Fusing Vision and Contact-Rich Physics

Bibit Bianchini, Minghan Zhu, Mengti Sun, Bowen Jiang, Camillo Jose Taylor, Michael Posa

University of Pennsylvania

RSS 2025

A short version titled "Instance-Agnostic Geometry and Contact Dynamics Learning" has been accepted to the IROS 2023 Workshop on Leveraging Models for Contact-Rich Manipulation.

We introduce Vysics, a vision-and-physics framework for a robot to build an expressive geometry and dynamics model of a single rigid body, using a seconds-long RGBD video and the robot’s proprioception. It uses a vision-based tracking and reconstruction method, BundleSDF, to estimate the trajectory and the visible geometry from an RGBD video, and an odometry-based model learning method, Physics Learning Library (PLL), to infer the “physible” geometry from the trajectory through implicit contact dynamics optimization.

@article{bianchini2025vysics,

title={Vysics: Object Reconstruction Under Occlusion by Fusing Vision and Contact-Rich Physics},

author={Bianchini, Bibit and Zhu, Minghan and Sun, Mengti and Jiang, Bowen and Taylor, Camillo J and Posa, Michael},

journal={arXiv preprint arXiv:2504.18719},

year={2025}

}

@article{sun2023instance,

title={Instance-Agnostic Geometry and Contact Dynamics Learning},

author={Sun, Mengti and Jiang, Bowen and Bianchini, Bibit and Taylor, Camillo Jose and Posa, Michael},

journal={arXiv preprint arXiv:2309.05832},

year={2023}

}

📖 Paper 👩🏻‍💻 Github 🐰 Cite Our Work

Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge

Bowen Jiang, Zhijun Zhuang, Shreyas S. Shivakumar, Camillo J. Taylor

University of Pennsylvania

WACV 2025

A short version titled "Hierarchical Relationships: A New Perspective to Enhance Scene Graph Generation" has been accepted to the NeurIPS 2023 Workshop on New Frontiers in Graph Learning & Workshop on Queer in AI.

We develop plug-and-play modules that enhance state-of-the-art scene graph generation methods to new levels of performance. Our approach integrates LLMs to critique predictions and reduce commonsense violations, alongside a Bayesian classification scheme that leverages a hierarchical structure in relations for improved performance.

@article{jiang2023enhancing,

title={Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge},

author={Jiang, Bowen and Zhuang, Zhijun and Taylor, Camillo Jose},

journal={arXiv preprint arXiv:2311.12889},

year={2023}

}

@article{jiang2023scene,

title={Scene graph generation from hierarchical relationship reasoning},

author={Jiang, Bowen and Taylor, Camillo J},

journal={arXiv preprint arXiv:2303.06842},

year={2023}

}

📖 Paper 👩🏻‍💻 Github 🐰 Cite Our Work

Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering

Bowen Jiang, Zhijun Zhuang, Yuan Yuan, Shreyas S. Shivakumar, Dan Roth, Camillo J. Taylor

University of Pennsylvania

CVPR 2024 Workshop on Computer Vision in the Wild [Spotlight Oral] and Workshop on Multimodal Foundation Models

We explore the zero-shot capabilities of foundation models in Visual Question Answering in the open world, and propose an adaptive multi-agent system to address their limitations in object detection and counting. Instead of fine-tuning foundation models for specific datasets, our approach uses specialized agents as tools.

@article{jiang2024multi,

title={Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering},

author={Jiang, Bowen and Zhuang, Zhijun and Shivakumar, Shreyas S and Roth, Dan and Taylor, Camillo J},

journal={arXiv preprint arXiv:2403.14783},

year={2024}

}

📖 Paper 🐰 Cite Our Work

Batch Active Learning from the Perspective of Sparse Approximation

Maohao Shen, Bowen Jiang, Jacky Yibo Zhang, Oluwasanmi Koyejo

MIT, University of Pennsylvania, Stanford University

NeurIPS 2022 Workshop on Human in the Loop Learning

@article{shen2022batch,

title={Batch active learning from the perspective of sparse approximation},

author={Shen, Maohao and Jiang, Bowen and Zhang, Jacky Yibo and Koyejo, Oluwasanmi},

journal={arXiv preprint arXiv:2211.00246},

year={2022}

}

Academic Service

Reviewer for the Journal of Machine Learning Research
Reviewer for the IEEE Transactions on Multimedia
Reviewer for the IEEE Sensors Journal
Reviewer for IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) 2026
Reviewer for IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025
Reviewer for the ICLR 2025 Representational Alignment Workshop
Reviewer for the NeurIPS 2024 Workshop on Behavioral ML
Reviewer for the IEEE/CVF CVPR 2024 Workshop on Scene Graphs and Graph Representation Learning
Reviewer for the IEEE/CVF CVPR 2024 Workshop on What is Next in Multimodal Foundation Models
Reviewer for the Elsevier Journal of Tunneling and Underground Space Technology
Reviewer for the IEEE/CVF ICCV 2023 Workshop on Scene Graphs and Graph Representation Learning
IEEE ICRA 2022 student volunteer in oral sessions and workshops
Teaching Assistant of Penn CIS 5190 Applied Machine Learning, September 2023 - April 2024
Laboratory Assistant of UIUC ECE 210 Analog Signal Processing, August 2018 - December 2018