Dian Yu 于典
NLP Researcher
Tencent AI Lab
E-mail: yudiandoris (AT) gmail (DOT) com
Research Interests
My research interests include improving the reasoning abilities and alignment of large language models (LLMs) across domains, primarily via mid- and post-training stages, and developing methods for their faithful and reliable evaluation. I am also interested in leveraging and constructing large-scale, high-quality synthesized or human-labeled data to scale reinforcement learning with verifiable or non-verifiable rewards.
Selected Preprints
Expanding RL with Verifiable Rewards Across Diverse Domains. Yi Su, Dian Yu, Linfeng Song, Juntao Li, Haitao Mi, Zhaopeng Tu, Min Zhang, Dong Yu. [paper] [resource]
One Token to Fool LLM-as-a-Judge. Yulai Zhao*, Haolin Liu*, Dian Yu, S.Y. Kung, Haitao Mi, Dong Yu. [paper] [resource]
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values. Dian Yu, Yulai Zhao, Kishan Panaganti, Linfeng Song, Haitao Mi, Dong Yu. [paper] [resource]
Scaling Synthetic Data Creation with 1,000,000,000 Personas. Tao Ge, Xin Chan, Xiaoyang Wang, Dian Yu, Haitao Mi, and Dong Yu. [paper] [resource]
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning. Zhiwei He, Tian Liang, Jiahao Xu, Qiuzhi Liu, Xingyu Chen, Yue Wang, Linfeng Song, Dian Yu, Zhenwen Liang, Wenxuan Wang, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu. [paper] [resource]
Selected Publications
Improving LLM General Preference Alignment via Optimistic Online Mirror Descent. Yuheng Zhang, Dian Yu, Tao Ge, Linfeng Song, Zhichen Zeng, Haitao Mi, Nan Jiang, Dong Yu. (NeurIPS 2025) (spotlight). [paper]
Thoughts Are All Over the Place: On the Underthinking of Long Reasoning LLMs. Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Linfeng Song, Dian Yu, Juntao Li, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, and Dong Yu. (NeurIPS 2025) (spotlight). [paper]
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning. Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu. (ICLR 2025) (oral). [paper]
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search. Murong Yue, Wenlin Yao, Haitao Mi, Dian Yu, Ziyu Yao, Dong Yu. (ICLR 2025). [paper]
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs. Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, and Dong Yu. (ICML 2025). [paper]
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing. Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Lei Han, Haitao Mi, Dong Yu. (NeurIPS 2024). [paper]
Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models. Jiaao Chen, Xiaoman Pan, Dian Yu, Kaiqiang Song, Xiaoyang Wang, Dong Yu, Jianshu Chen. (EMNLP 2024 findings). [paper]
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning. Zhihan Zhang, Tao Ge, Zhenwen Liang, Wenhao Yu, Dian Yu, Mengzhao Jia, Dong Yu, and Meng Jiang. (EMNLP 2024). [paper] [code]
MinT: Boosting Generalization in Mathematical Reasoning via Multi-View Fine-Tuning. Zhenwen Liang, Dian Yu, Xiaoman Pan, Wenlin Yao, Qingkai Zeng, Xiangliang Zhang, and Dong Yu. (LREC-COLING 2024). [paper] [code]
More Than Spoken Words: Nonverbal Message Extraction and Generation. Dian Yu, Xiaoyang Wang, Wanshun Chen, Nan Du, Longyue Wang, Haitao Mi, and Dong Yu. (EMNLP 2023).
Document-Level Machine Translation with Large Language Models. Longyue Wang, Chenyang Lyu, Tianbo Ji, Zhirui Zhang, Dian Yu, Shuming Shi, Zhaopeng Tu. (EMNLP 2023).
Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models. Xiaoman Pan, Wenlin Yao, Hongming Zhang, Dian Yu, Dong Yu, Jianshu Chen. (ICLR 2023) (spotlight).
End-to-End Chinese Speaker Identification. Dian Yu, Ben Zhou, and Dong Yu. (NAACL 2022) (oral). [paper] [code]
Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge. Kai Sun*, Dian Yu*, Jianshu Chen, Dong Yu, and Claire Cardie. (ACL 2022). [paper] [code]
Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data. Dian Yu, Kai Sun, Dong Yu, and Claire Cardie. (EMNLP 2021 findings). [paper] [code]
CLUE: A Chinese Language Understanding Evaluation Benchmark. Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong, Weitang Liu, Bo Shi, Yiming Cui, Junyi Li, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, Zhe Zhao, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Kyle Richardson, and Zhenzhong Lan. (COLING 2020). [paper][code]
Dialogue-Based Relation Extraction. Dian Yu*, Kai Sun*, Claire Cardie, and Dong Yu. (ACL 2020). [paper] [code]
Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension. Hongyu Gong, Yelong Shen, Dian Yu, Jianshu Chen, and Dong Yu. 2020. (ACL 2020). [paper] [code]
Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension. Kai Sun, Dian Yu, Dong Yu, and Claire Cardie. 2020. (TACL). [paper] [code]
MultiSumm: Towards a Unified Model for Multi-Lingual Abstractive Summarization. Yue Cao, Xiaojun Wan, Jin-ge Yao, and Dian Yu. (AAAI 2020).
Evidence Sentence Extraction for Machine Reading Comprehension. Hai Wang, Dian Yu, Kai Sun, Jianshu Chen, Dong Yu, David McAllester, and Dan Roth. 2019. (CoNLL 2019). [paper] [resource]
Improving Machine Reading Comprehension with General Reading Strategies. Kai Sun, Dian Yu, Dong Yu, and Claire Cardie. 2019. (NAACL-HLT 2019) (oral). [code]
DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension. Kai Sun, Dian Yu, Jianshu Chen, Dong Yu, Yejin Choi, Claire Cardie. 2019. (TACL 2019). [dataset]
Unsupervised Graph-Based Relation Extraction and Validation for Knowledge Base Population. Dian Yu. 2017. PhD Dissertation. Rensselaer Polytechnic Institute.
Open Relation Extraction and Grounding. Dian Yu, Lifu Huang, and Heng Ji. 2017. (IJCNLP 2017) (oral).
Unsupervised Person Slot Filling based on Graph Mining. Dian Yu, Heng Ji. 2016. (ACL 2016) (oral).
Modeling Truth Existence in Truth Discovery. Shi Zhi, Bo Zhao, Wenzhu Tong, Jing Gao, Dian Yu, Heng Ji and Jiawei Han. 2015. (KDD 2015).
Detecting Deceptive Groups Using Conversations and Network Analysis. Dian Yu, Yulia Tyshchuk, Heng Ji and William Wallace. 2015. (ACL-IJCNLP 2015). [games]
Why Read if You Can Scan: Scoping Strategy for Biographical Fact Extraction. Dian Yu, Heng Ji, Sujian Li, and Chin-Yew Lin. 2015. (NAACL-HLT 2015) (short). [triggers]
The Wisdom of Minority: Unsupervised Slot Filling Validation based on Multi-dimensional Truth-Finding. Dian Yu, Hongzhao Huang, Taylor Cassidy, Heng Ji, Chi Wang, Shi Zhi, Jiawei Han, Clare Voss and Malik Magdon-Ismail. 2014. (COLING 2014) (oral).
Resolving Entity Morphs in Censored Data. Hongzhao Huang, Zhen Wen, Dian Yu, Heng Ji, Yizhou Sun, Jiawei Han and He Li. 2013. (ACL 2013).
Professional Services
Program Committee:
ACL (2017-2021), NAACL-HLT (2016, 2018, 2019), COLING (2020), EMNLP (2018-2020), AAAI (2019, 2020), EACL (2021), ICASSP (2022)
Journal:
NLE (2019, 2021), JAIR (2018, 2019), TASLP (2019)
Senior Area Chair:
AACL-IJCNLP (2022): Question Answering
Junior Area Chair/Action Editor/Meta-Reviewer:
NAACL-HLT (2021): Information Extraction, EMNLP (2021): Information Extraction, ACL (2022),
ICASSP (2023, 2024), LREC-COLING (2024): Information Extraction, LREC (2026): Information Extraction,
ARR (2024-2025)
Education
09/2013-09/2017 Ph.D. in Computer Science, Rensselaer Polytechnic Institute (Advisor: Prof. Heng Ji)
09/2012-07/2013 Ph.D. in Computer Science, The Graduate Center, CUNY (Advisor: Prof. Heng Ji)
09/2008-07/2012 B.Eng. in Communication Engineering, Beijing University of Posts and Telecommunications
Work Experience
Tencent AI Lab, Bellevue, WA
Senior researcher Nov. 2017 - present
Bosch Research, Palo Alto, CA
Research intern May 2015 - Aug. 2015
Mentor: Dr. Lin Zhao, Dr. Kui Xu
Knowledge Mining Group, Microsoft Research Asia, Beijing, China
Research intern Jun. 2014 - Sep. 2014
Mentor: Dr. Chin-Yew Lin
Language Computing & Web Mining Group, Peking University, Beijing, China
Undergraduate research intern Aug. 2011 - Jul. 2012
Mentor: Prof. Xiaojun Wan