My name is Hao Cheng.

I'm a researcher at Microsoft Research and an affiliate faculty member at the University of Washington.

Prior to this, I completed my PhD at the University of Washington working with Mari Ostendorf, and received my MSc at the University of Alberta under the supervision of Dale Schuurmans and Csaba Szepesvári.

Email (company-related): {my_last_name}.Hao@microsoft.com

Email (others): {my_first_name}cheng@outlook.com 


Research Interest:

My research interests center on natural language processing and machine learning.


Our team, Sounding Board, won the 2017 Alexa Prize!

(For details and media coverage, see this link.)

Updates:


Professional Service

Organizing Committee

Volunteer Chair, NAACL 2021

Program Committee & Editorial Team 

Area Chair/Meta-Reviewer: ACL (2023), EMNLP (2023, 2022), AAAI (2023), COLING (2022)

Reviewer: 

--[Journal] Transactions of the Association for Computational Linguistics (TACL)

--[Conference] NeurIPS (2023), ACL Rolling Review (2021), ACL (2017-2022), EMNLP (2019-2021), NAACL (2019, 2021), AACL (2020), COLING (2018), IJCAI (2015).


Papers [Google Scholar]

[Preprint]

Automatic Hallucination Assessment for Aligned Large Language Models via Transferable Adversarial Attacks

Xiaodong Yu, Hao Cheng, Xiaodong Liu, Dan Roth, Jianfeng Gao.

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao.

DIALGEN: Collaborative Human-LM Generated Dialogues for Improved Understanding of Human-Human Conversations

Bo-Ru Lu, Nikita Haduong, Chia-Hsuan Lee, Zeqiu Wu, Hao Cheng, Paul Koester, Jean Utke, Tao Yu, Noah A. Smith, Mari Ostendorf.

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

Baolin Peng, Michel Galley, Pengcheng He, Hao Cheng, Yujia Xie, Yu Hu, Qiuyuan Huang, Lars Liden, Zhou Yu, Weizhu Chen, Jianfeng Gao.

Pre-training Transformers for Knowledge Graph Completion

Sanxing Chen, Hao Cheng, Xiaodong Liu, Jian Jiao, Yangfeng Ji, Jianfeng Gao.

Fast-ELECTRA for Efficient Pre-training

Chengyu Dong, Liyuan Liu, Hao Cheng, Jingbo Shang, Jianfeng Gao, Xiaodong Liu. 

A Survey of Knowledge-Intensive NLP with Pre-Trained Language Models

Da Yin, Li Dong, Hao Cheng, Xiaodong Liu, Kai-Wei Chang, Furu Wei, Jianfeng Gao.

Language Models as Inductive Reasoners

Zonglin Yang, Li Dong, Xinya Du, Hao Cheng, Erik Cambria, Xiaodong Liu, Jianfeng Gao, Furu Wei.


[2023]

Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models

Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Jianfeng Gao. 

In Proc. of the Neural Information Processing Systems (NeurIPS), 2023.

Augmenting Language Models with Long-Term Memory

Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei.

In Proc. of the Neural Information Processing Systems (NeurIPS), 2023.

Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding

Yu Zhang*, Hao Cheng*, Zhihong Shen, Xiaodong Liu, Ye-Yi Wang, Jianfeng Gao. [*Equal contribution]

In Findings of Conf. Empirical Methods in Natural Language Processing (EMNLP-Findings), 2023.

Understand and Modularize Generator Optimization in ELECTRA-style Pretraining

Chengyu Dong, Liyuan Liu, Hao Cheng, Jingbo Shang, Jianfeng Gao, Xiaodong Liu.

In Proc. International Conference on Machine Learning (ICML), 2023.

Chain-of-Skills: A Configurable Model for Open-domain Question Answering

Kaixin Ma*, Hao Cheng*, Yu Zhang, Xiaodong Liu, Eric Nyberg, Jianfeng Gao. [*Equal contribution]

In Proc. Assoc. for Computational Linguistics (ACL), 2023.

Task-Aware Specialization for Efficient and Robust Dense Retrieval for Open-Domain Question Answering

Hao Cheng, Hao Fang, Xiaodong Liu, Jianfeng Gao.

In Proc. Assoc. for Computational Linguistics (ACL), 2023.

Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing

Robert Tinn*, Hao Cheng*, Yu Gu, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon. [*Equal contribution]

Patterns, 2023

Optimizing Bi-Encoder for Named Entity Recognition via Contrastive Learning [Code]

Sheng Zhang, Hao Cheng, Jianfeng Gao, Hoifung Poon. 

In Proc. International Conference on Learning Representations (ICLR), 2023.

Visually-Augmented Language Modeling

Weizhi Wang, Li Dong, Hao Cheng, Haoyu Song, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei.

In Proc. International Conference on Learning Representations (ICLR), 2023.

INSCIT: Information-Seeking Conversations with Mixed-Initiative Interactions [Data]

Zeqiu Wu, Ryu Parish, Hao Cheng, Sewon Min, Prithviraj Ammanabrolu, Mari Ostendorf, Hannaneh Hajishirzi.

Transactions of the Association for Computational Linguistics (TACL), 2023.

Self-Verification Improves Few-Shot Clinical Information Extraction

Zelalem Gero, Chandan Singh, Hao Cheng, Tristan Naumann, Michel Galley, Jianfeng Gao, Hoifung Poon.

ICML 3rd Workshop on Interpretable Machine Learning in Healthcare (IMLH), 2023.


[2022]

Open-domain Question Answering via Chain of Reasoning over Heterogeneous Knowledge [Code]

Kaixin Ma*, Hao Cheng*, Xiaodong Liu, Eric Nyberg, Jianfeng Gao. [*Equal contribution]

In Findings of Conf. Empirical Methods in Natural Language Processing (EMNLP-Findings), 2022.

Knowledge-Rich Self-Supervision for Biomedical Entity Linking [Model]

Sheng Zhang*, Hao Cheng*, Shikhar Vashishth*, Cliff Wong, Jinfeng Xiao, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon. [*Equal contribution]

In Findings of Conf. Empirical Methods in Natural Language Processing (EMNLP-Findings), 2022.

Unsupervised Learning of Hierarchical Conversation Structure [Code]

Bo-Ru Lu, Yushi Hu, Hao Cheng, Noah A. Smith, Mari Ostendorf.

In Findings of Conf. Empirical Methods in Natural Language Processing (EMNLP-Findings), 2022.

Open Domain Question Answering with A Unified Knowledge Interface [Code]

Kaixin Ma*, Hao Cheng*, Xiaodong Liu, Eric Nyberg, Jianfeng Gao. [*Equal contribution]

In Proc. Assoc. for Computational Linguistics (ACL), 2022.

Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention

Yichong Xu, Chenguang Zhu, Shuohang Wang, Siqi Sun, Hao Cheng, Xiaodong Liu, Jianfeng Gao, Pengcheng He, Michael Zeng, Xuedong Huang. 

In Proc. International Joint Conference on Artificial Intelligence (IJCAI), 2022.


[2021]

CLUES: Few-Shot Learning Evaluation in Natural Language Understanding

Subhabrata Mukherjee, Xiaodong Liu, Guoqing Zheng, Saghar Hosseini, Hao Cheng, Ge Yang, Christopher Meek, Ahmed Awadallah, Jianfeng Gao.

In Proc. of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS Datasets and Benchmarks), 2021.

Dialogue State Tracking with a Language Model using Schema-Driven Prompting [Code]

Chia-Hsuan Lee, Hao Cheng, Mari Ostendorf.

In Proc. Conf. Empirical Methods in Natural Language Processing (EMNLP), 2021.

Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature

Yu Wang*, Jinchao Li*, Tristan Naumann*, Chenyan Xiong, Hao Cheng, Robert Tinn, Cliff Wong, Naoto Usuyama, Richard Rogahn, Zhihong Shen, Yang Qin, Eric Horvitz, Paul N. Bennett, Jianfeng Gao, Hoifung Poon. [*Equal contribution]

In Proc. of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD), 2021.

UnitedQA: A Hybrid Approach for Open Domain Question Answering [Code]

Hao Cheng*, Yelong Shen*, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao. [*Equal contribution]

In Proc. Assoc. for Computational Linguistics (ACL), 2021.

Posterior Differential Regularization with f-divergence for Improving Model Robustness [Code]

Hao Cheng, Xiaodong Liu, Lis Pereira, Yaoliang Yu, Jianfeng Gao.

In Proc. Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2021.

Targeted Adversarial Training for Natural Language Understanding

Lis Pereira*, Xiaodong Liu*, Hao Cheng, Hoifung Poon, Jianfeng Gao, Ichiro Kobayashi. [*Equal contribution]

In Proc. Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2021.

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

Yu Gu*, Robert Tinn*, Hao Cheng*, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon. [*Equal contribution]

ACM Transactions on Computing for Healthcare, 2021.


[2020]

Probabilistic Assumptions Matter: Improved Models for Distantly-Supervised Document-Level Question Answering [Code]

Hao Cheng, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. 

In Proc. Assoc. for Computational Linguistics (ACL), 2020.

The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding

Xiaodong Liu, Yu Wang, Jianshu Ji, Hao Cheng, Xueyun Zhu, Emmanuel Awa, Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao, Jianfeng Gao.

In Proc. Assoc. for Computational Linguistics (ACL), demo, 2020.

Adversarial Training for Large Neural Language Models

Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon, Jianfeng Gao. 2020


[Selected, Before 2020]

A Dynamic Speaker Model for Conversational Interactions [Code]

Hao Cheng, Hao Fang, Mari Ostendorf.

In Proc. Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2019.

Sounding Board: A User-Centric and Content-Driven Social Chatbot

Hao Fang, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin Choi, Noah A. Smith, Mari Ostendorf.

In Proc. Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), demo, 2018.

Bi-directional Attention with Agreement for Dependency Parsing [Code]

Hao Cheng, Hao Fang, Xiaodong He, Jianfeng Gao, Li Deng.

In Proc. Conf. Empirical Methods in Natural Language Processing (EMNLP), 2016.

Scalable and Sound Low-Rank Tensor Learning [Code]

Hao Cheng, Yaoliang Yu, Xinhua Zhang, Eric Xing, Dale Schuurmans.

In Proc. Conf. Artificial Intelligence and Statistics (AISTATS), 2016.   

Open-Domain Name Error Detection using a Multi-Task RNN

Hao Cheng, Hao Fang, Mari Ostendorf. 

In Proc. Conf. Empirical Methods in Natural Language Processing (EMNLP), 2015.


Code

    GitHub


Teaching @ UW

[Instructor] [Grad] EE 596/LING 580: Conversational AI (course webpage) [Spring 2019]

[TA] [Grad] EE 596/LING 580: Conversational AI (course webpage) [Spring 2018]

[TA] [Grad] EE 511: Introduction to Statistical Learning (course webpage) [Winter 2018]

[TA] [Undergrad] EE 235: Continuous-Time Linear Systems [Autumn 2017]

[TA] [Undergrad] EE 341: Discrete-Time Linear Systems [Spring 2016]