Pengtao Xie

Associate Professor, Department of Electrical and Computer Engineering

Associate Adjunct Professor, Division of Biomedical Informatics, Department of Medicine

Affiliated Faculty, Halıcıoğlu Data Science Institute in the School of Computing, Information and Data Sciences, Department of Molecular Biology in the School of Biological Sciences, Shu Chien-Gene Lay Department of Bioengineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, AI Group in the Department of Computer Science and Engineering, Center for Machine-Intelligence, Computing and Security, Institute of Engineering in Medicine, Institute for Genomic Medicine, and Jacobs Center for Health Innovation.

University of California San Diego

I obtained my PhD from the Machine Learning Department, School of Computer Science, Carnegie Mellon University. My recent research interests mainly lie in AI agents, large language models, and foundation models, with applications to healthcare and biology. Some highlights: (1) our AIBuildAI agent achieved first place on OpenAI MLE-Bench; (2) our DreamPRM-1.5 and DreamPRM-1.0 methods achieved first place on the MMMU, MathVista, and RBench-V leaderboards for multi-modal reasoning; (3) our GenSeg method was selected for Nature Communications Editors’ Highlights (“the 50 best papers recently published”); (4) our Betty framework was recognized as a Notable-Top-5% paper at ICLR 2023; and (5) our DreamPRM-Code method achieved first place on the LiveCodeBench coding benchmark (2/1/2025-5/1/2025).

I received an NSF Career Award in 2024 and an NIH MIRA Award in 2025.

https://pengtaoxie.github.io/

p1xie@ucsd.edu Twitter

I am looking for highly motivated PhD students, postdocs, and master's students to join my group. If you plan to apply to the PhD program in the CSE department and are interested in working with me, please email me. I am also looking for research interns.

News

For the latest news, please see https://pengtaoxie.github.io/
2026/4. One paper is accepted by ACL.
2026/3. Our AIBuildAI agent achieved first place on OpenAI MLE-Bench.
2026/2. Our billion-parameter single-cell foundation model scLong is published in Nature Communications.
2026/1. Our DreamPRM-Code method achieves first place on the LiveCodeBench Coding Benchmark (2/1/2025-5/1/2025).
2025/11. Serving as a Senior Area Chair for ICML 2026.
2025/10. Serving as an Associate Editor for IEEE Transactions on Pattern Analysis and Machine Intelligence.
2025/10. Our method DreamPRM-1.5 for multi-modal LLM reasoning achieves first place on the RBench-V leaderboard!
2025/10. Our BiDoRA work is featured in UCSD News.
2025/9. Serving as an Associate Editor for ACM Computing Surveys.
2025/9. Two papers are accepted by NeurIPS 2025, including DreamPRM.
2025/9. Our method DreamPRM-1.5 for multi-modal LLM reasoning achieves first place on the MMMU leaderboard!
2025/8. Our work "Generative AI enables medical image segmentation in ultra low-data regimes" is selected for Nature Communications Editors’ Highlights ("the 50 best papers recently published").
2025/8. Two papers are accepted by EMNLP 2025.
2025/7. Our work "Generative AI enables medical image segmentation in ultra low-data regimes" is featured in UCSD News.
2025/7. Our work "Generative AI enables medical image segmentation in ultra low-data regimes" is published in Nature Communications.
2025/7. Our work "BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation" is accepted by Transactions on Machine Learning Research.
2025/7. Co-organizing the NeurIPS 2025 2nd Workshop on Multi-modal Large Language Models and Foundation Models for Life Sciences.
2025/6. I am promoted to Associate Professor with tenure. Huge thanks to my students, colleagues, collaborators, and family!
2025/6. Our method DreamPRM for multi-modal LLM reasoning achieves first place on the MathVista leaderboard.
2025/6. Our work on Reweighting Pretraining Objectives for Task-Adaptive Pretraining is accepted by Transactions on Machine Learning Research.
2025/3. Our work on downstream task guided continual pretraining is accepted by Transactions on Machine Learning Research.
2025/3. Co-organizing the ICML 2025 Workshop on Large Language Models and Generative AI for Health.
2025/1. Received an NIH Maximizing Investigators’ Research Award (MIRA) for Early Stage Investigators. Thanks, NIH!
2025/1. I will teach ECE 285 Deep Generative Models in Spring 2025, covering the fundamentals and the state of the art in large language models, diffusion models, and related topics.
2025/1. One paper is accepted by NAACL.
2025/1. Co-organizing the AAAI 2025 Workshop on Large Language Models and Generative AI for Health.
2025/1. One paper is accepted by IEEE Transactions on AI.
2025/1. Continuing to serve as an Associate Editor for ACM Transactions on Computing for Healthcare.
2024/12. One paper is accepted by Transactions on Machine Learning Research.
2024/11. Serving as an Area Chair for ICML 2025.
2024/9. Received an NSF Smart and Connected Health award. Thanks, NSF!
2024/9. One paper is accepted by NeurIPS.
2024/6. Two papers are accepted by JAMA Network Open and British Journal of Ophthalmology.
2024/5. Serving as an Area Chair for NeurIPS 2024.
2024/5. Three papers are accepted by ICML 2024.
2024/5. Gave a talk at USC.
2024/5. Gave a talk at UIUC AI for Health Webinar.
2024/4. Received the NSF Career Award. Thanks, NSF!
2024/3. Received an NIH R21 Award. Thanks, NIH!
2024/3. Two papers are accepted by NAACL 2024.
2024/1. One paper is accepted by TACL.
2024/1. Serving as an Area Chair for ICML 2024.
2023/9. Two papers are accepted by NeurIPS 2023.
2023/6. Course evaluations are released: ECE269-Winter2022, ECE285-Winter2022, ECE285-Spring2023, ECE175B-Spring2023, ECE269-Winter2021, ECE285-Winter2021
2023/4. Three papers are accepted by ICML 2023.
2023/1. Two papers are accepted by ICLR 2023, including one Notable-Top-5% paper.
2023/1. I received the Best Graduate Teacher Award (presented by ECE at UCSD).
2022/4. Course evaluations are released: ECE269-Winter2022, ECE285-Winter2022, ECE269-Winter2021, ECE285-Winter2021
2020/8. My PhD thesis was selected as a finalist (top 5) for the AMIA Doctoral Dissertation Award.

PhD Students and Postdocs

Caitlin Aamodt (Schmidt AI in Science Postdoc Fellow, co-advised with Prof. Nathan Lewis)
Qi Cao
Han Guo
Niklas Klusch (Schmidt AI in Science Postdoc Fellow, co-advised with Prof. Elizabeth Villa)
Youwei Liang
Peijia Qin
Li Zhang
Ruiyi Zhang

PhD Alumni

Sai Somayajula (2025, now senior applied scientist in Generative AI at Oracle)
Ramtin Hosseini (2024, now co-founder and CEO of Xero1 Inc.)

Preprints

Ramtin Hosseini, Youwei Liang, Digvijay Singh, Hamidreza Rahmani, Sang Choe, Joan Lee, Min Xu, Eran Segal, James Zou, James Williamson, Danielle A. Grotjahn, Elizabeth Villa, Pengtao Xie. RobPicker: A Meta Learning Framework for Robust Identification of Macromolecules in Cryo-Electron Tomograms. bioRxiv 2025.09.16.676650v1.
Youwei Liang, Ruiyi Zhang, Yongce Li, Mingjia Huo, Zinnia Ma, Digvijay Singh, Chengzhan Gao, Hamidreza Rahmani, Satvik Bandi, Li Zhang, Robert Weinreb, Atul Malhotra, Danielle A. Grotjahn, Linda Awdishu, Trey Ideker, Michael Gilson, Pengtao Xie. Multi-Modal Large Language Model Enables All-Purpose Prediction of Drug Mechanisms and Properties. bioRxiv 2024.09.29.615524.
Li Zhang, Han Guo, Leah Schaffer, Young Su Ko, Digvijay Singh, Hamid Rahmani, Danielle Grotjahn, Elizabeth Villa, Michael Gilson, Wei Wang, Trey Ideker, Eric Xing, Pengtao Xie. ProteinAligner: A Multi-modal Pretraining Framework for Protein Foundation Models. bioRxiv 2024.10.06.616870.
Mingjia Huo, Han Guo, Xingyi Cheng, Digvijay Singh, Hamidreza Rahmani, Shen Li, Philipp Gerlof, Trey Ideker, Danielle A. Grotjahn, Elizabeth Villa, Le Song, Pengtao Xie. Multi-Modal Large Language Model Enables Protein Function Prediction. bioRxiv 2024.08.19.608729.

Publications Since 2020

Google Scholar

Publications Before 2020

Congzheng Song, Shanghang Zhang, Najmeh Sadoughi, Pengtao Xie, Eric Xing. Generalized Zero-shot ICD Coding. International Joint Conference on Artificial Intelligence (IJCAI 2020).
Zeya Wang, Baoyu Jing, Yang Ni, Nanqing Dong, Pengtao Xie, Eric P Xing. Adversarial Domain Adaptation Being Aware of Class Relationships. European Conference on Artificial Intelligence (ECAI 2020).
B. Huang, K. Zhang, P. Xie, M. Gong, E. P. Xing. Specific and Shared Causal Relation Modeling and Mechanism-based Clustering. Advances in Neural Information Processing Systems (NeurIPS 2019).
K. Xu, M. Lam, J. Pang, X. Gao, C. Band, P. Mathur, F. Papay, A. K. Khanna, J. B. Cywinski, K. Maheshwari, P. Xie, E. P. Xing. Multimodal Machine Learning for Automated ICD Coding. Conference on Machine Learning for Healthcare (MLHC 2019).
Z. Wang, N. Dong, S. Rosario, M. Xu, P. Xie, and E. P. Xing. Ellipse Detection of Optic Disc-and-Cup Boundary in Fundus Image with Unsupervised Domain Adaption. The IEEE International Symposium on Biomedical Imaging (ISBI 2019).
P. Xie, W. Wu, Y. Zhu and E. P. Xing. Orthogonality-Promoting Distance Metric Learning: Convex Relaxation and Theoretical Analysis. The 35th International Conference on Machine Learning (ICML 2018) (Long Oral Presentation).
P. Xie, H. Zhang, Y. Zhu and E. P. Xing. Nonoverlap-Promoting Variable Selection. The 35th International Conference on Machine Learning (ICML 2018) (Short Oral Presentation).
P. Xie, H. Shi, M. Zhang and E. P. Xing. A Neural Architecture for Automated ICD Coding. The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018) (Oral Presentation).
B. Jing, P. Xie and E. P. Xing. On the Automatic Generation of Medical Imaging Reports. The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018).
P. Xie, J. Kim, Q. Ho, Y. Yu and E. P. Xing. Orpheus: Efficient Distributed Machine Learning via System and Algorithm Co-design. ACM Symposium on Cloud Computing (SoCC 2018).
D. Sachan, P. Xie and E. P. Xing. Effective Use of Bidirectional Language Modeling for Medical Named Entity Recognition. Conference on Machine Learning for Healthcare (MLHC 2018).
X. Liu, K. Xu, P. Xie and E. P. Xing. Unsupervised Pseudo-Labeling for Extractive Summarization on Electronic Health Records. NIPS ML for Healthcare Workshop, 2018 (Spotlight Presentation).
P. Xie, R. Salakhutdinov, L. Mou and E. P. Xing. Deep Conditional Determinantal Point Process for Large-Scale Multi-Label Classification. International Conference on Computer Vision (ICCV 2017).
P. Xie, B. Poczos and E. P. Xing. Near-Orthogonality Regularization in Kernel Methods. Conference on Uncertainty in Artificial Intelligence (UAI 2017) (Plenary Presentation).
H. Zhang, Z. Zheng, S. Xu, X. Liang, W. Dai, Q. Ho, Z. Hu, J. Wei, P. Xie, and E. P. Xing. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters. 2017 USENIX Annual Technical Conference (ATC 2017) (Oral Presentation).
P. Xie, A. Singh and E. P. Xing. Uncorrelation and Evenness: A New Diversity-Promoting Regularizer. The 34th International Conference on Machine Learning (ICML 2017) (Oral Presentation).
P. Xie, Y. Deng, Y. Zhou, A. Kumar, Y. Yu, J. Zou and E. P. Xing. Learning Latent Space Models with Angular Constraints. The 34th International Conference on Machine Learning (ICML 2017) (Oral Presentation).
H. Zhou, J. Li, P. Xie and Y. Zhang. Improving the Generalization Performance of Multi-class SVM via Angular Regularization. The 26th International Joint Conference on Artificial Intelligence (IJCAI 2017).
P. Xie and E. P. Xing. A Constituent-Centric Neural Architecture for Reading Comprehension. The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017).
Y. Zhou, K. Yuan, Y. Yu, X. Ni, P. Xie, E. P. Xing and S. Xu. Inference of multiple-wave population admixture by modeling decay of linkage disequilibrium with polynomial functions. Heredity, 2017.
E. P. Xing, Q. Ho, P. Xie and W. Dai. Strategies and Principles of Distributed Machine Learning on Big Data. Engineering, Transactions of Chinese Academy of Engineering (Engineering 2016).
P. Xie, J. Kim, Y. Zhou, Q. Ho, A. Kumar, Y. Yu and E. P. Xing. Lighter-Communication Distributed Machine Learning via Sufficient Factor Broadcasting. The 32nd Conference on Uncertainty in Artificial Intelligence (UAI 2016).
P. Xie, J. Zhu and E. P. Xing. Diversity-Promoting Bayesian Learning of Latent Variable Models. The 33rd International Conference on Machine Learning (ICML 2016) (Oral Presentation).
E. P. Xing, Q. Ho, W. Dai, J. Kim, J. Wei, S. Lee, X. Zheng, P. Xie, A. Kumar and Y. Yu. Petuum: A New Platform for Distributed Machine Learning on Big Data. IEEE Transactions on Big Data (IEEE BigData 2015).
P. Xie. Learning Compact and Effective Distance Metrics with Diversity Regularization. European Conference on Machine Learning (ECML 2015) (Oral Presentation).
P. Xie, Y. Deng and E. P. Xing. Diversifying Restricted Boltzmann Machine for Document Modeling. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2015) (Oral Presentation).
E. P. Xing, Q. Ho, W. Dai, J. Kim, J. Wei, S. Lee, X. Zheng, P. Xie, A. Kumar and Y. Yu. Petuum: A New Platform for Distributed Machine Learning on Big Data. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2015) (Oral Presentation).
P. Xie, D. Yang and E. P. Xing. Incorporating Word Correlation Knowledge into Topic Modeling. The 2015 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2015).
P. Xie, Y. Pei, Y. Xie and E. P. Xing. Mining User Interests from Personal Photos. The 29th AAAI Conference on Artificial Intelligence (AAAI 2015).
P. Xie and E. P. Xing. Integrating Image Clustering and Codebook Learning. The 29th AAAI Conference on Artificial Intelligence (AAAI 2015) (Oral Presentation).
P. Xie and E. P. Xing. Multi-Modal Distance Metric Learning. The 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013) (Oral Presentation).
P. Xie and E. P. Xing. Integrating Document Clustering and Topic Modeling. The 29th Conference on Uncertainty in Artificial Intelligence (UAI 2013).

Teaching

ECE285 Deep Generative Models. Winter 2021, Winter 2022, Spring 2023, Spring 2025. Course Evaluations: ECE285-Winter2022, ECE285-Spring2023, ECE285-Winter2021
ECE175B Probabilistic Reasoning and Graphical Models. Spring 2023, Spring 2024, Spring 2025. Course Evaluations: ECE175B-Spring2023
ECE269 Linear Algebra and Applications. Winter 2021, Winter 2022. Course Evaluations: ECE269-Winter2022, ECE269-Winter2021

Selected Awards and Honors

NIH MIRA Award, 2025
NSF Career Award, 2024
Best Graduate Teacher Award (presented by ECE at UCSD), 2023
ICLR Notable-Top-5% Paper, 2023
Global Top-100 Chinese Young Scholars in Artificial Intelligence (recognized by Baidu), 2022
UCSD Faculty Career Development Award, 2022
Tencent Faculty Award, 2021
Outstanding Reviewer for ICLR, 2021
Finalist (top 5) for AMIA Doctoral Dissertation Award, 2020
Amazon AWS Research Award, 2020
Tencent AI-Lab Faculty Award, 2020
Top Reviewer for ICML, 2020
Innovator Award, 2018 (presented by the Pittsburgh Business Times)
Siebel Scholarship, 2014 (awarded to 85 graduate students from around the world)

Past Students

Undergraduate Students

Yifeng Wang (2023 --> PhD student at CMU ECE)
Wenxiao Cai (2023 --> MS student at Stanford EE)
Zhihao Zhan (2022 --> PhD student at Mila)
Ruisi Zhang (2020 --> PhD student at UCSD ECE)
Matt Hong (2020 --> PhD student at UCSD CSE)
Jiayuan Huang (2020 --> Master's student at CMU CS)
Jiaqi Zeng (2020 --> Master's student at CMU CS)
Meng Zhou (2020 --> Master's student at CMU CS)
Yuhong Chen (2020 --> Master's student at CMU INI)
Yue Yang (2020 --> Master's student at Georgia Tech CS)

Master's Students

Jiachen Li (2020 --> PhD student at UCSB)
Xuehai He (2020 --> PhD student at UCSC)

Professional Activities

Associate Editor for ACM Transactions on Computing for Healthcare
Senior Area Chair for AAAI 2023
Area Chair for ICML 2021-2025, NeurIPS 2021-2025, CVPR 2021, NAACL 2021, ICCV 2021, AAAI 2021-2022, IJCAI 2021
Co-organizer for:
- NeurIPS 2020-2024 workshops on “Self-Supervised Learning – Theory and Practice”
- ICML 2021 Workshop on “Interpretable Machine Learning for Healthcare”
- AAAI 2021-2022 workshops on “Trustworthy AI for Healthcare”
- ICML 2021 Workshop on “Self-Supervised Learning for Reasoning and Perception”
- ICLR 2021 Workshop on “Machine Learning for Preventing and Combating Pandemics”

Panelist for NSF (7 times) and NIH (twice)
Reviewer for Nature Machine Intelligence, Nature Communications

Selected Talks

Generative AI and Foundation Models for Medical Image Segmentation in Ultra-Low Data Regimes
- May 2024, Department of Computer Science, USC
- May 2024, AI for Health Webinar, UIUC
Sample Efficient Biomedical Image Semantic Segmentation
- Jul 2023, Computational Genomics Summer Institute, UCLA
ProteinChat: Towards Enabling ChatGPT-Like Capabilities on Protein 3D Structures
- Jun 2023, Mohamed bin Zayed University of Artificial Intelligence
- Jun 2023, BioMap
Self-supervised Regularization
- Oct 2022, Workshop on Self-Supervised Learning for Signal Decoding
ML Training Strategies Inspired by Humans’ Learning Skills
- Apr 2023, TILOS Seminar
- Apr 2023, IV CaliBaja Symposium and Workshop
- Sep 2022, Workshop on Composable, Automatic, and Scalable Learning
