Pengtao Xie
Assistant Professor, Department of Electrical and Computer Engineering
Assistant Adjunct Professor, Division of Biomedical Informatics, Department of Medicine
Affiliated Faculty, Halıcıoğlu Data Science Institute in the School of Computing, Information and Data Sciences, Shu Chien-Gene Lay Department of Bioengineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, AI Group in the Department of Computer Science and Engineering, Center for Machine-Intelligence, Computing and Security, Institute of Engineering in Medicine, and Institute for Genomic Medicine.
University of California San Diego
I obtained my PhD from the Machine Learning Department, School of Computer Science, Carnegie Mellon University. My research interests lie mainly in machine learning inspired by humans' learning skills (especially classroom learning skills), such as learning by self-explanation, small-group learning, and learning by teaching, and in their applications to Large Language Models, Foundation Models, Healthcare, and Biomedicine. Here is a summary of research outcomes. I received an NSF Career Award in 2024 and an NIH MIRA Award in 2025.
p1xie@ucsd.edu
I am looking for highly motivated PhD students, postdocs, and master's students to join my group. I am also looking for research interns.
News
2025/1. Received an NIH Maximizing Investigators’ Research Award (MIRA) for Early Stage Investigators. Thanks, NIH!
2024/12. One paper is accepted by Transactions on Machine Learning Research.
2024/11. Serve as Area Chair for ICML 2025.
2024/9. Received an NSF Smart and Connected Health award. Thanks, NSF!
2024/9. One paper is accepted by NeurIPS.
2024/6. Two papers are accepted by JAMA Network Open and British Journal of Ophthalmology.
2024/5. Serve as Area Chair for NeurIPS 2024.
2024/5. Three papers are accepted by ICML 2024.
2024/5. Gave a talk at USC.
2024/5. Gave a talk at UIUC AI for Health Webinar.
2024/4. Received the NSF Career Award. Thanks, NSF!
2024/3. Received an NIH R21 Award. Thanks, NIH!
2024/3. Two papers are accepted by NAACL 2024.
2024/1. One paper is accepted by TACL.
2024/1. Serve as Area Chair for ICML 2024.
2023/9. Two papers are accepted by NeurIPS 2023.
2023/6. Course evaluations are released. ECE269-Winter2022, ECE285-Winter2022, ECE285-Spring2023, ECE175B-Spring2023, ECE269-Winter2021, ECE285-Winter2021
2023/4. Three papers are accepted by ICML 2023.
2023/1. Two papers are accepted by ICLR 2023, including one Notable-Top-5% paper.
2023/1. I received the Best Graduate Teacher Award (presented by ECE at UCSD).
2022/4. Course evaluations are released. ECE269-Winter2022, ECE285-Winter2022, ECE269-Winter2021, ECE285-Winter2021
2020/8. My PhD thesis was selected as a finalist (top 5) for the AMIA Doctoral Dissertation Award.
PhD Students and Postdocs
Caitlin Aamodt (Schmidt AI in Science Postdoc Fellow, co-advised with Prof. Nathan Lewis)
Mingjia Huo (Co-advised with Prof. Shamim Nemati)
Niklas Klusch (Schmidt AI in Science Postdoc Fellow, co-advised with Prof. Elizabeth Villa)
PhD Alumni
Ramtin Hosseini (2024, now co-founder and CEO of Xero1 Inc.)
Preprints
Ding Bai, Shentong Mo, Ruiyi Zhang, Yingtao Luo, Jiahao Gao, Jeremy Parker Yang, Qiuyang Wu, Digvijay Singh, Hamidreza Rahmani, Tiffany Amariuta, Danielle Grotjahn, Sheng Zhong, Nathan Lewis, Wei Wang, Trey Ideker, Pengtao Xie*, Eric Xing*. scLong: A Billion-Parameter Foundation Model for Capturing Long-Range Gene Context in Single-Cell Transcriptomics, 2024. (*Corresponding authors) bioRxiv 2024.11.09.622759v2
Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie. TapWeight: Reweighting Pretraining Objectives for Task-Adaptive Pretraining, 2024. arXiv:2410.10006
Bokai Hu, Sai Ashish Somayajula, Xin Pan, Zihan Huang, Pengtao Xie. Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning, 2024. arXiv:2410.11020
Peijia Qin, Ruiyi Zhang, Pengtao Xie. BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation, 2024. arXiv:2410.09758
Yuchen Li, Li Zhang, Youwei Liang, Pengtao Xie. AM-SAM: Automated Prompting and Mask Calibration for Segment Anything Model, 2024. arXiv:2410.09714
Han Guo, Ramtin Hosseini, Ruiyi Zhang, Sai Ashish Somayajula, Ranak Roy Chowdhury, Rajesh K. Gupta, Pengtao Xie. Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization, 2024. arXiv:2402.18128
Li Zhang, Han Guo, Leah Schaffer, Young Su Ko, Digvijay Singh, Hamid Rahmani, Danielle Grotjahn, Elizabeth Villa, Michael Gilson, Wei Wang, Trey Ideker, Eric Xing, Pengtao Xie. ProteinAligner: A Multi-modal Pretraining Framework for Protein Foundation Models, 2024. bioRxiv 2024.10.06.616870
Li Zhang, Basu Jindal, Ahmed Alaa, Robert Weinreb, David Wilson, Eran Segal, James Zou, Pengtao Xie. Generative AI Enables Medical Image Segmentation in Ultra Low-Data Regimes, 2024. medRxiv 2024.08.23.24312461
Duy MH Nguyen, Nghiem T Diep, Trung Q Nguyen, Hoang-Bao Le, Tai Nguyen, Tien Nguyen, TrungTin Nguyen, Nhat Ho, Pengtao Xie, Roger Wattenhofer, James Zou, Daniel Sonntag, Mathias Niepert. LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model, 2024. arXiv:2410.02615
Publications Since 2020
Publications Before 2020
Congzheng Song, Shanghang Zhang, Najmeh Sadoughi, Pengtao Xie, Eric Xing. Generalized Zero-shot ICD Coding. International Joint Conference on Artificial Intelligence (IJCAI 2020).
Zeya Wang, Baoyu Jing, Yang Ni, Nanqing Dong, Pengtao Xie, Eric P Xing. Adversarial Domain Adaptation Being Aware of Class Relationships. European Conference on Artificial Intelligence (ECAI 2020).
B. Huang, K. Zhang, P. Xie, M. Gong, E. P. Xing. Specific and Shared Causal Relation Modeling and Mechanism-based Clustering. Advances in Neural Information Processing Systems (NeurIPS 2019).
K. Xu, M. Lam, J. Pang, X. Gao, C. Band, P. Mathur, F. Papay, A. K. Khanna, J. B. Cywinski, K. Maheshwari, P. Xie, E. P. Xing. Multimodal Machine Learning for Automated ICD Coding. Conference on Machine Learning for Healthcare (MLHC 2019).
Z.Wang, N.Dong, S.Rosario, M.Xu, P.Xie, and E.P.Xing. Ellipse Detection of Optic Disc-and-Cup Boundary in Fundus Image with Unsupervised Domain Adaption. The IEEE International Symposium on Biomedical Imaging (ISBI 2019).
P.Xie, W.Wu, Y.Zhu and E.P.Xing. Orthogonality-Promoting Distance Metric Learning: Convex Relaxation and Theoretical Analysis. The 35th International Conference on Machine Learning (ICML 2018) (Long Oral Presentation).
P.Xie, H.Zhang, Y.Zhu and E.P.Xing. Nonoverlap-Promoting Variable Selection. The 35th International Conference on Machine Learning (ICML 2018) (Short Oral Presentation).
P.Xie, H.Shi, M.Zhang and E.P.Xing. A Neural Architecture for Automated ICD Coding. The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018) (Oral Presentation).
B.Jing, P.Xie and E.P.Xing. On the Automatic Generation of Medical Imaging Reports. The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018).
P.Xie, J.Kim, Q.Ho, Y.Yu and E.P.Xing. Orpheus: Efficient Distributed Machine Learning via System and Algorithm Co-design. Symposium on Cloud Computing (SoCC 2018).
D.Sachan, P.Xie and E.P.Xing. Effective Use of Bidirectional Language Modeling for Medical Named Entity Recognition. Conference on Machine Learning for Healthcare (MLHC 2018).
X.Liu, K.Xu, P.Xie and E.P.Xing. Unsupervised Pseudo-Labeling for Extractive Summarization on Electronic Health Records. NIPS ML for Healthcare Workshop, 2018 (Spotlight Presentation).
P.Xie, R.Salakhutdinov, L.Mou and E.P.Xing. Deep Conditional Determinantal Point Process for Large-Scale Multi-Label Classification. International Conference on Computer Vision (ICCV 2017).
P.Xie, B.Poczos and E.P.Xing. Near-Orthogonality Regularization in Kernel Methods. Conference on Uncertainty in Artificial Intelligence (UAI 2017) (Plenary Presentation).
H.Zhang, Z.Zheng, S.Xu, X.Liang, W.Dai, Q.Ho, Z.Hu, J.Wei, P.Xie, and E.P.Xing. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters. 2017 USENIX Annual Technical Conference (ATC 2017) (Oral Presentation).
P.Xie, A.Singh and E.P.Xing. Uncorrelation and Evenness: A New Diversity-Promoting Regularizer. The 34th International Conference on Machine Learning (ICML 2017) (Oral Presentation).
P.Xie, Y.Deng, Y.Zhou, A.Kumar, Y.Yu, J.Zou and E.P.Xing. Learning Latent Space Models with Angular Constraints. The 34th International Conference on Machine Learning (ICML 2017) (Oral Presentation).
H.Zhou, J.Li, P.Xie and Y.Zhang. Improving the Generalization Performance of Multi-class SVM via Angular Regularization. The 26th International Joint Conference on Artificial Intelligence (IJCAI 2017).
P.Xie and E.P.Xing. A Constituent-Centric Neural Architecture for Reading Comprehension. The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017).
Y.Zhou, K.Yuan, Y.Yu, X.Ni, P.Xie, E.P.Xing and S.Xu. Inference of multiple-wave population admixture by modeling decay of linkage disequilibrium with polynomial functions. Heredity, 2017.
E.P.Xing, Q.Ho, P.Xie and W.Dai. Strategies and Principles of Distributed Machine Learning on Big Data. Engineering, Transactions of Chinese Academy of Engineering (Engineering 2016).
P.Xie, J.Kim, Y.Zhou, Q.Ho, A.Kumar, Y.Yu and E.P.Xing. Lighter-Communication Distributed Machine Learning via Sufficient Factor Broadcasting. The 32nd Conference on Uncertainty in Artificial Intelligence (UAI 2016).
P.Xie, J.Zhu and E.P.Xing. Diversity-Promoting Bayesian Learning of Latent Variable Models. The 33rd International Conference on Machine Learning (ICML 2016) (Oral Presentation).
E.P.Xing, Q.Ho, W.Dai, J.Kim, J.Wei, S.Lee, X.Zheng, P.Xie, A.Kumar and Y.Yu. Petuum: A New Platform for Distributed Machine Learning on Big Data. IEEE Transactions on Big Data (IEEE BigData 2015).
P.Xie. Learning Compact and Effective Distance Metrics with Diversity Regularization. European Conference on Machine Learning (ECML 2015) (Oral Presentation).
P.Xie, Y.Deng and E.P.Xing. Diversifying Restricted Boltzmann Machine for Document Modeling. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2015) (Oral Presentation).
E.P.Xing, Q.Ho, W.Dai, J.Kim, J.Wei, S.Lee, X.Zheng, P.Xie, A.Kumar and Y.Yu. Petuum: A New Platform for Distributed Machine Learning on Big Data. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2015) (Oral Presentation).
P.Xie, D.Yang and E.P.Xing. Incorporating Word Correlation Knowledge into Topic Modeling. The 2015 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2015).
P.Xie, Y.Pei, Y.Xie and E.P.Xing. Mining User Interests from Personal Photos. The 29th AAAI Conference on Artificial Intelligence (AAAI 2015).
P.Xie and E.P.Xing. Integrating Image Clustering and Codebook Learning. The 29th AAAI Conference on Artificial Intelligence (AAAI 2015) (Oral Presentation).
P.Xie and E.P.Xing. Multi-Modal Distance Metric Learning. The 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013) (Oral Presentation).
P.Xie and E.P.Xing. Integrating Document Clustering and Topic Modeling. Proceedings of the 29th International Conference on Uncertainty in Artificial Intelligence (UAI 2013).
Teaching
ECE285 Deep Generative Models. Course Evaluations: ECE285-Winter2022, ECE285-Winter2023, ECE285-Winter2021
ECE269 Linear Algebra and Applications. Course Evaluations: ECE269-Winter2022, ECE269-Winter2021
ECE175B Probabilistic Reasoning and Graphical Models. Course Evaluations: ECE175B-Spring2023
Selected Awards and Honors
NIH MIRA Award, 2025
NSF Career Award, 2024
Best Graduate Teacher Award (presented by ECE at UCSD), 2023
ICLR Notable-Top-5% Paper, 2023
Global Top-100 Chinese Young Scholars in Artificial Intelligence (recognized by Baidu), 2022
UCSD Faculty Career Development Award, 2022
Tencent Faculty Award, 2021
Outstanding Reviewer for ICLR, 2021
Finalist (top 5) for AMIA Doctoral Dissertation Award, 2020
Amazon AWS Research Award, 2020
Tencent AI-Lab Faculty Award, 2020
Top Reviewer for ICML 2020
Innovator Award, 2018 (presented by the Pittsburgh Business Times)
Siebel Scholarship, 2014 (awarded to 85 graduate students from around the world)
Past Students
Undergraduate Students
Yifeng Wang (2023 --> PhD student at CMU ECE)
Wenxiao Cai (2023 --> MS student at Stanford EE)
Zhihao Zhan (2022 --> PhD student at Mila)
Ruisi Zhang (2020 --> PhD student at UCSD ECE)
Matt Hong (2020 --> PhD student at UCSD CSE)
Jiayuan Huang (2020 --> Master's student at CMU CS)
Jiaqi Zeng (2020 --> Master's student at CMU CS)
Meng Zhou (2020 --> Master's student at CMU CS)
Yuhong Chen (2020 --> Master's student at CMU INI)
Yue Yang (2020 --> Master's student at Georgia Tech CS)
Master Students
Jiachen Li (2020 --> PhD student at UCSB)
Xuehai He (2020 --> PhD student at UCSC)
Professional Activities
Associate Editor for ACM Transactions on Computing for Healthcare
Senior Area Chair for AAAI 2023
Area Chair for ICML 2021-2024, NeurIPS 2021-2023, CVPR 2021, NAACL 2021, ICCV 2021, AAAI 2021-2022, IJCAI 2021
Co-organizer for:
NeurIPS 2020-2024 workshops on “Self-Supervised Learning – Theory and Practice”
ICML 2021 Workshop on “Interpretable Machine Learning for Healthcare”
AAAI 2021-2022 workshops on “Trustworthy AI for Healthcare”
ICML 2021 Workshop on “Self-Supervised Learning for Reasoning and Perception”
ICLR 2021 Workshop on “Machine Learning for Preventing and Combating Pandemics”
Panelist for NSF (7 times) and NIH (twice)
Reviewer for Nature Machine Intelligence, Nature Communications
Selected Talks
Generative AI and Foundation Models for Medical Image Segmentation in Ultra-Low Data Regimes
May 2024, Department of Computer Science, USC
May 2024, AI for Health Webinar, UIUC
Sample Efficient Biomedical Image Semantic Segmentation
Jul 2023, Computational Genomics Summer Institute, UCLA
ProteinChat: Towards Enabling ChatGPT-Like Capabilities on Protein 3D Structures
Jun 2023, Mohamed bin Zayed University of Artificial Intelligence
Jun 2023, BioMap
Self-supervised Regularization
Oct 2022, Workshop on Self-Supervised Learning for Signal Decoding
ML Training Strategies Inspired by Humans’ Learning Skills
Apr 2023, TILOS Seminar
Apr 2023, IV CaliBaja Symposium and Workshop
Sep 2022, Workshop on Composable, Automatic, and Scalable Learning