Current Position:

Senior Director of Data Science and Machine Learning



Previous Experience:

Global Head of Machine Learning, IQVIA

Research Lead, MIT-IBM AI Lab

Research Staff Member, IBM Research



Ph.D. in Machine Learning, University of Washington, Seattle

Our textbook was published!

About The Textbook:

In collaboration with Professor Jimeng Sun from UIUC, we wrote this textbook to introduce the concepts of deep learning models in the context of healthcare to interested students and professionals. It took us two years of hard work to write, but it was well worth it.

Sunstella Foundation

We founded the Sunstella Foundation and will donate the income from the book to support students in the fields of technology and engineering.


Cao (Danica) Xiao is the Senior Director and Head of Data Science and Machine Learning at Amplitude. Her team focuses on developing and deploying self-service machine learning models and products based on multi-sourced user data to solve critical business challenges in digital production analytics and optimization. She is also a passionate machine learning researcher with more than 95 papers published in leading CS venues, and a technology leader with extensive experience in machine learning roadmap creation, team building, and mentoring. Prior to Amplitude, she was the Global Head of Machine Learning in the Analytics Center of Excellence at IQVIA. Before that, she was a research staff member at IBM Research and a research lead at the MIT-IBM Watson AI Lab. She received her Ph.D. in machine learning from the University of Washington, Seattle. Recently, she also co-authored a textbook on deep learning for healthcare and founded a non-profit organization for mentoring machine learning talent.


  1. [ICLR 22] Jan 2022, our paper on a differentiable scaffolding tree for molecular optimization was accepted by ICLR 2022.

  2. [WWW 22] Jan 2022, our paper on real-time population-level disease prediction was accepted by WWW 2022.

  3. [Cell Patterns] Jan 2022, our paper on clinical trial outcome prediction was accepted by Cell Patterns.

  4. [AAAI 22] Dec 2021, one paper on controlling the overall prediction risk of classification with rejection options was accepted by AAAI 2022.

  5. [Cell Patterns] Nov 2021, our paper on "Machine Learning Applications for Therapeutic Tasks with Genomics Data" was published in Cell Patterns.

  6. [NeurIPS 21] Our paper on Therapeutics Data Commons was accepted by the 2021 Conference on Neural Information Processing Systems (NeurIPS).

  7. [KDD 21] Our tutorial titled "Advances in Mining Heterogeneous Healthcare Data" was accepted by KDD 2021.

  8. [KDD 21] May 2021, two papers were accepted by KDD 2021: one on tensor decomposition for noisy data completion, the other on drug discovery.

  9. [ACL 21] One paper on automated ICD coding was accepted by Findings of ACL 2021.

  10. [IJCAI 21] April 2021, two papers on recommendations were accepted by IJCAI 2021.

  11. [WWW 21] Jan 2021, three papers were accepted by WWW 2021.


  1. Best paper published in 2018 in “AI in Health”, IMIA Yearbook of Medical Informatics, 2019

  2. First runner-up for IEEE-TASE Best Paper of 2019, 2019

  3. Manager's Choice Award, IBM Research, 2018

  4. Winner of the 2016 Parkinson's Disease PPMI Data Challenge, Michael J. Fox Foundation, 2016

  5. Third Place in the National IIE-CIS mHealth App Competition, IISE, 2016

  6. Outstanding Female Award, Society of Women Engineers (SWE), 2015-2016

  7. GSFEI Top Scholar Award, University of Washington, Seattle, 2012-2014

  8. Spring Research Scholarship, American Statistical Association/Society for Industrial and Applied Mathematics, Chicago IL, 2016

Research Interests

  1. ML/DL for user behavioral data modeling

  2. ML/DL for marketing cohort targeting and product recommendation

  3. ML/DL for online experimentation based on user data

  4. Knowledge graphs and graph inference for SaaS solutions

  5. Auto-ML for scalable SaaS model serving

  6. ML for scalable and automatic customer success monitoring and business growth