Research & MISC

Research Interests

  1. ML/DL for user behavioral data modeling

  2. ML/DL for marketing cohort targeting and product recommendation

  3. ML/DL for online experimentation based on user data

  4. Knowledge graphs and graph inference for SaaS solutions

  5. Auto-ML for scalable SaaS model serving

  6. ML for scalable and automatic customer success monitoring and business growth


Tutorial Materials

  1. KDD '21 Tutorial on "Mining Heterogeneous Healthcare Data"

  2. IJCAI '20 Tutorial on "Machine Learning for Drug Discovery"

  3. KDD '19 Tutorial on "Data Mining for Drug Discovery and Development"

  4. KDD '18 Tutorial on "Deep Learning for Healthcare"


Open-source Software

  1. [DeepPurpose] A deep-learning-based molecular modeling and prediction toolkit, built on PyTorch, for drug-target interaction prediction, compound property prediction, protein-protein interaction prediction, and protein function prediction. It needs only a few lines of code, so biomedical scientists can apply deep learning to drug discovery without extensive engineering effort.

  2. [Therapeutics Data Commons (TDC)] (https://tdcommons.ai/) An open-science platform with AI/ML-ready datasets and learning tasks for therapeutics, spanning the discovery and development of safe and effective medicines. It includes 22 therapeutic tasks and 66 ML-ready benchmark datasets covering over 15 million data points. TDC also provides an ecosystem of tools, libraries, leaderboards, and community resources, including data functions, strategies for systematic model evaluation, meaningful data splits, data processors, and molecule generation oracles. All resources are integrated and accessible via an open Python library.

  3. [PyHealth] A Python machine learning library for healthcare AI applications. It integrates and streamlines the development and evaluation of predictive health models, simplifying and expediting that process for health data scientists.

Patents

  1. Cao Xiao, Zach Shahn, Daby Sow, Mo Ghalwash, Sanjoy Dey, "Learning Interpretable Strategies in the Presence of Existing Domain Knowledge", US Patent US20210202055A1, IBM, 2021.

  2. Zhongshu Gu, Dimitrios Pendarakis, Ian Molloy, Heqing Huang, Tengfei Ma, Cao Xiao, "Enhancing Data Privacy in Remote Deep Learning Services", US Patent US20200082272, IBM, 2020.

  3. Lingfei Wu, Jinfeng Yi, Cao Xiao, Michael Witbrock, "Method and System for An Unsupervised Time-Series Feature Learning Using Random Features", US Patent US20180330201A1, IBM, 2017.

  4. David Freeman, Ted Hwa, Cao Xiao, "Fake Account Identification", US Patent US10333964B1, LinkedIn Corporation, 2015.

Service

[Program Committee] NeurIPS (2021, 2020, 2019, 2018), ICLR (2021, 2020, 2019), ICML (2022, 2021, 2020), AAAI (2021, 2020, 2019, 2018), IJCAI (2022, 2021, 2020, 2019), KDD '17, SDM '19

[Reviewer] Nature Communications, IEEE Transactions on Knowledge and Data Engineering, JAMIA, JBI, Bioinformatics, Scientific Reports, Journal of Combinatorial Optimization, IEEE Transactions on Systems, Man, and Cybernetics: Systems, ACM Transactions on the Web, IEEE Transactions on Big Data, Drug Safety, IIE Transactions on Healthcare Systems Engineering, Brain Informatics, etc.