Machine Learning in Chemistry 101

This graduate-level course gives an overview of the machine learning (ML) techniques that are useful for solving problems in Chemistry, particularly for the computational understanding and prediction of materials and molecules at the atomistic level. In the first part of the course, after a quick refresher on the basic concepts of probability and statistics, students will learn about basic and advanced ML methods, including supervised and unsupervised learning. In the second part, Chemistry will be connected to the mathematical tools of ML, and the construction of loss functions, representations, descriptors, and kernels will be introduced. In the last part, experts who actively use ML methods to solve research problems in Chemistry and Materials will be invited to give real-world examples of how ML methods have transformed the way they do research.

Regression

  • Linear models, linear regression

  • Polynomial regression

  • Regularization

  • Kernel regression
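The four regression topics can be tied together in a few lines of code. The sketch below is a minimal illustration, assuming NumPy, ridge (L2) regularization, and a Gaussian kernel; the helper names (`ridge_fit`, `polynomial_features`, `gaussian_kernel`, `krr_fit`, `krr_predict`) and the noisy sine-wave data are hypothetical choices for illustration, not course material. Polynomial regression is written as linear regression on polynomial features, and kernel regression is written as kernel ridge regression.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Linear ridge regression (no intercept): w = (X^T X + lam I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def polynomial_features(x, degree):
    """Map a 1-D input to the columns [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

def gaussian_kernel(A, B, sigma):
    """K[i, j] = exp(-|a_i - b_j|^2 / (2 sigma^2))."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma**2))

def krr_fit(X, y, sigma, lam):
    """Kernel ridge regression weights: alpha = (K + lam I)^-1 y."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma):
    """Prediction is a kernel-weighted sum over the training points."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Synthetic 1-D data: y = sin(x) + small noise (illustration only)
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 40)
y = np.sin(x) + 0.05 * rng.standard_normal(x.size)

# Polynomial regression = (regularized) linear regression on polynomial features
Phi = polynomial_features(x, 5)
w_poly = ridge_fit(Phi, y, lam=1e-6)
y_poly = Phi @ w_poly

# Kernel regression on the same data (inputs as an (n_samples, 1) matrix)
X = x[:, None]
alpha = krr_fit(X, y, sigma=0.5, lam=1e-3)
y_krr = krr_predict(X, alpha, X, sigma=0.5)

print("poly RMSE:", np.sqrt(np.mean((y_poly - y) ** 2)))
print("KRR  RMSE:", np.sqrt(np.mean((y_krr - y) ** 2)))
```

The values of `lam` and `sigma` are arbitrary here; in practice both are hyperparameters that would be tuned, for example by cross-validation.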


Dimensionality reduction

  • Curse of dimensionality

  • Blessing of non-uniformity

  • Prerequisites in Statistics and Mathematics

  • Principal component analysis (PCA)

  • Non-linear dimensionality reduction methods
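A minimal NumPy sketch of three ideas from this list, assuming uniformly random points to illustrate the curse of dimensionality and synthetic data lying near a 2-D plane in 50 dimensions to illustrate non-uniformity and PCA. The `pca` helper and all data are hypothetical. PCA is computed here from the singular value decomposition of the centered data matrix, which is one standard route (diagonalizing the covariance matrix is another).

```python
import numpy as np

def pca(X, n_components):
    """Project centered data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance = S**2 / (len(X) - 1)
    return Xc @ Vt[:n_components].T, explained_variance

rng = np.random.default_rng(1)

# Curse of dimensionality: for uniform random points, the ratio between the
# nearest and the farthest distance from a reference point drifts towards 1
ratios = {}
for d in (2, 10, 1000):
    P = rng.random((500, d))
    dist = np.linalg.norm(P[1:] - P[0], axis=1)
    ratios[d] = dist.min() / dist.max()
print(ratios)

# Non-uniform data: 50-D points that actually live close to a 2-D plane
latent = rng.standard_normal((300, 2))
embedding = rng.standard_normal((2, 50))
X = latent @ embedding + 0.01 * rng.standard_normal((300, 50))

Z, var = pca(X, n_components=2)
print("fraction of variance in first 2 PCs:", var[:2].sum() / var.sum())
```

Nearly all of the variance ends up in the first two components, which is what makes a low-dimensional description possible even though the raw feature space is 50-dimensional.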


Sparsification and clustering

  • Singular value decomposition (SVD)

  • CUR decomposition

  • Farthest Point Sampling (FPS)


  • Gaussian mixture model (GMM)

  • K-means

  • DBSCAN
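A minimal NumPy sketch of farthest point sampling and k-means (Lloyd's algorithm) on three synthetic Gaussian blobs. The helper names are hypothetical, and seeding k-means with FPS-selected points is an assumption made here to get a deterministic example; random or k-means++ initialization are the more common choices. GMM and DBSCAN are not covered by this sketch.

```python
import numpy as np

def farthest_point_sampling(X, n_select, start=0):
    """Greedily pick the point farthest from all points selected so far."""
    selected = [start]
    # Distance from every point to its nearest already-selected point
    min_dist = np.linalg.norm(X - X[start], axis=1)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

def kmeans(X, init_centers, n_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its points; keep it if its cluster is empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic data: three well-separated blobs of 100 points each
rng = np.random.default_rng(2)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((100, 2)) for c in blob_centers])

idx = farthest_point_sampling(X, 3)   # one representative per blob
labels, centers = kmeans(X, X[idx])
print(sorted(np.bincount(labels).tolist()))  # [100, 100, 100]
```

FPS is used twice here: as a sparsification step (picking a small, well-spread subset of points) and as the starting guess for the clustering.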


Representing structural data

  • Smooth Overlap of Atomic Positions (SOAP)

  • Local vs. global features

  • Features to predictions
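Implementing SOAP itself is beyond a short example, so the sketch below uses a much simpler, hypothetical stand-in to show the local-to-global idea: each atom is described by a Gaussian-smeared histogram of its neighbour distances within a cutoff (a purely radial, species-blind density, unlike SOAP), and a global feature is built by summing or averaging the atomic vectors. The function names, grid, cutoff, smearing width, and toy coordinates are all assumptions. The final check illustrates why atom-centred, distance-based features are convenient: they do not change under rotation, translation, or reordering of atoms.

```python
import numpy as np

def local_features(positions, r_grid, cutoff=5.0, sigma=0.3):
    """Toy atom-centred descriptor (NOT SOAP): Gaussian-smeared radial
    distribution of neighbours within `cutoff`, evaluated on `r_grid`."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    feats = np.zeros((len(positions), len(r_grid)))
    for i in range(len(positions)):
        # Neighbour distances of atom i, excluding the atom itself
        r = dist[i][(dist[i] > 0) & (dist[i] < cutoff)]
        feats[i] = np.exp(-((r_grid[:, None] - r[None, :]) ** 2) / (2 * sigma**2)).sum(axis=1)
    return feats

def global_feature(local, mode="sum"):
    """Combine per-atom vectors into one structure-level vector."""
    if mode == "sum":
        return local.sum(axis=0)
    if mode == "mean":
        return local.mean(axis=0)
    raise ValueError("mode must be 'sum' or 'mean'")

r_grid = np.linspace(0.5, 5.0, 50)
chain = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [2.2, 0.0, 0.0]])  # toy coordinates

theta = np.pi / 3
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
moved = (chain @ rot.T + 3.0)[[2, 0, 1]]  # rotated, translated, atoms reordered

g1 = global_feature(local_features(chain, r_grid))
g2 = global_feature(local_features(moved, r_grid))
print(np.allclose(g1, g2))  # True
```

Summing the local features gives a size-extensive descriptor (suited to totals such as energies), whereas averaging gives a size-intensive one; either vector can serve as the input to the regression models listed above.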


For the workshop, we work with a set of small molecules selected from the QM7b data set, together with their formation energies.