Machine Learning in Chemistry 101
This graduate-level course gives an overview of the machine learning (ML) techniques that are useful for solving problems in Chemistry, particularly for the computational understanding and prediction of materials and molecules at the atomistic level. In the first part of the course, after a quick refresher on the basic concepts of Probability and Statistics, students will learn about basic and advanced ML methods, including supervised and unsupervised learning. The second part connects Chemistry to the mathematical tools of ML and introduces the construction of loss functions, representations, descriptors and kernels. In the last part, experts who actively use ML methods to solve research problems in Chemistry and Materials will be invited to give real-world examples of how ML methods have transformed the way they perform research.
Regression
Linear models, linear regression
Polynomial regression
Regularization
Kernel regression
Dimensionality reduction
Curse of dimensionality
Blessing of non-uniformity
Prerequisites in Statistics and Mathematics
Principal component analysis (PCA)
Non-linear dimensionality reduction methods
Sparsification and clustering
Singular value decomposition (SVD)
CUR decomposition
Farthest Point Sampling (FPS)
Gaussian mixture model (GMM)
K-means
DBSCAN
Representing structural data
Smooth Overlap of Atomic Positions (SOAP)
Local vs. global features
Features to predictions
For the workshop, we work with a set of small molecules selected from the QM7b data set, together with their formation energies.
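
As a rough illustration of how several topics in the outline fit together, the sketch below selects a small training set with Farthest Point Sampling (FPS) and then fits a Gaussian kernel ridge regression to go from features to predictions. It is only a minimal sketch under stated assumptions, not the workshop material itself: the data are a synthetic one-dimensional stand-in (not QM7b descriptors or formation energies), the function names (farthest_point_sampling, gaussian_kernel, krr_fit, krr_predict) are hypothetical, and the hyperparameters sigma and lam are arbitrary choices made for illustration.

```python
import numpy as np


def farthest_point_sampling(X, n_select, start=0):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen."""
    X = np.asarray(X, dtype=float)
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    min_dist = np.linalg.norm(X - X[start], axis=1)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected


def gaussian_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def krr_fit(X_train, y_train, sigma, lam):
    """Solve (K + lam * I) alpha = y for the kernel ridge regression weights."""
    K = gaussian_kernel(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)


def krr_predict(X_new, X_train, alpha, sigma):
    """Predict targets as a kernel-weighted sum over the training points."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha


# Synthetic stand-in for (descriptor, target) pairs; this is NOT QM7b data.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0])

# Assumed, illustrative settings: 20 FPS-selected points, sigma=1.0, lam=1e-6.
train_idx = farthest_point_sampling(X, n_select=20)
alpha = krr_fit(X[train_idx], y[train_idx], sigma=1.0, lam=1e-6)
y_pred = krr_predict(X, X[train_idx], alpha, sigma=1.0)
rmse = float(np.sqrt(np.mean((y_pred - y) ** 2)))
print(f"RMSE on all 200 points: {rmse:.4f}")
```

With real molecular data, the rows of X would be replaced by a structural representation such as SOAP, and sigma and lam would normally be chosen by cross-validation rather than fixed by hand.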