MATH 5472. Computer-Age Statistical Inference and its applications
Synopsis
This course is designed for PhD students (year 1) in applied mathematics, statistics, and engineering who are interested in learning from data. It covers advanced topics in statistical machine learning, with emphasis on the integration of statistical models and algorithms for statistical inference. The course first makes connections among classical topics and then moves on to modern topics, including a statistical view of deep learning. Various applications will be discussed, such as computer vision, human genetics, and text mining.
Note: On the one hand, this course can be challenging for some non-math students, as some homework requires mathematical derivation. On the other hand, it can be challenging for some math students, as it requires coding. If you are still interested, then let's suffer to learn together! Of course, students are also welcome to audit.
Lecture information
Tuesday and Thursday, 03:00PM - 04:20PM, Room 5506 (lifts 25-26), Main Academic Building, HKUST.
Introduction. [Note]
Lecture 1. James-Stein Estimator and Empirical Bayes. [Lecture note] (a toy code sketch follows the reading list)
Ref: Stein's Unbiased Risk Estimate (SURE) [link]
Ref: Empirical Bayes and missing species (Efron's book "Large-Scale Inference. Empirical Bayes Methods for Estimation, Testing, and Prediction." Section 11.5)
Suggested Reading:
Empirical Bayes estimation of normal means, accounting for uncertainty in estimated standard errors. [link]
Newton-Stein Method: An Optimization Method for GLMs via Stein's Lemma [link]
How Biased is the Apparent Error of an Estimator Tuned by SURE? [link]
Tractable Evaluation of Stein’s Unbiased Risk Estimator with Convex Regularizers [link]
Flexible signal denoising via flexible empirical Bayes shrinkage [link]
Understanding Diffusion Models: A Unified Perspective [link] Very hot topic!!!
Empirical Bayes: Concepts and Methods [link] Very nice review!!
Empirical Bayes: Ideals and applications [talk by Prof. Efron]
Bayesian lens and Bayesian blinker [link]
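To make the shrinkage idea concrete, here is a toy numerical illustration of the James-Stein estimator (a minimal sketch in plain numpy; the normal-means setup and all names are illustrative, and this is not the course demo code):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 50
theta = rng.normal(0, 2, size=p)   # true means
z = theta + rng.normal(size=p)     # one observation per mean: z_i ~ N(theta_i, 1)

# James-Stein: shrink the MLE toward zero by a data-driven factor
shrink = 1 - (p - 2) / np.sum(z ** 2)
theta_js = shrink * z

print("MSE of MLE:        ", np.mean((z - theta) ** 2))
print("MSE of James-Stein:", np.mean((theta_js - theta) ** 2))
```

For p >= 3 the James-Stein estimate has smaller total squared error than the MLE; the positive-part variant additionally clips the shrinkage factor at zero.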
Lecture 2. Linear mixed models. [Lecture note] (a code sketch of the evidence approximation follows the reading list)
R package: Variance Component Model [link]
Ref: PRML, Chapters 2, 3, and 6. (Gaussian distribution, Bayesian linear model, and Gaussian process)
Suggested reading
Ridge Regularization: an Essential Concept in Data Science [link] Very nice review!
A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits. [link] A state-of-the-art algorithm for linear mixed models, with applications to large-scale real datasets (e.g., millions of features from millions of samples).
Bishop (2006) Chapter 3.5.2. A fixed point algorithm for evidence approximation (Empirical Bayes).
Bayesian Model Selection, the Marginal Likelihood, and Generalization. [link] The claims of this paper are quite debatable.
A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression. [link]
A Study of Error Variance Estimation in Lasso Regression [link]
Bayesian Lasso [link]
Pure Fourier series animation montage [link]
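As a concrete companion to the Bishop Chapter 3.5.2 item above, here is a minimal sketch of the fixed-point evidence approximation (empirical Bayes) for the Bayesian linear model; the simulated data, initializations, and variable names are illustrative assumptions, not course code:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)   # true noise precision = 4

alpha, beta = 1.0, 1.0                      # prior precision, noise precision
eig = np.linalg.eigvalsh(X.T @ X)
for _ in range(100):
    # posterior of the weights given the current (alpha, beta)
    S_inv = alpha * np.eye(d) + beta * X.T @ X
    m = beta * np.linalg.solve(S_inv, X.T @ y)
    # gamma = effective number of well-determined parameters
    lam = beta * eig
    gamma = np.sum(lam / (alpha + lam))
    # fixed-point updates that maximize the marginal likelihood
    alpha = gamma / (m @ m)
    beta = (n - gamma) / np.sum((y - X @ m) ** 2)

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```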
Lecture 3. Explicit and implicit regularization in supervised learning. [Lecture note] (a gradient boosting sketch follows the code examples)
Ref: Additive logistic regression: a statistical view of boosting [link]
Ref: Greedy function approximation: A gradient boosting machine. [link]
Ref: Boosting as a Regularized Path to a Maximum Margin Classifier [link]
Ref: Evidence Contrary to the Statistical View of Boosting (with discussion) [link]
Ref: A General Framework for Fast Stagewise Algorithms [link]
Ref: Statistical Modeling: The Two Cultures [link]
Suggested reading: Recent papers discussing Breiman's Two Cultures [Observational Studies]
Suggested reading: Gradient and Newton Boosting for Classification and Regression [link]
Suggested reading: Gaussian Process Boosting [link]
Suggested reading: Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success [link]
Code examples [GradientBoostingDemo][GradientBoosting_sklearn][GradientBoosting_rpart]
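In the same spirit as the demos above, a minimal from-scratch sketch of least-squares gradient boosting with shrinkage (scikit-learn regression trees on simulated data; an illustrative toy, not the linked course code):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=300)

nu, n_trees = 0.1, 100                  # learning rate, boosting rounds
F = np.full_like(y, y.mean())           # initial constant fit
trees = []
for _ in range(n_trees):
    r = y - F                           # negative gradient of squared loss = residuals
    tree = DecisionTreeRegressor(max_depth=2).fit(X, r)
    F += nu * tree.predict(X)           # shrunken stagewise update
    trees.append(tree)

def predict(X_new):
    return y.mean() + nu * sum(t.predict(X_new) for t in trees)
```

The small learning rate nu plays the regularization role discussed in the references above: many weakly shrunken stagewise steps trace out a regularized path, much like an explicit penalty.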
Lecture 4. The Expectation-Maximization (EM) algorithm and its extension. [lecture note 1][lecture note 2] (an EM sketch for Gaussian mixtures follows the reading list)
Ref: Liu C., Rubin D.B. and Wu Y.N. 1998. Parameter expansion to accelerate EM: the PX-EM algorithm.
Ref: Lewandowski A., Liu C. and Vander Wiel, S. 2010. Parameter Expansion and Efficient Inference.
Ref: Learning From Crowds [link]
Ref: Methods for correcting inference based on outcomes predicted by machine learning [link]
Suggested Reading:
The art of data augmentation. [link]
Using Redundant Parameterizations to Fit Hierarchical Models [link]
Majorization-minimization algorithms in signal processing, communications, and machine learning [link]
Auxiliary Deep Generative Models [link] This is related to auxiliary-variable MCMC.
Data Augmentation for Bayesian Deep Learning [link]
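A minimal sketch of EM for a two-component Gaussian mixture, the canonical example behind Lecture 4 (plain numpy/scipy; the simulated data and initial values are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-2, 1, 150), rng.normal(2, 1, 150)])

pi, mu, sd = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50):
    # E-step: responsibility of component 1 for each point
    p1 = pi * norm.pdf(x, mu[0], sd[0])
    p2 = (1 - pi) * norm.pdf(x, mu[1], sd[1])
    g = p1 / (p1 + p2)
    # M-step: weighted maximum-likelihood updates
    pi = g.mean()
    mu = np.array([np.sum(g * x) / g.sum(),
                   np.sum((1 - g) * x) / (1 - g).sum()])
    sd = np.sqrt(np.array([np.sum(g * (x - mu[0]) ** 2) / g.sum(),
                           np.sum((1 - g) * (x - mu[1]) ** 2) / (1 - g).sum()]))

print(pi, mu, sd)   # should approach 0.5, (-2, 2), (1, 1)
```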
Lecture 5. Variational Inference. [Lecture note] (a CAVI sketch follows the reading list)
Suggested reading
Variational Inference: A Review for Statisticians. [link]
Advances in Variational Inference. [arXiv link][PAMI version]
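A minimal sketch of coordinate-ascent variational inference (CAVI) for a univariate Gaussian with unknown mean and precision, with a factorized q(mu)q(tau) as in Bishop's Chapter 10 example; the prior hyperparameters and data below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(1.5, 2.0, size=100)    # unknown mean 1.5, precision 0.25
N, xbar = len(x), x.mean()

# prior: mu | tau ~ N(mu0, (lam0 * tau)^-1), tau ~ Gamma(a0, b0)
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

E_tau = 1.0                            # initialize E_q[tau]
for _ in range(50):
    # update q(mu) = N(mu_N, 1 / lam_N)
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    # update q(tau) = Gamma(a_N, b_N) using expectations under q(mu)
    a_N = a0 + (N + 1) / 2
    b_N = b0 + 0.5 * (np.sum((x - mu_N) ** 2) + N / lam_N
                      + lam0 * ((mu_N - mu0) ** 2 + 1 / lam_N))
    E_tau = a_N / b_N

print(f"E_q[mu] = {mu_N:.3f}, E_q[tau] = {E_tau:.3f}")
```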
Lecture 6. False discovery rate. [Lecture note]
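A minimal sketch of the Benjamini-Hochberg step-up procedure for FDR control at level q (plain numpy; the toy p-value mixture is an illustrative assumption, not course code):

```python
import numpy as np

def bh_reject(pvals, q=0.1):
    """Reject the k smallest p-values, where k is the largest i
    with p_(i) <= q * i / m (Benjamini-Hochberg step-up rule)."""
    m = len(pvals)
    order = np.argsort(pvals)
    below = np.nonzero(np.sort(pvals) <= q * np.arange(1, m + 1) / m)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size > 0:
        reject[order[: below[-1] + 1]] = True
    return reject

# toy mixture: 90 true nulls (uniform p-values) and 10 signals
rng = np.random.default_rng(5)
p = np.concatenate([rng.uniform(size=90), rng.uniform(0, 1e-3, size=10)])
print(bh_reject(p, q=0.1).sum(), "rejections")
```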
Lecture 7. Matrix factorization. [Lecture note] (a matrix completion sketch follows the reading list)
Suggested reading
Principal Component Analysis. A review article on PCA, which appeared in Nature Reviews. [link]
Low-Rank Modeling and Its Applications in Image Analysis. [The Matlab code to produce the results presented in this paper]
Empirical Bayes Matrix Factorization [link]
Sparse Bayesian methods for low-rank matrix estimation. [link]
Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares.
Genes mirror geography within Europe [link]
Joint modelling of confounding factors and prominent genetic regulators provides increased accuracy in genetical genomics studies [link]
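Related to the matrix completion item above, a minimal soft-impute-style sketch: iterate an SVD with soft-thresholded singular values on the matrix whose missing entries are filled in by the current estimate. The simulated low-rank data and penalty level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, r = 100, 80, 3
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, m))  # true low-rank matrix
mask = rng.uniform(size=(n, m)) < 0.5                  # observed entries

Z = np.zeros((n, m))
lam = 2.0                                              # soft-threshold level
for _ in range(100):
    filled = np.where(mask, M, Z)         # observed values + current guesses
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    Z = (U * np.maximum(s - lam, 0.0)) @ Vt   # shrink singular values

err = np.linalg.norm((Z - M)[~mask]) / np.linalg.norm(M[~mask])
print("relative error on missing entries:", err)
```

Soft-thresholding the singular values is the proximal step for a nuclear-norm penalty, which is what makes this relaxation of the rank constraint convex.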
Lecture 8. Latent Dirichlet Allocation and the PSD (Pritchard-Stephens-Donnelly) model. [lecture note] (an LDA usage sketch follows the references)
Ref: Latent Dirichlet Allocation [link]
Ref: Finding scientific topics. PNAS [link] (Gibbs sampling for topic models)
Ref: Non-negative matrix factorization algorithms greatly improve topic model fits. arXiv:2105.13440.
Ref: Inference of population structure using multilocus genotype data. Genetics. 155.
Ref: Section 13.5 modeling population admixture. Computer age statistical inference by Efron and Hastie. 2016.
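A minimal usage sketch of variational LDA via scikit-learn's LatentDirichletAllocation (the toy corpus below is made up for illustration; the references above use Gibbs sampling and NMF-based fits instead):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["gene expression genotype population",
        "population structure admixture genotype",
        "topic model document word inference",
        "word document topic corpus inference"]

vec = CountVectorizer()
X = vec.fit_transform(docs)                       # document-term count matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

theta = lda.transform(X)                          # per-document topic proportions
for k, phi_k in enumerate(lda.components_):       # per-topic word weights
    top = phi_k.argsort()[::-1][:4]
    print(f"topic {k}:", [vec.get_feature_names_out()[i] for i in top])
```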
Lecture 9. Generative adversarial networks. [lecture note]
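A minimal GAN sketch on a one-dimensional toy distribution, assuming PyTorch; the architectures, optimizer settings, and the non-saturating generator loss are standard illustrative choices, not the course implementation:

```python
import torch
import torch.nn as nn

# toy GAN: the generator learns to mimic samples from N(3, 1)
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # logits
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

for step in range(2000):
    real = 3 + torch.randn(64, 1)
    fake = G(torch.randn(64, 8))
    # discriminator step: real -> 1, fake -> 0 (generator frozen via detach)
    loss_d = bce(D(real), ones) + bce(D(fake.detach()), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # generator step: non-saturating loss, try to fool the discriminator
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())   # should approach 3
```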
Lecture 10. Variational inference in deep learning. [lecture note 1][lecture note 2]
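A minimal variational autoencoder sketch with the reparameterization trick and a Gaussian ELBO, assuming PyTorch; the architecture and toy data are illustrative assumptions, not the course implementation:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, d_in=20, d_z=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * d_z))
        self.dec = nn.Sequential(nn.Linear(d_z, 64), nn.ReLU(),
                                 nn.Linear(64, d_in))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)   # q(z|x) = N(mu, diag(e^logvar))
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        return self.dec(z), mu, logvar

def neg_elbo(x, x_hat, mu, logvar):
    recon = ((x - x_hat) ** 2).sum(dim=-1)          # Gaussian likelihood (up to const.)
    kl = 0.5 * (mu ** 2 + logvar.exp() - 1 - logvar).sum(dim=-1)  # KL(q || N(0, I))
    return (recon + kl).mean()

model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, 20)                            # toy data
for step in range(500):
    x_hat, mu, logvar = model(x)
    loss = neg_elbo(x, x_hat, mu, logvar)
    opt.zero_grad(); loss.backward(); opt.step()
```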
Case study [ppt]
Episode [You are very much ON TIME]
Reference books
Bishop C. (2006) Pattern Recognition and Machine Learning [link]
Efron B. and Hastie T. (2016) Computer-Age Statistical Inference [link]
Kevin Patrick Murphy. (2022) Probabilistic Machine Learning: An Introduction [link]
Kevin Patrick Murphy. (2022) Probabilistic Machine Learning: Advanced topics [link]
John Winn, Christopher M. Bishop, Thomas Diethe, John Guiver and Yordan Zaykov. Model-based machine learning [link for early access]
Grading policy: Assignments (60%) + Project (40%)
Assignments (60%)
Assignment 1 [pdf]
Assignment 2 [pdf]
Assignment 3 [pdf]
Assignment 4 [pdf]
Project (40%)
In this project, you will choose one paper from the "Project list" (to be posted). The purpose of the project is to learn to critically read and discuss papers in statistics and machine learning. These papers can be new and potentially influential works, or they can be older important works that you may not have seen in other classes. Please inform the instructor once you decide which paper to work on. No more than two students may work on the same paper, and topics are assigned on a first-come, first-served basis.
Requirement: each student needs to submit a report on his/her chosen paper. Rough format: an overview of the paper; simulations or examples illustrating its key results (based on your own implementation); and a summary of the main points. Your report should also include a GitHub link to your code so that your results can be easily reproduced. Aim for 6-10 pages. Click here for the LaTeX template. You can check the scribed notes of the journal club at CMU and use them as an example when preparing your own report.
Remark: There will be a grade discount if you use an existing implementation to reproduce the key results. If you do, you must state so explicitly in your report. Under the university's regulations on academic integrity, failing to make such a statement will incur a substantial penalty.
Deadline: Dec. 15, 2023
Please choose your project here [link]
Project list:
1. Weighted Low Rank Matrix Approximation and Acceleration.
2. Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares
3. Flexible signal denoising via flexible empirical Bayes shrinkage. Journal of Machine Learning Research 22(93): 1-28.
4. Empirical Bayes estimation of normal means, accounting for uncertainty in estimated standard errors. arXiv:1901.10679.
5. False discovery rates: a new deal. Biostatistics 18(2): 275-294.
6. Finding scientific topics. PNAS [link] (Gibbs sampling for topic models)
7. Non-negative matrix factorization algorithms greatly improve topic model fits. arXiv:2105.13440.
8. Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse [link]
9. varbvs: fast variable selection for large-scale regression. arXiv:1709.06597.
10. Empirical Bayes matrix factorization. Journal of Machine Learning Research 22(120): 1-40.
11. Variational Inference for Latent Variables and Uncertain Inputs in Gaussian Processes. Journal of Machine Learning Research [link]
12. Maximum Likelihood for Gaussian Process Classification and Generalized Linear Mixed Models under Case-Control Sampling. Journal of Machine Learning Research [link]
13. The Implicit Regularization of Stochastic Gradient Flow for Least Squares. International Conference on Machine Learning, 2020.
14. Generalizing RNA velocity to transient cell states through dynamical modelling. [link]
15. SPICEMIX enables integrative single-cell spatial modeling of cell identity [link]
16. ebnm: an R package for solving the empirical Bayes normal means problem using a variety of prior families. arXiv:2110.00152.
17. Latent Dirichlet Allocation [link]
18. Gaussian Process Boosting [link]
19. Diffusion Posterior Sampling For General Noisy Inverse Problems. [link]
20. Construction of a 3D whole organism spatial atlas by joint modelling of multiple slices with deep neural networks [link]
21. XMAP: Cross-population fine-mapping by leveraging genetic diversity and accounting for confounding bias [link]