Synopsis
This course is designed for first-year PhD students in applied mathematics, statistics, and engineering who are interested in learning from data. It covers advanced topics in statistical machine learning, with emphasis on integrating statistical models and algorithms for statistical inference. The course first makes connections among classical topics and then moves on to modern topics, including a statistical view of deep learning. Various applications will be discussed, such as computer vision, human genetics, and text mining.
Note: On the one hand, this course can be challenging for some non-math students, as some homework requires mathematical derivation. On the other hand, it can be challenging for some math students, as it requires coding. If you are still interested, then let's suffer to learn together! Of course, students are also welcome to audit.
Lecture information
Fall 2024, Tuesday and Thursday, 01:30PM - 02:50PM, Rm 5620, Lift 31-32, Main Academic Building, HKUST.
Introduction. [Note]
Lecture 1. James-Stein Estimator and Empirical Bayes. [Lecture note] (A simulation sketch follows the reading list below.)
Ref: Stein's Unbiased Risk Estimate (SURE) [link]
Ref: Empirical Bayes and missing species (Efron's book "Large-Scale Inference. Empirical Bayes Methods for Estimation, Testing, and Prediction." Section 11.5)
Suggested reading:
Empirical Bayes estimation of normal means, accounting for uncertainty in estimated standard errors. [link]
Newton-Stein Method: An Optimization Method for GLMs via Stein's Lemma [link]
How Biased is the Apparent Error of an Estimator Tuned by SURE? [link]
Confidence Intervals for Nonparametric Empirical Bayes Analysis [link]
Bayesian Learning via Stochastic Gradient Langevin Dynamics [link]
Tractable Evaluation of Stein’s Unbiased Risk Estimator with Convex Regularizers [link]
Understanding Diffusion Models: A Unified Perspective [link] Very hot topic!!!
Diffusion Posterior Sampling for General Noisy Inverse Problems (Tweedie's formula in AI) [link]
Empirical Bayes: Concepts and Methods [link] Very nice review!!
Empirical Bayes: Ideas and Applications [link] Very nice talk!
Bayesian lens and Bayesian blinker [link]
The uneasy relationship between deep learning and (classical) statistics [link] Your role: free-style reading but critical thinking!
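As a concrete companion to Lecture 1, here is a minimal Monte Carlo sketch in Python (plain numpy; the dimension p = 50 and the true mean vector are illustrative choices, not taken from the lecture note). It shows that the James-Stein estimator, which shrinks the MLE toward zero by the data-driven factor 1 - (p-2)/||X||^2, achieves a smaller total squared error than the MLE whenever p > 2.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_rep = 50, 2000                     # dimension and Monte Carlo replicates
theta = rng.normal(0.0, 1.0, size=p)    # an arbitrary fixed vector of true means

mse_mle = mse_js = 0.0
for _ in range(n_rep):
    x = theta + rng.normal(size=p)           # one observation X ~ N(theta, I_p)
    shrink = 1.0 - (p - 2) / np.sum(x**2)    # James-Stein shrinkage factor
    mse_mle += np.sum((x - theta) ** 2)
    mse_js += np.sum((shrink * x - theta) ** 2)

print("average risk of MLE:        ", mse_mle / n_rep)   # close to p = 50
print("average risk of James-Stein:", mse_js / n_rep)    # strictly smaller
```

One empirical Bayes reading: under a N(0, tau^2) prior on each mean, (p-2)/||X||^2 is an unbiased estimate of 1/(1+tau^2), so James-Stein mimics the Bayes rule without knowing tau^2.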
Lecture 2. Linear mixed models. [Lecture note] (A variance-component sketch follows the reading list below.)
R package: Variance Component Model [link]
Ref: PRML, Chapters 2, 3, and 6. (Gaussian distribution, Bayesian linear model, and Gaussian process)
Suggested reading
Ridge Regularization: an Essential Concept in Data Science [link] Very nice review!
A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits. [link] The SOTA algorithm for linear mixed models with its application to large-scale real datasets (e.g., millions of features from millions of samples).
Bishop (2006) Chapter 3.5.2. A fixed point algorithm for evidence approximation (Empirical Bayes).
Bayesian Model Selection, the Marginal Likelihood, and Generalization. [link] This paper is quite debatable.
A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression. [link]
Bayesian Lasso [link]
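To make the variance-component machinery of Lecture 2 concrete, here is a minimal sketch (assuming numpy and scipy; the model sizes, variable names, and true variances are illustrative) that estimates sigma_u^2 and sigma_e^2 by directly maximizing the Gaussian log marginal likelihood of y ~ N(0, sigma_u^2 ZZ^T + sigma_e^2 I), i.e., evidence maximization / empirical Bayes in the spirit of Bishop 3.5.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, m = 200, 500                             # samples, random effects
Z = rng.normal(size=(n, m)) / np.sqrt(m)    # standardized random-effect design
u = rng.normal(0.0, np.sqrt(0.5), size=m)   # true sigma_u^2 = 0.5
y = Z @ u + rng.normal(0.0, np.sqrt(0.3), size=n)  # true sigma_e^2 = 0.3

K = Z @ Z.T                                 # n x n kernel: y ~ N(0, s2u*K + s2e*I)

def neg_log_marginal(log_s2):
    s2u, s2e = np.exp(log_s2)               # optimize on log scale for positivity
    V = s2u * K + s2e * np.eye(n)
    logdet = np.linalg.slogdet(V)[1]
    return 0.5 * (logdet + y @ np.linalg.solve(V, y))

res = minimize(neg_log_marginal, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
print("estimated (sigma_u^2, sigma_e^2):", np.exp(res.x))
```

For large n, one would exploit an eigendecomposition of ZZ^T (or the MoM and stochastic tricks in the suggested reading) rather than factorizing V at every step.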
Lecture 3. Explicit and implicit regularization in supervised learning. [Lecture note] (A from-scratch boosting sketch follows the code examples below.)
Ref: Additive logistic regression: a statistical view of boosting [link]
Ref: Greedy function approximation: A gradient boosting machine. [link]
Ref: Boosting as a Regularized Path to a Maximum Margin Classifier [link]
Ref: Evidence Contrary to the Statistical View of Boosting (with discussion) [link]
Ref: A General Framework for Fast Stagewise Algorithms [link]
Ref: Statistical Modeling: The Two Cultures [link]
Suggested reading: Recent papers discussing Breiman's Two Cultures [observation study]
Suggested reading: Gradient and Newton Boosting for Classification and Regression [link]
Suggested reading: Gaussian Process Boosting [link]
Suggested reading: Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success [link]
Suggested reading: Why do tree-based models still outperform deep learning on tabular data? [link]
Suggested reading: Tabular Data: Deep Learning is Not All You Need [link]
Interview with Jerome H. Friedman about Gradient Boosting [link] Very interesting interview with many stories.
Code examples [GradientBoostingDemo][GradientBoosting_sklearn][GradientBoosting_rpart]
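In the same spirit as the demos above, here is a compact from-scratch sketch of L2 gradient boosting (assuming scikit-learn; the tree depth, learning rate, and toy data are illustrative): each shallow tree is fitted to the current residuals, which are the negative gradient of the squared loss, and added to the ensemble with a small step size nu that acts as implicit regularization.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=300)

nu, n_trees = 0.1, 100                     # learning rate and boosting rounds
F = np.full_like(y, y.mean())              # initialize with the constant fit
trees = []
for _ in range(n_trees):
    r = y - F                              # residual = negative gradient of L2 loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, r)
    F += nu * tree.predict(X)              # small step along the fitted gradient
    trees.append(tree)

print("training MSE:", np.mean((y - F) ** 2))
```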
Lecture 4. The Expectation-Maximization (EM) algorithm and its extensions. [Lecture note 1][Lecture note 2] (A Gaussian-mixture EM sketch follows the reading list below.)
Ref: Liu C., Rubin D.B., and Wu Y.N. (1998). Parameter expansion to accelerate EM: the PX-EM algorithm.
Ref: Lewandowski A., Liu C., and Vander Wiel S. (2010). Parameter Expansion and Efficient Inference.
Ref: Learning From Crowds [link]
Ref: Methods for correcting inference based on outcomes predicted by machine learning [link]
Ref: Prediction-powered inference, published in Science [link]
Ref: Cross-prediction-powered inference [link]
Suggested reading: Weakly supervised clustering: Learning fine-grained signals from coarse labels [link]
Suggested reading: Majorization-minimization algorithms in signal processing, communications, and machine learning [link]
Suggested reading: The art of data augmentation. [link]
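A minimal EM sketch for Lecture 4 (assuming numpy and scipy; the two-component setup with known unit variances is an illustrative choice): the E-step computes posterior responsibilities and the M-step updates the mixing weight and both means in closed form; each iteration provably does not decrease the observed-data log-likelihood.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
# Two-component 1D Gaussian mixture with unknown means, known variance 1
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 1, 700)])

pi, mu = 0.5, np.array([-1.0, 1.0])        # initial guesses
for _ in range(100):
    # E-step: posterior responsibility of component 1 for each point
    p1 = pi * norm.pdf(x, mu[0], 1.0)
    p2 = (1 - pi) * norm.pdf(x, mu[1], 1.0)
    g = p1 / (p1 + p2)
    # M-step: closed-form updates for the mixing weight and the two means
    pi = g.mean()
    mu = np.array([np.sum(g * x) / np.sum(g),
                   np.sum((1 - g) * x) / np.sum(1 - g)])

print("pi:", pi, "mu:", mu)                # should approach (0.3, [-2, 2])
```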
Lecture 5. Variational Inference. [Lecture note 1][Lecture note 2] (A CAVI sketch follows the reading list below.)
Suggested reading
Variational Inference: A Review for Statisticians. [link]
Advances in Variational Inference. [Arxiv link][PAMI version]
Covariance, robustness and Variational Bayes. [link]
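As a concrete warm-up for Lecture 5, below is a minimal coordinate-ascent variational inference (CAVI) sketch for the classic univariate Gaussian example of PRML Section 10.1 (plain numpy; the priors and data are illustrative). With the mean-field factorization q(mu, tau) = q(mu) q(tau), the two factors have closed-form Normal and Gamma updates, which are iterated until their moments stabilize.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(1.5, 2.0, size=500)          # data from N(mu=1.5, sd=2)
N, xbar = len(x), x.mean()

# Priors: mu | tau ~ N(mu0, (lam0*tau)^-1), tau ~ Gamma(a0, b0)
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

E_tau = 1.0                                 # initialize E_q[tau]
for _ in range(50):
    # Update q(mu) = N(mu_N, 1/lam_N)
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    # Update q(tau) = Gamma(a_N, b_N); expectations are over q(mu)
    a_N = a0 + (N + 1) / 2.0
    b_N = b0 + 0.5 * (lam0 * ((mu_N - mu0) ** 2 + 1.0 / lam_N)
                      + np.sum((x - mu_N) ** 2) + N / lam_N)
    E_tau = a_N / b_N

print("posterior mean of mu: ", mu_N)
print("posterior mean of tau:", E_tau, "(true value 1/4 = 0.25)")
```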
Lecture 6. False discovery rate. [Lecture note]
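As a starting point for this lecture, here is a minimal sketch of the Benjamini-Hochberg step-up procedure (assuming numpy and scipy; the simulation settings are illustrative): sort the p-values and reject the k smallest, where k is the largest index with p_(k) <= q k/m.

```python
import numpy as np
from scipy.stats import norm

def benjamini_hochberg(pvals, q=0.1):
    """Boolean mask of rejections for the BH step-up procedure at level q."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m   # compare to the BH line
    k = np.nonzero(below)[0].max() + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True                          # reject the k smallest
    return reject

# Illustrative mixture: 900 nulls N(0,1) and 100 signals N(3,1)
rng = np.random.default_rng(5)
z = np.concatenate([rng.normal(0, 1, 900), rng.normal(3, 1, 100)])
pvals = norm.sf(z)                                    # one-sided p-values
print("number of rejections at q = 0.1:", benjamini_hochberg(pvals).sum())
```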
Lecture 7. Matrix factorization. [Lecture note] (A truncated-SVD sketch follows the reading list below.)
Suggested reading
Principal Component Analysis. A review article on PCA that appeared in Nature Reviews. [link]
Low-Rank Modeling and Its Applications in Image Analysis. [The Matlab code to produce the results presented in this paper]
Empirical Bayes Matrix Factorization [link]
Sparse Bayesian methods for low-rank matrix estimation. [link]
Genes mirror geography within Europe [link]
Joint modelling of confounding factors and prominent genetic regulators provides increased accuracy in genetical genomics studies [link]
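A minimal numerical sketch for Lecture 7 (plain numpy; the matrix sizes and noise level are illustrative): the truncated SVD gives the best rank-k approximation in Frobenius norm (Eckart-Young), and a visible gap in the singular values reveals the underlying rank.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, k = 200, 100, 3
L, F = rng.normal(size=(n, k)), rng.normal(size=(p, k))
Y = L @ F.T + 0.5 * rng.normal(size=(n, p))   # noisy rank-3 matrix

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
Y_hat = (U[:, :k] * s[:k]) @ Vt[:k]           # best rank-k fit (Eckart-Young)

print("leading singular values:", np.round(s[:6], 1))  # gap after the 3rd
print("relative error of rank-3 fit:",
      np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y))
```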
Lecture 8. Latent Dirichlet Allocation and the PSD (Pritchard-Stephens-Donnelly) model. (A topic-model sketch follows the references below.)
Ref: Latent Dirichlet Allocation [link]
Ref: Finding scientific topics. PNAS [link] (Gibbs sampling for topic models)
Ref: Non-negative matrix factorization algorithms greatly improve topic model fits. arXiv:2105.13440.
Ref: Inference of population structure using multilocus genotype data. Genetics. 155.
Ref: Section 13.5 (Modeling Population Admixture) in Computer Age Statistical Inference by Efron and Hastie (2016).
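A quick way to experiment with the topic model is scikit-learn's LatentDirichletAllocation, which fits LDA by (online) variational Bayes rather than the Gibbs sampler of the PNAS paper above; the six-document toy corpus below is purely illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "gene expression regulation genome",
    "genome variant gene mutation",
    "neural network deep learning model",
    "model training deep network optimization",
    "bayesian inference posterior prior model",
    "posterior sampling bayesian prior",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)                       # document-term count matrix
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)

vocab = vec.get_feature_names_out()
for t, comp in enumerate(lda.components_):        # word pseudo-counts per topic
    top = comp.argsort()[::-1][:4]
    print(f"topic {t}:", [vocab[i] for i in top])
```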
Lecture 9. Generative adversarial networks. [Lecture note]
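A minimal GAN sketch for this lecture (assuming PyTorch; the 1D target N(2, 0.5^2) and the tiny networks are illustrative): the discriminator and generator are updated alternately with binary cross-entropy objectives, using the standard non-saturating loss for the generator.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
real_sampler = lambda n: torch.randn(n, 1) * 0.5 + 2.0   # target: N(2, 0.25)

G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(3000):
    # Discriminator step: push real samples toward 1, fake samples toward 0
    x_real, z = real_sampler(64), torch.randn(64, 1)
    x_fake = G(z).detach()
    loss_d = bce(D(x_real), torch.ones(64, 1)) + bce(D(x_fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator step: make D label fresh fakes as real (non-saturating loss)
    z = torch.randn(64, 1)
    loss_g = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

samples = G(torch.randn(1000, 1))
print("generated mean/std:", samples.mean().item(), samples.std().item())
```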
Lecture 10. Variational inference in deep learning. [Lecture note 1][Lecture note 2] (A VAE sketch follows the suggested reading below.)
Ref: Understanding Diffusion Models: A Unified Perspective [Paper] Very hot topic!!!
Suggested reading:
Case study: Probabilistic models + deep learning methods [ppt]
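To make Lecture 10 concrete, here is a minimal variational autoencoder sketch (assuming PyTorch; the 2D toy data and tiny architecture are illustrative): the encoder outputs a Gaussian q(z|x), sampling uses the reparameterization trick, and the loss is the negative ELBO with a closed-form KL to the standard normal prior.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2000, 2) @ torch.tensor([[2.0, 0.0], [0.6, 0.3]])  # toy 2D data

enc = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))  # -> (mu, log var)
dec = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 2))  # z -> x_hat
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

for step in range(2000):
    mu, logvar = enc(x).chunk(2, dim=1)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization
    x_hat = dec(z)
    recon = 0.5 * ((x - x_hat) ** 2).sum(dim=1).mean()        # unit-variance Gaussian
    kl = 0.5 * (mu**2 + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()
    loss = recon + kl                                         # negative ELBO (+ const)
    opt.zero_grad(); loss.backward(); opt.step()

print("negative ELBO (up to an additive constant):", loss.item())
```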
Episode: Ten Statistical Ideas that Changed the World [link]
Reference books
Bishop C. (2006) Pattern Recognition and Machine Learning [link]
Hastie T., Tibshirani R., and Friedman J. The Elements of Statistical Learning. [link]
Efron B. and Hastie. T. (2016) Computer-Age Statistical Inference [link]
Kevin Patrick Murphy. (2022) Probabilistic Machine Learning: An Introduction [link]
Kevin Patrick Murphy. (2023) Probabilistic Machine Learning: Advanced Topics [link]
John Winn, Christopher M. Bishop, Thomas Diethe, John Guiver and Yordan Zaykov. Model-based machine learning [link for early access]
Simon J.D. Prince (2023) Understanding Deep Learning. [link]
Bishop C. and Bishop H. (2024) Deep Learning: Foundations and Concepts. [link]
Grading policy: Assignment (60%) + Project (40%)
Assignment (60%): posted on Canvas
Assignment 1 [pdf]
Assignment 2 [pdf]
Assignment 3 [pdf]
Assignment 4 [pdf]
Project (40%)
In this project, you choose one paper from the "Project list". The purpose of this project is to learn to critically read and discuss papers in statistics and machine learning. These papers can be new and potentially influential works, or they can be older important works that you may not have seen in other classes. Please inform the instructor once you decide to work on a paper. No more than two students can work on the same paper; papers are assigned on a "first come, first served" basis. You can also propose a high-quality paper to the instructor by Nov. 10, 2024, but you need to make a 5-minute presentation in class to justify your choice (e.g., it is related to this course, or it is an interesting topic with some challenging issues). If your proposal is approved, then you can work on it.
Requirement: each student must submit a report on his/her chosen paper.
Rough format:
Overview of the paper (why is it interesting? what are the major challenges to solve? how does the proposed method address the challenges?)
Simulations or examples to illustrate the key results of the paper (based on your own implementation). Your report should also include a GitHub link to your code so that your results can be easily reproduced.
Remark: Your grade will be discounted if you use an existing implementation to reproduce the key results. If you do use an existing implementation, you must state this explicitly in your report. Due to the university regulations on academic integrity, there will be a substantial penalty if you fail to make such a statement.
Deadline: Dec. 15, 2024, 11:59 pm (HK time), submission through Canvas
Please choose your project here [link]
Statistics and Machine Learning
A Statistical View of Column Subset Selection [link]
A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits [link] (MoM + stochastic approximation)
A simple new approach to variable selection in regression, with application to genetic fine mapping [link]
Flexible signal denoising via flexible empirical Bayes shrinkage [link]
False discovery rates: a new deal. [link]
Measuring missing heritability: Inferring the contribution of common variants [link]
Statistical analysis of spatial expression patterns for spatially resolved transcriptomic studies [link]
Diffusion Posterior Sampling for General Noisy Inverse Problems [link]
Gaussian Process Boosting [link]
Hierarchical Multi-Label Classification Networks [link]
Non-negative matrix factorization algorithms greatly improve topic model fits [link]
A Fast Coordinate Descent Method for High-Dimensional Non-Negative Least Squares using a Unified Sparse Regression [link]
Fitting Multilevel Factor Models [link]
AI for Science
Construction of a 3D whole organism spatial atlas by joint modelling of multiple slices with deep neural networks [link] (Poisson model + GAT)
Generalizing RNA velocity to transient cell states through dynamical modelling [link] (an example of EM + ODE)
Integrating spatial and single-cell transcriptomics data using deep generative models with SpatialScope [link]
Interpretable spatially aware dimension reduction of spatial transcriptomics with STAMP [link] (AI + topic model)
Dependency-aware deep generative models for multitasking analysis of spatial omics data [link] (Spatial VAE)
Deep generative modeling for single-cell transcriptomics [link] (a representative work on deep generative probabilistic models)
Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models [link]
To be added
Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares [link]
Variational Inference for Latent Variables and Uncertain Inputs in Gaussian Processes. [link]
To be posted.