yufan_li (at) g (dot) harvard (dot) edu
Department of Statistics, Harvard University
1 Oxford St, Cambridge, MA, 02138
I obtained my Ph.D. from the Department of Statistics at Harvard University, where I was fortunate to be advised by Professors Subhabrata Sen and Pragya Sur. My Ph.D. research focused on (i) developing theoretical foundations and novel methodologies for high-dimensional statistics, particularly for data with complex global dependencies, and (ii) designing ML algorithms with provable guarantees for fundamental problems in data science. My work was recognized with the Arthur P. Dempster Award, presented annually by the department to 1–2 graduate students "who have made significant contributions to theoretical or foundational research in statistics."
I also interned at Google DeepMind during the summer of 2024, hosted by Ben Adlam and Alex Alemi, where I worked on transformer pretraining and scaling laws. Before my Ph.D., I obtained my bachelor's degree from the University of Toronto and my master's degree from Harvard University.
Harvard University, Department of Statistics, Cambridge, MA, Aug. 2020 - May 2025
Ph.D. in Statistics (thesis advisors: Subhabrata Sen, Pragya Sur)
Dissertation: Physics, Information and Inference: High-Dimensional Models under Structured Dependencies
Harvard University, SEAS, Cambridge, MA, Aug. 2018 - May 2020
M.Eng. in Computational Science & Engineering (thesis advisor: Natesh Pillai)
University of Toronto, Applied Science & Engineering, Toronto, ON, Sep. 2013 - May 2018
B.A.Sc. in Engineering Science (thesis advisor: Jeffrey Rosenthal)
Student Researcher @ Google DeepMind, Cambridge, MA, May-Oct 2024
Hosts: Ben Adlam, Alex Alemi
Investigated a systematic bias in the standard power-law scaling relationship and explored alternative parametric forms to correct it; contributed to the team's scaling-baseline project by implementing and tuning a duration-free learning-rate schedule and measuring its compute efficiency across scales; compared transformer and n-gram models' next-token prediction performance at different token frequencies and context lengths (a schematic curve-fitting sketch follows below).
Engineering: JAX/XLA/Flax, distributed training and parallelism for transformer pretraining, model tuning through careful ablations and experiments, compute/IO/memory profiling and optimization.
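As context for the scaling-law work above, here is a minimal curve-fitting sketch assuming a saturating power law L(C) = a·C^(-b) + c; the data points, initial guesses, and function names are made up for illustration, and this is neither the internal tooling nor the corrected parametric form studied during the internship.

# Minimal sketch: fit a saturating power law L(C) = a * C**(-b) + c
# to hypothetical (compute, loss) pairs. Everything here is illustrative.
import numpy as np
from scipy.optimize import curve_fit

def saturating_power_law(x, a, b, c):
    # x is compute normalized by the smallest run, to keep numerics tame
    return a * x ** (-b) + c

compute = np.array([1e17, 1e18, 1e19, 1e20, 1e21])  # training FLOPs (made up)
loss = np.array([3.90, 3.20, 2.80, 2.55, 2.40])     # eval loss (made up)

x = compute / compute[0]
(a, b, c), _ = curve_fit(saturating_power_law, x, loss, p0=[1.0, 0.3, 2.0])
print(f"L(C) ~= {a:.3g} * (C/1e17)^(-{b:.3g}) + {c:.3g}")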
Spectrum-Aware Debiasing: A Modern Inference Framework with Application to Principal Component Regression, with Pragya Sur [in submission at the Annals of Statistics; won the Dempster Award]
Investigated how to debias regularized estimators (e.g., LASSO, Elastic Net) using a one-step estimator; the key insight is a scalar adjustment coefficient applied to the step size that accounts for high dimensionality and the spectral properties of the design matrix (a schematic sketch follows below);
Leveraged this adjustment to debias Principal Component Regression for high-dimensional inference; the method performs well on design matrices with complex global dependence (e.g., time series, fat tails, latent low-rank structure, linear networks, asymmetric designs), as well as on various real datasets (e.g., genetics, audio & image, financial returns, socio-economics, demand-forecast indicators).
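A minimal numpy/scikit-learn sketch of generic one-step debiasing; the scalar `adjustment` is a placeholder standing in for the spectrum-aware coefficient derived in the paper, which is precisely the part this sketch omits.

# One-step debiasing of a regularized fit: take a single gradient step on
# the squared loss from the LASSO solution, scaled by a scalar adjustment.
# The value of `adjustment` below is a placeholder; the paper derives it
# from the spectral properties of the design matrix.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:10] = 1.0
y = X @ beta + rng.standard_normal(n)

beta_hat = Lasso(alpha=0.1, fit_intercept=False).fit(X, y).coef_
adjustment = 1.5  # placeholder for the spectrum-aware scalar
beta_debiased = beta_hat + adjustment * X.T @ (y - X @ beta_hat) / n
print(np.linalg.norm(beta_debiased - beta))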
Random Linear Estimation with Rotationally-Invariant Designs: Asymptotics at High Temperature, with Zhou Fan, Subhabrata Sen & Yihong Wu [published in IEEE Transactions on Information Theory; Jack K. Wolf ISIT Student Paper Award finalist]
Studied information-theoretic properties of the Bayes-optimal estimator in high-dimensional Bayesian linear regression; verified "single-letter" formulas characterizing the mutual information and MMSE under a high-temperature assumption on the signal-to-noise ratio (the classical identity linking these quantities is recalled below);
Technically, we used vector AMP iterates to track the Bayes-optimal estimator and computed moments of the log-partition function via large-deviation techniques.
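For background (this is the classical identity, not the paper's result), the I-MMSE relation of Guo, Shamai, and Verdú ties together the two quantities that single-letter formulas characterize: for a Gaussian channel with Z ~ N(0, I),

\[
\frac{d}{d\,\mathrm{snr}}\, I\bigl(X;\ \sqrt{\mathrm{snr}}\,X + Z\bigr) \;=\; \frac{1}{2}\,\mathrm{mmse}(\mathrm{snr}),
\]

so a single-letter formula for the mutual information also determines the MMSE curve.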
TAP Equations for Orthogonally Invariant Spin Glasses at High Temperature, with Zhou Fan & Subhabrata Sen [Annales de l'Institut Henri Poincaré B: Probabilités et Statistiques, to appear]
Proved TAP equations for mean-field spin glasses exhibiting global correlations in spin interactions at high temperature. TAP equations describe marginals of the Gibbs measure and are fundamental to AMP algorithms and nascent TAP variational inference methods. Our proof provides the first rigorous confirmation of the TAP equations for orthogonally invariant ensembles since Parisi and Potters conjectured them in the 1990s;
Technically, we exploited the connection between TAP equations and the high-dimensional geometry of the Gibbs measure as a "spherical band" (the classical Sherrington-Kirkpatrick form of the equations is recalled below for reference).
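For orientation (background, not the paper's statement), in the classical Sherrington-Kirkpatrick model the TAP equations for the magnetizations read

\[
m_i \;=\; \tanh\!\Bigl( h_i + \beta \sum_{j} J_{ij} m_j \;-\; \beta^2 (1-q)\, m_i \Bigr),
\qquad q = \frac{1}{N}\sum_{j=1}^{N} m_j^2,
\]

where the final term is the Onsager correction; for orthogonally invariant couplings this correction takes a different form, dictated by the spectrum of the interaction matrix.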
ROTI-GCV: Generalized Cross-Validation for Right-Rotationally Invariant Data, with Kevin Luo & Pragya Sur [published at AISTATS]
Demonstrated that standard generalized cross-validation (GCV) may fail for high-dimensional data with global correlations and heavy tails (the standard criterion is recalled below);
Proposed a corrected GCV formula under a broad universality class and conducted experiments demonstrating the accuracy of our approach in a variety of synthetic and semi-synthetic settings.
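For reference, the standard GCV criterion for a linear smoother \(\hat{y}_\lambda = S_\lambda y\) is

\[
\mathrm{GCV}(\lambda) \;=\; \frac{\tfrac{1}{n}\,\lVert y - \hat{y}_\lambda \rVert_2^2}{\bigl(1 - \operatorname{tr}(S_\lambda)/n\bigr)^2};
\]

this is the textbook formula whose implicit assumptions fail in the regimes above, not our corrected version.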
Understanding Optimal Feature Transfer via a Fine-Grained Bias-Variance Analysis, with Ben Adlam & Subhabrata Sen [in submission at JMLR]
Investigated the transfer representation-learning paradigm in a simple, exactly solvable model where the feature layer is pretrained on upstream data and transferred to an ensemble of downstream tasks;
Studied the structure of the optimally pretrained kernel and how it corresponds to a fine-grained bias-variance tradeoff.
Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling, with Pragya Sur [in submission at NeurIPS]
Proposed a data-driven predictor rigorously proven to be calibrated across a wide range of high-dimensional binary classification settings. Calibration is achieved by interpolating with a chance predictor, with the interpolation weight determined by the angle between the estimated signal and the true signal (a toy sketch follows below);
Showed that our calibrated predictor uniquely minimizes any Bregman divergence relative to the true label-generating probability; established conditions under which the classical Platt-scaled predictor converges to this Bregman-optimal calibrated solution.
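A toy numpy sketch of the interpolation idea; both `cos_angle_hat` and the shrinkage form below are illustrative placeholders, not the paper's exact estimator.

# Toy sketch: shrink a raw probability toward the chance predictor 1/2,
# with weight tied to an estimate of the cosine of the angle between the
# estimated and true signals. `cos_angle_hat` and this shrinkage form are
# placeholders, not the paper's construction.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def angle_calibrated(scores, cos_angle_hat):
    raw = sigmoid(scores)
    return cos_angle_hat * raw + (1.0 - cos_angle_hat) * 0.5

print(angle_calibrated(np.array([-2.0, 0.0, 3.0]), cos_angle_hat=0.8))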
"Solvable" Batched Bandits: Balance Risk and Reward in Phased Release Problem, with Iavor Bojinov & Jialiang Mao [published at NeurIPS]
Designed a batched bandit algorithm for a novel online decision-making problem: determining the optimal release schedule for new product updates under a budget constraint on adverse treatment effects (i.e., not releasing bad updates to too many users);
Our approach decomposes the "risk of ruin" (the probability of budget depletion) recursively and solves for the optimal choice of arms analytically via a sequence of simple quadratic equations. Using only sample means and variances of the online outcomes, the method bypasses challenging rare-event simulation and is highly efficient and parallelizable (a schematic per-batch solve is sketched below).
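A schematic sketch of one per-batch solve under a normal approximation; the approximation, the budget bookkeeping, and the constant z are illustrative stand-ins for the paper's risk-of-ruin recursion.

# Schematic per-batch solve: choose the largest batch size n for the new
# arm such that a normal approximation of the batch's cumulative harm
# stays within the remaining risk budget. Substituting s = sqrt(n) into
#     n * mu_hat + z * sqrt(n) * sigma_hat <= budget
# gives a quadratic in s, solved by its positive root. All of this is an
# illustrative stand-in for the paper's recursion.
import math

def max_batch_size(mu_hat, sigma_hat, budget, z=2.33):
    # assumes mu_hat > 0 (the new arm causes some expected harm per user)
    disc = (z * sigma_hat) ** 2 + 4.0 * mu_hat * budget
    s = (-z * sigma_hat + math.sqrt(disc)) / (2.0 * mu_hat)
    return max(int(s * s), 0)

# Example: estimated mean harm 0.02/user, sd 0.5, remaining budget 50 units.
print(max_batch_size(mu_hat=0.02, sigma_hat=0.5, budget=50.0))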
The 37th New England Statistics Symposium (NESS), 2024 May
Youth in High Dimensions at International Center for Theoretical Physics (ICTP), 2024 May
Yale Institute for Foundations of Data Science (FDS), 2024 April
37th NeurIPS, New Orleans, 2023 December
Conference on Digital Experimentation (CODE) at MIT, 2023 October
Machine Learning Summer School at Princeton University, 2023 June
IEEE International Symposium on Information Theory (ISIT), Taipei, 2023 June
American Causal Inference Conference (ACIC), Austin, TX, 2023 May
Advances of Probabilistic Algorithms, 35th New England Statistics Symposium (NESS), 2022 May
Deep Learning Theory Summer School at Princeton University, 2021 July
Reviewer, 2024-Present
Journals: IEEE Transactions on Information Theory, Bernoulli Journal, Sankhya A
Conferences: NeurIPS, Conference on Learning Theory (COLT)
Undergraduate Thesis Co-Advisor, 2024-Present
Co-advised (with Prof. Pragya Sur) Kevin Luo’s honors thesis “High Dimensional Linear Interpolation for Structured Data”, which won a Hoopes Prize for outstanding undergraduate research. Joint paper here.
Student Consultant, Harvard Statistics Consulting Service, Dec. 2021-Present
Hold weekly 2-hr consultation sessions for researchers across disciplines (e.g., health care, chemistry, social sciences); consult clients on ML, Bayesian analysis, and high dimensionality. Followed a biomedical case seeking to recover sleep conditions of schizophrenia patients from biometric data; proposed a Hidden Markov Model (HMM) solution to the client and delivered satisfactory results.
Probability & Math. Physics Reading Group, H.-T. Yau's group, Dec. 2021-2024
Delivered technical presentations on advanced topics: quantum ergodicity (notes), spectral graph theory (notes), spin glass Hamiltonian optimization (notes); participated in the weekly group gatherings.
Teaching Fellow, Department of Statistics, Harvard University, 2021-Present
Probability II (Graduate), Inference I (Graduate), Data Science I (Undergrad/Masters), Statistics and Data Science for Networks (Undergrad), Random High-Dimensional Optimization (Graduate), Statistical Consulting (Graduate)