yufan_li (at) g (dot) harvard (dot) edu
Department of Statistics, Harvard University
1 Oxford St, Cambridge, MA, 02138
I obtained my Ph.D. from the Department of Statistics at Harvard University, where I was fortunate to be advised by Professors Subhabrata Sen and Pragya Sur. My Ph.D. research focused on (i) developing theoretical foundations and novel methodologies for high-dimensional statistics, particularly for data with complex global dependencies, and (ii) designing ML algorithms with provable guarantees for fundamental problems in data science. My work was recognized with the Arthur P. Dempster Award, presented annually by the department to 1–2 graduate students "who have made significant contributions to theoretical or foundational research in statistics."
I also interned at Google DeepMind during the summer of 2024, hosted by Ben Adlam and Alex Alemi, where I worked on transformer pretraining and scaling laws. Before my Ph.D., I obtained my bachelor's degree from the University of Toronto and my master's degree from Harvard University.
Harvard University, Department of Statistics, Cambridge, MA, Aug. 2020 - May 2025
Ph.D. in Statistics (thesis advisors: Subhabrata Sen, Pragya Sur)
Dissertation: Physics, Information and Inference: High-Dimensional Models under Structured Dependencies
Harvard University, SEAS, Cambridge, MA, Aug. 2018 - May 2020
M.Eng. in Computational Science & Engineering (thesis advisor: Natesh Pillai)
University of Toronto, Applied Science & Engineering, Toronto, ON, Sep. 2013 - May 2018
B.A.Sc. in Engineering Science (thesis advisor: Jeffrey Rosenthal)
Student Researcher @ Google DeepMind, Cambridge, MA, May-Oct 2024
Hosts: Ben Adlam, Alex Alemi
Investigated a systematic bias in the standard power-law scaling relationship and explored alternative parametric forms to correct it; contributed to the team's scaling-baseline project by implementing and tuning a duration-free learning-rate schedule and measuring its compute efficiency across scales; compared transformer and n-gram models' next-token prediction performance at different token frequencies and context lengths (a schematic curve-fitting sketch follows below).
Engineering: JAX/XLA/Flax, distributed training and parallelism for transformer pretraining, model tuning through careful ablations and experiments, compute/IO/memory profiling and optimization.
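As context for the scaling-law work above, here is a minimal curve-fitting sketch assuming a saturating power law L(C) = a·C^(-b) + c; the data points, initial guesses, and function names are made up for illustration, and this is neither the internal tooling nor the corrected parametric form studied during the internship.

# Minimal sketch: fit a saturating power law L(C) = a * C**(-b) + c
# to hypothetical (compute, loss) pairs. Everything here is illustrative.
import numpy as np
from scipy.optimize import curve_fit

def saturating_power_law(x, a, b, c):
    # x is compute normalized by the smallest run, to keep numerics tame
    return a * x ** (-b) + c

compute = np.array([1e17, 1e18, 1e19, 1e20, 1e21])  # training FLOPs (made up)
loss = np.array([3.90, 3.20, 2.80, 2.55, 2.40])     # eval loss (made up)

x = compute / compute[0]
(a, b, c), _ = curve_fit(saturating_power_law, x, loss, p0=[1.0, 0.3, 2.0])
print(f"L(C) ~= {a:.3g} * (C/1e17)^(-{b:.3g}) + {c:.3g}")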
Spectrum-Aware Debiasing: A Modern Inference Framework with Application to Principal Component Regression, with Pragya Sur [in submission at the Annals of Statistics; won the Dempster Award]
Investigated how to debias regularized estimators (e.g., LASSO, Elastic Net) using a one-step estimator; the key insight is a scalar adjustment coefficient applied to the step size that accounts for high dimensionality and the spectral properties of the design matrix (a schematic sketch follows below);
Leveraged this adjustment to debias Principal Component Regression for high-dimensional inference; the method performs well on design matrices with complex global dependence (e.g., time series, fat tails, latent low-rank structure, linear networks, asymmetric designs), as well as on various real datasets (e.g., genetics, audio & image, financial returns, socio-economics, demand-forecast indicators).
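A minimal numpy/scikit-learn sketch of generic one-step debiasing; the scalar `adjustment` is a placeholder standing in for the spectrum-aware coefficient derived in the paper, which is precisely the part this sketch omits.

# One-step debiasing of a regularized fit: take a single gradient step on
# the squared loss from the LASSO solution, scaled by a scalar adjustment.
# The value of `adjustment` below is a placeholder; the paper derives it
# from the spectral properties of the design matrix.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:10] = 1.0
y = X @ beta + rng.standard_normal(n)

beta_hat = Lasso(alpha=0.1, fit_intercept=False).fit(X, y).coef_
adjustment = 1.5  # placeholder for the spectrum-aware scalar
beta_debiased = beta_hat + adjustment * X.T @ (y - X @ beta_hat) / n
print(np.linalg.norm(beta_debiased - beta))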
Random Linear Estimation with Rotationally-Invariant Designs: Asymptotics at High Temperature, with Zhou Fan, Subhabrata Sen & Yihong Wu [published in IEEE Transactions on Information Theory; Jack K. Wolf ISIT Student Paper Award finalist]
Studied information-theoretic properties of the Bayes-optimal estimator in high-dimensional Bayesian linear regression; verified "single-letter" formulas characterizing the mutual information and MMSE under a high-temperature assumption on the signal-to-noise ratio (the classical identity linking these quantities is recalled below);
Technically, we used vector AMP iterates to track the Bayes-optimal estimator and computed moments of the log-partition function via large-deviation techniques.
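For background (this is the classical identity, not the paper's result), the I-MMSE relation of Guo, Shamai, and Verdú ties together the two quantities that single-letter formulas characterize: for a Gaussian channel with Z ~ N(0, I),

\[
\frac{d}{d\,\mathrm{snr}}\, I\bigl(X;\ \sqrt{\mathrm{snr}}\,X + Z\bigr) \;=\; \frac{1}{2}\,\mathrm{mmse}(\mathrm{snr}),
\]

so a single-letter formula for the mutual information also determines the MMSE curve.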
TAP Equations for Orthogonally Invariant Spin Glasses at High Temperature, with Zhou Fan & Subhabrata Sen [Annales de l'Institut Henri Poincaré B: Probabilités et Statistiques, to appear]
Proved TAP equations for mean-field spin glasses exhibiting global correlations in spin interactions at high temperature. TAP equations describe marginals of the Gibbs measure and are fundamental to AMP algorithms and nascent TAP variational inference methods. Our proof provides the first rigorous confirmation of the TAP equations for orthogonally invariant ensembles since Parisi and Potters conjectured them in the 1990s;
Technically, we exploited the connection between TAP equations and the high-dimensional geometry of the Gibbs measure as a "spherical band" (the classical Sherrington-Kirkpatrick form of the equations is recalled below for reference).
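For orientation (background, not the paper's statement), in the classical Sherrington-Kirkpatrick model the TAP equations for the magnetizations read

\[
m_i \;=\; \tanh\!\Bigl( h_i + \beta \sum_{j} J_{ij} m_j \;-\; \beta^2 (1-q)\, m_i \Bigr),
\qquad q = \frac{1}{N}\sum_{j=1}^{N} m_j^2,
\]

where the final term is the Onsager correction; for orthogonally invariant couplings this correction takes a different form, dictated by the spectrum of the interaction matrix.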
ROTI-GCV: Generalized Cross-Validation for Right-Rotationally Invariant Data, with Kevin Luo & Pragya Sur [published at AISTATS]
Demonstrated that standard generalized cross-validation (GCV) may fail for high-dimensional data with global correlations and heavy tails (the standard criterion is recalled below);
Proposed a corrected GCV formula under a broad universality class and conducted experiments demonstrating the accuracy of our approach in a variety of synthetic and semi-synthetic settings.
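For reference, the standard GCV criterion for a linear smoother \(\hat{y}_\lambda = S_\lambda y\) is

\[
\mathrm{GCV}(\lambda) \;=\; \frac{\tfrac{1}{n}\,\lVert y - \hat{y}_\lambda \rVert_2^2}{\bigl(1 - \operatorname{tr}(S_\lambda)/n\bigr)^2};
\]

this is the textbook formula whose implicit assumptions fail in the regimes above, not our corrected version.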
Understanding Optimal Feature Transfer via a Fine-Grained Bias-Variance Analysis, with Ben Adlam & Subhabrata Sen [in submission at JMLR]
Investigated the transfer representation-learning paradigm in a simple, exactly solvable model where the feature layer is pretrained on upstream data and transferred to an ensemble of downstream tasks;
Studied the structure of the optimally pretrained kernel and how it corresponds to a fine-grained bias-variance tradeoff.
Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling, with Pragya Sur [in submission at NeurIPS]
Proposed a data-driven predictor rigorously proven to be calibrated across a wide range of high-dimensional binary classification settings. Calibration is achieved by interpolating with a chance predictor, with the interpolation weight determined by the angle between the estimated signal and the true signal (a toy sketch follows below);
Showed that our calibrated predictor uniquely minimizes any Bregman divergence relative to the true label-generating probability; established conditions under which the classical Platt-scaled predictor converges to this Bregman-optimal calibrated solution.
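A toy numpy sketch of the interpolation idea; both `cos_angle_hat` and the shrinkage form below are illustrative placeholders, not the paper's exact estimator.

# Toy sketch: shrink a raw probability toward the chance predictor 1/2,
# with weight tied to an estimate of the cosine of the angle between the
# estimated and true signals. `cos_angle_hat` and this shrinkage form are
# placeholders, not the paper's construction.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def angle_calibrated(scores, cos_angle_hat):
    raw = sigmoid(scores)
    return cos_angle_hat * raw + (1.0 - cos_angle_hat) * 0.5

print(angle_calibrated(np.array([-2.0, 0.0, 3.0]), cos_angle_hat=0.8))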
"Solvable" Batched Bandits: Balance Risk and Reward in Phased Release Problem, with Iavor Bojinov & Jialiang Mao [published at NeurIPS]
Designed a batched bandit algorithm for a novel online decision-making problem: determining the optimal release schedule for new product updates under a budget constraint on adverse treatment effects (i.e., not releasing bad updates to too many users);
Our approach decomposes the "risk of ruin" (the probability of budget depletion) recursively and solves for the optimal choice of arms analytically via a sequence of simple quadratic equations. Using only sample means and variances of the online outcomes, the method bypasses challenging rare-event simulation and is highly efficient and parallelizable (a schematic per-batch solve is sketched below).
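A schematic sketch of one per-batch solve under a normal approximation; the approximation, the budget bookkeeping, and the constant z are illustrative stand-ins for the paper's risk-of-ruin recursion.

# Schematic per-batch solve: choose the largest batch size n for the new
# arm such that a normal approximation of the batch's cumulative harm
# stays within the remaining risk budget. Substituting s = sqrt(n) into
#     n * mu_hat + z * sqrt(n) * sigma_hat <= budget
# gives a quadratic in s, solved by its positive root. All of this is an
# illustrative stand-in for the paper's recursion.
import math

def max_batch_size(mu_hat, sigma_hat, budget, z=2.33):
    # assumes mu_hat > 0 (the new arm causes some expected harm per user)
    disc = (z * sigma_hat) ** 2 + 4.0 * mu_hat * budget
    s = (-z * sigma_hat + math.sqrt(disc)) / (2.0 * mu_hat)
    return max(int(s * s), 0)

# Example: estimated mean harm 0.02/user, sd 0.5, remaining budget 50 units.
print(max_batch_size(mu_hat=0.02, sigma_hat=0.5, budget=50.0))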
The 37th New England Statistics Symposium (NESS), 2024 May
Youth in High Dimensions at International Center for Theoretical Physics (ICTP), 2024 May
Yale Institute for Foundations of Data Science (FDS), 2024 April
37th NeurIPS, New Orleans, 2023 December
Conference on Digital Experimentation (CODE) at MIT, 2023 October
Machine Learning Summer School at Princeton University, 2023 June
IEEE International Symposium on Information Theory (ISIT), Taipei, 2023 June
American Causal Inference Conference (ACIC), Austin, TX, 2023 May
Advances of Probabilistic Algorithms, 35th New England Statistics Symposium (NESS), 2022 May
Deep Learning Theory Summer School at Princeton University, 2021 July
Reviewer, 2024-Present
Journals: IEEE Transactions on Information Theory, Bernoulli Journal, Sankhya A
Conferences: NeurIPS, Conference on Learning Theory (COLT)
Undergraduate Thesis Co-Advisor, 2024-Present
Co-advised (with Prof. Pragya Sur) Kevin Luo’s honors thesis “High Dimensional Linear Interpolation for Structured Data”, which won a Hoopes Prize for outstanding undergraduate research. Joint paper here.
Student Consultant, Harvard Statistics Consulting Service, Dec. 2021-Present
Hold weekly 2-hr consultation sessions for researchers across disciplines (e.g., health care, chemistry, social sciences); consult clients on ML, Bayesian analysis, and high dimensionality. Followed a biomedical case seeking to recover sleep conditions of schizophrenia patients from biometric data; proposed a Hidden Markov Model (HMM) solution to the client and delivered satisfactory results.
Probability & Math. Physics Reading Group, H.-T. Yau's group, Dec. 2021-2024
Delivered technical presentations on advanced topics: quantum ergodicity (notes), spectral graph theory (notes), spin glass Hamiltonian optimization (notes); participated in the weekly group gatherings.
Teaching Fellow, Department of Statistics, Harvard University, 2021-Present
Probability II (Graduate), Inference I (Graduate), Data Science I (Undergrad/Masters), Statistics and Data Science for Networks (Undergrad), Random High-Dimensional Optimization (Graduate), Statistical Consulting (Graduate)