This semester we will read papers on the topics of differential privacy and robustness.
Time: Monday 2-3:20 pm
Location: KAP 156
Schedule
Jan 13 Organizational meeting
Jan 27 Troy Tao: Dwork and Roth, The Algorithmic Foundations of Differential Privacy, Chapter 2 and Sections 3.1-3.3. See also the course website by Yu-Xiang Wang (UCSD).
Abstract: In this introduction to differential privacy, we will cover the following: 1. motivation: examples of privacy violations and how differential privacy addresses these concerns; 2. the definition of differential privacy, along with its intuition and properties; 3. the main techniques for making algorithms differentially private: l-1/l-2 sensitivity and the Laplace/Gaussian/exponential mechanisms, with proof ideas; 4. examples of privatization mechanisms applied to algorithms such as mean estimation.
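The Laplace mechanism for mean estimation mentioned above can be sketched in a few lines. This is an illustrative toy (the function name and interface are ours, not from Dwork and Roth): clipping each value to [lo, hi] bounds the l-1 sensitivity of the mean by (hi - lo)/n, and Laplace noise with scale sensitivity/epsilon then gives epsilon-differential privacy.

```python
import numpy as np

def private_mean(data, epsilon, lo=0.0, hi=1.0, rng=None):
    """epsilon-DP mean via the Laplace mechanism (toy sketch).

    Each value is clipped to [lo, hi], so changing one record moves the
    mean by at most (hi - lo) / n -- the l-1 sensitivity.  Adding Laplace
    noise with scale sensitivity / epsilon yields epsilon-DP.
    """
    rng = rng or np.random.default_rng()
    x = np.clip(np.asarray(data, dtype=float), lo, hi)
    sensitivity = (hi - lo) / len(x)
    return x.mean() + rng.laplace(0.0, sensitivity / epsilon)
```

Note the privacy/utility trade-off: the noise scale shrinks as 1/n, so for large datasets the private mean is close to the true clipped mean.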
Feb 3 Troy Tao: Can we spot a fake?
Abstract: Suppose we observe a data point X, originally sampled from a standard normal distribution on R^n but potentially corrupted afterwards by adversarially adding a vector to it. As observers, can we distinguish corrupted data from intact data? Suppose the adversary chooses a vector from a set T and scales it by a positive constant r. The present work demonstrates that, given T, there exists a threshold r(T) such that if r < r(T), the adversarially manipulated data is statistically indistinguishable from clean data. For highly symmetric sets T, this radius r(T) is approximately equal to twice the "scaled Gaussian width" of T. During this talk, we will set up the problem, investigate the derivation of the threshold, and discuss potential connections between this work and data privacy/differential privacy.
Feb 10 Devansh Gupta: The Johnson-Lindenstrauss Transform Itself Preserves Differential Privacy
Abstract: In this talk, we will discuss the following setting: assume we have two adjacent databases in the form of matrices $D$ and $D'$, where $D - D'$ is a rank-1 matrix of bounded norm, and assume each matrix has sufficiently large singular values. Then, roughly speaking, the JL transform of $D$ is differentially private with respect to $D'$. The talk will formalize this statement and, if time permits, present an application of this result to privately estimating the variance of the data in a particular direction by publishing a private covariance matrix.
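The privacy guarantee is the paper's theorem; as background, here is a minimal sketch of the JL transform itself (our own toy code, with a Gaussian projection matrix) showing its defining property: it maps points from dimension d down to k while approximately preserving norms.

```python
import numpy as np

def jl_project(X, k, rng):
    """Multiply by a Gaussian Johnson-Lindenstrauss matrix, mapping the
    rows of X from dimension d down to k while roughly preserving norms
    and pairwise distances."""
    d = X.shape[1]
    R = rng.normal(size=(d, k)) / np.sqrt(k)
    return X @ R

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2000))          # 20 points in dimension 2000
Y = jl_project(X, k=500, rng=rng)        # projected to dimension 500
rel_err = np.abs(np.linalg.norm(Y, axis=1) / np.linalg.norm(X, axis=1) - 1.0)
```

The relative norm distortion concentrates at roughly 1/sqrt(k), which is why a modest target dimension already preserves geometry well.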
Feb 24 Pengtao Li: On the breakdown point of transport-based quantiles
Abstract: In this talk, I will present recent advances in transport-based quantiles with a focus on their robustness properties, quantified via the breakdown point. Specifically, an unexpected relation between the transport-based quantile and Tukey depth is established. As a consequence, the transport median defined for general dimension $d \geq 2$ has breakdown point 1/2, demonstrating strong robustness in multivariate settings.
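The multivariate transport median is beyond a code snippet, but the breakdown-point concept it extends is easy to see in one dimension (a toy illustration of ours, not from the paper): corrupting just under half the sample arbitrarily badly destroys the mean but barely moves the median.

```python
import numpy as np

# 11 clean points; corrupt 5 of them (just under half) arbitrarily badly.
clean = np.arange(1.0, 12.0)     # 1, 2, ..., 11
corrupted = clean.copy()
corrupted[-5:] = 1e9             # adversarial outliers

# The mean is dragged to ~4.5e8, while the median stays inside the clean
# range -- the breakdown point 1/2 that the transport median achieves
# for general dimension d >= 2.
```
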
March 3 Mahdi Salmani: Privacy Amplification by Iteration
Abstract: Iterative algorithms, including Stochastic Gradient Descent (SGD), are the most popular approaches for solving optimization problems, and this raises an important question: how can these algorithms be privatized? In this talk, we review fundamental concepts and techniques in differential privacy analysis—including privacy loss, privacy amplification by subsampling, and advanced (strong) composition—to obtain some naive privacy guarantees for Differentially-Private Stochastic Gradient Descent (DP-SGD). We then introduce the moment accountant, which inspired Rényi differential privacy, to derive a tighter privacy bound. By the end of this talk, we will be prepared to discuss privacy amplification by iteration the following week.
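The object the privacy analysis above is applied to is a single DP-SGD update. A minimal sketch (our own illustrative code, following the standard clip-then-noise recipe rather than any particular library): clip each per-example gradient, average, and add Gaussian noise calibrated to the clipping bound.

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr, clip_norm, noise_mult, rng):
    """One DP-SGD update: clip each per-example gradient to l2 norm
    clip_norm, average, then add Gaussian noise scaled to the clip."""
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    n = len(clipped)
    noisy_grad = (sum(clipped) / n
                  + rng.normal(0.0, noise_mult * clip_norm / n, size=w.shape))
    return w - lr * noisy_grad
```

Clipping bounds each example's influence (the sensitivity), and the accountant techniques discussed in the talk track how the per-step Gaussian noise composes over many iterations.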
March 10 Mahdi Salmani: Privacy Amplification by Iteration
Abstract: In our last talk, we discussed privacy analysis for DP-SGD using privacy amplification by subsampling and composition theorems, and we introduced Rényi Differential Privacy. In this talk, we will obtain privacy guarantees for noisy SGD from a different perspective. Instead of relying on composition or subsampling, we will use the fact that the SGD update is contractive for smooth and convex functions with an appropriate learning rate.
March 24 Brian Fan: Adaptive robust confidence interval
Abstract: Classical methods for constructing confidence intervals commonly assume that all observations are drawn from the same underlying distribution, which is often not the case in practice due to errors such as measurement mistakes, dishonest assessments, etc. To address such uncertainty, Huber's contamination model introduces a framework where a small fraction of the data may come from an arbitrary distribution. In this talk, we explore the construction of confidence intervals under Huber's model in the specific setting where observations follow a mixture of the form $(1-\epsilon)N(\theta, 1) + \epsilon Q$, with an unknown contamination proportion $\epsilon$. Our discussion is based on recent work by Luo & Gao, in which they propose a construction of the confidence interval for $\theta$ with a valid coverage property and reveal that any confidence interval that remains valid under unknown contamination must be significantly wider than one built with known $\epsilon$. This is the first talk in a two-part series, so we will mainly focus on the background, key intuitions, and main results, laying the foundation for understanding the authors' optimal construction and its implications for asymptotic robustness.
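To make the contamination model concrete, here is a small simulation of ours (not from Luo & Gao): we draw from $(1-\epsilon)N(\theta,1) + \epsilon Q$, with $Q$ taken to be a point mass at 50 standing in for an arbitrary contaminating distribution, and observe that the sample mean is dragged away from $\theta$ while the median stays close.

```python
import numpy as np

def sample_huber(n, theta, eps, rng):
    """Draw n points from the contamination model
    (1 - eps) N(theta, 1) + eps Q, with Q a point mass at 50
    standing in for an arbitrary contaminating distribution."""
    x = rng.normal(theta, 1.0, size=n)
    x[rng.random(n) < eps] = 50.0
    return x

rng = np.random.default_rng(0)
x = sample_huber(20000, theta=2.0, eps=0.05, rng=rng)
# The mean is pulled toward the contamination; the median stays near
# theta, which is why robust intervals are built around such estimators.
```
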
March 31 Brian Fan: Adaptive robust confidence interval
Abstract: In this second talk, we continue to explore the construction of confidence intervals presented by Luo & Gao, with a focus on the length guarantees. If time permits, we will discuss extending their confidence-interval construction to general location families.
April 7 Rundong Ding: Sharper Bounds for Chebyshev Moment Matching with Applications to Differential Privacy and Beyond
Abstract: Recovering a probability distribution from noisy moment information is a foundational problem in statistics and numerical analysis. We begin by reviewing the limitations of standard moment matching and how Chebyshev polynomials offer a better-conditioned alternative. The central result is a new bound on the Wasserstein-1 distance between distributions whose Chebyshev moments approximately match. Unlike prior work that required uniformly tight bounds on each moment, the authors derive a weighted l^2 error criterion that allows for more noise in higher-degree moments. We will discuss the proof of the main theorem, including a constructive version of Jackson's theorem and a novel global Chebyshev coefficient decay bound. These results improve the robustness and interpretability of Chebyshev moment matching and lay the groundwork for practical algorithms.
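The basic object in the talk, the vector of Chebyshev moments of a distribution, is easy to compute empirically. A small sketch of ours (illustration only; the paper's contribution is the weighted error analysis, not this computation): $m_k = \mathbb{E}[T_k(x)]$ for samples supported on $[-1, 1]$.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def chebyshev_moments(samples, degree):
    """Empirical Chebyshev moments m_k = mean of T_k(x) over the samples,
    for samples supported on [-1, 1]."""
    x = np.clip(np.asarray(samples, dtype=float), -1.0, 1.0)
    # Coefficient vector [0, ..., 0, 1] selects the k-th Chebyshev basis
    # polynomial T_k in numpy's Chebyshev-series evaluation.
    return np.array([cheb.chebval(x, [0.0] * k + [1.0]).mean()
                     for k in range(degree + 1)])
```

Because $|T_k(x)| \le 1$ on $[-1, 1]$, every moment lies in $[-1, 1]$ regardless of degree, which is the conditioning advantage over raw power moments.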
April 14 Rundong Ding: Sharper Bounds for Chebyshev Moment Matching with Applications to Differential Privacy and Beyond
Abstract: In the second part of this presentation, building on the sharper Wasserstein-1 approximation guarantees derived from Chebyshev moment matching, we delve into the paper's two core applications. First, we present a simple yet effective algorithm for generating differentially private synthetic data, which achieves near-optimal error bounds while offering improved computational efficiency over prior methods. Second, we examine a refined algorithm for spectral density estimation in numerical linear algebra, which significantly reduces the number of matrix-vector products needed for accurate approximation. Both applications highlight how the improved moment error bounds translate into enhanced statistical and computational performance.
April 21 Brian Fan: A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification
Abstract: In statistics, constructing valid prediction intervals is often straightforward when the noise distribution is known—or at least assumed. But what if we're completely agnostic about the underlying distribution? Conformal inference is a powerful framework that enables the construction of valid prediction intervals without any distributional assumptions on the noise. In this talk, we'll explore the motivation behind conformal inference and how traditional assumptions are gradually lifted within this framework. We'll also touch on some key research directions in the field. If time permits, I'll share a bit about my own research on multi-target conformal inference. This is the first talk of a two-part series, so we'll focus only on regression. The second talk will cover classification and explore connections between conformal inference and differential privacy.
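As a taste of the framework, here is the standard split-conformal construction for regression, sketched in our own toy code: use held-out calibration residuals to pick an interval half-width, and exchangeability alone guarantees marginal coverage of at least 1 - alpha.

```python
import numpy as np

def split_conformal_interval(cal_residuals, y_pred, alpha=0.1):
    """Split-conformal prediction interval: take the
    ceil((n + 1) * (1 - alpha))-th smallest absolute calibration residual
    as the half-width.  Under exchangeability, the resulting interval has
    marginal coverage >= 1 - alpha with no distributional assumptions."""
    r = np.sort(np.abs(np.asarray(cal_residuals, dtype=float)))
    n = len(r)
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    q = r[k - 1]
    return y_pred - q, y_pred + q
```

Note that nothing about the fitted model enters the guarantee; the model's quality only affects the interval's width, not its validity.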