Past Talks

Past Seminar Presentations

Thursday, May 7, 2026 [Recording]
- Speaker: Taehyun Kim (Columbia University)
- Title: Empirical Bayes Estimation and Inference via Smooth Nonparametric Maximum Likelihood
- Abstract: The empirical Bayes g-modeling approach via the nonparametric maximum likelihood estimator (NPMLE) is widely used for large-scale estimation and inference in the normal means problem, yet theoretical guarantees for uncertainty quantification remain scarce. A key obstacle is that the NPMLE of the mixing distribution is necessarily discrete, which yields discrete posterior credible sets and a deconvolution rate that is logarithmic. We address both limitations by studying a hierarchical Gaussian smoothing layer that restricts the mixing distribution to a Gaussian location mixture. The resulting smooth NPMLE is computed by solving a convex optimization problem and inherits the near-parametric denoising performance of the classical NPMLE. For deconvolution it achieves a polynomial rate of convergence which we show is asymptotically minimax over the corresponding class. The estimated smooth posteriors converge to the true posteriors at the same polynomial rate in weighted total variation distance. When the model is misspecified, the smooth NPMLE converges to the Kullback-Leibler projection of the true marginal density onto the model class at a nearly parametric rate, and the polynomial deconvolution and posterior convergence rates carry over to this pseudo-true target. Building on this smooth posterior, we characterize optimal marginal coverage sets: the shortest set-valued rules achieving a prescribed marginal coverage probability. Plug-in empirical Bayes marginal coverage sets based on the smooth NPMLE achieve asymptotically exact coverage at a polynomial rate and converge to the oracle optimal set in expected length. All results extend to heteroscedastic Gaussian observations. We also study identifiability of the proposed model and show that the largest Gaussian component of the prior is identifiable, and provide a consistent estimator and a finite-sample upper confidence bound for it.
- Discussant: Kevin Chen (Stanford University)
- Links: [Relevant papers: paper #1]

Thursday, April 30, 2026 [Recording]
- Speaker: Zhexiao Lin (University of California, Berkeley)
- Title: Introducing the b-value: combining unbiased and biased estimators from a sensitivity analysis perspective
- Abstract: In empirical research, when we have multiple estimators for the same parameter of interest, a central question arises: how do we combine unbiased but less precise estimators with biased but more precise ones to improve the inference? Under this setting, the point estimation problem has attracted considerable attention. In this paper, we focus on a less studied inference question: how can we conduct valid statistical inference in such settings with unknown bias? We propose a strategy to combine unbiased and biased estimators from a sensitivity analysis perspective. We derive a sequence of confidence intervals indexed by the magnitude of the bias, which enable researchers to assess how conclusions vary with the bias levels. Importantly, we introduce the notion of the b-value, a critical value of the unknown maximum relative bias at which combining estimators does not yield a significant result. We apply this strategy to three canonical combined estimators: the precision-weighted estimator, the pretest estimator, and the soft-thresholding estimator. For each estimator, we characterize the sequence of confidence intervals and determine the bias threshold at which the conclusion changes. Based on the theory, we recommend reporting the b-value based on the soft-thresholding estimator and its associated confidence intervals, which are robust to unknown bias and achieve the lowest worst-case risk among the alternatives.
- Discussant: Timothy Armstrong (University of Southern California)
- Links: [Relevant papers: paper #1]

Thursday, April 16, 2026 [Recording]
- Speaker: Lihua Lei (Stanford University)
- Title: Compound Selection Decisions: An Almost SURE Approach
- Abstract: This paper proposes methods for producing compound selection decisions in a Gaussian sequence model. Given unknown, fixed parameters μ_{1:n} and known σ_{1:n} with observations Y_i∼𝖭(μ_i,σ_i^2), the decision maker would like to select a subset of indices S so as to maximize utility (1/n)∑_{i∈S}(μ_i−K_i), for known costs K_i. Inspired by Stein's unbiased risk estimate (SURE), we introduce an almost unbiased estimator, called ASSURE, for the expected utility of a proposed decision rule. ASSURE allows a user to choose a welfare-maximizing rule from a pre-specified class by optimizing the estimated welfare, thereby producing selection decisions that borrow strength across noisy estimates. We show that ASSURE produces decision rules that are asymptotically no worse than the optimal but infeasible decision rule in the pre-specified class. We apply ASSURE to the selection of Census tracts for economic opportunity, the identification of discriminating firms, and the analysis of p-value decision procedures in A/B testing.
- Discussant: Toru Kitagawa (Brown University)
- Links: [Relevant papers: paper #1]

Thursday, April 23, 2026 [Recording]
- Speaker: Drew Nguyen (University of California, Berkeley)
- Title: Controlling the false discovery rate under a non-parametric graphical dependence model
- Abstract: We propose sufficient conditions and computationally efficient procedures for false discovery rate control in multiple testing when the p-values are related by a known dependency graph, meaning that we assume independence of p-values that are not within each other's neighborhoods, but otherwise leave the dependence unspecified. Our methods' rejection sets coincide with that of the Benjamini-Hochberg (BH) procedure whenever there are no edges between BH rejections, and we find in simulations and a genomics data example that their power approaches that of the BH procedure when there are few such edges, as is commonly the case. Because our methods ignore all hypotheses not in the BH rejection set, they are computationally efficient whenever that set is small. Our fastest method, the IndBH procedure, typically finishes within seconds even in simulations with up to one million hypotheses.
- Discussant: Mengqi Lin (University of Michigan)
- Links: [Relevant papers: paper #1]

Thursday, April 9, 2026 [Recording]
- Speaker: Aureo de Paula (University College London, CeMMAP and Institute for Fiscal Studies)
- Title: Prediction Sets and Conformal Inference with Interval Outcomes
- Abstract: Given data on a scalar random variable 𝑌, a prediction set for 𝑌 with miscoverage level 𝛼 is a set of values for 𝑌 that contains a randomly drawn 𝑌 with probability 1 − 𝛼, where 𝛼 ∈ (0, 1). Among all prediction sets that satisfy this coverage property, the oracle prediction set is the one with the smallest volume. This paper provides estimation methods of such prediction sets given observed conditioning covariates when 𝑌 is censored or measured in intervals. We first characterise the oracle prediction set under interval censoring and develop consistent estimators for the oracle prediction intervals and prediction sets consisting of multiple disjoint intervals. We use conformal inference to construct a prediction set that achieves finite-sample validity under censoring and maintains consistency as sample size increases, using a conformity score function designed for interval data. The procedure accommodates the prediction uncertainty that is irreducible (due to the stochastic nature of outcomes), the modelling uncertainty due to partial identification and also sampling uncertainty that gets reduced as samples get larger. We conduct a set of Monte Carlo simulations and an application to data from the Current Population Survey. The results highlight the robustness and efficiency of the proposed methods.
- Discussant: Lihua Lei (Stanford University)
- Links: [Relevant papers: paper #1]

Thursday, April 2, 2026 [Recording]
- Speaker: Junu Lee (University of Pennsylvania)
- Title: Power of masking methods for adaptive testing in a multivariate normal means problem
- Abstract: Many large-scale testing procedures learn signal structure from the data to boost power. Direct data reuse can inflate Type-I error (“double dipping”), so a common remedy is masking: withholding some information during learning and using it for testing. Sample splitting masks by withholding observations for testing, while null augmentation (e.g., knockoffs or full-conformal outlier detection) masks by appending null samples or variables and withholding their identities until testing. In many settings, little is known about how the power of masking methods compares across mechanisms, across tuning choices, or against more data-efficient non-masking alternatives. We study these questions in a stylized two-groups multivariate normal means model with an unknown signal direction learned from the data. Within this testbed, we develop a transparent, unified set of asymptotic power expressions for three parallel methods differing in masking choices: a sample splitting method, a full-conformal-style null augmentation method, and an oracle in-sample benchmark. Our main findings are: (1) the augmentation method is more powerful than the splitting method with matched tuning; (2) the power-optimal number of null samples for the augmentation method is a vanishing fraction of the number of tests, in which case its power approaches that of the in-sample benchmark; and (3) for a tractable approximation to the augmentation method, the optimal number of null samples scales as the square root of the number of tests, with empirical evidence suggesting a similar scaling for the method itself. These results characterize masking-induced power trade-offs in a tractable model and suggest qualitative lessons for other settings.
- Discussant: Souhardya Sengupta (Harvard University)
- Links: [Relevant papers: paper #1]

Thursday, March 26, 2026 [Link to join]
- Speaker: Jiadong Liang (University of Pennsylvania)
- Title: Optimal training-conditional regret for online conformal prediction
- Abstract: We study online conformal prediction for non-stationary data streams subject to unknown distribution drift. While most prior work studied this problem under adversarial settings and/or assessed performance in terms of gaps of time-averaged marginal coverage, we instead evaluate performance through training-conditional cumulative regret. We specifically focus on independently generated data with two types of distribution shift: abrupt change points and smooth drift.

When non-conformity score functions are pretrained on an independent dataset, we propose a split-conformal–style algorithm that leverages drift detection to adaptively update calibration sets, which provably achieves minimax-optimal regret. When non-conformity scores are instead trained online, we develop a full-conformal–style algorithm that again incorporates drift detection to handle non-stationarity; this approach relies on stability—rather than permutation symmetry—of the model-fitting algorithm, which is often better suited to online learning under evolving environments. We establish non-asymptotic regret guarantees for our online full conformal algorithm, which match the minimax lower bound under appropriate restrictions on the prediction sets. Numerical experiments corroborate our theoretical findings.

- Discussant: Guillaume Principato (Université Paris-Saclay)
- Links: [Relevant papers: paper #1]

Thursday, March 19, 2026 [Recording]
- Speaker: Yuval Benjamini (Hebrew University)
- Title: Uncertainty intervals for ranking, with applications in machine learning
- Abstract: In many scientific and machine learning evaluations, stakeholders care more about who ranks first, second, or third than about the exact numerical scores. However, the uncertainty in these rankings, which can be substantial, is often ignored. Quantifying and communicating this uncertainty requires statistical tools specifically designed for inference on ranks.

This talk develops methods for constructing statistically meaningful intervals for ranks. For a single ranking task, confidence intervals for ranks can be derived from the pairwise tests, when these are adjusted to control the family-wise error rate. However, these intervals tend to be very conservative. We propose constructing rank intervals from false-discovery rate controlled pairwise test-families, and analyze the statistical properties and the efficiency gains of these new intervals.

When multiple ranking tasks are available for the same competitors, as in model leaderboards, we propose an aggregation framework based on prediction intervals. These intervals capture both within-task uncertainty and between-task variability, providing a unified way to quantify ranking uncertainty across tasks. We demonstrate the methods for measuring uncertainty when ranking model features by their importance, and when comparing model performances in public leaderboards.

The talk is based on joint work with Bitya Neuhof and Yoav Benjamini.

- Discussant: Skyler Wu (Stanford University)
- Links: [Slides: talk; discussion]

Thursday, March 12, 2026 [Recording]
- Speaker: Peter Hoff (Duke University)
- Title: Selective and marginal selective inference for exceptional groups
- Abstract: Statistical analyses of multipopulation studies often use the data to select a particular population as the target of inference. For example, a confidence interval may be constructed for a population only in the event that its sample mean is larger than that of the other populations. We show that for the normal means model, confidence interval procedures that maintain strict coverage control conditional on such a selection event will have infinite expected width. For applications where such selective coverage control is of interest, this result motivates the development of procedures with finite expected width and approximate selective coverage control over a range of plausible parameter values. To this end, we develop selection-adjusted empirical Bayes confidence procedures that use information from the data to approximate an oracle confidence procedure that has exact selective coverage control and finite expected width. In numerical comparisons of the oracle and empirical Bayes procedures to procedures that only guarantee selective coverage control marginally over selection events, we find that improved selective coverage control comes at the cost of increased expected interval width.
- Discussant: Asaf Weinstein (The Hebrew University of Jerusalem)
- Links: [Relevant papers: paper #1]

Thursday, March 5, 2026 [Recording]
- Speaker: Yanjun Han (New York University)
- Title: Two Roads to Empirical Bayes: Mean-Field Approximation and Universal Priors
- Abstract: In high-dimensional compound decision problems, empirical Bayes seeks to approximate the Bayes decision rule under an unknown prior governing many parameters. This perspective suggests two principled approximation strategies: either approximate the unknown prior by an i.i.d. surrogate and estimate it from the data, or replace it with a prescribed dependent surrogate and approximate its Bayes rule through pretraining.

Under the first approach, we show quantitatively that high-dimensional conditional expectations under a random permutation prior admit a sharp mean-field approximation. Applied to the classical problem of distribution estimation, this analysis yields an estimator that achieves optimal instance-wise risk in a competitive framework and ultimately bests the classical Good-Turing estimator in both theory and practice.

Under the second approach, we formalize recent empirical evidence that transformers pretrained on synthetic data perform strongly on empirical Bayes tasks. Focusing on the Poisson model, we establish the existence of universal priors under which a pretrained estimator achieves near-optimal regret uniformly over arbitrary test distributions. Our analysis interprets the pretrained estimator as performing hierarchical Bayesian inference: adaptation to unknown test priors arises through posterior contraction, and length generalization (when the test sequence exceeds the training length) corresponds to inference under a fractional posterior. Numerical experiments with pretrained transformers support these theoretical predictions.

- Discussant: Jiaying Gu (University of Toronto)
- Links: [Relevant papers: paper #1, paper #2]

Thursday, February 26, 2026 [Recording]
- Speaker: Yash Nair (Stanford University)
- Title: Diversifying Conformal Selections
- Abstract: When selecting from a list of potential candidates, it is important to ensure not only that the selected items are of high quality, but also that they are sufficiently dissimilar so as to both avoid redundancy and to capture a broader range of desirable properties. In drug discovery, scientists aim to select potent drugs from a library of unsynthesized candidates, but recognize that it is wasteful to repeatedly synthesize highly similar compounds. In job hiring, recruiters may wish to hire candidates who will perform well on the job, while also considering factors such as socioeconomic background, prior work experience, gender, or race. We study the problem of using any prediction model to construct a maximally diverse selection set of candidates while controlling the false discovery rate (FDR) in a model-free fashion. Our method, diversity-aware conformal selection (DACS), achieves this by designing a general optimization procedure to construct a diverse selection set subject to a simple constraint involving conformal e-values which depend on carefully chosen stopping times. The key idea of DACS is to use optimal stopping theory to adaptively choose the set of e-values which (approximately) maximizes the expected diversity measure. We give an example diversity metric for which our procedure can be run exactly and efficiently. We also develop a number of computational heuristics which greatly improve its running time for generic diversity metrics. We demonstrate the empirical performance of our method both in simulation and on job hiring and drug discovery datasets.
- Discussant: Ulysse Gazin (The Laboratoire de Probabilités, Statistique et Modélisation)
- Links: [Relevant papers: paper #1]

Thursday, February 19, 2026 [Recording]
- Speaker: Adam Jaffe (Columbia University)
- Title: Constrained Denoising, Empirical Bayes, and Optimal Transport
- Abstract: In the statistical problem of denoising, Bayes and empirical Bayes methods can "overshrink" their output relative to the latent variables of interest. This work is focused on constrained denoising problems which mitigate such phenomena. At the oracle level, i.e., when the latent variable distribution is assumed known, we apply tools from the theory of optimal transport to characterize the solution to (i) variance-constrained, (ii) distribution-constrained, and (iii) general-constrained denoising problems. At the empirical level, i.e., when the latent variable distribution is not known, we use empirical Bayes methodology to estimate these oracle denoisers. Our approach is modular, and transforms any suitable (unconstrained) empirical Bayes denoiser into a constrained empirical Bayes denoiser. We prove explicit rates of convergence for our proposed methodologies, which both extend and sharpen existing asymptotic results that have previously considered only variance constraints. We apply our methodology in two applications: one in astronomy concerning the relative chemical abundances in a large catalog of red-clump stars, and one in baseball concerning minor- and major-league batting skill for rookie players.
- Discussant: Jake Soloff (University of Michigan)
- Links: [Relevant papers: paper #1]

Thursday, February 12, 2026 [Recording]
- Speaker: Fei Xue (Purdue University)
- Title: High-dimensional statistical inference for linkage disequilibrium score regression and its cross-ancestry extensions
- Abstract: Linkage disequilibrium score regression (LDSC) has emerged as an essential tool for genetic and genomic analyses of complex traits, utilizing high-dimensional data derived from genome-wide association studies (GWAS). LDSC computes the linkage disequilibrium (LD) scores using an external reference panel, and integrates the LD scores with only summary data from the original GWAS. In this paper, we investigate LDSC within a fixed-effect data integration framework, underscoring its ability to merge multi-source GWAS data and reference panels. In particular, we take account of the genome-wide dependence among the high-dimensional GWAS summary statistics, along with the block-diagonal dependence pattern in estimated LD scores. Our analysis uncovers several key factors of both the original GWAS and reference panel datasets that determine the performance of LDSC. We show that it is relatively feasible for LDSC-based estimators to achieve asymptotic normality when applied to genome-wide genetic variants (e.g., in genetic variance and covariance estimation), whereas it becomes considerably challenging when we focus on a much smaller subset of genetic variants (e.g., in partitioned heritability analysis). Moreover, by modeling the disparities in LD patterns across different populations, we show that LDSC can be expanded to conduct cross-ancestry analyses using data from genetically distinct global populations. We validate our theoretical findings through extensive numerical evaluations using real genetic data from the UK Biobank study.
- Discussant: Rajarshi Mukherjee (Harvard University)
- Links: [Relevant papers: paper #1]

Thursday, February 5, 2026 [Recording]
- Speaker: Etienne Roquain (Sorbonne Université)
- Title: Online selective conformal inference: adaptive scores, convergence rate and optimality
- Abstract: In a supervised online setting, quantifying uncertainty has been proposed in the seminal work of Gibbs and Candès (2021). For any given point-prediction algorithm, their method (ACI) produces a conformal prediction set with an average missed coverage getting close to a pre-specified level α for a long time horizon. We introduce an extended version of this algorithm, called OnlineSCI, allowing the user to additionally select times where such an inference should be made. OnlineSCI encompasses several prominent online selective tasks, such as building prediction intervals for extreme outcomes, classification with abstention, and online testing. While OnlineSCI controls the average missed coverage on the selected times in an adversarial setting, our theoretical results also show that it controls the instantaneous error rate (IER) at the selected times, up to a non-asymptotic remainder term. Importantly, our theory covers the case where OnlineSCI updates the point-prediction algorithm at each time step, a property which we refer to as adaptive capability. We show that the adaptive versions of OnlineSCI can converge to an optimal solution and provide an explicit convergence rate in each of the aforementioned application cases, under specific mild conditions. Finally, the favorable behavior of OnlineSCI in practice is illustrated by numerical experiments.
- Discussant: Ying Jin (University of Pennsylvania)
- Links: [Relevant papers: paper #1] [Slides]

Thursday, January 29, 2026 [Recording]
- Speaker: Nikos Ignatiadis (University of Chicago)
- Title: Stein's unbiased risk estimate and Hyvärinen's score matching
- Abstract: Given a collection of observed signals corrupted with Gaussian noise, how can we learn to optimally denoise them? This fundamental problem arises in both empirical Bayes and generative modeling. In empirical Bayes, the predominant approach is via nonparametric maximum likelihood estimation (NPMLE), while in generative modeling, score matching (SM) methods have proven very successful. In our setting, Hyvärinen's implicit SM is equivalent to another classical idea from statistics -- Stein's Unbiased Risk Estimate (SURE). Revisiting SURE minimization, we establish, for the first time, that SURE achieves nearly parametric rates of convergence of the regret in the classical empirical Bayes setting with homoscedastic noise. We also prove that SURE-training can achieve fast rates of convergence to the oracle denoiser in a commonly studied misspecified model. In contrast, the NPMLE may not even converge to the oracle denoiser under misspecification of the class of signal distributions. We show how to practically implement our method in settings involving heteroscedasticity and side-information, such as in an application to the estimation of economic mobility in the Opportunity Atlas. Our empirical results demonstrate the superior performance of SURE-training over NPMLE under misspecification. Collectively, our findings advance SURE/SM as a strong alternative to the NPMLE for empirical Bayes problems in both theory and practice.
- Discussant: Yan Chen (Duke University)
- Links: [Relevant papers: paper #1]

Monday, December 8, 2025 [Recording]
- Speaker: Patrick Kline (UC Berkeley)
- Title: Branching Fixed Effects: A Proposal for Communicating Uncertainty
- Abstract: Economists often rely on estimates of linear fixed effects models developed by other teams of researchers. Assessing the uncertainty in these estimates can be challenging. I propose a form of sample splitting for network data that breaks two-way fixed effects estimates into statistically independent branches, each of which provides an unbiased estimate of the parameters of interest. These branches facilitate uncertainty quantification, moment estimation, and shrinkage. Algorithms are developed for efficiently extracting branches from large datasets. I illustrate these techniques using a benchmark dataset from Veneto, Italy that has been widely used to study firm wage effects.
- Discussant: Martin Weidner (University of Oxford)
- Links: [Relevant papers: paper #1]

Monday, December 1, 2025 [Recording]
- Speaker: Sanjit Dandapanthula (Carnegie Mellon University)
- Title: Multiple testing in multi-stream sequential change detection
- Abstract: Multi-stream sequential change detection involves simultaneously monitoring many streams of data and trying to detect when their distributions change, if at all. Here, we theoretically study multiple testing issues that arise from detecting changes in many streams. We point out that any algorithm with finite average run length (ARL) must have a trivial worst-case false detection rate (FDR), family-wise error rate (FWER), per-family error rate (PFER), and global error rate (GER); thus, any attempt to control these Type I error metrics is fundamentally in conflict with the desire for a finite ARL (which is typically necessary in order to have a small detection delay). One of our contributions is to define a new class of metrics which can be controlled, called error over patience (EOP). We propose algorithms that combine the recent e-detector framework (which generalizes the Shiryaev-Roberts and CUSUM methods) with the recent e-Benjamini-Hochberg procedure and e-Bonferroni procedures. We prove that these algorithms control the EOP at any desired level under very general dependence structures on the data within and across the streams. In fact, we prove a more general error control that holds uniformly over all stopping times and provides a smooth trade-off between the conflicting metrics. Additionally, if finiteness of the ARL is forfeited, we show that our algorithms control the worst-case Type I error.
- Discussant: Anamitra Chaudhuri (The University of Texas at Austin)
- Links: [Relevant papers: paper #1]

Monday, November 24, 2025 [Recording]
- Speaker: Wanrong Zhu (UC Irvine)
- Title: Conformal prediction after data-dependent model selection
- Abstract: Given a family of pretrained models and a hold-out set, how can we construct a valid conformal prediction set while selecting a model that minimizes the width of the set? If we use the same hold-out data set both to select a model (the model that yields the smallest conformal prediction sets) and then to construct a conformal prediction set based on that selected model, we suffer a loss of coverage due to selection bias. Alternatively, we could further split the data to perform selection and calibration separately, but this comes at a steep cost if the size of the dataset is limited. In this paper, we address the challenge of constructing a valid prediction set after data-dependent model selection -- commonly, selecting the model that minimizes the width of the resulting prediction sets. Our novel methods can be implemented efficiently and admit finite-sample validity guarantees without invoking additional sample-splitting. We show that our methods yield prediction sets with asymptotically optimal width under a certain notion of regularity for the model class. The improvement in the width of the prediction sets constructed by our methods is further demonstrated through applications to synthetic datasets in various settings and a real data example.
- Discussant: Ran Xie (Stanford University)
- Links: [Relevant papers: paper #1]

Wednesday, November 19, 2025 (200th ISSI seminar) [Recording]
- Speaker: Emmanuel Candès (Stanford University)
- Title: What Statistics and AI Offer Each Other?
- Abstract: This talk will discuss how thinking carefully about AI inputs and outputs yields more powerful, safer AI. By examining several vignettes, we shall answer questions such as: How do we train language models under cost constraints? What happens when we’ve exhausted all available data? If I start a clinical trial using the drug AI thinks is best, will it pan out? How can we ensure high-quality products when AI is used in a larger workflow? That is, how do I know whether AI automated a task correctly? AI-powered predictions are beginning to substitute for real data when collection of the latter is difficult, slow, or costly. How then should we leverage machine learning predictions both as a substitute for high-quality data and as a tool for guiding real data collection?

Monday, November 3, 2025 [Recording]
- Speaker: Lucas Janson (Harvard University)
- Title: Chiseling: Powerful and Valid Subgroup Selection via Interactive Machine Learning
- Abstract: In regression and causal inference, controlled subgroup selection aims to identify, with inferential guarantees, a subgroup (defined as a subset of the covariate space) on which the average response or treatment effect is above a given threshold. E.g., in a clinical trial, it may be of interest to find a subgroup with a positive average treatment effect. However, existing methods either lack inferential guarantees, heavily restrict the search for the subgroup, or sacrifice efficiency by naive data splitting. We propose a novel framework called chiseling that allows the analyst to interactively refine and test a candidate subgroup by iteratively shrinking it. The sole restriction is that the shrinkage direction only depends on the points outside the current subgroup, but otherwise the analyst may leverage any prior information or machine learning algorithm. Despite this flexibility, chiseling controls the probability that the discovered subgroup is null (e.g., has a non-positive average treatment effect) under minimal assumptions: for example, in randomized experiments, this inferential validity guarantee holds under only bounded moment conditions. When applied to a variety of simulated datasets and a real survey experiment, chiseling identifies substantially better subgroups than existing methods with inferential guarantees. This is joint work with Nathan Cheng and Asher Spector.
- Discussant: Jann Spiess (Stanford University)
- Links: [Relevant papers: paper #1]

Monday, October 27, 2025 [Recording]
- Speaker: Aaditya Ramdas (Carnegie Mellon University)
- Title: Locally minimax optimal confidence sets for the best model
- Abstract: This paper tackles a fundamental inference problem: given $n$ observations from a distribution $P$ over $\mathbb{R}^d$ with unknown mean $\mu$, we must form a confidence set for the index (or indices) corresponding to the smallest component of $\mu$. By duality, we reduce this to testing, for each $r \in \{1, \dots, d\}$, whether $\mu_r$ is the smallest. Based on the sample splitting and self-normalization approach of Kim and Ramdas (2024), we propose “dimension-agnostic” tests that maintain validity regardless of how $d$ scales with $n$, and regardless of arbitrary ties in $\mu$. Notably, our validity holds under mild moment conditions, requiring little more than finiteness of a second moment, and permitting possibly strong dependence between coordinates. In addition, we establish the local minimax separation rate for this problem, which adapts to the cardinality of a confusion set, and show that the proposed tests attain this rate. Furthermore, we develop robust variants that continue to achieve the same minimax rate under heavy-tailed distributions with only finite second moments. While these results highlight the theoretical strength of our method, a practical concern is that sample splitting can reduce finite-sample power. We show that this drawback can be substantially alleviated by the multi-split aggregation method of Guo and Shah (2025). Finally, empirical results on simulated and real data illustrate the strong performance of our approach in terms of type I error control and power compared to existing methods.
- Discussant: Lihua Lei (Stanford University)
- Links: [Relevant papers: paper #1]

Tuesday, June 3, 2025 [Recording]
- Speaker: Ziang Niu (University of Pennsylvania)
- Title: Assumption-lean weak limits and tests for two-stage adaptive experiments
- Abstract: Adaptive experiments are becoming increasingly popular in real-world applications for effectively maximizing in-sample welfare and efficiency by data-driven sampling. Despite their growing prevalence, however, the statistical foundations for valid inference in such settings remain underdeveloped. Focusing on two-stage adaptive experimental designs, we address this gap by deriving new weak convergence results for mean outcomes and their differences. In particular, our results apply to a broad class of estimators, weighted inverse probability weighting (WIPW) estimators. In contrast to prior work, our results require significantly weaker assumptions and sharply characterize phase transitions in limiting behavior across different signal regimes. Through this common lens, our general results unify previously fragmented results under the two-stage setup. To address potential non-normal limiting behavior, we propose a computationally efficient and provably valid plug-in bootstrap method for hypothesis testing. Our results and approaches are sufficiently general to accommodate various adaptive experimental designs, including batched bandit and subgroup enrichment experiments. Simulations and semi-synthetic studies demonstrate the practical value of our approach, revealing statistical phenomena unique to adaptive experiments.
- Discussant: Zijun Gao (University of Southern California)
- Links: [Relevant papers: paper #1][Slides: talk; discussion]

Tuesday, May 27, 2025 [Recording]
- Speaker: Neil Xu (Carnegie Mellon University)
- Title: Bringing Closure to FDR Control With a Uniform Improvement of the e-Benjamini-Hochberg Procedure
- Abstract: We present a novel necessary and sufficient principle for multiple testing methods. This principle asserts that every multiple testing method is a special case of a general closed testing procedure based on e-values. It generalizes the standard closure principle, known to underlie all methods controlling familywise error and tail probabilities of false discovery proportions, to a large class of error rates --- in particular, this generalized closure principle applies to methods controlling the false discovery rate (FDR). By writing existing methods as special cases of this procedure, we can achieve uniform improvements of these methods, and we show this in particular for the eBH and the BY procedures, as well as the self-consistent method of Su (2018). We also show that methods derived using the closure principle have several valuable properties. They generally control their error rate not just for one rejected set, but simultaneously over many, allowing post hoc flexibility for the researcher. Moreover, we show that because all multiple testing methods for all error rates are special cases of the same procedure, researchers may even choose the target error rate post hoc. Under certain conditions, this flexibility even extends to post hoc choice of the nominal error rate. In addition, the closure principle allows methods to exploit logical relationships between hypotheses to gain power.

This is joint work with Aldo Solari, Lasse Fischer, Rianne de Heide, Aaditya Ramdas, and Jelle Goeman.

- Discussant: Junu Lee (University of Pennsylvania)
- Links: [Relevant papers: paper #1, paper #2][Slides]

Tuesday, May 20, 2025 [Recording]
- Speaker: Aytijhya Saha (Indian Statistical Institute)
- Title: Post-detection inference for sequential changepoint localization
- Abstract: This talk will focus on a fundamental but largely unexplored challenge in sequential changepoint analysis: conducting inference following a detected change. We study the problem of localizing the changepoint using only the data observed up to a data-dependent stopping time at which a sequential detection algorithm $\mathcal A$ declares a change. We first construct confidence sets for the unknown changepoint when pre- and post-change distributions are assumed to be known. We then extend our framework to composite pre- and post-change scenarios. We impose no conditions on the observation space or on $\mathcal A$ --- we only need to be able to run $\mathcal A$ on simulated data sequences. In summary, this work offers both theoretically sound and practically effective tools for sequential changepoint localization.
- Discussant: Yao Xie (Georgia Institute of Technology)
- Links: [Relevant papers: paper #1]

Tuesday, May 13, 2025 [Recording]
- Speaker: Hongjian Wang (Carnegie Mellon University)
- Title: Anytime-valid FDR control with the stopped e-BH procedure
- Abstract: The recent e-Benjamini-Hochberg (e-BH) procedure for multiple hypothesis testing is known to control the false discovery rate (FDR) under arbitrary dependence between the input e-values. This paper points out an important subtlety when applying the e-BH procedure with e-processes, which are sequential generalizations of e-values (where the data are observed sequentially). Since adaptively stopped e-processes are e-values, the e-BH procedure can be repeatedly applied at every time step, and one can continuously monitor the e-processes and the rejection sets obtained. One would hope that the "stopped e-BH procedure" (se-BH) has an FDR guarantee for the rejection set obtained at any stopping time. However, while this is true if the data in different streams are independent, it is not true in full generality, because each stopped e-process is an e-value only for stopping times in its own local filtration, but the se-BH procedure employs a stopping time with respect to a global filtration. This can cause information to leak across time, allowing one stream to know its future by knowing past data of another stream. This paper formulates a simple causal condition under which local e-processes are also global e-processes and thus the se-BH procedure does indeed control the FDR. The condition excludes unobserved confounding from the past and is met under most reasonable scenarios including genomics.
- Discussant: Yo Joong Choe (University of Chicago)
- Links: [Relevant papers: paper #1][Slides: discussion]

Tuesday, May 6, 2025 [Recording]
- Speaker: Li Ma (Duke University)
- Title: Tree-Based Generative Modeling through the Lens of Selective Inference
- Abstract: In this talk, I will explore how tree structures and partitions can be leveraged to model complex data distributions. I show that effective inference in these tree-based generative models fundamentally relies on addressing two forms of selective inference. The first arises from selecting the tree or partition of the sample space, which is crucial for capturing the structural characteristics of multivariate distributions. The second involves selection in assigning probabilities along sequential splits of the sample space, enabling adaptation to potentially heterogeneous smoothness in the underlying density. I will present a suite of modeling and inference techniques within a probabilistic modeling and likelihood-based inference framework, designed to systematically address these selection challenges. These methods facilitate efficient learning of partition trees, impose appropriate regularization on probability assignments, and enable data-adaptive reduction of the tree space on which selection occurs. Together, they provide a principled approach to enhancing the flexibility and interpretability of tree-based generative models.
- Discussant: Qian Zhao (University of Massachusetts, Amherst)
- Links: [Relevant papers: paper #1]

Tuesday, April 29, 2025 [Recording]
- Speaker: Yusuf Sale (Ludwig Maximilian University of Munich)
- Title: Online Selective Conformal Prediction: Errors and Solutions
- Abstract: In online selective conformal inference, data arrives sequentially, and prediction intervals are constructed only when an online selection rule is met. Since online selections may break the exchangeability between the selected test datum and the rest of the data, one must correct for this by suitably selecting the calibration data. We evaluate existing calibration selection strategies and pinpoint some fundamental errors in the associated claims that guarantee selection-conditional coverage and control of the false coverage rate (FCR). To address these shortcomings, we propose novel calibration selection strategies that provably preserve the exchangeability of the calibration data and the selected test datum. Consequently, we demonstrate that online selective conformal inference with these strategies guarantees both selection-conditional coverage and FCR control. Our theoretical findings are supported by experimental evidence examining tradeoffs between valid methods.
- Discussant: Yajie Bao (Nankai University)
- Links: [Relevant papers: paper #1]

Tuesday, April 22, 2025 [Recording]
- Speaker: Ying Jin (Harvard University)
- Title: Automated Hypothesis Validation with Agentic Sequential Falsifications
- Abstract: Hypotheses are central to information acquisition, decision-making, and discovery. However, many real-world hypotheses are abstract, high-level statements that are difficult to validate directly. This challenge is further intensified by the rise of hypothesis generation from Large Language Models (LLMs), which are prone to hallucination and produce hypotheses in volumes that make manual validation impractical. Here we propose POPPER, an agentic framework for rigorous automated validation of free-form hypotheses. Guided by Karl Popper’s principle of falsification, POPPER validates a hypothesis using LLM agents that design and execute falsification experiments targeting its measurable implications. We employ a sequential testing framework to ensure strict Type-I error control while actively gathering evidence from diverse observations, whether drawn from existing data or newly conducted procedures. We demonstrate POPPER on six domains including biology, economics, and sociology. POPPER delivers robust error control, high power, and scalability. Furthermore, compared to human scientists, POPPER achieved comparable performance in validating complex biological hypotheses while reducing time tenfold, providing a scalable, rigorous solution for hypothesis validation.
- Discussant: Haokun Liu (University of Chicago)
- Links: [Relevant papers: paper #1]

Tuesday, April 15, 2025 [Recording]
- Speaker: Maximilian Kasy (University of Oxford)
- Title: Optimal Pre-Analysis Plans: Statistical Decisions Subject to Implementability
- Abstract: What is the purpose of pre-analysis plans, and how should they be designed? We model the interaction between an agent who analyzes data and a principal who makes a decision based on agent reports. The agent could be the manufacturer of a new drug, and the principal a regulator deciding whether the drug is approved. Or the agent could be a researcher submitting a research paper, and the principal an editor deciding whether it is published. The agent decides which statistics to report to the principal. The principal cannot verify whether the analyst reported selectively. Absent a pre-analysis message, if there are conflicts of interest, then many desirable decision rules cannot be implemented. Allowing the agent to send a message before seeing the data increases the set of decision rules that can be implemented, and allows the principal to leverage agent expertise. The optimal mechanisms that we characterize require pre-analysis plans. Applying these results to hypothesis testing, we show that optimal rejection rules pre-register a valid test, and make worst-case assumptions about unreported statistics. Optimal tests can be found as a solution to a linear-programming problem.
- Discussant: Jonathan Libgober (University of Southern California)
- Links: [Relevant papers: paper #1]

Tuesday, April 8, 2025 [Recording]
- Speaker: William Hartog (Stanford University)
- Title: Family-wise Error Rate Control with E-values
- Abstract: The closure principle is a standard tool for achieving family-wise error rate (FWER) control in multiple testing problems. In general, the computational cost for closed testing can be exponential in the number of hypotheses. The celebrated graphical approach of FWER control [Bretz et al., 2009] overcomes the computational hurdle by using weighted Bonferroni local tests on p-values with appropriately chosen weights. In this study, we extend the graphical approach to e-values. With valid e-values – common in settings of sequential hypothesis testing or universal inference for irregular parametric models – we can derive strictly more powerful local tests based on weighted averages of e-values. Consequently, this e-value-based closed test is more powerful than the corresponding graphical approach with inverse e-values as p-values. Although the computational shortcuts for the p-value-based graphical approach are not applicable, we develop efficient polynomial-time algorithms using dynamic programming for e-value-based graphical approaches with any directed acyclic graph. For special graphs, such as those used in Holm’s procedure and the fallback procedure, we develop tailored algorithms with computational cost linear in the number of hypotheses, up to logarithmic factors.
- Discussant: Ruodu Wang (University of Waterloo)
- Links: [Relevant papers: paper #1]

Tuesday, April 1, 2025 [Recording]
- Speaker: Sida Li (University of Chicago)
- Title: Prediction-Powered Adaptive Shrinkage Estimation
- Abstract: Prediction-Powered Inference (PPI) is a powerful framework for enhancing statistical estimates by combining limited gold-standard data with machine learning (ML) predictions. While prior work has demonstrated PPI's benefits for individual statistical tasks, modern applications require answering numerous parallel statistical questions. We introduce Prediction-Powered Adaptive Shrinkage (PAS), a method that bridges PPI with empirical Bayes shrinkage to improve the estimation of multiple means. PAS debiases noisy ML predictions within each task and then borrows strength across tasks by using those same predictions as a reference point for shrinkage. The amount of shrinkage is determined by minimizing an unbiased estimate of risk, and we prove that this tuning strategy is asymptotically optimal. Experiments on both synthetic and real-world datasets show that PAS adapts to the reliability of the ML predictions and outperforms traditional and modern baselines in large-scale applications.
- Discussant: Dan Kluger (MIT)
- Links: [Relevant papers: paper #1]

Tuesday, March 25, 2025 [Recording]
- Speaker: Andreas Petrou-Zeniou (MIT)
- Title: Inference on Multiple Winners with Applications to Economic Mobility
- Abstract: While policymakers and researchers are often concerned with conducting inference based on a data-dependent selection, a strictly larger class of inference problems arises when considering multiple data-dependent selections, such as when selecting on statistical significance or quantiles. Given this, we study the problem of conducting inference on populations selected according to their ranks, which we dub the inference on multiple winners problem. In this setting, we encounter both selective and simultaneous inference problems, making existing approaches either not applicable or too conservative. Instead, we propose a novel, two-step approach to the inference on multiple winners problem, with the first step modeling a key nuisance parameter driving selection, and the second step using this model to derive critical values on the errors of the winners. In simulations, our two-step approach reduces over-coverage error by up to 96% relative to existing approaches. In a stylized example on job training, we demonstrate that existing approaches partially apply, and that our novel two-step approach is broadly applicable and yields informative confidence sets. In a second application, we apply our two-step approach to revisit the winner's curse in the Creating Moves to Opportunity (CMTO) program. We find that, after correcting for the inference on multiple winners problem, we fail to reject the possibility of null effects in the majority of census tracts selected by the CMTO program.
- Discussant: Sarah Moon (MIT)
- Links: [Relevant papers: paper #1][Slides]

Tuesday, March 18, 2025 [Recording]
- Speaker: Lan Gao (The University of Tennessee Knoxville)
- Title: Asymptotic FDR Control with Model-X Knockoffs: Is Moments Matching Sufficient?
- Abstract: We propose a unified theoretical framework for studying the robustness of the model-X knockoffs framework by investigating the asymptotic false discovery rate (FDR) control of the practically implemented approximate knockoffs procedure. This procedure deviates from the model-X knockoffs framework by substituting the true covariate distribution with a user-specified distribution that can be learned using in-sample observations. By replacing the distributional exchangeability condition of the model-X knockoff variables with three conditions on the approximate knockoff statistics, we establish that the approximate knockoffs procedure achieves asymptotic FDR control. Using our unified framework, we further prove that arguably the most widely used knockoff variable generation method, the Gaussian knockoffs generator based on matching the first two moments, achieves asymptotic FDR control when the two-moment-based knockoff statistics are employed in the knockoffs inference procedure. For the first time in the literature, our theoretical results formally justify the effectiveness and robustness of the Gaussian knockoffs generator. Simulations and real data examples are used to validate the theoretical findings.
- Discussant: Abhinav Chakraborty (University of Pennsylvania)
- Links: [Relevant papers: paper #1]

Tuesday, March 11, 2025 [Recording]
- Speaker: Jelle Goeman (Leiden University)
- Title: Should we use spatial information in closed testing for neuroimaging data?
- Abstract: In functional neuroimaging, researchers aim to find regions of the brain that respond to a certain stimulus. Dividing the brain into around 200,000 voxels (3d pixels), a p-value can be calculated per voxel for the null hypothesis that there is no response. For addressing the resulting multiple testing problem, closed testing procedures are well-suited, since they give a simultaneous 95% confidence lower bound to the proportion of active voxels in all regions, giving statistical rigor and researcher flexibility. When constructing such a closed testing procedure, it seems sensible to take into account the spatial nature of the brain. P-values of voxels close together are highly correlated, regions of interest generally consist of connected collections of voxels, and the signal is expected to cluster spatially. The standard regionwise family-wise error controlling method in neuroimaging, cluster extent inference, therefore explicitly takes connectedness into account. In this talk we compare two closed testing methods. The first one is constructed as a uniform improvement of cluster extent inference and takes the location of each p-value in the brain into account. The second one ignores this spatial information, but is otherwise very similar. Paradoxically, we find that the method that discards the spatial information tends to give larger activation proportions.
- Discussant: Anna Vesely (University of Bologna)
- Links: [Relevant papers: paper #1, paper #2]

Tuesday, March 4, 2025 [Recording]
- Speaker: Snigdha Panigrahi (University of Michigan)
- Title: Inference with Randomized Regression Trees
- Abstract: Regression trees are a widely used machine learning algorithm that fit piecewise constant models by recursively partitioning the predictor space. In this talk, I will introduce Randomized Regression Trees (RRT), a novel selective inference method that enables statistical inference in a data-dependent model derived from the fitted tree. The RRT method achieves this by adding independent Gaussian noise to the gain function underlying the splitting rules of classical regression trees.

The RRT method offers several advantages. First, it utilizes the added randomization to obtain an exact pivot using the full dataset, while accounting for the data-dependent structure of the fitted tree. Second, with a small amount of randomization, the RRT method achieves predictive accuracy similar to a model trained on the entire dataset. At the same time, it provides significantly more powerful inference than data splitting methods, which rely only on a held-out portion of the data for inference. Third, unlike data splitting approaches, it yields intervals that adapt to the signal strength in the data. Throughout this talk, I will demonstrate how RRT transforms a purely predictive algorithm into a method capable of performing reliable and powerful inference in the fitted tree model.

- Discussant: Anna Neufeld (Williams College)
- Links: [Relevant papers: paper #1]

Tuesday, February 25, 2025 [Recording]
- Speaker: Keisuke Hirano (The Pennsylvania State University)
- Title: Asymptotic Representations for Sequential Decisions, Adaptive Experiments, and Batched Bandits
- Abstract: We develop asymptotic approximation results that can be applied to sequential estimation and inference problems, adaptive randomized controlled trials, and other statistical decision problems that involve multiple decision nodes with structured and possibly endogenous information sets. Our results extend the classic asymptotic representation theorem used extensively in efficiency bound theory and local power analysis. In adaptive settings where the decision at one stage can affect the observation of variables in later stages, we show that a limiting data environment characterizes all limit distributions attainable through a joint choice of an adaptive design rule and statistics applied to the adaptively generated data, under local alternatives. We illustrate how the theory can be applied to study the choice of adaptive rules and end-of-sample statistical inference in batched (groupwise) sequential adaptive experiments.
- Discussant: Kevin Chen (Stanford University)
- Links: [Relevant papers: paper #1][Slides]

Tuesday, February 18, 2025 [Recording]
- Speaker: Anav Sood (Stanford University)
- Title: Selective inference is easier with p-values
- Abstract: Selective inference is a subfield of statistics that enables valid inference after selection of a data-dependent question. In this paper, we introduce selectively dominant p-values, a class of p-values that allow practitioners to easily perform inference after arbitrary selection procedures. Unlike a traditional p-value, whose distribution must stochastically dominate the uniform distribution under the null, a selectively dominant p-value must have a post-selection distribution that stochastically dominates that of a uniform having undergone the same selection process; moreover, this property must hold simultaneously for all possible selection processes. Despite the strength of this condition, we show that all commonly used p-values (e.g., p-values from two-sided testing in parametric families, one-sided testing in monotone likelihood ratio and exponential families, F-tests for linear regression, and permutation tests) are selectively dominant. By recasting two canonical selective inference problems—inference on winners and rank verification—in our selective dominance framework, we provide simpler derivations, a deeper conceptual understanding, and new generalizations and variations of these methods. Additionally, we use our insights to introduce selective variants of methods that combine p-values, such as Fisher's combination test.
- Discussant: Maximilian Kasy (University of Oxford)
- Links: [Relevant papers: paper #1][Slides: talk; discussion]

Tuesday, February 11, 2025 [Recording]
- Speaker: Daniel Xiang (University of Chicago)
- Title: A frequentist local false discovery rate
- Abstract: The local false discovery rate (lfdr) of Efron et al. (2001) enjoys major conceptual and decision-theoretic advantages over the false discovery rate (FDR) as an error criterion in multiple testing, but is only well-defined in Bayesian models where the truth status of each null hypothesis is random. We define a frequentist counterpart to the lfdr based on the relative frequency of nulls at each point in the sample space. The frequentist lfdr is defined without reference to any prior, but preserves several important properties of the Bayesian lfdr: For continuous test statistics, lfdr(t) gives the probability, conditional on observing some statistic equal to t, that the corresponding null hypothesis is true. Evaluating the lfdr at an individual test statistic also yields a calibrated forecast of whether its null hypothesis is true. Finally, thresholding the lfdr at 1/(1+c) gives the best separable rejection rule under the weighted classification loss where Type I errors are c times as costly as Type II errors. The lfdr can be estimated efficiently using parametric or non-parametric methods, and a closely related error criterion can be provably controlled in finite samples under independence assumptions. Whereas the FDR measures the average quality of all discoveries in a given rejection region, our lfdr measures how the quality of discoveries varies across the rejection region, allowing for a more fine-grained analysis. This is joint work with Jake Soloff and Will Fithian.
- Discussant: Asaf Weinstein (Hebrew University, Jerusalem)
- Links: [Relevant papers: paper #1]

Tuesday, February 4, 2025 [Recording]
- Speaker: Sifan Liu (Flatiron Institute)
- Title: Cross-Validation with Antithetic Gaussian Randomization
- Abstract: We introduce a method for performing cross-validation without sample splitting. Our approach constructs train-test data pairs using externally generated Gaussian randomization variables, drawing inspiration from recent randomization techniques such as data-fission and data-thinning. The key innovation lies in a carefully designed correlation structure among these randomization variables, referred to as antithetic Gaussian randomization. This correlation is crucial in maintaining a bounded variance while allowing the bias of the estimator to vanish, offering an additional advantage over standard cross-validation, whose performance depends heavily on the bias-variance tradeoff dictated by the number of folds. We provide a theoretical analysis of the mean squared error of the proposed estimator, proving that as the level of randomization decreases to zero, the bias converges to zero, while the variance remains bounded and decays linearly with the number of repetitions. This analysis highlights the benefits of the antithetic Gaussian randomization over independent randomization. The method is well-suited for problems where traditional sample splitting is infeasible, such as when data are not assumed to be independently and identically distributed. Even in scenarios where sample splitting is possible, our method offers a computationally efficient alternative for estimating prediction error, achieving comparable or even lower error than standard cross-validation at a significantly reduced computational cost. This is based on joint work with Snigdha Panigrahi and Jake A. Soloff.
- Discussant: Jing Lei (Carnegie Mellon University)
- Links: [Relevant papers: paper #1]

Tuesday, January 28, 2025 [Recording]
- Speaker: Zhimei Ren (University of Pennsylvania)
- Title: Confidence on the Focal: Conformal Prediction with Selection-Conditional Coverage
- Abstract: Conformal prediction builds marginally valid prediction intervals that cover the unknown outcome of a randomly drawn test point with a prescribed probability. However, in practice, data-driven methods are often used to identify specific test unit(s) of interest, requiring uncertainty quantification tailored to these focal units. In such cases, marginally valid conformal prediction intervals may fail to provide valid coverage for the focal unit(s) due to selection bias. In this talk, I will present a general framework for constructing a prediction set with finite-sample exact coverage, conditional on the unit being selected by a given procedure. The general form of our method accommodates arbitrary selection rules that are invariant to the permutation of the calibration units, and generalizes Mondrian Conformal Prediction to multiple test units and non-equivariant classifiers. We then work out computationally efficient implementations of our framework for a number of realistic selection rules, including top-K selection, optimization-based selection, selection based on conformal p-values, and selection based on properties of preliminary conformal prediction sets. The performance of our methods is demonstrated via applications in drug discovery and health risk prediction.
- Discussant: Ziyi Liang (UC Irvine)
- Links: [Relevant papers: paper #1]

Monday, December 9, 2024 [Recording]
- Speaker: Muriel Pérez-Ortiz (Eindhoven University of Technology)
- Title: E-statistics, group invariance and anytime-valid testing
- Abstract: We study worst-case-growth-rate-optimal (GROW) e-statistics for hypothesis testing between two group models. It is known that under a mild condition on the action of the underlying group G on the data, there exists a maximally invariant statistic. We show that among all e-statistics, invariant or not, the likelihood ratio of the maximally invariant statistic is GROW, both in the absolute and in the relative sense, and that an anytime-valid test can be based on it. The GROW e-statistic is equal to a Bayes factor with a right Haar prior on G. Our treatment avoids nonuniqueness issues that sometimes arise for such priors in Bayesian contexts. A crucial assumption on the group G is its amenability, a well-known group-theoretical condition, which holds, for instance, in scale-location families. Our results also apply to finite-dimensional linear regression.
- Discussant: Junu Lee (University of Pennsylvania)
- Links: [Relevant papers: paper #1]

Monday, December 2, 2024 [Recording]
- Speaker: Ariane Marandon (Turing Institute)
- Title: Two-sided conformalized survival analysis
- Abstract: In this work, we consider the problem of generating prediction intervals for survival times. In classification/regression, prediction intervals with guaranteed coverage can be constructed with conformal prediction (CP); however, classical CP cannot be directly applied for predicting survival times due to censoring.

Previously, Candès et al. (2023) introduced a novel method based on CP to generate valid and efficient lower predictive bounds on survival times. This paper considers a different problem: that of generating an upper predictive bound (in addition to a lower predictive bound). We propose a new method using CP that generates two-sided or one-sided prediction intervals for survival times. Specifically, the method provides both lower and upper predictive bounds for individuals deemed sufficiently similar to the non-censored population, while returning only a lower bound for others. The prediction intervals offer finite-sample coverage guarantees, requiring no distributional assumptions other than that the sampled data points are independent and identically distributed. The performance of the procedure is assessed using both synthetic and real-world datasets. Joint work with Chris Holmes (Department of Statistics, Oxford University).

- Discussant: Rohan Hore (University of Chicago)
- Links: [Relevant papers: paper #1]

Monday, November 25th, 2024 [Recording]
- Speaker: Iqraa Meah (Centre for Research in Epidemiology and Statistics)
- Title: False discovery proportion envelopes with m-consistency
- Abstract: In this talk, I will present new non-asymptotic confidence envelopes for the false discovery proportion (FDP) in a multiple testing scenario called the "preordered setting" introduced by Katsevich and Ramdas (2020). This setting involves p-values that arrive in a pre-specified order, exemplified by the Barber and Candès (2015) Knockoff procedure, where p-values are derived from a high-dimensional linear regression model and ordered using ancillary statistics independent of the tests.

In this setting, our emphasis is on obtaining FDP confidence bounds that both have non-asymptotic coverage and are asymptotically accurate in a specific sense, as the number m of tested hypotheses grows. Namely, we introduce and study the property (which we call m-consistency) that the confidence bound converges to or below the desired level α when applied to a specific reference α-level false discovery rate (FDR) controlling procedure.

With this perspective in mind, we derive new bounds that provide improvements over existing ones, both theoretically and practically, and are suitable for situations where at least a moderate number of rejections is expected. In particular, the improvement is significant for knockoff p-values, which shows the impact of the method for a practical use. These improvements are illustrated with numerical experiments and real data examples.

- Discussant: Eugene Katsevich (University of Pennsylvania)
- Links: [Relevant papers: paper #1]

Monday, November 18th, 2024 [Recording]
- Speaker: Ian Waudby-Smith (UC Berkeley)
- Title: Distribution-uniform anytime-valid inference and sequential tests of conditional independence without Model-X
- Abstract: Are asymptotic confidence sequences and anytime p-values uniformly valid for a nontrivial class of distributions P? We give a positive answer to this question by deriving distribution-uniform anytime-valid inference procedures. Historically, anytime-valid methods -- including confidence sequences, anytime p-values, and sequential hypothesis tests that enable inference at stopping times -- have been justified nonasymptotically. Nevertheless, asymptotic procedures such as those based on the central limit theorem occupy an important part of the statistical toolbox due to their simplicity, universality, and weak assumptions. While recent work has derived asymptotic analogues of anytime-valid methods with the aforementioned benefits, these were not shown to be P-uniform, meaning that their asymptotics are not uniformly valid in a class of distributions P. Indeed, the anytime-valid inference literature currently has no central limit theory to draw from that is both uniform in P and in the sample size n. This paper fills that gap by deriving a novel P-uniform strong Gaussian approximation theorem. We apply some of these results to obtain an anytime-valid test of conditional independence without the Model-X assumption.
- Discussant: Peter Grünwald (Centrum Wiskunde & Informatica and Leiden University)
- Links: [Relevant papers: paper #1, paper #2, paper #3]

Monday, November 11th, 2024 [No recording is available for this seminar]
- Speaker: Anru Zhang (Duke University)
- Title: Functional post-clustering selective inference with applications to EHR
- Abstract: In electronic health records (EHR) analysis, clustering patients according to patterns in their data is crucial for uncovering new subtypes of diseases. Existing medical literature often relies on classical hypothesis testing methods to test for differences in means between these clusters. Due to selection bias induced by clustering algorithms, the implementation of these classical methods on post-clustering data often leads to an inflated type-I error. In this paper, we introduce a new statistical approach that adjusts for this bias when analyzing data collected over time. Our method extends classical selective inference methods for cross-sectional data to longitudinal data. We provide theoretical guarantees for our approach with upper bounds on the selective type-I and type-II errors. We apply the method to simulated data and real-world Acute Kidney Injury (AKI) EHR datasets, thereby illustrating the advantages of our approach.
- Discussant: Yiqun Chen (Stanford University)
- Links: [Relevant papers: paper #1]

Monday, November 4th, 2024 [Recording]
- Speaker: Ziang Niu (University of Pennsylvania)
- Title: Computationally efficient and statistically accurate conditional independence testing with spaCRT
- Abstract: We introduce the saddlepoint approximation-based conditional randomization test (spaCRT), a novel conditional independence test that effectively balances statistical accuracy and computational efficiency, inspired by applications to single-cell CRISPR screens. Resampling-based methods like the distilled conditional randomization test (dCRT) offer statistical precision but at a high computational cost. The spaCRT leverages a saddlepoint approximation to the resampling distribution of the dCRT test statistic, achieving very similar finite-sample statistical performance with significantly reduced computational demands. We prove that the spaCRT p-value approximates the dCRT p-value with vanishing relative error, and that these two tests are asymptotically equivalent. Through extensive simulations and real data analysis, we demonstrate that the spaCRT controls Type-I error and maintains high power, outperforming other asymptotic and resampling-based tests. Our method is particularly well-suited for large-scale single-cell CRISPR screen analyses, facilitating the efficient and accurate assessment of perturbation-gene associations.
- Discussant: Molei Liu (Columbia University)
- Links: [Relevant papers: paper #1]

Monday, October 28th, 2024 [Recording]
- Speaker: Nikos Ignatiadis (University of Chicago)
- Title: Empirical partially Bayes multiple testing and compound χ2 decisions
- Abstract: A common task in high-throughput biology is to screen for associations across thousands of units of interest, e.g., genes or proteins. Often, the data for each unit are modeled as Gaussian measurements with unknown mean and variance and are summarized as per-unit sample averages and sample variances. The downstream goal is multiple testing for the means. In this domain, it is routine to "moderate" (that is, to shrink) the sample variances through parametric empirical Bayes methods before computing p-values for the means. Such an approach is asymmetric in that a prior is posited and estimated for the nuisance parameters (variances) but not the primary parameters (means). Our work initiates the formal study of this paradigm, which we term "empirical partially Bayes multiple testing." In this framework, if the prior for the variances were known, one could proceed by computing p-values conditional on the sample variances -- a strategy called partially Bayes inference by Sir David Cox. We show that these conditional p-values satisfy an Eddington/Tweedie-type formula and are approximated at nearly-parametric rates when the prior is estimated by nonparametric maximum likelihood. The estimated p-values can be used with the Benjamini-Hochberg procedure to guarantee asymptotic control of the false discovery rate. Even in the compound setting, wherein the variances are fixed, the approach retains asymptotic type-I error guarantees.
- Discussant: Jelle Goeman (Leiden University)
- Links: [Relevant papers: paper #1]

Monday, October 21st, 2024 [Recording]
- Speaker: Kai Zhang (University of North Carolina at Chapel Hill)
- Title: BET and BELIEF
- Abstract: We study the problem of distribution-free dependence detection and modeling through the new framework of binary expansion statistics (BEStat). The binary expansion testing (BET) avoids the problem of non-uniform consistency and improves upon a wide class of commonly used methods (a) by achieving the minimax rate in sample size requirement for reliable power and (b) by providing clear interpretations of global relationships upon rejection of independence. The binary expansion approach also connects the symmetry statistics with the current computing system to facilitate efficient bitwise implementation. Modeling with the binary expansion linear effect (BELIEF) is motivated by the fact that two linearly uncorrelated binary variables must also be independent. Inferences from BELIEF are easily interpretable because they describe the association of binary variables in the language of linear models, yielding convenient theoretical insight and striking parallels with the Gaussian world. With BELIEF, one may study generalized linear models (GLM) through transparent linear models, providing insight into how modeling is affected by the choice of link. We explore these phenomena and provide a host of related theoretical results. This is joint work with Benjamin Brown and Xiao-Li Meng.
- Discussant: Hongjian Shi (Technical University of Munich)
- Links: [Relevant papers: paper #1, paper #2]

Monday, October 14th, 2024 [Recording]
- Speaker: William Bekerman (University of Pennsylvania)
- Title: Hypothesis selection via sample splitting for valid powerful testing in matched observational studies
- Abstract: Observational studies are valuable tools for inferring causal effects in the absence of controlled experiments. However, these studies may be biased due to the presence of some relevant, unmeasured set of covariates. One approach to mitigate this concern is to identify hypotheses likely to be more resilient to hidden biases by splitting the data into a planning sample for designing the study and an analysis sample for making inferences. We devise a powerful and flexible method for selecting hypotheses in the planning sample when an unknown number of outcomes are affected by the treatment. We investigate the theoretical properties of our method and conduct extensive simulations that demonstrate pronounced benefits, especially at higher levels of allowance for unmeasured confounding. Finally, we demonstrate our method in an observational study of the multi-dimensional impacts of a devastating flood in Bangladesh.
- Discussant: Richard Guo (University of Michigan)
- Links: [Relevant papers: paper #1]

Monday, October 7th, 2024 [Recording]
- Speaker: Lei Shi (UC Berkeley)
- Title: Forward selection and post-selection inference in factorial designs
- Abstract: Ever since the seminal work of R. A. Fisher and F. Yates, factorial designs have been an important experimental tool to simultaneously estimate the effects of multiple treatment factors. In factorial designs, the number of treatment combinations grows exponentially with the number of treatment factors, which motivates the forward selection strategy based on the sparsity, hierarchy, and heredity principles for factorial effects. Although this strategy is intuitive and has been widely used in practice, its rigorous statistical theory has not been formally established. To fill this gap, we establish design-based theory for forward factor selection in factorial designs based on the potential outcome framework. We not only prove a consistency property for the factor selection procedure but also discuss statistical inference after factor selection. In particular, with selection consistency, we quantify the advantages of forward selection based on asymptotic efficiency gain in estimating factorial effects. With inconsistent selection in higher-order interactions, we propose two strategies and investigate their impact on subsequent inference. Our formulation differs from the existing literature on variable selection and post-selection inference because our theory is based solely on the physical randomization of the factorial design and does not rely on a correctly specified outcome model.
- Discussant: Matthew Blackwell (Harvard University)
- Links: [Relevant papers: paper #1]

Monday, September 30th, 2024 [Recording]
- Speaker: Will Fithian (UC Berkeley)
- Title: Estimating the false discovery rate of variable selection
- Abstract: We introduce a generic estimator for the false discovery rate of any model selection procedure, in common statistical modeling settings including the Gaussian linear model, Gaussian graphical model, and model-X setting. We prove that our method has a conservative (non-negative) bias in finite samples under standard statistical assumptions, and provide a bootstrap method for assessing its standard error. For methods like the Lasso, forward-stepwise regression, and the graphical Lasso, our estimator serves as a valuable companion to cross-validation, illuminating the tradeoff between prediction error and variable selection accuracy as a function of the model complexity parameter. This is joint work with Yixiang Luo and Lihua Lei.
- Discussant: Lucas Janson (Harvard University)
- Links: [Relevant papers: paper #1]

Thursday, June 20, 2024 [Recording]
- Speaker: Junu Lee (University of Pennsylvania)
- Title: Boosting e-BH via conditional calibration
- Abstract: The e-BH procedure is an e-value-based multiple testing procedure that provably controls the false discovery rate (FDR) under any dependence structure between the e-values. Despite this appealing theoretical FDR control guarantee, the e-BH procedure often suffers from low power in practice. In this paper, we propose a general framework that boosts the power of e-BH without sacrificing its FDR control under arbitrary dependence. This is achieved by the technique of conditional calibration, where we take as input the e-values and calibrate them to be a set of "boosted e-values" that are guaranteed to be no less -- and are often more -- powerful than the original ones. Our general framework is explicitly instantiated in three classes of multiple testing problems: (1) testing under parametric models, (2) conditional independence testing under the model-X setting, and (3) model-free conformalized selection. Extensive numerical experiments show that our proposed method significantly improves the power of e-BH while continuing to control the FDR. We also demonstrate the effectiveness of our method through an application to an observational study dataset for identifying individuals whose counterfactuals satisfy certain properties.
- Discussant: Yixiang Luo (University of California, Berkeley)
- Links: [Relevant papers: paper #1]

Thursday, June 13, 2024 [Recording]
- Speaker: Leying Guan (Yale University)
- Title: A conformal test of linear models via permutation-augmented regressions
- Abstract: Permutation tests are widely recognized as robust alternatives to tests based on normal theory. Random permutation tests have been frequently employed to assess the significance of variables in linear models. Despite their widespread use, existing random permutation tests lack finite-sample and assumption-free guarantees for controlling type I error in partial correlation tests. To address this ongoing challenge, we have developed a conformal test through permutation-augmented regressions, which we refer to as PALMRT. PALMRT not only achieves power competitive with conventional methods but also provides reliable control of type I errors at no more than $2\alpha$, given any targeted level $\alpha$, for arbitrary fixed designs and error distributions. We have confirmed this through extensive simulations. Compared to existing assumption-free tests for linear models, PALMRT does not compromise as much on power or set stringent requirements on the sample size, making it suitable for diverse biomedical applications. We further illustrate its advantage in a long-Covid study where PALMRT validated key findings previously identified using the t-test after multiple corrections, while alternative distribution-free tests suffered from a drastic loss of power and failed to identify any discoveries. We endorse PALMRT as a robust and practical hypothesis test in scientific research for its superior error control, power preservation, and simplicity.
- Discussant: Guillaume Pouliot (University of Chicago)
- Links: [Relevant papers: paper #1]

Thursday, June 6, 2024 [Recording]
- Speaker: Changliang Zou (Nankai University)
- Title: CAP: A General Algorithm for Online Selective Conformal Prediction with FCR Control
- Abstract: We study the problem of post-selection predictive inference in an online fashion. To avoid devoting resources to unimportant units, a preliminary selection of the current individual before reporting its prediction interval is common and meaningful in online predictive tasks. Since the online selection causes a temporal multiplicity in the selected prediction intervals, it is important to control the real-time false coverage-statement rate (FCR) which measures the overall miscoverage level. We develop a general framework named CAP (Calibration after Adaptive Pick) that performs an adaptive pick rule on historical data to construct a calibration set if the current individual is selected and then outputs a conformal prediction interval for the unobserved label. We provide tractable procedures for constructing the calibration set for popular online selection rules. We prove that CAP can achieve an exact selection-conditional coverage guarantee in the finite-sample and distribution-free regimes. To account for the distribution shift in online data, we also embed CAP into some recent dynamic conformal prediction algorithms and show that the proposed method can deliver long-run FCR control. Numerical results on both synthetic and real data corroborate that CAP can effectively control FCR around the target level and yield narrower prediction intervals than existing baselines across various settings.
- Discussant: Ying Jin (Stanford University)
- Links: [Relevant papers: paper #1]

Thursday, May 30, 2024 [Recording]
- Speaker: Yuetian Luo (University of Chicago)
- Title: The Limits of Assumption-free Tests for Algorithm Performance
- Abstract: Algorithm evaluation and comparison are fundamental questions in machine learning and statistics -- how well does an algorithm perform at a given modeling task, and which algorithm performs best? Many methods have been developed to assess algorithm performance, often based around cross-validation type strategies, retraining the algorithm of interest on different subsets of the data and assessing its performance on the held-out data points. Despite the broad use of such procedures, the theoretical properties of these methods are not yet fully understood. In this work, we explore some fundamental limits for answering these questions with limited amounts of data. In particular, we make a distinction between two questions: how good is an algorithm A at the problem of learning from a training set of size n, versus, how good is a particular fitted model produced by running A on a particular training data set of size n? Our main results prove that, for any test that treats the algorithm A as a "black box" (i.e., we can only study the behavior of A empirically), there is a fundamental limit on our ability to carry out inference on the performance of A, unless the number of available data points N is many times larger than the sample size n of interest. (On the other hand, evaluating the performance of a particular fitted model is easy as long as a holdout data set is available -- that is, as long as N−n is not too small.) We also ask whether an assumption of algorithmic stability might be sufficient to circumvent this hardness result. Surprisingly, we find that this is not the case: the same hardness result still holds for the problem of evaluating the performance of A, aside from a high-stability regime where fitted models are essentially nonrandom. Finally, we also establish similar hardness results for the problem of comparing multiple algorithms. This is joint work with Rina Foygel Barber.
- Discussant: Brian Trippe (Columbia University)
- Links: [Relevant papers: paper #1][Response to the discussion]

Thursday, May 23, 2024 [Recording]
- Speaker: Zhaomeng Chen (Stanford University)
- Title: Controlled Variable Selection from Summary Statistics Only? A Solution via GhostKnockoffs and Penalized Regression
- Abstract: Identifying which variables do influence a response while controlling false positives pervades statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the values of marginal empirical correlations between each dependent variable of potential interest and the response. This situation may arise due to privacy concerns, e.g., to avoid the release of sensitive genetic information. We extend GhostKnockoffs [He et al., 2022] and introduce variable selection methods based on penalized regression achieving false discovery rate (FDR) control. We report empirical results in extensive simulation studies, demonstrating enhanced performance over previous work. We also apply our methods to genome-wide association studies of Alzheimer’s disease, and evidence a significant improvement in power.
- Discussant: Lan Gao (University of Tennessee Knoxville)
- Links: [Relevant papers: paper #1, paper #2]

Thursday, May 16, 2024 [Recording]
- Speaker: Youngjoo Yun (University of Wisconsin-Madison)
- Title: Selective inference for clustering with unknown variance
- Abstract: In many modern statistical problems, the limited available data must be used both to develop the hypotheses to test, and to test these hypotheses—that is, both for exploratory and confirmatory data analysis. Reusing the same dataset for both exploration and testing can lead to massive selection bias, leading to many false discoveries. Selective inference is a framework that allows for performing valid inference even when the same data is reused for exploration and testing. In this work, we are interested in the problem of selective inference for data clustering, where a clustering procedure is used to hypothesize a separation of the data points into a collection of subgroups, and we then wish to test whether these data-dependent clusters in fact represent meaningful differences within the data. Recent work by Gao, Bien and Witten (2022) provides a framework for doing selective inference for this setting, where a hierarchical clustering algorithm is used for producing the cluster assignments, which was then extended to k-means clustering by Chen and Witten (2022). Both these works rely on assuming a known covariance structure for the data, but in practice, the noise level needs to be estimated—and this is particularly challenging when the true cluster structure is unknown. In our work, we extend this work to the setting of noise with unknown variance, and provide a selective inference method for this more general setting. Empirical results show that our new method is better able to maintain high power while controlling Type I error when the true noise level is unknown.
- Discussant: Lucy Gao (University of British Columbia)
- Links: [Relevant papers: paper #1]

Thursday, May 9, 2024 [Recording]
- Speaker: Kevin Chen (Harvard University)
- Title: Optimal Conditional Inference in Adaptive Experiments
- Abstract: We study batched bandit experiments and consider the problem of inference conditional on the realized stopping time, assignment probabilities, and target parameter, where all of these may be chosen adaptively using information up to the last batch of the experiment. Absent further restrictions on the experiment, we show that inference using only the results of the last batch is optimal. When the adaptive aspects of the experiment are known to be location-invariant, in the sense that they are unchanged when we shift all batch-arm means by a constant, we show that there is additional information in the data, captured by one additional linear function of the batch-arm means. In the more restrictive case where the stopping time, assignment probabilities, and target parameter are known to depend on the data only through a collection of polyhedral events, we derive computationally tractable and optimal conditional inference procedures.
- Discussant: Karun Adusumilli (University of Pennsylvania)
- Links: [Relevant papers: paper #1]

Thursday, May 2, 2024 [Recording]
- Speaker: Brian Cho (Cornell University)
- Title: Peeking with PEAK: Sequential, Nonparametric Composite Hypothesis Tests for Means of Multiple Data Streams
- Abstract: We propose a novel nonparametric sequential test for composite hypotheses for means of multiple data streams. Our proposed method, peeking with expectation-based averaged capital (PEAK), builds upon the testing-as-betting framework and provides a non-asymptotic α-level test across any stopping time. PEAK is computationally tractable and efficiently rejects hypotheses that are incorrect across all potential distributions that satisfy our nonparametric assumption, enabling joint composite hypothesis testing on multiple streams of data. We numerically validate our theoretical findings under the best arm identification and threshold identification in the bandit setting, illustrating both the competitive performance and the computational efficiency of our method against state-of-the-art testing methods.
- Discussant: Shubhanshu Shekhar (Carnegie Mellon University)
- Links: [Relevant papers: paper #1]

Thursday, April 25, 2024 [Recording]
- Speaker: Haeun Moon (Carnegie Mellon University)
- Title: Augmented doubly robust post-imputation inference for proteomic data
- Abstract: Quantitative measurements produced by mass spectrometry proteomics experiments offer a direct way to explore the role of proteins in molecular mechanisms. However, analysis of such data is challenging due to the large proportion of missing values. A common strategy to address this issue is to utilize an imputed dataset, which often introduces systematic bias into downstream analyses if the imputation errors are ignored. In this paper, we propose a statistical framework inspired by doubly robust estimators that offers valid and efficient inference for proteomic data. Our framework combines powerful machine learning tools, such as variational autoencoders, to augment the imputation quality with high-dimensional peptide data, and a parametric model to estimate the propensity score for debiasing imputed outcomes. Our estimator is compatible with the double machine learning framework and has provable properties. Simulation studies verify its empirical superiority over other existing procedures. In application to both single-cell proteomic data and bulk-cell Alzheimer’s Disease data, our method utilizes the imputed data to gain additional, meaningful discoveries and yet maintains good control of false positives.
- Discussant: Tijana Zrnic (Stanford University)
- Links: [Relevant papers: paper #1]

Thursday, April 18, 2024 [Recording]
- Speaker: Thorsten Dickhaus (University of Bremen)
- Title: Multiple marginal models for multinomial regression with high-dimensional covariates
- Abstract: Modern high-throughput biomedical devices routinely produce data on a large scale, and the analysis of high-dimensional datasets has become commonplace in biomedical studies. However, given thousands or tens of thousands of measured variables in these datasets, extracting meaningful features poses a challenge. We propose a procedure to evaluate the strength of the associations between a nominal (categorical) response variable and multiple features (covariates) simultaneously. Specifically, we propose a framework of large-scale multiple testing under arbitrary correlation dependency among test statistics. First, marginal multinomial regressions are performed for each feature individually. Second, we use an approach of multiple marginal models for each baseline-category pair to establish asymptotic joint normality of the stacked vector of the marginal multinomial regression coefficients. Third, we estimate the (limiting) covariance matrix between the estimated coefficients from all marginal models. Finally, our approach approximates the realized false discovery proportion of a thresholding procedure for the marginal p-values for each baseline-category logit pair. We demonstrate a practical application of the method to hyperspectral imaging data. The dataset is obtained by a matrix-assisted laser desorption/ionization instrument. This is joint work with Vladimir Vutov.
- Discussant: Lucy Xia (Hong Kong University of Science and Technology)
- Links: [Relevant papers: paper #1, paper #2]

Thursday, April 11, 2024 [Recording]
- Speaker: Wanteng Ma (Hong Kong University of Science and Technology)
- Title: Multiple Testing of Linear Forms for Noisy Matrix Completion
- Abstract: Many important tasks of large-scale recommender systems can be naturally cast as testing multiple linear forms for noisy matrix completion. These problems, however, present unique challenges because of the subtle bias-and-variance tradeoff and an intricate dependency among the estimated entries induced by the low-rank structure. Here, we develop a general approach to overcome these difficulties by introducing new statistics for individual tests with sharp asymptotics both marginally and jointly, and utilizing them to control the false discovery rate (FDR) via a data splitting and symmetric aggregation scheme. We show that valid FDR control can be achieved with guaranteed power under nearly optimal sample size requirements using the proposed methodology. Extensive numerical simulations and real data examples are also presented to further illustrate its practical merits.
- Discussant: Yu Gui (University of Chicago)
- Links: [Relevant papers: paper #1]

Thursday, April 4, 2024 [Recording]
- Speaker: Matteo Gasparin (University of Padova)
- Title: Merging uncertainty sets via majority vote
- Abstract: Given K uncertainty sets that are arbitrarily dependent -- for example, confidence intervals for an unknown parameter obtained with K different estimators, or prediction sets obtained via conformal prediction based on K different algorithms on shared data -- we address the question of how to efficiently combine them in a black-box manner to produce a single uncertainty set. We present a simple and broadly applicable majority vote procedure that produces a merged set with nearly the same error guarantee as the input sets. We then extend this core idea in a few ways: we show that weighted averaging can be a powerful way to incorporate prior information, and a simple randomization trick produces strictly smaller merged sets without altering the coverage guarantee. Further improvements can be obtained by inducing exchangeability within the sets. When deployed in online settings, we show how the exponential weighted majority algorithm can be employed in order to learn a good weighting over time. We then combine this method with adaptive conformal inference to deliver a simple conformal online model aggregation (COMA) method for nonexchangeable data.
- Discussant: David Ritzwoller (Stanford University)
- Links: [Relevant papers: paper #1]
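
A minimal sketch of the unweighted majority vote described in the abstract above, assuming every input set is a closed interval on the real line. The function name `majority_vote` and the interval representation are illustrative choices, not taken from the paper. The merged set keeps every point that lies in strictly more than half of the inputs; by a standard Markov-inequality argument, if each input covers the target with probability at least 1 - α, the merged set covers it with probability at least 1 - 2α. The weighted and randomized variants mentioned in the abstract are not shown.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def majority_vote(intervals: List[Interval]) -> List[Interval]:
    """Points contained in strictly more than half of the closed input intervals.

    The result is returned as a sorted list of disjoint closed intervals,
    because the majority-vote set need not be a single interval.
    """
    k = len(intervals)
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))  # 0 = interval opens
        events.append((hi, 1))  # 1 = interval closes
    # At equal coordinates, openings sort before closings, so the shared
    # endpoint of two closed intervals counts as covered by both.
    events.sort()

    merged: List[Interval] = []
    count = 0      # number of input intervals covering the current point
    start = None   # left end of the majority region being built, if any
    for x, kind in events:
        if kind == 0:
            count += 1
            if start is None and count > k / 2:
                start = x
        else:
            # The point x itself is still covered `count` times; just to the
            # right of x the coverage drops to count - 1.
            if start is not None and count - 1 <= k / 2:
                merged.append((start, x))
                start = None
            count -= 1
    return merged


# Three intervals: the majority set is the union of two disjoint pieces.
print(majority_vote([(0, 2), (1, 3), (2.5, 4)]))
```

The example shows that the merged set can be disconnected even when all inputs are intervals, which is why the sketch returns a list of intervals rather than a single one.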

Thursday, March 28, 2024 [Recording] (8:30 am PT / 11:30 am ET / 3:30 pm London / 5:30 pm Tel Aviv, time change due to Daylight Saving Time)
- Speaker: Armeen Taeb (University of Washington)
- Title: On False Positive Error
- Abstract: Controlling the false positive error in model selection is a prominent paradigm for gathering evidence in data-driven science. In model selection problems such as variable selection and graph estimation, models are characterized by an underlying Boolean structure such as presence or absence of a variable or an edge. Therefore, false positive error or false negative error can be conveniently specified as the number of variables/edges that are incorrectly included or excluded in an estimated model. However, the increasing complexity of modern datasets has been accompanied by the use of sophisticated modeling paradigms in which defining false positive error is a significant challenge. For example, models specified by structures such as partitions (for clustering), permutations (for ranking), directed acyclic graphs (for causal inference), or subspaces (for principal components analysis) are not characterized by a simple Boolean logical structure, which leads to difficulties with formalizing and controlling false positive error. We present a generic approach to endow a collection of models with partial order structure, which leads to systematic approaches for defining natural generalizations of false positive error and methodology for controlling this error. (Joint work with Peter Bühlmann, Venkat Chandrasekaran, and Parikshit Shah)
- Discussant: Peter Hansen (University of North Carolina, Chapel Hill)
- Links: [Relevant papers: paper #1, paper #2]

Thursday, March 21, 2024 [Recording]
- Speaker: Etienne Roquain (Sorbonne Université)
- Title: Selecting informative conformal prediction sets with false coverage rate control
- Abstract: In supervised learning, including regression and classification, conformal methods provide distribution-free prediction sets for the outcome/label with finite sample coverage for any machine learning predictor. We consider here the case where such prediction sets come after a selection process. The selection process requires that the selected prediction sets be informative in a well-defined sense. We consider both the classification and regression settings where the analyst may consider as informative only the sample with prediction label sets or prediction intervals small enough, excluding a null value, or obeying other appropriate 'monotone' constraints. While this covers many settings of possible interest in various applications, we develop a unified framework for building such informative conformal prediction sets while controlling the false coverage rate (FCR) on the selected sample. This framework generalizes some recent results on selective conformal inference in the literature. We show the usefulness of our resulting procedures on real and simulated data.
- Discussant: Zijun Gao (University of Southern California)
- Links: [Relevant papers: paper #1] [slides #1, slides #2]

Thursday, March 14, 2024 [Recording]
- Speaker: Lars van der Laan (University of Washington)
- Title: Self-Consistent Conformal Prediction
- Abstract: In decision-making guided by machine learning, decision-makers often take identical actions in contexts with identical predicted outcomes. Conformal prediction helps decision-makers quantify outcome uncertainty for actions, allowing for better risk management. Inspired by this perspective, we introduce self-consistent conformal prediction, which yields both Venn-Abers calibrated predictions and conformal prediction intervals that are valid conditional on actions prompted by model predictions. Our procedure can be applied post-hoc to any black-box predictor to provide rigorous, action-specific decision-making guarantees. Numerical experiments show our approach strikes a balance between interval efficiency and conditional validity.
- Discussant: Tiffany Ding (UC Berkeley)
- Links: [Relevant papers: paper #1]

Thursday, March 7, 2024 [Recording]
- Speaker: Guanxun Li (Texas A&M University)
- Title: E-values, Multiple Testing and Beyond
- Abstract: We discover a connection between the Benjamini-Hochberg (BH) procedure and the recently proposed e-BH procedure [Wang and Ramdas, 2022] with a suitably defined set of e-values. This insight extends to a generalized version of the BH procedure and the model-free multiple testing procedure in Barber and Candès [2015] (BC) with a general form of rejection rules. The connection provides an effective way of developing new multiple testing procedures by aggregating or assembling e-values resulting from the BH and BC procedures and their use in different subsets of the data. In particular, we propose new multiple testing methodologies in three applications, including a hybrid approach that integrates the BH and BC procedures, a multiple testing procedure aimed at ensuring a new notion of fairness by controlling both the group-wise and overall false discovery rates (FDR), and a structure adaptive multiple testing procedure that can incorporate external covariate information to boost detection power. One notable feature of the proposed methods is that we use a data-dependent approach for assigning weights to e-values, significantly enhancing the efficiency of the resulting e-BH procedure. The construction of the weights is non-trivial and is motivated by the leave-one-out analysis for the BH and BC procedures. In theory, we prove that the proposed e-BH procedures with data-dependent weights in the three applications ensure finite sample FDR control. Furthermore, we demonstrate the efficiency of the proposed methods through numerical studies in the three applications.
- Discussant: Peter W. MacDonald (McGill University)
- Links: [Relevant papers: paper #1]
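
For readers unfamiliar with the base e-BH procedure of Wang and Ramdas (2022) that the abstract above builds on, here is a minimal unweighted sketch. With K hypotheses and level α, e-BH sorts the e-values in decreasing order, finds the largest k such that k times the k-th largest e-value is at least K/α, and rejects the k hypotheses with the largest e-values. The function name `e_bh` is an illustrative choice, and the data-dependent weighting proposed in the talk is not implemented here.

```python
from typing import List, Sequence


def e_bh(e_values: Sequence[float], alpha: float) -> List[int]:
    """Indices rejected by the base (unweighted) e-BH procedure at level alpha."""
    K = len(e_values)
    # Hypothesis indices ordered from largest to smallest e-value.
    order = sorted(range(K), key=lambda i: e_values[i], reverse=True)

    k_star = 0
    for k, idx in enumerate(order, start=1):
        # Condition k * e_[k] >= K / alpha, where e_[k] is the k-th largest.
        if k * e_values[idx] >= K / alpha:
            k_star = k  # keep the largest k that satisfies the condition

    # Reject the k_star hypotheses with the largest e-values.
    return sorted(order[:k_star])


# Hypothetical e-values for four hypotheses, tested at level 0.1.
print(e_bh([50, 1, 30, 0.5], alpha=0.1))
```

Note that the loop does not stop at the first failure: as with BH, a later k can satisfy the condition even when an earlier one does not, so the largest qualifying k is used.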

Thursday, February 29, 2024 [Recording]
- Speaker: Livio Finos (University of Padova)
- Title: Post-selection Inference in Multiverse Analysis (PIMA): an inferential framework based on the sign flipping score test
- Abstract: When analyzing data, researchers make some decisions that are either arbitrary, based on subjective beliefs about the data generating process, or for which equally justifiable alternative choices could have been made. This wide range of data-analytic choices can be abused, and has been one of the underlying causes of the replication crisis in several fields. Recently, the introduction of multiverse analysis provides researchers with a method to evaluate the stability of the results across reasonable choices that could be made when analyzing data. Multiverse analysis is confined to a descriptive role, lacking a proper and comprehensive inferential procedure. Recently, specification curve analysis adds an inferential procedure to multiverse analysis, but this approach is limited to simple cases related to the linear model, and only allows researchers to infer whether at least one specification rejects the null hypothesis, but not which specifications should be selected. In this paper we present a Post-selection Inference approach to Multiverse Analysis (PIMA) which is a flexible and general inferential approach that accounts for all possible models, i.e., the multiverse of reasonable analyses. The approach allows for a wide range of data specifications (i.e., pre-processing) and any generalized linear model; it provides strong control of the family-wise error rate such that it allows researchers to claim that the null hypothesis can be rejected for each specification that shows a significant effect. The inferential proposal is based on the sign-flip score test of Hemerik et al. (2020) and De Santis et al. (2022) that will be briefly introduced in the talk.
- Discussant: Jesse Hemerik (Erasmus University Rotterdam)
- Links: [Relevant papers: paper #1, paper #2, paper #3]

Thursday, February 22, 2024 [Recording]
- Speaker: Nick Koning (Erasmus University)
- Title: Post-hoc p-values
- Abstract: A pervasive methodological error is the post-hoc interpretation of p-values. A p-value p is the smallest significance level at which we would have rejected the null had we chosen level p. It is not the smallest significance level at which we reject the null. We introduce post-hoc p-values, which do admit such a post-hoc interpretation. We show that p is a post-hoc p-value if and only if 1/p is an e-value, a recently introduced statistical object. The product of independent post-hoc p-values is a post-hoc p-value, making them easy to combine. Moreover, any post-hoc p-value can be trivially improved if we permit external randomization, but only (essentially) non-randomized post-hoc p-values can be arbitrarily merged through multiplication. In addition, we discuss what constitutes a 'good' post-hoc p-value. Finally, we argue that post-hoc p-values eliminate the need for a pre-specified significance level, such as α = .05 or α = .005 (Benjamin et al., 2018). We believe this may take away incentives for p-hacking and contribute to solving the file-drawer problem, as both these issues arise from using a pre-specified significance level.
- Discussant: Ruodu Wang (University of Waterloo)
- Links: [Relevant papers: paper #1]

Thursday, February 15, 2024 [Recording]
- Speaker: Lasse Fischer (University of Bremen)
- Title: Sequential permutation testing by betting
- Abstract: We develop an anytime-valid permutation test, where the dataset is fixed and the permutations are sampled sequentially one by one, with the objective of saving computational resources by sampling fewer permutations and stopping early. The core technical advance is the development of new test martingales (nonnegative martingales with initial value one) for testing exchangeability against a very particular alternative. These test martingales are constructed using new and simple betting strategies that smartly bet on the relative ranks of permuted test statistics. The betting strategy is guided by the derivation of a simple log-optimal betting strategy, and displays excellent power in practice. In contrast to a well-known method by Besag and Clifford, our method yields a valid e-value or a p-value at any stopping time, and with particular stopping rules, it yields computational gains under both the null and the alternative without compromising power.
- Discussant: Anna Vesely (University of Bologna)
- Links: [Relevant papers: paper #1]

Thursday, February 8, 2024 [Recording]
- Speaker: Javier González-Delgado (McGill University)
- Title: Post-clustering Inference under Dependency
- Abstract: Recent work by Gao et al. has laid the foundations for post-clustering inference. For the first time, the authors established a theoretical framework for testing differences between means of estimated clusters. Additionally, they studied the estimation of unknown parameters while controlling the selective type I error. However, their theory was developed for independent observations identically distributed as p-dimensional Gaussian variables with a spherical covariance matrix. In this talk, we aim to extend this framework to a more convenient scenario for practical applications, where arbitrary dependence structures between observations and features are allowed. We show that a p-value for post-clustering inference under general dependency can be defined, and we assess the theoretical conditions allowing the compatible estimation of a covariance matrix. The theory is developed for hierarchical agglomerative clustering algorithms with several types of linkages, and for the k-means algorithm. We illustrate our method with synthetic data and real data of protein structures.
- Discussant: Youngjoo Yun (University of Wisconsin-Madison)
- Links: [Relevant papers: paper #1]

Thursday, February 1, 2024 [Recording]
- Speaker: Edgar Dobriban (University of Pennsylvania)
- Title: Efficient and Multiply Robust Risk Estimation under General Forms of Dataset Shift
- Abstract: Statistical machine learning methods often face the challenge of limited data available from the population of interest. One remedy is to leverage data from auxiliary source populations, which share some conditional distributions or are linked in other ways with the target domain. Techniques leveraging such dataset shift conditions are known as domain adaptation or transfer learning. Despite extensive literature on dataset shift, few works address how to efficiently use the auxiliary populations to improve the accuracy of risk evaluation for a given machine learning task in the target population. In this paper, we study the general problem of efficiently estimating target population risk under various dataset shift conditions, leveraging semiparametric efficiency theory. We consider a general class of dataset shift conditions, which includes three popular conditions (covariate, label and concept shift) as special cases. We allow for partially non-overlapping support between the source and target populations. We develop efficient and multiply robust estimators along with a straightforward specification test of these dataset shift conditions. We also derive efficiency bounds for two other dataset shift conditions, posterior drift and location-scale shift. Simulation studies support the efficiency gains due to leveraging plausible dataset shift conditions. This is joint work with Hongxiang David Qiu and Eric Tchetgen Tchetgen.
- Discussant: Alex Luedtke (University of Washington)
- Links: [Relevant papers: paper #1]

Thursday, January 25, 2024 [To be released]
- Speaker: Xinzhou Guo (The Hong Kong University of Science and Technology)
- Title: Inference on Potentially Identified Subgroups in Clinical Trials
- Abstract: When subgroup analyses are conducted in clinical trials with moderate- or high-dimensional covariates, we often need to identify candidate subgroups from the data and evaluate the potentially identified subgroups in a replicable way. The usual statistical inference applied to the potentially identified subgroups, assuming the subgroups are just what we observe from the data, might suffer from bias when the regularity assumption that heterogeneity exists is violated. In this talk, we introduce a shift-based method to address this nonregularity bias and, combined with subsampling, develop a de-biased inference procedure for potentially identified subgroups. The proposed method is model-free and asymptotically efficient. We show that with appropriate noise added to the shift, the proposed method can be viewed as an asymmetric smoothing approach and achieve privacy protection while remaining valid and efficient. We demonstrate the merits of the proposed method by re-analyzing the ACTG 175 trial.
- Discussant: Chengchun Shi (London School of Economics and Political Science)
- Links: [Relevant papers:]

Thursday, January 18, 2024 [Recording]
- Speaker: Zijun Gao (University of Southern California)
- Title: A constructive approach to selective risk control
- Abstract: Many modern applications require the use of data to both select the statistical tasks and make valid inference after selection. In this article, we provide a unifying approach to control for a class of selective risks. Our method is motivated by a reformulation of the celebrated Benjamini–Hochberg (BH) procedure for multiple hypothesis testing as the iterative limit of the Benjamini–Yekutieli (BY) procedure for constructing post-selection confidence intervals. Although several earlier authors have made noteworthy observations related to this, our discussion highlights that (1) the BH procedure is precisely the fixed-point iteration of the BY procedure; (2) the fact that the BH procedure controls the false discovery rate is almost an immediate corollary of the fact that the BY procedure controls the false coverage-statement rate. Building on this observation, we propose a constructive approach to control extra-selection risk (selection made after decision) by iterating decision strategies that control the post-selection risk (decision made after selection), and show that many previous methods and results are special cases of this general framework. We further extend this approach to problems with multiple selective risks and demonstrate how new methods can be developed. Our development leads to two surprising results about the BH procedure: (1) in the context of one-sided location testing, the BH procedure not only controls the false discovery rate at the null but also at other locations for free; (2) in the context of permutation tests, the BH procedure with exact permutation p-values can be well approximated by a procedure which only requires a total number of permutations that is almost linear in the total number of hypotheses.
- Discussant: Asaf Weinstein (Hebrew University of Jerusalem)
- Links: [Relevant papers:][Slides]

Wednesday, December 20, 2023 [Session not recorded]
- Speaker: Vladimir Vovk (Royal Holloway, University of London)
- Title: The diachronic Bayesian
- Abstract: It is well known that a Bayesian probability forecast for the future observations should form a probability measure in order to satisfy a natural condition of coherence. The topic of this paper is the evolution of the Bayesian probability measure over time. We model the process of updating the Bayesian’s beliefs in terms of prediction markets. The resulting picture is adapted to forecasting several steps ahead and making almost optimal decisions.
- Discussant: Philip Dawid (University of Cambridge)
- Links: [Relevant papers: paper #1][Slides]

Wednesday, December 13, 2023 [Session not recorded]
- Speaker: Lucas Janson (Harvard University)
- Title: Leveraging sparsity in the Gaussian linear model for improved inference
- Abstract: We develop novel LASSO-based methods for coefficient testing, confidence interval construction, and variable selection in the Gaussian linear model with n ≥ p that have the same finite-sample guarantees as their ubiquitous ordinary-least-squares-t-test-based analogues, yet have substantially higher power when the true coefficient vector is sparse. Empirically, our method often performs like the 1-sided t-test (despite not being given any information about the sign), and in particular our confidence intervals are typically about 15% shorter than the standard t-test based intervals. Our single coefficient testing framework trivially allows for exact adjustment conditional on LASSO selection for post-selection inference, and subsequently applying standard multiple testing procedures to the resulting post-selection-valid p-values again provides significant power gains over existing methods. None of our methods require resampling or Monte Carlo estimation. We perform a variety of simulations and a real data analysis on an HIV drug resistance data set to demonstrate the benefits of our methods over existing work. In the course of developing these methods, we also derive novel properties of the LASSO in the Gaussian linear model that are of independent interest. Finally, we argue, and in some cases demonstrate, that the principles we develop can be extended beyond Gaussian linear models with n ≥ p. This is joint work with Souhardya Sengupta.
- Discussant: Zhimei Ren (University of Pennsylvania)
- Links: [Relevant papers:]

Wednesday, December 6, 2023 [Recording]
- Speaker: Pierre Neuvial (Institut de Mathématiques de Toulouse (IMT))
- Title: Selective inference after convex clustering with ℓ1 penalization
- Abstract: Classical inference methods notoriously fail when applied to data-driven test hypotheses or inference targets. Instead, dedicated methodologies are required to obtain statistical guarantees for these selective inference problems. Selective inference is particularly relevant post-clustering, typically when testing a difference in mean between two clusters. In this paper, we address convex clustering with ℓ1 penalization, by leveraging related selective inference tools for regression, based on Gaussian vectors conditioned to polyhedral sets. In the one-dimensional case, we prove a polyhedral characterization of obtaining given clusters, that enables us to suggest a test procedure with statistical guarantees. This characterization also allows us to provide a computationally efficient regularization path algorithm. Then, we extend the above test procedure and guarantees to multi-dimensional clustering with ℓ1 penalization, and also to more general multi-dimensional clusterings that aggregate one-dimensional ones. With various numerical experiments, we validate our statistical guarantees and we demonstrate the power of our methods to detect differences in mean between clusters. Our methods are implemented in the R package poclin.
- Discussant: Yiqun Chen (Stanford University)
- Links: [Relevant papers: paper #1]

Wednesday, November 29, 2023 [Recording]
- Speaker: Trambak Banerjee (University of Kansas)
- Title: Harnessing The Collective Wisdom: Fusion Learning Using Decision Sequences From Diverse Sources
- Abstract: Learning from the collective wisdom of crowds enhances the transparency of scientific findings by incorporating diverse perspectives into the decision-making process. Synthesizing such collective wisdom is related to the statistical notion of fusion learning from multiple data sources or studies. However, fusing inferences from diverse sources is challenging since cross-source heterogeneity and potential data-sharing complicate statistical inference. Moreover, studies may rely on disparate designs, employ widely different modeling techniques for inferences, and prevailing data privacy norms may forbid sharing even summary statistics across the studies for an overall analysis. In this paper, we propose an Integrative Ranking and Thresholding (IRT) framework for fusion learning in multiple testing. IRT operates under the setting where from each study a triplet is available: the vector of binary accept-reject decisions on the tested hypotheses, the study-specific False Discovery Rate (FDR) level and the hypotheses tested by the study. Under this setting, IRT constructs an aggregated, nonparametric, and discriminatory measure of evidence against each null hypothesis, which facilitates ranking the hypotheses in the order of their likelihood of being rejected. We show that IRT guarantees an overall FDR control under arbitrary dependence between the evidence measures as long as the studies control their respective FDR at the desired levels. Furthermore, IRT synthesizes inferences from diverse studies irrespective of the underlying multiple testing algorithms employed by them. While the proofs of our theoretical statements are elementary, IRT is extremely flexible, and a comprehensive numerical study demonstrates that it is a powerful framework for pooling inferences.
- Discussant: Molei Liu (Columbia University)
- Links: [Relevant papers: paper #1]

Wednesday, November 15, 2023 [Recording]
- Speaker: Chiara Sabatti (Stanford University)
- Title: Catch me if you can: Signal localization with knockoff e-values
- Abstract: We consider problems where many, somewhat redundant, hypotheses are tested and we are interested in reporting the most precise rejections, with false discovery rate (FDR) control. For example, a common goal in genetics is to identify DNA variants that carry distinct information on a trait of interest. However, strong local dependencies between nearby variants make it challenging to distinguish which of the many correlated features most directly influence the phenotype. A common solution is then to identify sets of variants that cover the truly important ones. Depending on the signal strengths, it is possible to resolve the individual variant contributions with more or less precision. Assuring FDR control on the reported findings with these adaptive searches is, however, often impossible. To design a multiple comparison procedure that allows for an adaptive choice of resolution with FDR control, we leverage e-values and linear programming. We adapt this approach to problems where knockoffs and group knockoffs have been successfully applied to test conditional independence hypotheses. We demonstrate its efficacy by analyzing data from the UK Biobank.
- Discussant: Matteo Sesia (University of Southern California)
- Links: [Relevant papers: paper #1]

Wednesday, November 8, 2023 [Recording]
- Speaker: Emmanuel Candès (Stanford University)
- Title: Conformal Prediction with Conditional Validity
- Abstract: We consider the problem of constructing distribution-free prediction sets with finite-sample conditional guarantees. Prior work has shown that it is impossible to provide exact conditional coverage universally in finite samples. Thus, most popular methods only provide marginal coverage over the covariates. We bridge this gap by defining a spectrum of problems that interpolate between marginal and conditional validity. After the reformulation of conditional coverage as coverage over a class of covariate shifts, we show how to simultaneously obtain exact finite sample coverage over all possible shifts when the target class of shifts is finite dimensional. For example, our algorithm outputs intervals with exact coverage over each group from a collection of protected groups. For more flexible, infinite dimensional classes where exact coverage is impossible, we provide a simple procedure for quantifying the gap between the coverage of our algorithm and the target level. Moreover, by tuning a single hyperparameter, we allow the practitioner to control the size of this gap across shifts of interest. Our methods can be easily incorporated into existing split conformal inference pipelines, and thus can be used to quantify the uncertainty of modern black-box algorithms without distributional assumptions.
- Discussant: Aaron Roth (University of Pennsylvania)
- Links: [Relevant papers: paper #1]

Wednesday, November 1, 2023 [Recording]
- Speaker: Xiaoxia Shi (University of Wisconsin–Madison)
- Title: Testing Inequalities Linear in Nuisance Parameters
- Abstract: This paper proposes a new test for inequalities linear in possibly partially identified nuisance parameters. It extends the subvector conditional chi-squared (CC) test in Cox and Shi (2022, CS22) to a setting where the nuisance parameter is multiplied by an unknown and estimable matrix. Properly accounting for the estimation noise in this matrix while maintaining the simplicity of the CC test is the main innovation of this paper. As such, the paper provides a simple solution to a broad set of problems including subvector inference for models represented by linear programs, nonparametric instrumental variable models with discrete regressor and instruments, and linear unconditional moment inequality models. We also derive a simplified formula for computing the critical value that makes the computation of the CC test elementary.
- Discussant: Thomas Russell (Carleton University)
- Links: [Relevant papers:]

Wednesday, October 25, 2023 [Recording]
- Speaker: Aaditya Ramdas (Carnegie Mellon University)
- Title: Recent advances in multiple testing: negative dependence and randomization
- Abstract: The multiple testing literature has primarily dealt with three types of dependence assumptions between p-values: independence, positive regression dependence, and arbitrary dependence. In the first half, I will summarize the first theoretical results under various notions of negative dependence. These include the Simes global null test and the Benjamini-Hochberg procedure, which are known experimentally to be anti-conservative under negative dependence. We prove that the anti-conservativeness of these procedures is bounded by factors smaller than that under arbitrary dependence (in particular, by factors independent of the number of hypotheses tested). In the second half, I will show that the famous Benjamini-Yekutieli procedure for FDR control under arbitrary dependence can be improved (usually strictly) via a simple external randomization. Along the way, we will improve other procedures as well, like the e-BH procedure for FDR control with e-values.
- Discussant: Sanat K. Sarkar (Temple University)
- Links: [Relevant papers: paper #1, paper #2]

Wednesday, October 18, 2023 [Recording]
- Speaker: Peter Grünwald (Centrum Wiskunde & Informatica and Leiden University)
- Title: Beyond Neyman-Pearson: testing and confidence without setting alpha in advance
- Abstract: A standard practice in statistical hypothesis testing is to mention the p-value alongside the accept/reject decision. We show a major advantage of mentioning an e-value instead. With p-values, we cannot easily use an extreme observation (e.g. p << alpha) for getting better frequentist decisions. With e-values we can, since they provide Type-I risk control in a generalized Neyman-Pearson setting with the decision task (a general loss function) determined post-hoc, after observation of the data --- thereby providing a handle on the age-old "roving alpha" problem in statistics: we obtain robust "Type-I risk bounds" which hold independently of any preset alpha or loss function. The reasoning can be extended to confidence intervals. When Type-II risks are taken into consideration, the only admissible decision rules in the post-hoc setting turn out to be e-value-based. Similarly, if the loss incurred when specifying a faulty confidence interval is not fixed in advance, standard confidence intervals and distributions may fail whereas e-confidence sets and e-posteriors still provide valid risk guarantees.
- Discussant: Will Hartog (Stanford University)
- Links: [Relevant papers: paper #1, paper #2]

Wednesday, October 11, 2023 [Recording]
- Speaker: Richard Samworth (University of Cambridge)
- Title: Isotonic subgroup selection
- Abstract: Given a sample of covariate-response pairs, we consider the subgroup selection problem of identifying a subset of the covariate domain where the regression function exceeds a pre-determined threshold. We introduce a computationally-feasible approach for subgroup selection in the context of multivariate isotonic regression based on martingale tests and multiple testing procedures for logically-structured hypotheses. Our proposed procedure satisfies a non-asymptotic, uniform Type I error rate guarantee with power that attains the minimax optimal rate up to poly-logarithmic factors. Extensions cover classification, isotonic quantile regression and heterogeneous treatment effect settings. Numerical studies on both simulated and real data confirm the practical effectiveness of our proposal, which is implemented in the R package ISS.
- Discussant: Xinzhou Guo (The Hong Kong University of Science and Technology)
- Links: [Relevant papers: paper #1]

Wednesday, June 28, 2023 [Recording]
- Speaker: Minge Xie (Rutgers University)
- Title: Repro Samples Method for Uncertainty Quantification of Irregular Inference Problems and for Unraveling Machine Learning Blackboxes
- Abstract: Rapid data science developments require us to have new frameworks to tackle highly non-trivial irregular inference problems, e.g., those involving discrete or non-numerical parameters and those involving non-numerical data, etc. This talk presents a novel, wide-reaching and effective simulation-inspired framework, called the repro samples method, to conduct statistical inference for these irregular inference problems and more. We systematically develop both exact and approximate (asymptotic) theories to support the development. An attractive feature is that the method doesn't need to rely on a likelihood or the large sample central limit theorem, and thus is especially effective for complicated and irregular inference problems encountered in data science. The effectiveness of the method is illustrated by solving two open inference problems in statistics: a) construct a confidence set for the unknown number of components in a normal mixture; b) construct confidence sets for the unknown true model, the regression coefficients, or both true model and coefficients jointly in a high dimensional regression model. Comparison studies show that the method has far superior performance to existing attempts. Although the case studies pertain to traditional statistical models, the method also has direct extensions to complex machine learning models, e.g., (ensemble) tree models, neural networks, graphical models, etc. It is a new tool that has the potential to develop interpretable AI and unravel machine learning blackboxes.
- Discussant: Ryan Martin (North Carolina State University)
- Links: [Relevant papers: paper #1, paper #2][Slides]

Wednesday, June 21, 2023 [Recording]
- Speaker: Aldo Solari (University of Milano-Bicocca)
- Title: Simultaneous directional inference
- Abstract: We consider the problem of inference on the signs of n>1 parameters. Within a simultaneous inference framework, we aim to: identify as many of the signs of the individual parameters as possible; provide confidence bounds on the number of positive (or negative) parameters on subsets of interest. Our suggestion is as follows: start by using the data to select the direction of the hypothesis test for each parameter; then, adjust the one-sided p-values for the selection, and use them for simultaneous inference on the selected n one-sided hypotheses. The adjustment is straightforward assuming that the one-sided p-values are conditionally valid and mutually independent. Such assumptions are commonly satisfied in a meta-analysis, and we can apply our approach following a test of the global null hypothesis that all parameters are zero, or of the hypothesis of no qualitative interaction. We consider the use of two multiple testing principles: closed testing and partitioning. The novel procedure based on partitioning is more powerful, but slightly less informative: it only infers on positive and non-positive signs. The procedure takes at most a polynomial time, and we show its usefulness on a subgroup analysis of a medical intervention, and on a meta-analysis of an educational intervention.
- Discussant: Qingyuan Zhao (University of Cambridge)
- Links: [Relevant papers: paper #1]

Wednesday, June 14, 2023 [link to join]
- Speaker: Jonathan Roth (Brown University)
- Title: Inference for Linear Conditional Moment Inequalities
- Abstract: We show that moment inequalities in a wide variety of economic applications have a particular linear conditional structure. We use this structure to construct uniformly valid confidence sets that remain computationally tractable even in settings with nuisance parameters. We first introduce least favorable critical values which deliver non-conservative tests if all moments are binding. Next, we introduce a novel conditional inference approach which ensures a strong form of insensitivity to slack moments. Our recommended approach is a hybrid technique which combines desirable aspects of the least favorable and conditional methods. The hybrid approach performs well in simulations calibrated to Wollmann (2018), with favorable power and computational time comparisons relative to existing alternatives.
- Discussant: Kevin Chen (Harvard University)
- Links: [Relevant papers: paper #1][Slides]

Wednesday, June 7, 2023 [Recording]
- Speaker: Xianyang Zhang (Texas A&M University)
- Title: Joint Mirror Procedure: Controlling False Discovery Rate for Identifying Simultaneous Signals
- Abstract: In many applications, identifying a single feature of interest requires testing the statistical significance of several hypotheses. Examples include mediation analysis which simultaneously examines the existence of the exposure-mediator and the mediator-outcome effects, and replicability analysis aiming to identify simultaneous signals that exhibit statistical significance across multiple independent experiments. In this work, we develop a novel procedure, named joint mirror (JM), to detect such features while controlling the false discovery rate (FDR) in finite samples. The JM procedure iteratively shrinks the rejection region based on partially revealed information until a conservative false discovery proportion (FDP) estimate is below the target FDR level. We propose an efficient algorithm to implement the method. Extensive simulations demonstrate that our procedure can control the modified FDR, a more stringent error measure than the conventional FDR, and provide power improvement in several settings. Our method is further illustrated through real-world applications in mediation and replicability analyses.
- Discussant: Lin Gui (University of Chicago)
- Links: [Relevant papers: paper #1]

Wednesday, May 31, 2023 [Recording]
- Speaker: Arun Kumar Kuchibhotla (Carnegie Mellon University)
- Title: HulC: a computationally-efficient, assumption-lean statistical inference methodology
- Abstract: There are broadly two methods for statistical inference: the Wald technique and the resampling technique. The Wald method assumes the knowledge of the form of limiting distribution and estimates the unknown parameters of the limiting distribution to perform inference. The resampling method (including bootstrap and subsampling) assumes the existence of the limiting distribution and estimates the (unknown) limiting distribution using resampled data to perform inference. In this talk, a new method of inference called HulC (Convex hull-based confidence) that provably works under the weakest set of assumptions will be discussed. Additionally, the method has strong second-order coverage properties. I will also provide simulation support for the claims and show that the method applies more generally than bootstrap and subsampling.
- Discussant: Ioannis Kosmidis (University of Warwick)
- Links: [Relevant papers: paper #1, paper #2, paper #3]

Wednesday, May 24, 2023 [Recording]
- Speaker: Vo Nguyen Le Duy (RIKEN Center for Advanced Intelligence Project)
- Title: Parametric Programming for More Powerful Conditional Selective Inference, with Applications to Deep Learning-Based Image Segmentation and Salient Region Detection
- Abstract: Conditional Selective Inference (SI) has been introduced as a promising approach for assessing the statistical reliability of data-driven hypotheses selected by data analysis algorithms. SI was first introduced as a statistical inference method for Lasso. The main idea of SI is to make inference conditional on the selection event (the event of hypothesis selection), which enables us to derive the exact conditional sampling distribution of the test statistic. A major drawback of the seminal SI method is that the selection event must be characterized in a simple form, i.e., a set of linear inequalities. To overcome the drawback, we propose a new computational approach for SI based on parametric programming (PP), which we call PP-based SI. We show that the proposed PP-based SI method is more powerful than the seminal SI method and applicable to a wider class of problems such as deep neural network-based image segmentation and deep learning-driven salient region detection. This is joint work with Professor Ichiro Takeuchi.
- Discussant: Snigdha Panigrahi (University of Michigan)
- Links: [Relevant papers: paper #1, paper #2, paper #3]

Wednesday, May 17, 2023 [Recording]
- Speaker: Ying Jin (Stanford University)
- Title: Selection by Prediction with (Weighted) Conformal p-values
- Abstract: In decision making or scientific discovery pipelines such as job hiring and drug discovery, before any resource-intensive step, there is often an initial screening step that uses predictions from a machine learning model to shortlist a few candidates from a large pool. We study a scenario where the goal is to select candidates whose unobserved outcomes exceed user-specified values. We formulate this problem as simultaneously testing multiple random hypotheses. Given a set of calibration data that are i.i.d. or exchangeable with the test sample, we leverage conformal inference ideas to construct p-values which obey an extended PRDS property and allow us to shortlist candidates with exact FDR control. The second part of the talk will cover an ongoing work that extends to settings with covariate shift between calibration and test samples, in which case weighted conformal p-values should be used. I will discuss potential failure of the PRDS property for weighted conformal p-values, and a conditional independence structure that allows us to retain valid FDR control in finite samples. These are joint works with Emmanuel Candes.
- Discussant: Ariane Marandon (Sorbonne Université, LPSM)
- Links: [Relevant papers: paper #1]

Wednesday, May 10, 2023 [Recording]
- Speaker: Tijana Zrnic (University of California, Berkeley)
- Title: Locally Simultaneous Inference
- Abstract: In this talk I will discuss a new solution to selective inference called locally simultaneous inference. Unlike standard simultaneous inference, which conservatively delivers valid answers to all questions that could possibly have been asked, locally simultaneous inference only answers those questions that could plausibly have been asked in light of the observed data, all the while preserving rigorous type I error guarantees. For example, if the objective is to construct a confidence interval for the winning treatment effect in a clinical trial with multiple treatments, and it is obvious in hindsight that only one treatment had a chance to win, then locally simultaneous inference will return an interval that is nearly the same as the uncorrected, standard interval. Under mild regularity conditions, locally simultaneous inference strictly dominates standard simultaneous inference. Moreover, compared to conditional selective inference, which demands stronger, conditional guarantees, locally simultaneous inference is more easily applicable in nonparametric settings and is more numerically stable. This is joint work with Will Fithian.
- Discussant: Adam McCloskey (University of Colorado, Boulder)
- Links: [Relevant papers: paper #1]

Wednesday, May 3, 2023 [Recording]
- Speaker: Rajen Shah (University of Cambridge)
- Title: Rank-transformed subsampling: Inference for multiple data splitting and exchangeable p-values
- Abstract: Many testing problems are readily amenable to randomised tests such as those employing data splitting, which divide the data into disjoint parts for separate purposes. However despite their usefulness in principle, randomised tests have obvious drawbacks. Firstly, two analyses of the same dataset may lead to different results. Secondly, the test typically loses power because it does not fully utilise the entire sample. As a remedy to these drawbacks, we study how to combine the test statistics or p-values resulting from multiple random realisations such as through random data splits. We introduce rank-transformed subsampling as a general method for delivering large sample inference about the combined statistic or p-value under mild assumptions. We apply our methodology to a range of problems, including testing unimodality in high-dimensional data, testing goodness-of-fit of parametric quantile regression models, testing no direct effect in a sequentially randomised trial and calibrating cross-fit double machine learning confidence intervals. For the latter, our method improves coverage in finite samples and for the testing problems, our method is able to derandomise and improve power. Moreover, in contrast to existing p-value aggregation schemes that can be highly conservative, our method enjoys type-I error control that asymptotically approaches the nominal level.
- Discussant: Adel Javanmard (University of Southern California)
- Links: [Relevant papers: paper #1]

Wednesday, April 26, 2023 [Recording]
- Speaker: Ameer Dharamshi (University of Washington)
- Title: Generalized Data Thinning Using Sufficient Statistics
- Abstract: Our goal is to develop a general strategy to decompose a random variable X into multiple independent random variables, without sacrificing any information about unknown parameters. A recent paper showed that for some well-known natural exponential families, X can be "thinned" into independent random variables $X^{(1)},\dots,X^{(K)}$, such that $X=\sum_{k=1}^{K} X^{(k)}$. In this paper, we generalize their procedure by relaxing this summation requirement and simply asking that some known function of the independent random variables exactly reconstruct X. This generalization of the procedure serves two purposes. First, it greatly expands the families of distributions for which thinning can be performed. Second, it unifies sample splitting and data thinning, which on the surface seem to be very different, as applications of the same principle. This shared principle is sufficiency. We use this insight to perform generalized thinning operations for a diverse set of families.
- Discussant: Yixiang Luo (University of California, Berkeley)
- Links: [Relevant papers: paper #1][Slides]

Wednesday, April 19, 2023 [Recording]
- Speaker: Sifan Liu (Stanford University)
- Title: An Exact Sampler for Inference after Polyhedral Model Selection
- Abstract: Inference after model selection can be computationally challenging when dealing with intractable conditional distributions. Markov chain Monte Carlo (MCMC) is a common method for drawing samples from these distributions, but its slow convergence can limit its practicality. In this work, we propose a Monte Carlo sampler specifically designed for Gaussian distributions and polyhedral selection events. The method uses importance sampling from a suitable proposal distribution with the separation-of-variable property, and employs conditional Monte Carlo and randomized quasi-Monte Carlo for further variance reduction. Compared to MCMC, our proposed estimator of p-values achieves much higher accuracy while providing reliable error estimation. We also develop a method for testing and constructing confidence intervals for multiple parameters using a single batch of samples, reducing the need for repeated sampling. This method provides an efficient and practical solution for conducting selective inference after a polyhedral model selection.
- Discussant: Drew Nguyen (University of California, Berkeley)
- Links: [Slides]

Wednesday, April 12, 2023 [Recording]
- Speaker: Lucas Janson (Harvard University)
- Title: Exact Conditional Independence Testing and Conformal Inference with Adaptively Collected Data
- Abstract: Randomization testing is a fundamental method in statistics, enabling inferential tasks such as testing for (conditional) independence of random variables, constructing confidence intervals in semiparametric location models, and constructing (by inverting a permutation test) model-free prediction intervals via conformal inference. Randomization tests are exactly valid for any sample size, but their use is generally confined to exchangeable data. Yet in many applications, data is routinely collected adaptively via, e.g., (contextual) bandit and reinforcement learning algorithms or adaptive experimental designs. In this paper we present a general framework for randomization testing on adaptively collected data (despite its non-exchangeability) that uses a weighted randomization test, for which we also present computationally tractable resampling algorithms for various popular adaptive assignment algorithms, data-generating environments, and types of inferential tasks. Finally, we demonstrate via a range of simulations the efficacy of our framework for both testing and confidence/prediction interval construction. This is joint work with Yash Nair.
- Discussant: Jing Lei (Carnegie Mellon University)
- Links: [Relevant papers: paper #1]

Wednesday, April 5, 2023 [Recording]
- Speaker: Aleksandr (Sasha) Podkopaev (Carnegie Mellon University)
- Title: Independence Testing by Betting
- Abstract: Nonparametric independence testing --- testing the null hypothesis that the joint distribution of two random variables factorizes into the product of their respective marginals against the alternative that it does not --- is a classical statistical problem that has been extensively studied in the batch setting when an analyst specifies the sample size before collecting data. Sequential independence testing is a complementary approach which allows an analyst to analyze an incoming stream of data online. Following the principle of nonparametric testing by betting, we develop sequential kernelized independence tests (SKITs). Our tests (a) continuously monitor the data while controlling the false alarm rate, (b) are consistent, meaning that they are guaranteed to stop if the null is false, and (c) provably adapt to the complexity of a problem at hand, meaning that they stop earlier on easy tasks (and later on harder ones), exhibiting an interesting empirical-Bernstein behavior in the exponent of the power. In this talk, I will describe the key ideas that underlie our test, illustrate the theoretical and empirical results, and discuss extensions to settings where batch independence tests fail, such as testing the independence null in non-i.i.d., time-varying setups.
- Discussant: Will Hartog (Stanford University)
- Links: [Relevant papers: paper #1][Slides]

Wednesday, March 29, 2023 [Recording]
- Speaker: Yaniv Romano (Technion—Israel Institute of Technology)
- Title: Conformal Prediction is Robust to Label Noise
- Abstract: In this talk, we will explore the robustness of conformal prediction, a powerful tool for uncertainty quantification, to label noise. We will tackle both regression and classification problems and characterize when and how it is possible to construct uncertainty sets that correctly cover the unobserved noiseless ground truth labels. Our theory and experiments suggest that conformal prediction with noisy labels and commonly used score functions conservatively covers the clean ground truth labels except in adversarial cases.
- Discussant: Hongxiang (David) Qiu
- Links: [Relevant papers: paper #1]

Wednesday, March 22, 2023 [Recording]
- Speaker: Kaspar Wuthrich (UC San Diego)
- Title: (When) should you adjust inferences for multiple hypothesis testing?
- Abstract: The use of multiple hypothesis testing adjustments varies widely in applied economic research, without consensus on when and how it should be done. We provide an economic foundation for this practice. Adjustments are often--but not always--appropriate in our framework when research influences multiple policy decisions. These adjustments depend on the nature of scale economies in the research production function and on economic interactions between policy decisions, with control of classical notions of compound error rates emerging in some but not all cases. Empirical analysis of a unique dataset on research project costs interpreted through the lens of the theory suggests both that MHT adjustments may be warranted and that standard procedures may be too conservative. When research examines multiple outcomes, on the other hand, this motivates aggregating outcomes into sufficient statistics for policy-making.
- Discussant: Aleksey Tetenov (University of Geneva)
- Links: [Relevant papers: paper #1]

Wednesday, March 15, 2023 [Recording]
- Speaker: Anastasios Angelopoulos (UC Berkeley)
- Title: Prediction-Powered Inference
- Abstract: In this talk I will discuss prediction-powered inference – a strategy for creating confidence intervals using machine-learning predictions and a small amount of ground-truth labels. When the predictions are accurate, the intervals shrink, yielding better statistical power. The validity of the intervals holds regardless of the machine-learning algorithm. Intervals can be computed for many estimands, such as means, quantiles, and linear and logistic regression coefficients. We demonstrate the benefits of prediction-powered inference with data sets from proteomics, genomics, electronic voting, remote sensing, census analysis, and ecology.
- Discussant: Ting Ye (University of Washington)
- Links: [Relevant papers: paper #1]

Wednesday, March 8, 2023 [Recording]
- Speaker: Werner Brannath (University of Bremen)
- Title: The population-wise error rate for clinical trials with overlapping populations
- Abstract: We introduce a new multiple type I error criterion for clinical trials with multiple, overlapping populations. Such trials are of interest in precision medicine where the goal is to develop treatments that are targeted to specific sub-populations defined by genetic and/or clinical biomarkers. The new criterion is based on the observation that not all type I errors are relevant to all patients in the overall population. If disjoint sub-populations are considered, no multiplicity adjustment appears necessary, since a claim in one sub-population does not affect patients in the other ones. For intersecting sub-populations we suggest controlling the average multiple type I error rate, i.e. the probability that a randomly selected patient will be exposed to an inefficient treatment. We call this the population-wise error rate, illustrate it with a number of examples and show how to control it with an adjustment of critical boundaries or adjusted p-values. We furthermore define corresponding simultaneous confidence intervals. We finally illustrate the power gain achieved by passing from family-wise to population-wise error rate control with two simple examples and a recently suggested multiple-testing approach for umbrella trials.
- Discussant: Dong Xi (Gilead)
- Links: [Relevant papers: paper #1]

Wednesday, March 1, 2023 [Recording]
- Speaker: Eugene Katsevich (University of Pennsylvania)
- Title: Reconciling model-X and doubly robust approaches to conditional independence testing
- Abstract: Model-X approaches to testing conditional independence between a predictor and an outcome variable given a vector of covariates usually assume exact knowledge of the conditional distribution of the predictor given the covariates. Nevertheless, model-X methodologies are often deployed with this conditional distribution learned in sample. We investigate the consequences of this choice through the lens of the distilled conditional randomization test (dCRT). We find that Type-I error control is still possible, but only if the mean of the outcome variable given the covariates is estimated well enough. This demonstrates that the dCRT is doubly robust, and motivates a comparison to the generalized covariance measure (GCM) test, another doubly robust conditional independence test. We prove that these two tests are asymptotically equivalent, and show that the GCM test is in fact optimal against (generalized) partially linear alternatives by leveraging semiparametric efficiency theory. In an extensive simulation study, we compare the dCRT to the GCM test. We find that the GCM test and the dCRT are quite similar in terms of both Type-I error and power, and that post-lasso based test statistics (as compared to lasso based statistics) can dramatically improve Type-I error control for both methods.
- Discussant: Shuangning Li (Harvard University)
- Links: [Relevant papers: paper #1][Slides]

Wednesday, February 22, 2023 [Recording]
- Speaker: Yuhao Wang (Tsinghua University)
- Title: Residual Permutation Test for High-Dimensional Regression Coefficient Testing
- Abstract: We consider the problem of testing whether a single coefficient is equal to zero in high-dimensional fixed-design linear models. In the high-dimensional setting where the dimension of covariates p is allowed to be in the same order of magnitude as sample size n, to achieve finite-population validity, existing methods usually require strong distributional assumptions on the noise vector (such as Gaussian or rotationally invariant), which limits their applications in practice. In this paper, we propose a new method, called residual permutation test (RPT), which is constructed by projecting the regression residuals onto the space orthogonal to the union of the column spaces of the original and permuted design matrices. RPT can be proved to achieve finite-population size validity under fixed design with just exchangeable noises, whenever p<n/2. Moreover, RPT is shown to be asymptotically powerful for heavy tailed noises with bounded (1+t)-th order moment when the true coefficient is at least of order n^{-t/(1+t)} for t∈[0,1]. We further proved that this signal size requirement is essentially optimal in the minimax sense. Numerical studies confirm that RPT performs well in a wide range of simulation settings with normal and heavy-tailed noise distributions.
- Discussant: Panos Toulis (University of Chicago)
- Links: [Relevant papers: paper #1][Slides][Discussion Slides]

Wednesday, February 15, 2023 [Recording]
- Speaker: Jesse Hemerik (Wageningen University)
- Title: Flexible estimation and control of the false discovery proportion
- Abstract: When we choose a multiple testing method, there are always tradeoffs between type I error control, power and flexibility. This is particularly true for multiple testing methods that estimate or control the proportion of false discoveries (FDP). At the beginning of this talk, an overview of such methods will be given. We then introduce a multiple testing procedure that controls the median of the FDP in a flexible way. The procedure only requires a vector of p-values as input and is comparable to the Benjamini-Hochberg method, which controls the mean of the FDP. Benjamini-Hochberg requires choosing the target FDP, alpha, before looking at the data, but our method does not. Our procedure is inspired by a popular estimator of the total number of true hypotheses. We adapt this estimator to provide simultaneously median unbiased estimators of the FDP. This simultaneity allows for the claimed flexibility.
- Discussant: Pallavi Basu (Indian School of Business)
- Links: [Relevant papers: paper #1, paper #2]

Wednesday, February 8, 2023 [Recording]
- Speaker: Asher Spector (Stanford University)
- Title: Asymptotically Optimal Knockoff Statistics via the Masked Likelihood Ratio
- Abstract: This paper introduces a class of asymptotically most powerful knockoff statistics based on a simple principle: that we should prioritize variables in order of our ability to distinguish them from their knockoffs. Our contribution is threefold. First, we argue that feature statistics should estimate "oracle masked likelihood ratios," which are Neyman-Pearson statistics for discriminating between features and knockoffs using partially observed (masked) data. Second, we introduce the masked likelihood ratio (MLR) statistic, a knockoff statistic that estimates the oracle MLR. We show that MLR statistics are asymptotically average-case optimal, i.e., they maximize the expected number of discoveries made by knockoffs when averaging over a user-specified prior on unknown parameters. Our optimality result places no explicit restrictions on the problem dimensions or the unknown relationship between the response and covariates; instead, we assume a "local dependence" condition which depends only on simple quantities that can be calculated from the data. Third, in simulations and three real data applications, we show that MLR statistics outperform state-of-the-art feature statistics, including in settings where the prior is highly misspecified. We implement MLR statistics in the open-source python package knockpy; our implementation is often (although not always) faster than computing a cross-validated lasso.
- Discussant: Xin Xing (Virginia Tech)
- Links: [Relevant papers: paper #1]

Wednesday, February 1, 2023 [Recording]
- Speaker: Ruth Heller (Tel Aviv University)
- Title: Replicability Across Multiple Studies
- Abstract: Meta-analysis is routinely performed in many scientific disciplines. This analysis is attractive since discoveries are possible even when all the individual studies are underpowered. However, the meta-analytic discoveries may be entirely driven by a signal in a single study, and thus non-replicable. The lack of replicability of scientific findings has been of great concern following the influential paper of Ioannidis (2005). Although the great majority of meta-analyses carried out to date do not infer on the replicability of their findings, it is possible to do so. We provide a selective overview of analyses that can be carried out towards establishing replicability of the scientific findings for the setting where multiple studies each examine multiple features (as in genomics applications). We also discuss some of the current shortcomings and future directions.
- Discussant: Jingshu Wang (University of Chicago)
- Links: [Relevant papers: paper #1][Slides]

Wednesday, January 25, 2023 [Recording]
- Speaker: Jinzhou Li (ETH Zürich)
- Title: Simultaneous false discovery proportion bounds via knockoffs and closed testing
- Abstract: We propose new methods to obtain simultaneous false discovery proportion bounds for knockoff-based approaches. We first investigate an approach based on Janson and Su's k-familywise error rate control method and interpolation. We then generalize it by considering a collection of k values, and show that the bound of Katsevich and Ramdas is a special case of this method and can be uniformly improved. Next, we further generalize the method by using closed testing with a multi-weighted-sum local test statistic. This allows us to obtain a further uniform improvement and other generalizations over previous methods. We also develop an efficient shortcut for its implementation. We compare the performance of our proposed methods in simulations and apply them to a data set from the UK Biobank.
- Discussant: Eugene Katsevich (University of Pennsylvania)
- Links: [Relevant papers: paper #1][Slides]

Thursday, December 15, 2022 [Recording]
- Speaker: Stephen Bates (UC Berkeley)
- Title: Principal-Agent Hypothesis Testing
- Abstract: Consider the relationship between the FDA (the principal) and a pharmaceutical company (the agent). The pharmaceutical company wishes to sell a product to make a profit, and the FDA wishes to ensure that only efficacious drugs are released to the public. The efficacy of the drug is not known to the FDA, so the pharmaceutical company must run a costly trial to prove efficacy to the FDA. Critically, the statistical protocol used to establish efficacy affects the behavior of a strategic, self-interested pharmaceutical company; a lower standard of statistical evidence incentivizes the pharmaceutical company to run more trials for drugs that are less likely to be effective, since the drug may pass the trial by chance, resulting in large profits. The interaction between the statistical protocol and the incentives of the pharmaceutical company is crucial to understanding this system and designing protocols with high social utility. In this work, we discuss how the principal and agent can enter into a contract with payoffs based on statistical evidence. When there is stronger evidence for the quality of the product, the principal allows the agent to make a larger profit. We show how to design contracts that are robust to an agent's strategic actions, and derive the optimal contract in the presence of strategic behavior.
- Discussant: Roshni Sahoo (Stanford University)
- Links: [Relevant papers: paper #1]

Friday, December 9, 2022 (STAMPS-ISSI joint seminar, 10:30 am PT/1:30 pm ET / 6:30 pm London / 8:30 pm Tel Aviv) [link to join]
- Speaker: Rebecca Willett (University of Chicago)
- Title: Machine Learning for Inverse Problems in Climate Science
- Abstract: Machine learning has the potential to transform climate research. This fundamental change cannot be realized through the straightforward application of existing off-the-shelf machine learning tools alone. Rather, we need novel methods for incorporating physical models and constraints into learning systems. In this talk, I will discuss inverse problems central to climate science — data assimilation and simulator model fitting — and how machine learning yields methods with high predictive skill and computational efficiency. First, I will describe a machine learning framework for learning dynamical systems in data assimilation. Our auto-differentiable ensemble Kalman filters blend ensemble Kalman filters for state recovery with machine learning tools for learning the dynamics. In doing so, our methods leverage the ability of ensemble Kalman filters to scale to high-dimensional states and the power of automatic differentiation to train high-dimensional surrogate models for the dynamics. Second, I will describe learning emulators of high-dimensional climate forecasting models targeting parameter estimation with uncertainty estimation. We assume access to a computationally complex climate simulator that inputs a candidate parameter and outputs a corresponding multichannel time series. Our task is to accurately estimate a range of likely values of the underlying parameters that best fit data. Our framework learns feature embeddings of observed dynamics jointly with an emulator that can replace high-cost simulators for parameter estimation. These methods build upon insights from inverse problems, data assimilation, stochastic filtering, and optimization, highlighting how theory can inform the design of machine learning systems in the natural sciences.

Thursday, November 24, 2022 (no seminar)

Thursday, December 1, 2022 [Recording]
- Speaker: Alexandre Blain (Inria)
- Title: Notip: Non-parametric True Discovery Proportion control for brain imaging
- Abstract: Cluster-level inference procedures are widely used for brain mapping. These methods compare the size of clusters obtained by thresholding brain maps to an upper bound under the global null hypothesis, computed using Random Field Theory or permutations. However, the guarantees obtained by this type of inference - i.e. at least one voxel is truly activated in the cluster - are not informative with regard to the strength of the signal therein. There is thus a need for methods to assess the amount of signal within clusters; yet such methods have to take into account that clusters are defined based on the data, which creates circularity in the inference scheme. This has motivated the use of post hoc estimates that allow statistically valid estimation of the proportion of activated voxels in clusters. In the context of fMRI data, the All-Resolutions Inference framework introduced in Rosenblatt et al. (2018) provides post hoc estimates of the proportion of activated voxels. However, this method relies on parametric threshold families, which results in conservative inference. In this paper, we leverage randomization methods to adapt to data characteristics and obtain tighter false discovery control. We obtain Notip, for Non-parametric True Discovery Proportion control: a powerful, non-parametric method that yields statistically valid guarantees on the proportion of activated voxels in data-derived clusters. Numerical experiments demonstrate substantial gains in number of detections compared with state-of-the-art methods on 36 fMRI datasets. The conditions under which the proposed method brings benefits are also discussed.
- Discussant: Angela Andreella (University Ca’ Foscari Venezia)
- Links: [Relevant papers: paper #1][Slides][Discussion Slides]

Thursday, November 17, 2022 [Recording]
- Speaker: Etienne Roquain (Sorbonne Université)
- Title: Machine learning meets false discovery rate
- Abstract: Classical false discovery rate (FDR) controlling procedures offer strong and interpretable guarantees but often lack flexibility to work with complex data. By contrast, machine learning-based classification algorithms have superior performances on modern datasets but typically fall short of error-controlling guarantees. In this paper, we make these two meet by introducing a new adaptive novelty detection procedure with FDR control, called AdaDetect. It extends the scope of recent works of multiple testing literature to the high dimensional setting, notably the one in Yang et al. (2021). We prove that AdaDetect comes with finite sample guarantees: it controls the FDR strongly and approximates the oracle in terms of the power, with explicit remainder terms that are small under mild conditions. In practice, AdaDetect can be used in combination with *any* machine learning-based classifier, which allows the user to choose the most relevant classification approach. We illustrate this with classical real-world datasets, for which random forest and neural network classifiers are particularly efficient. The versatility of our method is also shown with an astrophysical application.
- Discussant: Matteo Sesia (University of Southern California)
- Links: [Relevant papers: paper #1][Slides][Discussion Slides]

Thursday, November 10, 2022 [Recording]
- Speaker: Zhimei Ren (University of Chicago)
- Title: Derandomized knockoffs: leveraging e-values for false discovery rate control
- Abstract: Model-X knockoffs is a flexible wrapper method for high-dimensional regression algorithms, which provides guaranteed control of the false discovery rate (FDR). Due to the randomness inherent to the method, different runs of model-X knockoffs on the same dataset often result in different sets of selected variables, which is undesirable in practice. In this paper, we introduce a methodology for derandomizing model-X knockoffs with provable FDR control. The key insight of our proposed method lies in the discovery that the knockoffs procedure is in essence an e-BH procedure. We make use of this connection, and derandomize model-X knockoffs by aggregating the e-values resulting from multiple knockoff realizations. We prove that the derandomized procedure controls the FDR at the desired level, without any additional conditions (in contrast, previously proposed methods for derandomization are not able to guarantee FDR control). The proposed method is evaluated with numerical experiments, where we find that the derandomized procedure achieves comparable power and dramatically decreased selection variability when compared with model-X knockoffs.
- Discussant: Ruodu Wang (University of Waterloo)
- Links: [Relevant papers: paper #1][Slides]

Thursday, November 3, 2022 [Recording]
- Speaker: Genevera Allen (Rice University)
- Title: Model-Agnostic Confidence Intervals for Feature Importance: A Fast and Powerful Approach Using Minipatch Ensembles
- Abstract: To promote new scientific discoveries from complex data sets, feature importance inference has been a long-standing statistical problem. Instead of testing for parameters that are only interpretable for specific models, there has been increasing interest in model-agnostic methods, often in the form of feature occlusion or leave-one-covariate-out (LOCO) inference. Existing approaches often make distributional assumptions, which can be difficult to verify in practice, or require model-refitting and data-splitting, which are computationally intensive and lead to losses in power. In this work, we develop a novel, mostly model-agnostic and distribution-free inference framework for feature importance that is computationally efficient and statistically powerful. Our approach is fast as we avoid model-refitting by leveraging a form of random observation and feature subsampling called minipatch ensembles; this approach also improves statistical power by avoiding data splitting. Our framework can be applied on tabular data and with any machine learning algorithm, together with minipatch ensembles, for regression and classification tasks. Despite the dependencies induced by using minipatch ensembles, we show that our approach provides asymptotic coverage for the feature importance score of any model, and only assumes algorithmic stability. Finally, our same procedure can also be leveraged to provide valid confidence intervals for predictions, hence providing fast, simultaneous quantification of the uncertainty of both predictions and feature importance. We validate our intervals on a series of synthetic and real data examples, including non-linear settings with interactions, showing that our approach detects the correct important features and exhibits many computational and statistical advantages over existing methods. Joint work with Luqin Gan and Lili Zheng.
- Discussant: Byol Kim (University of Washington)
- Links: [Relevant papers: paper #1][Slides]

Thursday, October 27, 2022 [link to join]
- Speaker: Weijie Su (University of Pennsylvania)
- Title: Statistical Estimation via a Truthful Owner-Assisted Scoring Mechanism
- Abstract: In 2014, NeurIPS received 1,678 paper submissions, while this number increased to 10,411 in 2022, putting a tremendous strain on the peer review process. In this talk, we attempt to address this challenge starting by considering the following scenario: Alice submits a large number of papers to a machine learning conference and knows about the ground-truth quality of her papers; given noisy ratings provided by independent reviewers, can Bob obtain accurate estimates of the ground-truth quality of the papers by asking Alice a question about the ground truth? First, if Alice would truthfully answer the question because by doing so her payoff as additive convex utility over all her papers is maximized, we show that the questions must be formulated as pairwise comparisons between her papers. Moreover, if Alice is required to provide a ranking of her papers, which is the most fine-grained question via pairwise comparisons, we prove that she would be truth-telling. By incorporating the ground-truth ranking, we show that Bob can obtain an estimator with the optimal squared error in certain regimes based on any possible ways of truthful information elicitation. Moreover, the estimated ratings are substantially more accurate than the raw ratings when the number of papers is large and the raw ratings are very noisy. Finally, we conclude the talk with several extensions and some refinements for practical considerations.
- Discussant: Davide Viviano (Stanford University)
- Links: [Relevant papers: paper #1, paper #2]

Thursday, October 20, 2022 [Recording]
- Speaker: Timothy Armstrong (University of Southern California)
- Title: Empirical Bayes Confidence Intervals, Average Coverage and the False Discovery Rate
- Abstract: This talk presents a general method for constructing intervals satisfying an average coverage property. Given an estimate of average squared bias of estimates of $n$ parameters, one computes a critical value that takes into account possible undercoverage due to bias, on average over the $n$ intervals. Applying our approach to shrinkage estimators in an empirical Bayes setting, we obtain confidence intervals that satisfy the empirical Bayes coverage property of Morris (1983), while avoiding parametric assumptions on the prior previously used to construct such intervals.

While tests based on average coverage intervals do not control size in the usual frequentist sense, certain results on false discovery rate (FDR) control of multiple testing procedures continue to hold when applied to such tests. In particular, the Benjamini and Hochberg (1995) step-up procedure still controls FDR in the asymptotic regime with many weakly dependent $p$-values, and certain adjustments for dependent $p$-values such as the Benjamini and Yekutieli (2001) procedure continue to yield FDR control in finite samples.

- Discussant: Jiaying Gu (University of Toronto)
- Links: [Relevant papers: paper #1, paper #2][Slides]

Thursday, October 13, 2022 [Recording]
- Speaker: Aaditya Ramdas (Carnegie Mellon University)
- Title: E-values as unnormalized weights in multiple testing
- Abstract: The last two years have seen a flurry of new work on using e-values for multiple testing. This talk will summarize old ideas and present some new, unsubmitted work. I will briefly summarize what e-values and e-processes are (nonparametric, composite generalizations of likelihood ratios and Bayes factors), and recap the e-BH and e-BY procedures for FDR and FCR control, and their utility in a bandit context.

Then, I will present a simple, yet powerful, idea: using e-values as unnormalized weights in multiple testing. Most standard weighted multiple testing methods require the weights to deterministically add up to the number of hypotheses being tested (equivalently, the average weight is unity). But this normalization is not required when the weights are e-values obtained from independent data. This could result in a massive increase in power, especially if the non-null hypotheses have e-values much larger than one. More broadly, we study how to combine an e-value and a p-value, and design multiple testing procedures where both e-values and p-values are available for some hypotheses. A case study with RNA-seq and microarray data will demonstrate the practical power benefits.

These are joint works with Ruodu Wang, Neil Xu and Nikos Ignatiadis.

- Discussant: Peter Grünwald (Centrum Wiskunde & Informatica and Leiden University)
- Links: [Relevant papers: paper #1, paper #2, paper #3, paper #4][Slides]

Thursday, July 28, 2022 [Recording]
- Speaker: Trambak Banerjee (University of Kansas)
- Title: Nonparametric Empirical Bayes Estimation On Heterogeneous Data
- Abstract: The simultaneous estimation of many parameters based on data collected from corresponding studies is a key research problem that has received renewed attention in the high-dimensional setting. Many practical situations involve heterogeneous data where heterogeneity is captured by a nuisance parameter. Effectively pooling information across samples while correctly accounting for heterogeneity presents a significant challenge in large-scale estimation problems. We address this issue by introducing the "Nonparametric Empirical Bayes Structural Tweedie" (NEST) estimator, which efficiently estimates the unknown effect sizes and properly adjusts for heterogeneity via a generalized version of Tweedie's formula. For the normal means problem, NEST simultaneously handles the two main selection biases introduced by heterogeneity: one, the selection bias in the mean, which cannot be effectively corrected without also correcting for, two, selection bias in the variance. Our theoretical results show that NEST has strong asymptotic properties and in our simulation studies NEST outperforms competing methods, with substantial efficiency gains in many settings. The proposed method is demonstrated on estimating the batting averages of baseball players and Sharpe ratios of mutual fund returns.
- Discussant: Jake Soloff (University of Chicago)
- Links: [Relevant papers: paper #1]

Thursday, July 21, 2022 (postponed)
- Speaker: Dacheng Xiu (University of Chicago)
- Title: Prediction When Factors are Weak
- Abstract: Principal component analysis (PCA) has been the most prevalent approach to the recovery of factors. Nevertheless, the theoretical justification of the PCA-based approach often relies on a convenient and critical assumption that factors are pervasive. To incorporate information from weaker factors in the context of prediction, we propose a new procedure based on supervised PCA, which iterates over selection, PCA, and projection. The selection step finds a subset of predictors most correlated with the prediction target, whereas the projection step permits multiple weak factors of distinct strength. We justify our procedure in an asymptotic scheme where both the sample size and the cross-sectional dimension increase at potentially different rates. Our empirical analysis highlights the role of weak factors in predicting inflation.
- Discussant: Yiqiao Zhong (Stanford University)
- Links: [Relevant papers: ]

Thursday, July 14, 2022 (100th ISSI seminar) [Recording]
- Speaker: Yoav Benjamini (Tel Aviv University)
- Title: Trends and challenges in research about selective inference and its practice
- Abstract: The international seminar on selective inference gives us an opportunity to identify trends in this important research area, discuss common topics of interest and raise some challenges. I’ll try to use this opportunity for these purposes, but obviously the challenges will reflect my own point of view.

Thursday, June 30, 2022 [Recording]
- Speaker: Zhanrui Cai (Carnegie Mellon University)
- Title: Robust Cross Validation with Confidence
- Abstract: Cross validation is one of the most popular tools for model selection and tuning parameter selection in the modern statistics and machine learning community. By dividing the sample into $K$ folds, cross validation first trains the models on $K-1$ folds of data, and tests the prediction error on the remaining fold. Then it chooses the model or tuning parameter that has the smallest test error. Recent studies aim to improve the confidence level for the models selected by cross validation (Lei, 2020), but may not be suitable for skewed or heavy-tailed data, or data with outliers. In this paper, we propose a robust cross validation method. Instead of comparing the mean of the prediction error, we propose to compare the quantiles of the test error because of its skewed nature. We illustrate the necessity of rank-sum comparison through motivating examples, and demonstrate the advantage of the proposed robust cross validation method through extensive simulation and real data analysis. In order to study the limiting distribution of the evaluation criterion, we develop the Gaussian approximation theory for high dimensional two sample U-statistics, which may be of independent interest.
- Discussant: Morgane Austern (Harvard University)
- Links: [Relevant papers: ]

Thursday, June 23, 2022 [Recording]
- Speaker: Yixiang Luo (University of California, Berkeley)
- Title: Improving knockoffs with conditional calibration
- Abstract: The knockoff filter of Barber and Candès (2015) is a flexible framework for multiple testing in supervised learning models, based on introducing synthetic predictor variables to control the false discovery rate (FDR). Using the conditional calibration framework of Fithian and Lei (2020), we introduce the calibrated knockoff procedure, a method that uniformly improves the power of any knockoff procedure. We implement our method for fixed-X knockoffs and show theoretically and empirically that the improvement is especially notable in two contexts where knockoff methods can be nearly powerless: when the rejection set is small, and when the structure of the design matrix prevents us from constructing good knockoff variables. In these contexts, calibrated knockoffs even outperform competing FDR-controlling methods like the (dependence-adjusted) Benjamini–Hochberg procedure in many scenarios.

This is joint work with Will Fithian and Lihua Lei.

- Discussant: Lucas Janson (Harvard University)
- Links: [Relevant papers: paper #1]

Thursday, June 16, 2022 (ISSI-STAMPS joint seminar) [Recording]
- Speaker: Ann Lee (Carnegie Mellon University)
- Title: Likelihood-Free Frequentist Inference: Confidence Sets with Correct Conditional Coverage
- Abstract: Many areas of science make extensive use of computer simulators that implicitly encode likelihood functions of complex systems. Classical statistical methods are poorly suited for these so-called likelihood-free inference (LFI) settings, outside the asymptotic and low-dimensional regimes. Although new machine learning methods, such as normalizing flows, have revolutionized the sample efficiency and capacity of LFI methods, it remains an open question whether they produce confidence sets with correct conditional coverage. In this talk, I will describe our group's recent and ongoing research on developing scalable and modular procedures for (i) constructing Neyman confidence sets with finite-sample guarantees of nominal coverage, and for (ii) computing diagnostics that estimate conditional coverage over the entire parameter space. We refer to our framework as likelihood-free frequentist inference (LF2I). Any method that defines a test statistic, like the likelihood ratio, can be adapted to LF2I to create valid confidence sets and diagnostics, without costly Monte Carlo samples at fixed parameter settings. In my talk, I will discuss where we stand with LF2I and challenges that still remain. (Part of these efforts are joint with Niccolo Dalmasso, Rafael Izbicki, Luca Masserano, Tommaso Dorigo, Mikael Kuusela, and David Zhao. Our general framework is described in arXiv:2107.03920)
- Discussant: Minge Xie (Rutgers University)
- Links: [Relevant papers: paper #1, paper #2][Slides]

Thursday, June 9, 2022 [Recording]
- Speaker: Anna Neufeld (University of Washington)
- Title: Inference after latent variable estimation for single-cell RNA sequencing data
- Abstract: In the analysis of single-cell RNA sequencing data, researchers often first characterize the variation between cells by estimating a latent variable, representing some aspect of the individual cell’s state. They then test each gene for association with the estimated latent variable. If the same data are used for both of these steps, then standard methods for computing p-values and confidence intervals in the second step will fail to achieve statistical guarantees such as Type 1 error control or nominal coverage. Furthermore, approaches such as sample splitting that can be fruitfully applied to solve similar problems in other settings are not applicable in this context. In this paper, we introduce count splitting, an extremely flexible framework that allows us to carry out valid inference in this setting, for virtually any latent variable estimation technique and inference approach, under a Poisson assumption. We demonstrate the Type 1 error control and power of count splitting in a simulation study, and apply count splitting to a dataset of pluripotent stem cells differentiating to cardiomyocytes.
- Discussant: James Leiner (Carnegie Mellon University)
- Links: [Relevant papers: ][Slides]

Thursday, June 2, 2022 [Recording]
- Speaker: Matteo Sesia (University of Southern California)
- Title: Individualized conditional independence testing under model-X with heterogeneous samples and interactions
- Abstract: Model-X knockoffs and the conditional randomization test are methods that search for conditional associations in large data sets, controlling the type-I errors if the joint distribution of the predictors is known. However, they cannot test for interactions nor find whether an association is only significant within a latent subset of a heterogeneous population. We address this limitation by developing an extension of the knockoff filter that tests conditional associations within automatically detected subsets of individuals, provably controlling the false discovery rate for the selected hypotheses. Then, under the additional assumption of a partially linear model with a binary predictor, we extend the conditional randomization test so as to make inferences about quantiles of individual effects that are robust to sample heterogeneity and interactions. The performances of these methods are investigated through simulations and with the analysis of data from a randomized blood donation experiment with several treatments.
- Discussant: Brad Ross (Stanford University)
- Links: [Relevant papers: paper #1]

Thursday, May 26, 2022 [Recording]
- Speaker: James Leiner (Carnegie Mellon University)
- Title: Data Blurring Fission: sample splitting a single sample
- Abstract: Suppose we observe a random vector $X$ from some distribution $P$ in a known family with unknown parameters. We ask the following question: when is it possible to split $X$ into two parts $f(X)$ and $g(X)$ such that neither part is sufficient to reconstruct $X$ by itself, but both together can recover $X$ fully, and the joint distribution of $(f(X),g(X))$ is tractable? As one example, if $X=(X_1,\dots,X_n)$ and $P$ is a product distribution, then for any $m<n$, we can split the sample to define $f(X)=(X_1,\dots,X_m)$ and $g(X)=(X_{m+1},\dots,X_n)$. Rasines and Young (2021) offer an alternative route of accomplishing this task through randomization of $X$ with additive Gaussian noise, which enables post-selection inference in finite samples for Gaussian distributed data and asymptotically for non-Gaussian additive models. In this paper, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data blurring, as an alternative to data splitting, data carving and p-value masking. We exemplify the method on a few prototypical applications, such as post-selection inference for trend filtering and other regression problems.
- Discussant: Daniel Garcia Rasines (ICMAT - CSIC)
- Links: [Relevant papers: paper #1][Slides]

Thursday, May 19, 2022 [Recording]
- Speaker: Daniel Wilhelm (University College London)
- Title: Inference for Ranks
- Abstract: It is often desired to rank different populations according to the value of some feature of each population. For example, it may be desired to rank neighborhoods according to some measure of intergenerational mobility or countries according to some measure of academic achievement. These rankings are invariably computed using estimates rather than the true values of these features. As a result, there may be considerable uncertainty concerning the rank of each population. In this paper, we consider the problem of accounting for such uncertainty by constructing confidence sets for the rank of each population. We consider both the problem of constructing marginal confidence sets for the rank of a particular population as well as simultaneous confidence sets for the ranks of all populations. We show how to construct such confidence sets under weak assumptions. An important feature of all of our constructions is that they remain computationally feasible even when the number of populations is very large. We apply our theoretical results to re-examine the rankings of both neighborhoods in the United States in terms of intergenerational mobility and developed countries in terms of academic achievement. The conclusions about which countries do best and worst at reading, math, and science are fairly robust to accounting for uncertainty. The confidence sets for the ranking of the 50 most populous commuting zones by measures of mobility are also found to be small. These rankings, however, become much less informative if one includes all commuting zones, if one considers neighborhoods at a more granular level (counties, Census tracts), or if one uses movers across areas to address concerns about selection.
- Discussant: Aldo Solari (University of Milano-Bicocca)
- Links: [Relevant papers: paper #1, paper #2][Slides]

Thursday, May 12, 2022 [Recording]
- Speaker: Colin Fogarty (Massachusetts Institute of Technology)
- Title: Sensitivity and Multiplicity
- Abstract: Corrections for multiple comparisons generally imagine that all other modeling assumptions are met for the hypothesis tests being conducted, such that the only reason for inflated false rejections is the fact that multiplicity has been ignored when performing inference. In reality, such modes of inference often rest upon unverifiable assumptions. Common expedients include the assumption of "representativeness" of the sample at hand for the population of interest; and of "no unmeasured confounding" when inferring treatment effects in observational studies. In a sensitivity analysis, one quantifies the magnitude of the departure from unverifiable assumptions required to explain away the findings of a study. Individually, both sensitivity analyses and multiplicity controls can reduce the rate at which true signals are detected and reported. In studies with multiple outcomes resting upon untestable assumptions, one may be concerned that correcting for multiple comparisons while also conducting a sensitivity analysis could render the study entirely devoid of power. We present results on sensitivity analysis for observational studies with multiple endpoints, where the researcher must simultaneously account for multiple comparisons and assess robustness to hidden bias. We find that of the two pursuits, it is recognizing the potential for hidden bias that plays the largest role in determining the conclusions of a study: individual findings that are robust to hidden bias are remarkably persistent in the face of multiple comparisons, while sensitive findings are quickly erased regardless of the number of comparisons. Through simulation studies and empirical examples, we show that through the incorporation of the proposed methodology within a closed testing framework, in a sensitivity analysis one can often attain the same power for testing individual hypotheses that one would have attained had one not accounted for multiple comparisons at all. This suggests that once one commits to conducting a sensitivity analysis, the additional loss in power from controlling for multiple comparisons may be substantially attenuated.
- Discussant: Bo Zhang (Fred Hutchinson Cancer Center)
- Links: [Relevant papers: paper #1, paper #2, paper #3][Slides]

Thursday, May 5, 2022 [Recording]
- Speaker: Ariane Marandon (Sorbonne Université, LPSM)
- Title: False clustering rate control in mixture models
- Abstract: The clustering task consists in delivering labels to the members of a sample. For most data sets, some individuals are ambiguous and intrinsically difficult to attribute to one or another cluster. However, in practical applications, misclassifying individuals is potentially disastrous. To overcome this difficulty, the idea followed here is to classify only a part of the sample in order to obtain a small misclassification rate. This approach is well known in the supervised setting, and referred to as classification with an abstention option. The purpose of this paper is to revisit this approach in an unsupervised mixture-model framework. The problem is formalized in terms of controlling the false clustering rate (FCR) below a prescribed level α, while maximizing the number of classified items. New procedures are introduced and their behavior is shown to be close to the optimal one by establishing theoretical results and conducting numerical experiments.
- Discussant: Gilles Blanchard (Université Paris Sud)
- Links: [Relevant papers: paper #1][Slides]

Thursday, April 28, 2022 [Recording]
- Speaker: Pragya Sur (Harvard University)
- Title: A modern central limit theorem for the classical doubly robust estimator: variance inflation and beyond
- Abstract: Estimating the average treatment effect (ATE) is a central problem in causal inference. Modern advances in the field studied estimation and inference for the ATE in high dimensions through a variety of approaches. Doubly robust estimators form a popular approach in this context. However, the high-dimensional literature surrounding these estimators relies on sparsity conditions, either on the outcome regression (OR) or the propensity score (PS) model. This talk will introduce a new central limit theorem for the classical doubly robust (DR) estimator that applies regardless of such sparsity-type assumptions. Specifically, we will study properties of the cross-fit version of the estimator under well-specified OR and PS models, and the common modern regime where the number of features and samples are both large and comparable. In this regime, under assumptions on the covariate distribution, our CLT will uncover two crucial phenomena among others: (i) the DR estimator exhibits a substantial variance inflation that can be precisely quantified in terms of the signal-to-noise ratio and other problem parameters, (ii) the asymptotic covariance between the estimators used while cross-fitting is not negligible even on the root-n scale. These findings are strikingly different from their classical counterparts, and open a vista of possibilities for studying other similar high-dimensional effects. On the technical front, our work utilizes a novel interplay between three distinct tools: approximate message passing theory, the theory of deterministic equivalents, and the leave-one-out approach. Time permitting, I will outline some of these techniques. This is based on joint work with Kuanhao Jiang, Rajarshi Mukherjee, and Subhabrata Sen.
- Discussant: Michael Celentano (University of California, Berkeley)
- Links: [Relevant papers: paper #1][Slides]

Thursday, April 21, 2022 [Recording]
- Speaker: Zongming Ma (University of Pennsylvania)
- Title: Testing equivalence of clustering
- Abstract: In this talk, we test whether two datasets measured on the same set of subjects share a common clustering structure. As a leading example, we focus on comparing clustering structures in two independent random samples from two deterministic two-component mixtures of multivariate Gaussian distributions. Mean parameters of these Gaussian distributions are treated as potentially unknown nuisance parameters and are allowed to differ. Assuming knowledge of mean parameters, we first determine the phase diagram of the testing problem over the entire range of signal-to-noise ratios by providing both lower bounds and tests that achieve them. When nuisance parameters are unknown, we propose tests that achieve the detection boundary adaptively as long as ambient dimensions of the datasets grow at a sub-linear rate with the sample size. The talk is based on joint work with Chao Gao.
- Discussant: Kaizheng Wang (Columbia University)
- Links: [Relevant papers: paper #1]

Thursday, April 14, 2022 [Recording]
- Speaker: Zheng (Tracy) Ke (Harvard University)
- Title: Power Analysis and Phase Transitions for FDR Control Methods
- Abstract: Many recent FDR control methods have been proposed under sparse linear regression models. In this talk, we are interested in two questions: 1) How to design an FDR control method to achieve good power? 2) Does the operation of adding fake variables in an FDR control method lead to any unwanted power loss? We tackle these questions by viewing an FDR control method as having three components: ranking algorithm, tampered design and symmetric statistic. We consider a collection of different combinations of the three components, where each combination corresponds to a specific FDR control method. This collection covers the recent methods of knockoff filter (Barber and Candes, 2015), Gaussian mirror (Xing, Zhao and Liu, 2021) and their variants. We evaluate the power of each FDR control method by deriving its theoretical phase diagram under a Rare/Weak signal model. We then answer Question (1) by comparing the phase diagrams of different FDR control methods and deriving insights into what boosts power. We answer Question (2) by comparing the phase diagram of an FDR control method with that of its prototype, a method that uses an ideal threshold. We give encouraging examples where an FDR control method has a negligible power loss relative to its prototype.
- Discussant: Asaf Weinstein (Hebrew University of Jerusalem)
- Links: [Relevant papers: paper #1]

Thursday, April 7, 2022 [Recording]
- Speaker: Yiqun Chen (University of Washington)
- Title: Selective inference for k-means clustering
- Abstract: We consider the problem of testing for a difference in means between clusters of observations identified via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate, because the clusters were obtained on the same data used for testing. To overcome this problem, we propose a selective inference approach. We describe an efficient algorithm to compute a finite-sample p-value that controls the selective Type I error for a test of the difference in means between a pair of clusters obtained using k-means clustering. We apply our proposal in simulation, and on hand-written digits data and single-cell RNA-sequencing data.
- Discussant: Govinda Kamath (10x Genomics)
- Links: [Relevant papers: paper #1][Slides]

Thursday, March 31, 2022 [Recording]
- Speaker: Jake Soloff (UC Berkeley)
- Title: The edge of discovery: Controlling the local false discovery rate at the margin
- Abstract: Despite the popularity of the false discovery rate (FDR) as an error control metric for large-scale multiple testing, its close Bayesian counterpart, the local false discovery rate (lfdr), defined as the posterior probability that a particular null hypothesis is true, is a more directly relevant standard for justifying and interpreting individual rejections. However, the lfdr is difficult to work with in small samples, as the prior distribution is typically unknown. We propose a simple multiple testing procedure and prove that it controls the expectation of the maximum lfdr across all rejections; equivalently, it controls the probability that the rejection with the largest p-value is a false discovery. Our method operates without knowledge of the prior, assuming only that the p-value density is uniform under the null and decreasing under the alternative. We also show that our method asymptotically implements the oracle Bayes procedure for a weighted classification risk, optimally trading off between false positives and false negatives. We derive the limiting distribution of the attained maximum lfdr over the rejections, and the limiting empirical Bayes regret relative to the oracle procedure.

This is joint work with Daniel Xiang and Will Fithian.

- Discussant: Jinjin Tian (Carnegie Mellon University)
- Links: [Relevant paper: paper #1][Slides]

Thursday, March 24, 2022 [Recording]
- Speaker: Ziyu (Neil) Xu (Carnegie Mellon University)
- Title: Post-selection inference for e-value based confidence intervals
- Abstract: Suppose that one can construct a valid (1-delta)-CI for each of K parameters of potential interest. A data analyst uses an arbitrary data-dependent criterion to select some subset S of them for reporting, or highlighting. The confidence intervals for the selected parameters are no longer valid, due to the selection bias, so the question is how one must adjust these in order to account for selection. We focus on the popular notion of false coverage rate (FCR), which is the expected ratio of the number of selected intervals that miscover, to the number of selected intervals |S|. The main established method is the "BY procedure" from a seminal work by Benjamini and Yekutieli (JASA, 2005), that was inspired by the Benjamini-Hochberg (BH) procedure. Unfortunately, the BY procedure involves restrictions on the dependence between CIs and the selection criterion. We propose a natural and much simpler method (both in implementation and in proof), which is valid under any dependence structure between the original CIs, and any (unknown) selection criterion, but which only applies to a special, yet broad, class of CIs. Our procedure reports (1-delta|S|/K)-CIs for the selected parameters, and we prove that it controls the FCR at delta for confidence intervals that implicitly invert e-values; examples include those constructed via supermartingale methods, or via universal inference, or via Chernoff-style bounds on the moment generating function, among others.

Our work also has implications for multiple testing in sequential settings, since it applies at stopping times to continuously-monitored confidence sequences and multi-armed bandit sampling.

- Discussant: Zhimei Ren (University of Chicago)
- Links: [Relevant papers: paper #1][Slides]

Thursday, March 17, 2022 [Recording: (part 1) (part 2) ]
- Speaker: Yachong Yang (University of Pennsylvania)
- Title: Double robust prediction with covariate shift
- Abstract: Conformal prediction has received tremendous attention in recent years with several applications across health and social sciences. Recently, conformal inference has offered new solutions to problems in causal inference, which has led to advances in the modern discipline of semiparametric statistics for constructing novel, efficient prediction uncertainty quantification. In this paper, we consider the problem of obtaining distribution-free prediction regions when there is a shift in the distribution of the covariates between the training and test data. We propose a method built on the efficient influence function for the average treatment effect among the treated (ATT) functional that can be combined with an arbitrary training algorithm, without compromising asymptotic coverage. The prediction set attains nominal average coverage. This guarantee is a consequence of the product bias form of our proposal, which implies correct coverage if either the propensity score or the conditional distribution of the response can be estimated sufficiently well, a property also known as double robustness. We also discuss parameter tuning for optimal performance, and resolve a number of open problems at the intersection of causal inference, semiparametric theory, and conformal prediction.
- Discussant: James Robins (Harvard University)
- Links: [Relevant papers: paper #1][Slides]

Thursday, March 10, 2022 [Recording]
- Speaker: Leying Guan (Yale University)
- Title: Localized Conformal Prediction
- Abstract: We propose an inference framework called localized conformal prediction. It generalizes conformal prediction and offers a single-test-sample adaptive construction by emphasizing a local region around it, and can be combined with different conformal score constructions. The proposed framework enjoys an assumption-free finite sample marginal coverage guarantee. In addition, it offers approximate/asymptotic conditional coverage guarantees under suitable assumptions. We demonstrate how to change from conformal prediction to localized conformal prediction using several conformal scores and an associated potential gain via numerical examples.
- Discussant: Rafael Izbicki (Federal University of São Carlos)
- Links: [Relevant papers: paper #1]

Thursday, March 3, 2022 [Recording]
- Speaker: Ajit Tamhane (Northwestern University)
- Title: Testing Primary and Secondary Endpoints in Group Sequential Clinical Trials
- Abstract: In this talk I will give an overview of my work (in collaboration with others) over the last decade on the important practical problem of testing primary and secondary endpoints subject to a gatekeeping constraint in group sequential clinical trials. I will also mention some current work that is under way on interesting extensions. The focus of the talk will be the ideas behind the results and not the technical proofs. As such, this talk should be accessible to all.
- Discussant: Jason Hsu (The Ohio State University)
- Links: [Relevant papers: paper #1, paper #2, paper #3, paper #4]

Thursday, February 24, 2022 [Recording]
- Speaker: Bradley Rava (University of Southern California)
- Title: A Burden Shared is a Burden Halved: A Fairness-Adjusted Approach to Classification
- Abstract: We study fairness in classification, where one wishes to make automated decisions for people from different protected groups. When individuals are classified, the decision errors can be unfairly concentrated in certain protected groups. We develop a fairness-adjusted selective inference (FASI) framework and data-driven algorithms that achieve statistical parity in the sense that the false selection rate (FSR) is controlled and equalized among protected groups. The FASI algorithm operates by converting the outputs from black-box classifiers to R-values, which are intuitively appealing and easy to compute. Selection rules based on R-values are provably valid for FSR control, and avoid disparate impacts on protected groups. The effectiveness of FASI is demonstrated through both simulated and real data. Joint work with Wenguang Sun, Gareth James and Xin Tong.
- Discussant: Yaniv Romano (Technion - Israel Institute of Technology)
- Links: [Relevant papers: paper #1]

Thursday, February 17, 2022 [Recording]
- Speaker: Dillon Bowen (University of Pennsylvania)
- Title: Inference for Losers
- Abstract: Researchers frequently report league tables ranking units (neighborhoods or firms, for instance) based on estimated coefficients. Since the rankings are formed based on estimates, however, coefficients reported in league tables suffer from selection bias, with estimates for highly-ranked units biased upwards and those for low-ranked units biased downwards. Further, conventional confidence intervals can undercover. This paper introduces corrected estimators and confidence intervals that address these biases, ensuring that estimates and confidence intervals reported for each position in a league table are median-unbiased and have correct coverage, respectively.
- Discussant: Roger Koenker (University College London)
- Links: [Relevant papers: paper #1]

Thursday, February 10, 2022 [Recording]
- Speaker: Ying Ding (University of Pittsburgh)
- Title: Logic Inference and Testing in Targeted Treatment Development with Survival Outcomes
- Abstract: There has been growing interest in discovering precision medicine in modern drug development and biomedical research. One aspect of precision medicine is to develop new therapies that target a subgroup of patients who exhibit enhanced treatment efficacy (as compared to the complement of the subgroup) through randomized controlled trials (RCTs). In this talk, we will address two important statistical problems in such a targeted treatment development process when the outcome is of time-to-event type: (1) establish a correct and logic inference procedure when the population is a mixture of subgroups with differential efficacy; (2) develop a multiple-testing-based procedure to simultaneously identify and infer subgroups with enhanced treatment efficacy. Specifically, we propose a subgroup mixable estimation (SME) procedure to estimate efficacy in subgroups and their mixtures. We also develop a confident effect (CE4) approach which formulates the multiple testing problem through contrasts and constructs their simultaneous confidence intervals. Such a testing procedure rigorously controls both within- and across-marker multiplicity. We illustrate the methods on a large RCT of an eye disease, age-related macular degeneration (AMD), by discovering consistent differential treatment effects on delaying AMD progression in subgroups defined by SNPs.
- Discussant: Thorsten Dickhaus (University of Bremen)
- Links: [Relevant papers: paper #1, paper #2, paper #3]

Thursday, February 3, 2022 [Recording]
- Speaker: Yonghoon Lee (University of Chicago)
- Title: Distribution-free inference for regression: discrete, continuous, and in between
- Abstract: In data analysis problems where we are not able to rely on distributional assumptions, what types of inference guarantees can still be obtained? Many popular methods, such as holdout methods, cross-validation methods, and conformal prediction, are able to provide distribution-free guarantees for predictive inference, but the problem of providing inference for the underlying regression function (for example, inference on the conditional mean 𝔼[Y|X]) is more challenging. In the setting where the features X are continuously distributed, recent work has established that any confidence interval for 𝔼[Y|X] must have non-vanishing width, even as sample size tends to infinity. At the other extreme, if X takes only a small number of possible values, then inference on 𝔼[Y|X] is trivial to achieve. In this work, we study the problem in settings in between these two extremes. We find that there are several distinct regimes in between the finite setting and the continuous setting, where vanishing-width confidence intervals are achievable if and only if the effective support size of the distribution of X is smaller than the square of the sample size.
- Discussant: Ying Jin (Stanford University)
- Links: [Relevant papers: paper #1][Slides][Discussion Slides]

Thursday, January 27, 2022 [Recording]
- Speaker: Richard Berk (University of Pennsylvania)
- Title: Improving Fairness in Criminal Justice Algorithmic Risk Assessments Using Optimal Transport and Conformal Prediction Sets
- Abstract: In the United States and elsewhere, risk assessment algorithms are being used to help inform criminal justice decision-makers. A common intent is to forecast an offender's "future dangerousness." Such algorithms have been correctly criticized for potential unfairness, and there is an active cottage industry trying to make repairs. In this paper, we use counterfactual reasoning to consider the prospects for improved fairness when members of a less privileged group are treated by a risk algorithm as if they are members of a more privileged group. We combine a machine learning classifier trained in a novel manner with an optimal transport adjustment for the relevant joint probability distributions, which together provide a constructive response to claims of bias-in-bias-out. A key distinction is between fairness claims that are empirically testable and fairness claims that are not. We then use confusion tables and conformal prediction sets to evaluate achieved fairness for projected risk. Our data are a random sample of 300,000 offenders at their arraignments for a large metropolitan area in the United States during which decisions to release or detain are made. We show that substantial improvement in fairness can be achieved consistent with a Pareto improvement for protected groups.
- Discussant: Emmanuel Candès (Stanford University)
- Links: [Relevant papers: paper #1][Slides][Discussion Slides]

Thursday, January 20, 2022 [Recording]
- Speaker: Richard Samworth (University of Cambridge)
- Title: Optimal subgroup selection
- Abstract: In clinical trials and other applications, we often see regions of the feature space that appear to exhibit interesting behaviour, but it is unclear whether these observed phenomena are reflected at the population level. Focusing on a regression setting, we consider the subgroup selection challenge of identifying a region of the feature space on which the regression function exceeds a pre-determined threshold. We formulate the problem as one of constrained optimisation, where we seek a low-complexity, data-dependent selection set on which, with a guaranteed probability, the regression function is uniformly at least as large as the threshold; subject to this constraint, we would like the region to contain as much mass under the marginal feature distribution as possible. This leads to a natural notion of regret, and our main contribution is to determine the minimax optimal rate for this regret in both the sample size and the Type I error probability. The rate involves a delicate interplay between parameters that control the smoothness of the regression function, as well as exponents that quantify the extent to which the optimal selection set at the population level can be approximated by families of well-behaved subsets. Finally, we expand the scope of our previous results by illustrating how they may be generalised to a treatment and control setting, where interest lies in the heterogeneous treatment effect.
- Discussant: Charles Doss (University of Minnesota)
- Links: [Relevant papers: paper #1][Slides]

Thursday, December 16, 2021 [Recording]
- Speaker: Marina Bogomolov (Technion - Israel Institute of Technology)
- Title: Adaptive methods for testing hypotheses with group structure while simultaneously controlling several error rates
- Abstract: In many statistical applications a large set of hypotheses is tested, and the hypotheses can be naturally classified into groups based on different criteria, defined by the characteristics of the problem. Examples of such applications include brain imaging, microbiome, and genome-wide association studies. In such settings, it may be of interest to identify groups containing signals, for each partition into groups, with control over false discoveries. This goal was addressed by Barber and Ramdas (2017) and by Ramdas, Barber, Wainwright, and Jordan (2019), who developed the p-filter method for controlling the group-level false discovery rate (FDR) simultaneously for all partitions. We address the same goal, and aim to increase the power of the p-filter method by capturing the group structure of the hypotheses using adaptive weights developed by Nandi, Sarkar, and Chen (2021). We prove that the modified p-filter method controls the group-level FDR for each partition into groups under independence, and show by simulations that it seems to retain the control under certain forms of positive dependence. Our simulation study shows that the proposed modification increases the power of the method in the settings where the signals are concentrated within some groups. We compare the performance of the modified method to that of the original p-filter on real brain imaging data, where the hypotheses are grouped with respect to two criteria. This is a joint work with Ido Griness.
- Discussant: Shinjini Nandi (Montana State University)
- Links: [Relevant papers:]

Thursday, December 9, 2021 [Recording (part I) Recording (part II)]
- Speaker: Jiaying Gu (University of Toronto)
- Title: Invidious Comparisons: Ranking and Selection as Compound Decisions
- Abstract: There is an innate human tendency, one might call it the “league table mentality,” to construct rankings. Schools, hospitals, sports teams, movies, and myriad other objects are ranked even though their inherent multi-dimensionality would suggest that – at best – only partial orderings were possible. We consider a large class of elementary ranking problems in which we observe noisy, scalar measurements of merit for n objects of potentially heterogeneous precision and are asked to select a group of the objects that are “most meritorious.” The problem is naturally formulated in the compound decision framework of Robbins’s (1956) empirical Bayes theory, but it also exhibits close connections to the recent literature on multiple testing. The nonparametric maximum likelihood estimator for mixture models (Kiefer and Wolfowitz (1956)) is employed to construct optimal ranking and selection rules. Performance of the rules is evaluated in simulations and an application to ranking U.S. kidney dialysis centers.
- Discussant: Soonwoo Kwon (Brown University)
- Links: [Relevant papers: paper #1][Slides]

Thursday, December 2, 2021 [Recording]
- Speaker: Daniel Garcia Rasines (ICMAT - CSIC)
- Title: Splitting strategies for post-selection inference
- Abstract: We consider the problem of providing valid inference for a selected parameter in a sparse regression setting. It is well known that classical regression tools can be unreliable in this context due to the bias generated in the selection step. Many approaches have been proposed in recent years to ensure inferential validity. Here, we consider a simple alternative to data splitting based on randomising the response vector, which allows for higher selection and inferential power than the former and is applicable with an arbitrary selection rule. We provide a theoretical and empirical comparison of both methods and extend the randomisation approach to non-normal settings. Our investigations show that the gain in power can be substantial.
- Discussant: Tijana Zrnic (UC Berkeley)
- Links: [Relevant papers: paper #1][Slides]

Thursday, November 18, 2021 [Recording]
- Speaker: Cynthia Rush (Columbia University)
- Title: Characterizing the Type 1-Type 2 Error Trade-off for SLOPE
- Abstract: Sorted L1 regularization has been incorporated into many methods for solving high-dimensional statistical estimation problems, including the SLOPE estimator in linear regression. In this talk, we study how this relatively new regularization technique improves variable selection by characterizing the optimal SLOPE trade-off between the false discovery proportion (FDP) and true positive proportion (TPP) or, equivalently, between measures of type I and type II error. Additionally, we show that on any problem instance, SLOPE with a certain regularization sequence outperforms the Lasso, in the sense of having a smaller FDP, larger TPP and smaller L2 estimation risk simultaneously. Our proofs are based on a novel technique that reduces a variational calculus problem to a class of infinite-dimensional convex optimization problems and a very recent result from approximate message passing (AMP) theory. With SLOPE being a particular example, we discuss these results in the context of a general program for systematically deriving exact expressions for the asymptotic risk of estimators that are solutions to a broad class of convex optimization problems via AMP.
- Discussant: Yuting Wei (University of Pennsylvania)
- Links: [Relevant papers: paper #1, paper #2, paper #3]

Thursday, November 11, 2021 [Recording]
- Speaker: Shuangning Li (Stanford University)
- Title: Deploying the Conditional Randomization Test in High Multiplicity Problems
- Abstract: This paper introduces the sequential CRT, which is a variable selection procedure that combines the conditional randomization test (CRT) and Selective SeqStep+. Valid p-values are constructed via the flexible CRT, which are then ordered and passed through the Selective SeqStep+ filter to produce a list of discoveries. We develop theory guaranteeing control on the false discovery rate (FDR) even though the p-values are not independent. We show in simulations that our novel procedure indeed controls the FDR and is competitive with, and sometimes outperforms, state-of-the-art alternatives in terms of power. Finally, we apply our methodology to a breast cancer dataset with the goal of identifying biomarkers associated with cancer stage.
- Discussant: Jingyi Jessica Li (UCLA)
- Links: [Relevant papers: paper #1] [Slides]

Thursday, November 4, 2021 [Recording]
- Speaker: Kai Zhang (The University of North Carolina at Chapel Hill)
- Title: BEAUTY Powered BEAST
- Abstract: We study nonparametric dependence detection with the proposed binary expansion approximation of uniformity (BEAUTY) approach, which generalizes the celebrated Euler's formula, and approximates the characteristic function of any copula with a linear combination of expectations of binary interactions from marginal binary expansions. This novel theory enables a unification of many important tests through approximations from some quadratic forms of symmetry statistics, where the deterministic weight matrix characterizes the power properties of each test. To achieve robust power, we study test statistics with data-adaptive weights, referred to as the binary expansion adaptive symmetry test (BEAST). By utilizing the properties of the binary expansion filtration, we show that the Neyman-Pearson test of uniformity can be approximated by an oracle weighted sum of symmetry statistics. The BEAST with this oracle provides a benchmark of feasible power against any alternative by leading all existing tests by a substantial margin. To approach this oracle power, we develop the BEAST through a regularized resampling approximation of the oracle test. The BEAST improves the empirical power of many existing tests against a wide spectrum of common alternatives and provides clear interpretation of the form of dependency when significant.
- Discussant: Bhaswar Bhattacharya (University of Pennsylvania)
- Links: [Relevant papers: paper #1, paper #2][Slides]

Thursday, October 28, 2021 [Recording]
- Speaker: Chiara Sabatti (Stanford University)
- Title: Searching for consistent associations with a multi-environment knockoff filter
- Abstract: This paper develops a method based on model-X knockoffs to find conditional associations that are consistent across diverse environments, controlling the false discovery rate. The motivation for this problem is that large data sets may contain numerous associations that are statistically significant and yet misleading, as they are induced by confounders or sampling imperfections. However, associations consistently replicated under different conditions may be more interesting. In fact, consistency sometimes provably leads to valid causal inferences even if conditional associations do not. While the proposed method is flexible and can be deployed in a wide range of applications, this paper highlights its relevance to genome-wide association studies, in which consistency across populations with diverse ancestries mitigates confounding due to unmeasured variants. The effectiveness of this approach is demonstrated by simulations and applications to the UK Biobank data.
- Discussant: Niklas Pfister (University of Copenhagen)
- Links: [Relevant papers: paper #1][Slides]

Thursday, October 21, 2021 [Recording]
- Speaker: Yao Zhang (University of Cambridge)
- Title: Multiple conditional randomization tests
- Abstract: We propose a general framework for (multiple) conditional randomization tests that incorporate several important ideas in the recent literature. We establish a general sufficient condition on the construction of multiple conditional randomization tests under which their p-values are "independent", in the sense that their joint distribution stochastically dominates the product of uniform distributions under the null. Conceptually, we argue that randomization should be understood as the mode of inference precisely based on randomization. We show that under a change of perspective, many existing statistical methods, including permutation tests for (conditional) independence and conformal prediction, are special cases of the general conditional randomization test. The versatility of our framework is further illustrated with an example concerning lagged treatment effects in stepped-wedge randomized trials.
- Discussant: Panos Toulis (University of Chicago)
- Links: [Relevant papers: paper #1][Slides][Discussion slides]

Thursday, October 14, 2021 [Recording]
- Speaker: Byol Kim (University of Chicago)
- Title: Predictive inference is free with the jackknife+-after-bootstrap
- Abstract: Ensemble learning is widely used in applications to make predictions in complex decision problems, for example, averaging models fitted to a sequence of samples bootstrapped from the available training data. While such methods offer more accurate, stable, and robust predictions and model estimates, much less is known about how to perform valid, assumption-lean inference on the output of these types of procedures. In this paper, we propose the jackknife+-after-bootstrap (J+aB), a procedure for constructing a predictive interval, which uses only the available bootstrapped samples and their corresponding fitted models, and is therefore "free" in terms of the cost of model fitting. The J+aB offers a predictive coverage guarantee that holds with no assumptions on the distribution of the data, the nature of the fitted model, or the way in which the ensemble of models is aggregated; at worst, the failure rate of the predictive interval is inflated by a factor of 2. Our numerical experiments verify the coverage and accuracy of the resulting predictive intervals on real data. This work is joint with Chen Xu and Rina Foygel Barber.
- Discussant: Yachong Yang (University of Pennsylvania)
- Links: [Relevant papers: paper #1][Slides]

Thursday, October 7, 2021 [Recording]
- Speaker: Kenneth Hung (Facebook)
- Title: Statistical Methods for Replicability Assessment
- Abstract: Large-scale replication studies like the Reproducibility Project: Psychology (RP:P) provide invaluable systematic data on scientific replicability, but most analyses and interpretations of the data fail to agree on the definition of “replicability” and disentangle the inexorable consequences of known selection bias from competing explanations. We discuss three concrete definitions of replicability based on: (1) whether published findings about the signs of effects are mostly correct, (2) how effective replication studies are in reproducing whatever true effect size was present in the original experiment and (3) whether true effect sizes tend to diminish in replication. We apply techniques from multiple testing and post-selection inference to develop new methods that answer these questions while explicitly accounting for selection bias. Our analyses suggest that the RP:P dataset is largely consistent with publication bias due to selection of significant effects. The methods in this paper make no distributional assumptions about the true effect sizes.
- Discussant: Marcel van Assen (Tilburg University)
- Links: [Relevant papers: paper #1][Slides]

Thursday, September 30, 2021 [Recording]
- Speaker: Pallavi Basu (Indian School of Business)
- Title: Empirical Bayes Control of the False Discovery Exceedance
- Abstract: We propose an empirical Bayes procedure that guarantees control of the False Discovery eXceedance (FDX) by ranking and thresholding hypotheses based on their local false discovery rate (lfdr) test statistic. In a two-group independent model or Gaussian with exchangeable hypotheses, we show that ranking by the lfdr delivers the "optimal" ranking for FDX control. We propose a computationally efficient procedure that does not empirically lose validity and power and illustrate its properties by analyzing two million stock trading strategies.

Joint work with Luella Fu, Alessio Saretto, and Wenguang Sun.

- Discussant: Sebastian Döhler (Darmstadt University of Applied Sciences)
- Links: [Relevant papers: paper #1]

Thursday, August 12, 2021 [Recording]
- Speaker: Sanat K. Sarkar (Temple University)
- Title: Adjusting the Benjamini-Hochberg method for controlling the false discovery rate in knockoff-assisted variable selection
- Abstract: The knockoff-based multiple testing setup of Barber & Candès (2015) for variable selection in multiple regression where sample size is as large as the number of explanatory variables is considered. The Benjamini-Hochberg method based on ordinary least squares estimates of the regression coefficients is adjusted to the setup, transforming it to a valid p-value based FDR controlling method not relying on any specific correlation structure of the explanatory variables. Simulations and real data applications show that our proposed method that is agnostic to $\pi_0$, the proportion of unimportant explanatory variables, and a data-adaptive version of it that uses an estimate of $\pi_0$ are powerful competitors of the FDR controlling methods in Barber & Candès (2015).
- Discussant: Lucas Janson (Harvard University)
- Links: [Relevant papers: paper #1]

Thursday, August 5, 2021 [Recording]
- Speaker: Snigdha Panigrahi (University of Michigan)
- Title: Approximate Methods for Joint Estimation of Group-sparse Parameters post Selection
- Abstract: In this talk, I will present a post-selective Bayesian framework to jointly and consistently estimate parameters within automatic group-sparse regression models. When these models are selected through an indispensable class of learning algorithms, e.g. the Group LASSO, the overlapping Group LASSO, the sparse Group LASSO, etc., uncertainty estimates for the matched parameters are unreliable in the absence of adjustments for selection bias. State-of-the-art tools for the group-sparse problem are limited in their application, however, because estimation is strictly tailored to (i) real-valued projections onto very specific selected subspaces and (ii) selection events admitting representations as linear inequalities in the data variables. The proposed approximate Bayesian methods address these gaps by deriving an adjustment factor in an easily feasible analytic form that eliminates bias from the selection of promising groups. Experiments on simulated data demonstrate that our methods pay only a very nominal price for this adjustment and are efficient at jointly estimating group-sparse parameters learned from data.

This talk is based upon joint work with Peter W. Macdonald and Daniel Kessler.

- Discussant: Joshua Loftus (London School of Economics)
- Links: [Relevant papers: paper #1][Slides]

Thursday, July 29, 2021 [Link to join]
- Speaker: Wesley Tansey (Memorial Sloan Kettering Cancer Center)
- Title: Efficient, robust, and powerful machine learning approaches to conditional independence testing
- Abstract: In this talk, I will present two approaches to conditional independence testing using deep neural networks. The first half of the talk focuses on the model-X knockoffs framework. I will present an optimization approach, Deep Direct Likelihood Knockoffs (DDLK), to learning the knockoff distribution directly through minimizing an adversarial swap objective. In the second half of the talk, I will shift to the conditional randomization test (CRT) framework. CRTs have higher power than knockoffs but come with a computational burden that generally makes them intractable. I will present an information-theoretic approach to CRTs, the Decoupled Independence Test (DIET), that overcomes this burden by reducing the CRT to a series of marginal independence tests. DIET estimates the residual information about the response and target variable after removing mutual information with the covariates. Under mild conditions, testing for conditional independence then reduces to testing for marginal independence between these two residuals. Both DDLK and DIET achieve higher power than existing methods and empirically control the target error rate in a broad class of benchmarks on synthetic and semi-synthetic data.
- Discussant: Thomas Berrett (University of Warwick)
- Links: [Relevant papers][Slides]

Thursday, July 22, 2021 [Recording]
- Speaker: Matthew Plumlee (Northwestern University)
- Title: Inexact computer model calibration: Concerns, controversy, credibility, and confidence
- Abstract: There has been a recent surge in statistical methods for calibration of inexact models. Alongside these developments, a controversy has emerged about the goals of calibration of inexact models. This talk will trace a swath of research stemming from twenty years ago and mark potential concerns along the way. The talk will also present some new ideas in this setting that might help close some of these philosophical and practical issues.
- Discussant: Rui Tuo (Texas A&M University)
- Links: [Relevant papers: paper #1, paper #2][Slides]

Thursday, July 15, 2021 [Recording]
- Speaker: Armin Schwartzman (UCSD)
- Title: Spatial inference for excursion sets
- Abstract: Spatial inference for excursion sets refers to the problem of estimating the set of locations where a function is greater than a threshold. This problem appears in analyses of 2D climate data and 3D brain imaging data. The purpose of solving such a problem is to provide an alternative to the standard large-scale multiple testing approach, where all locations in an image are tested for the presence of signal. As sample sizes in large imaging studies keep increasing, the statistical power becomes sufficient to detect the presence of signal in large portions of the image, making it difficult to localize important effects. Moreover, the multiple testing approach does not provide a measure of spatial uncertainty. We directly address the question of where the important effects are by estimating excursion sets and by constructing spatial confidence sets, given as nested regions that spatially bound the true excursion set with a given probability. We develop this approach for excursion sets of the mean function in a signal-plus-noise model, including coefficients in pointwise regression models, and further extend it to the Cohen's d parameter in order to handle spatial heteroscedasticity. Examples and computational issues are discussed for 3D fMRI data.
- Discussant: Jelle Goeman (Leiden University)
- Links: [Relevant papers: paper #1, paper #2, paper #3][Slides][Discussion Slides]

Thursday, July 1, 2021 [Recording]
- Speaker: Xiao Li (UC Berkeley)
- Title: Whiteout: when do fixed-X knockoffs fail?
- Abstract: A core strength of knockoff methods is their virtually limitless customizability, allowing an analyst to exploit machine learning algorithms and domain knowledge without threatening the method’s robust finite-sample false discovery rate control guarantee. While several previous works have investigated regimes where specific implementations of knockoffs are provably powerful, negative results are more difficult to obtain for such a flexible method. In this work we recast the fixed-X knockoff filter for the Gaussian linear model as a conditional post-selection inference method. It adds user-generated Gaussian noise to the ordinary least squares estimator $\hat{\beta}$ to obtain a “whitened” estimator $\tilde{\beta}$ with uncorrelated entries, and performs inference using $\operatorname{sgn}(\tilde{\beta}_j)$ as the test statistic for $H_j: \beta_j = 0$. We prove equivalence between our whitening formulation and the more standard formulation based on negative control predictor variables, showing how the fixed-X knockoffs framework can be used for multiple testing on any problem with (asymptotically) multivariate Gaussian parameter estimates. Relying on this perspective, we obtain the first negative results that universally upper-bound the power of all fixed-X knockoff methods, without regard to choices made by the analyst. Our results show roughly that, if the leading eigenvalues of $\operatorname{Var}(\hat{\beta})$ are large with dense leading eigenvectors, then there is no way to whiten $\hat{\beta}$ without irreparably erasing nearly all of the signal, rendering $\operatorname{sgn}(\tilde{\beta}_j)$ too uninformative for accurate inference. We give conditions under which the true positive rate (TPR) for any fixed-X knockoff method must converge to zero even while the TPR of Bonferroni-corrected multiple testing tends to one, and we explore several examples illustrating this phenomenon.
- Discussant: Asher Spector (Harvard University)
- Links: [Relevant papers: paper #1][Slides]

Thursday, June 24, 2021 [Recording]
- Speaker: Jason Hsu (The Ohio State University)
- Title: Confident Directional Selective Inference, from Multiple Comparisons with the Best to Precision Medicine
- Abstract: MCB (multiple comparisons with the best, 1981, 1984), comparing treatments to the best without knowing which one is the best, can be considered an early example of selective inference. With the thinking that "there is only one true best", the relevance of MCB to this presentation is that it led to the Partitioning Principle, which is essential for deriving confidence sets for stepwise tests. Inference based on confidence sets controls the directional error rate; inference based on tests of equalities may not.

The FDA gave Accelerated Approval to Aduhelm™ (aducanumab) for Alzheimer's Disease (AD) on 8 June 2021, based on its reduction of beta-amyloid plaque (a surrogate biomarker endpoint). When clinical efficacy of a treatment for the overall population is not shown, genome-wide association studies (GWAS) are often used to discover SNPs that might predict efficacy in subgroups. In the process of working on GWAS with real data, we came to the realization that, if one causal SNP makes its zero-null hypothesis false, then all other zero-null hypotheses are statistically false as well. While the majority of no-association null hypotheses might well be true biologically, statistically they are false (if one is false) in GWAS. I will illustrate this with a causal SNP for the ApoE gene, which is involved in the clearance of beta-amyloid plaque in AD. We suggest our confidence interval CE4 approach instead.

Targeted therapies such as OPDIVO and TECENTRIQ naturally have patient subgroups, already defined by the extent to which the drug target is present or absent in them, subgroups that may derive differential efficacy. An additional danger of testing equality nulls in the presence of subgroups is that the illusory logical relationships among efficacy in subgroups and their mixtures created by exact equality nulls lead to too drastic a stepwise multiplicity reduction, resulting in inflated directional error rates, as I will explain. Instead, Partition Tests, which would be called Confident Direction methods in the language of Tukey, might be safer to use.

- Discussant: Will Fithian (UC Berkeley)
- Links: [Relevant papers: paper #1][Slides]

Thursday, June 17, 2021 [Recording]
- Speaker: Patrick Chao (University of Pennsylvania)
- Title: AdaPT-GMM: Powerful and robust covariate-assisted multiple testing
- Abstract: We propose a new empirical Bayes method for covariate-assisted multiple testing with false discovery rate (FDR) control, where we model the local false discovery rate for each hypothesis as a function of both its covariates and p-value. Our method refines the adaptive p-value thresholding (AdaPT) procedure by generalizing its masking scheme to reduce the bias and variance of its false discovery proportion estimator, improving the power when the rejection set is small or some null p-values concentrate near 1. We also introduce a Gaussian mixture model for the conditional distribution of the test statistics given covariates, modeling the mixing proportions with a generic user-specified classifier, which we implement using a two-layer neural network. Like AdaPT, our method provably controls the FDR in finite samples even if the classifier or the Gaussian mixture model is misspecified. We show in extensive simulations and real data examples that our new method, which we call AdaPT-GMM, consistently delivers high power relative to competing state-of-the-art methods. In particular, it performs well in scenarios where AdaPT is underpowered, and is especially well-suited for testing composite null hypotheses, such as whether the effect size exceeds a practical significance threshold.
- Discussant: Patrick Kimes (Genentech)
- Links: [Relevant papers: paper #1][Slides]

Thursday, June 10, 2021 [Recording]
- Speaker: Wooseok Ha (UC Berkeley)
- Title: Interpreting deep neural networks in a transformed domain
- Abstract: Machine learning lies at the heart of new possibilities for scientific discovery, knowledge generation, and artificial intelligence. Its potential benefits to these fields require going beyond predictive accuracy and focusing on interpretability. In particular, many scientific problems require interpretations in a domain-specific interpretable feature space (e.g. the frequency or wavelet domain), whereas attributions to the raw features (e.g. the pixel space) may be unintelligible or even misleading. To address this challenge, we propose TRIM (Transformation Importance), a novel approach which attributes importances to features in a transformed space and can be applied post-hoc to a fully trained model. We focus on a problem in cosmology, where it is crucial to interpret how a model trained on simulations predicts fundamental cosmological parameters. Building on TRIM, we next introduce adaptive wavelet distillation (AWD), a method that aims to distill information from a trained neural network into a wavelet transform. Specifically, AWD penalizes feature attributions of a neural network in the wavelet domain to learn an effective multi-resolution wavelet transform. The resulting model is highly predictive, concise, computationally efficient, and has properties (such as a multi-scale structure) which make it easy to interpret. We showcase how AWD addresses challenges in two real-world settings: cosmological parameter inference and molecular-partner prediction. In both cases, AWD informs predictive features that are scientifically meaningful in the context of the respective domains.
- Discussant: Sarah Tan (Facebook)
- Links: [Relevant papers: paper #1][Slides]

Thursday, June 3, 2021 [Recording]
- Speaker: Song Zhai (UC Riverside)
- Title: Learning from Real World Data About Combinatorial Treatment Selection for COVID-19
- Abstract: COVID-19 is an unprecedented global pandemic with a serious negative impact on virtually every part of the world. Although much progress has been made in preventing and treating the disease, much remains to be learned about how best to treat the disease while considering patient and disease characteristics. This paper reports a case study of combinatorial treatment selection for COVID-19 based on real-world data from a large hospital in Southern China. In this observational study, 417 confirmed COVID-19 patients were treated with various combinations of drugs and followed for four weeks after discharge (or until death). Treatment failure is defined as death during hospitalization or recurrence of COVID-19 within four weeks of discharge. Using a virtual multiple matching method to adjust for confounding, we estimate and compare the failure rates of different combinatorial treatments, both in the whole study population and in subpopulations defined by baseline characteristics. Our analysis reveals that treatment effects are substantial and heterogeneous, and that the optimal combinatorial treatment may depend on baseline age, systolic blood pressure, and C-reactive protein level. Using these three variables to stratify the study population leads to a stratified treatment strategy that involves several different combinations of drugs (for patients in different strata). Our findings are exploratory and require further validation.
- Discussant: Hongyuan Cao (Florida State University)
- Links: [Slides]

Thursday, May 27, 2021 [Recording]
- Speaker: Matthew Stephens (University of Chicago)
- Title: A simple new approach to variable selection in regression, with application to genetic fine-mapping
- Abstract: We introduce a simple new approach to variable selection in linear regression, with a particular focus on quantifying uncertainty in which variables should be selected. The approach is based on a new model — the “Sum of Single Effects” (SuSiE) model — which comes from writing the sparse vector of regression coefficients as a sum of “single-effect” vectors, each with one non-zero element. We also introduce a corresponding new fitting procedure — Iterative Bayesian Stepwise Selection (IBSS) — which is a Bayesian analogue of stepwise selection methods. IBSS shares the computational simplicity and speed of traditional stepwise methods, but instead of selecting a single variable at each step, IBSS computes a distribution on variables that captures uncertainty in which variable to select. We provide a formal justification of this intuitive algorithm by showing that it optimizes a variational approximation to the posterior distribution under the SuSiE model. Further, this approximate posterior distribution naturally yields convenient novel summaries of uncertainty in variable selection, providing a Credible Set of variables for each selection. Our methods are particularly well-suited to settings where variables are highly correlated and detectable effects are sparse, both of which are characteristics of genetic fine-mapping applications. We demonstrate through numerical experiments that our methods outperform existing methods for this task, and illustrate their application to fine-mapping genetic variants influencing alternative splicing in human cell-lines. We also discuss the potential and challenges for applying these methods to generic variable selection problems.
- Discussant: Peter Bühlmann (ETH Zürich)
- Links: [Relevant papers: paper #1][Slides][Discussant Slides]

Thursday, May 20, 2021 [Recording]
- Speaker: Dan Kluger (Stanford University)
- Title: A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model
- Abstract: The Benjamini-Hochberg (BH) procedure remains widely popular despite having limited theoretical guarantees in the commonly encountered scenario of correlated test statistics. Of particular concern is the possibility that the method could exhibit bursty behavior, meaning that it might typically yield no false discoveries while occasionally yielding both a large number of false discoveries and a false discovery proportion (FDP) that far exceeds its own well controlled mean. In this paper, we investigate which test statistic correlation structures lead to bursty behavior and which ones lead to well controlled FDPs. To this end, we develop a central limit theorem for the FDP in a multiple testing setup where the test statistic correlations can be either short-range or long-range as well as either weak or strong. The theorem and our simulations from a data-driven factor model suggest that the BH procedure exhibits severe burstiness when the test statistics have many strong, long-range correlations, but does not otherwise.
- Discussant: Grant Izmirlian (NCI DCP Biometry Research Group)
- Links: [Relevant papers: paper #1][Slides][Discussion Slides]

Thursday, May 13, 2021 [Recording]
- Speaker: Chirag Gupta (Carnegie Mellon University)
- Title: Recent advances in distribution-free uncertainty quantification
- Abstract: Uncertainty quantification seeks to supplement point predictions with estimates of confidence or reliability. In the distribution-free (DF) framework, we require these confidence estimates to make valid statistical claims that provably hold no matter how the data is distributed, as long as the training and test data follow the same distribution. We present some recent results in DF uncertainty quantification for classification and regression problems. First, we discuss nested conformal, a framework to produce prediction sets that are guaranteed to contain the true output with a pre-defined probability. We then describe an ensemble-based conformal algorithm, QOOB. QOOB has DF guarantees, is computationally efficient, and produces prediction sets that exhibit strong practical performance on regression tasks. Next, we describe the notion of calibration in binary classification and connect it to prediction sets and confidence intervals. This relationship leads to an impossibility result for continuous-output DF calibration. We then show DF calibration guarantees for a popular discrete-output calibration algorithm called histogram binning. Based on our guarantees, we make practical recommendations for choosing the number of bins in histogram binning.
- Discussant: Rina Foygel Barber (University of Chicago)
- Links: [Relevant papers: paper #1, paper #2, paper #3]

Thursday, May 6, 2021 [Recording]
- Speaker: Marie Perrot-Dockès (Université de Paris)
- Title: Post hoc false discovery proportion inference under a Hidden Markov Model
- Abstract: We address the multiple testing problem under the assumption that the true/false hypotheses are driven by a Hidden Markov Model (HMM), which is recognized as a fundamental setting to model multiple testing under dependence since the seminal work of Sun and Cai (2009). While previous work has concentrated on deriving specific procedures with a controlled False Discovery Rate (FDR) under this model, following a recent trend in selective inference, we consider the problem of establishing confidence bounds on the false discovery proportion (FDP), for a user-selected set of hypotheses that can depend on the observed data in an arbitrary way. We develop a methodology to construct such confidence bounds first when the HMM model is known, then when its parameters are unknown and estimated, including the data distribution under the null and the alternative, using a nonparametric approach. In the latter case, we propose a bootstrap-based methodology to take into account the effect of parameter estimation error. We show that taking advantage of the assumed HMM structure allows for a substantial improvement of confidence bound sharpness over existing agnostic (structure-free) methods, as witnessed both via numerical experiments and real data examples.
- Discussant: Jesse Hemerik (Wageningen University)
- Links: [Relevant papers: paper #1][Slides]

Thursday, April 29, 2021 [Recording]
- Speaker: Thorsten Dickhaus (University of Bremen)
- Title: Randomized p-values in replicability analysis
- Abstract: We will be concerned with testing replicability hypotheses for many endpoints simultaneously. This constitutes a multiple test problem with composite null hypotheses. Traditional p-values, which are computed under least favourable parameter configurations (LFCs), are over-conservative in the case of composite null hypotheses. As demonstrated in prior work, this poses severe challenges in the multiple testing context, especially when one goal of the statistical analysis is to estimate the proportion $\pi_0$ of true null hypotheses. We will discuss the application of randomized p-values in the sense of [1] in replicability analysis. By means of theoretical considerations as well as computer simulations, we will demonstrate that their usage typically leads to a much more accurate estimation of $\pi_0$ than the LFC-based approach. Furthermore, we will draw connections to other recently proposed methods for dealing with conservative p-values in the multiple testing context. Finally, we will present a real data example from genomics. The presentation is based on [2] and [3].
- Discussant: Ruth Heller (Tel Aviv University)
- Links: [Relevant papers: paper #1, paper #2, paper #3][Slides]

Thursday, April 22, 2021 [Recording]
- Speaker: Feng Ruan (UC Berkeley)
- Title: A Self-Penalizing Objective Function for Scalable Interaction Detection
- Abstract: We tackle the problem of nonparametric variable selection with a focus on discovering interactions between variables. With $p$ variables there are $O(p^s)$ possible order-$s$ interactions, making exhaustive search infeasible. It is nonetheless possible to identify the variables involved in interactions with only linear computation cost, $O(p)$. The trick is to maximize a class of parametrized nonparametric dependence measures which we call metric learning objectives; the landscape of these nonconvex objective functions is sensitive to interactions but the objectives themselves do not explicitly model interactions. Three properties make metric learning objectives highly attractive:

(a) The stationary points of the objective are automatically sparse (i.e. the objective performs selection) -- no explicit $\ell_1$ penalization is needed.

(b) All stationary points of the objective exclude noise variables with high probability.

(c) Guaranteed recovery of all signal variables without needing to reach the objective's global maxima or special stationary points.

The second and third properties mean that all our theoretical results apply in the practical case where one uses gradient ascent to maximize the metric learning objective. While not all metric learning objectives enjoy good statistical power, we design an objective based on $\ell_1$ kernels that does exhibit favorable power: it recovers (i) main effects with $n \sim \log p$ samples, (ii) hierarchical interactions with $n \sim \log p$ samples and (iii) order-$s$ pure interactions with $n \sim p^{2(s-1)} \log p$ samples.

- Discussant: Sumanta Basu (Cornell University)
- Links: [Relevant papers: paper #1][Slides]

Thursday, April 15, 2021 [Recording]
- Speaker: Nikolaos Ignatiadis (Stanford University)
- Title: Confidence Intervals for Nonparametric Empirical Bayes Analysis
- Abstract: In an empirical Bayes analysis, we use data from repeated sampling to imitate inferences made by an oracle Bayesian with extensive knowledge of the data-generating distribution. Existing results provide a comprehensive characterization of when and why empirical Bayes point estimates accurately recover oracle Bayes behavior. In this work, we develop flexible and practical confidence intervals that provide asymptotic frequentist coverage of empirical Bayes estimands, such as the posterior mean or the local false sign rate. The coverage statements hold even when the estimands are only partially identified or when empirical Bayes point estimates converge very slowly. This is joint work with Stefan Wager.
- Discussant: Timothy Armstrong (Yale University)
- Links: [Relevant papers: paper #1][Slides]

Thursday, April 8, 2021 [Recording]
- Speaker: Hongyuan Cao (Florida State University)
- Title: Optimal False Discovery Rate Control For Large Scale Multiple Testing With Auxiliary Information
- Abstract: Large-scale multiple testing is a fundamental problem in high dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the hypotheses, are available. Exploiting such auxiliary information can boost statistical power. To this end, we propose a framework based on a two-group mixture model with varying probabilities of being null for different hypotheses a priori, where a shape constrained relationship is imposed between the auxiliary information and the prior probabilities of being null. An optimal rejection rule is designed to maximize the expected number of true positives when average false discovery rate is controlled. Focusing on the ordered structure, we develop a robust EM algorithm to estimate the prior probabilities of being null and the distribution of p-values under the alternative hypothesis simultaneously. We show that the proposed method has better power than state-of-the-art competitors while controlling the false discovery rate, both empirically and theoretically. Extensive simulations demonstrate the advantage of the proposed method. Datasets from genome-wide association studies are used to illustrate the new methodology.
- Discussant: James Scott (University of Texas at Austin)
- Links: [Relevant papers: paper #1][Slides]

Thursday, April 1, 2021 [Recording]
- Speaker: Jingshen Wang (UC Berkeley)
- Title: Sharp Inference on Selected Subgroups in Observational Studies
- Abstract: In modern drug development, the broader availability of high-dimensional observational data provides opportunities for scientists to explore subgroup heterogeneity, especially when randomized clinical trials are unavailable due to cost and ethical constraints. However, a common practice that naively searches the subgroup with a high treatment level is often misleading due to the “subgroup selection bias.” More importantly, the nature of high-dimensional observational data has further exacerbated the challenge of accurately estimating the subgroup treatment effects. To resolve these issues, we provide new inferential tools based on resampling to assess the replicability of post-hoc identified subgroups from observational studies. Through careful theoretical justification and extensive simulations, we show that our proposed approach delivers asymptotically sharp confidence intervals and debiased estimates for the selected subgroup treatment effects in the presence of high-dimensional covariates. We further demonstrate the merit of the proposed methods by analyzing the UK Biobank data. The R package “debiased.subgroup” implementing the proposed procedures is available on GitHub.
- Discussant: Rui Wang (Harvard University)
- Links: [Relevant papers: paper #1]

Thursday, March 25, 2021 [Recording]
- Speaker: Jackson Loper (Columbia University)
- Title: Smoothed Nested Testing on Directed Acyclic Graphs
- Abstract: We consider the problem of multiple hypothesis testing when there is a logical nested structure to the hypotheses. When one hypothesis is nested inside another, the outer hypothesis must be false if the inner hypothesis is false. We model the nested structure as a directed acyclic graph, including chain and tree graphs as special cases. Each node in the graph is a hypothesis and rejecting a node requires also rejecting all of its ancestors. We propose a general framework for adjusting node-level test statistics using the known logical constraints. Within this framework, we study a smoothing procedure that combines each node with all of its descendants to form a more powerful statistic. We prove a broad class of smoothing strategies can be used with existing selection procedures to control the familywise error rate, false discovery exceedance rate, or false discovery rate, so long as the original test statistics are independent under the null. When the null statistics are not independent but are derived from positively-correlated normal observations, we prove control for all three error rates when the smoothing method is arithmetic averaging of the observations. Simulations and an application to a real biology dataset demonstrate that smoothing leads to substantial power gains.
- Discussant: Wenge Guo (New Jersey Institute of Technology)
- Links: [Relevant papers: paper #1]

Thursday, March 18, 2021 [Recording]
- Speaker: Ruodu Wang (University of Waterloo)
- Title: Multiple hypothesis testing with e-values and dependence
- Abstract: E-values have gained attention as potential alternatives to p-values as measures of uncertainty, significance and evidence. In brief, e-values are realized by random variables with expectation at most one under the null; examples include betting scores, (point null) Bayes factors, likelihood ratios and stopped supermartingales. We design a natural analog of the Benjamini-Hochberg (BH) procedure for false discovery rate (FDR) control that utilizes e-values, called the e-BH procedure, and compare it with the standard procedure for p-values. One of our central results is that, unlike the usual BH procedure, the e-BH procedure controls the FDR at the desired level, with no correction, for any dependence structure between the e-values. We illustrate that the new procedure is convenient in various settings of complicated dependence, structured and post-selection hypotheses, and multi-armed bandit problems. Moreover, the BH procedure is a special case of the e-BH procedure through calibration between p-values and e-values. Overall, the e-BH procedure is a novel, powerful and general tool for multiple testing under dependence that is complementary to the BH procedure, each being an appropriate choice in different applications.
- Discussant: Lihua Lei (Stanford University)
- Links: [Relevant papers: paper #1, paper #2][Slides]
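
  Note: the e-BH rule described in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming the usual formulation in which the K e-values are sorted in decreasing order and the k* largest are rejected, where k* is the largest k such that k times the k-th largest e-value is at least K/alpha. The function name `e_bh` and the example e-values are hypothetical and are not taken from the talk.

```python
def e_bh(e_values, alpha):
    """Return the sorted indices rejected by an e-BH style rule at level alpha."""
    K = len(e_values)
    # Order hypotheses from largest to smallest e-value.
    order = sorted(range(K), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, idx in enumerate(order, start=1):
        # Keep the largest k whose k-th largest e-value satisfies
        # k * e_[k] / K >= 1 / alpha.
        if k * e_values[idx] >= K / alpha:
            k_star = k
    return sorted(order[:k_star])


# Hypothetical e-values for five hypotheses (illustration only).
example = [60.0, 1.2, 30.0, 0.5, 8.0]
rejected = e_bh(example, alpha=0.1)
print(rejected)
```

  No assumption about the dependence between the e-values enters the rule, which matches the abstract's claim that no correction is needed under arbitrary dependence.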

Thursday, March 11, 2021 [Recording]
- Speaker: Stephen Bates (UC Berkeley)
- Title: Distribution-Free, Risk-Controlling Prediction Sets
- Abstract: While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making. Deploying learning systems in consequential settings also requires calibrating and communicating the uncertainty of predictions. To convey instance-wise uncertainty for prediction tasks, we show how to generate set-valued predictions from a black-box predictor that control the expected loss on future test points at a user-specified level. Our approach provides explicit finite-sample guarantees for any dataset by using a holdout set to calibrate the size of the prediction sets. This framework enables simple, distribution-free, rigorous error control for many tasks, and we demonstrate it in five large-scale machine learning problems: (1) classification problems where some mistakes are more costly than others; (2) multi-label classification, where each observation has multiple associated labels; (3) classification problems where the labels have a hierarchical structure; (4) image segmentation, where we wish to predict a set of pixels containing an object of interest; and (5) protein structure prediction. Lastly, we discuss extensions to uncertainty quantification for ranking, metric learning and distributionally robust learning.
- Discussant: Vladimir Vovk (Royal Holloway, University of London)
- Links: [Relevant papers: paper #1][Slides]

Thursday, March 4, 2021 [Recording]
- Speaker: Boyan Duan (Carnegie Mellon University)
- Title: Interactive identification of individuals with positive treatment effect while controlling false discoveries
- Abstract: Out of the participants in a randomized experiment with anticipated heterogeneous treatment effects, is it possible to identify which ones have a positive treatment effect, even though each has only taken either treatment or control but not both? While subgroup analysis has received attention, claims about individual participants are more challenging. We frame the problem in terms of multiple hypothesis testing: we think of each individual as a null hypothesis (the potential outcomes are equal, for example) and aim to identify individuals for whom the null is false (the treatment potential outcome stochastically dominates the control, for example). We develop a novel algorithm that identifies such a subset, with nonasymptotic control of the false discovery rate (FDR). Our algorithm allows for interaction — a human data scientist (or a computer program acting on the human’s behalf) may adaptively guide the algorithm in a data-dependent manner to gain high identification power. We also propose several extensions: (a) relaxing the null to nonpositive effects, (b) moving from unpaired to paired samples, and (c) subgroup identification. We demonstrate via numerical experiments and theoretical analysis that the proposed method has valid FDR control in finite samples and reasonably high identification power.
- Discussant: Bikram Karmakar (University of Florida)
- Links: [Relevant papers: paper #1][Slides]

Thursday, February 25, 2021 [Recording]
- Speaker: Anna Vesely (University of Padua)
- Title: Permutation-based true discovery guarantee by sum tests
- Abstract: Sum-based global tests are highly popular in multiple hypothesis testing. In this paper we propose a general closed testing procedure for sum tests, which provides confidence lower bounds for the proportion of true discoveries (TDP), simultaneously over all subsets of hypotheses. Our method allows for an exploratory approach, as simultaneity ensures control of the TDP even when the subset of interest is selected post hoc. It adapts to the unknown joint distribution of the data through permutation testing. Any sum test may be employed, depending on the desired power properties. We present an iterative shortcut for the closed testing procedure, based on the branch and bound algorithm. It converges to the full closed testing results, often after a few iterations. Even if it is stopped early, it controls the TDP. The feasibility of the method for high dimensional data is illustrated on brain imaging data. We compare the properties of different choices for the sum test through simulations.
- Discussant: Pierre Neuvial (Institut de Mathématiques de Toulouse (IMT))
- Links: [Relevant papers: paper #1][Slides]

Thursday, February 18, 2021 [Recording]
- Speaker: Tijana Zrnic (UC Berkeley)
- Title: Post-Selection Inference via Algorithmic Stability
- Abstract: Modern approaches to data analysis make extensive use of data-driven model selection. The resulting dependencies between the selected model and data used for inference invalidate statistical guarantees derived from classical theories. The framework of post-selection inference (PoSI) has formalized this problem and proposed corrections which ensure valid inferences. Yet, obtaining general principles that enable computationally-efficient, powerful PoSI methodology with formal guarantees remains a challenge. With this goal in mind, we revisit the PoSI problem through the lens of algorithmic stability. Under an appropriate formulation of stability---one that captures closure under post-processing and compositionality properties---we show that stability parameters of a selection method alone suffice to provide non-trivial corrections to classical z-test and t-test intervals. Then, for several popular model selection methods, including the LASSO, we show how stability can be achieved through simple, computationally efficient randomization schemes. Our algorithms offer provable unconditional simultaneous coverage and are computationally efficient; in particular, they do not rely on MCMC sampling. Importantly, our proposal explicitly relates the magnitude of randomization to the resulting confidence interval width, allowing the analyst to tune interval width to the loss in utility due to randomizing selection. This is joint work with Michael I. Jordan.
- Discussant: Arun Kumar Kuchibhotla (Carnegie Mellon University)
- Links: [Relevant papers: paper #1][Slides]

Thursday, February 11, 2021 [Recording]
- Speaker: Jelle Goeman (Leiden University)
- Title: Only closed testing procedures are admissible for controlling false discovery proportions
- Abstract: We consider a general class of procedures controlling the tail probability of the number or proportion of false discoveries, either in a single (random) set or in several such sets simultaneously. This class includes, among others, (generalized) familywise error, false discovery exceedance, simultaneous false discovery proportion control, and other selective inference methods. We put these procedures in a general framework, formulating all of them as special cases of true discovery guarantee procedures. We formulate both necessary and sufficient conditions for admissibility. Most importantly, we show that all such procedures are either a special case of closed testing, or they can be uniformly improved by a closed testing procedure. The practical value of our results is illustrated by giving uniform improvements of existing selective inference procedures, achieved by formulating them as closed testing procedures. In particular, we investigate when procedures controlling conditional familywise error rate, and data-splitting methods, can be uniformly improved by closed testing.
- Discussant: Will Fithian (UC Berkeley)
- Links: [Relevant papers: paper #1][Slides]

Thursday, February 4, 2021 [Recording]
- Speaker: Arian Maleki (Columbia University)
- Title: Comparing Variable Selection Techniques Under a High-Dimensional Asymptotic
- Abstract: In this talk, we discuss the problem of variable selection for linear models under the high-dimensional asymptotic setting, where the number of observations, n, grows at the same rate as the number of predictors, p. We consider two-stage variable selection techniques (TVS) in which the first stage obtains an estimate of the regression coefficients, and the second stage simply thresholds this estimate to select the “important” predictors. The asymptotic false discovery proportion (AFDP) and true positive proportion (ATPP) of these TVS are evaluated, and their optimality will be discussed.
- Discussant: Pragya Sur (Harvard University)
- Links: [Relevant papers: paper #1, paper #2][Slides]

Thursday, January 28, 2021 [Recording]
- Speaker: Ali Shojaie (University of Washington)
- Title: Nonparametric Inference for Infinite-Dimensional Parameters via a Generalized Score Test
- Abstract: Infinite-dimensional parameters that can be defined as the minimizer of a population risk arise naturally in many applications. Classic examples include the conditional mean function and the density function. Though there is extensive literature on constructing consistent estimators for infinite-dimensional risk minimizers, there is limited work on quantifying the uncertainty associated with such estimates via, e.g., hypothesis testing and construction of confidence regions. We propose a general inferential framework for infinite-dimensional risk minimizers as a nonparametric extension of the score test. We illustrate that our framework requires only mild assumptions and is applicable to a variety of estimation problems. In examples, we specialize our proposed methodology to estimation of regression functions with continuous outcomes and also consider a partially additive model as an extension of the more classical partially linear model.
- Discussant: Mladen Kolar (University of Chicago Booth School of Business)
- Links: [Slides]

Thursday, January 21, 2021 [Recording]
- Speaker: Etienne Roquain (Sorbonne Université)
- Title: Structured multiple testing: can one mimic the oracle?
- Abstract: Knowing the model structure can significantly help to perform a multiple testing inference. Hence, a general aim is to build a procedure mimicking the performances of the oracle, that is, of a benchmark procedure that knows (and uses) this structure. As a case in point, classical structures are derived from the famous two-group model or its extensions, by specifying particular assumptions on the corresponding parameters, as the null/alternative distributions, or the false/null occurrence process. We will discuss the issue of mimicking the oracle for the three following structures and various multiple testing error rates:
  (1) structure = Gaussian null distribution family, error rate = FDR (see https://arxiv.org/abs/1912.03109, joint work with Nicolas Verzelen and https://arxiv.org/abs/1809.08330, joint work with Alexandra Carpentier, Sylvain Delattre and Nicolas Verzelen)
  (2) structure = stochastic block model for the false/null occurrence process, error rate = FDR (see https://arxiv.org/abs/1907.10176, joint work with Tabea Rebafka and Fanny Villers)
  (3) structure = hidden Markov model for the false/null occurrence process, error rate = FDP confidence post hoc bound (preprint to come, joint work with Marie Perrot-Dockès, Gilles Blanchard and Pierre Neuvial)
  We will emphasize the work (1) above, and show that building a confidence region for the structure parameter can be fruitful to know whether mimicking the oracle is possible and how to mimic it when it is possible.
- Discussant: Ery Arias-Castro (UC San Diego)
- Links: [Relevant papers: paper #1, paper #2, paper #3] [Slides]

Thursday, January 14, 2021 [Recording]
- Speaker: Qingyuan Zhao (University of Cambridge)
- Title: Selecting and Ranking Individualized Treatment Rules With Unmeasured Confounding
- Abstract: It is common to compare individualized treatment rules based on the value function, which is the expected potential outcome under the treatment rule. Although the value function is not point-identified when there is unmeasured confounding, it still defines a partial order among the treatment rules under Rosenbaum’s sensitivity analysis model. We first consider how to compare two treatment rules with unmeasured confounding in the single-decision setting and then use this pairwise test to rank multiple treatment rules. We consider how to, among many treatment rules, select the best rules, and select the rules that are better than a control rule. The proposed methods are illustrated using two real examples, one about the benefit of malaria prevention programs to different age groups and another about the effect of late retirement on senior health in different gender and occupation groups.
- Discussant: Edward Kennedy (Carnegie Mellon University)
- Links: [Relevant paper] [Slides]

Thursday, January 7, 2021 [Recording]
- Speaker: Yuval Benjamini (Hebrew University of Jerusalem)
- Title: Localizing differences between correlation matrix populations in resting-state fMRI
- Abstract: Resting state fMRI consists of continuous neural-activity recordings over a period of several minutes without structured experimental manipulation. These measurements are summarized into a correlation matrix between activity in p predetermined brain-regions (p between 90 and 500). Neurologists are interested in identifying localized differences in correlation between, e.g. disease and control populations, but the relatively high noise, small samples and many comparisons make mass univariate approaches impractical due to low signal. Therefore, resting-state fMRI analysis can be a model problem for data-adaptive pooling of hypotheses.
  However, as I discuss in the talk, even static pooling of effects across different correlation values is not simple in this type of data. We reparametrize the matrix of differences between populations as p main effects representing change for each region, with the goal of replacing p^2/2 hypotheses with p main ones. For this new model, we derive likelihood estimators that require explicit or implicit characterisation of the dependence in the data. We show that the method performs well on simulations, and discuss an example from Amnesia data.
  This is joint work with Itamar Faran, Michael Peer and Shahar Arzi.
- Discussant: Lucy Gao (University of Waterloo)
- Links: [Slides]

Thursday, December 10, 2020 [Recording]
- Speaker: Toru Kitagawa (University College London)
- Title: Inference on Winners
- Abstract: Many empirical questions concern target parameters selected through optimization. For example, researchers may be interested in the effectiveness of the best policy found in a randomized trial, or the best-performing investment strategy based on historical data. Such settings give rise to a winner’s curse, where conventional estimates are biased and conventional confidence intervals are unreliable. This paper develops optimal confidence intervals and median-unbiased estimators that are valid conditional on the target selected and so overcome this winner’s curse. If one requires validity only on average over targets that might have been selected, we develop hybrid procedures that combine conditional and projection confidence intervals to offer further performance gains relative to existing alternatives. This is joint work with Isaiah Andrews and Adam McCloskey.
- Discussant: Kenneth Hung (Facebook)
- Links: [Relevant paper] [Slides]

Thursday, December 3, 2020 [Recording]
- Speaker: Jingyi Jessica Li (UCLA)
- Title: Clipper: p-value-free FDR control on high-throughput data from two conditions
- Abstract: High-throughput biological data analysis commonly involves the identification of “interesting” features (e.g., genes, genomic regions, and proteins), whose values differ between two conditions, from numerous features measured simultaneously. To ensure the reliability of such analysis, the most widely-used criterion is the false discovery rate (FDR), the expected proportion of uninteresting features among the identified ones. Existing bioinformatics tools primarily control the FDR based on p-values. However, obtaining valid p-values relies on either reasonable assumptions of data distribution or large numbers of replicates under both conditions, two requirements that are often unmet in biological studies. To address this issue, we propose Clipper, a general statistical framework for FDR control without relying on p-values or specific data distributions. Clipper is applicable to identifying both enriched and differential features from high-throughput biological data of diverse types. In comprehensive simulation and real-data benchmarking, Clipper outperforms existing generic FDR control methods and specific bioinformatics tools designed for various tasks, including peak calling from ChIP-seq data, differentially expressed gene identification from RNA-seq data, differentially interacting chromatin region identification from Hi-C data, and peptide identification from mass spectrometry data. Notably, our benchmarking results for peptide identification are based on the first mass spectrometry data standard that has a realistic dynamic range. Our results demonstrate Clipper’s flexibility and reliability for FDR control, as well as its broad applications in high-throughput data analysis.
- Discussant: Nikos Ignatiadis (Stanford University)
- Links: [Relevant paper] [Slides]

Thursday, November 19, 2020 [Recording]
- Speaker: Oscar Hernan Madrid Padilla (UCLA)
- Title: Optimal post-selection inference for sparse signals: a nonparametric empirical-Bayes approach
- Abstract: Many recently developed Bayesian methods have focused on sparse signal detection. However, much less work has been done addressing the natural follow-up question: how to make valid inferences for the magnitude of those signals after selection. Ordinary Bayesian credible intervals suffer from selection bias, owing to the fact that the target of inference is chosen adaptively. Existing Bayesian approaches for correcting this bias produce credible intervals with poor frequentist properties, while existing frequentist approaches require sacrificing the benefits of shrinkage typical in Bayesian methods, resulting in confidence intervals that are needlessly wide. We address this gap by proposing a nonparametric empirical-Bayes approach for constructing optimal selection-adjusted confidence sets. Our method produces confidence sets that are as short as possible on average, while both adjusting for selection and maintaining exact frequentist coverage uniformly over the parameter space. Our main theoretical result establishes an important consistency property of our procedure: that under mild conditions, it asymptotically converges to the results of an oracle-Bayes analysis in which the prior distribution of signal sizes is known exactly. Across a series of examples, the method outperforms existing frequentist techniques for post-selection inference, producing confidence sets that are notably shorter but with the same coverage guarantee. This is joint work with Spencer Woody and James G. Scott.
- Discussant: Małgorzata Bogdan (Uniwersytet Wroclawski, Instytut Matematyki)
- Links: [Relevant paper] [Slides]

Thursday, November 12, 2020 [Recording]
- Speaker: Peter Grünwald (Centrum Wiskunde & Informatica and Leiden University)
- Title: E is the New P: Tests that are safe under optional stopping, with an application to time-to-event data
- Abstract: The E-value is a notion of evidence which, unlike p-values, allows for effortlessly combining evidence from several tests, even in the common scenario where the decision to perform a new test depends on previous test outcomes. 'Safe' tests based on E-values generally preserve Type-I error guarantees under such 'optional continuation', thereby potentially alleviating one of the main causes for the reproducibility crisis.
  E-values, also known as 'betting scores', are the basic constituents of test martingales and always-valid confidence sequences - a dormant cluster of ideas going back to Ville and Robbins and suddenly rapidly gaining popularity due to recent work by Vovk, Shafer, Ramdas and Wang. For simple nulls they are just likelihood ratios or Bayes factors, but for composite nulls it's trickier - we show how to construct them in this case using the 'joint information projection'. We then zoom in on time-to-event data and show how to define an E-value based on Cox' partial likelihood, illustrating with (hypothetical!) data on covid vaccine RCTs. If all research groups were to report their results in terms of E-values rather than p-values, then in principle, one could even do meta-analysis that retains an overall Type-I error guarantee - thus saving greatly on 'research waste'.
  Joint Work with R. de Heide, W. Koolen, A. Ly, M. Perez, R. Turner and J. Ter Schure.
- Discussant: Ruodu Wang (University of Waterloo)
- Links: [Relevant papers: paper #1, paper #2] [Slides]

Thursday, November 5, 2020 [Recording]
- Speaker: Gilles Blanchard (Université Paris Sud)
- Title: Agnostic post hoc approaches to false positive control
- Abstract: Classical approaches to multiple testing grant control over the amount of false positives for a specific method prescribing the set of rejected hypotheses. In practice many users tend to deviate from a strictly prescribed multiple testing method and follow ad-hoc rejection rules, tune some parameters by hand, compare several methods and pick from their results the one that suits them best, etc. This will invalidate standard statistical guarantees because of the selection effect. To compensate for any form of such “data snooping”, an approach which has garnered significant interest recently is to derive “user-agnostic”, or post hoc, bounds on the false positives valid uniformly over all possible rejection sets; this allows arbitrary data snooping from the user. We present two contributions: starting from a common approach to post hoc bounds taking into account the p-value level sets for any candidate rejection set, we analyze how to calibrate the bound under different assumptions concerning the distribution of p-values. We then build towards a general approach to the problem using a family of candidate rejection subsets (call this a reference family) together with associated bounds on the number of false positives they contain, the latter holding uniformly over the family. It is then possible to interpolate from this reference family to find a bound valid for any candidate rejection subset. This general program encompasses in particular the p-value level sets considered earlier; we illustrate its interest in a different context where the reference subsets are fixed and spatially structured. (Joint work with Pierre Neuvial and Etienne Roquain.)
- Discussant: Arun Kumar Kuchibhotla (Carnegie Mellon University)
- Links: [Relevant papers: paper #1, paper #2, paper #3] [Slides]

Thursday, October 29, 2020 [Recording]
- Speaker: Robert Lunde (University of Texas, Austin)
- Title: Resampling for Network Data
- Abstract: Network data, which represent complex relationships between different entities, have become increasingly common in fields ranging from neuroscience to social network analysis. To address key scientific questions in these domains, versatile inferential methods for network-valued data are needed. In this talk, I will discuss our recent work on network analogs of the three main resampling methods: subsampling, the jackknife, and the bootstrap. While network data are generally dependent, under the sparse graphon model, we show that these resampling procedures exhibit similar properties to their IID counterparts. I will also discuss related theoretical results, including central limit theorems for eigenvalues and a network Efron-Stein inequality. This is joint work with Purnamrita Sarkar and Qiaohui Lin.
- Discussant: Liza Levina (University of Michigan)
- Links: [Relevant papers: paper #1, paper #2, paper #3] [Slides]

Thursday, October 22, 2020 [Recording]
- Speaker: Yuan Liao (Rutgers University)
- Title: Deep Learning Inference on Semi-Parametric Models with Weakly Dependent Data
- Abstract: Deep Neural Networks (DNNs) are nonlinear sieves that can approximate nonlinear functions of high dimensional variables more effectively than various linear sieves (or series). This paper considers efficient inference (estimation and confidence intervals) of functionals of nonparametric conditional moment restrictions via penalized DNNs, for weakly dependent beta-mixing time series data. The functionals of interest are either known or unknown expected functionals, such as weighted average derivatives, averaged partial means and averaged squared partial derivatives. Nonparametric conditional quantile instrumental variable models are a particular example of interest in this paper. This is joint work with Jiafeng Chen, Xiaohong Chen, and Elie Tamer.
- Discussant: Matteo Sesia (University of Southern California)
- Links: [Slides]

Thursday, October 15, 2020 [Recording]
- Speaker: Zhimei Ren (Stanford University)
- Title: Derandomizing Knockoffs
- Abstract: Model-X knockoffs is a general procedure that can leverage any feature importance measure to produce a variable selection algorithm, which discovers true effects while rigorously controlling the number or fraction of false positives. Model-X knockoffs relies on the construction of synthetic random variables and is, therefore, random. In this paper, we propose a method for derandomizing model-X knockoffs. By aggregating the selection results across multiple runs of the knockoffs algorithm, our method provides stable decisions without compromising statistical power. The derandomization step is designed to be flexible and can be adapted to any variable selection base procedure. When applied to the base procedure of Janson et al. (2016), we prove that derandomized knockoffs controls both the per family error rate (PFER) and the k family-wise error rate (k-FWER). Further, we carry out extensive numerical studies demonstrating tight type-I error control and markedly enhanced power when compared with alternative variable selection algorithms. Finally, we apply our approach to multi-stage GWAS of prostate cancer and report locations on the genome that are significantly associated with the disease. When cross-referenced with other studies, we find that the reported associations have been replicated.
- Discussant: Richard Samworth (University of Cambridge)
- Links: [Relevant paper] [Slides]

Thursday, October 8, 2020 [Recording]
- Speaker: Nilesh Tripuraneni (UC Berkeley)
- Title: Single Point Transductive Prediction
- Abstract: Standard methods in supervised learning separate training and prediction: the model is fit independently of any test points it may encounter. However, can knowledge of the next test point $\mathbf{x}_{\star}$ be exploited to improve prediction accuracy? We address this question in the context of linear prediction, showing how techniques from semi-parametric inference can be used transductively to combat regularization bias. We first lower bound the $\mathbf{x}_{\star}$ prediction error of ridge regression and the Lasso, showing that they must incur significant bias in certain test directions. We then provide non-asymptotic upper bounds on the $\mathbf{x}_{\star}$ prediction error of two transductive prediction rules. We conclude by showing the efficacy of our methods on both synthetic and real data, highlighting the improvements single point transductive prediction can provide in settings with distribution shift. This is joint work with Lester Mackey.
- Discussant: Leying Guan (Yale University)
- Links: [Relevant paper] [Slides]

Thursday, October 1, 2020 [Recording]
- Speaker: Asaf Weinstein (Hebrew University of Jerusalem)
- Title: A Power Analysis for Knockoffs with the Lasso Coefficient-Difference Statistic
- Abstract: In a linear model with possibly many predictors, we consider variable selection procedures given by $\{1\leq j\leq p: |\widehat{\beta}_j(\lambda)| > t\}$, where $\widehat{\beta}(\lambda)$ is the Lasso estimate of the regression coefficients, and where $\lambda$ and $t$ may be data dependent. Ordinary Lasso selection is captured by using $t=0$, thus allowing one to control only $\lambda$, whereas thresholded-Lasso selection allows one to control both $\lambda$ and $t$. Figuratively, thresholded-Lasso opens up the possibility to look further down the Lasso path, which typically leads to dramatic improvement in power. This phenomenon has been quantified recently leveraging advances in approximate message-passing (AMP) theory, but the implications are actionable only when assuming substantial knowledge of the underlying signal. In this work we study theoretically the power of a knockoffs-calibrated counterpart of thresholded-Lasso that enables us to control FDR in the realistic situation where no prior information about the signal is available. Although the basic AMP framework remains the same, our analysis requires a significant technical extension of existing theory in order to handle the pairing between original variables and their knockoffs. Relying on this extension we obtain exact asymptotic predictions for the true positive proportion achievable at a prescribed type I error level. In particular, we show that the knockoffs version of thresholded-Lasso can (still) perform much better than ordinary Lasso selection if $\lambda$ is chosen by cross-validation on the augmented matrix. This is joint work with Malgorzata Bogdan, Weijie Su, Rina Foygel Barber and Emmanuel Candes.
- Discussant: Zheng (Tracy) Ke (Harvard University)
- Links: [Relevant paper] [Slides]

Thursday, September 24, 2020 [Recording]
- Speaker: Ruth Heller (Tel Aviv University)
- Title: Inference following aggregate level hypothesis testing
- Abstract: The practice of pooling several individual test statistics to form aggregate tests is common in many statistical applications where individual tests may be underpowered. Following aggregate-level testing, it is naturally of interest to infer on the individual units that drive the signal. Failing to account for selection will produce biased inference. We develop a hypothesis testing framework that guarantees control over false positives conditional on the selection by aggregate tests. We illustrate the usefulness of our procedures in two genomic applications: whole-genome expression quantitative trait loci (eQTL) analysis across multiple tissue types, and rare variant testing. This talk is based on joint works with Nilanjan Chatterjee, Abba Krieger, Amit Meir, and Jianxin Shi.
- Discussant: Jingshu Wang (University of Chicago)
- Links: [Relevant papers: paper #1, paper #2] [Slides]

Thursday, September 17, 2020 [Recording]
- Speaker: Hannes Leeb (University of Vienna)
- Title: Conditional Predictive Inference for High-Dimensional Stable Algorithms
- Abstract: We investigate generically applicable and intuitively appealing prediction intervals based on leave-one-out residuals. The conditional coverage probability of the proposed intervals, given the observations in the training sample, is close to the nominal level, provided that the underlying algorithm used for computing point predictions is sufficiently stable under the omission of single feature/response pairs. Our results are based on a finite sample analysis of the empirical distribution function of the leave-one-out residuals and hold in non-parametric settings with only minimal assumptions on the error distribution. To illustrate our results, we also apply them to high-dimensional linear predictors, where we obtain uniform asymptotic conditional validity as both sample size and dimension tend to infinity at the same rate. These results show that despite the serious problems of resampling procedures for inference on the unknown parameters (cf. Bickel and Freedman, 1983; El Karoui and Purdom, 2015; Mammen, 1996), leave-one-out methods can be successfully applied to obtain reliable predictive inference even in high dimensions.
  Joint work with Lukas Steinberger.
- Discussant: Yuansi Chen (ETH Zürich)
- Links: [Relevant paper] [Slides]

Thursday, September 10, 2020 [Recording]
- Speaker: Michael Celentano (Stanford University)
- Title: The Lasso with general Gaussian designs with applications to hypothesis testing
- Abstract: The Lasso is a method for high-dimensional regression, which is now commonly used when the number of covariates p is of the same order as or larger than the number of observations n. Classical asymptotic normality theory is not applicable to this model for two fundamental reasons: (1) The regularized risk is non-smooth; (2) The distance between the estimator and the true parameter vector cannot be neglected. As a consequence, standard perturbative arguments that are the traditional basis for asymptotic normality fail.
  On the other hand, the Lasso estimator can be precisely characterized in the regime in which both n and p are large, while n/p is of order one. This characterization was first obtained in the case of standard Gaussian designs, and subsequently generalized to other high-dimensional estimation procedures. We extend the same characterization to Gaussian correlated designs with non-singular covariance structure.
  Using this theory, we study (i) the debiased Lasso, and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals, (ii) confidence intervals constructed via a leave-one-out technique related to conditional randomization tests, and (iii) a simple procedure for hyper-parameter tuning which is provably optimal for prediction error under proportional asymptotics.
  Based on joint work with Andrea Montanari and Yuting Wei.
- Discussant: Dongming Huang (National University of Singapore)
- Links: [Relevant paper] [Slides]

Thursday, September 3, 2020 [Recording]
- Speaker: Rina Foygel Barber (University of Chicago)
- Title: Is distribution-free inference possible for binary regression?
- Abstract: For a regression problem with a binary label response, we examine the problem of constructing confidence intervals for the label probability conditional on the features. In a setting where we do not have any information about the underlying distribution, we would ideally like to provide confidence intervals that are distribution-free, that is, valid with no assumptions on the distribution of the data. Our results establish an explicit lower bound on the length of any distribution-free confidence interval, and construct a procedure that can approximately achieve this length. In particular, this lower bound is independent of the sample size and holds for all distributions with no point masses, meaning that it is not possible for any distribution-free procedure to be adaptive with respect to any type of special structure in the distribution.
- Discussant: Aaditya Ramdas (Carnegie Mellon University)
- Links: [Relevant paper] [Slides]

Thursday, August 27, 2020 [Recording]
- Speaker: Daniel Yekutieli (Tel Aviv University)
- Title: Bayesian selective inference
- Abstract: I will discuss selective inference from a Bayesian perspective. I will revisit existing work. I will demonstrate the effectiveness of Bayesian methods for specifying FDR-controlling selection rules and providing valid selection-adjusted marginal inferences in two simulated multiple testing examples: (a) Normal sequence model with continuous-valued parameters and (b) two-group model with dependent Normal observations.
- Discussant: Zhigen Zhao (Temple University)
- Links: [Relevant papers: paper #1, paper #2] [Slides]

Thursday, August 20, 2020 [Recording]
- Speaker: Eugene Katsevich (University of Pennsylvania)
- Title: The conditional randomization test in theory and in practice
- Abstract: Consider the problem of testing whether a predictor X is independent of a response Y given a covariate vector Z. If we have access to the distribution of X given Z (the Model-X assumption), the conditional randomization test (Candes et al., 2018) is a simple and powerful conditional independence test, which does not require any knowledge of the distribution of Y given X and Z. The key obstacle to the practical implementation of the CRT is its computational cost, due to its reliance on repeatedly refitting a statistical machine learning model on resampled data. This motivated the development of distillation, a technique which speeds up the CRT by orders of magnitude while sacrificing little or no power (Liu, Katsevich, Janson, and Ramdas, 2020). I will also discuss recent theoretical developments that help us understand how the choice of CRT test statistic impacts its power (Katsevich and Ramdas, 2020). Finally, I'll illustrate an application of the CRT to the analysis of single cell CRISPR regulatory screens, where it helps circumvent the difficulties of modeling single cell gene expression (Katsevich and Roeder, 2020).
- Discussant: Wesley Tansey (Memorial Sloan Kettering Cancer Center)
- Links: [Relevant papers: paper #1, paper #2, paper #3] [Slides]

Thursday, August 13, 2020 [Recording]
- Speaker: Lucy Gao (University of Waterloo)
- Title: Selective Inference for Hierarchical Clustering
- Abstract: It is common practice in fields such as single-cell transcriptomics to use the same data set to define groups of interest via clustering algorithms and to test whether these groups are different. Because the same data set is used for both hypothesis generation and hypothesis testing, simply applying a classical statistical test (e.g. the t-test) in this setting would yield an extremely inflated Type I error rate. We propose a selective inference framework for testing the null hypothesis of no difference in means between two clusters obtained using agglomerative hierarchical clustering. Using this framework, we can efficiently compute exact p-values for many commonly used linkage criteria. We demonstrate the utility of our test in simulated data and in single-cell RNA-seq data. This is joint work with Jacob Bien and Daniela Witten.
- Discussant: Yuval Benjamini (Hebrew University of Jerusalem)
- Links: [Slides]

Thursday, July 30, 2020 [Recording]
- Speaker: Kathryn Roeder (Carnegie Mellon University)
- Title: Adaptive approaches for augmenting genetic association studies with multi-omics covariates
- Abstract: To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet, new selective inference methodologies could improve power by enabling exploration of test statistics with covariates for informative weights while retaining desired statistical guarantees. We explore one such framework, adaptive p-value thresholding (AdaPT), in the context of genome-wide association studies (GWAS) under two types of regimes: (1) testing individual single nucleotide polymorphisms (SNPs) for schizophrenia (SCZ) and (2) the aggregation of SNPs into gene-based test statistics for autism spectrum disorder (ASD). In both settings, we focus on enriched expression quantitative trait loci (eQTLs) and demonstrate a substantial increase in power using flexible gradient boosted trees to account for covariates constructed with GWAS statistics from genetically-correlated phenotypes, as well as measures capturing association with gene expression and coexpression subnetwork membership. We address the practical challenges of implementing AdaPT in high-dimensional -omics settings, such as approaches for tuning gradient boosted trees without compromising error-rate control as well as handling the subtle issues of working with publicly available summary statistics (e.g., p-values reported to be exactly equal to one). Specifically, because a popular approach for computing gene-level p-values is based on an invalid approximation for the combination of dependent two-sided test statistics, it yields an inflated error rate. Additionally, the resulting improper null distribution violates the mirror-conservative assumption required for masking procedures. We believe our results are critical for researchers wishing to build new methods in this challenging area and emphasize that our pipeline of analysis can be implemented in many different high-throughput settings to ultimately improve power. 
  This is joint work with Ronald Yurko, Max G’Sell, and Bernie Devlin.
- Discussant: Chiara Sabatti (Stanford University)
- Links: [Relevant paper] [Slides]

Thursday, July 23, 2020 [Recording]
- Speaker: Will Fithian (UC Berkeley)
- Title: Conditional calibration for false discovery rate control under dependence
- Abstract: We introduce a new class of methods for finite-sample false discovery rate (FDR) control in multiple testing problems with dependent test statistics where the dependence is fully or partially known. Our approach separately calibrates a data-dependent p-value rejection threshold for each hypothesis, relaxing or tightening the threshold as appropriate to target exact FDR control. In addition to our general framework we propose a concrete algorithm, the dependence-adjusted Benjamini-Hochberg (dBH) procedure, which adaptively thresholds the q-value for each hypothesis. Under positive regression dependence the dBH procedure uniformly dominates the standard BH procedure, and in general it uniformly dominates the Benjamini–Yekutieli (BY) procedure (also known as BH with log correction). Simulations and real data examples illustrate power gains over competing approaches to FDR control under dependence. This is joint work with Lihua Lei.
- Discussant: Etienne Roquain (Sorbonne Université)
- Links: [Relevant paper] [Slides]

Thursday, July 16, 2020 [Recording]
- Speaker: Arun Kumar Kuchibhotla (University of Pennsylvania)
- Title: Optimality in Universal Post-selection Inference
- Abstract: Universal post-selection inference refers to valid inference after an arbitrary variable selection in regression models. In the context of linear regression and GLMs, universal post-selection inference methods have been suggested by Berk et al. (2013, AoS) and Bachoc et al. (2020, AoS). Both these works use the so-called "max-t" approach to obtain valid inference after arbitrary variable selection. Although tight, this approach can lead to a conservative inference for several sub-models. (Tightness refers to the existence of a variable selection procedure for which the inference is exact/sharp.) In this talk, I present a different approach to universal post-selection inference called "Hierarchical PoSI" that scales differently for different sub-model sizes. The basic idea stems from pre-pivoting, introduced by Beran (1987, 1988, JASA) and also from multi-scale testing. Some numerical results will be presented to illustrate the benefits. No guarantees of optimality will be made.
- Discussant: Daniel Yekutieli (Tel Aviv University)
- Links: [Notes] [Slides]

Thursday, July 9, 2020 [Recording]
- Speaker: Lihua Lei (Stanford University)
- Title: AdaPT: An interactive procedure for multiple testing with side information
- Abstract: We consider the problem of multiple-hypothesis testing with generic side information: for each hypothesis we observe both a p-value p_i and some predictor x_i encoding contextual information about the hypothesis. For large-scale problems, adaptively focusing power on the more promising hypotheses (those more likely to yield discoveries) can lead to much more powerful multiple-testing procedures. We propose a general iterative framework for this problem, the adaptive p-value thresholding procedure which we call AdaPT, which adaptively estimates a Bayes optimal p-value rejection threshold and controls the false discovery rate in finite samples. At each iteration of the procedure, the analyst proposes a rejection threshold and observes partially censored p-values, estimates the false discovery proportion below the threshold and proposes another threshold, until the estimated false discovery proportion is below α. Our procedure is adaptive in an unusually strong sense, permitting the analyst to use any statistical or machine learning method she chooses to estimate the optimal threshold, and to switch between different models at each iteration as information accrues. This is joint work with Will Fithian.
- Discussant: Kun Liang (University of Waterloo)
- Links: [Relevant paper] [Slides]

Thursday, July 2, 2020 [Recording]
- Speaker: Lucas Janson (Harvard University)
- Title: Floodgate: inference for model-free variable importance
- Abstract: Many modern applications seek to understand the relationship between an outcome variable Y and a covariate X in the presence of confounding variables Z = (Z_1,...,Z_p). Although much attention has been paid to testing whether Y depends on X given Z, in this paper we seek to go beyond testing by inferring the strength of that dependence. We first define our estimand, the minimum mean squared error (mMSE) gap, which quantifies the conditional relationship between Y and X in a way that is deterministic, model-free, interpretable, and sensitive to nonlinearities and interactions. We then propose a new inferential approach called floodgate that can leverage any regression function chosen by the user (including those fitted by state-of-the-art machine learning algorithms or derived from qualitative domain knowledge) to construct asymptotic confidence bounds, and we apply it to the mMSE gap. In addition to proving floodgate’s asymptotic validity, we rigorously quantify its accuracy (distance from confidence bound to estimand) and robustness. We demonstrate floodgate’s performance in a series of simulations and apply it to data from the UK Biobank to infer the strengths of dependence of platelet count on various groups of genetic mutations. This is joint work with Lu Zhang.
- Discussant: Weijie Su (University of Pennsylvania)
- Links: [Relevant paper] [Slides]

Thursday, June 25, 2020 [Recording]
- Speaker: Alexandra Carpentier (Otto-von-Guericke-Universität Magdeburg)
- Title: Adaptive inference and its relations to sequential decision making
- Abstract: Adaptive inference - namely adaptive estimation and adaptive confidence statements - is particularly important in high or infinite dimensional models in statistics. Indeed, whenever the dimension becomes high or infinite, it is important to adapt to the underlying structure of the problem. While adaptive estimation is often possible, it is often the case that adaptive and honest confidence sets do not exist. This is known as the adaptive inference paradox, and it has consequences in sequential decision making. In this talk, I will present some classical results of adaptive inference and discuss how they impact sequential decision making. This is joint work with Andrea Locatelli, Matthias Loeffler, Olga Klopp, Richard Nickl, James Cheshire, and Pierre Menard.
- Discussant: Jing Lei (Carnegie Mellon University)
- Links: [Relevant papers: paper #1, paper #2, paper #3] [Slides]

Thursday, June 18, 2020 [Recording]
(Seminar hosted jointly with the CIRM-Luminy meeting on Mathematical Methods of Modern Statistics 2)
- Speaker: Weijie Su (University of Pennsylvania)
- Title: Gaussian Differential Privacy
- Abstract: Privacy-preserving data analysis has been put on a firm mathematical foundation since the introduction of differential privacy (DP) in 2006. This privacy definition, however, has some well-known weaknesses: notably, it does not tightly handle composition. In this talk, we propose a relaxation of DP that we term "f-DP", which has a number of appealing properties and avoids some of the difficulties associated with prior relaxations. First, f-DP preserves the hypothesis testing interpretation of differential privacy, which makes its guarantees easily interpretable. It allows for lossless reasoning about composition and post-processing, and notably, a direct way to analyze privacy amplification by subsampling. We define a canonical single-parameter family of definitions within our class that is termed "Gaussian Differential Privacy", based on hypothesis testing of two shifted normal distributions. We prove that this family is focal to f-DP by introducing a central limit theorem, which shows that the privacy guarantees of any hypothesis-testing based definition of privacy (including differential privacy) converge to Gaussian differential privacy in the limit under composition. This central limit theorem also gives a tractable analysis tool. We demonstrate the use of the tools we develop by giving an improved analysis of the privacy guarantees of noisy stochastic gradient descent. This is joint work with Jinshuo Dong and Aaron Roth.
- Discussant: Yu-Xiang Wang (UC Santa Barbara)
- Links: [Relevant papers: paper #1, paper #2, paper #3] [Slides]

Thursday, June 11, 2020 [Recording]
- Speaker: Dongming Huang (Harvard University)
- Title: Controlled Variable Selection with More Flexibility
- Abstract: The recent model-X knockoffs method selects variables with provable and non-asymptotic error control and with no restrictions or assumptions on the dimensionality of the data or the conditional distribution of the response given the covariates. The one requirement for the procedure is that the covariate samples are drawn independently and identically from a precisely-known distribution. In this talk, I will show that the exact same guarantees can be made without knowing the covariate distribution fully, but instead knowing it only up to a parametric model with as many as Ω(np) parameters, where p is the dimension and n is the number of covariate samples (including unlabeled samples if available). The key is to treat the covariates as if they are drawn conditionally on their observed value for a sufficient statistic of the model. Although this idea is simple, even in Gaussian models, conditioning on a sufficient statistic leads to a distribution supported on a set of zero Lebesgue measure, requiring techniques from topological measure theory to establish valid algorithms. I will demonstrate how to do this for medium-dimensional Gaussian models, high-dimensional Gaussian graphical models, and discrete graphical models. Simulations show the new approach remains powerful under the weaker assumptions. This talk is based on joint work with Lucas Janson.
- Discussant: Snigdha Panigrahi (University of Michigan)
- Links: [Relevant paper] [Slides]

Thursday, June 4, 2020 [Recording]
- Speaker: Saharon Rosset (Tel Aviv University)
- Title: Optimal multiple testing procedures for strong control and for the two-group model
- Abstract: Multiple testing problems are a staple of modern statistics. The fundamental objective is to reject as many false null hypotheses as possible, subject to controlling an overall measure of false discovery, like family-wise error rate (FWER) or false discovery rate (FDR). We formulate multiple testing of simple hypotheses as an infinite-dimensional optimization problem, seeking the most powerful rejection policy which guarantees strong control of the selected measure. We show that for exchangeable hypotheses, for FWER or FDR and relevant notions of power, these problems lead to infinite programs that can provably be solved. We explore maximin rules for complex alternatives, and show they can be found in practice, leading to improved practical procedures compared to existing alternatives. We derive explicit optimal tests for FWER or FDR control for three independent normal means. We find that the power gain over natural competitors is substantial in all settings examined. We apply our optimal maximin rule to subgroup analyses in systematic reviews from the Cochrane library, leading to an increased number of findings compared to existing alternatives.
  As time permits, I will also review our follow-up work on optimal rules for controlling FDR or positive FDR in the two-group model, in high dimension and under arbitrary dependence. Our results show substantial and interesting differences between the standard approach for controlling the mFDR and our new solutions; in particular, we attain substantially increased power (expected number of true rejections).
  Joint work with Ruth Heller, Amichai Painsky and Udi Aharoni.
- Discussant: Wenguang Sun (University of Southern California)
- Links: [Relevant papers: paper #1, paper #2] [Slides]

Thursday, May 28, 2020 [Recording]
- Speaker: Jingshu Wang (University of Chicago)
- Title: Detecting Multiple Replicating Signals using Adaptive Filtering Procedures
- Abstract: Replicability is a fundamental quality of scientific discoveries: we are interested in those signals that are detectable in different laboratories, study populations, across time, etc. Unlike meta-analysis, which accounts for experimental variability but does not guarantee replicability, testing a partial conjunction (PC) null aims specifically to identify the signals that are discovered in multiple studies. In many contemporary applications, e.g., comparing multiple high-throughput genetic experiments, a large number M of PC nulls need to be tested simultaneously, calling for a multiple comparison correction. However, standard multiple testing adjustments on the M PC p-values can be severely conservative, especially when M is large and the signals are sparse. We introduce AdaFilter, a new multiple testing procedure that increases power by adaptively filtering out unlikely candidates of PC nulls. We prove that AdaFilter can control FWER and FDR as long as data across studies are independent, and has much higher power than other existing methods. We illustrate the application of AdaFilter with three examples: microarray studies of Duchenne muscular dystrophy, single-cell RNA sequencing of T cells in lung cancer tumors and GWAS for metabolomics.
- Discussant: Eugene Katsevich (Carnegie Mellon University)
- Links: [Relevant paper] [Slides]

Thursday, May 21, 2020 [Recording]
- Speaker: Yoav Benjamini (Tel Aviv University)
- Title: Confidence Intervals for selected parameters
- Abstract: Practical or scientific considerations may lead to selecting a subset of parameters as ‘important’. Inferences about the selected parameters often are based on the same data used for selection. We present a taxonomy of error-rates for selective confidence intervals, then focus on controlling the probability that one or more intervals for selected parameters do not cover: the simultaneous over the selected (SoS) error-rate. We use two approaches to construct SoS-controlling confidence intervals for k location parameters out of m, deemed most important because their estimators are the largest. The new intervals improve substantially over Sidak intervals when k << m, and approach Bonferroni-corrected intervals when k is close to m. (Joint work with Yotam Hechtlinger and Philip Stark)
- Discussant: Aaditya Ramdas (Carnegie Mellon University)
- Links: [Relevant paper] [Slides]

Thursday, May 14, 2020 [Recording]
- Speaker: Malgorzata Bogdan (Uniwersytet Wroclawski)
- Title: Adaptive Bayesian Version of SLOPE
- Abstract: Sorted L-One Penalized Estimation (SLOPE) is a convex optimization procedure for identifying predictors in large databases. It extends the popular Least Absolute Shrinkage and Selection Operator (LASSO) by replacing the L1 norm penalty with the Sorted L-One Norm. It provably controls FDR under orthogonal designs and yields asymptotically minimax estimators of regression coefficients in sparse high-dimensional regression. In this talk I will briefly introduce the method and explain problems with FDR control under correlated designs. We will then discuss a novel adaptive Bayesian version of SLOPE (ABSLOPE), which addresses these issues and allows for simultaneous variable selection and parameter estimation, despite the missing values. We will also discuss a strong screening rule for discarding predictors for SLOPE, which substantially speeds up the SLOPE and ABSLOPE algorithms.
- Discussant: Cynthia Rush (Columbia University)
- Links: [Slides] [Relevant papers: paper #1, paper #2, paper #3]

Thursday, May 7, 2020 [Recording]
- Speaker: Aldo Solari (University of Milano-Bicocca)
- Title: Exploratory Inference for Brain Imaging
- Abstract: Modern data analysis can be highly exploratory. In brain imaging, for example, researchers often highlight patterns of brain activity suggested by the data, but false discoveries are likely to intrude into this selection. How confident can the researcher be about a pattern that has been found, if that pattern has been selected from so many potential patterns?
  In this talk we present a recent approach - termed 'All-Resolutions Inference' (ARI) - that delivers lower confidence bounds to the number of true discoveries in any selected set of voxels. Notably, these bounds are simultaneously valid for all possible selections. This allows a truly interactive approach to post-selection inference, that does not set any limits on the way the researcher chooses to perform the selection.
- Discussant: Genevera Allen (Rice University)
- Links: [Relevant papers: paper #1, paper #2, paper #3] [Slides]

Thursday, April 30, 2020 [Recording]
- Speaker: Yingying Fan (University of Southern California)
- Title: Universal Rank Inference via Residual Subsampling with Application to Large Networks
- Abstract: Determining the precise rank is an important problem in many large-scale applications with matrix data exploiting low-rank plus noise models. In this paper, we suggest a universal approach to rank inference via residual subsampling (RIRS) for testing and estimating rank in a wide family of models, including many popularly used network models such as the degree corrected mixed membership model as a special case. Our procedure constructs a test statistic via subsampling entries of the residual matrix after extracting the spiked components. The test statistic converges in distribution to the standard normal under the null hypothesis, and diverges to infinity with asymptotic probability one under the alternative hypothesis. The effectiveness of the RIRS procedure is justified theoretically, utilizing the asymptotic expansions of eigenvectors and eigenvalues for large random matrices recently developed in Fan et al. (2019a) and Fan et al. (2019b). The advantages of the newly suggested procedure are demonstrated through several simulation and real data examples. This work is joint with Xiao Han and Qing Yang.
- Discussant: Yuekai Sun (University of Michigan)
- Links: [Relevant paper] [Slides]

Thursday, April 23, 2020 [Recording]
- Speaker: Aaditya Ramdas (Carnegie Mellon University)
- Title: Ville’s inequality, Robbins’ confidence sequences, and nonparametric supermartingales
- Abstract: Standard textbook confidence intervals are only valid at fixed sample sizes, but scientific datasets are often collected sequentially and potentially stopped early, thus introducing a critical selection bias. A "confidence sequence" is a sequence of intervals, one for each sample size, that are uniformly valid over all sample sizes, and are thus valid at arbitrary data-dependent sample sizes. One can show that constructing the former at every time step guarantees false coverage rate control, while constructing the latter at each time step guarantees post-hoc familywise error rate control. We show that at a price of about two (doubling of width), pointwise asymptotic confidence intervals can be extended to uniform nonparametric confidence sequences. The crucial role of some beautiful nonnegative supermartingales will be made transparent in enabling "safe anytime-valid inference".
  This talk will mostly feature joint work with Steven R. Howard (Berkeley, Voleon), Jon McAuliffe (Berkeley, Voleon), Jas Sekhon (Berkeley, Bridgewater) and recently Larry Wasserman (CMU) and Sivaraman Balakrishnan (CMU). I will also cover interesting historical and contemporary contributions to this area.
- Discussant: Wouter Koolen (Centrum Wiskunde & Informatica)
- Links: [Relevant papers: paper #1, paper #2, paper #3, paper #4] [Slides]

Thursday, April 16, 2020 [Recording]
- Speaker: Emmanuel Candès (Stanford University)
- Title: Causal Inference in Genetic Trio Studies
- Abstract: We introduce a method to rigorously draw causal inferences (inferences immune to all possible confounding) from genetic data that include parents and offspring. Causal conclusions are possible with these data because the natural randomness in meiosis can be viewed as a high-dimensional randomized experiment. We make this observation actionable by developing a novel conditional independence test that identifies regions of the genome containing distinct causal variants. The proposed Digital Twin Test compares an observed offspring to carefully constructed synthetic offspring from the same parents to determine statistical significance, and it can leverage any black-box multivariate model and additional non-trio genetic data to increase power. Crucially, our inferences are based only on a well-established mathematical model of recombination and make no assumptions about the relationship between the genotypes and phenotypes.
- Discussant: Matthew Stephens (University of Chicago)
- Links: [Relevant paper] [Slides]