8/29/2025
Title: Data Science-Powered Provider Profiling to Enhance Quality and Equity in Health Care Delivery
Speaker: Wenbo Wu, Assistant Professor, Department of Health Policy and Management, Johns Hopkins University
Abstract: Provider profiling is a widely used comparative evaluation tool to inform patients’ care decision making and to improve the quality of care delivered by health care providers. Based on standardized quality measures of patient outcomes, this process entails quantifying provider performance and pinpointing providers with subpar performance. Current methods for profiling activities rely on risk adjustment models with the linearity assumption, often too restrictive to characterize complex associations between risk factors and outcomes. Moreover, these methods, having been historically driven by the demand for controlling care expenditures, tend to pool all racial/ethnic groups without accounting for their socioeconomic heterogeneity. Despite the importance of distinguishing between cost-driven and equity-driven profiling, a theoretical framework capable of addressing these different but related profiling objectives is still lacking, due in part to the absence of a unifying approach that defines context-specific performance benchmarks. To address these issues, we propose a versatile probability framework based on hypothetical reference providers corresponding to specific profiling objectives. Furthermore, we develop flexible machine learning approaches that relax the linearity assumption. These methods will advance the methodology of provider profiling, thereby triggering improved care-seeking decision-making by patients and stakeholders and evidence-based accountability of providers.
4/25/2025
Title: Bayesian optimality of testing procedures for survival data in the nonproportional hazards setting
Speaker: Andrea Arfe, Assistant Attending Biostatistician, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center
Discussant/Visitor: Yingying Wei, Associate Professor, Department of Statistics, The Chinese University of Hong Kong
Abstract: Most statistical tests for treatment effects used in randomized clinical trials with survival outcomes are based on the proportional hazards assumption, which often fails in practice. Data from early exploratory studies may provide evidence of nonproportional hazards, which can guide the choice of alternative tests in the design of practice-changing confirmatory trials. We developed a test to detect treatment effects in a late-stage trial, which accounts for the deviations from proportional hazards suggested by early-stage data. Conditional on early-stage data, among all tests that control the frequentist Type I error rate at a fixed α level, our testing procedure maximizes the Bayesian predictive probability that the study will demonstrate the efficacy of the experimental treatment. Hence, the proposed test provides a useful benchmark for other tests commonly used in the presence of nonproportional hazards, for example, weighted log-rank tests. We illustrate this approach in simulations based on data from a published cancer immunotherapy phase III trial.
(Joint work with Lorenzo Trippa and Brian Alexander at Dana-Farber Cancer Institute)
4/11/2025
Speaker: Shouhao Zhou, Associate Professor, Department of Public Health Sciences, Division of Biostatistics and Bioinformatics, Penn State University
Title: Posterior Predictive (PoP) Design
Abstract: We propose a novel Bayesian model-assisted trial design that uses predictive Bayes factors to determine the escalation and de-escalation boundaries for dose-finding trials. It overcomes the limitations of previous model-assisted designs and is the first model-assisted design to guarantee global optimality and asymptotic convergence to the true maximum tolerated dose (MTD). Extensive simulation results demonstrate superior operating characteristics.
3/28/2025
Speaker: Jiwei Zhao, Associate Professor, Dept. of Biostatistics and Medical Informatics, University of Wisconsin
Title: Statistical Benefits when Incorporating LLM-Derived Predictions: Old Wine in a New Bottle
Abstract: In biomedical studies involving electronic health records, manually extracting gold-standard phenotype data is labor-intensive and limited in scale. The rise of generative AI, particularly large language models (LLMs), offers a systematic and significantly faster alternative through predictions, such as automated computational phenotypes (ACPs). However, directly substituting gold-standard data with these predictions, without addressing their differences, can introduce biases and lead to misleading conclusions. To address this challenge, we adopt a semi-supervised learning framework that integrates both labeled data (with gold-standard annotations) and unlabeled data (without gold-standard annotations) under the covariate shift paradigm. We propose doubly robust and semiparametrically efficient estimators to infer general target parameters. Through a rigorous efficiency analysis, we compare scenarios with and without the incorporation of LLM-derived predictions. Furthermore, we situate our approach within the existing literature, drawing connections to prediction-powered inference and its extensions, as well as to seemingly unrelated concepts such as surrogacy. To validate our theoretical findings, we conduct extensive synthetic experiments and apply our method to real-world data, demonstrating its practical advantages.
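The idea of correcting predictions with a small gold-standard sample can be illustrated with the basic prediction-powered estimate of a mean, which the abstract mentions as related work. The sketch below is not the speaker's doubly robust estimator: it assumes labeled and unlabeled records come from the same population (no covariate shift), and the function name `ppi_mean` and all data values are hypothetical.

```python
from statistics import fmean


def ppi_mean(y_labeled, pred_labeled, pred_unlabeled):
    """Basic prediction-powered estimate of the mean outcome E[Y].

    Assumes labeled and unlabeled records come from the same population
    (no covariate shift), which is simpler than the setting in the abstract.
    """
    if len(y_labeled) != len(pred_labeled):
        raise ValueError("labeled outcomes and predictions must align")
    # Rectifier: average gap between gold-standard labels and predictions,
    # estimated on the labeled records only.
    rectifier = fmean([y - p for y, p in zip(y_labeled, pred_labeled)])
    # Mean prediction on the unlabeled set, shifted by the rectifier.
    return fmean(pred_unlabeled) + rectifier


# Hypothetical toy data: 4 chart-reviewed records and 6 records
# that only have model predictions.
gold = [1, 0, 1, 1]
pred_on_gold = [0.9, 0.2, 0.6, 0.7]
pred_only = [0.8, 0.3, 0.5, 0.4, 0.6, 0.4]

naive = fmean(pred_only)
corrected = ppi_mean(gold, pred_on_gold, pred_only)
print(f"naive mean of predictions: {naive:.3f}")
print(f"prediction-powered mean:   {corrected:.3f}")
```

In this toy data the predictions understate the gold-standard labels on average, so the corrected estimate moves upward from the naive mean of the predictions.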
2/21/2025
Speaker: Tianchen Qian, Assistant Professor, Department of Statistics, University of California at Irvine
Title: Causal inference and machine learning in mobile health - modeling time-varying effects using longitudinal functional data
Abstract: To optimize mobile health interventions and advance domain knowledge on intervention design, it is critical to understand how the intervention effect varies over time and with contextual information. This study aims to assess how a push notification suggesting physical activity influences individuals’ step counts using data from the HeartSteps micro-randomized trial (MRT). The statistical challenges include the time-varying treatments and the longitudinal functional step count measurements. We propose the first semiparametric causal excursion effect model with varying coefficients to model the time-varying effects within a decision point and across decision points in an MRT. The proposed model incorporates double time indices to accommodate the longitudinal functional outcome, enabling the assessment of time-varying effect moderation by contextual variables. We propose a two-stage causal effect estimator that is robust against a misspecified high-dimensional outcome regression nuisance model. We establish asymptotic theory and conduct simulation studies to validate the proposed estimator. Our analysis provides new insights into individuals’ change in response profiles (such as how soon a response occurs) due to the activity suggestions, how such changes differ by the type of suggestions received, and how such changes depend on other contextual information such as being recently sedentary and the day being a weekday.
1/24/2025
Speaker: Krithika Suresh, Research Assistant Professor, University of Michigan at Ann Arbor
Title: Bounded hazard ratio Cox model for the effect of time to treatment on mortality
Abstract: In resource-limited settings, there is often interest in assessing the effect of time to treatment (TTT) on subsequent mortality. We demonstrate that the traditional Cox proportional hazards model specifying the effect of TTT as a linear term in the log hazard ratio results in a mathematical anomaly that violates the expected monotonic TTT effect on survival (i.e., as TTT increases, survival probability should decrease). Additionally, the quantification of the TTT effect from these models is the hazard ratio, which provides an interpretation conditional on surviving until treatment rather than a quantification of the effect of delayed treatment at baseline. We propose a class of bounded hazard ratio (BHR) Cox models that attenuate the hazard ratio for TTT towards the null with increasing treatment time, such that the hazard for death after treatment does not exceed the hazard without treatment. Estimation is performed using direct optimization of the partial log-likelihood, and we propose a linearized approximation to fit the model in standard software for large sample sizes. From BHR models, the estimated hazard ratio curve describes how the treatment effect diminishes with delays in treatment. Additionally, we present the marginal survival probability difference comparing immediate treatment to a treatment time in the future. We evaluate the performance of model estimation in a simulation study and demonstrate the use of this approach in an application to treatment for colon cancer using NCDB data.
2/7/2025
Speaker: Guoqing Diao, Professor, Department of Biostatistics and Bioinformatics, The George Washington University
Title: Estimating Predictive Margins and Marginal Effects
Abstract: Predictive margins and marginal effects are useful tools for interpreting regression model results in biomedical and epidemiological research, especially for models with nonlinear functional forms. Proper estimation of marginal effects and their variances is also needed, but it is lacking in existing statistical software for some commonly encountered types of data. This article discusses two use cases: survival analysis with competing risks and analysis of binary outcomes with hierarchical clustering. We review the pros and cons of several methods that have been proposed to handle competing risks, including the Fine-Gray model, the cause-specific hazard model, mixture models, and the composite outcome approach. We also propose a generalized bootstrap method to construct confidence intervals for the marginal effect, accounting for the clustering effect. As illustrations, we analyze real data from a COVID Antimicrobial Resistance (AMR) study and a market-size study on new Gram-Negative Antibiotic Use. An R program implementing the proposed method, with core code written in C, has been developed.
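As a minimal illustration of the two quantities named in the abstract, the sketch below computes predictive margins and a marginal effect for a binary treatment in an ordinary logistic model. It is not the speaker's method: it ignores competing risks, clustering, and variance estimation, and the coefficients, covariate values, and function names are all hypothetical.

```python
import math


def expit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predictive_margin(intercept, beta_treat, beta_age, ages, treat):
    """Average predicted probability with everyone's treatment set to `treat`.

    Each subject keeps their observed age; only the treatment indicator
    is fixed at the same value for the whole sample.
    """
    probs = [expit(intercept + beta_treat * treat + beta_age * age) for age in ages]
    return sum(probs) / len(probs)


# Hypothetical fitted coefficients and observed ages (not from any real study).
intercept, beta_treat, beta_age = -3.0, 0.8, 0.04
ages = [35, 48, 52, 61, 70, 77]

margin_treated = predictive_margin(intercept, beta_treat, beta_age, ages, treat=1)
margin_control = predictive_margin(intercept, beta_treat, beta_age, ages, treat=0)

# Marginal effect of treatment: difference between the two predictive margins.
marginal_effect = margin_treated - margin_control
print(f"margin (treated): {margin_treated:.3f}")
print(f"margin (control): {margin_control:.3f}")
print(f"marginal effect:  {marginal_effect:.3f}")
```

Because the model is nonlinear, the marginal effect on the probability scale depends on the covariate distribution of the sample, which is why it is reported as an average rather than read directly from the treatment coefficient.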