2021-2022



12/16/2022

Title: Surrogate endpoints validation using joint modeling, mediation analysis and meta-analytic data 

Speaker: Quentin Le Coënt, Bordeaux Population Health, Bordeaux, France

Abstract: 

In clinical research, the use of surrogate endpoints speeds up the evaluation of a treatment compared to the use of a final endpoint. Surrogate endpoints must first be statistically validated before being used in a new clinical trial. In this work, we propose validating a surrogate endpoint through a causal approach based on mediation analysis, which decomposes the effect of the treatment on the final endpoint into an indirect effect acting through the surrogate endpoint and a direct effect independent of it. The total effect of the treatment is therefore the sum of the indirect and direct effects. A surrogate endpoint is validated if most of the treatment effect corresponds to the indirect effect (the treatment then operates mainly through the surrogate). We are particularly interested in the case where the final endpoint is a time-to-event, such as time to death, and the surrogate endpoint is either a time-to-event or a longitudinal biomarker. Joint models have been developed from which direct and indirect treatment effects can be derived. These models also allow heterogeneous data from meta-analyses or multicenter trials to be taken into account in order to strengthen the validation process. Simulation studies were conducted to evaluate the performance of these approaches, which were then applied to real cancer datasets: a meta-analysis in gastric cancer to evaluate time to relapse as a surrogate for overall survival, and a multicenter trial in prostate cancer to evaluate the evolution over time of prostate-specific antigen levels as a surrogate for disease-free survival.
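
As a back-of-the-envelope illustration of the decomposition idea (not the joint-model estimators presented in the talk), the sketch below computes Freedman's classical proportion of treatment effect explained from two Cox fits on simulated data; all data and effect sizes are made up.

```r
## Freedman's proportion of treatment effect explained (PTE) -- a crude
## classical surrogate measure, shown only to illustrate the decomposition.
library(survival)

set.seed(1)
n   <- 500
trt <- rbinom(n, 1, 0.5)
s   <- rnorm(n, mean = -0.8 * trt)        # surrogate, shifted by treatment
t   <- rexp(n, rate = exp(0.5 * s))       # event time depends only on the surrogate
cens  <- rexp(n, rate = 0.2)
time  <- pmin(t, cens)
event <- as.numeric(t <= cens)

fit_total <- coxph(Surv(time, event) ~ trt)       # total treatment effect
fit_adj   <- coxph(Surv(time, event) ~ trt + s)   # effect adjusted for the surrogate

1 - coef(fit_adj)["trt"] / coef(fit_total)["trt"] # PTE; near 1 here by construction
```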

12/2/2022

Title: Two Recent Statistical Regulatory Problems in Oncology: Treatment Switching and Dose Optimization.

Speaker: Erik Bloomquist, FDA 

Abstract: Drug development in solid tumor oncology has been highly successful in the past decade, with several new classes of drugs shown to extend life becoming available to patients. As part of this rapidly changing field, regulatory statisticians have encountered several issues that continue to be worked on and discussed. In this talk, I will introduce two such problems, present some research work, and attempt to encourage more research by the academic community in this area. The first problem focuses on treatment switching and how it can occur not only at disease progression but also as clinical trials read out. The second problem focuses on early-stage dose optimization efforts and sample size considerations for such studies.



10/14/2022

Title: Single-arm phase II trials of combination therapies: A review of the CTEP experience 2008–2017

Speaker: Jared Foster, Biostatistics Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute (in person)


Abstract:

Designing and interpreting single-arm phase II trials of combinations of agents is challenging because it can be difficult, based on historical data, to identify levels of activity for which the combination would be worth pursuing. We identified Cancer Therapy Evaluation Program single-arm combination trials that were activated in 2008–2017 and tabulated their design characteristics and results. Positive trials were evaluated as to whether they provided credible evidence that the combination was better than its constituents. A total of 125 trials were identified, and 120 trials had results available. Twelve had designs where eligible patients were required to be resistant or refractory to all but one element of the combination. Only 17.8% of the 45 positive trials were deemed to provide credible evidence that the combination was better than its constituents. Of the 10 positive trials with observed rates 10 percentage points higher than their upper (alternative hypothesis) targets, only five were deemed to provide such credible evidence. Many trials were definitively negative, with observed clinical activity at or below their lower (null hypothesis) targets. Ideally, use of single-arm combination trials should be restricted to settings where each agent is known to have minimal monotherapy activity (and a randomized trial is infeasible). In these settings, an observed signal is attributable to synergy and thus could be used to decide whether the combination is worth pursuing. In other settings, credible evidence can still be obtained if the observed activity is much higher than expected, but experience suggests that this is a rare occurrence.
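
For readers unfamiliar with single-arm design calibration, the sketch below computes the rejection boundary and power of a single-stage, single-arm binomial design given null (p0) and alternative (p1) response-rate targets; the numbers are illustrative, not taken from the review.

```r
## Rejection boundary and power of a single-stage single-arm design:
## reject H0 (true rate <= p0) when the number of responses is >= r_star.
single_arm <- function(n, p0, p1, alpha = 0.10) {
  r <- 0:n
  p_reject_null <- 1 - pbinom(r - 1, n, p0)   # P(X >= r | p0)
  r_star <- r[min(which(p_reject_null <= alpha))]
  c(boundary     = r_star,
    actual_alpha = p_reject_null[r_star + 1],
    power        = 1 - pbinom(r_star - 1, n, p1))
}
single_arm(n = 40, p0 = 0.20, p1 = 0.40)   # illustrative targets
```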


9/30/2022

Title: Approximate maximum likelihood estimation of the mixture cure model from aggregated data

Speaker: John Rice, University of Michigan School of Public Health, Ann Arbor (virtual)


Abstract:

Research into vaccine hesitancy is a critical component of the public health enterprise, as rates of communicable diseases that are preventable by routine childhood immunization have been increasing in recent years. It is therefore important to estimate proportions of “never-vaccinators” in various subgroups of the population to successfully target interventions to improve childhood vaccination rates. However, due to privacy issues, it is sometimes difficult to obtain individual patient data (IPD) to perform the appropriate time-to-event analyses: state-level immunization information systems may only be willing to share aggregated data with researchers. While some existing regression methods do not require IPD, they are unable to account for either differential follow-up or a cured fraction. In this work, I propose statistical methodology for the analysis of aggregated survival data that can accommodate a cured fraction, based on an approximation to the mixture cure model log-likelihood function relying only on summary statistics. The proposed methods are demonstrated using both simulated and real data examples.
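
For orientation, the sketch below writes out the standard mixture cure model log-likelihood for individual-level data, assuming an exponential latency distribution; the talk's contribution, approximating this likelihood from aggregated summary statistics alone, is not reproduced here.

```r
## Mixture cure model with exponential latency:
## S(t) = pi + (1 - pi) * exp(-lambda * t), pi = cure fraction.
negloglik <- function(par, time, event) {
  pi_c   <- plogis(par[1])                # cure fraction on logit scale
  lambda <- exp(par[2])                   # latency rate on log scale
  su     <- exp(-lambda * time)
  ll <- ifelse(event == 1,
               log(1 - pi_c) + log(lambda) - lambda * time, # uncured, event seen
               log(pi_c + (1 - pi_c) * su))                 # censored: maybe cured
  -sum(ll)
}

set.seed(2)
n     <- 400
cured <- rbinom(n, 1, 0.3)
t_lat <- rexp(n, 0.5)
cens  <- runif(n, 0, 8)
time  <- pmin(ifelse(cured == 1, Inf, t_lat), cens)
event <- as.numeric(cured == 0 & t_lat <= cens)

fit <- optim(c(0, 0), negloglik, time = time, event = event)
c(cure_fraction = plogis(fit$par[1]), rate = exp(fit$par[2]))
```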



9/16/2022

Title: Using natural history and statistics to inform primary and secondary prevention strategies for cervical cancer

Speaker: Dr. Li C. Cheung, Stadtman Investigator, Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute.

Abstract: I will cover two topics: (1) the development of risk-based management guidelines and (2) a proposed single-arm trial design for estimating efficacy of the HPV vaccine.

To address the appropriate management of individuals under an ever-changing screening landscape (i.e., the introduction of new screening technologies, electronic health records providing greater access to patient histories, and HPV vaccination), representatives from 19 professional organizations agreed to change from issuing recommendations based on test results to recommendations based on precancer risk and a pre-agreed set of clinical action risk thresholds. Using electronic health records from nearly 2 million women undergoing routine screening from 2003 to 2017, we estimated precancer risk for combinations of screening test results and relevant past histories. Because precancers can be prevalent at the initial screen and precancer status is intermittently observed, resulting in left-, interval-, and right-censored times of precancer onset, we fit the data using prevalence-incidence mixture models (i.e., jointly estimated logistic regression and proportional hazards models). To inform the consensus risk thresholds, we provided the working groups with estimates of the trade-off between delayed diagnosis and colposcopic efficiency. The new risk-based management recommendations were then externally validated using data from two trials, the New Mexico HPV-precancer registry, and a CDC program that provided screening for underinsured and uninsured individuals.

The WHO recently concluded that a single dose of the HPV vaccine offers protection comparable to 2-3 doses; however, further studies confirming efficacy are needed before many countries are willing to adopt a single-dose regimen. Randomized controlled trials (RCTs) are expensive, and enrolling an unvaccinated control group may be unethical. To address these issues, I propose a single-arm trial design in which untargeted and unaffected HPV genotypes in vaccinated individuals are used as a natural control. This new strategy leverages recent findings on the natural history of HPV infections: (1) type-specific incidence is proportional to type-specific prevalence, (2) newly acquired HPV infections have similar “pure” risks of clearance, and (3) sojourn times from HPV acquisition to the development of precancerous lesions are long. In addition to not requiring an unvaccinated control arm, the new single-arm trial design can reduce the total sample size by 20-60% (depending on HPV prevalence and incidence) compared to an RCT.


5/27/2022

Title: Recent Oncology Trial Designs and Concepts: St. Jude Clinical Trials as Examples

Speaker: Haitao Pan, St. Jude Children's Research Hospital


Abstract: In this presentation, I will use pediatric oncology trials conducted at St. Jude to introduce some recent developments in the design of oncology clinical trials. Specifically, for phase I dose-finding trials, I will briefly introduce the Bayesian optimal interval (BOIN) design, along with various other methods and challenges. For phase IIa, two newly proposed methods (a nonrandomized Bayesian single-arm design for pediatric rhabdomyosarcoma patients and a randomized screened selection design for patients with Ewing sarcoma) will be briefly introduced. For phase IIb or phase III, an optimal multi-arm platform design developed for a pediatric osteosarcoma trial will be introduced. The goal of this presentation is to showcase, by example, the life of a biostatistician at the St. Jude Comprehensive Cancer Center.
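
As a pointer for the BOIN portion, the sketch below computes the standard BOIN escalation/de-escalation boundaries of Liu & Yuan (2015) with their default interval bounds (phi1 = 0.6*phi, phi2 = 1.4*phi); this reflects the published design, not any St. Jude trial specifics.

```r
## BOIN interval boundaries (Liu & Yuan, 2015): escalate if the observed
## DLT rate at the current dose is below lambda_e, de-escalate if above
## lambda_d, stay otherwise.
boin_boundaries <- function(phi, phi1 = 0.6 * phi, phi2 = 1.4 * phi) {
  lambda_e <- log((1 - phi1) / (1 - phi)) /
              log(phi * (1 - phi1) / (phi1 * (1 - phi)))
  lambda_d <- log((1 - phi) / (1 - phi2)) /
              log(phi2 * (1 - phi) / (phi * (1 - phi2)))
  c(escalate_below = lambda_e, deescalate_above = lambda_d)
}
boin_boundaries(phi = 0.30)   # approx. 0.236 and 0.358
```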


5/20/2022

Title: Joint Modeling of a Time-to-Event and Partially Observed Biomarker and Estimating the Number of Men Living with Metastatic Prostate Cancer

Speaker: Theresa Devasia, National Cancer Institute, NIH

Abstract: In clinical studies, the outcome of interest is often a time-to-event. In addition, longitudinal biomarker data representing a partially observed latent biological process may be collected. To capture the dependence between failure and the biomarker process, a joint model is needed. In this joint model, the cumulative hazard of failure can be modeled as a Lévy process with a continuous time transformation representing the accumulated risk of failure. In this project, we derive the survival and hazard functions of failure under the assumption of a Gamma process cumulative hazard and different observation mechanisms for the biomarker. We focus on the case of marked survival, wherein the marker and event are observed simultaneously. We apply our method to SEER prostate cancer incidence data, where men diagnosed with prostate cancer have prostate-specific antigen (PSA) collected at diagnosis.

Metastatic prostate cancer (MPC) includes metastases detected at diagnosis (de novo) and those occurring later (recurrent). While the prevalence of prostate cancer (PC) is available in SEER, the prevalence of MPC has not previously been assessed. In this project, we used cancer registry data to estimate the number of men living with MPC in the US. We applied a back-calculation method to estimate MPC incidence and prevalence from US PC mortality and MPC survival. The method is based on an illness-death process and assumes that each observed PC death is preceded by a de novo or recurrent MPC diagnosis. We extracted PC mortality and de novo MPC relative survival from the SEER-18 registries between 2000 and 2017. We assumed equal survival for de novo and recurrent MPC. We estimate that on January 1, 2018, there were 120,368 men living with MPC in the US, with 55% diagnosed with early-stage disease who progressed to MPC.
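
The sketch below is a toy version of the back-calculation step on a yearly grid, with entirely made-up numbers: deaths are a convolution of past MPC incidence with the time-to-death distribution after MPC diagnosis, so incidence can be recovered by solving the resulting lower-triangular linear system, and prevalence follows by convolving the recovered incidence with survival.

```r
## Toy back-calculation on a yearly grid (all numbers made up).
T_yrs <- 10
f <- diff(pexp(0:T_yrs, rate = 0.4))   # P(death in year k after MPC diagnosis)
S <- 1 - cumsum(f)                     # P(alive more than k years after diagnosis)

I_true <- round(seq(800, 1200, length.out = T_yrs))   # hypothetical MPC incidence
A <- matrix(0, T_yrs, T_yrs)
for (t in 1:T_yrs) for (s in 1:t) A[t, s] <- f[t - s + 1]
D <- as.vector(A %*% I_true)           # implied PC deaths per year

I_back <- solve(A, D)                  # recover incidence from observed deaths
prev   <- sum(I_back * S[T_yrs:1])     # diagnosed earlier and still alive at year T
round(c(incidence_yr1 = I_back[1], prevalence = prev))
```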


Special Lectures

Lecture title: Competing Risks Data: Models, Methods & Data Applications

4/28/2022  Lecture 1: Background and introduction - Mei-Cheng Wang

5/5/2022    Lecture 2: Joint inference of multiple competing events - Jiyang Wen

5/12/2022  Lecture 3: Clinical trials with competing risks data - Chen Hu



4/29/2022

Title: Cluster Randomized Trials with Rare Events: Improved Study Design and the Application of Negative Binomial Regression

Speaker: Philip Westgate, Department of Biostatistics, University of Kentucky

Abstract: Cluster randomized trials (CRTs) randomize entire clusters of subjects to different trial arms. For example, the HEALing (Helping to End Addiction Long-term) Communities Study (HCS) is a multi-site (Kentucky, Massachusetts, New York, and Ohio), parallel-group study in which 67 communities are randomized to either an intervention or a wait-list control arm. The goal of the intervention is to reduce opioid-related overdose fatalities, which are expected to be rare events. As the intervention, and hence randomization, is at the community level, this is a unique cluster trial with rare events, large cluster sizes, and multiple constraints that had to be considered. Motivated by this study, we first contrast traditional stepped wedge and parallel-group designs and demonstrate how to potentially increase statistical power by modifying the traditional incomplete stepped wedge design. Second, we discuss traditional marginal modeling approaches in the CRT literature and demonstrate that negative binomial regression is an alternative modeling approach that may have utility over traditional approaches. Specific modeling examples, as well as analyses of data from communities participating in the HCS, are used to demonstrate concepts.
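
A minimal sketch of the negative binomial alternative, with simulated community-level data and hypothetical variable names: a rare-event count model with a person-time offset, yielding a rate ratio for the intervention.

```r
## Negative binomial regression for rare community-level counts.
library(MASS)

set.seed(3)
k  <- 67                                        # communities
df <- data.frame(arm = rep(0:1, length.out = k),
                 py  = runif(k, 5e4, 2e5))      # person-years of follow-up
mu <- with(df, py * 3e-4 * exp(-0.25 * arm))    # ~22% lower rate under intervention
df$deaths <- rnbinom(k, size = 5, mu = mu)      # overdispersed overdose counts

fit <- glm.nb(deaths ~ arm + offset(log(py)), data = df)
exp(coef(fit)["arm"])                           # estimated rate ratio
```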



4/22/2022

Title: Obtaining Survival Curves to Quantify the Causal Effect of a Time-Dependent Treatment Using Matching Methods 

Speaker: Yun Li, Department of Biostatistics,  University of Pennsylvania

Abstract: In observational studies of survival time featuring a time-dependent treatment, the hazard ratio (an instantaneous measure) is often used to represent the treatment effect. However, investigators are often more interested in the difference in survival functions. We propose methods to estimate the causal effect of treatment among the treated with respect to survival probability. The objective is to compare post-treatment survival with the survival function that would have been observed in the absence of treatment. The proposed methods are applied to a national organ transplant registry.



4/8/2022

Title: An Efficient Data Integration Scheme to Synthesize Information from Multiple Secondary Outcomes into the Main Data Analysis

Speaker: Chixiang Chen, University of Maryland School of Medicine


Abstract: Depending on the clinical design and research interest, many observational studies and clinical trials collect various secondary outcomes that are highly associated with the primary endpoint. In practice, these secondary outcomes are not directly involved in the primary model but are often treated separately in secondary models in post-hoc analyses. This talk will focus on the potential of synthesizing secondary outcomes to improve estimation precision in the main analysis. We propose an efficient and robust scheme for Multiple information Borrowing from secondary outcomes, named MinBo, which is robust to misspecification of any secondary model and can substantially improve estimation efficiency in the main model. Both theory and case studies demonstrate the efficiency gains of MinBo over existing methods. The utility of MinBo is further validated using the Atherosclerosis Risk in Communities (ARIC) study, where important risk factors for the development of hypertension are successfully detected.




2/25/2022 Student project presentations

Xiaobin Zhou: "Methods for Illness-Death Processes and Semicompeting Risks Data"

Chunnan Liu:  “Meta-analysis models for survival data”

Wanlu Chen: "A nonparametric-correction approach on Cox regression with replicated mismeasured covariates"



2/18/2022

Title: Student project presentations

Elias Sotirchos: "Risk of clinical relapse and infection during rituximab treatment in neuromyelitis optica and MOG antibody disease"

Scott Mu: "Self-Rated Health Predicts Recurrent Hospitalizations and Death: a 30 Year Cohort Study"

Wentao Zhan: "A nonparametric estimation of the illness probability in cross-sectional sampling"



2/17/2022

Title: Student project presentations

Jennifer Xu: "Causal inference for survival data in large-scale observational studies"



2/10/2022

Title: Student project presentations

Lily Koffman: “Models and Methods for Recurrent Events with Dependent Censoring”

Jiafang Song: “Bias in cross-sectional sampling data in observational studies for binary outcome”

Kathleen Ridgeway: “Factors associated with HIV care loss to follow-up among young people in Nigeria: Survival analysis of multiple-event-per-subject data”



2/4/2022

Joint SLAM/WIT working group meeting

Title: Estimating sleep and routine from sparse smartphone sensor data

Speaker: Ian J. Barnett, Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania

Abstract: Data collected from wearable devices and smartphones can shed light on an individual's patterns of behavior and circadian routine. However, when a device or smartphone is not carried by the individual, or when data are missing, using the device's data as a proxy for the individual's behavior can be flawed. In particular, sleep estimation based on smartphone use data is generally biased toward sleep duration estimates that are too large, typically because a person's smartphone sits idle, not constantly in use, while the person is awake. We propose a sleep estimation procedure that accounts for non-use and missing data to adjust for this bias. We use recurrent-event proportional hazards regression to model transitions between latent idle/active use states, and we use these latent states to estimate nightly time to bed and time to wake, with semi-supervised regularization to adjust for biases in sleep duration estimates. We demonstrate the accuracy of our approach by comparison with self-reported sleep in a cohort of adolescents and young adults with affective disorders.



12/10/2021

Title: Bivariate hierarchical Bayesian model for combining summary measures and their uncertainties from multiple sources

Speaker: Qixuan Chen, Department of Biostatistics, Columbia University

Abstract: It is often of interest to combine available estimates of a similar quantity from multiple data sources. When the corresponding variances of the estimates are also available, a model should take into account the uncertainty of the estimates themselves as well as the uncertainty in the estimation of variances. In addition, if there exists a strong association between estimates and their variances, the correlation between these two quantities should also be considered. In this paper, we propose a bivariate hierarchical Bayesian model that jointly models the estimates and their estimated variances, assuming a correlation between these two measures. We conduct simulations to explore the performance of the proposed bivariate Bayesian model and compare it to other commonly used methods under different correlation scenarios. The proposed bivariate Bayesian model has a wide range of applications. We illustrate its application in three very different areas: PET brain imaging studies, meta-analysis, and small area estimation. This is joint work with Yujing Yao, Todd Ogden, and Chubing Zeng.
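
For orientation, the sketch below implements the standard univariate DerSimonian-Laird random-effects combination, the baseline that the proposed bivariate model generalizes; it treats the reported variances as known and ignores the estimate-variance correlation that the bivariate Bayesian model accounts for.

```r
## DerSimonian-Laird random-effects combination of estimates y with
## reported variances v (variances treated as known).
dl_combine <- function(y, v) {
  w    <- 1 / v
  ybar <- sum(w * y) / sum(w)                     # fixed-effect estimate
  Q    <- sum(w * (y - ybar)^2)                   # heterogeneity statistic
  tau2 <- max(0, (Q - (length(y) - 1)) /
                 (sum(w) - sum(w^2) / sum(w)))    # between-source variance
  ws   <- 1 / (v + tau2)
  c(estimate = sum(ws * y) / sum(ws), se = sqrt(1 / sum(ws)), tau2 = tau2)
}
dl_combine(y = c(0.9, 1.4, 0.7, 1.1), v = c(0.04, 0.09, 0.02, 0.05))
```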


Bio: Dr. Qixuan Chen is Associate Professor and Director of the MS program in the Theory & Methods Track in the Department of Biostatistics at Columbia University. Her research interests include survey sampling, missing data, measurement error, and Bayesian analysis.



11/18/2021

Title: SLAM Group Presentation

Presenters:

Ravi Varadhan, Johns Hopkins University: How to speed up iterative algorithms?

Yuxin (Daisy) Zhu, Johns Hopkins University: Optimal Rule for Prefixed Tree Classifiers

Mei-Cheng Wang, Johns Hopkins University: SLAM working group activities



11/5/2021

Title: Penalized quantile regression

Speaker: Ben Sherwood, School of Business, University of Kansas

Abstract: Quantile regression directly models a conditional quantile. Penalized quantile regression constrains the regression coefficients, similar to penalized mean regression. Quantile regression with a lasso penalty can be framed as a linear programming problem; with a group lasso penalty, it becomes a second-order cone programming problem. These approaches become computationally burdensome for large values of n or p. Using a Huber approximation to the quantile loss function allows the use of computationally efficient algorithms that require a differentiable loss function, and these algorithms can be implemented for both penalties. They can then serve as the backbone for implementing penalized quantile regression with other penalties, such as the adaptive lasso, SCAD, MCP, and group versions of these penalties.
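
A minimal sketch of the computational idea (illustrative, not the speaker's implementation): lasso-penalized quantile regression with a Huber-type smoothing of the check loss, fit by proximal gradient descent with soft-thresholding.

```r
## Lasso-penalized quantile regression via a Huberized check loss and
## proximal gradient descent (soft-thresholding), on simulated data.
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

fit_hq_lasso <- function(X, y, tau = 0.5, lambda = 0.1, gamma = 0.5,
                         iters = 2000) {
  X1   <- cbind(1, X)                     # intercept column, left unpenalized
  n    <- nrow(X1)
  beta <- numeric(ncol(X1))
  step <- 2 * gamma * n / max(eigen(crossprod(X1))$values)
  for (i in seq_len(iters)) {
    r    <- y - X1 %*% beta
    ## gradient of the smoothed check loss (Huber in place of |r|)
    g    <- -(ifelse(abs(r) <= gamma, r / gamma, sign(r)) + 2 * tau - 1) / 2
    beta <- beta - step * crossprod(X1, g) / n
    beta[-1] <- soft(beta[-1], step * lambda)
  }
  drop(beta)
}

set.seed(4)
X <- matrix(rnorm(200 * 10), 200, 10)
y <- 1 + X[, 1] - 2 * X[, 3] + rt(200, df = 3)
round(fit_hq_lasso(X, y, tau = 0.5, lambda = 0.05), 2)
```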



10/22/2021

Title: Modeling daily and weekly moderate and vigorous physical activity using zero-inflated mixture Poisson distribution

Speaker: Xiaonan Xue, Department of Epidemiology & Population Health (Biostatistics), Albert Einstein College of Medicine

Abstract: Recently developed accelerometer devices have been used in large epidemiological studies for continuous and objective monitoring of physical activity. Typically, physical movements are summarized as minutes of light, moderate, and vigorous physical activity on each wearing day. Because of the preponderance of zeros, zero-inflated distributions have been used for modeling daily moderate-or-higher levels of physical activity. Yet these models do not fully account for variations in daily physical activity and cannot be extended to model weekly physical activity explicitly, even though weekly physical activity is considered an indicator of a subject's average activity level. To overcome these limitations, we propose a zero-inflated Poisson mixture distribution that can model daily and weekly physical activity within the same family of mixture distributions. Under this method, the likelihood of an inactive day and the amount of exercise on an active day are simultaneously modeled by a joint random effects model to incorporate heterogeneity across participants. If needed, the method has the flexibility to include an additional random effect to address extra variation in daily physical activity. Maximum likelihood estimates can be obtained through Gaussian quadrature, implemented conveniently in the R package GLMMadaptive. The method's performance is examined in simulation studies. The method is applied to data from the Hispanic Community Health Study/Study of Latinos to examine the relationship between physical activity and BMI groups and, within participants, differences in physical activity between weekends and weekdays.
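
A minimal sketch of the daily-level model using the GLMMadaptive package named in the abstract, on simulated data with hypothetical variable names (minutes, weekend, id); the weekly extension and the extra daily random effect are omitted.

```r
## Zero-inflated Poisson mixed model for daily MVPA minutes.
library(GLMMadaptive)

set.seed(5)
n_id <- 100; days <- 7
dat  <- data.frame(id      = rep(1:n_id, each = days),
                   weekend = rep(c(0, 0, 0, 0, 0, 1, 1), n_id))
b <- rnorm(n_id, 0, 0.5)[dat$id]                       # subject heterogeneity
active <- rbinom(nrow(dat), 1, plogis(1 - 0.5 * dat$weekend + b))
dat$minutes <- active * rpois(nrow(dat), exp(3 + 0.2 * dat$weekend + b))

fit <- mixed_model(fixed  = minutes ~ weekend,   # amount on active days
                   random = ~ 1 | id,
                   data   = dat,
                   family = zi.poisson(),
                   zi_fixed  = ~ weekend,        # probability of a zero day
                   zi_random = ~ 1 | id)
summary(fit)
```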



10/1/2021

Title: Weibull racing survival analysis for competing events with time-varying covariates

Speaker: Quan Zhang, Department of Accounting and Information Systems, Michigan State University

Abstract: We propose Bayesian nonparametric Weibull delegate racing (WDR) to explicitly model survival under competing events and to interpret how the covariates accelerate or decelerate the event times. WDR relaxes the ubiquitous proportional-hazards assumption and explains non-monotonic covariate effects by racing a potentially infinite number of sub-events, without data transformation or sacrificing interpretability. Furthermore, it is able to handle left truncation, different types of censoring, time-varying covariates, and missing event times or types. For inference, we develop a Gibbs-sampler-based MCMC algorithm along with maximum a posteriori estimation for big-data applications. We use synthetic data analysis to demonstrate the flexibility and parsimonious nonlinearity of WDR, and real data sets to showcase its interpretability.
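
A toy sketch of the racing construction with two causes (WDR races a potentially infinite number of sub-events and is fit by MCMC; none of that is reproduced here): each cause has its own Weibull clock, covariates accelerate or decelerate each clock separately, and the earliest clock determines the observed time and event type.

```r
## Two Weibull clocks race; the earliest determines the event time and type.
set.seed(6)
n  <- 1000
x  <- rnorm(n)
t1 <- rweibull(n, shape = 1.5, scale = exp(1.0 - 0.5 * x))  # cause-1 clock
t2 <- rweibull(n, shape = 0.8, scale = exp(1.2 + 0.4 * x))  # cause-2 clock
time  <- pmin(t1, t2)
cause <- ifelse(t1 <= t2, 1, 2)
table(cause)   # large x favors cause 1 (its clock runs faster)
```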



9/17/2021

Title: Transportability of risk prediction models

Speaker: Jon Steingrimsson, Department of Biostatistics, Brown University

Abstract: Prediction models are often used and/or interpreted in the context of a target population that differs from the study population used to develop the model (e.g., a different health-care system or a different geographic region). In this talk, we discuss how to tailor and evaluate a prediction model for a target population when outcome and covariate information is available from the study data but only covariate information is available on a sample from the target population. In the second part of the talk, we focus on estimating the AUC in the target population. We provide conditions under which measures of model performance in the target population are identifiable. We develop and discuss theoretical properties of three estimation procedures: inverse-odds weighting, outcome modeling, and doubly robust estimators. Finite sample performance is evaluated using simulations and using data from a lung cancer screening trial.
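
A minimal sketch of the inverse-odds weighting idea on simulated data with hypothetical names: reweight study-sample cases and controls by P(target | X) / P(study | X) so that a weighted AUC reflects the covariate distribution of the target population.

```r
## Inverse-odds-weighted AUC of a risk score in a shifted target population.
set.seed(7)
n_s <- 500; n_t <- 500
x_s <- rnorm(n_s, 0); x_t <- rnorm(n_t, 0.7)   # covariate shift in the target
y_s <- rbinom(n_s, 1, plogis(-1 + x_s))        # outcomes observed in study only
score <- plogis(-1 + x_s)                      # risk score being evaluated

sel <- glm(S ~ x, family = binomial,
           data = data.frame(S = rep(1:0, c(n_s, n_t)), x = c(x_s, x_t)))
p_s <- predict(sel, newdata = data.frame(x = x_s), type = "response")
w   <- (1 - p_s) / p_s                         # inverse-odds weights

cases <- y_s == 1; ctrls <- y_s == 0
ww  <- outer(w[cases], w[ctrls])
auc <- sum(ww * outer(score[cases], score[ctrls], ">")) / sum(ww)
auc   # AUC reweighted toward the target covariate distribution
```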



5/21/2021 

Speaker:  Bo Huang, Ph.D., Senior Director, Pfizer Global Product Development

Title: Statistical Considerations for Non-Proportional Hazards: Methods and Applications


Abstract:

In statistical analysis of time-to-event data, the hazard ratio (HR) is the standard summary measure used to quantify the treatment benefit of an experimental drug versus a comparator. The hazard ratio is a function of time and is not constant unless the hazards are proportional. For hypothesis testing, the log-rank test is the gold-standard method in survival analysis. Although it is non-parametric, it is optimal under the proportional hazards assumption because of its link to the Cox proportional hazards model. When the proportional hazards assumption does not hold, the HR is difficult to interpret and the log-rank test is less than ideal. In this presentation, I will go over three classes of methods for dealing with non-proportional hazards: weighted or combination rank-based tests, restricted mean survival time, and generalized pairwise comparisons. Recent advances in methodology development and practical applications of these methods will be shared.
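
A minimal sketch of the restricted mean survival time comparison using the survRM2 package, on simulated two-arm data; tau and all numbers are illustrative.

```r
## RMST difference up to tau = 10 on simulated two-arm data.
library(survRM2)

set.seed(8)
n    <- 300
arm  <- rep(0:1, each = n / 2)
t0   <- rexp(n, rate = ifelse(arm == 1, 0.10, 0.15))
cens <- runif(n, 0, 12)
res  <- rmst2(time   = pmin(t0, cens),
              status = as.numeric(t0 <= cens),
              arm    = arm, tau = 10)
res$unadjusted.result   # RMST difference/ratio with confidence intervals
```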



5/12/2021 Joint BLAST-SLAM Group Seminar

Speaker: Alan Riva Palacio Cohen, Ph.D., The Institute for Research in Applied Mathematics and Systems (IIMAS).

Title: Generalized Additive Neutral to the Right Regression for Survival Analysis.


Abstract:

We present a novel Bayesian nonparametric model for regression in survival analysis. The model builds on the neutral to the right model of Doksum (1974) and on the Cox proportional hazards model of Kim and Lee (2003). The use of a vector of dependent Bayesian nonparametric priors allows us to efficiently model the hazard as a function of covariates whilst allowing non-proportionality. Properties of the model and inference schemes will be discussed. The method will be illustrated using simulated and real data. (Joint work with Jim Griffin, University College London, U.K., and Fabrizio Leisen, University of Nottingham, U.K.)


5/7/2021

Title: Joint modelling of multivariate markers measured repeatedly over time and clinical endpoints

Speaker: Cécile Proust-Lima, Ph.D., Director of Research, Bordeaux Population Health Research Center

Abstract: Joint models for longitudinal and survival data are now widely used in biostatistics to address a variety of etiological and predictive questions. Originally designed to simultaneously analyze the trajectory of a single marker measured repeatedly over time, and the risk of a single right-censored time-to-event, joint models now need to capture increasingly complex longitudinal information to adapt to in-depth medical research questions. After an introduction of the general joint modelling methodology, I discuss two extensions proposed to handle multivariate repeated marker information: one using a latent class approach and one using a latent degradation process approach. The models are applied in the context of progression of Multi-System Atrophy, a rare neurodegenerative disease.
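
As a concrete entry point, the sketch below fits the basic single-marker shared-random-effects joint model with the R package JM and its built-in aids data; the multivariate latent-class and latent-degradation extensions discussed in the talk go well beyond this.

```r
## Basic shared-random-effects joint model: longitudinal CD4 + survival.
library(JM)   # also loads nlme and survival

fitLME <- lme(CD4 ~ obstime, random = ~ obstime | patient, data = aids)
fitCOX <- coxph(Surv(Time, death) ~ drug, data = aids.id, x = TRUE)
fitJM  <- jointModel(fitLME, fitCOX, timeVar = "obstime")
summary(fitJM)
```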


4/2/2021 

Title: Causal Discovery for Longitudinal Data with Functional Bayesian Networks

Speaker: Yang Ni, Assistant Professor of Statistics, Texas A&M University

Abstract: Establishing causality is the ultimate goal in practically any science. Knowledge about causality is essential for predicting a system's behavior under external intervention. While controlled experimentation remains the gold standard for establishing causality, it may not be feasible in many applications, especially in research on humans. In this talk, we present a novel functional Bayesian network approach for generating causal hypotheses from longitudinal data. Our method has two main ingredients: (1) a functional PCA to represent longitudinal data using a set of functional bases, and (2) a Bayesian network built on the functional basis coefficients to represent causal relationships. These two ingredients are combined in a Bayesian hierarchical model. We will show some preliminary simulations as a proof of concept and welcome any suggestions.



3/19/2021

Title: A Wavelet-Based Independence Test for Functional Data with an Application to MEG Functional Connectivity

Speaker: Xiaoke Zhang, Assistant Professor, Department of Statistics, George Washington University

Abstract: Measuring and testing the dependency between multiple random functions is often an important task in functional data analysis. In the literature, model-based methods rely on a model that is subject to the risk of misspecification, while model-free methods only provide a correlation measure, which is inadequate for testing independence. In this paper, we adopt the Hilbert-Schmidt Independence Criterion (HSIC) to measure the dependency between two random functions. We develop a two-step procedure: we first pre-smooth each function based on its discrete and noisy measurements and then apply the HSIC to the recovered functions. To ensure compatibility between the two steps, so that the effect of the pre-smoothing error on the subsequent HSIC is asymptotically negligible when the data are densely measured, we propose a new wavelet thresholding method for pre-smoothing and the use of Besov-norm-induced kernels for the HSIC. We also provide the corresponding asymptotic analysis. The superior numerical performance of the proposed method over existing ones is demonstrated in a simulation study. Moreover, in a magnetoencephalography (MEG) data application, the functional connectivity patterns identified by the proposed method are more anatomically interpretable than those identified by existing methods.
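
A minimal sketch of the empirical HSIC with Gaussian kernels on vector-valued data; the paper's wavelet pre-smoothing and Besov-norm-induced kernels for functional data are not reproduced here.

```r
## Empirical HSIC with Gaussian kernels; larger values indicate dependence.
hsic <- function(x, y, sigma = 1) {
  n <- nrow(x)
  K <- exp(-as.matrix(dist(x))^2 / (2 * sigma^2))
  L <- exp(-as.matrix(dist(y))^2 / (2 * sigma^2))
  H <- diag(n) - matrix(1 / n, n, n)      # centering matrix
  sum(diag(K %*% H %*% L %*% H)) / (n - 1)^2
}

set.seed(9)
x <- matrix(rnorm(100))
c(dependent   = hsic(x, x^2 + 0.1 * matrix(rnorm(100))),
  independent = hsic(x, matrix(rnorm(100))))
```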



3/5/2021

Title: A New Robust and Powerful Weighted Logrank Test

Speaker: Zhiguo Li, Associate Professor, Department of Biostatistics and Bioinformatics, Duke University


Abstract:

In weighted logrank tests, such as the Fleming-Harrington and Tarone-Ware tests, certain weight functions are used to put more weight on early, middle, or late events, with the purpose of maximizing the power of the test. The optimal weight under an alternative depends on the true hazard functions of the groups being compared and thus cannot be applied directly. We propose replacing the true hazard functions with their estimates and then using the estimated weights in a weighted logrank test. However, the resulting test does not control type I error correctly because the weights converge to 0 under the null in large samples. We therefore adjust the estimated optimal weights for correct type I error control; the resulting test still achieves improved power compared to existing weighted logrank tests and is shown to be robust in various scenarios. Extensive simulation is carried out to assess the proposed method, and it is applied to several clinical studies in lung cancer.
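
For reference, the sketch below computes a classical Fleming-Harrington G(rho, gamma) weighted logrank statistic with fixed weights; the talk's estimated-then-adjusted optimal weights are not implemented here.

```r
## Fleming-Harrington G(rho, gamma) weighted logrank test (two groups,
## group coded 0/1); weights are S(t-)^rho * (1 - S(t-))^gamma with S the
## pooled Kaplan-Meier estimate.
fh_logrank <- function(time, event, group, rho = 0, gamma = 1) {
  tt <- sort(unique(time[event == 1]))
  km <- summary(survival::survfit(survival::Surv(time, event) ~ 1),
                times = tt, extend = TRUE)$surv
  S  <- c(1, head(km, -1))                 # left-continuous pooled KM
  U  <- V <- 0
  for (k in seq_along(tt)) {
    at <- time >= tt[k]                    # risk set
    d  <- sum(event == 1 & time == tt[k])
    d1 <- sum(event == 1 & time == tt[k] & group == 1)
    Y  <- sum(at); Y1 <- sum(at & group == 1)
    w  <- S[k]^rho * (1 - S[k])^gamma
    U  <- U + w * (d1 - Y1 * d / Y)
    if (Y > 1) V <- V + w^2 * d * (Y1 / Y) * (1 - Y1 / Y) * (Y - d) / (Y - 1)
  }
  z <- U / sqrt(V)
  c(z = z, p = 2 * pnorm(-abs(z)))
}

set.seed(10)                               # delayed-effect-like example
g  <- rep(0:1, each = 100)
t0 <- c(rexp(100, 0.20),
        ifelse(runif(100) < 0.5, rexp(100, 0.20), rexp(100, 0.08)))
fh_logrank(pmin(t0, 10), as.numeric(t0 <= 10), g, rho = 0, gamma = 1)
```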



2/26/2021

Title: Bias and Randomization

Speaker: Jay Herson, Ph.D., Senior Associate, Biostatistics, Johns Hopkins Bloomberg School of Public Health

Abstract: In 1980 I published the paper “Patient Registration in a Cooperative Oncology Group” in the first volume of Controlled Clinical Trials. The fortieth anniversary affords an opportunity to reflect on how much has changed and how much remains the same. The history of randomization from 1980 to 2020 reflects changes in what the research and regulatory communities consider persuasive evidence of treatment efficacy and safety. This talk will encourage the audience to think about what they consider persuasive evidence. We begin our journey with the primitive oncology trials of 1980 before arriving at the highly regulated and complex trial design and analysis paradigm of today. The role of randomization is considered among the challenges of defining estimands, historical controls, real-world data, Bayesian methods, pragmatic trials, and COVID-19. The talk will conclude by imagining clinical trials in the year 2060, where machine learning using aggregate data may change our definition of persuasive evidence. We raise the question of whether different communities (industry, academia, regulators, clinicians, patients) can co-exist, each with its own definition of persuasive evidence, and how randomization will fit into these definitions.



2/19/2021

Title: Error-prone failure time outcomes in electronic health records data: Methods for analysis and study design

Speaker: Pamela Shaw, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine


Abstract: Electronic health records (EHR) data present many opportunities as a cost-effective resource to support medical research, but with these opportunities come a number of analytical challenges. Real-world observational cohort and EHR data demonstrate the presence of measurement errors in event times, event classification, and exposures, with strong correlation in some settings between the magnitudes of errors in these variables. These correlated errors can bias estimation and distort study results. Measurement error can also dramatically reduce the power to detect target associations. Novel methods to address measurement error in failure time outcomes will be presented. Validation studies, in which gold standard observations on a subset of individuals are available, are generally necessary to apply these statistical methods. Aspects of validation study design that can improve the precision of the bias-corrected study results will also be discussed.



1/29/2021

Speaker: Jinbo Chen, Department of Biostatistics, University of Pennsylvania

Title: Novel Two-Phase Sampling Designs for Studying Binary Outcomes


Abstract: In a biomedical cohort study assessing the association between an outcome variable and a set of covariates, it is common that some covariates can only be measured on a subgroup of study subjects. An important design question is which subjects to select into the subgroup to maximize statistical efficiency. When the outcome is binary, one may adopt a case-control sampling design or a balanced case-control design where cases and controls are further matched on a small number of completely observed discrete covariates. While the latter achieves success in estimating odds ratio (OR) parameters for the matching covariates, similar two-phase design options have not been explored for the remaining covariates, especially the incompletely collected ones. This is of great importance in studies where the covariates of interest cannot be completely collected. To this end, utilizing a preliminary model relating the outcome to the complete covariates, we propose a novel sampling scheme that oversamples cases and controls with worse goodness of fit under the preliminary model and further matches them on the complete covariates, similarly to the balanced design. We develop a pseudo-likelihood method for estimating OR parameters. Through simulation studies and explorations in a real cohort study, we find that our design generally leads to reduced asymptotic variances of the OR estimates, and that the reduction for the matching covariates is comparable to that of the balanced design.
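
A loose sketch of the sampling idea on simulated data (hypothetical variable names; not the authors' exact scheme or their pseudo-likelihood estimator): fit a preliminary outcome model on the complete covariates, then oversample cases and controls with the worst fit for phase-two measurement of the expensive covariate.

```r
## Oversample poorly fit cases and controls for phase-two measurement.
set.seed(11)
n <- 5000
z <- rnorm(n)                              # complete (phase-one) covariate
x <- rbinom(n, 1, plogis(-1 + z))          # expensive (phase-two) covariate
y <- rbinom(n, 1, plogis(-2 + 0.8 * z + x))

prelim  <- glm(y ~ z, family = binomial)   # preliminary model, complete data only
badness <- abs(y - fitted(prelim))         # poor fit -> more informative subject
phase2  <- c(sample(which(y == 1), 200, prob = badness[y == 1]),
             sample(which(y == 0), 200, prob = badness[y == 0]))
summary(badness[phase2])                   # selected subjects skew toward poor fit
```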