2019-2020


Meeting Calendar



12/11/2020 1:30-3:00pm


Title: Transformation model estimation of survival under dependent truncation and independent censoring

Speaker: Sy Han Steven Chiou, Assistant Professor, The University of Texas at Dallas


Abstract: Truncation is a mechanism that permits observation of selected subjects from a source population; subjects are excluded if their event times are not contained within subject-specific intervals. Standard survival analysis methods for estimation of the distribution of the event time require quasi-independence of failure and truncation. When quasi-independence does not hold, alternative estimation procedures are required; currently, there is a copula model approach that makes strong modeling assumptions, and a transformation model approach that does not allow for right censoring. We extend the transformation model approach to accommodate right censoring. We propose a regression diagnostic for assessment of model fit. We evaluate the proposed transformation model in simulations and apply it to the National Alzheimer’s Coordinating Center autopsy cohort study and an AIDS incubation study. Our methods are publicly available in an R package, tranSurv.
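The quasi-independence assumption can be checked with a conditional Kendall's tau restricted to "comparable" pairs, which should be near zero when truncation and event times are quasi-independent. The sketch below is an illustrative Python reimplementation of that idea for left-truncated data without censoring (the authors' own implementation is the R package tranSurv):

```python
import numpy as np

def conditional_kendall_tau(trunc, event):
    """Conditional Kendall's tau for left-truncated data.

    A pair (i, j) is 'comparable' when max(T_i, T_j) <= min(X_i, X_j),
    i.e. both orderings of the pair were observable under truncation.
    The statistic averages concordance signs over comparable pairs and
    is approximately 0 under quasi-independence.
    """
    T = np.asarray(trunc, dtype=float)
    X = np.asarray(event, dtype=float)
    Ti, Tj = T[:, None], T[None, :]
    Xi, Xj = X[:, None], X[None, :]
    comparable = np.maximum(Ti, Tj) <= np.minimum(Xi, Xj)
    upper = np.triu(np.ones_like(comparable, dtype=bool), k=1)
    mask = comparable & upper
    signs = np.sign((Ti - Tj) * (Xi - Xj))
    return signs[mask].mean()

rng = np.random.default_rng(0)
# Simulate quasi-independent left truncation: draw (T, X) independently,
# then keep only subjects actually observed, i.e. those with T <= X.
T = rng.uniform(0, 2, 20000)
X = rng.exponential(1.5, 20000)
keep = T <= X
tau = conditional_kendall_tau(T[keep][:1500], X[keep][:1500])
```

Under this quasi-independent simulation, `tau` should be close to zero; a value far from zero would signal dependent truncation.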


12/4/2020 1:30-3:00pm

Title: Obtaining Optimal Rule for a Prefixed Tree Classifier

Speaker: Yuxin (Daisy) Zhu, Department of Biostatistics, Johns Hopkins University

[Abstract] In biomedical practice, multiple biomarkers are often combined using a pre-specified classification rule with a tree structure for diagnostic decisions. The classification structure and the cutoff point at each node of a tree are usually chosen on an ad hoc basis, depending on decision makers' experience. There is a lack of analytical approaches that lead to optimal prediction performance and that guide the choice of optimal cutoff points in a pre-specified classification tree. In this paper, we propose to search for and estimate the optimal decision rule through rank correlation maximization. The proposed method is flexible, theoretically sound, and computationally feasible when many biomarkers are available for classification or prediction. Using the proposed approach, for a pre-specified tree-structured classification rule, we are able to guide the choice of optimal cutoff points at tree nodes as well as to estimate the optimal prediction performance from multiple biomarkers combined.


11/13/2020 1:30-3:00pm

Title: Likelihood ratio tests for meta-analysis and meta-regression based on random-effects models

Speaker: Dr. Chongzhi Di, Fred Hutchinson Cancer Research Center


[Abstract] In meta-analysis and meta-regression, random-effects models are widely used to combine results from multiple studies while accounting for heterogeneity. Existing testing procedures rely on asymptotic results, but they may not work well because the number of studies is typically small to moderate. Another difficulty arises from the nonstandard situation where the variance component might lie on the boundary of its parameter space under certain null hypotheses of interest. To address these challenges, we consider exact likelihood ratio tests for two hypotheses with boundary problems: the global null of no effects, and homogeneity. Based on spectral decomposition, we characterize the exact distributions of the likelihood ratio test under both the null and alternative hypotheses. This facilitates fast computation not only of the null distribution for p-values but also of the power function, which allows comprehensive power comparisons between tests based on random-effects and fixed-effects models and provides tools for practitioners in planning and designing their studies. The proposed test performs well regardless of the number of studies and can have substantially higher power than tests based on fixed-effects models in the presence of heterogeneity.
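Because the variance component tau^2 sits on the boundary of its parameter space under the homogeneity null, the usual chi-square(1) reference is replaced, asymptotically, by a 50:50 mixture of a point mass at 0 and chi-square(1); the exact distributions in the talk refine this approximation. A hedged numerical sketch of the boundary likelihood ratio test (using the naive asymptotic mixture, not the exact distribution from the talk; all numbers illustrative):

```python
import math
import numpy as np

def profile_loglik(tau2, y, v):
    """Random-effects meta-analysis log-likelihood with mu profiled out."""
    w = 1.0 / (v + tau2)
    mu = np.sum(w * y) / np.sum(w)
    return -0.5 * np.sum(np.log(v + tau2) + w * (y - mu) ** 2)

def lrt_homogeneity(y, v):
    """LRT of H0: tau^2 = 0 (homogeneity), maximized over a grid.

    Because tau^2 = 0 is on the boundary, the asymptotic null is the
    50:50 mixture 0.5*chi2_0 + 0.5*chi2_1, not a plain chi2_1.
    """
    grid = np.linspace(0.0, 8.0, 2001)
    stat = 2.0 * (max(profile_loglik(t, y, v) for t in grid)
                  - profile_loglik(0.0, y, v))
    # P(chi2_1 > c) = erfc(sqrt(c / 2)); halve it for the boundary mixture
    pval = 1.0 if stat <= 0 else 0.5 * math.erfc(math.sqrt(stat / 2.0))
    return stat, pval

rng = np.random.default_rng(1)
v = np.full(8, 0.01)                    # within-study variances (8 studies)
y = rng.normal(0.3, np.sqrt(v + 1.0))   # strongly heterogeneous: tau^2 = 1
stat, pval = lrt_homogeneity(y, v)
```

With this strongly heterogeneous simulated data, the test should reject homogeneity; ignoring the boundary (using plain chi-square(1)) would double the p-value and make the test conservative.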


9/25/2020 1:30-2:50pm

Title: Random Forests for Dependent Data

Speaker: Arkajyoti Saha, PhD Candidate, Department of Biostatistics, Johns Hopkins University

Abstract:

Random forests (RF) are widely used for estimating regression functions, but little attention has been paid to the impact of spatial/serial data correlation on RF. The use of intra-node means and variances to create decision trees for RF ignores dependence of data across nodes. Also, the resampling used in RF violates the principles of the bootstrap under correlation. These issues lead to poor estimation and prediction performance of RF for dependent data.

We propose RF-GLS, a novel and well-principled extension of RF for dependent data, in the same way generalized least squares (GLS) fundamentally extends ordinary least squares (OLS) for linear models. Exploiting the representation of regression trees as recursive OLS optimization, we propose switching to GLS loss that explicitly accounts for the spatial/serial autocorrelation in the estimation procedure. GLS loss also ensures resampling of uncorrelated contrasts and not of correlated data to create a forest of trees. RF becomes a special case of RF-GLS with an identity working covariance matrix. For spatial data, RF-GLS can be used in conjunction with Gaussian Processes (GP) for spatial prediction using kriging. For big spatial data, RF-GLS seamlessly harmonizes with Nearest Neighbor Gaussian Process covariance matrices to ensure linear time-complexity of the algorithm. We also demonstrate, using extensive numerical experiments, the improvement achieved by RF-GLS over RF in both estimation and prediction under dependence.

We establish consistency of RF-GLS under beta-mixing dependence, which subsumes autoregressive time series and spatial Matérn Gaussian processes. As a byproduct, we also establish consistency of RF for beta-mixing processes, which, to our knowledge, is the first consistency result for RF under dependence. We establish results of independent importance, including a general consistency result for GLS optimizers of data-driven function classes, and uniform laws of large numbers for unbounded and non-smooth function classes under beta-mixing dependence. These new tools can be potentially useful for asymptotic analysis of other GLS-style estimators in nonparametric regression with dependent data.
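The sense in which GLS "fundamentally extends OLS" can be made concrete: with working covariance Σ = LLᵀ, the GLS solution equals OLS applied to the decorrelated (pre-whitened) data L⁻¹X, L⁻¹y, which is the substitution RF-GLS makes inside the tree-building optimization. A minimal numerical check of that identity (illustrative only, not the RF-GLS implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3
X = rng.normal(size=(n, p))

# AR(1)-style working covariance for serially correlated errors
rho = 0.6
Sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(Sigma)
y = X @ np.array([1.0, -2.0, 0.5]) + L @ rng.normal(size=n)

# GLS: (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y
Si = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

# Equivalent: ordinary least squares on the whitened data L^{-1}X, L^{-1}y
Xw = np.linalg.solve(L, X)
yw = np.linalg.solve(L, y)
beta_white, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
```

The two estimates agree to numerical precision; setting Σ to the identity recovers plain OLS, mirroring how RF is the special case of RF-GLS with an identity working covariance.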


https://jh.zoom.us/rec/share/bS4T3hBOcZBCXXyMvLsHa1h-DBnWCUfbC4OmhA6-Nrp2wgrFd9gKcis0B0-v6fG6.B944U0STdTa8KClL

Passcode: #h.^BMd1


9/17/2020 SLAM Working Group Presentation

Chen Hu, Associate Professor, Oncology Biostatistics, School of Medicine, JHU. "Making sense with complex disease process: application to COVID-19 treatment trials"

Rajeshwari Sundaram, Senior Investigator, National Institute of Child Health and Human Development, NIH. "Reassessing safe progression of labor: a survival analyst's view"

Mei-Cheng Wang, Professor, Dept. of Biostatistics, JHU. "Landmark Modeling and Prediction for Hospitalized Patients with COVID-19"

Yanxun Xu, Assistant Professor, Dept. of Applied Math & Statistics, Johns Hopkins University. "Effectiveness of remdesivir in hospitalized patients with COVID-19"


6/26/2020

Title: Estimating the Optimal Individualized Treatment Rule from A Cost-Effectiveness Perspective

Speaker: Jincheng Shen, Department of Population Health Sciences, University of Utah

Abstract:

Optimal individualized treatment rules (ITRs) provide customized treatment recommendations based on patient characteristics to maximize clinical benefit, in accordance with the objectives of precision medicine. As a result, there is growing interest in developing statistical tools for estimating optimal ITRs in evidence-based research. From a health economics perspective, policy makers consider the trade-off between health gains and the added costs of interventions to set priorities and allocate resources. However, most work on ITRs has focused on maximizing a single outcome, such as overall survival at the population level, regardless of any cost increments. In this work, we jointly consider the impact of treatment decisions on both cost and effectiveness and extend the concept of ITRs to a composite-outcome setting, so that we identify the most cost-effective ITR, accounting for individual-level heterogeneity through direct optimization. In particular, we propose statistical learning algorithms that use a net-monetary-benefit-based reward to provide nonparametric estimates of the optimal ITR. We provide several approaches for estimating the reward underlying the ITR as a function of patient characteristics. We present the strengths and weaknesses of each approach and provide practical guidelines by comparing their performance in simulation studies. We illustrate the top-performing approach from our simulations by evaluating the projected 15-year personalized cost-effectiveness of the Systolic Blood Pressure Intervention Trial (SPRINT) study.
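The net monetary benefit (NMB) reward referenced above is the standard health-economics composite NMB = λ·effectiveness − cost, with λ a willingness-to-pay threshold; the cost-effective rule assigns each patient the treatment with the larger expected NMB. A toy sketch under hypothetical fitted outcome models (all names and numbers below are illustrative, not from the talk):

```python
import numpy as np

def optimal_itr_nmb(effect, cost, wtp):
    """Treat (a=1) iff expected net monetary benefit favors treatment.

    effect[a], cost[a]: predicted effectiveness and cost under action
    a in {0, 1} for each patient; wtp: willingness-to-pay per unit of
    effectiveness (e.g. dollars per QALY).
    """
    nmb = {a: wtp * effect[a] - cost[a] for a in (0, 1)}
    return (nmb[1] > nmb[0]).astype(int)

rng = np.random.default_rng(3)
age = rng.uniform(40, 80, 5)
# Hypothetical fitted models: treatment helps younger patients more,
# but always adds 20k in cost.
effect = {0: np.full(5, 10.0), 1: 10.0 + np.maximum(0.0, (65 - age) / 10)}
cost = {0: np.full(5, 50e3), 1: np.full(5, 70e3)}
rule = optimal_itr_nmb(effect, cost, wtp=50e3)
```

With these illustrative numbers the rule treats exactly the patients whose effectiveness gain exceeds 0.4 (the 20k added cost divided by the 50k willingness-to-pay), i.e. those younger than 61.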


Bio: Jincheng Shen, PhD, is an Assistant Professor in the Department of Population Health Sciences at the University of Utah. Jincheng received his PhD in Biostatistics from the University of Michigan. Before joining the University of Utah, he worked as a postdoctoral research fellow in the Department of Biostatistics at the Harvard T.H. Chan School of Public Health. His research focuses on developing causal inference and machine learning methods for clinical and genetics studies. In particular, he is devoted to two areas: i) heterogeneous treatment effect estimation and identification of individualized treatment rules in complex disease settings; and ii) statistical analysis of high-throughput genetics and epigenetics studies, including genome-wide association studies and mediation analysis.


6/12/2020

Title: Minorization-Maximization-Based Block-Coordinate Ascent for Large-scale Survival Analysis with Time-Varying Effects

Speaker: Kevin (Zhi) He, Department of Biostatistics, University of Michigan, Ann Arbor

Abstract: National disease registries have produced a vast amount of data. Many existing statistical methods that perform well for moderate sample sizes and low-dimensional data do not scale to such large-scale data, leading to a demand for statistical techniques that enable full utilization of these rich sources of information. For example, the time-varying effects model is a flexible and powerful tool for modeling the dynamic changes of covariate effects. However, in survival analysis, its computational burden increases quickly as the sample size or the number of predictors grows. Analysis of national kidney transplant data, with a massive sample size and a large number of predictors, defies existing statistical methods and software. In view of these difficulties, we propose a Minorization-Maximization-based Block-Coordinate Ascent method for estimating the time-varying effects. Leveraging the block structure formed by the basis expansions, the proposed procedure iteratively updates the optimal block-wise direction along which the approximate increase in the log-partial likelihood is maximized. The resulting estimates ensure the ascent property and serve as refinements of the previous step. The performance of the proposed method is examined through simulations and applications to the analysis of national kidney transplant data and cancer death data from the U.S. SEER cancer registry.


5/15/2020

Title: Analyzing data from an observational study of hydroxychloroquine in hospitalized patients with COVID-19

Speaker: Yifei Sun, Department of Biostatistics, Columbia University

Abstract: Hydroxychloroquine has been widely administered to patients with Covid-19 without robust evidence supporting its use. We examined the association between hydroxychloroquine use and intubation or death at a large medical center in New York City. Data were obtained regarding consecutive patients hospitalized with Covid-19, excluding those who were intubated, died, or discharged within 24 hours after presentation to the emergency department (study baseline). The primary end point was a composite of intubation or death in a time-to-event analysis. We compared outcomes in patients who received hydroxychloroquine with those in patients who did not, using a multivariable Cox model with inverse probability weighting according to the propensity score. Of 1446 consecutive patients, 70 were intubated, died, or discharged within 24 hours after presentation and were excluded from the analysis. Of the remaining 1376 patients, during a median follow-up of 22.5 days, 811 (58.9%) received hydroxychloroquine (600 mg twice on day 1, then 400 mg daily for a median of 5 days); 45.8% of the patients were treated within 24 hours after presentation to the emergency department, and 85.9% within 48 hours. Hydroxychloroquine-treated patients were more severely ill at baseline than those who did not receive hydroxychloroquine (median ratio of partial pressure of arterial oxygen to the fraction of inspired oxygen, 223 vs. 360). Overall, 346 patients (25.1%) had a primary end-point event (180 patients were intubated, of whom 66 subsequently died, and 166 died without intubation). In the main analysis, there was no significant association between hydroxychloroquine use and intubation or death (hazard ratio, 1.04; 95% confidence interval, 0.82 to 1.32). Results were similar in multiple sensitivity analyses.
In this observational study involving patients with Covid-19 who had been admitted to the hospital, hydroxychloroquine administration was not associated with either a greatly lowered or an increased risk of the composite end point of intubation or death. Randomized, controlled trials of hydroxychloroquine in patients with Covid-19 are needed.
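The inverse probability weighting used in the analysis above can be illustrated in miniature: each patient is weighted by the inverse of the (estimated) probability of the treatment actually received, which balances measured confounders between treated and untreated groups. A simulation sketch with the propensity score taken as known (in the study it is estimated from baseline covariates); all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000
x = rng.normal(size=n)                 # measured confounder (e.g. severity)
ps = 1 / (1 + np.exp(-x))              # true propensity: sicker -> more treated
a = rng.binomial(1, ps)                # treatment received
y = 2.0 * x + rng.normal(size=n)       # outcome; true treatment effect is 0

# Naive comparison of group means is confounded by x
naive = y[a == 1].mean() - y[a == 0].mean()

# IPW: weight treated by 1/ps and untreated by 1/(1 - ps)
w = a / ps + (1 - a) / (1 - ps)
ipw = (np.sum(w * a * y) / np.sum(w * a)
       - np.sum(w * (1 - a) * y) / np.sum(w * (1 - a)))
```

The naive difference is strongly biased away from the true null effect, while the weighted difference is close to zero; the study's Cox model applies the same weights in a time-to-event analysis.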



4/10/2020

Zoom Seminar

Title: On a Statistical Transmission Model in Analysis of the Early Phase of COVID-19 Outbreak

Speakers: Yifan Zhu and Yingqing Chen, Fred Hutchinson Cancer Research Center

Abstract: Since Dec. 2019, a disease caused by a novel strain of coronavirus (COVID-19) has infected many people, with cumulative confirmed cases reaching almost 180,000 as of Mar. 17, 2020. The COVID-19 outbreak was believed to have emerged from a seafood market in Wuhan, a metropolis of more than 11 million people in Hubei province, China. We introduce a statistical disease transmission model that uses case symptom onset data to estimate the transmissibility of the early-phase outbreak in China, and provide sensitivity analyses under various assumptions about the natural history of COVID-19. We fit the transmission model to several publicly available sources of outbreak data through Feb. 11, 2020, and estimate the efficacy of the Wuhan city lockdown intervention. The estimated R0 was between 2.7 and 4.2 under plausible distributional assumptions for the incubation period and the relative infectivity over the infectious period. 95% confidence intervals for R0 are also reported. Potential issues, such as data quality concerns, and comparisons of different modelling approaches are discussed.
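For orientation, a back-of-the-envelope alternative to a full transmission model: fit the early exponential growth rate r to case counts by a log-linear fit, then convert via R0 ≈ 1 + r·Tg, a standard SIR-type approximation with mean generation time Tg. This is not the paper's estimator, and all numbers below are illustrative assumptions:

```python
import numpy as np

def r0_from_growth(cases, days, gen_time):
    """Estimate R0 from early exponential growth: R0 ~= 1 + r * Tg.

    Fits log(cases) = a + r * day by least squares; valid only during
    the early, approximately exponential phase of an outbreak.
    """
    r = np.polyfit(days, np.log(cases), 1)[0]   # per-day growth rate
    return 1.0 + r * gen_time

days = np.arange(20)
true_r = 0.22                                   # assumed daily growth rate
cases = 40 * np.exp(true_r * days)              # idealized noise-free counts
r0 = r0_from_growth(cases, days, gen_time=7.0)  # assumed mean generation time
```

On these idealized counts the fit recovers R0 = 1 + 0.22 × 7 ≈ 2.5, inside the 2.7-4.2 ballpark only because of the arbitrary inputs; real estimates depend heavily on the assumed generation-time and incubation distributions, which is exactly the sensitivity the talk examines.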


3/13/2020 (cancelled)

Speaker: Xiaoke Zhang, Department of Statistics, George Washington University

Title: Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data

Abstract: In functional data analysis (FDA), the covariance function is fundamental, not only as a critical quantity for understanding elementary aspects of functional data but also as an indispensable ingredient for many advanced FDA methods. A new class of nonparametric covariance function estimators is developed in terms of various spectral regularizations of an operator associated with a reproducing kernel Hilbert space. Despite their nonparametric nature, the covariance estimators are automatically positive semi-definite, an essential property of covariance functions, via a one-step procedure. An unconventional representer theorem is established to provide a finite-dimensional representation for this class of covariance estimators based on data, although the solutions are searched over infinite-dimensional functional spaces. To further achieve a low-rank representation (another desirable property, e.g., for dimension reduction and easy interpretation), the trace-norm regularization is studied in particular, under which an efficient algorithm is developed based on the accelerated proximal gradient method. The outstanding practical performance of the trace-norm-regularized covariance estimator is demonstrated by a simulation study and the analysis of a traffic dataset. Under both fixed and random designs, an excellent rate of convergence is established for a broad class of operator-regularized covariance function estimators, which generalizes both the trace-norm-regularized covariance estimator and other popular alternatives. Time permitting, I will briefly talk about the extension to multidimensional functional data.


2/28/2020

Speaker: Yanxun Xu, Department of Applied Mathematics and Statistics, Johns Hopkins University

Title: Inferring Longitudinal Antiretroviral Drug Effects on Mental Health in People with HIV


Abstract: The reported effects of antiretroviral (ART) drugs on mental health in people living with HIV (PLWH) are inconsistent. Given the heterogeneous nature of both ART drugs and the presentation of depressive symptoms, newer approaches are necessary for guiding clinical practice. Since ART-related depression is likely to be heterogeneous among HIV patients, depending on differences in numerous factors including demographic and clinical variables, we develop a new Bayesian semiparametric graphical model with nodes representing drugs and depression items, and weighted edges representing their relationships. The weights indicate the strength of the drug-depression relationships and can vary across visits and across patients. Effective and reliable modeling and prediction will help elucidate the treatment-depression relationship and guide clinicians in making more informed decisions for patients.


2/14/2020

Speaker: Scott Zeger, Department of Biostatistics, Johns Hopkins University

Title: Clinical risk prediction with random forests for survival, longitudinal, and multivariate (RF-SLAM) data analysis

Abstract: This talk will introduce a novel dynamic approach to clinical risk prediction for survival, longitudinal, and multivariate (SLAM) outcomes, called random forest for SLAM data analysis (RF-SLAM). RF-SLAM is a continuous-time, random forest method for survival analysis that combines the strengths of existing statistical and machine learning methods to produce individualized Bayes estimates of piecewise-constant hazard rates. We also present a method-agnostic approach for time-varying evaluation of model performance and for communicating results to laypersons. The methods are illustrated with analysis of a clinical cohort study of sudden cardiac arrest. (Joint work of Shannon Wongvibulsin, Katherine C. Wu & Scott L. Zeger)


1/31/2020

Student project presentations:

Speaker 1: Erjia Cui, PhD student, Department of Biostatistics, Johns Hopkins University

Title: Additive Functional Cox Model

Speaker 2: Pei-Lun Kuo, Ph.D., Department of Epidemiology, Johns Hopkins University

Title: Understanding physical activity patterns from a recurrent-event perspective


1/24/2020

Title: Machine Learning Methods for Survival Analysis: Are They Good Enough?

Speaker: Hieu Nguyen, Department of Biomedical Engineering, Johns Hopkins University

Abstract: Recent advances in data acquisition, storage, and artificial intelligence have enabled the use of machine learning (ML) in processing large, high-dimensional data to support decision-making in medicine and many other fields. ML methods in general, and deep learning (DL) in particular, have shown various successes in classification and regression tasks, and several ML and DL methods have been successfully adapted for the task of survival analysis. Despite the original authors' claims that these ML and DL methods address limitations of traditional statistical survival methods, few studies have demonstrated superior performance by comprehensively comparing these ML methods against the traditional methods. In this talk, I will summarize current ML methods for survival analysis, discuss their limitations, and present results from benchmark studies comparing ML survival methods to the traditional Cox model.



1/17/2020

Student project presentations:

Jiyang Wen, PhD student, Department of Biostatistics, Johns Hopkins University

Lacey Etzkorn, PhD student, Department of Biostatistics, Johns Hopkins University


1/10/2020

Title: Robust Mendelian Randomization Analysis Using Mixture Models (job talk practice)

Speaker: Guanghao Qi, PhD student, Department of Biostatistics, Johns Hopkins University

Abstract:

Mendelian randomization (MR) has emerged as a major tool for the investigation of causal relationships among traits, utilizing results from large-scale genome-wide association studies. Bias due to horizontal pleiotropy, however, remains a major concern. We propose a novel approach for robust and efficient MR analysis using a large number of genetic instruments, based on a novel spike-detection algorithm under a normal-mixture model for the underlying effect-size distributions. Simulations show that the new method, MRMix, provides estimates of causal effects that are nearly unbiased, or substantially less biased than those from alternative methods, and can achieve higher efficiency than comparably robust estimators. Application of MRMix to publicly available datasets leads to notable observations, including evidence of causal effects of BMI and age at menarche on the risk of breast cancer; no causal effect of HDL or triglycerides on the risk of coronary artery disease; and a strong detrimental effect of BMI on the risk of major depressive disorder.
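For context, the baseline against which robust methods like MRMix are usually compared is the inverse-variance-weighted (IVW) estimator, a weighted regression of SNP-outcome associations on SNP-exposure associations through the origin; it is unbiased only when all instruments are valid (no pleiotropy). A sketch of IVW on simulated valid instruments (this is not the MRMix estimator; all numbers are illustrative):

```python
import numpy as np

def ivw_mr(beta_exp, beta_out, se_out):
    """Inverse-variance-weighted MR estimate of the causal effect.

    Equivalent to a no-intercept weighted least-squares slope of the
    SNP-outcome effects on the SNP-exposure effects, weights 1/se_out^2.
    """
    w = 1.0 / se_out ** 2
    return np.sum(w * beta_exp * beta_out) / np.sum(w * beta_exp ** 2)

rng = np.random.default_rng(5)
m, theta = 100, 0.4                    # number of instruments, true effect
bx = rng.normal(0.1, 0.03, m)          # SNP-exposure association estimates
se = np.full(m, 0.01)                  # SEs of SNP-outcome estimates
by = theta * bx + rng.normal(0, se)    # valid instruments: no pleiotropy
theta_hat = ivw_mr(bx, by, se)
```

With valid instruments `theta_hat` recovers the true effect; horizontal pleiotropy adds a term to `by` that biases IVW, which is the failure mode MRMix's mixture model is designed to absorb.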


11/8/2019

Title: Quantifying the time-varying prognostic performance of survival models

Speaker: Jason Liang, NIAID/NIH

Abstract:

Many prognostic models are created using survival data. In practice, the development of such models remains fairly ad hoc, and the temporal aspect of survival data is often underused. I will outline a number of existing methods for evaluating prognostic survival models. In particular, the emphasis will be on tools that can quantify how prognostic performance varies with time. I will also present a complementary new tool we have developed, the hazard discrimination summary (HDS). HDS is an interpretable, risk-based measure of how a model’s discrimination varies with time. I will also describe a connection between HDS and the Cox model partial likelihood.


10/25/2019

Title: Statistical Methods in Analyzing Single-cell Genomic Data

Speaker: Zhicheng (Jason) Ji, Department of Biostatistics, JHU

Abstract:

Single-cell sequencing is a transformative technology that measures sequencing information from single cells. It has been widely used to investigate the functions of individual cells and to distinguish different cell types within a heterogeneous cell population. Data from single-cell sequencing are highly complex and sparse, and novel statistical and computational methods are needed to tackle the unique challenges of analyzing them. We have developed methods to order cells computationally based on their gene expression profiles, perform regression and differential analysis in single-cell RNA-seq data with multiple samples, and reconstruct activities of individual cis-regulatory elements from highly sparse single-cell ATAC-seq data. These methods have been applied to a wide variety of real biological and clinical studies.


10/18/2019

Title: Challenges and Developments in Analyzing Complex Longitudinal Data: Applications in NEXT Generation Health Study

Speaker: Dr. Danping Liu, NCI/NIH

Abstract:

The NEXT Generation Health Study (2009-2016) is a longitudinal study of a nationally representative sample of U.S. 10th-grade students. Its major aim is to examine the trajectory of, and changes in, adolescent health status and health behaviors from mid-adolescence to early adulthood. Participants received annual questionnaire assessments of diet, physical activity, substance use, sleep, peer and environmental influences, etc. A subsample of students (the NEXT Plus sample) participated in a more extensive data collection including biomarkers for obesity, peer behavior, etc. Through my collaborations with NEXT investigators, we developed a series of new statistical methods to address the analytical challenges. My talk will discuss two examples.

The first project is motivated by transition inference for adolescents’ alcohol use among NEXT participants. Generalized estimating equations (GEE) are commonly used to estimate the “population-average” transition probabilities. However, when the Markov assumption does not hold but first-order transition probabilities are still of interest, the transition inference is sensitive to the choice of working correlation. We consider a random-process transition model as the true underlying data-generating mechanism, which characterizes subject heterogeneity and the complex dependence structure of the outcome process in a flexible way. Two types of transition probabilities at the population level are formally defined: “naive transition probabilities” that average across all transitions, and “population-average transition probabilities” that average the subject-specific transition probabilities. It is demonstrated that the unstructured working correlation provides consistent estimators of the population-average transition probabilities, while the independence working correlation provides consistent estimators of the naive transition probabilities. We further study the behavior of the sandwich variance estimator, as well as extensions to deal with initial-state dependence.
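The distinction between the two estimands shows up in a small simulation: with strong subject heterogeneity in the 1→0 transition, pooling all observed transitions (the "naive" probability) weights persistent subjects' many state-1 visits more heavily than averaging subject-specific probabilities (the "population-average" target). An illustrative sketch with assumed two-state chains (all numbers invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
m, T = 400, 300
# Two-state chains: P(0->1) = 0.5 for everyone, but P(1->0) is
# heterogeneous, 0.1 for half the subjects and 0.9 for the other half.
p10 = np.repeat([0.1, 0.9], m // 2)

trans_from1 = np.zeros(m)   # observed 1->0 transitions per subject
visits_to1 = np.zeros(m)    # time points observed in state 1 per subject
for i in range(m):
    s = 0
    for _ in range(T):
        if s == 0:
            s = int(rng.random() < 0.5)
        else:
            visits_to1[i] += 1
            if rng.random() < p10[i]:
                trans_from1[i] += 1
                s = 0

# Naive: pool all transitions; persistent (low-p10) subjects spend more
# time in state 1, so they dominate the pooled estimate.
naive = trans_from1.sum() / visits_to1.sum()
# Population-average: average each subject's own transition probability.
pop_avg = (trans_from1 / visits_to1).mean()
```

Here the population-average probability is near 0.5 (the mean of 0.1 and 0.9), while the naive probability is pulled well below it, mirroring the abstract's point that the two working correlations consistently estimate different targets.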

The second project involves modelling peer influence on adolescents’ drinking behavior using partial peer-network data. NEXT Plus participants nominated a few close friends, who answered a questionnaire about their own drinking behavior. A unique feature of the data is that the number of peers each participant had was random; conceivably, the number of nominated peers might be associated with the drinking outcome, as well as with the strength of peer influence. Meanwhile, some participants did not nominate any peers, which could potentially bias the sample. We develop a novel joint model to account for these unique data features. The joint model has three components: a model for the participant’s outcome, a model for the peer outcomes, and a model for the informative size of the peer network. A random effect term is shared among these model components to introduce a dependence structure. We discuss the advantages of this new method over several simple alternatives that ignore the informative network size, and compare their performance in a series of simulation studies.


9/13/2019

Title: Statistical review of gene and cell therapies and related research topics

Speaker: Xue (Mary) Lin, CBER/OBE/DB, Food and Drug Administration

Abstract:

In this presentation, we will first discuss the statistical issues in the biologics license application review of KYMRIAH®, the first CAR-T therapy approved by the FDA. In addition, we will discuss statistical issues related to the timing of randomization in the study design of gene and cell therapy trials, all based on INDs we have reviewed. At the end, we will touch upon some research topics we are working on.

5/10/2019

Title: Risk prediction using high-dimensional post-vaccination events in VAERS data

Speaker: Yong Chen, Department of Biostatistics, University of Pennsylvania

Abstract:

In this talk, we will demonstrate the promise of safety report data from the Vaccine Adverse Event Reporting System (VAERS) for pharmacovigilance research, and discuss a few key challenges in converting these massive data into useful knowledge. We will present our current efforts in tackling these challenges in terms of signal detection, building meaningful risk prediction models, and incorporating temporal information into high-dimensional risk prediction modeling. Preliminary data analysis results will be presented.

5/3/2019

Title: Predicting student progress through MOOCs

Speaker: Aboozar Hadavand, Postdoctoral Fellow, Department of Biostatistics, Johns Hopkins University

Abstract:

The enthusiasm surrounding Massive Open Online Courses (MOOCs) has been tempered by early results on MOOC completion rates. We know that only a small share of students complete MOOCs (usually less than 10%). However, studies show that interventions and nudges can be effective in increasing completion rates. While we know the timestamp of completion, we do not know when a student drops out of a course, since, unlike in traditional courses, students do not typically declare that they are dropping out of a MOOC. In this study, we attempt to predict who is likely to drop out of or complete a course using survival analysis. We first use hidden Markov models to predict the probability and timing of dropout, and then use competing risks survival analysis to predict dropout and completion among students.

4/26/2019

Title: Designing Personalized Treatment Plans for Breast Cancer Patients: A Predictive Analytics Approach

Speaker: Yixin Lu, Dept. of Information Systems & Technology Management, George Washington University

Abstract:

Breast cancer is the most common cancer among women worldwide. Contemporary treatments for breast cancer are complex and multimodal, involving highly specialized medical professionals across a variety of settings. This poses many challenges to the design and delivery of treatment to individual patients. In this paper, we propose a novel predictive analytics model to optimize treatment plans for breast cancer patients. Unlike traditional methods that prescribe homogeneous plans for all patients, we customize individual patients’ treatment plans based on accurate predictions of the amount and distribution of tumor cells. We use clinical data to estimate the key parameters in our model. By repeating the optimization for a sequence of clinical targets, we demonstrate that our personalized treatment plans can significantly improve the treatment outcome. In light of the rising cost burden faced by individuals, companies, and society as a whole, we further analyze the economic impact of our proposed planning method. The results suggest that personalized treatment plans offer great promise in reducing both the cost of managing disease progression or recurrence and the cost of treatment-related toxicity. Our findings have useful implications for researchers and practitioners seeking to provide the best quality of care in an environment of limited resources and rising costs.

4/19/2019

Title: Estimation and Model Checking for General Semiparametric Recurrent Event Models with Informative Censoring

Speaker: Chin-Tsang Chiang, Department of Mathematics, National Taiwan University

Abstract: This research explores a recurrent event process with informative censoring using more general semiparametric latent intensity regression models. When the distributions of the subject-specific latent variable and the censoring time are left unspecified, the distinct distributional features of the recurrent event times are found to be linked to the shape parameter, which merits the development of estimation and testing procedures. In light of this finding, two contrasting estimation methods are proposed for shape-dependent and shape-independent models. In particular, the estimation criteria are useful in building test rules to distinguish between competing rate regression models without the need to specify a significance level. Under very mild conditions, we establish large-sample properties of the estimators and test statistics. Comprehensive simulations are further conducted to assess their finite-sample performance. Moreover, our methodology is demonstrated by applying it to recurrent event samples of intravenous drug users needing inpatient care and patients with chronic granulomatous disease.

3/29/2019

Title: Modelling of retrospective data on time to event: some successes and failures

Speaker: Professor Debasis Sengupta, Indian Statistical Institute

Abstract: Distribution of time to a landmark event is often estimated from cross-sectional data on current status. Recall-based data on the time of event is also used in some studies, though there are questions about precision and accuracy of such data. Lack of precision is manifested as interval censoring of the time to event, which is likely to be informative. We argue that improper handling of this issue can lead to biased estimates. We present a few models for varying degrees of data complexity. Computational burden can be a major issue in the nonparametric framework; we provide a theoretical result that simplifies this problem somewhat. Only the simplest of the models has so far been adapted for utilization of covariate information through the Cox model. The context of this collaborative work is an anthropometric study on age-at-menarche of growing girls.