The Schedule

"When something is important enough, you do it even if the odds are not in your favor."

~ Elon Musk, Engineer

Day 1

November 29th

18:00 - 19:30

Coded Bias

Look for the movie in the main hall of Gather "Data" Town!

Short Talks and Posters

Day 2

November 30th

16:00 - 18:00

Data Hub Club

18:00 - 19:00

Title: Drug Repurposing Pipeline Using Overparametrized Representation Learning and Causality

Speakers: Adit Radhakrishnan & Louis Cammarata

Abstract: Given the severity of the SARS-CoV-2 pandemic, a major challenge is to rapidly repurpose existing approved drugs for clinical interventions. While a number of data-driven and experimental approaches have been suggested in the context of drug repurposing, a platform that systematically integrates available transcriptomic, proteomic and structural data is missing. Here we take advantage of large-scale transcriptional drug screens combined with RNA-seq data of the lung epithelium with SARS-CoV-2 infection. To identify robust druggable protein targets, we propose a principled causal framework that makes use of multiple data modalities. Our pipeline consists of (i) identifying drug signatures using overparametrized representation learning, (ii) constructing a disease interactome from protein-protein interaction data, and (iii) analyzing drug mechanism using causal structure learning. Our analysis highlights the importance of serine/threonine and tyrosine kinases as potential targets that intersect the SARS-CoV-2 and aging pathways. By integrating transcriptomic, proteomic and structural data that is available for many diseases, our drug discovery platform is broadly applicable. Rigorous in vitro experiments as well as clinical trials are needed to validate the identified candidate drugs.

Reference

Belyaeva, A., Cammarata, L., Radhakrishnan, A., Squires, C., Dai Yang, K., Shivashankar, G.V. and Uhler, C., 2021. Causal network models of SARS-CoV-2 expression and aging to identify candidates for drug repurposing. Nature Communications, 12(1), pp.1-13.

Short Talks and Posters

Day 3

December 1st

12:00 - 13:00

Title: Integrating Support Vector Machine with Cure Rate Model and EM-based Inference

Speaker: Suvra Pal

Abstract: In this talk, I will present a new promotion time cure model (PCM) that uses the support vector machine (SVM) to model the incidence part. The proposed model inherits the features of the SVM and provides flexibility in capturing non-linearity in the data. Furthermore, the new model can incorporate potentially high-dimensional covariates. For the estimation of model parameters, I will discuss the steps of an expectation-maximization algorithm in which I will make use of the sequential minimal optimization technique together with the Platt scaling method. Next, I will present the results of a detailed simulation study and show that the proposed model outperforms the existing logistic regression-based PCM, specifically when the true classification boundary is non-linear. I will also show that the proposed model's ability to capture complex classification boundaries can improve the estimation results related to the latency part. Finally, to illustrate the proposed model, I will analyze data from a leukemia cancer study.


13:30 - 14:30

Title: Spatial and spatio-temporal models in the time of COVID-19

Speaker: Marta Blangiardo

Abstract: In this talk I will present some recent work I have done on spatial and spatio-temporal modelling of mortality data during the COVID-19 pandemic. I will begin with the first comprehensive analysis of the spatio-temporal differences in excess mortality during 2020 across five European countries (Greece, Italy, England, Spain, Switzerland), using a population-based design on all-cause mortality data. Sex-specific weekly mortality rates for each area (NUTS3 regions) were estimated, based on a comparison period (2015-2019), while adjusting for age, localised temporal trends and the effect of temperature. Then, all-cause weekly deaths and mortality rates at the same spatial resolution were predicted for 2020, based on the modelled spatio-temporal trends, so that excess deaths could be estimated. Secondly, I will talk about a two-stage spatial model to quantify inequalities in excess mortality in people aged 40 years and older at the community level during the first wave of the pandemic in England. Finally, I will present a cross-sectional study in England to investigate the effect of long-term exposure to air pollution on COVID-19 deaths, adjusting for potential confounders related to meteorology, socio-demographics, disease spread, and healthcare provision, and accounting for spatial autocorrelation.


18:00 - 19:00

Title: Statistical Inference on Network Data

Speaker: Yuguo Chen

Abstract: Network structures arise in modeling a wide variety of systems in science and engineering. Statistical network analysis aims to develop methods that account for the complex dependencies in network data. Over the last few decades, this area has rapidly accumulated methods, including techniques for network modelling and computation. We review some of these developments and their applications to real-world problems, and briefly discuss challenges in this area.


Short Talks and Posters

Day 4

December 2nd

12:00 - 13:00

Title: Cure models: re-parametrization of a flexible family

Speaker: Fotios Milienos

Abstract: In a great number of real-life applications, there exists a non-negligible proportion of individuals/items who are not subject to the event (or recurrence of the event) of interest. Although researchers may provide, for example, biological, medical, or sociological evidence for the presence of such items, statistical models that perform well whether or not a cured proportion exists frequently offer necessary flexibility. These types of populations are typically studied by the theory of cure models (long-term survival models) which, nowadays, play a substantial role in survival analysis (see, for example, the monographs by Maller and Zhou, 1996; Ibrahim et al., 2005; Peng and Yu, 2021). Cure models offer efficient tools for modeling and estimating both the proportion of such individuals/items, usually termed long-term survivors, immune, or cured, and the survival times of the non-cured (susceptible) group. The most studied cure models can be defined through a competing cause scenario, where the random variables corresponding to the time-to-event due to each competing cause are independent and identically distributed, while the total number of competing causes is an unobservable discrete random variable. The literature consists of parametric and non/semi-parametric approaches, while the presence of right-censored data, in the great majority of applications, makes the EM algorithm a popular option for estimating model parameters.

In this talk, after an introduction to the theory of cure models, we discuss a new re-parametrization of a flexible family of cure models, which includes among its special cases not only the most studied cure models (such as the mixture, bounded cumulative hazard, and negative binomial cure models) but also classical survival models (without cured items). One of the main properties of the proposed family, apart from its computationally tractable closed form, is that the case of zero cured proportion is not found at the boundary of the parameter space, as typically happens with other families. A simulation study will exhibit the (finite-sample) performance of the suggested methodology, focusing on estimation through the EM algorithm; for illustrative purposes, an analysis of two real-life data sets (on recidivism and cutaneous melanoma) is also carried out.


References

Maller, R.A. and Zhou, X. (1996). Survival Analysis with Long-term Survivors. John Wiley & Sons, NY.

Ibrahim, J.G., Chen, M.H. and Sinha, D. (2005). Bayesian Survival Analysis. John Wiley & Sons, NY.

Peng, Y. and Yu, B. (2021). Cure Models: Methods, Applications, and Implementation. CRC Press.


Short Talks and Posters

Day 5

December 3rd

10:30 - 11:45

Poster Evaluation Session

12:00 - 13:30

Panel Discussion

Margaret Betz - Purdue University

Carl Drummond - Purdue University Fort Wayne

Melissa Gruys - Purdue University Fort Wayne

Robert Neher - Zimmer-Biomet

Mohamed Ould-Khouya - Central Insurance

Tanya Soule - Purdue University Fort Wayne

Mark Ward - Purdue University

13:30 - 14:00

Poster and Short Talk Awards