Collaborative Research Projects

Diabetes and COVID-19

Message: The effect of age on 30-day mortality after admission to critical care with COVID-19 is not mediated by changes in the prevalence of type 2 diabetes.

Investigators: John Dennis (University of Exeter), Bilal Mateen (University College London) and Sebastian Vollmer (Warwick University)

Observational dataset source: COVID-19 Hospitalisation in England Surveillance System (CHESS)

Description: This study sought to quantify how much of the effect of age on COVID-19 mortality in patients admitted to critical care is due to changes in the prevalence of type 2 diabetes with increasing age. To evaluate this effect, we used a causal mediation approach to analyse the outcomes of over 13,000 patients admitted to critical care with COVID-19.
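The idea behind a causal mediation decomposition can be sketched with a toy example. The code below is not the model used in the study: it assumes a binary exposure (older vs younger), a binary mediator (type 2 diabetes), a binary outcome (30-day mortality) and no confounders, and it runs on simulated data whose probabilities are invented purely for illustration. The function name `mediation_decomposition` is hypothetical.

```python
import random


def mediation_decomposition(records):
    """Split a total effect into natural direct and indirect effects.

    records: iterable of (a, m, y) tuples, each value 0 or 1, where
    a = exposure, m = mediator, y = outcome. Assumes no confounding
    and that every (a, m) cell contains at least one record.
    """
    # counts[a][m] = [number of records, number with y == 1]
    counts = {a: {m: [0, 0] for m in (0, 1)} for a in (0, 1)}
    for a, m, y in records:
        counts[a][m][0] += 1
        counts[a][m][1] += y

    def p_m(m, a):
        # Empirical P(M = m | A = a)
        n_a = counts[a][0][0] + counts[a][1][0]
        return counts[a][m][0] / n_a

    def mean_y(a, m):
        # Empirical E[Y | A = a, M = m]
        n, s = counts[a][m]
        return s / n

    # Natural direct effect: change exposure, hold the mediator
    # distribution at its unexposed level.
    nde = sum((mean_y(1, m) - mean_y(0, m)) * p_m(m, 0) for m in (0, 1))
    # Natural indirect effect: hold exposure at 1, shift only the
    # mediator distribution from unexposed to exposed.
    nie = sum(mean_y(1, m) * (p_m(m, 1) - p_m(m, 0)) for m in (0, 1))
    return nde, nie


# Simulated data; all probabilities below are made-up assumptions.
rng = random.Random(0)
records = []
for _ in range(20000):
    a = int(rng.random() < 0.5)                      # older age group
    m = int(rng.random() < 0.15 + 0.15 * a)          # diabetes more common when older
    y = int(rng.random() < 0.20 + 0.25 * a + 0.02 * m)  # death within 30 days
    records.append((a, m, y))

nde, nie = mediation_decomposition(records)

# Total effect as a plain difference in mortality between exposure groups.
exposed = [y for a, _, y in records if a == 1]
unexposed = [y for a, _, y in records if a == 0]
te = sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)

print(f"direct: {nde:.4f}  indirect: {nie:.4f}  total: {te:.4f}")
```

In this simplified setting the direct and indirect effects add up to the total effect, so a small indirect effect corresponds to little of the age effect passing through diabetes. A real analysis would also need to adjust for confounders and handle age as a continuous exposure.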

(November - January 2020)

Roche-Turing Pilot Study

Investigators: Karla Diaz-Ordaz (Lead Analyst, London School of Hygiene and Tropical Medicine), Franz Kiraly (University College London) and Chris Holmes (University of Oxford)

Observational dataset source: Flatiron Health and Foundation Medicine, Inc.

Description: This scientific project aimed to predict individual survival by building models with good predictive performance (for overall survival after first-line therapy), to quantify the “added value” of the genomics data, and to chart the methodological space for machine-learning predictive models and causal inference for clinical outcomes in observational settings. In this study I implemented two novel methods for subgroup discovery and characterising subpopulations, namely Sequential BATTing and PRIM; these methods were originally developed for randomised controlled trials. As a member of this collaborative study, I worked with a team of researchers affiliated with the Alan Turing Institute (across universities within the United Kingdom) and with clinicians and statisticians at Roche.

www.turing.ac.uk/turing-and-roche-towards-tailor-made-lung-cancer-treatment

Relevant reference for sequential BATTing: onlinelibrary.wiley.com/doi/abs/10.1002/sim.7236

(August 2019 - November 2019)

Precision medicine for Type 2 diabetes

Collaborators: John Dennis (University of Exeter), Bilal Mateen (Clinical Data Science Fellow) and Sebastian Vollmer (Warwick University)

RCT Dataset source: Janssen Pharmaceuticals and Boehringer Ingelheim (BI)

Description: The patient population diagnosed with Type 2 diabetes is heterogeneous, and it is of interest to understand how subpopulations may differ in their response to the second-line therapy they are allocated. The literature currently available is limited to metformin (a first-line therapy), and the ability to better tailor treatments would improve overall outcomes. I led a comparative effectiveness study to compare two different second-line therapies in terms of their overall effect in reducing HbA1c levels and to characterise patient subpopulations. This study compares a novel machine learning method, Causal Forest, against a classical method, the penalised regression model, for the purposes of evaluating effect heterogeneity, estimating causal effects and conducting variable selection.

www.turing.ac.uk/research/research-projects/data-driven-evaluation-treatments-type-2-diabetes

Further details available on request

(July 2019 - April 2020)

Effect of 40-cm Segment Umbilical Cord Milking on Hemoglobin and Serum Ferritin at 6 Months of Age in Full-Term Infants of Anemic and Non-Anemic Mothers.

Lead Biostatistician: Julian Wolfson, University of Minnesota, Twin Cities

Description: This study was conducted by pediatric researchers to assess the effect of early clamping and milking on hemoglobin and serum ferritin concentrations at six months of age and to further evaluate the differences in effect in infants of anemic and non-anemic mothers. A subgroup analysis (performed by Julian and me) showed that cord hemoglobin was similar but cord ferritin was lower in infants of anemic mothers. Additionally, we found that the effectiveness of the long umbilical cord milking did not vary with maternal anemia status. The study concluded that the intervention may be an effective method for improving hemoglobin and iron stores at 6 months of age in infants.

(April 2014)

Publication: www.nature.com/articles/jp201592

BLS: Box Lunch Study, a randomised controlled trial

Lead Biostatistician: Julian Wolfson, University of Minnesota, Twin Cities

Description: This study examined the effect of weekday exposure (over six months) to different portion sizes of a boxed lunch on energy intake and body weight in a free-living sample of working adults. The portion sizes evaluated were 400 kcal, 800 kcal and 1600 kcal, compared against no intervention; the interventions were determined by nutritional epidemiologists. The study concluded that weekday exposure for six months to a 1600 kcal portion size at lunch increases total energy intake and body weight. I conducted basic statistical analyses (including the planned intent-to-treat analysis and linear regression models) and applied recursive partitioning methods as part of an exploratory analysis. These results were used in the eventual publication, which is linked below.

(August 2012 - February 2013)

Publication: onlinelibrary.wiley.com/doi/full/10.1002/oby.20720

SmarTrAC: A smartphone solution for travel and activity capturing

Biostatistician: Julian Wolfson; Urban Planner: Yingling Fan, University of Minnesota, Twin Cities

Description: As a biostatistician research assistant in this smartphone application development project, I utilised machine learning methods to characterise human activity and travel behaviour patterns over smartphone sensor data (as generated by both GPS and accelerometers).

(August 2013 - August 2014)

Now patented as Daynamica (daynamica.com)

Statistical Consultancy: Chronic Health Statistics in the Great Lakes region

Data Source: Minnesota Department of Health

Lead Consultant: Kyle Rudser (Biostatistics Data Analysis Centre), University of Minnesota, Twin Cities

Description: This study examined chronic health statistics in the Great Lakes region (which has a number of coal power plants) and evaluated whether they differ statistically significantly from national statistics. Specifically, the statistics comprised prevalence estimates of adult and pediatric asthma, cardiovascular disease, COPD, diabetes, and poverty in 27 counties and 3 metropolitan statistical areas. The study and its conclusions were of interest to a fellow student in the Division of Environmental Health, and I was a biostatistics consultant on this project. To assess whether the differences between the sample and the national statistics were statistically significant, an exact binomial test was conducted separately for each county and each disease of interest. Eighteen counties showed statistically significant differences after adjusting for multiple comparisons (all counties in Indiana and New York, and two counties in Michigan, did not have statistically significant results).
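The testing procedure described above can be sketched as follows. This is a minimal illustration, not the original analysis code: the county names, counts and national prevalence are invented placeholders, the helper names are hypothetical, and the Bonferroni correction is an assumption, since the adjustment method actually used is not stated here.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability P(X = k) for 0 < p < 1, computed in log space."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))


def exact_binom_two_sided(k, n, p0):
    """Two-sided exact binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under the null proportion p0.
    """
    observed = binom_pmf(k, n, p0)
    # Small relative tolerance so ties are not lost to rounding error.
    threshold = observed * (1 + 1e-7)
    total = 0.0
    for i in range(n + 1):
        prob = binom_pmf(i, n, p0)
        if prob <= threshold:
            total += prob
    return min(1.0, total)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical inputs: (cases, sample size) per county for one disease,
# tested against a made-up national prevalence.
national_prevalence = 0.08
county_counts = {
    "County A": (30, 200),
    "County B": (18, 220),
    "County C": (9, 150),
}

raw = [exact_binom_two_sided(k, n, national_prevalence)
       for k, n in county_counts.values()]
adjusted = bonferroni(raw)

for name, p_raw, p_adj in zip(county_counts, raw, adjusted):
    flag = "significant" if p_adj < 0.05 else "not significant"
    print(f"{name}: raw p = {p_raw:.4f}, adjusted p = {p_adj:.4f} ({flag})")
```

In practice the same loop would be repeated for each disease of interest, with the adjustment applied across the full set of tests.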

(April 2014)