Wolfson Statistical Meeting

The meeting normally takes place once a month.

This is now a virtual meeting; you can contact the organisers to join the mailing list for future talks. If you wish to attend in person, please express your interest during the meeting and we will take this into account for future events.

The current seminar organisers are Adam Brentnall, Benard North, Emily Lane, Thomas Hamborg, Jo Haviland, Joy Li and Shama Sheikh.

2024

Sample size re-estimation in a precision medicine trial

Recordings available via this link. QM login required for access.


Review of statistical models used in multi-cancer early detection studies 


Surrogates for mortality in cancer screening trials


Diabetes and cancer: a causal link? Opportunities to explore early diagnosis?


Cardiovascular diseases are the primary prevention targets for people with diabetes, but many studies have also reported an association between diabetes and cancer. There is debate about whether there is a causal link between the two. Possible explanations include:

1) improvements in life expectancy due to successful cardiovascular prevention; 

2) increasing prevalence and earlier onset of diabetes coinciding with that of overweight and obesity; and

3) more intensive use of health services in this population, leading to early detection of cancer (surveillance effect).


In this talk, Dr. Suping Ling will summarise her work on describing mortality trends and exploring causal links between diabetes and cancer, and talk about some ongoing research on estimating the effects of primary care system factors on early/delayed diagnosis of cancer.

Suping is an Assistant Professor in Epidemiology within the Inequalities in Cancer Outcomes Network (ICON) group, part of the Department of Non-Communicable Disease Epidemiology.


Comparison of multimorbidity 5 years before to 5 years after a cancer diagnosis: focusing on breast, prostate, colorectal, bladder and lung cancer

(MS Teams virtual meeting only for this seminar)


Balance and variance inflation checks for completeness-propensity weights


Statistical and AI methods for the analysis of multiple longitudinal ovarian cancer biomarkers

Video recording here. QM login details required for access.

Slides contain unpublished results; please consult the author before sharing.


Methods for efficient and accurate linkage of multiple routinely collected datasets

Record linkage can combine information from records in separate routinely collected datasets to provide a more detailed picture of the characteristics of patients, their disease, the care they receive, and their outcomes. However, record linkage typically involves sending personal information to a separate organisation to carry out the linkage, which can be time-consuming and costly and increases the risk of disclosure of sensitive information. Furthermore, when linking multiple datasets, many linkages may be required.

In this talk, Dr. Helen Blake will summarise two strategies to deal with these issues: linkage of datasets without patient identifiers; and linkage of multiple datasets through one "spine dataset". These approaches have the potential to accelerate the use of linked datasets to address important clinical and public health questions, while minimising costs and delays, and protecting data security.  
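One common way to link datasets without sharing raw patient identifiers is to pseudonymise them with a keyed hash before they leave the data controller. The sketch below is a minimal illustration of that general idea, not the specific method described in the talk; the normalisation and key-management choices are assumptions.

```python
import hmac
import hashlib

def pseudonymise(identifier: str, key: bytes) -> str:
    """Keyed hash of a normalised identifier, so the linkage
    organisation can match records without seeing raw values."""
    normalised = identifier.strip().lower()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()

# Both datasets apply the same secret key, so equal identifiers
# produce equal pseudonyms and can be matched directly.
key = b"shared-secret-agreed-between-data-controllers"
print(pseudonymise("AB123456C", key) == pseudonymise(" ab123456c", key))  # True
```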

(MS Teams virtual meeting only for this seminar)

Link to the recording: https://qmulprod-my.sharepoint.com/:v:/g/personal/hfx559_qmul_ac_uk/EYPDOsfqgehOiYjN4KRXVUQBuahKJq2zj2p8GpQI8G1kSg?referrer=Teams.TEAMS-ELECTRON&referrerScenario=MeetingChicletGetLink.view.view 

(Access using QM login)

Covid impact on data collection (TRACC study: Tracking mutations in cell free tumour DNA to predict Relapse in eArly Colorectal Cancer).

(MS Teams virtual meeting only for this seminar)


Lessons from working in prediction research

(MS Teams virtual meeting only for this seminar)


A Bayesian Power Prior Approach for Incorporating Pilot Data into Cluster Randomised Controlled Trial Analysis & Design: A simulation study

In Cluster Randomised Controlled Trials (CRCTs), randomisation occurs at a group level, which has methodological implications that make design, conduct and analysis more complicated. Typically, CRCTs are analysed using mixed-effects regression. The Power Prior is a Bayesian analysis technique in which historical data is parameterised as an informative prior distribution, then discounted according to the similarity between the historical and current datasets. A Normalised Power Prior (NPP) approach has been proposed which accounts for the clustered structure of CRCT data, enabling incorporation of evidence from historical data (e.g. pilot study) into the definitive trial analysis whilst allowing for potential differences between the two datasets, where greater differences result in less information borrowing. A simulation study is presented which aimed to: (i) verify that the NPP appropriately discounts historical data according to the similarity between datasets; (ii) assess the performance of the NPP approach compared to the frequentist mixed-effects model and simple dataset pooling; and (iii) assess whether the NPP approach can facilitate more efficient CRCT design by justifying reduced sample sizes.
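For reference, the standard power prior raises the historical-data likelihood to a discounting power a_0, and the normalised variant treats a_0 as unknown and includes the normalising constant so that its joint posterior is well defined. This is the generic formulation, not the talk's CRCT-specific model:

\[
\pi(\theta \mid D_0, a_0) \;\propto\; L(\theta \mid D_0)^{a_0}\,\pi_0(\theta),
\qquad a_0 \in [0, 1],
\]

\[
\pi(\theta, a_0 \mid D_0) \;=\;
\frac{L(\theta \mid D_0)^{a_0}\,\pi_0(\theta)}
     {\displaystyle\int L(\theta \mid D_0)^{a_0}\,\pi_0(\theta)\,\mathrm{d}\theta}\;
\pi(a_0),
\]

with a_0 = 0 discarding the historical data entirely and a_0 = 1 pooling it fully with the current trial.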


Persistent Homology for Medical Image Analysis

An important tool in the field of topological data analysis is persistent homology (PH), which is used to encode abstract representations of the homology of data at different resolutions in the form of a persistence barcode (PB). Normally, one obtains a single PB from a digital image when using a sublevel-set filtration method. In this talk, I will present a novel approach to building more than one PB representation of a single image, based on a landmark selection method known as local binary patterns (LBP) that encodes different types of local texture and structure in a digital image. Using LBP, we can construct up to 56 PBs from a single image if we restrict to the 8-bit binary codes that have exactly 2 circular transitions between 1 and 0. These 56 PBs contain detailed local and global topological and geometrical information, which can be used to design effective machine learning models in a distributed manner. Experimental results on breast mammogram scans give new insights into using different PB vectorisations with sublevel-set filtrations and landmark-based Vietoris-Rips filtration.
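As a minimal sketch of the two ingredients (LBP codes and sublevel-set persistence) on a synthetic image, the following computes one barcode per image rather than the 56 landmark-based barcodes described in the talk, and assumes the scikit-image and GUDHI packages are available:

```python
import numpy as np
from skimage.feature import local_binary_pattern
import gudhi

rng = np.random.default_rng(0)
img = rng.random((64, 64))  # stand-in for a mammogram patch

# 8-bit LBP codes; "uniform" patterns are those with at most
# 2 circular 0/1 transitions, the family the talk restricts to
codes = local_binary_pattern(img, P=8, R=1.0, method="uniform")

# Sublevel-set (cubical) filtration of the image gives one barcode
cc = gudhi.CubicalComplex(top_dimensional_cells=img)
barcode = cc.persistence()  # list of (dimension, (birth, death)) pairs
print(barcode[:5])
```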


Developing long term disease models using individual patient data from trials and cohorts: a cardiovascular disease model using 15 large clinical trials and the UK Biobank cohort 


Decision-analytic disease models are often used to assess long-term effects of interventions. Increasingly, such models are developed using individual patient data to support more nuanced assessment of effects in distinct categories of patients. It is also important to ensure that models capture disease risks and survival well in the target population(s). We illustrate an approach to developing and calibrating a micro-simulation cardiovascular disease model using data from 15 clinical trials and the UK Biobank cohort.

 

A micro-simulation model was developed using the individual participant data from the Cholesterol Treatment Trialists' collaboration (CTT: 118,000 participants) and was calibrated and further developed in the UK Biobank cohort (UKB: 502,000 participants). Proportional hazards survival models estimated risks of key endpoints (myocardial infarction, stroke, coronary revascularisation, incident cancer, and vascular and non-vascular death) using CTT participants' sociodemographic and clinical characteristics at entry and the incidence of the key endpoints during follow-up. Model calibration using UKB data was based on proportional hazards assumptions and involved re-fitting the intercept and linear predictors from these equations; excluding, or estimating de novo, some risk factors; and adding factors and an endpoint (incident diabetes) to extend the model's functionality. Standard approaches to risk equation modelling and model validation were employed.
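As a toy illustration of the recalibration step only (not the authors' full multi-state model), one can fit a proportional hazards model in development data, carry its linear predictor into the target cohort, and re-estimate the baseline and calibration slope there. A sketch using the lifelines package with simulated data:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)

def simulate(n, beta_age=0.03, beta_smoke=0.6):
    age = rng.normal(60, 8, n)
    smoke = rng.integers(0, 2, n)
    hazard = np.exp(beta_age * (age - 60) + beta_smoke * smoke)
    time = rng.exponential(10 / hazard)
    event = (time < 12).astype(int)           # administrative censoring
    return pd.DataFrame({"age": age, "smoke": smoke,
                         "time": np.minimum(time, 12), "event": event})

dev, target = simulate(2000), simulate(2000, beta_smoke=0.9)

# 1) fit the risk equation in the development data (e.g. trials)
cph = CoxPHFitter().fit(dev, duration_col="time", event_col="event")

# 2) carry its linear predictor into the target cohort (e.g. UKB)
target["lp"] = cph.predict_log_partial_hazard(target)

# 3) recalibrate: new baseline hazard plus a calibration slope on lp
recal = CoxPHFitter().fit(target[["time", "event", "lp"]],
                          duration_col="time", event_col="event")
print(recal.params_)  # slope near 1 suggests good transportability
```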


We demonstrate the feasibility of calibrating a detailed cardiovascular disease multi-state decision-analytic model under the framework of proportional hazards modelling. However, it is a data-intensive process likely to lead to modified strength of relationships between disease endpoints. The process required all equations to be calibrated, with a few factors estimated de novo (e.g. smoking), but enabled new factors (e.g. cancer duration) and an endpoint (incident diabetes) to be included.

 

A new calibrated lifetime CVD model accurately predicts morbidity and mortality in contemporary UK populations. It will be made available to provide individualised projections of expected lifetime health outcomes and benefits of treatment.  


Maximising Precision Prevention through Population Testing


Combining Non-Adherence and Mediation in a Unified Causal Analysis: A Methodological Review and Application to the AVATAR trial

Many clinical trials suffer from participant non-adherence. A standard intention-to-treat (ITT) analysis estimates the causal effect of the treatment offer, and any intercurrent events that occur post-randomisation, such as non-adherence, are ignored. Alternatively, estimating the effect of treatment receipt is complicated by selection bias arising from participant non-adherence. Although more complex methods of analysis that account for this selection bias are required to infer causality, statistical methods for estimating the total causal effect of treatment receipt are well researched. Of these, the complier average causal effect (CACE) provides an estimate of the average effect of receiving treatment in the subgroup of participants who comply with their randomisation.

Clinical trials in mental health often evaluate complex, non-pharmacological interventions, such as psychotherapy. Evaluating how a complex intervention has led to changes in the outcome (the mechanism) is key for the development of more effective interventions. A mediation analysis aims to decompose a total treatment effect into a mediated effect, one that operates via changing the mediator, and a direct effect. However, current methods for mediation analysis in trials usually decompose the ITT effect, and the corresponding direct and mediated effects ignore the impact of participant non-adherence.

Both mediation analysis and non-adherence are independent areas of active research, but it is unclear how to identify and estimate mediation effects whilst appropriately accounting for participant non-adherence. This talk will summarise the literature on methods that combine mediation and non-adherence, and show that the CACE can be decomposed into a complier average natural direct effect (CANDE) and a complier average causal mediated effect (CACME), both of which can be estimated using linear structural equation models under a given set of assumptions.
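In potential-outcomes notation, with Y(z, m) the outcome under randomised assignment z and mediator value m, the decomposition takes the familiar natural-effects form restricted to compliers (written generically here; the talk's estimands may differ in detail):

\[
\mathrm{CACE}
= \underbrace{E\big[\,Y(1, M(0)) - Y(0, M(0)) \mid \text{complier}\,\big]}_{\mathrm{CANDE}}
+ \underbrace{E\big[\,Y(1, M(1)) - Y(1, M(0)) \mid \text{complier}\,\big]}_{\mathrm{CACME}}.
\]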


Calibrating complex computer models through history matching with emulation

Computer models are used in a variety of fields in science and technology to study real world systems. As the complexity of such models increases, however, it becomes more and more challenging to robustly fit them to empirical data. This limits their utility in scenario analyses, and may lead to overconfident or misleading predictions.  

In this seminar I will discuss history matching with emulation, a calibration method that has been successfully applied to complex models but is currently not widely used, due to the lack of available software to implement it. To address this issue, a group of researchers that I am part of is developing a user-friendly R package, hmer, to simply and efficiently perform history matching with emulation.
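The core of history matching is the implausibility measure: a candidate input is ruled out when the emulator's prediction is too far from the observed target, relative to the combined emulator, observation, and model-discrepancy variances. Below is a minimal one-dimensional sketch using scikit-learn's Gaussian process as a stand-in emulator (the hmer package automates the full multi-wave workflow in R; the simulator, target, and cut-off here are illustrative):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def simulator(x):
    return np.sin(3 * x) + 0.5 * x  # toy stand-in for an expensive model

z_obs, var_obs, var_disc = 0.8, 0.01, 0.01  # target and its variances

# Train an emulator on a small design of simulator runs
X_design = np.linspace(0.0, 2.0, 10).reshape(-1, 1)
gp = GaussianProcessRegressor().fit(X_design, simulator(X_design).ravel())

# Implausibility over a dense set of candidate inputs
X_cand = np.linspace(0.0, 2.0, 400).reshape(-1, 1)
mean, sd = gp.predict(X_cand, return_std=True)
impl = np.abs(mean - z_obs) / np.sqrt(sd**2 + var_obs + var_disc)

retained = X_cand[impl < 3.0]  # non-implausible space for the next wave
print(f"{len(retained)} of {len(X_cand)} candidates retained")
```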


Defining estimands to answer clinically relevant questions in trials around surgery

As part of my NIHR pre-doctoral fellowship, I would like to present the work I am currently undertaking, which focuses on the statistical principles outlined in the ICH-E9(R1) addendum on estimands in clinical trials, with an emphasis on trials around surgery. Following an outline of the main issues raised, I will illustrate the use of this framework with a case study of the PRISM trial, on which I am currently working, answering two trial objectives: one from a healthcare perspective and one from a patient benefit perspective. I will also speak about the work I have been doing with other institutions in this area. We will then look at the challenges posed, to identify gaps and shortcomings that need to be addressed and that will form the basis of my NIHR Doctoral Fellowship application.


Handling unplanned disruptions in randomised trials using missing data methods: a four-step strategy

The coronavirus pandemic (Covid-19) presents a variety of challenges for ongoing clinical trials, including unplanned treatment disruptions, participant infections and an inevitably higher rate of missing outcome data, with non-standard reasons for missingness. This presentation explores a four-step strategy for handling such unplanned disruptions in the analysis of randomised trials using missing data methods. Following an outline of the main issues raised by a pandemic, we describe each point of the guidance in turn, illustrated using an ophthalmic trial that was ongoing during Covid-19. Scenarios where treatment effects for a ‘pandemic-free world’ and a ‘world including a pandemic’ are of interest are considered. We highlight controlled multiple imputation as an accessible tool for conducting sensitivity analyses. The framework is consistent with the statistical principles outlined in the ICH-E9(R1) addendum on estimands and sensitivity analysis in clinical trials.
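As a toy illustration of the controlled (delta-adjusted) multiple imputation idea, one can impute under a missing-at-random model and then shift the imputed outcomes by a sensitivity parameter delta before analysing each completed dataset. This is a minimal sketch with simulated data; scikit-learn's IterativeImputer stands in for a full multiple imputation routine, and only the point-estimate part of Rubin's rules is shown:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n = 200
baseline = rng.normal(size=n)
arm = rng.integers(0, 2, size=n)
outcome = 0.5 * baseline + 0.3 * arm + rng.normal(size=n)
outcome[rng.random(n) < 0.2] = np.nan  # e.g. pandemic-related missingness

data = np.column_stack([baseline, arm, outcome])
missing = np.isnan(outcome)
delta = -0.2  # sensitivity parameter: assumed shift for unseen outcomes

estimates = []
for seed in range(20):  # 20 imputed datasets
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    completed = imputer.fit_transform(data)
    completed[missing, 2] += delta  # delta-adjust imputed outcomes only
    diff = completed[arm == 1, 2].mean() - completed[arm == 0, 2].mean()
    estimates.append(diff)

print("Pooled treatment effect estimate:", np.mean(estimates))
```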

(this meeting will take place on Teams)



Biomarkers in Parkinson’s disease: project introduction to use of remote blood collection for the earlier detection of Parkinson’s disease


Biomarker-driven umbrella adaptive design in rheumatoid arthritis clinical trials

Rheumatoid arthritis (RA) is one of the most prevalent and rapidly growing chronic autoimmune diseases in the UK and worldwide. The decision on treatment is currently based on trial and error, as recommendations provide little help in determining the best strategy for each patient. The precision medicine approach is still in its infancy in RA studies, mostly due to the current lack of knowledge regarding the predictive biomarkers needed to help stratify patients to the correct treatment. Thus, the potential efficiency benefits of applying master protocols, such as the umbrella trial design, have not been explored in RA.

Using data available from the Centre for Experimental Medicine & Rheumatology (EMR) department's clinical trials, the first aim of this project is to identify (and validate) a treatment response predictive model for selecting patients for treatment. Statistical modelling and machine learning methods will be applied to model clinical response. Then, several umbrella design scenarios will be simulated. These simulations will use the existing trial data to investigate what might have happened if the patients had enrolled in a trial with a different design.


14:30: Daniel Vulkan (Centre for Cancer Prevention, Wolfson Institute of Preventive Medicine, Queen Mary University of London)

Liverpool Lung Project lung cancer risk stratification model: calibration and prospective validation


Natural Language Processing of real world clinical data for predicting diabetic foot complications

Vascular complications of diabetes are increasingly prevalent and, although symptoms can be surgically treated, are associated with increased mortality. Some problems, such as nerve damage, artery blockages and kidney disease, are known to be associated with foot disease. We hope to gain insight into unobserved disease and lifestyle features in order to identify who is at higher risk of developing an active diabetic foot problem.

Structured data from patient electronic health records is used directly for exploratory data analytics and for a basic understanding of patient journeys. However, for deeper analysis of the clinical journeys of patients and a broader perspective of a patient’s health status, higher quality data is required. We use Natural Language Processing to convert clinical notes into structured SNOMED CT coded data, which can be queried and modelled at a population level. This will allow deeper analysis of the clinical journeys of patients for development of new predictive tools and better prevention of diabetic foot disease.
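As a toy illustration of the coding step only, the sketch below maps free-text mentions to SNOMED CT codes via a small term dictionary. Real pipelines use full clinical NLP systems with negation and context handling, and the codes shown here are illustrative placeholders that should be verified against a SNOMED CT release before any use:

```python
import re

# Minimal term dictionary; codes are illustrative examples only
TERM_TO_SNOMED = {
    r"\bdiabetes\b": "73211009",      # example: Diabetes mellitus
    r"\bfoot ulcer\b": "371087003",   # example code, verify before use
    r"\bneuropathy\b": "386033004",   # example code, verify before use
}

def code_note(note: str) -> list[str]:
    """Return SNOMED CT codes for terms found in a free-text note."""
    text = note.lower()
    return [code for pattern, code in TERM_TO_SNOMED.items()
            if re.search(pattern, text)]

print(code_note("Known diabetes; new foot ulcer on review."))
```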

2020

Anne will present some early findings and analysis plans for the evaluation of the use of faecal immunochemical testing (FIT) in the NHS England clinical service that she developed with Dr. Kevin Monahan of St. Mark’s Hospital (London North West Healthcare University NHS Trust); more specifically, the utilisation of FIT for the risk stratification of colonoscopy surveillance in patients with Lynch syndrome.

14:30 - 15:00: Joy Li (Centre for Cancer Prevention, Wolfson Institute of Preventive Medicine, Queen Mary University of London)

Joy will present some results from the 2014 FIT pilot study in the context of the current COVID-19 situation; in particular, the impact of changes to the inter-screening interval and FIT threshold in the national bowel cancer screening programme in England.