Princeton University

COVID-19 Patient Outcome Machine Learning Predictions

Modeling COVID-19 progression for medical triage, clinical trial recruitment, and resource decision making

PIs: Coleen Murphy, Zemer Gitai, Barbara Engelhardt

Problem: A large number of patients are being hospitalized for SARS-CoV-2 infection. Some of these patients (Class 1) will recover with minimal intervention, while others (Class 2) will deteriorate, and a higher proportion of Class 2 patients will die. Ideally, clinicians would be able to triage patients on intake to the emergency room (ER), classifying each patient into one of these two groups with high accuracy. Valuable hospital resources could then be allocated appropriately to Class 2 patients, saving more lives. For example, Class 2 COVID-19 patients could be treated with drugs that have already been approved by the FDA, or enrolled in clinical studies in a targeted way so that they are fast-tracked for COVID-19 treatment. ICU beds, ventilators, and other resources could be allocated similarly.

Likewise, for COVID-19 clinical drug trials to be most useful and efficient, we need to enroll the patients who are most likely to have worse outcomes after hospitalization (Class 2), but before their condition deteriorates. Because the drugs proposed for testing are very expensive, running trials on all incoming patients is infeasible and wastes resources. Moreover, treating patients who already have severe symptoms is likely to come too late to be effective.

Project goals: We propose to use large-scale hospital inpatient data to 1. model the disease trajectory of COVID-19, 2. distinguish the two classes of patients before severe symptoms develop, and 3. support triage and resource allocation in hospitals. Patients predicted to be in Class 2 could then be enrolled in clinical trials of FDA-approved drugs. Working closely with hospitals and with clinics in more rural areas, we propose to make these triage and decision-making tools broadly available, so that clinical trials, hospitals, and states with different resources, needs, and timelines can coordinate.

Methods: Because the progression of COVID-19, the disease caused by the novel coronavirus, is still poorly understood, we would like to develop treatment guidelines and triage procedures for patients in the ER. When a patient is first seen in the ER, their height, weight, age, gender, ethnicity and race, heart rate, O2, blood pressure, and temperature are measured or recorded, along with their symptoms. Relevant comorbidities, such as cardiovascular disease or asthma, are also recorded. Because ER and ICU resources are scarce, and because choosing appropriate drug interventions is challenging, we plan to model the data with time series analyses and to apply two separate types of sequential decision-making models to the most current anonymized COVID+ ER patient data.
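As a minimal sketch of the two-class triage problem only, the snippet below fits a logistic regression to a handful of intake measurements and returns an estimated probability that a patient is in Class 2. The `IntakeRecord` fields, the choice of logistic regression, and the training settings are assumptions for illustration; this is not the project's model, and it does not cover the time series or sequential decision-making components.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class IntakeRecord:
    """Hypothetical ER intake record; field names and units are illustrative only."""
    age: float            # years
    heart_rate: float     # beats per minute
    o2: float             # O2 reading as recorded at intake (units assumed consistent)
    temperature: float    # degrees Celsius
    has_comorbidity: int  # 1 if e.g. cardiovascular disease or asthma is recorded


def features(r: IntakeRecord) -> List[float]:
    # Column 0 is a constant intercept term.
    return [1.0, r.age, r.heart_rate, r.o2, r.temperature, float(r.has_comorbidity)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TriageModel:
    """Logistic regression fit by batch gradient descent.

    Label 1 means Class 2 (expected to deteriorate); label 0 means Class 1.
    """

    def fit(self, records: Sequence[IntakeRecord], labels: Sequence[int],
            lr: float = 0.1, epochs: int = 2000) -> "TriageModel":
        X = [features(r) for r in records]
        n, d = len(X), len(X[0])
        # Per-column mean and standard deviation, used to standardize inputs.
        self.mean = [sum(row[j] for row in X) / n for j in range(d)]
        self.std = [math.sqrt(sum((row[j] - self.mean[j]) ** 2 for row in X) / n) or 1.0
                    for j in range(d)]
        Z = [self._scale(row) for row in X]
        self.w = [0.0] * d
        for _ in range(epochs):
            grad = [0.0] * d
            for z, y in zip(Z, labels):
                err = sigmoid(sum(w * x for w, x in zip(self.w, z))) - y
                for j in range(d):
                    grad[j] += err * z[j]
            # Gradient step on the mean negative log-likelihood.
            self.w = [w - lr * g / n for w, g in zip(self.w, grad)]
        return self

    def _scale(self, row: List[float]) -> List[float]:
        # Leave the intercept column untouched; standardize the rest.
        return [row[0]] + [(row[j] - self.mean[j]) / self.std[j] for j in range(1, len(row))]

    def predict_proba(self, r: IntakeRecord) -> float:
        """Estimated probability that the patient belongs to Class 2."""
        z = self._scale(features(r))
        return sigmoid(sum(w * x for w, x in zip(self.w, z)))
```

`predict_proba` returns a score between 0 and 1; any threshold for calling a patient Class 2 would have to be chosen and validated on real outcome data.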

Data: Many methods already exist for emergency department (ED) and inpatient data in the same format as the MIMIC-III data set. We hope to encourage hospitals to adopt this basic format, which removes dates and other potentially identifying information but preserves demographics, comorbidities, family history, current medications, and relative timestamps of admission, transfer, discharge or death, ventilation, drug administration, lab tests, and vital sign measurements.
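A minimal sketch of the "remove dates, keep relative timestamps" idea, assuming each patient record is a Python dict with a hypothetical `admit_time` field and a list of timestamped `events`: identifying fields are dropped and every event time becomes hours since admission. The field names are assumptions, not the MIMIC-III schema. (MIMIC-III itself uses a related approach, shifting each patient's dates by a random offset so that intervals between events are preserved.)

```python
from datetime import datetime
from typing import Any, Dict

# Hypothetical identifying fields, for illustration only; a real pipeline would
# follow the hospital's own de-identification policy.
IDENTIFYING_FIELDS = {"name", "mrn", "address", "date_of_birth"}


def to_relative_format(record: Dict[str, Any]) -> Dict[str, Any]:
    """Drop identifying fields and replace absolute event times with hours since admission."""
    admit: datetime = record["admit_time"]
    # Keep everything except identifiers and the absolute-time fields.
    out = {k: v for k, v in record.items()
           if k not in IDENTIFYING_FIELDS and k not in ("admit_time", "events")}
    out["events"] = [
        {**{k: v for k, v in event.items() if k != "time"},
         "hours_since_admit": (event["time"] - admit).total_seconds() / 3600.0}
        for event in record["events"]
    ]
    return out
```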

Impact: These methods would allow us to 1. develop and deploy an app to recruit patients for time-critical clinical trials across the country; 2. deploy an app that lets ER clinicians and inpatient coordinators enter a patient's data to determine key time points in the progression of the disease, and where specific interventions might lead to better outcomes for COVID+ patients; and 3. develop a model for evaluating the uncertainty around the effect estimates of specific drugs across diverse types of COVID+ patients.
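One simple way to attach uncertainty to an effect estimate is a percentile bootstrap. The sketch below, an illustration rather than the project's planned model, compares the rate of good outcomes between treated and untreated patients; running it separately on each patient subgroup gives per-subgroup intervals. It reflects sampling variability only: with observational hospital data, the raw difference does not adjust for confounding, so it should not be read as a causal drug effect.

```python
import random
from typing import Sequence, Tuple


def rate_difference(treated: Sequence[int], control: Sequence[int]) -> float:
    """Difference in the fraction of good outcomes (1 = good, 0 = bad)."""
    return sum(treated) / len(treated) - sum(control) / len(control)


def bootstrap_ci(treated: Sequence[int], control: Sequence[int],
                 n_boot: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float, float]:
    """Point estimate plus a percentile-bootstrap (1 - alpha) interval."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed.
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        estimates.append(rate_difference(t, c))
    estimates.sort()
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return rate_difference(treated, control), lower, upper
```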

Related work: The Engelhardt Group has existing work on MIMIC-III and Hospital of the University of Pennsylvania (HUP) data, including modeling and smoothing time series data, predicting the effects of specific interventions on hospital patients, and developing safe patient-care policies for weaning patients from mechanical ventilation and for ordering lab tests. We also have previous work on multi-armed bandits in highly constrained settings that assume correlations among the treatments (here, the candidate drugs). The Murphy group has used machine learning methods to predict tissue-specific gene expression and aging outcomes. The Gitai lab has used machine learning methods to predict bacterial growth patterns and characterize antibiotics, and also just likes being involved.
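For readers unfamiliar with multi-armed bandits, the sketch below shows Thompson sampling, a standard bandit algorithm, choosing among treatments with binary outcomes. It is a deliberate simplification of the setting described above: arms are independent Beta-Bernoulli models, so correlations among drugs and resource constraints are not modeled, and the success rates passed to the hypothetical `simulate` helper are made-up values for a simulation, not clinical data.

```python
import random
from typing import List, Sequence


class BetaBernoulliThompson:
    """Thompson sampling over independent arms with uniform Beta(1, 1) priors.

    Arms are modeled independently, so correlations among treatments
    are NOT captured here.
    """

    def __init__(self, n_arms: int, seed: int = 0) -> None:
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms
        self.rng = random.Random(seed)

    def select_arm(self) -> int:
        # Sample a plausible success rate for each arm from its Beta posterior,
        # then play the arm with the highest sampled rate.
        draws = [self.rng.betavariate(1 + s, 1 + f)
                 for s, f in zip(self.successes, self.failures)]
        return max(range(len(draws)), key=lambda i: draws[i])

    def update(self, arm: int, reward: int) -> None:
        # Conjugate update: a success raises alpha, a failure raises beta.
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def simulate(true_rates: Sequence[float], rounds: int, seed: int = 0) -> List[int]:
    """Run the bandit against simulated arms; return how often each arm was chosen."""
    bandit = BetaBernoulliThompson(len(true_rates), seed=seed)
    env = random.Random(seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        arm = bandit.select_arm()
        pulls[arm] += 1
        bandit.update(arm, int(env.random() < true_rates[arm]))
    return pulls
```

Over many rounds the sampler shifts most of its choices toward the arm with the highest simulated success rate while still occasionally trying the others.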

Contact: email Prof Barbara Engelhardt (bee@princeton.edu), Prof Zemer Gitai (zgitai@princeton.edu), or Prof Coleen Murphy (ctmurphy@princeton.edu) for more details and to get involved.

For Hospital Administrators and Data Scientists: We have put together a document explaining how to share data. We have talked to a number of hospitals that want to develop tools like the ones we propose in-house but lack the in-house expertise to do so. We believe that by sharing COVID+ patient data, hospitals could substantially accelerate this process and have these tools developed for them for free, letting them focus on deployment rather than tool development.

Example of a ventilated MIMIC-III ICU patient. Vitals are measured at varying sampling intervals. Ventilation times are marked, and multiple administered sedatives (given both as continuous IV drips and as discrete boluses) are shown. (from Prasad et al. 2017)
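Because vitals arrive at irregular intervals, a common preprocessing step is to align them on a regular time grid before modeling. The sketch below uses last observation carried forward, a deliberately simple alternative to model-based smoothing; the function name, grid step, and (hours since admission, value) input format are assumptions for illustration.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def resample_locf(observations: Sequence[Tuple[float, float]],
                  step: float, end: float) -> List[Optional[float]]:
    """Resample irregular (time, value) pairs onto the grid 0, step, 2*step, ... <= end.

    Each grid point takes the most recent observation at or before it
    (last observation carried forward); grid points before the first
    observation are None.
    """
    obs = sorted(observations)
    times = [t for t, _ in obs]
    grid: List[Optional[float]] = []
    k = 0
    while k * step <= end:
        # Number of observations at or before this grid time.
        i = bisect_right(times, k * step)
        grid.append(obs[i - 1][1] if i > 0 else None)
        k += 1
    return grid
```

For example, heart-rate readings of 90, 110, and 105 at 0.5, 2.2, and 2.9 hours, resampled hourly up to hour 4, give `[None, 90.0, 90.0, 105.0, 105.0]`.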

Existing Resources for COVID-19

  • ISARIC: Based in Oxford, UK; collates clinical data from around the world for COVID-19 patients, collected on a standardized case report form available on its website.

  • SCCM Virus Registry: A real-time COVID-19 registry of current ICU and hospital care patterns, used to evaluate the safety and observational effectiveness of COVID-19 practices and to measure variation in practice across hospitals.

  • PETAL: An NHLBI-funded group that collects data similar to the SCCM Virus Registry's at 15 sites in the US.

  • 23andMe COVID study: COVID-19 resources available to researchers by application, including access to medical information and genotype data.

  • South Korea Medical Records: South Korea released five years of medical history, drawn from insurance claims, for COVID+ patients.

  • ASU Data Hub: A collection of publicly available COVID-19 data.

  • CoronaWhy: A Kaggle-based group of volunteers who manage data repositories and contribute time to COVID-19 response projects.



Existing Resources for anonymized hospital inpatient data

  • MIMIC-III: Anonymized hospital inpatient data for more than 40,000 patients admitted to critical care units at Beth Israel Deaconess Medical Center between 2001 and 2012.


Existing information on methods and tools for COVID-19

  • COVID Zoomposium: Recorded lecture from Prof. Irizarry at DFCI and Harvard about COVID. How do we model disease dynamics? How do we test and why is it important? How do we evaluate vaccines and therapies? How do we obtain and report data?

  • CHIME: A UPenn hospital system tool for resource allocation during COVID-19.

  • Stanford's Deterioration Index: