Sarah Pungitore

About Me

I am a fifth-year Ph.D. student in the Applied Mathematics program at the University of Arizona and a member of the Computational Medicine and INformatics (COM-IN) Collaboratory.

I am interested in machine learning approaches to clinical research informatics problems. Applying machine learning to healthcare and clinical problems brings a range of sociotechnical challenges, particularly in translational and implementation science. I enjoy bridging the gap between disparate stakeholders in medical research and applying my technical background to advance discussions and produce tangible, equitable improvements in patient outcomes. After my doctoral training, I hope to become a technical leader in the healthcare industry, developing, validating, and implementing digital health tools and facilitating collaborations between researchers, clinicians, patient advocates, and developers.

In my free time, I enjoy yoga, hiking, climbing, and hanging out with reptiles.

Research Interests


Clinical Decision Support

Reproducibility and Interoperability


Acute Respiratory Failure


Current Research Projects

Figure: Overview of methods for T-BIRM in prediction of outcomes involving patients with acute respiratory failure (ARF).

Data Shifts in Electronic Health Record Data


Temporal electronic health record (EHR) data are often preferred for clinical prediction tasks because they offer a more complete representation of a patient's pathophysiology than static data. Temporality is especially critical for clinical prediction in acute care, because aggregating measurements reduces the ability to detect changes in patient status over short timescales. Numerous studies have applied machine learning models to clinical prediction problems, and many of these models, especially those leveraging temporal data, can successfully predict clinical outcomes and provide useful decision support.
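To illustrate why aggregation can hide short-timescale changes, the sketch below compares a single daily mean with 4-hour window means for one patient. The SpO2 values, the 4-hour window, and the 90% alert threshold are all hypothetical and chosen only for illustration; they are not drawn from any study described here.

```python
from statistics import mean

# Hypothetical hourly SpO2 readings (%) for one patient over 24 hours:
# stable for 20 hours, then a sharp 4-hour desaturation.
hourly_spo2 = [96] * 20 + [88, 86, 85, 87]

# Static representation: a single aggregate over the whole day.
daily_mean = mean(hourly_spo2)

# Temporal representation: means over non-overlapping 4-hour windows.
window = 4
windowed_means = [
    mean(hourly_spo2[i:i + window]) for i in range(0, len(hourly_spo2), window)
]

threshold = 90  # hypothetical alert threshold, for illustration only
print(f"daily mean: {daily_mean:.1f}")
print("windowed means:", windowed_means)
print("daily mean below threshold:", daily_mean < threshold)
print("any window below threshold:", any(m < threshold for m in windowed_means))
```

With these made-up values, the daily mean stays above the threshold while the final 4-hour window falls well below it, which is the kind of change a static, aggregated feature can miss.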

However, few models have demonstrated the ability to perform well across different clinical environments. This limits their clinical utility, because they cannot be implemented at multiple hospitals without significant modification. My current research examines data shifts (changes in the underlying data distribution, such as between hospitals, that degrade model performance) in models using temporal EHR data. This work would allow researchers to develop site-agnostic models while still leveraging previously successful techniques. These results could directly support improved patient care in a variety of clinical disciplines, especially those with high potential for automated clinical decision support systems.


I am currently developing Temporal Bayesian Invariant Risk Minimization (T-BIRM). The goal of T-BIRM is to extract invariant features across patient samples from different hospital locations to improve model predictions when data shifts are present.
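As the name indicates, T-BIRM is a variant of invariant risk minimization (IRM). As a minimal sketch of the general idea only, the code below computes the IRMv1 objective of Arjovsky et al. (2019) for a binary outcome, treating each hospital site as an environment: the summed per-site risk plus a penalty on the squared gradient of each site's risk with respect to a fixed scalar multiplier on the model's output. This is not T-BIRM itself; its Bayesian and temporal components are not shown, and the function names, site names, and logit values are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def mean_logistic_loss(logits, labels) -> float:
    # Binary cross-entropy computed from logits: softplus(f) - y * f.
    total = 0.0
    for f, y in zip(logits, labels):
        total += max(f, 0.0) + math.log1p(math.exp(-abs(f))) - y * f
    return total / len(logits)

def irmv1_penalty(logits, labels) -> float:
    # Squared gradient of the mean loss w.r.t. a dummy scale w on the logits,
    # evaluated at w = 1.0. For BCE, d/dw loss(w * f, y) = (sigmoid(w * f) - y) * f.
    grad = sum((sigmoid(f) - y) * f for f, y in zip(logits, labels)) / len(logits)
    return grad ** 2

def irm_objective(envs: dict, lam: float) -> float:
    # envs maps a (hypothetical) site name to (logits, labels) for that site.
    risk = sum(mean_logistic_loss(f, y) for f, y in envs.values())
    penalty = sum(irmv1_penalty(f, y) for f, y in envs.values())
    return risk + lam * penalty

# Hypothetical model outputs (logits) and outcomes at two sites.
envs = {
    "site_a": ([2.0, -1.0, 0.5], [1, 0, 1]),
    "site_b": ([1.5, 0.3, -0.8], [0, 1, 0]),
}
print(irm_objective(envs, lam=1.0))
```

Larger values of `lam` favor predictors whose optimal output scaling is the same at every site. In practice this objective would be minimized over the model's parameters, which the snippet does not do; it only evaluates the objective for fixed outputs.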

Computable Phenotypes for Post-Acute Sequelae of SARS-CoV-2 (PASC)


Post-acute sequelae of SARS-CoV-2 (PASC), or Long COVID, is an increasingly recognized yet incompletely understood public health concern. Computable phenotypes (reproducible and interpretable disease concepts) are often used to characterize heterogeneous diseases. Several studies have examined PASC phenotypes to better classify, understand, and treat different disease subgroups. However, many gaps in PASC phenotyping research exist, including a lack of the following: 1) standardized definitions for PASC based on symptomatology; 2) generalizable and reproducible phenotyping heuristics and meta-heuristics; and 3) phenotypes based on both COVID-19 severity and symptom duration.


Working with other COM-IN members, I defined computable phenotypes and meta-heuristics (rules and definitions that structure the development of the computable phenotypes) for PASC based on COVID-19 severity and symptom duration. As part of the phenotyping process, we developed a symptom profile for PASC based on the Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) data standard to promote further standardization of PASC phenotyping methods. 


We identified four phenotypes based on COVID-19 severity (mild vs. moderate/severe) and duration of PASC symptoms (subacute vs. chronic). We also characterized individuals from these phenotypes with respect to age, gender, and number and type of PASC symptoms.
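The four-phenotype grid can be expressed as two binary rules. The sketch below assigns each patient to one of the four severity-by-duration phenotypes and tallies symptoms per phenotype. How severity was determined and where the subacute/chronic boundary falls are not stated here, so the `moderate_or_severe` flag, the `chronic_cutoff_days` parameter, the 84-day value used in the example, and the symptom names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Patient:
    moderate_or_severe: bool       # hypothetical flag for acute COVID-19 severity
    symptom_duration_days: int     # hypothetical: how long PASC symptoms persisted
    symptoms: list = field(default_factory=list)  # hypothetical symptom labels

def assign_phenotype(p: Patient, chronic_cutoff_days: int) -> str:
    # Two binary rules (severity, duration) combine into four phenotypes.
    severity = "moderate/severe" if p.moderate_or_severe else "mild"
    duration = "chronic" if p.symptom_duration_days >= chronic_cutoff_days else "subacute"
    return f"{severity}, {duration}"

def symptom_counts_by_phenotype(patients, chronic_cutoff_days: int) -> dict:
    # Count how many patients in each phenotype report each symptom
    # (each symptom is counted at most once per patient).
    counts: dict = {}
    for p in patients:
        phenotype = assign_phenotype(p, chronic_cutoff_days)
        counts.setdefault(phenotype, Counter()).update(set(p.symptoms))
    return counts

# Hypothetical patients; the 84-day cutoff is an example value only.
patients = [
    Patient(False, 40, ["fatigue", "cough"]),
    Patient(False, 120, ["fatigue"]),
    Patient(True, 30, ["dyspnea", "fatigue"]),
    Patient(True, 200, ["dyspnea"]),
]
for phenotype, counts in symptom_counts_by_phenotype(patients, chronic_cutoff_days=84).items():
    print(phenotype, dict(counts))
```

Keeping the cutoff as an explicit parameter makes the duration rule easy to change and report, which fits the goal of reproducible phenotyping heuristics.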

Figure: Frequency of symptoms from each symptom group in the four phenotypes developed for PASC.

Selected Publications

Journal Publications

Pungitore S, Subbian V. Assessment of Prediction Tasks and Time Window Selection in Temporal Modeling of Electronic Health Record Data: A Systematic Review. Journal of Healthcare Informatics Research. 2023; doi:10.1007/s41666-023-00143-4.



Fisher JM, Subbian V, Essay P, Pungitore S, Bedrick EJ, Mosier JM. Outcomes in Patients with Acute Hypoxemic Respiratory Failure Secondary to COVID-19 Treated with Noninvasive Respiratory Support versus Invasive Mechanical Ventilation. medRxiv (Preprint). 2022.



Douglas MJ, Bell BW, Kinney A, Pungitore S, Toner BP. Early COVID-19 respiratory risk stratification using machine learning. Trauma Surgery and Acute Care Open. 2022; 7(1).


Conference Publications

Pungitore S, Olorunnisola T, Mosier J, Subbian V. Computable Phenotypes for Post-acute sequelae of SARS-CoV-2: A National COVID Cohort Collaborative Analysis. AMIA 2023 Annual Symposium. 2023. New Orleans, LA.



Douglas MJ, Bell BW, Pungitore S, Toner BP. Early COVID-19 Respiratory Risk Stratification Using Machine Learning. Machine Learning for Healthcare 2021, Virtual Conference. 2021.




Last updated April 11, 2023