Dr. Matthias Görges – Associate Professor, Department of Anesthesiology, Pharmacology & Therapeutics, UBC; Co-lead, Digital Health Innovation Lab, BC Children's Hospital Research Institute
Role: Principal Investigator.
Dr. Srinivas Murthy – Clinical Associate Professor, Department of Pediatrics, UBC; Investigator, BC Children's Hospital Research Institute
Role: Clinical Consultant.
Rhea Kaul – Research Assistant, BC Children's Hospital Research Institute; Master’s Student, School of Biomedical Engineering, UBC
Role: Student.
Rainie Fu – Research Assistant, School of Population and Public Health, UBC; Undergraduate Student, Department of Computer Science and Statistics, UBC
Role: Student.
Our team unites pediatric clinical expertise and engineering innovation to tackle pressing real-world challenges. Led by seasoned clinicians and digital health researchers, we apply data-driven modeling, from bedside vital sign monitoring to predictive sepsis detection, to advance pediatric critical care. We thrive on collaboration across disciplines and are motivated by translating insights into tools that improve care delivery for children.
We trained an ensemble of six calibrated XGBoost classifiers on complementary feature sets (mutual-information selections, XGBoost-importance selections, clinically curated scores, and two script-derived optimal sets). Out-of-fold (OOF) probabilities from each model were linearly combined via a simplex-constrained weight search. Calibration was two-stage: (i) per-fold sigmoid calibration for each base learner and (ii) a global Platt calibrator fitted on the weighted OOF scores and applied at inference. A data-driven threshold was selected on OOF predictions by maximizing a composite metric under a sensitivity constraint.
Missing data were handled with a clinically guided three-stage approach: (1) age-stratified median imputation for vital signs and related measurements (heart rate, blood pressure, SpO₂, lactate) across developmental stages (neonate to adolescent), (2) global median imputation for the remaining numerical variables, and (3) binary indicators for missing anthropometric and lab values. Patients without outcome labels were excluded. Feature engineering incorporated seven validated pediatric severity scores with age-adapted thresholds (including SOFA, Phoenix, PELOD-2, and PEWS), WHO nutritional z-scores, and novel hemodynamic indices such as the Temperature-Adjusted Mean Shock Index. Tree-based architectures inherently managed outliers without explicit capping. Preprocessing used modular pipelines: StandardScaler for continuous variables and one-hot encoding for categoricals (handling unseen categories), with the pipeline applied inside cross-validation so that transformations stay consistent across folds.
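The three imputation stages can be illustrated with a minimal sketch in plain Python. The age-band cut-offs, the column names, and the helper names (AGE_BANDS, age_band, impute) are assumptions made for illustration; the source does not list its actual bands or variable names.

```python
from statistics import median

# Hypothetical age bands in months (lower bound inclusive, upper bound exclusive).
AGE_BANDS = [("neonate", 0, 1), ("infant", 1, 12), ("child", 12, 144), ("adolescent", 144, 216)]

def age_band(age_months):
    for name, lo, hi in AGE_BANDS:
        if lo <= age_months < hi:
            return name
    return "adolescent"  # ages past the last bound fall into the oldest band

def impute(rows, stratified_cols, global_cols, indicator_cols):
    """Return imputed copies of `rows` (a list of dicts); None marks a missing value."""
    out = [dict(r) for r in rows]
    # Missingness flags are recorded first, before any value is filled in.
    for r in out:
        for c in indicator_cols:
            r[c + "_missing"] = int(r[c] is None)
    # Stage 1: median of the observed values within the same age band.
    for c in stratified_cols:
        by_band = {}
        for r in rows:
            if r[c] is not None:
                by_band.setdefault(age_band(r["age_months"]), []).append(r[c])
        for r in out:
            band = age_band(r["age_months"])
            if r[c] is None and band in by_band:
                r[c] = median(by_band[band])
    # Stage 2: global median for anything that is still missing.
    for c in list(stratified_cols) + list(global_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        if not observed:
            continue
        fill = median(observed)
        for r in out:
            if r[c] is None:
                r[c] = fill
    return out

rows = [
    {"age_months": 0.5, "heart_rate": 150, "weight_kg": 3.2},
    {"age_months": 0.2, "heart_rate": None, "weight_kg": None},
    {"age_months": 0.8, "heart_rate": 140, "weight_kg": 3.6},
    {"age_months": 60, "heart_rate": 100, "weight_kg": 18.0},
    {"age_months": 72, "heart_rate": None, "weight_kg": 20.0},
]
imputed = impute(rows, ["heart_rate"], ["weight_kg"], ["weight_kg"])
```

In this toy data the second patient receives the neonate heart-rate median and the global weight median, plus a weight_kg_missing flag of 1. In a real pipeline the medians would be estimated on training folds only and then reused on held-out data.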
No synthetic augmentation was applied; class imbalance was managed via XGBoost's positive-class reweighting (scale_pos_weight), set separately for each feature set.
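As a rough illustration of the reweighting, the sketch below computes the negative-to-positive ratio that the XGBoost documentation suggests as a starting value for scale_pos_weight. Whether the team used exactly this ratio is not stated in the source; the function name and the toy labels are illustrative assumptions.

```python
def scale_pos_weight(labels):
    """Negative-to-positive count ratio, a common starting value for scale_pos_weight."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0:
        raise ValueError("no positive examples: the ratio is undefined")
    return n_neg / n_pos

# Toy labels with 5% prevalence (illustrative, not the study's data).
labels = [0] * 95 + [1] * 5
weight = scale_pos_weight(labels)  # 95 / 5 = 19.0
```

The resulting value would then be passed as the scale_pos_weight parameter when constructing each XGBoost classifier.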
XGBoost is well-suited to heterogeneous, sparse, and imperfect tabular EHR data, handles non-linearities and interactions, and pairs naturally with class-imbalance reweighting. We trained six XGBoost classifiers independently, each on its own feature set (mutual-information-based selections, XGBoost-derived importance rankings, clinical scores, and hybrid optimal sets); this captures complementary clinical signals and reduces overfitting to any single representation. Each base learner was probability-calibrated on its validation fold using CalibratedClassifierCV (sigmoid) during 5-fold stratified cross-validation, which preserves class ratios in every split, yielding calibrated out-of-fold (OOF) probabilities. This approach avoids data leakage and provides reliable probability estimates for downstream thresholding and decision analysis.
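One way to obtain calibrated OOF probabilities is sketched below with scikit-learn. Several things here are assumptions rather than the team's code: GradientBoostingClassifier stands in for the XGBoost classifier so that the example depends on scikit-learn only, the inner calibration uses cv=3 on each training portion (one plausible reading of "calibrated on its validation fold"), and the data are synthetic.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold

def calibrated_oof(X, y, n_splits=5, seed=0):
    """Return one calibrated out-of-fold probability per row of X."""
    oof = np.zeros(len(y))
    outer = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, valid_idx in outer.split(X, y):
        # Stand-in base learner; an XGBoost classifier would take its place.
        base = GradientBoostingClassifier(n_estimators=30, random_state=seed)
        # Sigmoid (Platt) calibration fitted on internal splits of the training part only.
        model = CalibratedClassifierCV(base, method="sigmoid", cv=3)
        model.fit(X[train_idx], y[train_idx])
        # The held-out fold is scored by a model that never saw it.
        oof[valid_idx] = model.predict_proba(X[valid_idx])[:, 1]
    return oof

# Synthetic, imbalanced toy data (about 20% positives).
X, y = make_classification(n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0)
oof = calibrated_oof(X, y)
```

Because every row is predicted exactly once by a model trained without it, the resulting vector can be used for weight search and threshold selection without leaking labels.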
We constructed a linear ensemble by grid-searching simplex-constrained weights across the six calibrated OOF prediction streams. A global Platt calibration (LogisticRegression) was then applied to the ensemble's weighted score to restore probability calibration. A clinical-operational threshold was selected on the OOF predictions to maximize a composite performance metric that combines F1, AUPRC, Net Benefit, and Expected Calibration Error (ECE) while enforcing a predefined sensitivity requirement. At inference time, we apply the saved threshold, with a percentile-based threshold as a fallback.
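The simplex-constrained weight search can be sketched as an exhaustive scan over weight vectors whose entries are multiples of 1/steps and sum to one. The grid resolution, the function names, and the scoring function are assumptions: the negative Brier score below is only a stand-in for the composite metric, whose exact form the source does not give.

```python
import itertools
import numpy as np

def simplex_grid(n_models, steps):
    """Yield every non-negative weight vector with entries k/steps that sums to 1."""
    for combo in itertools.product(range(steps + 1), repeat=n_models - 1):
        if sum(combo) <= steps:
            # The last weight is whatever remains so that the total is exactly 1.
            yield np.array(combo + (steps - sum(combo),)) / steps

def search_weights(oof_matrix, y, score_fn, steps=10):
    """Return the grid weights (and score) that maximize score_fn on the blended OOF scores."""
    best_w, best_s = None, -np.inf
    for w in simplex_grid(oof_matrix.shape[1], steps):
        s = score_fn(y, oof_matrix @ w)
        if s > best_s:
            best_w, best_s = w, s
    return best_w, best_s

def neg_brier(y, p):
    """Stand-in objective (higher is better); not the composite metric from the text."""
    return -np.mean((p - y) ** 2)

# Toy OOF streams: one informative, one constant, one inverted.
y = np.array([0, 0, 1, 1])
oof_matrix = np.array([
    [0.1, 0.5, 0.9],
    [0.2, 0.5, 0.8],
    [0.8, 0.5, 0.2],
    [0.9, 0.5, 0.1],
])
best_w, best_s = search_weights(oof_matrix, y, neg_brier)
```

On this toy input all weight goes to the informative stream. With six streams and steps=10 the grid holds 3,003 weight vectors, so an exhaustive scan stays cheap.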
This architecture leverages XGBoost's proficiency with heterogeneous clinical data while the dual-calibration ensemble ensures probabilistically reliable and clinically actionable predictions.
Hyperparameter selection for each base learner was guided by a combination of prior domain knowledge and empirical optimization. For XGBoost classifiers, we tuned tree depth, learning rate, subsampling, and column sampling, and explicitly set scale_pos_weight to account for outcome imbalance within each feature set, ensuring sensitivity to rare sepsis cases. We used Optuna-based Bayesian optimization to propose candidate configurations, followed by cross-validation–based selection to stabilize performance estimates. To mitigate calibration drift, every base model was wrapped in CalibratedClassifierCV, with calibration fitted only on held-out folds to prevent information leakage.
At the ensemble level, we performed a grid search over simplex-constrained weights on the six calibrated out-of-fold prediction streams. The ensemble optimizer targeted a composite scoring function balancing F1, AUPRC, Net Benefit, and Expected Calibration Error, while enforcing a hard sensitivity constraint (≥0.95) to reflect clinical safety priorities. To further refine deployment readiness, we applied a global Platt calibrator trained on ensemble out-of-fold predictions, ensuring well-calibrated probabilities across risk thresholds. The final decision threshold was selected from the OOF distribution to maximize the composite objective under the sensitivity constraint, with a percentile-based fallback strategy implemented for test-time robustness.
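Threshold selection under the sensitivity constraint can be sketched as follows. This is an illustration under stated assumptions, not the team's implementation: F1 alone stands in for the composite objective, the fallback percentile (10) is arbitrary, and the fallback is triggered here when no candidate threshold meets the constraint, which is only one possible reading of the fallback described above.

```python
import numpy as np

def sensitivity(y, pred):
    tp = np.sum((pred == 1) & (y == 1))
    return tp / max(np.sum(y == 1), 1)

def f1(y, pred):
    tp = np.sum((pred == 1) & (y == 1))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def select_threshold(y, p, min_sens=0.95, fallback_pct=10):
    """Pick the OOF threshold with the best stand-in score among those meeting min_sens."""
    best_t, best_score = None, -np.inf
    for t in np.unique(p):  # every distinct predicted probability is a candidate
        pred = (p >= t).astype(int)
        if sensitivity(y, pred) < min_sens:
            continue  # hard constraint: discard thresholds that miss too many cases
        score = f1(y, pred)  # stand-in for the composite objective
        if score > best_score:
            best_t, best_score = t, score
    if best_t is None:
        # Hypothetical fallback: a fixed percentile of the score distribution.
        best_t = np.percentile(p, fallback_pct)
    return float(best_t)

# Toy OOF labels and probabilities.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
p = np.array([0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9])
threshold = select_threshold(y, p)
```

On this toy input the lowest-scoring positive sits at 0.4, so every higher threshold violates the constraint and the search settles on the best F1 among the remaining candidates.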
This multi-level tuning pipeline, spanning base learner hyperparameters, calibration, ensemble weighting, and threshold selection, ensured that model optimization aligned with both statistical performance and domain-specific clinical requirements, prioritizing high sensitivity for sepsis detection while minimizing false alarms.
Team website: https://bcchr.ca/dhil