Dr. Tariq Mahmood – Professor, Department of Computer Science, School of Mathematics and Computer Science, Institute of Business Administration (IBA), Karachi
Dr. Naveed ur Rehman Siddiqui – Assistant Professor and Section Head, Department of Paediatrics and Child Health, Division of Women and Child Health, Aga Khan University Hospital (AKUH), Karachi
Hina Farheen – MS Data Science Student, Institute of Business Administration (IBA), Karachi
Akifa Khan – MS Data Science Student, Institute of Business Administration (IBA), Karachi
Ayesha Salman – MS Data Science Student, Institute of Business Administration (IBA), Karachi
Our team comprises three students of the Master’s in Data Science program at the Institute of Business Administration (IBA), Karachi, one of the leading institutes in Pakistan. These students (Hina, Akifa, and Ayesha) were supervised by Dr. Tariq Mahmood, a Professor at IBA, along with Dr. Naveed ur Rehman Siddiqui, Assistant Professor of Paediatrics at the Aga Khan University Hospital (AKUH), one of the most prestigious and globally recognized hospitals in Pakistan. Our diverse team was brought together by a shared passion for using data science to solve real-world healthcare problems. Notably, all three students come from non-Computer Science backgrounds and are making a career shift toward AI and Data Science with a focus on healthcare, and their motivation throughout this challenge has been commendable. The aspiration was to work on something that would have a strong impact on healthcare. In this challenge, the team worked hard to bring together clinical and technical strengths to build a data-driven solution that enhances pediatric care.
Our approach aimed to develop a robust machine learning methodology that addressed the class imbalance in the provided pediatric sepsis data while using innovative approaches for feature selection (given the large number of features), along with heuristic decisions to handle the extreme complexity of the data. We adopted a hybrid feature selection method that combined domain-expert feature scores with multiple statistical feature scores. We handled class imbalance through standard approaches with diverse sampling strategies. A complicated task was finding the balance between the right data-sampling thresholds and the right number of features.
We decided against using complicated AI and deep learning predictive models (which have become the norm). In our opinion, these models are better suited to big-data tasks and prevent the high-level micromanagement of the data that was essential for the given task. We therefore experimented with a carefully selected set of data science algorithms, including traditional and ensemble methods.
We used the results to create and configure a hybrid ensemble machine learning algorithm, which turned out to be our best model. All our submissions were made with this best ensemble model, along with two other robust individual ML models.
We note that the wide variety of performance metrics specified by the competition organizers made the data science task extremely challenging. Our novel ensemble method was the one that struck the best balance across these performance requirements.
We also fine-tuned classification thresholds through repeated cross-validation to ensure an optimal balance between sensitivity and specificity.
We imputed missing values by replacing them with the feature mean. Although this approach is very basic, prototype experiments indicated that it was more robust under the data imbalance than alternatives such as chained-equation imputation (MICE). In addition, we carefully reviewed the dataset and removed features that had too many missing values, showed inconsistencies, or contained extreme outliers that could not be reasonably fixed through transformation.
Our personalized feature scoring approach combined insights from domain experts with statistical measures. This weighted approach allowed us to prioritize features that were not only statistically significant but also clinically meaningful; a sketch follows below. For preprocessing, we standardized numerical features using a standard scaler, applied one-hot encoding to categorical variables, and discretized some features.
To address the class imbalance, we used various extensions of standard ML oversampling methods, including AI-based generative models. We selected a low sampling ratio, which improved the model’s ability to detect minority-class cases without creating too many synthetic samples. Our aim was to capture meaningful decision boundaries while minimizing the risk of overfitting. We tested different sampling ratios using cross-validation and found that this low-ratio setting offered the best balance between model performance and generalizability.
Our final model was a novel type of ensemble model that combined diverse traditional ML algorithms in a unique way. Initially, we benchmarked several individual ML algorithms. This gave us an idea of the complexity of the data and the individual strengths of each algorithm. At this stage, we also applied and fine-tuned the feature selection and data augmentation approaches. Since performance on most metrics was not satisfactory, we moved to ensemble and other more complex ML models. After thorough experimentation, which again involved feature selection and augmentation, we identified several successful combinations of algorithms.
We used these models to build our custom hybrid ensemble. This was our best model, although several individual models surprisingly displayed almost comparable performance to the ensembles. We then selected the three best-performing models based on cross-validation scores. Overall, the ensemble methods improved robustness and performance compared with traditional individual ML algorithms.
We used iterative, manual hyperparameter tuning to refine our models, guided by cross-validation performance and domain knowledge. Additionally, we computed prediction probabilities on the validation fold and experimented with various thresholds to optimize the F1-score and sensitivity. A clinically conservative fixed threshold of 0.0159 was ultimately selected after empirical testing.
GitHub Repository: Pediatric Sepsis Challenge