Before Data Transformation Snapshot
This is the original hospital dataset. Its attributes include Hospital Type, Hospital Ownership, ZIP Code, and various performance metrics. Many fields contain "Not Available" or are missing entirely, so the data must be cleaned before modeling.
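A minimal sketch of the cleaning step, assuming the data is loaded into a pandas DataFrame. The sample rows below are made up for illustration; only the column names and the "Not Available" placeholder come from the description above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the real file has many more rows and columns.
raw = pd.DataFrame({
    "Hospital Type": ["Acute Care Hospitals", "Critical Access Hospitals", "Not Available"],
    "Count of MORT Measures Better": ["1", "Not Available", "0"],
})

# Treat the "Not Available" placeholder as a true missing value.
clean = raw.replace("Not Available", np.nan)

# Count columns arrive as strings; coerce them to numbers (NaN stays NaN).
clean["Count of MORT Measures Better"] = pd.to_numeric(
    clean["Count of MORT Measures Better"], errors="coerce"
)

# Missing values per column, to help decide between dropping and imputing.
missing_per_column = clean.isna().sum()
print(missing_per_column)
```

Whether to drop or impute the affected rows depends on how many values are missing in each column, which this count makes visible.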
After Data Transformation Snapshot
Features used include:
Count of MORT Measures Better (mortality)
Count of READM Measures Better (readmission)
One-hot encoded hospital types
Target: Hospital Ownership (label-encoded as y_cls)
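The feature and target construction could look roughly like the sketch below. It assumes pandas and scikit-learn's LabelEncoder; the rows are illustrative, and the exact encoding calls used in the original analysis are not stated.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical cleaned rows; values are illustrative only.
df = pd.DataFrame({
    "Hospital Type": ["Acute Care Hospitals", "Critical Access Hospitals", "Acute Care Hospitals"],
    "Hospital Ownership": ["Proprietary", "Government - Local", "Proprietary"],
    "Count of MORT Measures Better": [1, 0, 2],
    "Count of READM Measures Better": [0, 0, 1],
})

# One-hot encode the hospital type; keep the two count columns unchanged.
X = pd.get_dummies(
    df[["Count of MORT Measures Better", "Count of READM Measures Better", "Hospital Type"]],
    columns=["Hospital Type"],
)

# Label-encode the ownership category into integer class ids.
encoder = LabelEncoder()
y_cls = encoder.fit_transform(df["Hospital Ownership"])

print(X.columns.tolist())
print(dict(zip(encoder.classes_, range(len(encoder.classes_)))))
```

LabelEncoder assigns ids in sorted order of the class names, so the printed mapping is what links a numeric label back to its ownership category.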
Interpretation:
The feature space is sparse and the classes are imbalanced.
Class frequencies in y_cls are heavily skewed: label 7 appears frequently, while most other labels are rare.
Results
Best Parameters:
class_weight='balanced'
max_depth=None
min_samples_split=5
n_estimators=50
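The parameter names above match scikit-learn's RandomForestClassifier, so the sketch below assumes that model and a cross-validated grid search; the source states neither explicitly. The synthetic data, the grid values other than the reported best ones, the scoring metric, and the fold count are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in for the hospital features and y_cls (assumption).
X, y_cls = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_classes=3,
    weights=[0.7, 0.2, 0.1], random_state=0,
)

# Hypothetical grid that contains the reported best values.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_split": [2, 5],
    "class_weight": [None, "balanced"],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1_weighted",  # assumed metric; the source does not say which was used
    cv=3,
)
search.fit(X, y_cls)
print(search.best_params_)
```

On imbalanced data, a stratified split (the default for classifiers in GridSearchCV) keeps rare classes present in each fold, though classes with fewer samples than folds will still trigger a warning.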
Classification Metrics:
Precision and recall of 1.00 for the Government - Hospital District or Authority class
Precision and recall of 0.00 for most other classes
Overall accuracy: ~17.6%
Weighted F1-score: ~0.12
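To show how one perfectly classified class can coexist with very low overall scores, here is a small sketch using scikit-learn's metric functions. The labels are toy values, not the report's predictions: class 2 is always right, and every other class is always wrong.

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score

# Toy labels (assumption): only the single class-2 sample is predicted correctly.
y_true = [0, 0, 0, 1, 1, 1, 2, 3, 3, 3]
y_pred = [1, 1, 3, 0, 3, 3, 2, 0, 0, 1]

accuracy = accuracy_score(y_true, y_pred)

# Weighted F1 averages per-class F1, weighting each class by its support.
weighted_f1 = f1_score(y_true, y_pred, average="weighted", zero_division=0)

print(f"accuracy:    {accuracy:.2f}")     # 0.10
print(f"weighted F1: {weighted_f1:.2f}")  # 0.10

# Per-class precision, recall, and F1; zero_division=0 silences warnings
# for classes that are never predicted.
print(classification_report(y_true, y_pred, zero_division=0))
```

Because the weighted average scales each class by its support, a perfect score on a class with few test samples contributes little to the overall figure.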
Interpretation:
The model's predictions are heavily skewed toward one dominant class.
Class imbalance remains a major issue even with class_weight='balanced'.
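For context on why reweighting alone may not be enough: scikit-learn's 'balanced' mode sets each class weight to n_samples / (n_classes * class_count). It rescales how much each sample counts during training but cannot add examples of the rare classes. A short sketch with a toy label vector (an assumption, not the report's y_cls):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy label vector (assumption): class 2 dominates, classes 0 and 1 are rare.
y = np.array([0, 1, 1, 2, 2, 2, 2, 2, 2, 2])

# Manual version of the 'balanced' heuristic.
counts = np.bincount(y)                    # samples per class
manual = len(y) / (len(counts) * counts)   # n_samples / (n_classes * count)

# The same weights as computed by scikit-learn.
weights = compute_class_weight("balanced", classes=np.unique(y), y=y)

for label, count, weight in zip(np.unique(y), counts, weights):
    print(f"class {label}: count={count}, weight={weight:.3f}")
```

When a class has only a handful of samples, its weight becomes large but the trees still see very few distinct examples of it, so resampling or merging rare categories may be worth exploring alongside reweighting.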