Colorado vs Utah

Random Forest
- Hospital Data -

Before Data Transformation Snapshot
The original hospital dataset contains attributes such as Hospital Type, Hospital Ownership, ZIP Code, and various performance metrics. Many fields contain "Not Available" or are missing entirely, so the data must be cleaned before modeling.
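A minimal sketch of the kind of cleaning this implies, assuming pandas is used. The rows below are invented for illustration, and dropping rows without an ownership label is an assumption rather than a step the original confirms.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real file; the values are invented for illustration.
raw = pd.DataFrame({
    "Hospital Type": ["Acute Care Hospitals", "Critical Access Hospitals", "Acute Care Hospitals"],
    "Hospital Ownership": ["Proprietary", "Not Available", "Government - Hospital District or Authority"],
    "Count of MORT Measures Better": ["1", "0", "Not Available"],
})

# Turn the literal placeholder string into a real missing value.
clean = raw.replace("Not Available", np.nan)

# Counts arrive as text; convert them to numbers (unparseable entries become NaN).
clean["Count of MORT Measures Better"] = pd.to_numeric(
    clean["Count of MORT Measures Better"], errors="coerce"
)

# Assumption: rows without a target label are dropped, since they cannot be used for training.
clean = clean.dropna(subset=["Hospital Ownership"])

# How many values are still missing in each remaining column.
missing_per_column = clean.isna().sum()
print(missing_per_column)
```

Missing feature values that remain after this step still need a policy (imputation or removal) before a random forest can be fit.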

After Data Transformation Snapshot

- Features used include:
  - Count of MORT Measures Better (mortality)
  - Count of READM Measures Better (readmission)
  - One-hot encoded hospital types
- Target: Hospital Ownership (label encoded as y_cls)
- Interpretation:
  - Feature space is sparse and unbalanced.
  - The classes in y_cls are heavily imbalanced (e.g., label 7 appears frequently, while the other labels are rare).
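The transformation described above could look roughly like the sketch below, assuming pandas and scikit-learn. The column names come from the feature list, while the rows and the `type` prefix are invented for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy cleaned rows; the values are invented for illustration.
df = pd.DataFrame({
    "Hospital Type": ["Acute Care Hospitals", "Critical Access Hospitals",
                      "Acute Care Hospitals", "Psychiatric"],
    "Hospital Ownership": ["Proprietary", "Proprietary",
                           "Government - Hospital District or Authority", "Proprietary"],
    "Count of MORT Measures Better": [1, 0, 0, 2],
    "Count of READM Measures Better": [0, 0, 1, 0],
})

numeric_cols = ["Count of MORT Measures Better", "Count of READM Measures Better"]

# One-hot encode the hospital type and join it to the numeric counts.
type_dummies = pd.get_dummies(df["Hospital Type"], prefix="type", dtype=int)
X = pd.concat([df[numeric_cols], type_dummies], axis=1)

# Label-encode the ownership target; classes are numbered in sorted order.
encoder = LabelEncoder()
y_cls = encoder.fit_transform(df["Hospital Ownership"])

# Per-class counts make any imbalance in y_cls visible before training.
class_counts = pd.Series(y_cls).value_counts().sort_index()
print(X.shape)
print(dict(zip(encoder.classes_, class_counts)))
```

Checking `class_counts` before modeling is a cheap way to spot the imbalance noted in the interpretation.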

Results

Best Parameters:
- class_weight='balanced'
- max_depth=None
- min_samples_split=5
- n_estimators=50
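Parameters like these are typically found with a grid search. The sketch below assumes scikit-learn's GridSearchCV, a hypothetical grid that includes the reported values, and synthetic data in place of the hospital features. The scoring metric and fold count are also assumptions, since the original does not state them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced data standing in for the hospital features (assumption).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0,
)

# Hypothetical grid; the reported best values are among the candidates.
param_grid = {
    "class_weight": ["balanced", None],
    "max_depth": [None, 10],
    "min_samples_split": [2, 5],
    "n_estimators": [25, 50],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1_weighted",  # assumption: the scoring metric is not stated in the source
    cv=3,                   # assumption: the number of folds is not stated in the source
)
search.fit(X, y)

# Best combination found on the synthetic data (will differ from the reported one).
print(search.best_params_)
```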
Classification Metrics:
- Precision and recall of 1.00 for Government - Hospital District or Authority
- Very poor performance on most other classes (precision and recall of 0.00)
- Overall accuracy: ~17.6%
- Weighted F1-score: ~0.12
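To make the relationship between these metrics concrete, here is a small sketch using scikit-learn's metric functions on invented labels. It does not reproduce the numbers above. It only shows how per-class scores of 0.00 pull the support-weighted F1 down.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, f1_score

# Toy labels (invented): a classifier that predicts class 0 for every sample.
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 3])
y_pred = np.zeros_like(y_true)

accuracy = accuracy_score(y_true, y_pred)

# Per-class F1; classes that are never predicted score 0 (zero_division=0 silences the warning).
per_class_f1 = f1_score(y_true, y_pred, average=None, zero_division=0)

# Weighted F1 is the per-class F1 averaged with each class's support as its weight.
support = np.bincount(y_true)
manual_weighted_f1 = np.sum(per_class_f1 * support) / support.sum()
weighted_f1 = f1_score(y_true, y_pred, average="weighted", zero_division=0)

print(classification_report(y_true, y_pred, zero_division=0))
```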
Interpretation:
- The model is highly skewed toward one dominant class.
- Class imbalance remains a major issue despite using class_weight='balanced'.
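For context on why reweighting alone may not be enough: with class_weight='balanced', scikit-learn weights each class by n_samples / (n_classes * class_count). The sketch below checks that formula on invented labels. It changes how much each sample counts but adds no new examples of the rare classes.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy encoded labels (invented): label 7 dominates, label 2 is rare.
y_cls = np.array([7] * 8 + [2] * 2)
classes = np.unique(y_cls)

# Weights scikit-learn derives for class_weight="balanced".
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_cls)

# Same numbers from the documented formula: n_samples / (n_classes * class_count).
counts = np.array([(y_cls == c).sum() for c in classes])
manual = len(y_cls) / (len(classes) * counts)

print(dict(zip(classes.tolist(), weights.tolist())))
```

Here the rare label gets a weight four times that of the frequent one, yet the forest still sees only two examples of it.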
