Overall Performance
For this study, we evaluated performance using a macro F1 score based on a binary classification: “severe” drought (a score of 2.5 or higher, or USDM categories D2 and above) vs. non-severe drought (a score below 2.5, or USDM categories D1 and lower). This binary framing was chosen to evaluate how well models predict the comparatively rare severe drought scores against the more common lower scores. Weighted F1 scores are provided for comparison, but macro F1 is the primary evaluation metric: because severe drought is increasing in frequency, severity, and length, macro F1, which weights the rare severe class equally with the common non-severe class, likely better represents a model’s ability to predict future severe drought even where current conditions show little or no drought.
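The binarization and macro-averaging described above can be sketched as follows. This is a minimal illustration, not the study's evaluation code: the threshold of 2.5 and the 0-5 score range come from the text, while the `binarize`, `f1_for_class`, and `macro_f1` helpers and the example arrays are hypothetical.

```python
# Sketch of the binary severe-drought evaluation described above.
# The 2.5 threshold follows the text; the example arrays are illustrative.

def binarize(scores, threshold=2.5):
    """Map continuous drought scores (0-5) to 1 = severe (>= 2.5), 0 = non-severe."""
    return [1 if s >= threshold else 0 for s in scores]

def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores, so the rarer
    'severe' class counts as much as the common 'non-severe' class."""
    return (f1_for_class(y_true, y_pred, 0) + f1_for_class(y_true, y_pred, 1)) / 2

# Illustrative example: mostly non-severe weeks, a few severe ones
actual    = [0.5, 1.0, 3.1, 2.7, 0.2, 4.0, 1.8]
predicted = [0.4, 1.2, 2.9, 2.2, 0.3, 3.6, 1.5]
print(macro_f1(binarize(actual), binarize(predicted)))
```

The unweighted average is what distinguishes macro F1 from weighted F1: with weighted averaging, the abundant non-severe class would dominate the score and mask poor performance on severe drought.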
The other evaluation metrics used were mean squared error (MSE) and mean absolute error (MAE), which quantify the difference between actual and predicted scores. MSE is the average of the squared differences between the actual and predicted values, while MAE is the average of the absolute differences between the actual and predicted values.
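The two definitions above can be written directly as a short sketch; the helper names and example values here are illustrative, not the study's code or results.

```python
# Minimal sketch of the two error metrics as defined above.

def mse(actual, predicted):
    """Mean squared error: average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean absolute error: average of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Illustrative drought scores on the 0-5 scale
actual    = [3.0, 1.5, 4.2, 0.8]
predicted = [2.6, 1.9, 3.7, 1.0]
print(mse(actual, predicted))
print(mae(actual, predicted))
```

Because MSE squares each difference, it penalizes large misses more heavily than MAE, which is why the two can rank models slightly differently.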
The overall performance of the tested models is shown below:
Figure 1: Final model results
*Macro F1 scores are from a binary classification of the ability to predict “severe” (USDM D2 or higher) drought scores of 2.5 or higher in the dataset
Key Takeaways:
The best-performing model by macro F1 score, i.e., the ability to accurately predict severe or higher drought scores, was the LSTM model, followed by the XGBoost model
The CNN and Random Forest models had lower but similar macro F1 scores for severe drought, along with higher MSE and MAE values in comparison
All tested models outperformed the baseline model on all metrics
The LSTM had a macro F1 score of 0.90 for predicting severe drought, with an MSE of 0.32 and an MAE of 0.33
The MAE indicates that the model's predictions differed from the actual drought score by around 0.33 on average, on a drought scale of 0-5
The XGBoost model performed similarly, though with a slightly worse MAE of 0.35, i.e., an average difference of 0.35 between the predicted and actual scores
Please proceed to the other pages in our Results section for additional figures and analysis of our two best-performing models, XGBoost and LSTM.