Overall Performance

For this study, we evaluated performance using a macro F1 score on a binary classification task: predicting “severe” drought (a score of 2.5 or higher, corresponding to USDM categories D2 and above) versus non-severe drought (a score below 2.5, corresponding to USDM categories D1 and lower). This binary framing evaluates how well models predict the relatively rare severe drought scores compared with the more common lower scores. A weighted F1 score is provided for comparison, but macro F1 is the primary evaluation metric. Because severe drought is increasing in frequency, severity, and duration, macro F1 will likely better represent a model’s ability to predict future severe drought, even when current conditions show little or no drought.
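The thresholding and scoring described above can be sketched as follows. This is a minimal illustration assuming scikit-learn's `f1_score`; the score arrays are hypothetical examples, not results from the study.

```python
# Sketch: binarize drought scores at the 2.5 threshold and compare
# macro vs. weighted F1 (assumes scikit-learn; data is hypothetical).
from sklearn.metrics import f1_score

SEVERE_THRESHOLD = 2.5  # USDM D2 and above counts as "severe"

actual_scores = [0.0, 1.4, 2.7, 3.1, 0.8, 1.9, 0.3, 4.2]     # hypothetical
predicted_scores = [0.2, 1.1, 2.9, 2.4, 0.5, 2.6, 0.1, 3.8]  # hypothetical

# 1 = severe (score >= 2.5, i.e., D2+), 0 = non-severe (D1 and lower)
y_true = [int(s >= SEVERE_THRESHOLD) for s in actual_scores]
y_pred = [int(s >= SEVERE_THRESHOLD) for s in predicted_scores]

# Macro F1 averages per-class F1 equally, so the rarer "severe" class
# counts as much as the common non-severe class; weighted F1 scales
# each class by its support, favoring the majority class.
macro_f1 = f1_score(y_true, y_pred, average="macro")
weighted_f1 = f1_score(y_true, y_pred, average="weighted")
```

Because the non-severe class dominates, weighted F1 tends to sit closer to the majority-class F1, while macro F1 exposes weak performance on the rare severe class.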


The other evaluation metrics used were mean squared error (MSE) and mean absolute error (MAE), which quantify the difference between actual and predicted scores. MSE is the average of the squared differences between the actual and predicted values, while MAE is the average of the absolute differences between the actual and predicted values.
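The two error metrics defined above can be written directly from their definitions. This is a minimal sketch; the score arrays are hypothetical illustrations, not values from the study.

```python
# Sketch: MSE and MAE computed from their definitions (hypothetical data).
actual = [1.0, 2.5, 3.0, 0.5]
predicted = [1.2, 2.0, 3.4, 0.4]

n = len(actual)
# MSE: average of squared differences between actual and predicted values
mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
# MAE: average of absolute differences between actual and predicted values
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
```

MSE penalizes large individual errors more heavily because the differences are squared, while MAE treats all errors proportionally.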


The overall performance of the tested models is shown below: 

Figure 1: Final model results

*Macro F1 scores are from a binary classification of the ability to predict a “severe” (USDM D2) drought score of 2.5 or higher in the dataset

Key Takeaways:

Please proceed to the other pages within our Results section for additional figures and analysis of our two best-performing models, XGBoost and LSTM.