Main Findings
This research shows that LSTM and XGBoost models yield the most effective results in predicting U.S. Drought Monitor (USDM) drought classification scores in California.
Because drought scores in the data are inherently imbalanced, with drought conditions less common than non-drought conditions, the models generally underpredict extreme drought scores, particularly in areas where severe drought rarely occurs, such as along the Nevada-California border.
MAE values in the range of 0.3-0.4 suggest that model predictions generally deviate from the actual score by less than half of one drought category.
Through feature importance evaluation, we found that previous drought scores and a time indicator (specifically, the month in which the score occurred) were the most important features guiding model predictions, along with Earth skin temperature, precipitation, and humidity.
Experimenting with the size of the past data window and the length of the forecast horizon, we found that using 24 weeks or more of past data gave reasonable performance, with macro F1 of 88-90%, for forecast horizons of 4-16 weeks.
A shorter forecast horizon results in better performance on all evaluated metrics, but increasing the number of weeks used in training does not lead to consistent performance gains.
Although the LSTM displayed high accuracy in the initial weeks, discrepancies between predicted and actual scores grew in later weeks. Nonetheless, the LSTM excelled in areas with persistent and significant drought conditions, suggesting its potential utility in guiding decision-making for regions susceptible to frequent drought.
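The window and horizon setup and the two metrics quoted above can be illustrated with a small sketch. This is a minimal, standard-library example under stated assumptions, not the project's actual pipeline: the function names `make_windows`, `mean_absolute_error`, and `macro_f1`, the toy weekly score series, and the encoding of drought categories as consecutive integers are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def make_windows(
    scores: Sequence[int], window: int, horizon: int
) -> Tuple[List[List[int]], List[int]]:
    """Pair each `window`-week history with the score `horizon` weeks after it ends."""
    inputs, targets = [], []
    for start in range(len(scores) - window - horizon + 1):
        end = start + window
        inputs.append(list(scores[start:end]))
        targets.append(scores[end + horizon - 1])
    return inputs, targets


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute gap between actual and predicted scores, in category units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy weekly series (assumed encoding: 0 = no drought, larger = more severe).
scores = [0, 0, 1, 1, 2, 2, 3, 3, 2, 1, 0, 0]

# 4 weeks of history, predicting the score 2 weeks after the window ends.
inputs, targets = make_windows(scores, window=4, horizon=2)

# Naive persistence baseline: repeat the last observed score in each window.
baseline = [history[-1] for history in inputs]
print(mean_absolute_error(targets, baseline), macro_f1(targets, baseline))
```

Because macro F1 averages per-class scores without weighting by class frequency, weak predictions on rare extreme-drought classes pull it down even when overall accuracy is high, while MAE reports the typical error in units of drought categories.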
Future Work
Analysis of the metric maps reveals heterogeneous model performance across counties, underscoring the intricate influence of geography, climate patterns, and water resources management in California.
Our predictive modeling relied solely on local weather variables and historical drought scores.
However, certain regions may exhibit unique characteristics, where factors beyond local weather variables significantly affect drought conditions.
For such regions, future work could integrate GIS data to improve our understanding and prediction of drought dynamics.
GIS data offers insight into factors such as soil moisture, land cover, and hydrological features, extending our predictive capabilities beyond the scope of local weather variables alone.
Incorporating GIS data has the potential to improve the accuracy and robustness of drought forecasting systems.
When applying the same modeling approach to other areas, such as other states, challenges arise from differences in geographical, climatic, and environmental conditions, all of which affect drought occurrence.
Therefore, customizing or adapting model parameters and features becomes essential to achieve optimal performance across diverse regions.
In California, our dataset contained over 63K data entries from 2000 to 2020, covering 58 counties. However, states with smaller geographical areas, such as Delaware, a state with only three counties, face data availability constraints.
In such instances, adopting a regional modeling strategy that integrates data from neighboring areas could mitigate the scarcity of county-level data and support a more robust model development and assessment process.
Ethical Considerations
We identified the following as the main ethical considerations for our capstone project on drought prediction in California:
It is key that these predictive models be as accurate as possible. The results from our drought prediction models would likely be used to shape or guide public policy, particularly regarding water management and wildfire preparedness. Any information that shapes public policy can have a major impact on residents living in the affected areas. For example, funding for services could be reallocated, or water management policy could be changed to limit the growth of certain crops.
The drought prediction task could have more impact in other areas of the world than in California. Drought can lead to food insecurity and famine in areas where there are fewer resources or tools to predict, prepare for, and manage natural disasters. Our efforts might have been better directed at these areas.
However, we believe that our capstone project and its topic carry much less risk of harm than opportunity for positive impact. This task does not require the use of personal or individual data, and there is little opportunity to use drought predictions to actively harm individuals or others.