Introduction

Over the last few decades, the world has experienced increasingly extreme weather and natural disasters of growing frequency and scale. Among these events, drought stands out as one of the most devastating, with serious socioeconomic and environmental consequences, particularly in regions with a large agricultural industry and limited water resources, such as California.


From 2012 to 2016, the state experienced one of the most severe and prolonged droughts in its history, with lasting effects such as the loss of natural forests and native fish populations and declining groundwater levels (Lund et al., 2018). The combined dryness and heat of this period was estimated to recur only once every 20 to 1,200 years. In 2015, the total estimated economic impact of the drought was $2.7 billion, nearly a third of which stemmed from crop losses, along with an estimated 21,000 lost jobs (Howitt et al., 2014).


More recently, California's 2020 and 2021 water years formed the second-driest two-year period on record and the driest since the 1976-1977 drought (PPIC, 2022). The drought imposed an estimated $1.2 billion in economic costs on the agricultural industry in 2022 alone (Medellín-Azuara et al., 2022) and was accompanied by an extreme series of wildfires throughout 2020. Over the 2009-2018 decade, wildfires cost an average of almost $1 billion annually, with up to $3.52 billion in estimated structure value lost in Butte County alone (Buechi et al., 2021).


Drought is a slow-developing natural disaster that results from a complex mixture of factors, including water deficits and local and global weather phenomena, which makes it difficult to predict in advance (Funk & Shukla, 2020). Early detection and warning systems are key to mitigating its consequences. To address this environmentally and financially costly problem, several governmental and non-governmental organizations have sought to develop tools and systems that accurately predict future drought severity, including the U.S. National Integrated Drought Information System, the U.S. Drought Monitor (USDM), and the Global Integrated Drought Monitoring and Prediction System, among others. These systems rely on common drought indicators, such as precipitation, temperature, and streamflow, as well as drought indices, which are computed numerical representations of drought intensity (WMO et al., 2016), such as the Aridity and Crop Moisture indices.
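To illustrate what a "computed numerical representation" can look like, the sketch below assumes the common ratio form of an aridity index, precipitation divided by potential evapotranspiration (P/PET), and expresses one period's value as a standardized anomaly against past values for the same period. The function names and all input values are hypothetical, and this is not the exact formulation used by any particular monitoring system, nor the Crop Moisture Index.

```python
from statistics import mean, stdev

# Hypothetical sketch: a ratio-style aridity index and its standardized anomaly.
# Assumption: AI = P / PET (lower = drier); values below are illustrative only.


def aridity_index(precip_mm: float, pet_mm: float) -> float:
    """Ratio of precipitation to potential evapotranspiration for one period."""
    if pet_mm <= 0:
        raise ValueError("potential evapotranspiration must be positive")
    return precip_mm / pet_mm


def standardized_anomaly(value: float, baseline: list[float]) -> float:
    """Number of standard deviations `value` lies from the baseline mean.

    For this index, negative results indicate drier-than-usual conditions.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline values")
    return (value - mean(baseline)) / stdev(baseline)


# Hypothetical July (precipitation, PET) pairs in mm for one location.
history = [(12.0, 150.0), (9.0, 160.0), (15.0, 145.0), (11.0, 155.0)]
baseline_ai = [aridity_index(p, pet) for p, pet in history]

this_year_ai = aridity_index(4.0, 170.0)
print(round(this_year_ai, 3), round(standardized_anomaly(this_year_ai, baseline_ai), 2))
```

Standardizing against the same calendar period reflects the idea, also behind the USDM categories, that dryness is judged relative to what is typical for that time of year.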


Researchers have also applied various time series and machine learning models to improve drought prediction, and these approaches have proven effective at predicting drought intensity. Taking the 2017 Northern Plains flash drought as a case study, Brust et al. (2021) predicted the USDM drought classification up to 12 weeks in advance with a recurrent neural network, achieving mean squared error (MSE) values of 0.0534-0.5565; the largest of these corresponds to a root mean squared error of about 0.75, that is, a typical error of less than one drought category. Cao et al. (2023) used Markov chains to predict the USDM drought classification nationwide up to 4 weeks ahead, with an accuracy of 80-90% depending on the region. Hameed et al. (2023) developed and compared multi-month forecasting models for the Great Lakes region using extreme learning machines (ELM), random forests, and other hybrid models, in combination with their newly developed Multivariate Standardized Lake Water Level Index (MSWI) for assessing drought.
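The Markov-chain approach can be sketched under a few stated assumptions: weekly USDM categories are coded as integers (0 = none, 1 = D0, ..., 5 = D4), a first-order transition matrix is estimated from a hypothetical weekly history, and the k-week-ahead forecast is obtained by applying that matrix k times to the current state. This is a generic illustration of the technique, not the implementation of Cao et al. (2023); the smoothing choice and all names are hypothetical.

```python
# Hypothetical sketch: k-week-ahead drought-category forecast with a first-order Markov chain.
# States: 0 = none, 1 = D0, 2 = D1, 3 = D2, 4 = D3, 5 = D4 (coding is an assumption).

N_STATES = 6


def estimate_transition_matrix(seq, n_states=N_STATES, smoothing=1.0):
    """Count week-to-week transitions and normalize each row into probabilities.

    Additive smoothing (> 0) keeps unseen transitions from receiving zero probability
    and avoids empty rows for states that never occur in `seq`.
    """
    counts = [[smoothing] * n_states for _ in range(n_states)]
    for current, nxt in zip(seq, seq[1:]):
        counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total for c in row])
    return matrix


def step(dist, matrix):
    """Advance a probability distribution over states by one week."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def forecast(current_state, matrix, weeks):
    """Distribution over categories `weeks` steps ahead, starting from a known state."""
    dist = [0.0] * len(matrix)
    dist[current_state] = 1.0
    for _ in range(weeks):
        dist = step(dist, matrix)
    return dist


# Hypothetical weekly category history for one location.
weekly = [0, 0, 1, 1, 2, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0, 1, 2, 3, 4, 4, 3, 2]
P = estimate_transition_matrix(weekly)
dist_4wk = forecast(weekly[-1], P, weeks=4)
most_likely = max(range(len(dist_4wk)), key=dist_4wk.__getitem__)
print(most_likely, [round(p, 3) for p in dist_4wk])
```

Repeated multiplication by the same matrix is what lets a single estimated chain produce forecasts at several lead times, such as 1 to 4 weeks.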


Previous research does not appear to have focused specifically on California, although some work, such as Brust et al. (2021), noted that drought in the Western U.S. is especially hard to predict because it develops particularly slowly in the region. Our goal was therefore to identify opportunities to predict drought intensity within the state.


Our research focuses on the USDM drought classification, which depicts drought intensity across the country on a weekly basis (NDMC et al., 2024) and serves as our basis for defining drought conditions. The USDM uses a five-category system, ranging from Abnormally Dry (“D0”), a precursor to drought that is not itself classified as drought, to Exceptional Drought (“D4”), the most severe conditions. For a given week, for example, the data for a location specify what percentage of that area is classified as D0, D1, D2, D3, D4, or none. The categories reflect experts’ assessments of dryness and drought conditions, including observations of how much water is available in streams, lakes, and soil compared with typical conditions for the same time of year.
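Because each weekly record is a set of area percentages rather than a single label, it often has to be collapsed into one number before modeling. The sketch below shows one way to do that, assuming the percentages arrive in cumulative form ("Dk or worse") and weighting category Dk by k + 1, as in the commonly cited formulation of the USDM's experimental Drought Severity and Coverage Index (DSCI, range 0 to 500). Whether this study derives its drought scores this way is an assumption; the function names and the example week are hypothetical.

```python
# Hypothetical sketch: collapse USDM-style area percentages into one weekly score.
# Assumption: weight k + 1 for category Dk and 0 for "none" (DSCI-style, 0 to 500).

CATEGORIES = ["D0", "D1", "D2", "D3", "D4"]


def cumulative_to_categorical(cumulative: dict[str, float]) -> dict[str, float]:
    """Convert 'Dk or worse' percentages into the percentage in exactly Dk."""
    categorical = {}
    for k, cat in enumerate(CATEGORIES):
        worse = cumulative[CATEGORIES[k + 1]] if k + 1 < len(CATEGORIES) else 0.0
        share = cumulative[cat] - worse
        if share < 0:
            raise ValueError("cumulative percentages must not increase with severity")
        categorical[cat] = share
    return categorical


def dsci(categorical: dict[str, float]) -> float:
    """1*D0 + 2*D1 + 3*D2 + 4*D3 + 5*D4: 0 means no dryness, 500 means all D4."""
    return sum((k + 1) * categorical[cat] for k, cat in enumerate(CATEGORIES))


# One hypothetical week for one area, as cumulative percentages.
week = {"D0": 100.0, "D1": 85.0, "D2": 60.0, "D3": 20.0, "D4": 5.0}
shares = cumulative_to_categorical(week)
none_pct = 100.0 - sum(shares.values())  # area not in any category
print(shares, none_pct, dsci(shares))
```

For this week the categorical shares are 15, 25, 40, 15, and 5 percent, none of the area is unclassified, and the score is 270. The same value results from simply summing the five cumulative percentages, which is a convenient consistency check.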


Similar to the prior studies by Brust et al. (2021) and Cao et al. (2023), this work aims to determine which machine learning and time series models, combined with meteorological variables, provide the most accurate predictions of drought intensity in California. Building on previous work (Nangunde et al., 2023), we present a long short-term memory (LSTM) network, a convolutional neural network (CNN), and two decision tree-based approaches: extreme gradient boosting (XGBoost) and random forest. LSTM models have shown promising results in hydrological prediction and other time series problems (Nangunde et al., 2023) because they can capture temporal dependencies between features and model non-linear relationships. We also built a CNN because it is well suited to capturing spatial relationships within the data, while the decision tree-based models help identify which features are most important in determining drought scores.
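All of these models need weekly time series framed as supervised samples. A minimal sketch of one common framing follows, assuming one row of meteorological features and one drought score per week: each sample pairs a fixed-length history window with the score a chosen number of weeks after the window ends. An LSTM would consume each window as a (window, n_features) sequence, whereas XGBoost or random forest would typically take it flattened into lag features. The window length, horizon, function names, and data are hypothetical, not this study's settings.

```python
# Hypothetical sketch: turn weekly series into (history window -> future target) samples.
# Assumptions: one feature row and one drought score per week; settings are illustrative.


def make_windows(features, target, window, horizon=1):
    """Pair each `window`-week history with the target `horizon` weeks after its last week."""
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be at least 1")
    if len(features) != len(target):
        raise ValueError("features and target must cover the same weeks")
    X_seq, y = [], []
    for end in range(window, len(features) - horizon + 1):
        X_seq.append(features[end - window:end])  # weeks end-window .. end-1
        y.append(target[end - 1 + horizon])        # score `horizon` weeks after week end-1
    return X_seq, y


def flatten(window_rows):
    """Concatenate one window's rows into a single lag-feature vector for tree models."""
    return [value for row in window_rows for value in row]


# Eight hypothetical weeks of [precipitation_mm, mean_temp_c] and drought scores.
features = [[float(w), 20.0 + w] for w in range(8)]
target = [0, 0, 1, 1, 2, 2, 3, 3]

X_seq, y = make_windows(features, target, window=3, horizon=1)
X_flat = [flatten(x) for x in X_seq]  # input shape for XGBoost / random forest
print(len(X_seq), y)
```

Keeping the window strictly before the target week avoids leaking future information into the inputs, which matters when the goal is a genuine forecast.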


The models presented here predict drought intensity scores with a high degree of accuracy, with the best-performing models achieving F1 scores of approximately 90%. Local government agencies, such as water departments, can use these models to obtain timely drought predictions and act on them through preventive measures such as water conservation, agricultural planning, and disaster preparedness.
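For a multi-class target such as drought category, F1 must be averaged across classes, and the averaging method affects the reported figure. The text does not state which averaging was used, so the sketch below shows two common options as an assumption: macro averaging (an unweighted mean over classes) and support-weighted averaging. The labels are hypothetical; scikit-learn's `sklearn.metrics.f1_score` with `average="macro"` or `average="weighted"` should give the same values.

```python
from collections import Counter

# Hypothetical sketch: per-class, macro, and support-weighted F1 for category predictions.


def f1_scores(y_true, y_pred):
    """Per-class F1 = 2*TP / (2*TP + FP + FN) for every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true class t
    per_class = {}
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class[c] = 2 * tp[c] / denom if denom else 0.0
    return per_class


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1; rare categories count as much as common ones."""
    per_class = f1_scores(y_true, y_pred)
    return sum(per_class.values()) / len(per_class)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's frequency in y_true."""
    per_class = f1_scores(y_true, y_pred)
    support = Counter(y_true)
    return sum(per_class[c] * support[c] for c in per_class) / len(y_true)


# Hypothetical weekly categories and model predictions.
y_true = ["None", "D0", "D1", "D1", "D2", "D3", "D3", "D4"]
y_pred = ["None", "D0", "D1", "D2", "D2", "D3", "D3", "D4"]
print(round(macro_f1(y_true, y_pred), 3), round(weighted_f1(y_true, y_pred), 3))
```

Macro averaging is the stricter choice when severe categories such as D4 are rare, because a model that rarely predicts them is penalized as much as one that misses common categories.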