models & methods

Scroll down the page to learn about our model architecture, data engineering pipeline, and selected models.

Architecture & Data Engineering

Figure 1: Machine learning architecture for multi-step time series forecasting

model architecture

Figure 1 illustrates the machine learning architecture designed for multi-step time series forecasting, where the input incorporates historical meteorological data and drought scores from the preceding m weeks, while the output comprises forecasted drought scores for the subsequent n weeks. Here, m denotes the window size, and n represents the forecasting horizon. Initially, the window size is set to 30 weeks and the forecasting horizon to 12 weeks, based on established literature (Brust et al., 2021). Utilizing these parameters, our model leverages historical data from the preceding 30 weeks to forecast drought scores for the subsequent 12 weeks. Notably, both the window size and forecasting horizon are tuned to evaluate the model's performance in predicting further into the future.

Figure 2: Data engineering pipeline

data engineering

In the data engineering phase of our research, we assembled weekly time series data encompassing various meteorological variables and historical drought scores for each county. We augmented the data with additional features such as longitude, latitude, and month to capture geospatial and temporal characteristics, then partitioned it into train, validation, and test sets with a 70%/10%/20% split. Importantly, this partitioning was performed without shuffling the data, preserving the temporal integrity of the sequences.
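A minimal sketch of this chronological split is shown below; the split ratios follow the text, while the function name and DataFrame layout are illustrative:

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, train_frac: float = 0.7, val_frac: float = 0.1):
    """Split one county's weekly time series into train/val/test without shuffling."""
    n = len(df)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    train = df.iloc[:train_end]        # earliest 70% of weeks
    val = df.iloc[train_end:val_end]   # next 10%
    test = df.iloc[val_end:]           # most recent 20%
    return train, val, test
```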


Subsequently, the data underwent a windowing process, wherein it was segmented into smaller, overlapping subsequences known as windows. These windows encapsulated historical data from the past m weeks as features, along with corresponding drought scores for the subsequent n weeks as labels. This windowing technique is instrumental in transforming the time series forecasting problem into a supervised machine learning task, enabling the model to learn from past observations to make predictions about future drought trends.
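The windowing step might look like the sketch below, where `m` and `n` correspond to the window size and forecasting horizon defined earlier; the array names and layout are our assumptions:

```python
import numpy as np

def make_windows(features: np.ndarray, scores: np.ndarray, m: int = 30, n: int = 12):
    """Slice a time series into overlapping (window, horizon) pairs.

    features: (T, num_features) weekly meteorological data plus drought scores
    scores:   (T,) weekly drought scores
    Returns X of shape (num_windows, m, num_features) and y of shape (num_windows, n).
    """
    X, y = [], []
    for start in range(len(features) - m - n + 1):
        X.append(features[start:start + m])        # past m weeks as inputs
        y.append(scores[start + m:start + m + n])  # next n weeks as labels
    return np.array(X), np.array(y)
```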


Figure 2 provides a visual representation of the data engineering pipeline employed in this study. Each train-validation-test set from individual counties was consolidated into statewide datasets, facilitating comprehensive model training and evaluation. Following consolidation, each dataset underwent normalization independently to ensure consistency in data scaling across different regions and variables.
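A sketch of the per-set normalization step, using scikit-learn's StandardScaler (the specific scaler is our assumption; the text specifies only that each consolidated set is scaled on its own):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def normalize_set(X: np.ndarray) -> np.ndarray:
    """Z-score each feature column of one consolidated, windowed dataset."""
    orig_shape = X.shape
    X2d = X.reshape(-1, orig_shape[-1])  # flatten windows into (rows, features)
    return StandardScaler().fit_transform(X2d).reshape(orig_shape)
```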

Models

Figure 3: Baseline Model 

(Image credit: Towards Data Science)

baseline model

As a baseline for comparison, we used a multistep persistence model that averaged the drought score over the window period. This baseline provided an unskilled reference against which our other machine learning models could be judged.
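A minimal sketch of this persistence baseline, under our reading that the window-average score is repeated across every week of the horizon:

```python
import numpy as np

def persistence_baseline(past_scores: np.ndarray, horizon: int = 12) -> np.ndarray:
    """Forecast the window's mean drought score for every week in the horizon.

    past_scores: (num_windows, m) drought scores from each input window
    Returns predictions of shape (num_windows, horizon).
    """
    window_mean = past_scores.mean(axis=1, keepdims=True)  # (num_windows, 1)
    return np.repeat(window_mean, horizon, axis=1)         # unskilled forecast
```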

Figure 4: Random Forest

(Image credit: Medium)

random forest

The first tree-based model utilized in this study was the random forest. This model aggregated predictions from multiple decision trees, each trained on a different random subset of the samples and features. The primary parameters influencing predictive accuracy are the number of estimators, i.e. the total number of trees, and the maximum depth of each decision tree. After hyperparameter tuning, our final parameters were 300 trees with a maximum depth of 4.
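A sketch with the tuned hyperparameters, using scikit-learn's RandomForestRegressor, which handles multi-output targets natively; flattening each window into a single feature vector is our assumption:

```python
from sklearn.ensemble import RandomForestRegressor

# X_train, y_train come from the windowing and normalization steps above.
# Tree models expect 2D inputs, so each (m, num_features) window is flattened.
X_train_flat = X_train.reshape(len(X_train), -1)
X_test_flat = X_test.reshape(len(X_test), -1)

rf = RandomForestRegressor(n_estimators=300, max_depth=4, random_state=42)
rf.fit(X_train_flat, y_train)      # y_train: (num_windows, horizon)
y_pred = rf.predict(X_test_flat)   # one 12-week forecast per window
```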

Figure 5: Convolutional Neural Network

(Team created image)

convolutional neural network (CNN)

We used a Convolutional Neural Network (CNN) architecture tailored for multistep time series forecasting tasks. The CNN model was constructed using TensorFlow's Keras API, instantiated as a sequential model. The architecture began with a 1D convolutional layer comprising 64 filters and a kernel size of 3, activated using rectified linear units (ReLU). Subsequently, a max-pooling layer with a pool size of 2 was applied to downsample the feature maps. A dropout layer with a dropout rate of 0.1 was then employed for regularization to prevent overfitting. Next, a fully connected dense layer with 30 units and ReLU activation was incorporated, followed by another dropout layer with a dropout rate of 0.1. Finally, the output layer produced the predicted values for the multistep forecasting, configured with the appropriate number of units (referred to as the horizon). The model was compiled with the Adam optimizer and utilized the Mean Absolute Error (MAE) loss function. This architecture aimed to capture intricate patterns within the time series data while mitigating overfitting through dropout regularization. Hyperparameters such as the number of filters and the kernel size were fine-tuned to enhance model performance.
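A Keras sketch matching this description; the Flatten layer bridging the convolutional and dense layers is our addition (it is required for the model to compile but is not mentioned in the text):

```python
import tensorflow as tf

HORIZON = 12                      # forecasting horizon n
num_features = X_train.shape[-1]  # from the windowed data above

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(30, num_features)),  # window size m = 30
    tf.keras.layers.Conv1D(filters=64, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Flatten(),        # our assumption: bridge conv to dense
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(HORIZON),   # one unit per forecast week
])
model.compile(optimizer="adam", loss="mae")
```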

Figure 6: XGBoost

(Image credit: GeeksforGeeks)

xgboost

We used one other decision tree model, Extreme Gradient Boosting (XGBoost), in this study. Like random forest, XGBoost is a decision tree ensemble model. However, XGBoost uses additive training: trees are added one at a time, as opposed to building all trees independently as random forest does. When each tree is added, the model learns from the prediction scores, optimization results, and regularization term of the previous trees before creating the next one. The model continues until the desired number of trees, or estimators, is reached, and sums their prediction results. After hyperparameter tuning, we found the XGBoost model produced optimal results when using 100 estimators, a max depth of 3, and a learning rate of 0.15.
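A sketch with the tuned hyperparameters; since the classic XGBRegressor predicts a single target, we wrap it in scikit-learn's MultiOutputRegressor to emit all horizon weeks (this wrapper is our assumption, not a detail from the text):

```python
from sklearn.multioutput import MultiOutputRegressor
from xgboost import XGBRegressor

# Reuses the flattened windows from the random forest sketch.
xgb = MultiOutputRegressor(
    XGBRegressor(n_estimators=100, max_depth=3, learning_rate=0.15)
)
xgb.fit(X_train_flat, y_train)     # fits one boosted ensemble per horizon week
y_pred = xgb.predict(X_test_flat)
```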



Figure 7: Long Short-Term Memory

(Image credit: Medium)

long short-term memory (LSTM)

An LSTM model tailored for multistep time series forecasting was introduced in this study. Leveraging TensorFlow's Keras API, the LSTM architecture was implemented as a sequential model. The model was structured with an initial LSTM layer comprising 150 units, followed by a dropout layer with a dropout rate set to 0.1 to mitigate overfitting. Subsequently, another LSTM layer with 75 units was incorporated, along with another dropout layer with the same dropout rate. The model concluded with a dense layer responsible for generating predictions for the multistep forecasting, configured with the appropriate number of units (referred to as the horizon). For optimization, the model was compiled with the Adam optimizer and utilized the MAE loss function to assess the disparity between predicted and actual values. Hyperparameters such as the number of units in the LSTM layers and the dropout rates in the dropout layers were fine-tuned to enhance model performance.
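A Keras sketch of the described LSTM; setting return_sequences=True on the first layer is our addition, required so the second LSTM receives a sequence:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(30, num_features)),   # as in the CNN sketch
    tf.keras.layers.LSTM(150, return_sequences=True),  # our assumption
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.LSTM(75),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(HORIZON),                    # one unit per forecast week
])
model.compile(optimizer="adam", loss="mae")
```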