Time series prediction is a difficult type of predictive modeling problem. Unlike standard regression modeling, time series modeling adds the complexity of sequence dependence among the input variables. Recurrent neural networks are a powerful type of neural network designed to handle sequence dependence. The Long Short-Term Memory network, or LSTM network, is a type of recurrent neural network used in deep learning because very large architectures can be trained successfully. In this post, you will discover how to develop LSTM networks in Python using the Keras deep learning library to address a demonstration time series prediction problem.

## Problem Description

The problem we are going to look at in this post is the International Airline Passengers prediction problem. Given a year and a month, the task is to predict the number of international airline passengers in units of 1,000. The data ranges from January 1949 to December 1960, or 12 years, with 144 observations. The dataset is available for free from the DataMarket webpage as a CSV download.

We can load this dataset easily using the Pandas library. We are not interested in the date, given that each observation is separated by the same interval of one month. Therefore, when we load the dataset we can exclude the first column. The downloaded dataset also has footer information that we can exclude when loading the file.

When the data is plotted, you can see an upward trend in the dataset over time. You can also see some periodicity that probably corresponds to the Northern Hemisphere vacation period.

Plot of the Airline Passengers Dataset

We are going to keep things simple and work with the data as-is. Normally, it is a good idea to investigate various data preparation techniques to rescale the data and to make it stationary.
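The loading step described above can be sketched as follows. This is a minimal sketch, not the post's original listing: the `load_passenger_counts` helper, the inline sample values, and the footer text are hypothetical stand-ins for the real file. `usecols` and `skipfooter` are existing `pandas.read_csv` parameters, and `skipfooter` needs the Python parsing engine.

```python
import io

import pandas as pd


def load_passenger_counts(source, footer_rows=0):
    """Read only the second column (the counts) and drop trailing footer lines.

    `source` can be a file path or a file-like object. `footer_rows` is an
    assumption: set it to however many footer lines your download contains.
    """
    frame = pd.read_csv(
        source,
        usecols=[1],             # skip the date column
        engine="python",         # required for skipfooter
        skipfooter=footer_rows,  # ignore trailing footer lines
    )
    # Floating point values are more suitable for neural network modeling.
    return frame.values.astype("float32")


# Made-up sample in the assumed two-column layout (NOT the real data).
sample_csv = io.StringIO(
    '"Month","Passengers"\n'
    '"2000-01",100\n'
    '"2000-02",110\n'
    '"2000-03",125\n'
    "Placeholder footer line\n"
)

counts = load_passenger_counts(sample_csv, footer_rows=1)
print(counts.shape)
```

With the real download, pass the file path instead of the in-memory sample and adjust `footer_rows` to match the file.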
## Long Short-Term Memory Network

The Long Short-Term Memory network, or LSTM network, is a recurrent neural network that is trained using Backpropagation Through Time and overcomes the vanishing gradient problem. As such, it can be used to create large recurrent networks that in turn can be used to address difficult sequence problems in machine learning and achieve state-of-the-art results.

Instead of neurons, LSTM networks have memory blocks that are connected through layers. A block has components that make it smarter than a classical neuron and a memory for recent sequences. A block contains gates that manage the block’s state and output. A block operates upon an input sequence, and each gate within a block uses sigmoid activation units to control whether it is triggered or not, making the change of state and the addition of information flowing through the block conditional.

There are three types of gates within a unit:

- **Forget Gate**: conditionally decides what information to throw away from the block.
- **Input Gate**: conditionally decides which values from the input to update the memory state.
- **Output Gate**: conditionally decides what to output based on input and the memory of the block.
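To make the three gates concrete, here is a minimal NumPy sketch of one time step of a standard LSTM block. It is an illustration only: the `lstm_step` name, the random weights, and the order in which the gate weights are stacked are assumptions made for this sketch, not the internal layout Keras uses.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """Advance a layer of LSTM units by one time step.

    W: (4 * units, n_inputs), U: (4 * units, units), b: (4 * units,).
    Rows are stacked as forget, input, candidate, output (sketch convention).
    """
    units = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    forget = sigmoid(z[:units])                 # what to throw away from memory
    update = sigmoid(z[units:2 * units])        # which new values to write
    candidate = np.tanh(z[2 * units:3 * units]) # proposed new memory content
    output = sigmoid(z[3 * units:])             # what to expose as output
    c = forget * c_prev + update * candidate    # new memory (cell) state
    h = output * np.tanh(c)                     # new block output
    return h, c


rng = np.random.default_rng(7)
units, n_inputs = 4, 1
W = rng.normal(scale=0.5, size=(4 * units, n_inputs))
U = rng.normal(scale=0.5, size=(4 * units, units))
b = np.zeros(4 * units)

h = np.zeros(units)
c = np.zeros(units)
for value in [0.1, 0.2, 0.3]:  # a short made-up input sequence
    h, c = lstm_step(np.array([value]), h, c, W, U, b)

print(h.shape, c.shape)
```

Because every gate is a sigmoid, its value lies between 0 and 1 and acts as a soft switch: a forget gate near 1 with an input gate near 0 leaves the memory state almost unchanged.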
Each unit is like a mini-state machine where the gates of the units have weights that are learned during the training procedure. You can see how you may achieve sophisticated learning and memory from a layer of LSTMs, and it is not hard to imagine how higher-order abstractions may be layered with multiple such layers.

## LSTM Network for Regression

We can phrase the problem as a regression problem. That is, given the number of passengers (in units of thousands) this month, what is the number of passengers next month? We can write a simple function to convert our single column of data into a two-column dataset: the first column containing this month’s (t) passenger count and the second column containing next month’s (t+1) passenger count, to be predicted.

Before we get started, let’s first import all of the functions and classes we intend to use. This assumes a working SciPy environment with the Keras deep learning library installed. Before we do anything else, it is a good idea to fix the random number seed to ensure our results are reproducible.

We can also use the code from the previous section to load the dataset as a Pandas dataframe. We can then extract the NumPy array from the dataframe and convert the integer values to floating point values, which are more suitable for modeling with a neural network.

LSTMs are sensitive to the scale of the input data, particularly when the sigmoid or tanh activation functions are used. It can be good practice to rescale the data to the range 0 to 1, also called normalizing. We can easily normalize the dataset before modeling.

After we model our data and estimate the skill of our model on the training dataset, we need to get an idea of the skill of the model on new, unseen data. For a normal classification or regression problem, we would do this using cross validation. With time series data, the sequence of values is important, so the observations cannot simply be shuffled. A simple method that we can use instead is to split the ordered dataset into train and test datasets.
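The normalization and ordered split described above can be sketched in plain NumPy. This is a sketch under stated assumptions: `min_max_scale`, `invert_scale`, and `ordered_split` are hypothetical helper names, and the series is a synthetic stand-in (a trend plus a 12-step cycle), not the airline data. scikit-learn's `MinMaxScaler` provides the same kind of rescaling if you prefer a library class.

```python
import numpy as np


def min_max_scale(values):
    """Rescale values to the range 0 to 1; return the bounds for inversion."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), lo, hi


def invert_scale(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return scaled * (hi - lo) + lo


def ordered_split(values, train_fraction=0.67):
    """Split without shuffling so the test set is strictly later in time."""
    split = int(len(values) * train_fraction)
    return values[:split], values[split:]


# Synthetic stand-in series with 144 observations (NOT the real dataset).
t = np.arange(144, dtype="float32")
series = (100 + 2 * t + 20 * np.sin(2 * np.pi * t / 12)).reshape(-1, 1)

scaled, lo, hi = min_max_scale(series)
train, test = ordered_split(scaled)
print(len(train), len(test))
```

Keeping `lo` and `hi` around matters later: predictions made on the 0-to-1 scale have to be inverted before errors can be reported in the original units.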
We calculate the index of the split point and separate the data into a training dataset with 67% of the observations that we can use to train our model, leaving the remaining 33% for testing the model.

Now we can define a function to create a new dataset, as described above. The function takes two arguments. With its default setting, it creates a dataset where X is the number of passengers at a given time (t) and Y is the number of passengers at the next time (t+1). It can be configured, and we will do so by constructing a differently shaped dataset in the next section.

Applied to the first rows of the dataset (in unnormalized form for clarity), the function produces pairs that follow the X=t and Y=t+1 pattern: each Y value reappears as the X value of the following row. Let’s use this function to prepare the train and test datasets for modeling.

The LSTM network expects the input data (X) to be provided with a specific array structure, so the prepared train and test inputs must be reshaped into that structure before fitting.

We are now ready to design and fit our LSTM network for this problem. The network has a visible layer with 1 input, a hidden layer with 4 LSTM blocks or neurons, and an output layer that makes a single value prediction. The LSTM blocks use the layer's default activation functions. The network is trained for 100 epochs with a batch size of 1.

Once the model is fit, we can estimate the performance of the model on the train and test datasets. This will give us a point of comparison for new models. Note that we invert the predictions before calculating error scores to ensure that performance is reported in the same units as the original data (thousands of passengers per month).

Finally, we can generate predictions using the model for both the train and test datasets to get a visual indication of the skill of the model.
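The dataset construction and reshaping steps described above can be sketched as follows. This is one possible implementation, not necessarily the post's original listing: the `look_back` parameter name and the toy series are assumptions. The reshape targets the three-dimensional `[samples, time steps, features]` layout that Keras recurrent layers take as input, here with one time step per sample.

```python
import numpy as np


def create_dataset(dataset, look_back=1):
    """Turn a single-column array into (X, Y) pairs.

    X holds the `look_back` most recent values and Y the value that follows.
    With the default look_back=1, X is the value at t and Y the value at t+1.
    """
    data_x, data_y = [], []
    for i in range(len(dataset) - look_back):
        data_x.append(dataset[i:i + look_back, 0])
        data_y.append(dataset[i + look_back, 0])
    return np.array(data_x), np.array(data_y)


# Toy single-column series 0..9 (a stand-in for the scaled passenger counts).
series = np.arange(10, dtype="float32").reshape(-1, 1)

X, y = create_dataset(series, look_back=1)

# [samples, features] -> [samples, time steps, features], one time step each.
X_3d = X.reshape(X.shape[0], 1, X.shape[1])

print(X[:3].ravel(), y[:3])
print(X_3d.shape)
```

The same function is applied separately to the train and test portions, so no pair straddles the split point.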
Because of how the dataset was prepared, we must shift the predictions so that they align on the x-axis with the original dataset. Once prepared, the data is plotted, showing the original dataset in blue, the predictions for the training dataset in green, and the predictions on the unseen test dataset in red. We can see that the model did an excellent job of fitting both the training and the test datasets.

LSTM Trained on Regression Formulation of Passenger Prediction Problem

Running the complete example shows that the model has an average error of about 23 passengers (in thousands) on the training dataset and about 52 passengers (in thousands) on the test dataset. Not that bad.

## LSTM for Regression Using the Window Method

We can also phrase the problem so that multiple recent time steps can be used to make the prediction for the next time step. This is called a window, and the size of the window is a parameter that can be tuned for each problem.

For example, given the current time (t), to predict the value at the next time in the sequence (t+1) we can use the current time (t) as well as the two prior times (t-1 and t-2) as input variables. When phrased as a regression problem, the input variables are t-2, t-1, and t, and the output variable is t+1.

We can re-run the example from the previous section with the larger window size and no other changes. The error increased slightly compared to that of the previous section. The window size and the network architecture were not tuned: this is just a demonstration of how to frame a prediction problem.

LSTM Trained on Window Method Formulation of Passenger Prediction Problem
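The window framing, the error score, and the shift needed for plotting can be sketched together. This is a minimal sketch under stated assumptions: `make_windows`, `rmse`, and `align_for_plot` are hypothetical helper names, the series is a toy sequence, and a naive "repeat the last observed value" forecast stands in for the LSTM's predictions so the example runs without a trained model.

```python
import numpy as np


def make_windows(series, look_back=3):
    """Inputs are the values at t-2, t-1, t; the target is the value at t+1."""
    x = np.array([series[i:i + look_back] for i in range(len(series) - look_back)])
    y = series[look_back:]
    return x, y


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the inputs."""
    diff = np.asarray(actual) - np.asarray(predicted)
    return float(np.sqrt(np.mean(diff ** 2)))


def align_for_plot(total_length, predictions, first_target_index):
    """Place predictions at the time steps they refer to; pad the rest with NaN."""
    canvas = np.full(total_length, np.nan)
    canvas[first_target_index:first_target_index + len(predictions)] = predictions
    return canvas


series = np.arange(1.0, 11.0)  # toy series 1..10 (NOT the real data)
x, y = make_windows(series, look_back=3)

# Stand-in for model output: predict that t+1 equals the last value in the window.
predictions = x[:, -1]

error = rmse(y, predictions)
aligned = align_for_plot(len(series), predictions, first_target_index=3)
print(x[0], y[0], error)
```

The first prediction belongs to index `look_back`, because the first window consumes that many observations. Plotting libraries such as Matplotlib skip NaN values, so the padded array lines up with the original series on the x-axis.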
