### TIME SERIES PREDICTION WITH LSTM ON KERAS PART 1

Time series prediction problems are a difficult type of predictive modeling problem.

Unlike regression predictive modeling, time series also adds the complexity of a sequence dependence among the input variables.

A powerful type of neural network designed to handle sequence dependence is the recurrent neural network. The Long Short-Term Memory network, or LSTM network, is a type of recurrent neural network used in deep learning because very large architectures can be successfully trained.

In this post, you will discover how to develop LSTM networks in Python using the Keras deep learning library to address a demonstration time-series prediction problem.

## Problem Description

The problem we are going to look at in this post is the International Airline Passengers prediction problem.

This is a problem where, given a year and a month, the task is to predict the number of international airline passengers in units of 1,000. The data ranges from January 1949 to December 1960, or 12 years, with 144 observations.

Below is a sample of the first few lines of the file.

```
"Month","International airline passengers: monthly totals in thousands. Jan 49 ? Dec 60"
"1949-01",112
"1949-02",118
"1949-03",132
"1949-04",129
"1949-05",121
```

We can load this dataset easily using the Pandas library. We are not interested in the date, given that each observation is separated by the same interval of one month. Therefore, when we load the dataset we can exclude the first column.

The downloaded dataset also has footer information that we can exclude by setting the skipfooter argument of pandas.read_csv() to 3 for the 3 footer lines. Once loaded, we can easily plot the whole dataset. The code to load and plot the dataset is listed below.

```python
import pandas
import matplotlib.pyplot as plt
dataset = pandas.read_csv('international-airline-passengers.csv', usecols=[1], engine='python', skipfooter=3)
plt.plot(dataset)
plt.show()
```

You can see an upward trend in the dataset over time.

You can also see some periodicity to the dataset that probably corresponds to the Northern Hemisphere vacation period.

Plot of the Airline Passengers Dataset

We are going to keep things simple and work with the data as-is.

Normally, it is a good idea to investigate various data preparation techniques to rescale the data and to make it stationary.

## Long Short-Term Memory Network

The Long Short-Term Memory network, or LSTM network, is a recurrent neural network that is trained using Backpropagation Through Time and overcomes the vanishing gradient problem.

As such, it can be used to create large recurrent networks that in turn can be used to address difficult sequence problems in machine learning and achieve state-of-the-art results.

Instead of neurons, LSTM networks have memory blocks that are connected through layers.

A block has components that make it smarter than a classical neuron and a memory for recent sequences. A block contains gates that manage the block’s state and output. A block operates upon an input sequence and each gate within a block uses the sigmoid activation units to control whether they are triggered or not, making the change of state and addition of information flowing through the block conditional.

There are three types of gates within a unit:

• Forget Gate: conditionally decides what information to throw away from the block.
• Input Gate: conditionally decides which values from the input to update the memory state.
• Output Gate: conditionally decides what to output based on input and the memory of the block.

Each unit is like a mini-state machine where the gates of the units have weights that are learned during the training procedure.
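To make the gate descriptions above concrete, here is a minimal NumPy sketch of one forward step of an LSTM layer. It follows the common textbook formulation. The function name lstm_step, the stacked weight layout (input, forget, candidate, output) and the random weights are assumptions made for illustration and are not taken from Keras internals.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of an LSTM layer (textbook formulation).

    x_t: (features,), h_prev and c_prev: (units,)
    W: (features, 4*units), U: (units, 4*units), b: (4*units,)
    Gate order inside the stacked weights is an assumption of this sketch:
    input, forget, candidate, output.
    """
    units = h_prev.shape[0]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[0 * units:1 * units])   # input gate: what to write to memory
    f = sigmoid(z[1 * units:2 * units])   # forget gate: what to discard from memory
    g = np.tanh(z[2 * units:3 * units])   # candidate values for the memory
    o = sigmoid(z[3 * units:4 * units])   # output gate: what to expose
    c_t = f * c_prev + i * g              # new memory (cell) state
    h_t = o * np.tanh(c_t)                # new output (hidden) state
    return h_t, c_t

rng = np.random.default_rng(0)
features, units = 1, 4
W = rng.normal(size=(features, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

# feed a short sequence one value at a time, carrying the state forward
h, c = np.zeros(units), np.zeros(units)
for value in [0.1, 0.2, 0.3]:
    h, c = lstm_step(np.array([value]), h, c, W, U, b)
print(h.shape, c.shape)
```

In a trained network the weights W, U and b are learned; here they are random, so only the shapes and the flow of state are meaningful.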

You can see how you may achieve sophisticated learning and memory from a layer of LSTMs, and it is not hard to imagine how higher-order abstractions may be layered with multiple such layers.

## LSTM Network for Regression

We can phrase the problem as a regression problem.

That is, given the number of passengers (in units of thousands) this month, what is the number of passengers next month?

We can write a simple function to convert our single column of data into a two-column dataset: the first column containing this month’s (t) passenger count and the second column containing next month’s (t+1) passenger count, to be predicted.

Before we get started, let’s first import all of the functions and classes we intend to use. This assumes a working SciPy environment with the Keras deep learning library installed.

```python
import numpy
import matplotlib.pyplot as plt
import pandas
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
```

Before we do anything, it is a good idea to fix the random number seed to ensure our results are reproducible.

```python
# fix random seed for reproducibility
numpy.random.seed(7)
```

We can also use the code from the previous section to load the dataset as a Pandas dataframe. We can then extract the NumPy array from the dataframe and convert the integer values to floating point values, which are more suitable for modeling with a neural network.

```python
# load the dataset
dataframe = pandas.read_csv('international-airline-passengers.csv', usecols=[1], engine='python', skipfooter=3)
dataset = dataframe.values
dataset = dataset.astype('float32')
```

LSTMs are sensitive to the scale of the input data, specifically when the sigmoid or tanh activation functions are used (in Keras, the LSTM layer defaults to tanh for its activation and a sigmoid for its gates). It can be a good practice to rescale the data to the range of 0-to-1, also called normalizing. We can easily normalize the dataset using the MinMaxScaler preprocessing class from the scikit-learn library.

```python
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
```
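As a rough illustration of what this rescaling does, the sketch below applies the min-max arithmetic by hand to the first five values from the sample file, then inverts it. It is meant to mirror the per-column calculation for a feature range of 0 to 1, not to replace MinMaxScaler. The variable names are this sketch's own.

```python
import numpy as np

# first five monthly totals from the sample shown earlier (thousands of passengers)
values = np.array([[112.0], [118.0], [132.0], [129.0], [121.0]], dtype="float32")

lo, hi = values.min(axis=0), values.max(axis=0)
scaled = (values - lo) / (hi - lo)     # forward: minimum maps to 0, maximum to 1
restored = scaled * (hi - lo) + lo     # inverse: back to the original units

print(scaled.ravel())
print(restored.ravel())
```

The inverse step is what later lets the predictions be reported in thousands of passengers rather than in the 0-to-1 range.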

After we model our data and estimate the skill of our model on the training dataset, we need to get an idea of the skill of the model on new unseen data. For a normal classification or regression problem, we would do this using cross validation.

With time series data, the sequence of values is important. A simple method that we can use is to split the ordered dataset into train and test datasets. The code below calculates the index of the split point and separates the data into the training datasets with 67% of the observations that we can use to train our model, leaving the remaining 33% for testing the model.

```python
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
print(len(train), len(test))
```
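As a quick check of the split arithmetic, assuming all 144 observations were loaded, the sizes work out as follows. This sketch uses a plain integer in place of the dataset.

```python
n_observations = 144                       # monthly values, January 1949 to December 1960
train_size = int(n_observations * 0.67)    # int() truncates 96.48 down to 96
test_size = n_observations - train_size
print(train_size, test_size)               # 96 48
```

Because the split is by position rather than at random, the 96 training months all come before the 48 test months.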

Now we can define a function to create a new dataset, as described above.

The function takes two arguments: the dataset, which is a NumPy array that we want to convert into a dataset, and the look_back, which is the number of previous time steps to use as input variables to predict the next time period — in this case defaulted to 1.

This default will create a dataset where X is the number of passengers at a given time (t) and Y is the number of passengers at the next time (t + 1).

The look_back value can be configured, and we will do so by constructing a differently shaped dataset in the next section.

```python
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)
```

Let’s take a look at the effect of this function on the first rows of the dataset (shown in the unnormalized form for clarity).

```
X    Y
112  118
118  132
132  129
129  121
121  135
```

If you compare these first 5 rows to the original dataset sample listed in the previous section, you can see the X=t and Y=t+1 pattern in the numbers.
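The rows above can be reproduced with a small self-contained sketch. It repeats the create_dataset() function and applies it to the first seven monthly totals, typed in by hand as a single-column array rather than loaded from the file.

```python
import numpy

def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        dataX.append(dataset[i:(i+look_back), 0])
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)

# first seven monthly totals, shaped as one column like the loaded dataset
sample = numpy.array([112, 118, 132, 129, 121, 135, 148], dtype="float32").reshape(-1, 1)

X, Y = create_dataset(sample, look_back=1)
for x, y in zip(X[:, 0], Y):
    print(int(x), int(y))
```

Note that seven values produce five pairs, not six. The loop bound len(dataset)-look_back-1 stops one step early, so the final available pair (135, 148) is not emitted.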

Let’s use this function to prepare the train and test datasets for modeling.

```python
# reshape into X=t and Y=t+1
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
```

The LSTM network expects the input data (X) to be provided with a specific array structure in the form of: [samples, time steps, features].

Currently, our data is in the form: [samples, features] and we are framing the problem as one time step for each sample. We can transform the prepared train and test input data into the expected structure using numpy.reshape() as follows:

```python
# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
```
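To see what the reshape does to the array, here is a small sketch on five hand-typed samples. Only the shape changes; the values are untouched.

```python
import numpy

# five samples with one feature each, shaped [samples, features]
X = numpy.array([[112.0], [118.0], [132.0], [129.0], [121.0]])
print(X.shape)     # (5, 1)

# insert a time-step axis of length 1: [samples, time steps, features]
X3 = numpy.reshape(X, (X.shape[0], 1, X.shape[1]))
print(X3.shape)    # (5, 1, 1)
print(X3[0])       # [[112.]]
```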

We are now ready to design and fit our LSTM network for this problem.

The network has a visible layer with 1 input, a hidden layer with 4 LSTM blocks or neurons, and an output layer that makes a single value prediction. The default activation functions of the Keras LSTM layer are used for the LSTM blocks (tanh for the block activation and a sigmoid for the gates). The network is trained for 100 epochs and a batch size of 1 is used.

```python
# create and fit the LSTM network
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)
```
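To get a feel for how small this network is, the sketch below counts its trainable weights by hand. It assumes a standard LSTM layer with bias terms, where each of the four internal transformations has an input kernel, a recurrent kernel and a bias vector. The two helper functions are hypothetical names used only for this illustration.

```python
def lstm_param_count(units, input_features):
    # four weight sets (input, forget, candidate, output), each with
    # an input kernel, a recurrent kernel and a bias vector
    return 4 * (units * input_features + units * units + units)

def dense_param_count(inputs, outputs):
    # one weight per input-output pair plus one bias per output
    return inputs * outputs + outputs

lstm = lstm_param_count(units=4, input_features=1)
dense = dense_param_count(inputs=4, outputs=1)
print(lstm, dense, lstm + dense)   # 96 5 101
```

If those assumptions hold for your Keras version, model.summary() should report matching totals.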

Once the model is fit, we can estimate the performance of the model on the train and test datasets. This will give us a point of comparison for new models.

Note that we invert the predictions before calculating error scores to ensure that performance is reported in the same units as the original data (thousands of passengers per month).

```python
# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))
```
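For reference, the score computed above is the square root of the mean squared difference between actual and predicted values. A minimal sketch of that calculation follows. The rmse() helper is a name chosen for this sketch, and the numbers are made up purely for illustration; they are not model output.

```python
import math
import numpy

def rmse(actual, predicted):
    actual = numpy.asarray(actual, dtype=float)
    predicted = numpy.asarray(predicted, dtype=float)
    return math.sqrt(numpy.mean((actual - predicted) ** 2))

# made-up values purely for illustration (thousands of passengers)
actual = [118, 132, 129, 121]
predicted = [115, 130, 133, 120]

# errors are 3, 2, -4, 1, so the mean squared error is 30 / 4 = 7.5
print('%.2f RMSE' % rmse(actual, predicted))   # 2.74 RMSE
```

Because the error is computed after inverting the scaling, it is expressed in the same units as the data.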

Finally, we can generate predictions using the model for both the train and test dataset to get a visual indication of the skill of the model.

Because of how the dataset was prepared, we must shift the predictions so that they align on the x-axis with the original dataset. Once prepared, the data is plotted, showing the original dataset in blue, the predictions for the training dataset in green, and the predictions on the unseen test dataset in red.

```python
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict
# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
```
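The slice bounds used for the shift are easier to trust once the index arithmetic is written out. The sketch below assumes 144 observations, the 67% split, and a create_dataset() that yields len(data) - look_back - 1 samples. The plot_slices() helper is a hypothetical name introduced for this check.

```python
def plot_slices(n, look_back, train_fraction=0.67):
    """Return the (start, stop) slices used to place predictions on the plot,
    together with the number of predictions each slice must hold."""
    train_size = int(n * train_fraction)
    test_size = n - train_size
    # create_dataset() yields len(data) - look_back - 1 samples
    n_train_pred = train_size - look_back - 1
    n_test_pred = test_size - look_back - 1
    train_slice = (look_back, n_train_pred + look_back)
    test_slice = (n_train_pred + (look_back * 2) + 1, n - 1)
    return train_slice, n_train_pred, test_slice, n_test_pred

train_slice, n_train_pred, test_slice, n_test_pred = plot_slices(144, look_back=1)
print(train_slice, n_train_pred)   # (1, 95) 94
print(test_slice, n_test_pred)     # (97, 143) 46
```

With look_back set to 1, the first test prediction lands at index 97, which is the target of the first test sample (the test set starts at index 96).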

We can see that the model did an excellent job of fitting both the training and the test datasets.

LSTM Trained on Regression Formulation of Passenger Prediction Problem

For completeness, below is the entire code example.

```python
# LSTM for international airline passengers problem with regression framing
import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)
# fix random seed for reproducibility
numpy.random.seed(7)
# load the dataset
dataframe = read_csv('international-airline-passengers.csv', usecols=[1], engine='python', skipfooter=3)
dataset = dataframe.values
dataset = dataset.astype('float32')
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
# reshape into X=t and Y=t+1
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
# create and fit the LSTM network
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)
# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict
# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
```

Running the example produces the following output.

```
...
Epoch 95/100
0s - loss: 0.0020
Epoch 96/100
0s - loss: 0.0020
Epoch 97/100
0s - loss: 0.0020
Epoch 98/100
0s - loss: 0.0020
Epoch 99/100
0s - loss: 0.0020
Epoch 100/100
0s - loss: 0.0020
Train Score: 22.93 RMSE
Test Score: 47.53 RMSE
```

We can see that the model has an average error of about 23 passengers (in thousands) on the training dataset, and about 48 passengers (in thousands) on the test dataset. Not that bad.

## LSTM for Regression Using the Window Method

We can also phrase the problem so that multiple, recent time steps can be used to make the prediction for the next time step.

This is called a window, and the size of the window is a parameter that can be tuned for each problem.

For example, given the current time (t) we want to predict the value at the next time in the sequence (t+1), we can use the current time (t), as well as the two prior times (t-1 and t-2) as input variables.

When phrased as a regression problem, the input variables are t-2, t-1, t and the output variable is t+1.

The create_dataset() function we created in the previous section allows us to create this formulation of the time series problem by increasing the look_back argument from 1 to 3.

A sample of the dataset with this formulation looks as follows:

```
X1   X2   X3   Y
112  118  132  129
118  132  129  121
132  129  121  135
129  121  135  148
121  135  148  148
```
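The same windowed layout can be sketched with NumPy's sliding_window_view (available in NumPy 1.20 and later). This is a swapped-in technique for illustration, not the tutorial's create_dataset() function. It is applied here to the first eight monthly totals, typed in by hand.

```python
import numpy
from numpy.lib.stride_tricks import sliding_window_view

series = numpy.array([112, 118, 132, 129, 121, 135, 148, 148])
look_back = 3

# each row holds look_back inputs followed by the value to predict: t-2, t-1, t, t+1
windows = sliding_window_view(series, look_back + 1)
X, Y = windows[:, :look_back], windows[:, look_back]

for x, y in zip(X, Y):
    print(*x, y)
```

Unlike create_dataset(), this version keeps the final window, so eight values yield the five rows shown in the table.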

We can re-run the example in the previous section with the larger window size. The whole code listing with just the window size change is listed below for completeness.

```python
# LSTM for international airline passengers problem with window regression framing
import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)
# fix random seed for reproducibility
numpy.random.seed(7)
# load the dataset
dataframe = read_csv('international-airline-passengers.csv', usecols=[1], engine='python', skipfooter=3)
dataset = dataframe.values
dataset = dataset.astype('float32')
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
testX = numpy.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
# create and fit the LSTM network
model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)
# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform([trainY])
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform([testY])
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict
# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset))
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
```

Running the example provides the following output:

```
...
Epoch 95/100
0s - loss: 0.0021
Epoch 96/100
0s - loss: 0.0021
Epoch 97/100
0s - loss: 0.0021
Epoch 98/100
0s - loss: 0.0021
Epoch 99/100
0s - loss: 0.0022
Epoch 100/100
0s - loss: 0.0020
Train Score: 24.19 RMSE
Test Score: 58.03 RMSE
```

We can see that the error increased compared to that of the previous section: slightly on the training dataset (24.19 vs. 22.93 RMSE) and more noticeably on the test dataset (58.03 vs. 47.53 RMSE). The window size and the network architecture were not tuned: this is just a demonstration of how to frame a prediction problem.

LSTM Trained on Window Method Formulation of Passenger Prediction Problem