Predicting Stock Prices - Regression Problem

In this example, we look at how to use Neurons on a regression problem: predicting the mean price of a stock on a given day.

What is a Regression Problem?

In Machine Learning, a Regression Problem is one where the output variable is continuous, as opposed to a Classification problem, where the output is one of a fixed set of classes. In our case, the stock price can be any positive real number.

Dataset Structure

This time, we'll use different datasets for training and testing the model. For training, we'll use Apple's stock values from 15/02/2013 (dd/mm/yyyy) to 08/06/2016. For testing, we'll use the prices from 09/04/2016 to 06/02/2018. Both files have the following columns:

0. Date (yyyymmdd)
1-5. Mean price of the stock on each of the previous 5 days (dollars)
6. Current day mean price (dollars)

For our model, we'll use the stock prices of the previous 5 days as input, and we want it to predict the current day's stock price.
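To make the input/output layout concrete, here is a minimal sketch of how rows like these could be built from a plain list of daily mean prices. The function name `make_windows`, the sample prices, and the oldest-first ordering of the five inputs are assumptions for illustration; the actual dataset files may order the columns differently.

```python
def make_windows(prices, window=5):
    """Pair each run of `window` consecutive prices with the price that follows it."""
    rows = []
    for i in range(window, len(prices)):
        inputs = prices[i - window:i]   # the previous `window` days (columns 1-5)
        target = prices[i]              # the current day (column 6)
        rows.append((inputs, target))
    return rows


# Made-up prices for illustration only (not taken from the Apple dataset).
prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
rows = make_windows(prices)
for inputs, target in rows:
    print(inputs, "->", target)
```

Seven prices give only two rows, because the first five days have no complete history to use as input.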

Creating the Model

Let's create our model.

Open Neurons. As the 'Training File', select 'stocks_data_apple_train.txt' inside the 'Data' folder.
Change the input columns to 1-5, the previous 5 days' stock prices. Change the output column to 6, the current day's price.
Press 'File'->'New Model' or just press 'Ctrl + N'. We'll name the model 'Stocks_model'.

Create model.

Next, we'll add 2 Hidden Layers and add some neurons to the Input layer.

Double-click the Input Layer and change its neurons to 20.
Press 'Add'. Let's use 20 neurons. We'll name the layer 'Layer_2'. Press 'Add Layer'.
Press 'Add' again and choose 12 neurons for this one. Name it 'Layer_3'.

Layers setup.
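To get a feel for the size of this network, the sketch below counts its trainable parameters. It assumes that every layer is fully connected with one bias per neuron, that the 20-neuron Input layer receives the 5 input columns, and that a single-neuron output layer follows 'Layer_3'. Those are assumptions about how Neurons builds the model, not something this page confirms, and `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(n_inputs, layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for size in layer_sizes:
        total += fan_in * size + size  # one weight per connection, one bias per neuron
        fan_in = size
    return total


# Assumed layout: Input layer (20), Layer_2 (20), Layer_3 (12), output (1).
layer_sizes = [20, 20, 12, 1]
total_params = dense_param_count(5, layer_sizes)
print(total_params)  # 120 + 420 + 252 + 13 = 805
```

Under these assumptions the model has a few hundred parameters, which is small compared with the number of trading days in the training file.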

Training the Model

Since this is a Regression problem, we'll have to change some of the training settings.

Press 'Train'.
Change 'Loss function' to 'Mean Squared Error'. It measures how far the predicted stock price is from the true one, penalizing large misses more heavily than small ones.
Select the 'Mean Absolute Percentage Error' metric.
Epochs: 200. Batch size: 5.
Select all 'Show' options. Press 'Train'.

Training settings.
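For reference, the loss and the metric chosen above can be written out in a few lines. This is a plain-Python sketch of the standard definitions, not the code Neurons or Keras runs internally; library implementations typically add a guard against division by zero in the percentage error. The sample values are made up.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Average absolute error, expressed as a percentage of the true value."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Made-up prices for illustration only.
y_true = [100.0, 200.0]
y_pred = [101.0, 198.0]
mse = mean_squared_error(y_true, y_pred)               # (1 + 4) / 2
mape = mean_absolute_percentage_error(y_true, y_pred)  # (1% + 1%) / 2
print(mse, mape)
```

The squared error depends on the price scale (dollars squared), while the percentage error is scale-free, which is why the latter is easier to read as a metric.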

Take a look at our model's performance:

Training output.

Model Metric and Loss plots (zoomed in).

As the first figure shows, our estimates deviate from the true value by about 1.3% on average. This seems very good, but remember these are stock prices, so a small percentage variation can be a big deal. Let's use the test dataset to verify our model's performance against new data.

Testing the Model

We want to see how this model performs with data it hasn't seen before, so we'll use the test dataset.

In 'Test file', choose the file 'stocks_data_apple_test.txt' that's inside the 'Data' folder.
Press 'Test' and select 'Predicted Vs Correct Output' in 'Show Plots'.
Click 'Browse' to save the testing data to a file. We'll call it 'stocks_output'.
Press 'Test Model'.

True vs Predicted Output

This is the plot of every True Output paired with its Predicted Output. The closer the points are to the green line, the better. The blue points are pretty close to it most of the time, but there's still some dispersion.

Also, notice that the points tend to sit above the line, which means our model is overestimating the prices.

Plot of the Predicted and True prices for the first 80 days (made with Matplotlib).

Here we can confirm that the model is overestimating the prices, as the Predicted line is mostly above the True line.
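The overestimation can also be checked numerically rather than by eye. A minimal sketch, assuming the true and predicted prices have already been read from the saved output file into two lists (the file's exact layout is not described here, so loading is left out, and the values below are made up):

```python
def mean_signed_error(y_true, y_pred):
    """Average of (predicted - true); positive means the model overestimates."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up prices for illustration only.
y_true = [100.0, 102.0, 101.0]
y_pred = [101.0, 103.5, 101.5]
bias = mean_signed_error(y_true, y_pred)  # (1.0 + 1.5 + 0.5) / 3
print(bias)
```

Unlike the absolute percentage error, this quantity keeps the sign of each error, so over- and underestimates cancel out and only a systematic offset remains.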

It also looks like the Predicted values are 'delayed' by a day. This makes sense: if the true price changes abruptly, that price is only used as input for the next day's prediction, so it takes 1 day for the model to notice a big change.
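One way to put this delay in perspective is to compare the model against a naive baseline that simply repeats yesterday's price. The sketch below is an illustration with made-up numbers; `persistence_forecast` and `mape` are hypothetical helper names. If a trained model's test error is not clearly lower than this baseline's, it has probably learned little beyond copying the most recent input.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


def persistence_forecast(prices):
    """Predict each day's price as the previous day's price."""
    return prices[:-1]


# Made-up prices with one abrupt jump, for illustration only.
true_prices = [100.0, 100.0, 110.0, 110.0, 110.0]
targets = true_prices[1:]                          # days 2..5
baseline_pred = persistence_forecast(true_prices)  # reaches 110 one day late
baseline_mape = mape(targets, baseline_pred)
print(baseline_pred, baseline_mape)
```

The baseline is wrong only on the day of the jump and catches up the day after, which is the same one-day lag seen in the plot.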

Our results are acceptable given the simplicity of our model, but are we going to get rich using it? Probably not. In the Final Note, let's see how we could make it better.

Final Note

This model used only the mean stock prices of the previous 5 days to predict the current price. The truth is, there are hundreds of different indicators day traders use to try to predict price fluctuations: moving averages, Bollinger bands, trading volumes, etc. One way we could redo this problem would be to use some of those indicators alongside the mean prices.
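As a rough idea of what such extra input columns could look like, here is a sketch that computes a simple moving average and Bollinger bands from a list of prices. The 5-day window and the band width of 2 standard deviations are illustrative choices (a 20-day window is the more common convention), the function name is hypothetical, and the prices are made up.

```python
from statistics import mean, pstdev


def bollinger_bands(prices, window=5, k=2.0):
    """Return (lower, middle, upper) for each full window of prices.

    The middle band is the simple moving average; the outer bands sit
    k population standard deviations away from it.
    """
    bands = []
    for i in range(window, len(prices) + 1):
        chunk = prices[i - window:i]
        mid = mean(chunk)
        spread = k * pstdev(chunk)
        bands.append((mid - spread, mid, mid + spread))
    return bands


# Made-up prices for illustration only.
prices = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
bands = bollinger_bands(prices)
for lower, mid, upper in bands:
    print(round(lower, 2), round(mid, 2), round(upper, 2))
```

When indicators like these are used as inputs for predicting a given day, they should be computed only from earlier days, so that the current day's price does not leak into its own prediction.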

There's also a better way to approach this problem with Machine Learning (and Keras): by using a model with Long Short-Term Memory layers. These types of layers, as the name suggests, give the model a sort of 'memory' of past information. Learn more about it here. Currently, Neurons only supports Dense layers, so this isn't possible with this program.
