Over the last year, both Bitcoin's price and its market value have swung dramatically. From January 2017 to its peak in December 2017, Bitcoin rose by nearly 2,000% (from 996 to 19,205 dollars per coin). Since then, as of April 1, 2018, the price has fallen by about 66% from that peak, to roughly 6,600 dollars per coin (put the other way round, the peak was nearly 200% above the current price). The market value has fallen by a similar proportion.
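A quick check of these percentages, using only the prices quoted above, shows why a fall can never exceed 100%: a percentage change is always measured relative to the starting value, so the same 12,605-dollar gap is about 66% of the peak price but about 191% of the current price.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a fall)."""
    return (new - old) / old * 100


rise = pct_change(996, 19205)            # January 2017 -> December 2017 peak
fall = pct_change(19205, 6600)           # peak -> April 1, 2018
peak_above_now = pct_change(6600, 19205) # how far the peak sits above today's price

print(f"rise {rise:.0f}%, fall {fall:.0f}%, peak vs. now {peak_above_now:.0f}%")
# prints "rise 1828%, fall -66%, peak vs. now 191%"
```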
Whether the next surge in the Bitcoin price can be predicted is the main question that drove me to do this research. This semester, I applied several classic models to make predictions and compared their results.
Fig 1. The main process
This flowchart shows the overall process of my project. The work consists of three parts: data processing, model analysis, and summary.
I scraped the historical Bitcoin prices from https://coinmarketcap.com/currencies/bitcoin/historical-data/ using the read_html function from the Python library pandas.
To make the data easier to read, I reshaped it and showed it in a line graph.
Fig 2. Bitcoin price change trend since 2013-04-28
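The scraping and reshaping steps could look roughly like the minimal sketch below. It assumes that the page still serves its prices as an HTML table and that the relevant columns are named "Date" and "Close"; both are assumptions, since the 2018 page layout may differ and the site may now render or block its data differently. The helper names fetch_raw_table and tidy_prices are hypothetical, and pd.read_html needs an HTML parser such as lxml installed.

```python
import pandas as pd

# URL from the text; the page layout may have changed since 2018.
URL = "https://coinmarketcap.com/currencies/bitcoin/historical-data/"


def fetch_raw_table(url: str = URL) -> pd.DataFrame:
    """Return the first HTML table on the page (needs lxml, or bs4 + html5lib)."""
    tables = pd.read_html(url)  # one DataFrame per <table> element found
    return tables[0]


def tidy_prices(raw: pd.DataFrame, date_col: str = "Date",
                close_col: str = "Close") -> pd.Series:
    """Reshape the scraped table into a date-indexed, oldest-first price series.

    The column names are assumptions; pass the real ones if they differ.
    """
    df = raw[[date_col, close_col]].copy()
    df[date_col] = pd.to_datetime(df[date_col])
    # Prices may arrive as text such as "$6,600.00": drop "$" and "," first.
    df[close_col] = pd.to_numeric(
        df[close_col].astype(str).str.replace(r"[$,]", "", regex=True)
    )
    # The site lists the newest day first; time series models expect oldest first.
    return df.sort_values(date_col).set_index(date_col)[close_col]
```

For example, prices = tidy_prices(fetch_raw_table()) followed by prices.plot() (which needs matplotlib) would produce a line graph like Fig 2. Keeping the network call separate from the reshaping makes the reshaping step testable offline.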
1. LSTM
First, I chose a classic machine learning model, the long short-term memory (LSTM) network, which is one of the most popular recurrent neural networks used for time series analysis (LSTM documents).
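To show what the network computes at each time step, here is a minimal NumPy sketch of one standard LSTM cell (input, forget and output gates plus a candidate cell state). It is an illustration only, not the network used in this project: the helper names, the gate ordering in the stacked weights, the hidden size and the random, untrained weights are all assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    Shapes: x_t (n_in,), h_prev and c_prev (n_hidden,),
    W (4*n_hidden, n_in), U (4*n_hidden, n_hidden), b (4*n_hidden,).
    Assumed gate order in the stacked weights: input, forget, candidate, output.
    """
    z = W @ x_t + U @ h_prev + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates lie in (0, 1)
    g = np.tanh(g)                                # candidate cell values
    c_t = f * c_prev + i * g                      # keep part of old memory, add new
    h_t = o * np.tanh(c_t)                        # hidden state passed to next step
    return h_t, c_t


def run_lstm(window, n_hidden=8, seed=0):
    """Feed a 1-D price window through an untrained cell; return the final h."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.1, (4 * n_hidden, 1))
    U = rng.normal(0.0, 0.1, (4 * n_hidden, n_hidden))
    b = np.zeros(4 * n_hidden)
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for x in window:
        h, c = lstm_cell_step(np.array([x]), h, c, W, U, b)
    return h
```

In a trained model, W, U and b are learned, and a final linear layer maps the last hidden state h to the predicted price; the forget gate f is what lets the cell keep information across many time steps.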
However, when I made predictions, I found that using the neural network for long-term prediction is deceptive. The test set was actually used to self-correct the results, not just for comparison: each step was predicted from the actual previous values, which made the results look far more accurate than a genuine long-term forecast would be.
Fig 3. Prediction results, using 2017-01-01 to 2018-01-30 as the training set and 2018-01-31 to 2018-02-17 as the test set
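The deceptive accuracy described above can be reproduced even without a neural network. The sketch below uses a naive "tomorrow equals today" rule as an assumed stand-in for any model, and evaluates it in two ways: one-step, where each prediction is built from the actual previous test value, and recursive, where the model only ever sees the training data. The function name is hypothetical.

```python
import numpy as np


def one_step_vs_recursive(train, test):
    """Return (one_step_rmse, recursive_rmse) for a naive persistence model."""
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)

    # One-step: the prediction for day t is the *actual* value of day t-1,
    # so the test set leaks into the model's inputs.
    one_step = np.concatenate([train[-1:], test[:-1]])

    # Recursive: each prediction feeds on the previous prediction; for this
    # naive model that is a flat line at the last training value.
    recursive = np.full(len(test), train[-1])

    def rmse(pred):
        return float(np.sqrt(np.mean((pred - test) ** 2)))

    return rmse(one_step), rmse(recursive)
```

On a steadily rising series, for example one_step_vs_recursive([1, 2, 3, 4], [5, 6, 7, 8]), the one-step error stays at 1 while the recursive error grows with every step, which is why a one-step evaluation can make a model look better than it really is at long-term forecasting.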
2. Random Walk
Second, I made a simple prediction using the random walk model and compared single-point prediction with long-term prediction. Unsurprisingly, the single-point predictions were not much different from those of the LSTM, because each one starts from the actual previous value and only adds a random step drawn from a normal distribution.
a. seed = 100
b. seed = 150
c. seed = 200
Fig 4. Bitcoin price predictions obtained with the random walk model, showing both single-point prediction and long-term prediction. (*Single-point prediction uses the actual existing value to predict the next one; long-term prediction uses the previously predicted value to predict the next one.
*Seed is a parameter I used in the program to generate different random sequences.)
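A minimal sketch of the two prediction modes in Fig 4, assuming each random step is drawn from a normal distribution with mean 0 and a standard deviation estimated from the observed daily price changes (the original program's exact distribution parameters are not given). random_walk_predictions is a hypothetical helper, and because it uses NumPy's Generator, passing seeds 100, 150 or 200 will not reproduce the exact curves in Fig 4.

```python
import numpy as np


def random_walk_predictions(prices, seed, sigma=None):
    """Return (single_point, long_term) predictions for prices[1:].

    Assumption: steps ~ Normal(0, sigma), with sigma defaulting to the
    standard deviation of the observed day-to-day changes.
    """
    prices = np.asarray(prices, dtype=float)
    if sigma is None:
        sigma = np.diff(prices).std()
    rng = np.random.default_rng(seed)  # the seed fixes the random sequence
    steps = rng.normal(0.0, sigma, size=len(prices) - 1)

    # Single-point: every step starts from the actual previous price.
    single_point = prices[:-1] + steps

    # Long-term: every step starts from the previous *predicted* price,
    # so the random steps accumulate along the path.
    long_term = prices[0] + np.cumsum(steps)
    return single_point, long_term
```

Both modes use the same random steps, so any difference between them comes only from where each step starts: single-point predictions are pulled back to the real price every day, while the long-term path is free to drift away.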
3. ARIMA
An autoregressive integrated moving average (ARIMA) model is fitted to time series data either to better understand the data or to predict future points in the series (forecasting). My work uses the statsmodels API, which provides many useful tools for this analysis.
To make the data usable, I applied the one-parameter Box–Cox transformation to stabilize its variance.
Stationary data (after the differencing step that the "I" in ARIMA performs) is an important prerequisite for using ARIMA.
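The one-parameter Box–Cox transformation replaces y by (y^λ - 1)/λ, or by ln y when λ = 0, and it only works for strictly positive data such as prices. The sketch below implements the transform and its inverse in plain NumPy; the inverse is needed to map forecasts back to dollars. The function names are illustrative. In practice, scipy.stats.boxcox can estimate λ by maximum likelihood, scipy.special.inv_boxcox provides the inverse, and statsmodels' adfuller can test the differenced series for stationarity.

```python
import numpy as np


def boxcox(y, lmbda):
    """One-parameter Box-Cox transform; requires strictly positive y."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lmbda == 0:
        return np.log(y)                # limit of the formula as lambda -> 0
    return (y ** lmbda - 1.0) / lmbda


def inv_boxcox(z, lmbda):
    """Map transformed values (e.g. forecasts) back to the original scale."""
    z = np.asarray(z, dtype=float)
    if lmbda == 0:
        return np.exp(z)
    return (lmbda * z + 1.0) ** (1.0 / lmbda)
```

A typical pipeline transforms the prices, takes first differences with np.diff (the d = 1 step of ARIMA), fits the model, and finally applies inv_boxcox to the forecasts.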
Fig 5a. Using the Autoregressive integrated moving average model to perform a prediction.
Fig 5b. Using the Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors model to perform a prediction.
The SARIMAX model takes seasonality into account. For example, the price might tend to rise from October to December. According to the best_model function, the selected parameter combinations for the two models were (1,1,0) and (1,1,0)×(1,1,1,4), respectively.
Fig 6. Using the Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors model to perform a backtesting prediction. The gray area shows the range the model expects the Bitcoin price to fall within (its confidence interval). As expected, the predictions are better where the data do not fluctuate much.
Fig 7. Comparison of Bitcoin Price Changes with Changes in the Nasdaq Composite Index
I compared the price trends of Bitcoin (2010 to 2018) and the Nasdaq Composite (1995 to 2005, a period that spans the dot-com bubble).
I rescaled both series into the 0–1 range (min-max normalization), so that the large difference in magnitude between the two series no longer dominates the comparison. I also aligned the two series on their peak values, using each peak as the central axis, to better compare how the prices changed before and after the peak.
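A minimal sketch of the rescaling and peak alignment, assuming each series is a plain array of prices sampled at regular intervals; normalize_and_align is a hypothetical helper name. Each series is mapped to [0, 1] with (v - min) / (max - min), and its x-axis becomes the offset from the peak, so both peaks sit at offset 0.

```python
import numpy as np


def normalize_and_align(values):
    """Min-max scale to [0, 1] and return (offsets, scaled), peak at offset 0."""
    v = np.asarray(values, dtype=float)
    scaled = (v - v.min()) / (v.max() - v.min())
    peak = int(np.argmax(v))
    offsets = np.arange(len(v)) - peak  # negative before the peak, positive after
    return offsets, scaled
```

Plotting scaled against offsets for both Bitcoin and the Nasdaq Composite puts the two curves on the same vertical scale with their peaks on the same central axis. Note that the two series should use a comparable sampling interval (for example, both daily or both monthly) for the offsets to be comparable.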
In summary, because the data are highly random, the predictions obtained by simply applying existing models are clearly far from accurate. This is not unexpected, given the extremely volatile nature of cryptocurrencies, especially over the last year. It is probably also not a good idea to predict the Bitcoin price over a long horizon, because errors accumulate with every step that builds on an earlier prediction. There is still much I can do in the future, especially on the mathematical side: if I can build a more complete model and combine it with the LSTM structure, the results might improve.
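One way to make the accumulation of errors precise, assuming the random walk of Section 2 with independent, normally distributed steps: the h-step-ahead forecast error is a sum of h independent steps, so its variance grows linearly with the horizon h and its standard deviation grows as σ√h.

```latex
\begin{aligned}
P_{t+1} &= P_t + \varepsilon_{t+1}, && \varepsilon_t \sim \mathcal{N}(0, \sigma^2) \text{ i.i.d.} \\
\hat{P}_{T+h} &= \mathbb{E}\left[P_{T+h} \mid P_T\right] = P_T, \\
P_{T+h} - \hat{P}_{T+h} &= \sum_{i=1}^{h} \varepsilon_{T+i}, \\
\operatorname{Var}\left(P_{T+h} - \hat{P}_{T+h}\right) &= h\,\sigma^2 .
\end{aligned}
```

If the forecast is itself a simulated path, as in the long-term predictions of Fig 4, its own random steps are independent of the real ones, so the error variance doubles to 2hσ².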