Overview and Motivation

As long as capital markets have existed, investors and aspiring arbitrageurs alike have strived to gain edges in predicting stock prices. In particular, use of machine-learning techniques and quantitative analysis to make stock price predictions has become increasingly popular with time. Success, however, can be hard to come by; predicting financial price movements is an extremely difficult task. Nevertheless, the motivation for doing so is constantly present, with lucrative opportunities and insightful outcomes awaiting victors in an ongoing information battle.

Our team wishes to use the Data Science framework we learned in CS 109 to take our chances at predicting stock price movements. Through the Boston Data Festival, we participated in a competition organized by Hack/Reduce and sponsored by DataRobot on November 10. The task of the competition was to take data from the past 9 days of a stock's pricing data and the opening price on the 10th day to predict whether the closing price on the 10th day is greater than or less than the opening price on the 10th day. After experimenting with various singular and stacked models, we eventually were able to devise a model that obtained an AUC of 0.941, which won us 1st place in the competition.

Through this engaging and rewarding experience with predictive stock price modeling, we became very excited about the real-world applications of this kind of model and wanted to go more in depth into our predictive analysis after the competition. Our CS109 final project gave us a structured opportunity to extend our work. We aim to test a variety of models on S&P 500 stocks encompassing a different industries and quantify the success of our models. Such analysis may give Buffalo Capital Management an upper hand in any future investing endeavors!

Related Work

We were inspired by the successes we've seen in algorithmic trading and computer-assisted stock analysis. Several well-documented effects we've read about online are the momentum effect and regression to the mean. There has been a lot of research conducted about the significance of the momentum effect in stock price prediction. For example, Lee and Swaminathan (2002) studied the relationship between momentum and value trading strategies, while Grinblatt and Moskowitz (2004) examined the effect of consistent positive past returns on the link between past and expected returns. The wealth of research into price momentum made us interested in examining this effect further and seeing how it could be applied to stock price prediction models. 

Initial Questions

There are a number of questions that we wished to initially address in our project.
  1. Given historical data and a stock's pricing data for 9 days and its opening price on the 10th day, how much success can we have in predicting whether the closing price on the 10th day is greater than or less than the opening price on the 10th day.
  2. Which classification models will perform the best at assisting us in making predictions used to answer Question 1?
  3. Do stacked/blended models (models that utilize some committee to combine predictions made from two or more separate classification models) perform better or worse than individual classification models in making predictions used to answer Question 1?
  4. Among the S&P 500 stocks, are there any industries for which the predictions used to answer Question 1 tend to be more or less accurate than the predictions made for other industries?
  5. What are some of the differences between the behaviors of our classification models when making predictions used to answer Question 1?
Question 4 was suggested in principle by our TF, Lehman. Question 5 arose when running our models.