Our project achieved the following goals.
Reduced the runtime of data processing and model training with Big Data and Big Compute techniques, addressing the large input volume and long training time of LSTM-based stock prediction.
Implemented parallel data processing with Spark, and parallel training of LSTM models on datasets covering different stocks and different time ranges with the SLURM job manager on Cannon.
After tuning, the LSTM models learned the price trend well.
Ran the above workloads with different numbers of nodes and cores and achieved good speedup.
Built user-friendly applications for stock price predictions.
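The parallel training over stocks and time ranges described above can be sketched as a mapping from a SLURM task index to one (stock, time range) pair. This is a minimal illustration under stated assumptions, not the project's actual launcher: it assumes a SLURM job array was used, and the tickers, date ranges, and the helper names `build_tasks` and `select_task` are hypothetical. The only SLURM-specific piece is the `SLURM_ARRAY_TASK_ID` environment variable, which SLURM sets for each task of a job array.

```python
import itertools
import os

# Hypothetical tickers and date ranges, used only for illustration.
STOCKS = ["AAPL", "MSFT", "GOOG"]
TIME_RANGES = [("2015-01-01", "2017-12-31"), ("2018-01-01", "2020-12-31")]


def build_tasks(stocks, time_ranges):
    """Return every (stock, time_range) pair in a fixed, reproducible order."""
    return list(itertools.product(stocks, time_ranges))


def select_task(tasks, task_id):
    """Pick the pair assigned to one array task; fail loudly on a bad index."""
    if not 0 <= task_id < len(tasks):
        raise IndexError(f"task id {task_id} outside 0..{len(tasks) - 1}")
    return tasks[task_id]


if __name__ == "__main__":
    tasks = build_tasks(STOCKS, TIME_RANGES)
    # SLURM sets SLURM_ARRAY_TASK_ID for each task of a job array;
    # default to 0 so the script also runs outside SLURM.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    stock, (start, end) = select_task(tasks, task_id)
    print(f"task {task_id}: train LSTM for {stock} on {start}..{end}")
```

With three stocks and two time ranges there are six independent training tasks, so a job array with indices 0 to 5 (for example via `sbatch --array=0-5`) would cover them, each task training one model without communicating with the others.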
In the future, our software can be improved in the following aspects.
Explore more stocks and industries to improve model generalization.
Explore more news and social media text to capture broader market sentiment.
Reduce prediction latency by using faster data-fetching APIs and by replacing sequential components with parallelizable ones, e.g. replacing the LSTM with a Transformer.
Build an interactive web application to provide real-time predictions.
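One way to read the latency item above is that data fetching for several tickers is I/O-bound and need not run one request at a time. The sketch below is only an illustration of that idea using Python's standard `concurrent.futures` module. `fetch_prices` is a hypothetical stand-in that sleeps instead of calling any real data API, and the tickers are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_prices(ticker, delay=0.05):
    """Hypothetical stand-in for a network call; sleeps instead of fetching."""
    time.sleep(delay)
    return ticker, []  # a real client would return price rows here


def fetch_sequential(tickers):
    """Fetch one ticker after another; total wait grows with the list length."""
    return [fetch_prices(t) for t in tickers]


def fetch_concurrent(tickers, max_workers=8):
    """Overlap the waits with a thread pool; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_prices, tickers))


if __name__ == "__main__":
    tickers = ["AAPL", "MSFT", "GOOG", "AMZN"]  # hypothetical tickers
    for fn in (fetch_sequential, fetch_concurrent):
        t0 = time.perf_counter()
        fn(tickers)
        print(f"{fn.__name__}: {time.perf_counter() - t0:.2f}s")
```

Threads are used here because the work is waiting on I/O rather than computing. Whether this helps in practice depends on the rate limits of the data provider, which the sketch does not model.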
Hutto, C., & Gilbert, E. (2014, May). VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 8, No. 1).
Li, X., Li, Y., Yang, H., Yang, L., & Liu, X. Y. (2019). DP-LSTM: Differential privacy-inspired LSTM for stock prediction using financial news. arXiv preprint arXiv:1912.10806.
P, Prasad. (2009). Parallel Quicksort using MPI & Performance Analysis. https://www.codeproject.com/Articles/42311/Parallel-Quicksort-using-MPI-Performance-Analysis.