We ran into problems when merging the stock data files into a single PySpark DataFrame
Mistake: reading each stock data file separately and then merging the resulting DataFrames
Consequence: very long execution times, high memory usage and eventually an OutOfMemoryError (Java heap space)
Solution: reading all files collectively by passing the directory path to PySpark's reader (spark.read) instead of individual file paths
Unfortunately, no official data was available indicating the sector affiliation of each stock
Our approach: researching the affiliations ourselves
Sources: https://finance.yahoo.com, https://www.marketwatch.com
In our study, we concentrated only on Covid deaths, not on infections. Since daily_covid_cases is also available as a column, infections could be analyzed for further insights.
In the future, it might also be useful to compare stock and Covid developments in other regions.
Our scripts and the available datasets support customized queries and can reveal far more than what we concentrated on in this analysis.
Adding plot generation for extreme-performing stocks might also prove useful.
With more time and resources, we could have taken other factors into account, such as vaccination rates, lockdown restrictions and general public sentiment. This would have allowed a deeper understanding of the correlation between Covid and the stock market.
Better time management would have saved us time and stress
Good and precise communication is the key to success
Data is like a rough diamond: it must first be polished before its true value can be recognized
Thankfully, the coronavirus pandemic is behind us
How does economic growth correlate with stock market performance and COVID-19 death rates during the pandemic?
What is the relationship between unemployment rates and stock market fluctuations in the context of the COVID-19 pandemic?
Can stock market performance of logistics companies serve as a proxy for supply chain disruptions during the pandemic?
We observed general growth in the healthcare sector. How has the stock performance of pharmaceutical companies been affected by their direct involvement in the pandemic, e.g. by developing COVID-19 vaccines or treatments?
Can machine learning algorithms accurately predict future stock market movements based on past patterns of COVID-19 deaths and market responses?