My analysis concentrated on the years 2012 to 2016 because those were the years for which I had complete data. For 2010, for example, my dataset contains only one tweet from Elon Musk.
1. I downloaded the .csv files from Kaggle and imported them into Excel. It is worth noting that when the elonmusk_tweets.csv file was opened in Excel, the time format changed from hh:mm:ss to hh:mm.
2. I downloaded the libraries I needed for my project, including NLTK, TextBlob and Plotly.
3. For the file about Musk’s tweets, I conducted sentiment analysis.
4. I created the clean_up(text) function to clean my tweets.
5. I removed the ‘b’ that appeared at the front of each tweet. Afterwards, I used a variety of regular expressions to remove hyperlinks, Unicode characters and any unnecessary strings that remained.
6. I used NLTK's RegexpTokenizer to split each string into substrings and remove punctuation.
7. I removed stop words, using the NLTK set of stop words, to prevent them from contaminating my calculations.
8. I created the get_tweet_polarity(file,year) function to perform sentiment analysis on my cleaned tweets.
9. I used a for loop to iterate through all the tweets in my dataset and perform sentiment analysis for the tweets in a specified month and year. I did this by combining all the tweets made in each month within a year into one long string, so that each month had its own long string of tweets. Afterwards, I used TextBlob on each month's string.
10. I created the tesla_stock_function(file,twelvemonth) function to calculate the monthly averages of Tesla stock prices in a given year using the adjusted closing prices. I used the adjusted closing prices because, according to Investopedia, the adjusted closing price amends a stock's closing price to accurately reflect that stock's value after accounting for any corporate actions. It is considered to be the true price of that stock and is often used when examining or analyzing historical returns in detail.
11. I created dictionaries to store the values of the total stock prices per month, the number of stock prices I had for that month and the daily stock price for each day in a year.
For example, in January 2016, I had 19 stock prices and the total stock price was $3886.68.
12. I calculated the average monthly stock price by dividing the total stock price for that month by the number of stock prices. For example, the average for January 2016 is: $3886.68/19 stock prices = $204.56.
13. I created the scatter_plot(x_axis,y_axis,year) function to visualize my data. I used Plotly to create a scatter plot with the average monthly stock prices on the y-axis and the polarity scores on the x-axis.
14. Finally, for extra credit, I created the ols(x_axis,y_axis,year) function to draw a line of best fit.
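As a rough illustration of the cleaning in steps 4 to 7, here is a minimal sketch using only Python's standard library. It is not my original clean_up code: the regular expressions are assumptions about what the raw tweets looked like, `re.findall` stands in for NLTK's RegexpTokenizer, and the short `STOP_WORDS` set is a hypothetical stand-in for the much longer NLTK stop-word list.

```python
import re

# Hypothetical stand-in for NLTK's English stop-word list (the real list
# is much longer); used here only so the sketch runs without NLTK.
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "of", "in", "for", "on", "at", "it"}


def clean_up(text):
    """Return a list of lowercase tokens with noise and stop words removed."""
    # Drop the bytes-literal wrapper, e.g. b'...' or b"...", keeping the inside.
    text = re.sub(r"""^b(['"])(.*)\1$""", r"\2", text.strip())
    # Remove hyperlinks.
    text = re.sub(r"https?://\S+", " ", text)
    # Remove escaped byte sequences such as \xe2\x80\x99, which are what
    # non-ASCII characters look like in this assumed raw format.
    text = re.sub(r"\\x[0-9a-fA-F]{2}", " ", text)
    # Remove leftover escaped whitespace such as a literal \n.
    text = re.sub(r"\\[nrt]", " ", text)
    # Keep runs of word characters only, which also discards punctuation.
    tokens = re.findall(r"\w+", text.lower())
    # Filter out stop words.
    return [token for token in tokens if token not in STOP_WORDS]


raw = r"b'Tesla is the future of cars\xe2\x80\x99 https://t.co/abc123'"
print(clean_up(raw))  # -> ['tesla', 'future', 'cars']
```

The cleaned tokens for a month could then be joined back into one string and scored, for instance with TextBlob's `TextBlob(text).sentiment.polarity`, which returns a value between -1 and 1.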
If I had seen a positive or negative correlation between the polarities of Musk’s tweets and Tesla’s average adjusted closing stock prices, then my hypothesis would have been supported. The project took me around 10 hours to complete.
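To make the monthly averaging in steps 10 to 12 and the correlation check more concrete, here is a minimal sketch. The function names `monthly_averages` and `pearson_r`, the `(date, adjusted close)` row layout and the sample rows are all hypothetical and chosen for illustration; they are not taken from my original code or from the real Tesla data. Pearson's r is one common way to quantify a linear correlation and is an assumption here, not necessarily the measure used in the project.

```python
from collections import defaultdict
from math import sqrt


def monthly_averages(rows, year):
    """Average adjusted closing price per month for one year.

    rows: iterable of (date_string, adj_close) pairs with dates written as
    YYYY-MM-DD (an assumed layout; the real CSV columns may differ).
    """
    totals = defaultdict(float)  # month -> sum of adjusted closing prices
    counts = defaultdict(int)    # month -> number of prices seen
    for date, adj_close in rows:
        row_year, month, _day = date.split("-")
        if int(row_year) == year:
            totals[int(month)] += adj_close
            counts[int(month)] += 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up rows, for illustration only.
sample = [("2016-01-04", 200.0), ("2016-01-05", 210.0),
          ("2016-02-01", 190.0), ("2015-12-31", 240.0)]
print(monthly_averages(sample, 2016))  # -> {1: 205.0, 2: 190.0}

# The January 2016 figure from step 12: total divided by number of prices.
print(round(3886.68 / 19, 2))  # -> 204.56
```

A value of `pearson_r(polarities, averages)` close to 1 or -1 would indicate a strong positive or negative linear relationship between the monthly polarity scores and the monthly average prices, while a value near 0 would indicate little or none.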