We analyzed live-streamed tweets about the well-known billionaires Bill Gates, Elon Musk, Donald Trump, and Jeff Bezos to see whether Twitter held a more positive or negative overall view of each person. We presented this information in graphic form to visualize this perception and to infer whether it is affected by each billionaire's level of philanthropy and contribution to society.
Are people's biases toward public figures shaped by objective or purely subjective factors?
The first step was to gather the data we hoped to analyze. To do this we used Tweepy, a Python library that provides access to Twitter's tweets and, importantly, the ability to stream them. Once we began streaming tweets, we recognized that we needed to omit certain data, such as hashtags, URLs, and user mentions, since this extraneous content did not contribute to the analysis.
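A minimal sketch of this streaming step, written against Tweepy's pre-4.0 StreamListener interface, is shown below. The credentials are placeholders, the keyword list simply mirrors the four names we tracked, and Twitter's streaming access has since changed, so treat it as illustrative rather than a drop-in implementation.

```python
import tweepy

# Placeholder credentials -- replace with your own Twitter API keys.
CONSUMER_KEY = "..."
CONSUMER_SECRET = "..."
ACCESS_TOKEN = "..."
ACCESS_TOKEN_SECRET = "..."

TRACK_TERMS = ["Bill Gates", "Elon Musk", "Donald Trump", "Jeff Bezos"]

class TweetListener(tweepy.StreamListener):
    def on_status(self, status):
        # Hand each incoming tweet's text off for cleaning and scoring.
        print(status.text)

    def on_error(self, status_code):
        # Returning False on a 420 disconnects the stream instead of retrying.
        if status_code == 420:
            return False

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

stream = tweepy.Stream(auth=auth, listener=TweetListener())
stream.filter(track=TRACK_TERMS, languages=["en"])
```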
After defining a function to clean the text (a function being a reusable block of code that performs a specific task), we began our analysis. We used the TextBlob Python library to return a decimal value for each tweet, representing both the overall positive or negative sentiment of the tweet and how strongly the author communicated that sentiment.
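The sketch below shows one way such a cleaning function and TextBlob call could be written; the regular expressions and function names are illustrative, not the project's exact code.

```python
import re
from textblob import TextBlob

def clean_tweet(text):
    """Strip user mentions, hashtags, and URLs before scoring."""
    text = re.sub(r"@\w+", "", text)               # user mentions
    text = re.sub(r"#\w+", "", text)               # hashtags
    text = re.sub(r"http\S+|www\.\S+", "", text)   # URLs
    return text.strip()

def tweet_sentiment(text):
    """Return TextBlob's polarity score, a decimal in [-1, 1]."""
    return TextBlob(clean_tweet(text)).sentiment.polarity

# Example: a positive tweet scores above 0 once the noise is removed.
print(tweet_sentiment("Huge respect for @BillGates and his foundation! https://t.co/xyz"))
```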
To make our data easier to understand and interpret, we used two different methods to represent it visually.
To show the strength of the sentiment Twitter users held about each billionaire, we created a running line graph. The vertical axis ranged from -1 to 1: a neutral tweet has a sentiment of 0, an extremely negative tweet a sentiment of -1, and an extremely positive tweet a sentiment of 1. The graph updates automatically with each new tweet, presenting a real-time picture of how people feel about the billionaires while the program is running.
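A simplified sketch of such a live-updating line graph, using matplotlib's animation module and assuming a shared list (here called sentiment_scores) that the streaming code appends polarity values to, might look like this:

```python
import matplotlib.pyplot as plt
import matplotlib.animation as animation

# Hypothetical shared list that the streaming code appends polarity scores to.
sentiment_scores = []

fig, ax = plt.subplots()

def update(frame):
    ax.clear()
    ax.set_ylim(-1, 1)                              # polarity scale
    ax.axhline(0, color="gray", linewidth=0.5)      # neutral line
    ax.plot(sentiment_scores, color="blue")
    ax.set_xlabel("Tweet number")
    ax.set_ylabel("Sentiment polarity")

# Redraw the graph once per second so new tweets appear as they arrive.
ani = animation.FuncAnimation(fig, update, interval=1000)
plt.show()
```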
To visualize the percentage split of positive and negative tweets per billionaire, we created pie charts to represent the general trends.
In each chart, the green slice represents the percentage of positive tweets and the red slice the percentage of negative tweets. As more tweets are published, the charts update in real time, shifting to reflect changing sentiments.
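Such a chart can be drawn with matplotlib's pie function; the helper below and its counts are purely hypothetical, included only to show how the green/red split is produced.

```python
import matplotlib.pyplot as plt

def plot_sentiment_split(name, positive_count, negative_count):
    """Draw a green/red pie chart of positive vs. negative tweets for one person."""
    plt.figure()
    plt.pie([positive_count, negative_count],
            labels=["Positive", "Negative"],
            colors=["green", "red"],
            autopct="%1.1f%%")                      # label each slice with its percentage
    plt.title(f"Tweet sentiment for {name}")
    plt.show()

# Illustrative counts only -- not results from the project.
plot_sentiment_split("Elon Musk", positive_count=180, negative_count=120)
```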
It is important to note that the artificial intelligence algorithm in the TextBlob library, which we used to classify our tweets, is not 100% accurate. The algorithm "learned" how to classify data from "training sets": large collections of text examples already labeled by sentiment, from which it gains a general idea of which words and phrases convey which sentiments. Since these training sets cannot contain every possible tweet, when the algorithm encounters new text (the live-streamed tweets in our project) or ambiguity (words with double meanings, slang, poor grammar), it must guess at the classification. This guessing produces misclassifications and therefore error.
Additionally, the project's data is gathered from Twitter, which represents only a subset of all social media users and an even smaller slice of the world. Because tweets were gathered based on keywords, only people actively tweeting about the billionaires contribute to the measured sentiment. Twitter is worldwide, with users from many countries, although most are from the USA. It is also important to note that Twitter's demographics are not static: users in different parts of the globe are active at different times because of varying time zones, so data gathered live at any given moment is more likely to represent the opinions of a single region rather than the world, which can further skew our results.