Definition: The automated process of obtaining information from text
Sentiment Analysis
Tweets read each day and sorted
Analyzed for 'positive' vs 'negative' keywords
Data is matched with macro world events to create a hedonometer, an index of collective mood.
Twitter data is one of the most widely used sources for sentiment analysis. Researchers use the public opinions and viewpoints expressed there to try to predict movements in securities prices, or to take real-time polls during political debates. Because each tweet is limited to a fixed number of characters, the text arrives in a uniform format that is easy to extract.
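The keyword approach described above can be sketched in a few lines of Python. This is only an illustration, not how any real hedonometer is built: the keyword lists, the function names score_tweet and daily_average, and the sample tweets are all made up for the example.

```python
import re

# Hypothetical keyword lists, chosen only for illustration.
POSITIVE = {"happy", "great", "love", "win"}
NEGATIVE = {"sad", "terrible", "hate", "loss"}

def score_tweet(text):
    """Return (positive keyword hits) minus (negative keyword hits) for one tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg

def daily_average(tweets):
    """Average the per-tweet scores for one day's tweets (0.0 if there are none)."""
    if not tweets:
        return 0.0
    return sum(score_tweet(t) for t in tweets) / len(tweets)

# Made-up sample data standing in for one day's tweets.
tweets = [
    "I love this great day",
    "What a terrible loss",
    "Nothing special here",
]
print(daily_average(tweets))
```

A daily score like this could then be lined up against dates of major events, which is the matching step mentioned in the notes.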
Bag of Words model
Each sentence is stripped of its grammar and punctuation
Data scientists only care about individual words, or groups of words
This is the basic starting point - more complex algorithms, like Google's, also take punctuation and grammar into account, using its search recommendations to autocorrect in Google Drive
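A minimal sketch of the basic bag-of-words idea, assuming simple lowercase tokenization with a regular expression. The function name bag_of_words and the parameter n (the size of a word group) are hypothetical names introduced for this example.

```python
import re
from collections import Counter

def bag_of_words(sentence, n=1):
    """Count single words (n=1) or groups of n consecutive words.

    Punctuation and letter case are discarded, so only the words remain.
    """
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    groups = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return Counter(groups)

text = "The dog chased the cat. The cat ran!"
bag = bag_of_words(text)             # counts of single words
pairs = bag_of_words(text, n=2)      # counts of two-word groups
print(bag.most_common(2))
print(pairs["the cat"])
```

Because only counts are kept, "dog bites man" and "man bites dog" produce the same bag of single words, which is exactly the loss of grammar the notes describe.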
Spam Filtering
Text mining takes all emails that come into the inbox and filters them based on previous factors and algorithms
All spam is sent to spam folder
If sender is not in spam list, text is analyzed for generic coupons and advertisements
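The two filtering steps above could look roughly like this. Everything here is an assumption made for illustration: the sender list, the advertising keywords, the threshold of two keyword hits, and the function name classify. Real spam filters are considerably more sophisticated.

```python
import re

# Hypothetical data, for illustration only.
SPAM_SENDERS = {"deals@example.com"}
AD_KEYWORDS = {"coupon", "discount", "sale", "offer", "free"}

def classify(sender, body, threshold=2):
    """Return 'spam' or 'inbox' for one email."""
    # Step 1: known spam senders go straight to the spam folder.
    if sender.lower() in SPAM_SENDERS:
        return "spam"
    # Step 2: otherwise, count advertising keywords in the text.
    words = re.findall(r"[a-z]+", body.lower())
    hits = sum(w in AD_KEYWORDS for w in words)
    return "spam" if hits >= threshold else "inbox"

print(classify("deals@example.com", "Hello"))
print(classify("friend@example.com", "Free coupon inside, big discount"))
print(classify("friend@example.com", "Lunch tomorrow?"))
```

The threshold is a design choice: requiring more than one keyword hit keeps an ordinary email that happens to mention a sale out of the spam folder.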