Mining Questions
What's your question?
We don't collect information from tweets just for the sake of it. Rather, we gather data from Twitter because we are driven by the motivations of any data mining project: to discover useful, valid, unexpected and understandable knowledge from data. We analyze information from Twitter because we have some particular question in mind. It can be a very simple question such as: what's the average number of characters in tweets about #Obama? Or it can be a more complex issue, for instance: what kind of associations can be established between two given hashtags such as #PabloPicasso and #SalvadorDali, if any? You can come up with literally dozens of questions about your favorite topic(s), and once you start answering them, sooner or later more questions will arise from those answers.
So, what questions can we ask?
Some of the most common questions fall into four big Qs:
Q1: What's everyone talking about?
Q2: What are the frequencies in data?
Q3: What relationships can be extracted from the tweets?
Q4: What's the sentiment/opinion of the people?
Of course, there are many more questions, but we will focus only on these four.
Q1: What's everyone talking about?
What are the trending topics?
What are people saying about a given #hashtag?
What are people saying about a given term?
What is a given user talking about?
What is a given group of users talking about?
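Most Q1 questions reduce to filtering a collection of tweets and then reading or counting what is left. The Python sketch below assumes the tweets have already been collected and simplified into hypothetical `(screen_name, text)` pairs; the sample data and function names are illustrative only and are not part of any Twitter API. Counting hashtags in your own sample is only a rough stand-in for trending topics, which Twitter computes itself.

```python
import re
from collections import Counter

# Hypothetical sample data: (screen_name, text) pairs standing in for
# tweets you have already collected.
TWEETS = [
    ("alice", "Loving the new #Python release! #coding"),
    ("bob", "Is #Python or #rstats better for text mining?"),
    ("carol", "Great talk on data mining today #rstats"),
    ("dave", "Plotting tweets with ggplot2 #rstats"),
]

HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags(text):
    """Hashtags in a tweet, lowercased and without the leading '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def by_hashtag(tweets, tag):
    """Tweets containing the given hashtag (case-insensitive)."""
    tag = tag.lstrip("#").lower()
    return [t for t in tweets if tag in hashtags(t[1])]


def by_term(tweets, term):
    """Tweets whose text contains the term (case-insensitive substring match)."""
    term = term.lower()
    return [t for t in tweets if term in t[1].lower()]


def by_users(tweets, users):
    """Tweets posted by any of the given users (a single user or a group)."""
    wanted = set(users)
    return [t for t in tweets if t[0] in wanted]


def top_hashtags(tweets, n=3):
    """Most common hashtags in this sample: a rough proxy for trending topics."""
    counts = Counter(tag for _, text in tweets for tag in hashtags(text))
    return counts.most_common(n)


if __name__ == "__main__":
    print(top_hashtags(TWEETS))  # [('rstats', 3), ('python', 2), ('coding', 1)]
    print(by_hashtag(TWEETS, "#rstats"))
    print(by_term(TWEETS, "mining"))
    print(by_users(TWEETS, ["alice", "dave"]))
```

The term filter is a plain substring match, so "mining" also matches "datamining"; a tokenizer (as in the frequency examples) gives stricter word matches.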
Q2: What are the frequencies in data?
Building on the previous questions, we can explore further
by computing summary statistics based on frequency analysis:
What is the average number of words per tweet?
What is the average word length?
What is the average number of hashtags per tweet?
What is the lexical diversity of tweets?
What are the most frequent words / terms?
Lexical diversity: number of unique tokens divided by the total number of tokens
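The summary statistics listed above, including lexical diversity, can be computed from tokenized tweets. This is a minimal Python sketch that assumes a simple regex tokenizer (lowercased words, with hashtags and @mentions kept as single tokens); the sample tweets and the choice not to count a leading '#' or '@' in word length are illustrative assumptions, not a standard definition.

```python
import re

# Hypothetical sample tweet texts.
TWEETS = [
    "Text mining with #rstats is fun",
    "New blog post on #python and #rstats",
    "hello world",
]

# Words, plus hashtags and @mentions kept as single tokens.
TOKEN_RE = re.compile(r"[#@]?\w+")


def tokenize(text):
    """Lowercase the text and split it into tokens."""
    return TOKEN_RE.findall(text.lower())


def summary_stats(tweets):
    """Frequency-based summary statistics for a list of tweet texts."""
    tokens = [tok for text in tweets for tok in tokenize(text)]
    if not tokens:
        raise ValueError("no tokens to summarise")
    n = len(tweets)
    return {
        # average number of words (tokens) per tweet
        "words_per_tweet": len(tokens) / n,
        # average word length, ignoring a leading '#' or '@'
        "avg_word_length": sum(len(tok.lstrip("#@")) for tok in tokens) / len(tokens),
        # average number of hashtags per tweet
        "hashtags_per_tweet": sum(tok.startswith("#") for tok in tokens) / n,
        # lexical diversity = unique tokens / total tokens
        "lexical_diversity": len(set(tokens)) / len(tokens),
    }


if __name__ == "__main__":
    print(summary_stats(TWEETS))
```

For the three sample tweets there are 15 tokens, 14 of them unique, giving 5 words per tweet, 1 hashtag per tweet and a lexical diversity of 14/15.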
Frequency analysis: most frequent terms
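Finding the most frequent terms is essentially a word count after removing words that carry little meaning. The sketch below uses Python's `collections.Counter` with a tiny, hypothetical stopword list; it drops @mentions and treats "rt" as a stopword because old-style retweets begin with "RT". Real analyses usually rely on a much fuller stopword list.

```python
import re
from collections import Counter

# Tiny, hypothetical stopword list; "rt" is included because old-style
# retweets start with "RT @user".
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "for", "on", "with", "rt"}

# Hypothetical sample tweet texts.
TWEETS = [
    "RT @ana: The #Oscars red carpet is on",
    "Watching the #Oscars with friends",
    "Who will win best picture at the #Oscars?",
    "Best picture predictions for the #Oscars",
]


def terms(text):
    """Lowercased tokens, keeping hashtags, dropping @mentions and stopwords."""
    tokens = re.findall(r"[#@]?\w+", text.lower())
    return [t for t in tokens if not t.startswith("@") and t not in STOPWORDS]


def most_frequent_terms(tweets, n=3):
    """The n most frequent terms across all tweets, with their counts."""
    counts = Counter(term for text in tweets for term in terms(text))
    return counts.most_common(n)


if __name__ == "__main__":
    print(most_frequent_terms(TWEETS))  # [('#oscars', 4), ('best', 2), ('picture', 2)]
```

The topic hashtag itself usually dominates the counts, so in practice it is often removed too before looking at the remaining terms.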
Q3: What relationships can be extracted from the tweets?
We can learn as much as possible about twitterers by inspecting
the entities that appear in their tweets.
What kinds of associations exist between users, topics, words, etc.?
Social graph linkages that exist among friends and followers
Retweet graphs connecting twitterers who retweet one another
What are the most frequently occurring entities that appear in a user's tweets?
Who does a given user retweet the most often?
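Retweets and entities can be turned into relationships directly. The Python sketch below assumes hypothetical `(screen_name, text)` pairs and detects only old-style retweets written as "RT @original_author"; it builds the directed edge list of a retweet graph, finds whom a user retweets most often, and counts the hashtags and @mentions (entities) in a user's tweets. Detecting retweets from text is an assumption made for illustration; tweet metadata from the API is a more reliable source.

```python
import re
from collections import Counter

# Hypothetical (screen_name, text) pairs using old-style "RT @author" retweets.
TWEETS = [
    ("alice", "RT @bob: new paper on graph mining"),
    ("alice", "RT @bob: slides are up"),
    ("alice", "RT @carol: great #rstats tip"),
    ("dave", "RT @alice: loved this talk"),
    ("carol", "thanks @alice and @bob for the feedback"),
]

RT_RE = re.compile(r"\bRT @(\w+)")
ENTITY_RE = re.compile(r"[#@]\w+")  # hashtags and @mentions


def retweet_edges(tweets):
    """Directed (retweeter, original_author) edges, one per retweet."""
    return [(user, author) for user, text in tweets for author in RT_RE.findall(text)]


def most_retweeted_by(tweets, user):
    """(author, count) for the account `user` retweets most, or None."""
    counts = Counter(author for src, author in retweet_edges(tweets) if src == user)
    return counts.most_common(1)[0] if counts else None


def top_entities(tweets, user, n=3):
    """Most frequent hashtags and @mentions in one user's tweets."""
    counts = Counter(
        ent for src, text in tweets if src == user for ent in ENTITY_RE.findall(text)
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Edge weights = number of times one user retweeted another.
    print(Counter(retweet_edges(TWEETS)))
    print(most_retweeted_by(TWEETS, "alice"))  # ('bob', 2)
    print(top_entities(TWEETS, "alice"))
```

Counting the edges with `Counter` gives a weighted retweet graph that can be handed to any graph library for further analysis.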
Q4: What's the sentiment/opinion of the people?
How are people expressing themselves about a certain topic?
Can we infer the sentiment of a piece of tweet text?
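One simple way to approach these questions is lexicon-based scoring: count opinion words of each polarity and compare. The Python sketch below uses toy, hypothetical positive and negative word lists (real work would use a published opinion lexicon) and averages the scores of the tweets that mention a topic. This is one technique among many and not necessarily the approach taken later in this material.

```python
import re

# Toy, hypothetical opinion lexicons for illustration only.
POSITIVE = {"good", "great", "love", "loved", "awesome", "happy", "win"}
NEGATIVE = {"bad", "awful", "hate", "boring", "sad", "lose", "worst"}


def sentiment_score(text):
    """Number of positive words minus number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text):
    """Map a score to 'positive', 'negative' or 'neutral'."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def topic_sentiment(tweets, term):
    """Average score of the tweets mentioning `term`, or None if there are none."""
    scores = [sentiment_score(t) for t in tweets if term.lower() in t.lower()]
    return sum(scores) / len(scores) if scores else None


if __name__ == "__main__":
    sample = ["Loved the #keynote, great demos", "#keynote was boring", "lunch was awesome"]
    print([label(t) for t in sample])  # ['positive', 'negative', 'positive']
    print(topic_sentiment(sample, "#keynote"))  # 0.5
```

Word counting ignores negation ("not good"), sarcasm and context, so these scores are best read as a coarse first signal.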
© Gaston Sanchez - 2012