
Tweets Containing #hashtags

Checking the hashtags

Tweets containing hashtags are often considered more valuable than those without, because someone has deliberately gone to the trouble of embedding aggregatable information in them. Applying the concepts of frequency analysis, we could compute things like the average number of hashtags per tweet or the average length of hashtags. But since I love to visualize data, nothing is more tempting than producing some word clouds. Let's see a simple example.
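As a minimal sketch of the frequency measures just mentioned, the snippet below computes the average number of hashtags per tweet and the average hashtag length. The sample_tweets vector is made up for illustration (it is not real data), and only base R functions are used.

```r
# hypothetical sample tweets (invented for illustration)
sample_tweets = c(
  "Clean air matters #air #health",
  "No hashtags in this one",
  "Get your flu shot #flu"
)

# extract hashtags from each tweet (one list element per tweet)
tags_per_tweet = regmatches(sample_tweets, gregexpr("#\\w+", sample_tweets))

# average number of hashtags per tweet
avg_tags = mean(lengths(tags_per_tweet))

# average hashtag length in characters, not counting the leading '#'
avg_tag_length = mean(nchar(unlist(tags_per_tweet)) - 1)
```

Tweets without hashtags contribute a count of zero to the first average, so they pull it down rather than being ignored.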

What do @user1, @user2, and @user3 hashtag about?

Let's say we want to know more about the tweets of three given users.

For example, consider the Twitter accounts of three U.S. federal government agencies:

@EPAgov (Environmental Protection Agency)

@NIHforHealth (National Institutes of Health)

@CDCgov (Centers for Disease Control and Prevention)

Step 1: Load the required packages

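# load packages

# twitteR provides userTimeline() and twListToDF(), used in Step 2

# (an already authenticated Twitter API session is assumed)

library(twitteR)

library(stringr)

library(wordcloud)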

Step 2: Collect tweets from each @user

# harvest tweets from each user

epa_tweets = userTimeline("EPAgov", n=500)

nih_tweets = userTimeline("NIHforHealth", n=500)

cdc_tweets = userTimeline("CDCgov", n=500)

# dump tweets information into data frames

epa_df = twListToDF(epa_tweets)

nih_df = twListToDF(nih_tweets)

cdc_df = twListToDF(cdc_tweets)

Step 3: Let's see what hashtags they use

# get the hashtags

epa_hashtags = str_extract_all(epa_df$text, "#\\w+")

nih_hashtags = str_extract_all(nih_df$text, "#\\w+")

cdc_hashtags = str_extract_all(cdc_df$text, "#\\w+")

# put tags in vector

epa_hashtags = unlist(epa_hashtags)

nih_hashtags = unlist(nih_hashtags)

cdc_hashtags = unlist(cdc_hashtags)

# calculate hashtag frequencies

epa_tags_freq = table(epa_hashtags)

nih_tags_freq = table(nih_hashtags)

cdc_tags_freq = table(cdc_hashtags)

# put all tags in a single vector

all_tags = c(epa_tags_freq, nih_tags_freq, cdc_tags_freq)
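One detail of this last step is worth spelling out: combining frequency tables with c() gives a plain named vector, and a hashtag used by more than one account shows up once per account rather than being merged. The sketch below illustrates this with two made-up tables (a_freq and b_freq are hypothetical names, not part of the analysis above), and shows one possible way to get totals per tag if merged counts are wanted instead.

```r
# two hypothetical frequency tables sharing the tag "#health"
a_freq = table(c("#health", "#air", "#air"))
b_freq = table(c("#health", "#flu"))

# c() keeps one entry per table, so "#health" appears twice
combined = c(a_freq, b_freq)

# optional: total count per distinct tag across both tables
totals = tapply(combined, names(combined), sum)
```

Keeping the per-account entries separate is what lets Step 5 color each hashtag by the account it came from.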

Step 4: Let's plot wordclouds for each user

# EPA hashtags wordcloud

wordcloud(names(epa_tags_freq), epa_tags_freq, random.order=FALSE,

colors="#1B9E77")

title("\n\nHashtags in tweets from @EPAgov",

cex.main=1.5, col.main="gray50")

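# NIH hashtags wordcloud

# (the + 7 offset lifts every tag above wordcloud's default min.freq of 3

# and narrows the relative size differences between tags)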

wordcloud(names(nih_tags_freq), nih_tags_freq + 7, random.order=FALSE,

colors="#7570B3")

title("\nHashtags in tweets from @NIHforHealth",

cex.main=1.5, col.main="gray50")

# CDC hashtags wordcloud

wordcloud(names(cdc_tags_freq), cdc_tags_freq, random.order=FALSE,

colors="#D95F02")

title("\n\nHashtags in tweets from @CDCgov",

cex.main=1.5, col.main="gray50")

Step 5: Now let's plot one single wordcloud

# vector of colors

cols = c(

rep("#1B9E77", length(epa_tags_freq)),

rep("#7570B3", length(nih_tags_freq)),

rep("#D95F02", length(cdc_tags_freq))

)

# wordcloud

wordcloud(names(all_tags), all_tags, random.order=FALSE, min.freq=1,

colors=cols, ordered.colors=TRUE)

mtext(c("@EPAgov", "@NIHforHealth", "@CDCgov"), side=3,

line=2, at=c(0.25, 0.5, 0.75), col=c("#1B9E77", "#7570B3", "#D95F02"),

family="serif", font=2, cex=1.5)
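The coloring in Step 5 relies on the color vector having exactly one entry per element of the combined frequency vector, in the same order. Here is a small sketch of that alignment using two made-up tables; epa_example, cdc_example, and the other names below are hypothetical and only mirror the structure of the real objects.

```r
# hypothetical frequency tables standing in for two accounts
epa_example = table(c("#air", "#water", "#air"))
cdc_example = table(c("#flu", "#health"))

# combined frequencies, in the same order as the tables were passed to c()
example_tags = c(epa_example, cdc_example)

# one color per tag, repeated once for each tag of the matching account
example_cols = c(
  rep("#1B9E77", length(epa_example)),
  rep("#D95F02", length(cdc_example))
)

# the two vectors must line up position by position
stopifnot(length(example_cols) == length(example_tags))

# side-by-side view of tag, frequency, and assigned color
color_map = data.frame(
  tag = names(example_tags),
  freq = as.vector(example_tags),
  color = example_cols
)
```

If the tables were combined in a different order than the rep() calls, the tags would still plot but with the wrong account's color, so it is worth keeping both in the same order.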
