Text mining in R allows us to identify the most frequently used keywords in a body of text. Ryan will use the text mining package (tm) and the word cloud generator package (wordcloud) to analyze the text and visualize the keywords as a word cloud.
#Install the required packages
#OR Install the required packages through:
#Load the required packages
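As a minimal sketch of the install and load steps: tm and wordcloud are the packages named above, while SnowballC (used by tm for stemming) and RColorBrewer (which provides brewer.pal()) are added here as assumptions about what the later steps need. Installing only what is missing is a convenience choice, not a requirement.

```r
# Packages assumed for this walkthrough; only tm and wordcloud are named in the text.
pkgs <- c("tm", "SnowballC", "wordcloud", "RColorBrewer")

# Install only the packages that are not already available.
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

# Attach the packages for the current session.
library(tm)
library(SnowballC)
library(wordcloud)
library(RColorBrewer)
```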
#Read the text file
#Load the data as a corpus
#Inspect the content of the document (optional)
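The three steps above might look roughly like this. The input file is not named here, so the sketch writes two made-up sentences to a temporary file and reads that instead; VCorpus() is used as one of tm's corpus constructors, which may differ from the constructor used for the original analysis.

```r
library(tm)

# Hypothetical sample text in a temporary file, standing in for the real input file.
path <- tempfile(fileext = ".txt")
writeLines(c("The cabin crew was good and the food was good.",
             "Seat 42A was fine, but the flight left 30 minutes late."),
           path)

# Read the file: one character string per line.
text <- readLines(path)

# Load the data as a corpus: each line becomes one document.
docs <- VCorpus(VectorSource(text))

# Print a summary of the corpus and its documents.
inspect(docs)
```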
#Transformations are performed with the tm_map() function, for example to replace special characters in the text
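One way to do this, sketched under assumptions: wrap gsub() in content_transformer() so that tm_map() can apply it to every document. The helper name toSpace, the characters being replaced, and the sample sentence are all illustrative.

```r
library(tm)

# Hypothetical one-document corpus containing a few special characters.
docs <- VCorpus(VectorSource("great/service@the airport|lounge"))

# Helper (illustrative name) that replaces every match of a pattern with a space.
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

# Apply the helper once per character; "|" must be escaped in a regular expression.
docs <- tm_map(docs, toSpace, "/")
docs <- tm_map(docs, toSpace, "@")
docs <- tm_map(docs, toSpace, "\\|")

result <- content(docs[[1]])
print(result)
```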
#Convert the text to lower case
#Remove numbers
#Remove common English stopwords
#Remove your own stop words
#Remove punctuation
#Eliminate extra whitespace
#Text stemming
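The cleaning steps listed above map onto tm's built-in transformations. The following sketch applies them in the same order to a made-up sentence; the custom stop word "served" is only a placeholder, and stemming assumes the SnowballC package is installed.

```r
library(tm)
library(SnowballC)

# Hypothetical one-document corpus.
docs <- VCorpus(VectorSource("The 2 FLIGHTS were good; the crew served food!"))

docs <- tm_map(docs, content_transformer(tolower))       # lower case
docs <- tm_map(docs, removeNumbers)                      # drop digits
docs <- tm_map(docs, removeWords, stopwords("english"))  # common English stopwords
docs <- tm_map(docs, removeWords, c("served"))           # your own stop words (placeholder)
docs <- tm_map(docs, removePunctuation)                  # drop punctuation
docs <- tm_map(docs, stripWhitespace)                    # collapse repeated spaces
docs <- tm_map(docs, stemDocument)                       # reduce words to their stems

cleaned <- content(docs[[1]])
print(cleaned)
```

Stemming explains why the terms reported later appear in truncated form, such as "airlin" and "servic".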
#Build a term-document matrix (a table containing the frequency of the words)
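A minimal sketch of this step on a made-up three-document corpus. The intermediate names dtm, m and v are assumptions; the data frame d with columns word and freq matches the names used by the plotting code in this tutorial.

```r
library(tm)

# Hypothetical corpus: three short documents.
docs <- VCorpus(VectorSource(c("good food good crew", "good seat", "late flight")))

# Rows are terms, columns are documents, cells are counts.
dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)

# Total frequency of each term across all documents, most frequent first.
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v, stringsAsFactors = FALSE)

head(d, 10)
```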
#Generate the word cloud
wordcloud(words = d$word, freq = d$freq,
          max.words=200, random.order=FALSE, rot.per=0.35,
          colors=brewer.pal(8, "Dark2"))
#Identify frequent terms in the term-document matrix
[1] "airlin" "cabin" "crew" "flight" "food" "good" "servic" "singapor"
[9] "seat" "time"
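A list like the one above comes from tm's findFreqTerms(), which returns every term whose total count reaches a lower bound. The threshold used for the airline text is not stated, so the sketch below uses a made-up corpus and an arbitrary lowfreq of 2.

```r
library(tm)

# Hypothetical corpus in which "good" occurs four times and "food" twice.
docs <- VCorpus(VectorSource(c("good food good crew",
                               "good seat good food",
                               "late flight")))
dtm <- TermDocumentMatrix(docs)

# Terms that occur at least twice across the whole corpus.
frequent <- findFreqTerms(dtm, lowfreq = 2)
print(frequent)
```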
#Analyze the association between frequent terms
$good
       wine touchscreen        base      cathay      oldest       pacif    proactiv      remain
       0.43        0.42        0.41        0.41        0.41        0.41        0.41        0.41
      retir       spark        step     section      select   exemplari        seen      terrif
       0.41        0.41        0.41        0.34        0.33        0.31        0.31        0.31
      video  welldesign
       0.31        0.31
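Output of this shape is what tm's findAssocs() returns: for each requested term, the other terms whose document-level counts correlate with it at or above a limit. The sketch below uses a made-up corpus; the corlimit of 0.3 is an assumption inferred from the smallest value shown above, not a documented setting.

```r
library(tm)

# Hypothetical corpus in which "wine" tends to appear alongside "good".
docs <- VCorpus(VectorSource(c("good wine good seat",
                               "good wine",
                               "late flight",
                               "cold food")))
dtm <- TermDocumentMatrix(docs)

# Terms whose correlation with "good" is at least 0.3.
assoc <- findAssocs(dtm, terms = "good", corlimit = 0.3)
print(assoc)
```

The result is a list with one element per requested term, so the associations for "good" are in assoc$good.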
#Plot word frequencies
barplot(d[1:10,]$freq, las = 2, names.arg = d[1:10,]$word,
        col = "lightblue", main = "Most frequent words",
        ylab = "Word frequencies")