Text Mining with tm

R package tm (by Ingo Feinerer)
The main package for text mining tasks in R is tm.
Do yourself a favor and check its documentation and vignettes.


Lexical Corpus
The main structure for managing documents in tm is the so-called Corpus, which represents a collection of text documents. If your textual data is in a vector object, as it usually will be when extracting information from Twitter, you create a corpus like this:
mycorpus = Corpus(VectorSource(object))
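
As a minimal sketch of the call above, the following builds a corpus from a small character vector and checks what went in. The vector `tweets` and its contents are made up for illustration; in practice it would hold the text you extracted.

```r
library(tm)

# Hypothetical example data: three short tweet-like strings
tweets <- c("Text mining in R is fun",
            "The tm package handles corpora",
            "R has many packages for text analysis")

# Wrap the character vector in a source, then build the corpus
mycorpus <- Corpus(VectorSource(tweets))

# Basic checks: number of documents and the text of the first one
n_docs <- length(mycorpus)
first_doc <- content(mycorpus[[1]])
print(n_docs)
print(first_doc)
```

Each element of the vector becomes one document in the corpus.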


Transformations
Once we have a corpus we typically want to modify the documents in it, e.g. by stemming, stopword removal, etc. These tasks are performed in tm with so-called transformations, applied via the tm_map function.

stripWhitespace: eliminate extra white-spaces
mycorpus1 = tm_map(mycorpus, stripWhitespace)

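tolower: convert text to lower case (tolower is a base R function, not a tm transformation, so wrap it in content_transformer)
mycorpus2 = tm_map(mycorpus, content_transformer(tolower))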

removeWords: remove words like stopwords
mycorpus3 = tm_map(mycorpus, removeWords, stopwords("english"))

removePunctuation: remove punctuation symbols
mycorpus4 = tm_map(mycorpus, removePunctuation)

removeNumbers: remove numbers
mycorpus5 = tm_map(mycorpus, removeNumbers)

Apply various transformations at the same time
tm_map(x, 
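
One way to apply several transformations to the same corpus is to chain tm_map calls, feeding each result into the next. The sketch below takes that approach; the sample text and the names `docs` and `clean` are made up for illustration, and the order of the steps is one reasonable choice rather than a fixed recipe.

```r
library(tm)

# Hypothetical input text, for illustration only
docs <- c("The   QUICK brown fox, aged 3, jumps!",
          "Text mining with tm: 100 documents and counting.")
mycorpus <- Corpus(VectorSource(docs))

# Chain the transformations; each call takes the previous result
clean <- tm_map(mycorpus, content_transformer(tolower))    # lower case first
clean <- tm_map(clean, removePunctuation)                  # drop punctuation
clean <- tm_map(clean, removeNumbers)                      # drop digits
clean <- tm_map(clean, removeWords, stopwords("english"))  # drop stopwords
clean <- tm_map(clean, stripWhitespace)                    # collapse leftover gaps

cleaned_first <- content(clean[[1]])
print(cleaned_first)
```

Lower-casing comes before removeWords because the English stopword list is in lower case, and stripWhitespace comes last because the removal steps leave extra spaces behind.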


Term-Document Matrices
A common approach in text mining is to create a term-document matrix from a corpus with one of these functions:
TermDocumentMatrix: creates a matrix with terms as rows and documents as columns
DocumentTermMatrix: creates a matrix with documents as rows and terms as columns
These two matrix types are the meat and potatoes of most text analysis in R, because they are the input to which we apply classification, cluster analysis, association analysis, and so on.
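
To make the two layouts concrete, here is a small sketch that builds both matrices from the same corpus and compares their shapes. The three sample documents are made up for illustration, and both functions are called with their default settings.

```r
library(tm)

# Hypothetical sample documents, for illustration only
docs <- c("text mining with corpora",
          "mining text data",
          "data analysis with clusters")
mycorpus <- Corpus(VectorSource(docs))

# Terms as rows, documents as columns
tdm <- TermDocumentMatrix(mycorpus)

# Documents as rows, terms as columns
dtm <- DocumentTermMatrix(mycorpus)

# The two matrices hold the same counts, transposed
print(dim(tdm))
print(dim(dtm))

# Convert to an ordinary matrix to look up the counts for one term
mining_counts <- as.matrix(tdm)["mining", ]
print(mining_counts)
```

Both objects are stored as sparse matrices, so as.matrix is convenient for small examples like this one but can be expensive on a large corpus.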