Getting Data via XML
R package XML (by Duncan Temple Lang)
Most of the time, the functions in the package twitteR for searching and getting data will be enough. However, there will be occasions when we need to query the Twitter Search API directly and parse the results ourselves with the functions in the R package XML.
Twitter Search API
I strongly suggest that you first take a look at the Twitter Search API documentation to understand how searches are run with this application.
Also check the GET search documentation, which lists the operators you can use to specify a query.
XML structure of a twitter search
In order to parse the data with the XML functions, you need to understand the XML structure that a Twitter search returns in the Atom format.
Imagine we have the following search URL: http://search.twitter.com/search.atom?q=datamining
If you copy and paste the URL into your browser, you should get XML output similar (but not identical) to the results printed in the next section.
That output contains the search results in XML format, and this is precisely what we want to get from Twitter!
How to parse the data from R?
First we need to load the package XML
(remember to install it first!)
# load XML
library(XML)
# define base twitter search url (following the atom standard)
twitter_url = "http://search.twitter.com/search.atom?"
# create twitter search query to be parsed
twitter_search = paste(twitter_url, "q=datamining", sep="")
> twitter_search
[1] "http://search.twitter.com/search.atom?q=datamining"
# let's parse with xmlParseDoc
results = xmlParseDoc(twitter_search, asText=FALSE)
> results
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:google="http://base.google.com/ns/1.0" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns="http://www.w3.org/2005/Atom" xmlns:twitter="http://api.twitter.com/" xmlns:georss="http://www.georss.org/georss" xml:lang="en-US">
<id>tag:search.twitter.com,2005:search/datamining</id>
<link type="text/html" href="http://search.twitter.com/search?q=datamining" rel="alternate"/>
<link type="application/atom+xml" href="http://search.twitter.com/search.atom?q=datamining" rel="self"/>
<title>datamining - Twitter Search</title>
<link type="application/opensearchdescription+xml" href="http://twitter.com/opensearch.xml" rel="search"/>
<link type="application/atom+xml" href="http://search.twitter.com/search.atom?since_id=204706537694953474&amp;q=datamining" rel="refresh"/>
<updated>2012-05-21T22:53:36Z</updated>
<openSearch:itemsPerPage>15</openSearch:itemsPerPage>
<link type="application/atom+xml" href="http://search.twitter.com/search.atom?page=2&amp;max_id=204706537694953474&amp;q=datamining" rel="next"/>
<entry>
<id>tag:search.twitter.com,2005:204706537694953474</id>
<published>2012-05-21T22:53:36Z</published>
<link type="text/html" href="http://twitter.com/iamdanw_links/statuses/204706537694953474" rel="alternate"/>
<title>Photo: DATAMINING (via A Tour Of Cenovus’ Energy’s In-Situ Christina Lake Facility - Business Insider) http://t.co/dKwnbnLD</title>
<content type="html">Photo: <em>DATAMINING</em> (via A Tour Of Cenovus’ Energy’s In-Situ Christina Lake Facility - Business Insider) <a href="http://t.co/dKwnbnLD">http://t.co/dKwnbnLD</a></content>
<updated>2012-05-21T22:53:36Z</updated>
<link type="image/png" href="http://a0.twimg.com/sticky/default_profile_images/default_profile_4_normal.png" rel="image"/>
<twitter:geo/>
<twitter:metadata>
<twitter:result_type>recent</twitter:result_type>
</twitter:metadata>
<twitter:source><a href="http://www.tumblr.com/" rel="nofollow">Tumblr</a></twitter:source>
<twitter:lang>en</twitter:lang>
<author>
<name>iamdanw_links (Robot Dan)</name>
<uri>http://twitter.com/iamdanw_links</uri>
</author>
</entry>
...
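Once the document is parsed, the next step is usually to pull individual fields out of each entry. The sketch below is one possible way to do that with XPath. To keep it runnable without a network connection, it assumes a small hand-trimmed copy of the feed shown above (one entry, only a few fields) instead of a live search. The names atom_text, ns, and tweets are our own choices, not part of the XML package. Note that the feed declares a default Atom namespace, so the XPath expressions need an explicit prefix (here "a") mapped to that namespace.

```r
library(XML)

# Hand-trimmed sample modeled on the search output above (assumption: not live data)
atom_text <- '<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:twitter="http://api.twitter.com/">
  <title>datamining - Twitter Search</title>
  <entry>
    <id>tag:search.twitter.com,2005:204706537694953474</id>
    <published>2012-05-21T22:53:36Z</published>
    <link type="text/html" href="http://twitter.com/iamdanw_links/statuses/204706537694953474" rel="alternate"/>
    <twitter:lang>en</twitter:lang>
    <author>
      <name>iamdanw_links (Robot Dan)</name>
      <uri>http://twitter.com/iamdanw_links</uri>
    </author>
  </entry>
</feed>'

# parse the character string as XML
doc <- xmlParse(atom_text, asText = TRUE)

# map prefixes to namespaces; "a" stands for the default Atom namespace
ns <- c(a = "http://www.w3.org/2005/Atom", twitter = "http://api.twitter.com/")

# extract one value per <entry>
authors   <- xpathSApply(doc, "//a:entry/a:author/a:name", xmlValue, namespaces = ns)
published <- xpathSApply(doc, "//a:entry/a:published", xmlValue, namespaces = ns)
langs     <- xpathSApply(doc, "//a:entry/twitter:lang", xmlValue, namespaces = ns)

# attributes are read with xmlGetAttr instead of xmlValue
links <- xpathSApply(doc, "//a:entry/a:link[@rel='alternate']",
                     xmlGetAttr, "href", namespaces = ns)

# collect everything in a data frame, one row per entry
tweets <- data.frame(author = authors, published = published,
                     lang = langs, link = links, stringsAsFactors = FALSE)
```

The same pattern should work for other elements of an entry, such as title or content, by changing the XPath expression.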
Constructing a query
The base of the search URL is the character string "http://search.twitter.com/search.atom?".
In order to perform queries, you can add different search parameters to the query string, separated by "&". The complete list of parameters is described at https://dev.twitter.com/docs/api/1/get/search
Some basic examples:
search term "datamining"
http://search.twitter.com/search.atom?q=datamining
search term "datamining" in Spanish
http://search.twitter.com/search.atom?q=datamining&lang=es
search term "data mining OR analytics" and display 50 results per page
http://search.twitter.com/search.atom?q=data%20mining%20OR%20analytics&rpp=50
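Typing these URLs by hand gets error-prone as soon as the search term contains spaces or other special characters. As a rough sketch, the URLs can be assembled in R instead. The helper build_search_url below is hypothetical (our own name, not a function from XML or twitteR). It relies only on base R: URLencode with reserved = TRUE turns spaces into %20, and paste glues the name=value pairs together with "&".

```r
# Hypothetical helper: build a search URL from a query and optional parameters
build_search_url <- function(q, ..., base = "http://search.twitter.com/search.atom?") {
  # gather the query term and any extra parameters (e.g. lang, rpp) as strings
  params <- c(q = q, ...)
  # percent-encode each value so that spaces and symbols are URL-safe
  encoded <- vapply(params, URLencode, character(1), reserved = TRUE)
  # join as name=value pairs separated by "&" and append to the base url
  paste0(base, paste(names(params), encoded, sep = "=", collapse = "&"))
}

# URLs of the same form as the examples above
url_basic   <- build_search_url("datamining")
url_spanish <- build_search_url("datamining", lang = "es")
url_paged   <- build_search_url("data mining OR analytics", rpp = 50)
```

Any of the resulting strings can then be passed to xmlParseDoc in the same way as twitter_search.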
© Gaston Sanchez - 2012