Getting Data via XML

R package XML (by Duncan Temple Lang)

Most of the time, the search functions in the package twitteR will be enough to get the data we need. However, there will be occasions when we need more control. In those cases we can query the Twitter Search API directly and parse the results with the functions in the R package XML.

Twitter Search API

I strongly suggest that you first take a look at the Twitter Search API documentation to understand how searches are run with this API.

Also check the GET search documentation, which lists the operators you can use to specify a query.

XML structure of a twitter search

In order to parse the data with the XML functions, you need to understand the XML structure returned by a Twitter search in the Atom format.

Imagine we have the following search URL: http://search.twitter.com/search.atom?q=datamining

If you copy and paste this URL into your browser, you should get something similar (but not identical) to the following code:

The code displayed in the previous image contains the search results in XML format, and this is precisely what we want to get from Twitter!
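To explore that structure from R instead of the browser, here is a minimal sketch that parses a trimmed copy of the search results printed later in this post (the feed header plus one entry). The feed is embedded as a string so the sketch runs offline. The detail to notice is that Atom elements sit in a default namespace, so XPath queries need a prefix bound to its URI. The prefixes "atom" and "openSearch" in the ns vector are my own arbitrary choice; they are not defined by the feed or by the XML package.

```r
library(XML)

# Trimmed copy of the search results printed in this post: the feed header
# plus one entry, embedded as a string so the sketch runs offline
atom_xml <- '<feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/"
    xmlns:twitter="http://api.twitter.com/">
  <id>tag:search.twitter.com,2005:search/datamining</id>
  <title>datamining - Twitter Search</title>
  <updated>2012-05-21T22:53:36Z</updated>
  <openSearch:itemsPerPage>15</openSearch:itemsPerPage>
  <link type="application/atom+xml" rel="next"
    href="http://search.twitter.com/search.atom?page=2&amp;max_id=204706537694953474&amp;q=datamining"/>
  <entry>
    <id>tag:search.twitter.com,2005:204706537694953474</id>
    <published>2012-05-21T22:53:36Z</published>
    <author><name>iamdanw_links (Robot Dan)</name></author>
  </entry>
</feed>'

doc <- xmlParse(atom_xml, asText = TRUE)

# Atom elements live in a default namespace, so XPath needs a prefix bound
# to its URI; the prefix names "atom" and "openSearch" are arbitrary choices
ns <- c(atom = "http://www.w3.org/2005/Atom",
        openSearch = "http://a9.com/-/spec/opensearch/1.1/")

xmlName(xmlRoot(doc))                                     # "feed"
xpathSApply(doc, "/atom:feed/atom:title", xmlValue,
            namespaces = ns)                              # "datamining - Twitter Search"
xpathSApply(doc, "//openSearch:itemsPerPage", xmlValue,
            namespaces = ns)                              # "15"
length(getNodeSet(doc, "//atom:entry", namespaces = ns))  # entries (tweets) on this page
# The rel="next" link holds the URL of the following page of results
xpathSApply(doc, "/atom:feed/atom:link[@rel='next']", xmlGetAttr, "href",
            namespaces = ns)
```

Each tweet is an entry element. The feed-level elements (title, updated, itemsPerPage and the next link) describe the search itself.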

How to parse the data from R?

First we need to load the package XML (remember to install it first!):

# load XML

library(XML)

# define base twitter search url (following the atom standard)

twitter_url = "http://search.twitter.com/search.atom?"

# create twitter search query to be parsed

twitter_search = paste(twitter_url, "q=datamining", sep="")

> twitter_search

[1] "http://search.twitter.com/search.atom?q=datamining"

# let's parse with xmlParseDoc

results = xmlParseDoc(twitter_search, asText=FALSE)

> results

<?xml version="1.0" encoding="UTF-8"?>

<feed xmlns:google="http://base.google.com/ns/1.0" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns="http://www.w3.org/2005/Atom" xmlns:twitter="http://api.twitter.com/" xmlns:georss="http://www.georss.org/georss" xml:lang="en-US">

<id>tag:search.twitter.com,2005:search/datamining</id>

<link type="text/html" href="http://search.twitter.com/search?q=datamining" rel="alternate"/>

<link type="application/atom+xml" href="http://search.twitter.com/search.atom?q=datamining" rel="self"/>

<title>datamining - Twitter Search</title>

<link type="application/opensearchdescription+xml" href="http://twitter.com/opensearch.xml" rel="search"/>

<link type="application/atom+xml" href="http://search.twitter.com/search.atom?since_id=204706537694953474&amp;q=datamining" rel="refresh"/>

<updated>2012-05-21T22:53:36Z</updated>

<openSearch:itemsPerPage>15</openSearch:itemsPerPage>

<link type="application/atom+xml" href="http://search.twitter.com/search.atom?page=2&amp;max_id=204706537694953474&amp;q=datamining" rel="next"/>

<entry>

<id>tag:search.twitter.com,2005:204706537694953474</id>

<published>2012-05-21T22:53:36Z</published>

<link type="text/html" href="http://twitter.com/iamdanw_links/statuses/204706537694953474" rel="alternate"/>

<title>Photo: DATAMINING (via A Tour Of Cenovus’ Energy’s In-Situ Christina Lake Facility - Business Insider) http://t.co/dKwnbnLD</title>

<content type="html">Photo: &lt;em&gt;DATAMINING&lt;/em&gt; (via A Tour Of Cenovus’ Energy’s In-Situ Christina Lake Facility - Business Insider) &lt;a href="http://t.co/dKwnbnLD"&gt;http://t.co/dKwnbnLD&lt;/a&gt;</content>

<updated>2012-05-21T22:53:36Z</updated>

<link type="image/png" href="http://a0.twimg.com/sticky/default_profile_images/default_profile_4_normal.png" rel="image"/>

<twitter:geo/>

<twitter:metadata>

<twitter:result_type>recent</twitter:result_type>

</twitter:metadata>

<twitter:source>&lt;a href="http://www.tumblr.com/" rel="nofollow"&gt;Tumblr&lt;/a&gt;</twitter:source>

<twitter:lang>en</twitter:lang>

<author>

<name>iamdanw_links (Robot Dan)</name>

<uri>http://twitter.com/iamdanw_links</uri>

</author>

</entry>

...
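Printing the parsed document is only a first step. To work with the tweets, we usually want one row per entry. The sketch below uses a hypothetical helper, entries_to_df(), that is not part of the XML package. It walks over every entry node with getNodeSet(), reads a few child elements (id, published, title, twitter:lang, author/name) with xmlValue(), and converts the timestamp to POSIXct. To keep it self-contained, it parses a trimmed copy of the entry above from a string. With a live feed you would pass it the results object returned by xmlParseDoc() instead.

```r
library(XML)

# Trimmed copy of the entry printed above, wrapped in a minimal feed element
feed_xml <- '<feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:twitter="http://api.twitter.com/">
  <entry>
    <id>tag:search.twitter.com,2005:204706537694953474</id>
    <published>2012-05-21T22:53:36Z</published>
    <title>Photo: DATAMINING (via A Tour Of Cenovus\u2019 Energy\u2019s In-Situ Christina Lake Facility - Business Insider) http://t.co/dKwnbnLD</title>
    <twitter:lang>en</twitter:lang>
    <author>
      <name>iamdanw_links (Robot Dan)</name>
      <uri>http://twitter.com/iamdanw_links</uri>
    </author>
  </entry>
</feed>'

# Hypothetical helper: one row per <entry>, one column per field of interest
entries_to_df <- function(doc) {
  ns <- c(atom = "http://www.w3.org/2005/Atom",
          twitter = "http://api.twitter.com/")
  entries <- getNodeSet(doc, "//atom:entry", namespaces = ns)
  # Text of the first node matching a path relative to one entry, NA if absent
  field <- function(node, path) {
    hit <- getNodeSet(node, path, namespaces = ns)
    if (length(hit) == 0) NA_character_ else xmlValue(hit[[1]])
  }
  # Apply field() to every entry for a given relative path
  col <- function(path) vapply(entries, field, character(1), path)
  data.frame(
    id        = col("./atom:id"),
    published = as.POSIXct(col("./atom:published"),
                           format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"),
    title     = col("./atom:title"),
    lang      = col("./twitter:lang"),
    author    = col("./atom:author/atom:name"),
    stringsAsFactors = FALSE
  )
}

doc <- xmlParse(feed_xml, asText = TRUE)
tweets <- entries_to_df(doc)
tweets[, c("published", "lang", "author")]
```

The helper returns NA when an element is missing. This happens, for instance, with an entry that carries no twitter:lang element, and the NA keeps every column the same length.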

Constructing a query

The base search URL is the character string "http://search.twitter.com/search.atom?"

To refine a query, you can append different search parameters to the query string. The complete list of parameters is described in https://dev.twitter.com/docs/api/1/get/search

Some basic examples:

search term "datamining"

http://search.twitter.com/search.atom?q=datamining

search term "datamining" in Spanish

http://search.twitter.com/search.atom?q=datamining&lang=es

search term "data mining OR analytics" and display 50 results per page

http://search.twitter.com/search.atom?q=data%20mining%20OR%20analytics&rpp=50
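Rather than typing each URL by hand, you can assemble the query string in R. The following sketch uses a hypothetical helper, build_search_url(), that is not part of twitteR or XML. It percent-encodes every value with base R's utils::URLencode(), so spaces become %20, and then joins the parameters with "&". The parameter names (q, lang, rpp) are the ones used in the examples above.

```r
# Hypothetical helper: build a search URL from a query plus named parameters
build_search_url <- function(q, ..., base = "http://search.twitter.com/search.atom?") {
  params <- c(list(q = q), list(...))
  # Percent-encode every value (e.g. spaces become %20); names are kept as-is
  values <- vapply(params,
                   function(v) utils::URLencode(as.character(v), reserved = TRUE),
                   character(1))
  paste0(base, paste(names(params), values, sep = "=", collapse = "&"))
}

build_search_url("datamining")
# "http://search.twitter.com/search.atom?q=datamining"
build_search_url("datamining", lang = "es")
# "http://search.twitter.com/search.atom?q=datamining&lang=es"
build_search_url("data mining OR analytics", rpp = 50)
# "http://search.twitter.com/search.atom?q=data%20mining%20OR%20analytics&rpp=50"
```

The resulting string can then be passed to xmlParseDoc() exactly as in the earlier code.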

© Gaston Sanchez - 2012