
Collect Twitter Data for Analysis

There are three ways to access Twitter data: 1) Twitter’s Search API, 2) Twitter’s Streaming API, and 3) Twitter’s Firehose.

The Search API gives access to tweets that already exist, but results are limited to the last 5,000 tweets per search query.

The Streaming API lets you receive a sample of tweets as they occur, pushed to you based on a set of search criteria. However, the sample it provides is at most 1% of the entire traffic and is not randomized, so the data is not statistically representative.

The Twitter Firehose is a paid service provided by authorized resellers of Twitter data, such as GNIP. It gives access to 100% of the tweets that match your search criteria, which is critical if you want your study to be statistically representative. The price depends on your search criteria and the volume of tweets you need; you can ask for a quote, but it is not going to be cheap.
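As a concrete starting point, here is a minimal sketch of querying the Search API from Python using only the standard library. The v1.1 `search/tweets` endpoint and application-only bearer-token authentication are assumptions based on Twitter's API documentation; the token and query below are placeholders, and real use requires registered app credentials.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"

def build_search_params(query, count=100, max_id=None):
    """Assemble the query parameters for one page of search results."""
    params = {"q": query, "count": count, "result_type": "recent"}
    if max_id is not None:
        params["max_id"] = max_id  # page backwards through older tweets
    return params

def search_tweets(bearer_token, query, count=100, max_id=None):
    """Fetch one page of matching tweets from the Search API."""
    url = SEARCH_URL + "?" + urlencode(build_search_params(query, count, max_id))
    req = Request(url, headers={"Authorization": "Bearer " + bearer_token})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp).get("statuses", [])

if __name__ == "__main__":
    # "YOUR_BEARER_TOKEN" is a placeholder for your own app credentials.
    for tweet in search_tweets("YOUR_BEARER_TOKEN", "#protest"):
        print(tweet["text"])
```

To go further back than one page, you would pass the lowest tweet id seen so far as `max_id` and request again, until the API stops returning results.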

However, a few simple PHP scripts may get you a relatively large amount of relevant tweets from a specific period for free. For example, you can write a PHP script that retrieves historical tweets matching a set of keywords you identify. You can then expand your dataset by mining tweets from the authors of the tweets in your first dataset, using the same keywords, and so on. For one of my projects, I started with 2,000 tweets and ended up with ~250k unique tweets posted during June 2013 about a particular protest. This may work for you, depending on what type of tweets you need and whether you can use keywords, users, or some other identifier to construct your dataset.
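The snowball strategy just described is independent of the scripting language; a sketch of the expansion logic in Python might look like the following. Here `fetch_user_tweets` is a hypothetical stand-in for whatever API call or script you use to pull a user's tweets, and the tweet dictionaries' `id`/`user`/`text` keys are illustrative assumptions.

```python
def matches_keywords(text, keywords):
    """Case-insensitive check that the tweet text contains any keyword."""
    lowered = text.lower()
    return any(kw.lower() in lowered for kw in keywords)

def expand_dataset(seed_tweets, keywords, fetch_user_tweets, rounds=2):
    """Grow the dataset round by round, deduplicating tweets by id.

    Each round pulls tweets from every author already in the dataset and
    keeps only the ones that match the keywords.
    """
    dataset = {t["id"]: t for t in seed_tweets}
    for _ in range(rounds):
        authors = {t["user"] for t in dataset.values()}
        for author in authors:
            for tweet in fetch_user_tweets(author):
                if tweet["id"] not in dataset and matches_keywords(tweet["text"], keywords):
                    dataset[tweet["id"]] = tweet
    return list(dataset.values())

# Tiny usage example with a fake timeline lookup in place of an API call:
timelines = {
    "alice": [{"id": 2, "user": "alice", "text": "march for the protest"}],
    "bob": [{"id": 3, "user": "bob", "text": "lunch"}],
}
seed = [{"id": 1, "user": "alice", "text": "protest today"}]
grown = expand_dataset(seed, ["protest"], lambda u: timelines.get(u, []))
```

In the toy example, the seed tweet's author contributes one more matching tweet, so the dataset grows from one tweet to two; bob is never visited because no tweet of his entered the dataset.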


Twitter has two kinds of APIs: the RESTful API and the Streaming API. If you want to analyse current happenings on Twitter, the Streaming API should be used. It requires a persistent HTTP connection to be kept open; any update on Twitter will instantly be reflected in your application as well.

Apart from collecting the tweets of a particular user, you can collect someone's followers and Twitter's trending topics, as well as data that was generated a week ago. The links given are a good tutorial for a beginner to start on their own.

Link for using the Twitter API in Python

Link for understanding Twitter APIs
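The persistent-connection behaviour described above can be sketched with only the standard library. The v2 filtered-stream endpoint and bearer-token authentication are assumptions based on Twitter's API v2 documentation; stream rules must be configured separately, the token is a placeholder, and each non-empty line on the open connection is assumed to be one JSON-encoded tweet.

```python
import json
from urllib.request import Request, urlopen

STREAM_URL = "https://api.twitter.com/2/tweets/search/stream"

def parse_stream_line(raw_line):
    """Decode one line of the stream; keep-alive lines are empty."""
    line = raw_line.strip()
    if not line:
        return None
    return json.loads(line)

def stream_tweets(bearer_token, handle_tweet):
    """Hold the connection open and hand each incoming tweet to a callback."""
    req = Request(STREAM_URL, headers={"Authorization": "Bearer " + bearer_token})
    with urlopen(req) as resp:  # connection stays open; Twitter pushes updates
        for raw_line in resp:
            tweet = parse_stream_line(raw_line)
            if tweet is not None:
                handle_tweet(tweet)

if __name__ == "__main__":
    # Placeholder token; a real run needs registered app credentials.
    stream_tweets("YOUR_BEARER_TOKEN", lambda t: print(t["data"]["text"]))
```

A production consumer would also need reconnection with backoff, since long-lived connections drop; this sketch only shows the core read loop.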

Twitter Streaming API:


To collect data from Twitter you can use the Streaming API.

I recommend the book "Mining the Social Web" (available as an e-book). Here is a free chapter on Twitter:

All the source code is also available on GitHub:

Code for Facebook:

Code for the Twitter REST API:


Mining Twitter Data with Python

Other tools
Tweetbinder and Topsy for easy results

NodeXL, a user-friendly data-crawling tool. The number of tweets you can collect at a time cannot exceed 18,000.

For academic purposes, I use Socialbakers and find it very useful. It offers especially in-depth statistics for Facebook pages, but also covers Twitter, Google+, and other popular social media sites.

Open-source Twitter crawler project