Statistical Analysis Results

Weibo Statistical Analysis Results

We collected nearly 3 million tweets (2,946,612) from Sina Weibo between March 19th and April 12th, 2011. We selected some typical topics and built line graphs and bar charts to analyze how topic impact varies with location, economy and population. We also divided the tweets into several kinds and chose three kinds of topics, festival topics, commercial topics [Appendix C] and global issue topics, to compare their differences in survival time.


1.    Topic Impacts

Fig 12: Tweet Quantities, GDP and Population of 9 Cities

Data sources: GDP [1], Population [2]


The line graph above shows the quantity of tweets, the GDP and the population of nine typical cities with large quantities of tweets. The blue and red lines represent the quantity of tweets we collected, counted in two ways: as mentioned above, one count comes from the Oracle database and the other from WordCount in Hadoop. The green line represents the 2010 GDP of these nine cities, and the remaining line represents the population.

Cities with higher GDP and larger population usually have more tweets, for example Beijing, Shanghai and Guangzhou, and the remaining cities follow the same pattern. The fewest tweets were collected from Urumqi and Lanzhou, which we attribute to their low GDP and small population. Shanghai is an exception: although it has the highest GDP and the largest population, we collected fewer tweets from Shanghai than from Beijing or Guangzhou. We attribute this to Shanghai being one of the biggest ports in the world, with a large floating population.
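The two counting routes described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Oracle query or Hadoop job used in the project: the record layout (a `city` field per tweet), the function names and the toy records are all assumptions made for the example.

```python
from collections import Counter
from itertools import groupby


def map_city(tweet):
    """Map step (WordCount style): emit a (city, 1) pair for one tweet."""
    return (tweet["city"], 1)


def reduce_counts(pairs):
    """Reduce step: group the pairs by city and sum the 1s for each city."""
    totals = {}
    for city, group in groupby(sorted(pairs), key=lambda p: p[0]):
        totals[city] = sum(n for _, n in group)
    return totals


def count_direct(tweets):
    """Direct count, playing the role of a GROUP BY query in a database."""
    return dict(Counter(t["city"] for t in tweets))


# Toy records for illustration only; these are not the collected data.
toy_tweets = [
    {"city": "Beijing", "text": "a"},
    {"city": "Shanghai", "text": "b"},
    {"city": "Beijing", "text": "c"},
]

mapreduce_totals = reduce_counts(map_city(t) for t in toy_tweets)
direct_totals = count_direct(toy_tweets)
```

If both routes read the same records, the two dictionaries should agree, which gives a simple cross-check between the two lines plotted for tweet quantity.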

2.    Survival Time

We divide the tweets into several kinds. To analyze their differences in survival time, we choose three kinds of topics: festival topics, commercial topics and global issue topics. We choose one or two typical topics from each kind and build line charts of their daily tweet counts.

2.1   Festival Topics

Fig 13: Tweets about April Fool’s Day


The line chart above covers one typical festival topic. It shows the number of tweets about April Fool’s Day that we collected each day from March 19th to April 12th. Before March 29th the curve stays at 0, which means almost nobody was talking about April Fool’s Day. The curve then peaks on April 1st, April Fool’s Day itself. We see the same pattern for almost all festival topics: people pay a lot of attention on the festival itself and for 1 or 2 days before and after it, and at other times almost nobody talks about it. We conclude that the survival time of festival topics is usually very short, only 2 or 3 days.
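One way to make "survival time" concrete is to count the days on which a topic's daily tweet count stays above some fraction of its peak. The sketch below assumes that definition; the threshold, the record layout (`date` and `text` fields) and the toy records are illustrative assumptions, not the measure or data used for the figures.

```python
from collections import Counter
from datetime import date


def daily_counts(tweets, keyword):
    """Count tweets per day whose text contains the keyword."""
    return Counter(t["date"] for t in tweets if keyword in t["text"])


def survival_days(counts, fraction=0.1):
    """Number of days whose count is at least `fraction` of the peak count.

    This threshold-based definition is an assumption made for illustration.
    """
    if not counts:
        return 0
    threshold = fraction * max(counts.values())
    return sum(1 for n in counts.values() if n >= threshold)


# Toy records for illustration only; these are not the collected data.
toy_tweets = (
    [{"date": date(2011, 3, 31), "text": "April Fool's Day plans"}] * 3
    + [{"date": date(2011, 4, 1), "text": "Happy April Fool's Day"}] * 10
    + [{"date": date(2011, 4, 2), "text": "April Fool's Day was fun"}] * 3
    + [{"date": date(2011, 4, 5), "text": "April Fool's Day again?"}] * 1
    + [{"date": date(2011, 4, 5), "text": "unrelated"}] * 4
)

counts = daily_counts(toy_tweets, "April Fool")
festival_survival = survival_days(counts, fraction=0.2)
```

With these toy records, three days clear the 20% threshold, which matches the kind of short burst around the festival described above. A different threshold would give a different number, so the fraction should be reported alongside any survival time computed this way.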

2.2   Global Issue Topics

Fig 14: Tweets about Nuclear Leak

For global issue topics, we chose the nuclear leak in Japan as a typical instance. The earthquake occurred on March 11th, and radioactive substances then leaked. After the hydrogen explosion and the surge in radiation levels, more and more people paid attention to it. The curve peaks on March 25th, because on that day it was said that the reactor core might have been breached. After that day everything seemed to be under control, and some people started to forget the nuclear leak until plutonium was detected, which is why the curve peaks again on March 29th. Similarly, the curve peaks on April 7th and 11th, when two aftershocks of magnitude over 7 occurred and people paid more attention. We conclude that people keep an eye on global issue topics over time, and that attention rises suddenly whenever an important event happens. The survival time of global issue topics is therefore usually long.
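The repeated peaks discussed above can also be located programmatically rather than read off the chart. The following is a minimal sketch that marks a day as a peak when its count is strictly higher than both neighbouring days; the function name and the toy series are assumptions for illustration and do not reproduce the counts behind Fig 14.

```python
def find_peaks(series):
    """Return the indices of strict local maxima in a list of daily counts."""
    return [
        i
        for i in range(1, len(series) - 1)
        if series[i] > series[i - 1] and series[i] > series[i + 1]
    ]


# Toy daily counts for illustration only; these are not the collected data.
toy_series = [5, 9, 4, 3, 8, 2, 2, 6, 1]
peak_days = find_peaks(toy_series)
```

A topic with several well-separated peaks over the collection window, as in the toy series, is the pattern we describe as a long survival time. On noisy real data a minimum height or smoothing step would probably be needed before applying such a rule.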