Introduction

Microblogging is a broadcast medium in the form of blogging. Compared with traditional blogging, it fulfills a need for an even shorter and faster mode of communication, and it "allows users to exchange small elements of content such as short sentences, individual images, or video links". Millions of people now broadcast and share information about their activities, opinions and status on microblogging platforms, which have become an important part of our social lives.

One of the most popular microblogging platforms is Twitter. It was created in March 2006 by Jack Dorsey and launched in July of that year. Since then Twitter has gained popularity worldwide, and it is now estimated to have 200 million users. It is a rich source of data on opinions and sentiments, and such data can be used effectively for marketing or social studies. Several research works have been devoted to this topic using Twitter. However, few of them have explored China's microblogging field, so we select a Chinese counterpart of Twitter, "Sina Weibo", as our object of analysis.

In this report, we introduce our tool, which automatically collects data from Sina Weibo and stores it in an Oracle database. We then mine the collected data to uncover interesting and useful connections. The dataset used in this study was created by monitoring the Sina Weibo public timeline over a period of about half a month, between March 19, 2011 and April 12, 2011. The 20 most recent updates were fetched once every 10 seconds. The collection contains a total of 2,946,612 posts from distinct users. We also built a Hadoop platform to process this large dataset efficiently.
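The collection schedule described above (20 recent updates every 10 seconds) can be illustrated with a minimal polling sketch. This is not the tool's actual code: the function collect_posts, its parameters, and the "id" field on each post are hypothetical names chosen for illustration, the call that retrieves the public timeline is passed in as fetch_latest rather than tied to any real Sina Weibo API, and storage in the Oracle database is left out.

```python
import time
from typing import Callable, Dict, Iterable, List


def collect_posts(
    fetch_latest: Callable[[int], Iterable[dict]],  # hypothetical timeline fetcher
    rounds: int,
    batch_size: int = 20,            # 20 recent updates per request
    interval_seconds: float = 10.0,  # one request every 10 seconds
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Poll a public timeline and keep each post once, keyed by its id."""
    seen: Dict[str, dict] = {}
    for i in range(rounds):
        for post in fetch_latest(batch_size):
            # Consecutive polls can return overlapping posts,
            # so a post whose id is already stored is skipped.
            seen.setdefault(post["id"], post)
        if i < rounds - 1:
            sleep(interval_seconds)
    # Dictionaries preserve insertion order, so posts come back
    # in the order they were first seen.
    return list(seen.values())
```

Passing the fetcher and the sleep function as arguments keeps the loop independent of any particular client library and makes it easy to try out with canned data.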
