Microblogging has become a very popular communication tool among Internet users, and millions of them share opinions on different aspects of their everyday lives. Twitter, the most popular microblogging platform, is a rich source of data for opinion mining and sentiment analysis. Several research works have addressed this topic using Twitter; however, few of them have explored China’s microblogging service, Sina MicroBlog, also known as Weibo. In our project, we focus on developing a tool that automatically collects public “tweets” from Weibo and uses the data processing techniques of the Hadoop platform (a leading large-scale data processing platform that enables parallel processing across commodity computers on a local network) to perform linguistic analysis of the collected data (Chinese characters) and to explain the discovered phenomena in general terms. By analyzing the collected data, we hope to reflect statistically the social behavior toward a specific topic within a given period of time, and to summarize trends in social response with statistical tables (such as user information linked to tweets, or tweet statistics linked to users) arranged along a timeline. Through a designed experimental process and evaluation, we expect to demonstrate that the proposed techniques are efficient and produce reliable analysis results that can be further developed and applied. We applied our techniques to Chinese posts in GBK and UTF-8 encodings; however, the proposed techniques can also be applied to online posts in other languages.
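The kind of per-user, per-day statistic described above can be illustrated with a small map/reduce-style sketch in plain Python. This is not the project's Hadoop code: the record layout `(user_id, day, raw_bytes)`, the function names, and the sample posts are all assumptions made for illustration. The decoder tries UTF-8 first and falls back to GBK, which is a simple heuristic; some GBK byte sequences can happen to be valid UTF-8, so a real pipeline would likely need a more careful encoding check.

```python
from collections import Counter


def decode_post(raw):
    """Decode a post's bytes, trying UTF-8 first and falling back to GBK."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Heuristic fallback; undecodable bytes become replacement characters.
        return raw.decode("gbk", errors="replace")


def map_post(record, keyword):
    """Map step: emit ((day, user_id), 1) for each post mentioning the keyword."""
    user_id, day, raw = record  # hypothetical record layout
    if keyword in decode_post(raw):
        yield (day, user_id), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each (day, user_id) key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_topic_posts(records, keyword):
    """Run the map and reduce steps sequentially over an iterable of records."""
    pairs = (pair for record in records for pair in map_post(record, keyword))
    return reduce_counts(pairs)


# Made-up sample posts in mixed encodings, for illustration only.
sample_records = [
    ("u1", "2013-05-01", "今天天气很好".encode("utf-8")),
    ("u1", "2013-05-01", "天气不错".encode("gbk")),
    ("u2", "2013-05-02", "吃饭了".encode("utf-8")),
    ("u2", "2013-05-02", "明天天气如何".encode("gbk")),
]
topic_counts = count_topic_posts(sample_records, "天气")
```

On a Hadoop cluster the map and reduce functions would run in parallel over partitions of the collected posts; here they run sequentially so the data flow is easy to follow.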
See our project slides on Prezi: