Overview

The Problem:

Social media play a growing role in shaping interactions, both online and offline, and these interactions leave a digital trail of very rich data that scholars can potentially exploit. As a result, social media have the capacity to revolutionize the way we do research.

Yet two main impediments stand in the way of fully exploiting this potential: accessibility and usability. First, while information exists on social networks, it is not yet clear whether, and how, scholars can access all of it. Fortunately, several companies are currently developing tools and software that will eventually make it easier for their users (companies, scholars, individuals) to access online data. Second, even once the information is accessed, the sheer magnitude of the data makes it impossible for researchers to analyze it in a non-systematic, non-algorithmic manner. On Twitter, for example, one billion messages are tweeted every two days, and more than half a trillion tweets have been sent in the seven years of Twitter’s existence. While accessing this massive dataset is now more straightforward (we have access to all half a trillion tweets in this project), extracting specific information from it is not.

The Need:

Sentiment analysis provides a way to overcome the usability challenge. It uses text mining and fact-based techniques to categorize text by sentiment as positive, neutral, or negative. In doing so, it aggregates information on what people think about a particular topic.
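To make the idea of categorizing and aggregating concrete, here is a minimal sketch of one common approach, lexicon-based scoring. It is an illustration only and not the method this project will adopt: the tiny English lexicon, the whitespace tokenization, and the function names `classify` and `aggregate` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> polarity score); a real lexicon would be far larger.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "terrible": -1, "hate": -1}

LABELS = ("positive", "neutral", "negative")


def classify(text):
    """Label a text positive, neutral, or negative by summing lexicon scores."""
    tokens = text.lower().split()
    # Strip simple trailing/leading punctuation before the lexicon lookup.
    score = sum(LEXICON.get(tok.strip(".,!?"), 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def aggregate(texts):
    """Count how many texts fall into each sentiment category."""
    counts = Counter(classify(t) for t in texts)
    return {label: counts.get(label, 0) for label in LABELS}
```

Calling `aggregate` on a collection of messages about one topic yields a count per category, which is the kind of summary the paragraph above describes. A sketch this simple ignores negation, sarcasm, and morphology, which is part of why more careful methods are needed.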

While there is a vast literature on the methodologies, challenges, and applications of sentiment analysis going back several decades (see the review by Pang and Lee 2008), work on developing sentiment analysis tools for social media is very limited. Some promising work has been undertaken in the last few years, but there is still more scope for innovation in this area.

Unfortunately, the entirety of the work on sentiment analysis, be it on traditional text or on social media, fails to consider information written in Arabic. Because (i) the structures of the English and Arabic languages differ, (ii) the Arabic language comprises many dialects, and (iii) micro-blogging has unique features that introduce further challenges, simply replicating for Arabic the methodologies developed for sentiment analysis in English is not sufficient.

The Qatar University Sentiment Analysis in Arabic (QUSAA) project:

We propose to develop a sentiment analysis tool for tweets in the Arabic language. Our research will investigate how well existing approaches to sentiment analysis work for Arabic, develop new techniques to improve on them, and use these new techniques to build a tool for analyzing sentiment expressed in Arabic, especially in Arabic tweets.

This tool will then enable us to incorporate data from social media into our research. We have three such applications in mind. First, we will perform a comparative study of the perception of US policy in the Middle East by analyzing the sentiment of both English and Arabic tweets regarding specific events. Second, we will revisit the role social media played in defining the Arab Spring, combining social media data with the sentiment analysis tool so that we can study both English and Arabic chatter. Third, we will study how the use of social media can improve measures of country and political risk. Currently, these measures are compiled at an annual frequency using macro data, and as a result they are slow to respond to changes in conditions on the ground.

In summary, our project comprises two main stages: the development of the sentiment analysis tool and its application. In each stage, our work makes significant contributions.

Research Tasks:

1. Data collection

2. Corpus establishment

3. Lexicon and feature extraction

4. Classifier identification

5. Accuracy testing

6. Application I: US Middle East Policy Evaluation

7. Application II: Arab Spring

8. Application III: Country and Political Risk
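As an illustration of what task 5 (accuracy testing) could involve, the sketch below compares a classifier's predicted labels against hand-labelled gold labels from a corpus. The metrics shown (overall accuracy and per-class recall) are standard, but their choice here, along with the function names and the three-label scheme, is an assumption for the example rather than the project's stated evaluation protocol.

```python
LABELS = ("positive", "neutral", "negative")


def accuracy(gold, predicted):
    """Fraction of items whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sample")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def per_class_recall(gold, predicted, labels=LABELS):
    """For each label, the share of gold items with that label that were predicted correctly."""
    recall = {}
    for label in labels:
        total = sum(1 for g in gold if g == label)
        hits = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        # None signals that the label never occurs in the gold data.
        recall[label] = hits / total if total else None
    return recall
```

Reporting recall per class alongside overall accuracy matters when the categories are unbalanced, since a classifier that labels everything neutral can still score well on accuracy alone.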