What can Twitter tell us about the real world?

News: Slides from tutorial are now online. A comprehensive list of references to follow soon. Tweet any comments using #TwitterAndRealWorld.


This half-day CIKM tutorial, held on October 28, presented a survey of recent work that analyzes data from Twitter to make quantitative statements about the “real world”, validated against some form of ground truth. Attendees were introduced to the problems studied, the methods applied and the data sets used. It covered four broad themes:

1. Economy: movie sales, consumer confidence, stock prices.
2. Politics: elections and political orientation.
3. Public health: detecting flu epidemics and understanding general well-being.
4. Event detection: detecting real events in online content, including traffic incidents and earthquakes.


Economy

We begin with predicting movie box-office revenues. Attention, as measured in mentions, retweets, and links to outside material, was shown to predict the financial success of movies in [Asur and Huberman, 2010], achieving an adjusted R² of 0.97 on predicting the price of movie stocks at the Hollywood Stock Exchange. Real-world financial and economic indices, including the Dow Jones Industrial Average (DJIA), the Index of Consumer Sentiment and Gallup’s “Economic Confidence” index, have been shown to correlate with tweet volume and sentiment by [O’Connor et al., 2010, Bollen et al., 2011]. For example, [Bollen et al., 2011] show the “calm” mood to be predictive of the DJIA, and build a Self-organizing Fuzzy Neural Network that achieves 87.6% accuracy in predicting the direction of the DJIA. Similarly, [Zhang et al., 2011] find that words related to anxiety, worry, and hope are highly predictive of major financial indices, including the DJIA, NASDAQ, S&P 500, and the Chicago Board Options Exchange Volatility Index (VIX). Going beyond counting tweets, [Ruiz et al., 2012] represent the tweets during some time interval as an “interaction graph”, with the nodes being tweets, users, URLs, and hashtags. They find graph-based features such as PageRank effective for predicting the price of groups of stocks (financial indices). Compared to other sources of data, such as news headlines and Google search volumes, Twitter volume of financial terms has the highest correlation with the DJIA [Mao et al., 2011].
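
Many of the studies above ask whether a Twitter-derived daily series (volume or a mood score) is correlated with an index some days later. The following is a minimal sketch of that kind of lagged Pearson correlation. It is not the method of any cited paper; the function names and the toy numbers are assumptions made only for illustration.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(signal, target, lag):
    """Correlate signal[t] with target[t + lag] for a non-negative lag (in days)."""
    if len(signal) != len(target):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(signal) - 1:
        raise ValueError("lag must leave at least two overlapping points")
    # Drop the last `lag` signal values and the first `lag` target values
    # so that each signal value is paired with the target `lag` days later.
    return pearson(signal[:len(signal) - lag], target[lag:])


# Hypothetical toy data: a daily mood score and an index change that
# follows it by one day.
mood = [1, 3, 2, 5, 4, 6]
index_change = [0, 2, 6, 4, 10, 8]
best_lag = max(range(0, 3), key=lambda k: lagged_correlation(mood, index_change, k))
```

A high correlation at some lag is only suggestive; the cited work validates predictions against held-out ground truth rather than relying on correlation alone.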


Politics

We begin with an overview of the methods for political leaning detection on Twitter, with the case of right- and left-leaning users in the US being most prominent. In retweet networks, for instance, [Conover et al., 2011a] find two relatively homogeneous clusters of users who preferentially propagate content within their own communities. The process starts with identifying hashtags exemplifying the discussions in each group [Conover et al., 2011b, Weber et al., 2013]. Not only are these hashtags useful in predicting users’ political orientation at a 90% accuracy level, they are also useful in identifying the latest trends in the discussions. Beyond tweet content, the detection of political leaning can be improved using user profiles, tweeting and following behavior, and the extended social network of the user [Pennacchiotti and Popescu, 2011]. These insights can then be used to monitor or predict larger-scale political change. We’ll discuss the predictive power of different user groups [Chen et al., 2012], and the controversial finding that tweet volume may correspond to the outcomes of elections [Tumasjan et al., 2010]. However, notable critiques of these approaches have lately shed light on the challenges of the election prediction task, including the evaluation of sentiment and volume tracking tools, the definition of baseline performance [Metaxas et al., 2011], and self-selection bias, tweet selection, and demographic skew [Gayo-Avello, 2012]. We conclude with a closer look at the manipulation of Twitter political discussion in order to present an altered view of national discourse, including the difference between the power users and their largely silent audience [Mustafaraj et al., 2011, Mejova et al., 2013], and how to spot nefarious groups of users trying to fake grass-roots movements [Ratkiewicz et al., 2011].

Public Health

Google Flu Trends [Ginsberg et al., 2008] marked the first visible example of how user-generated content in the form of query logs could be used to “nowcast” statistics related to real-world phenomena such as flu epidemics. As web search logs are fairly difficult to get hold of, researchers have built similar systems using Twitter data, which also comes with additional information such as user profiles and high-resolution geo-locations. We will survey a variety of approaches that differ in emphasis, relying on statistical models, NLP techniques or ontologies [Achrekar et al., 2011, Szomszor et al., 2012, Culotta, 2010, Lampos and Cristianini, 2010, Aramaki et al., 2011, Doan et al., 2012, Lamb et al., 2013, Lampos and Cristianini, 2012]. The reported precision of these methods rivals that of Google Flu Trends. Beyond flu epidemics, researchers have looked at generating hay fever maps [Takahashi et al., 2011], and localizing risk factors for illnesses such as allergies, obesity and even insomnia [Paul and Dredze, 2011]. The idea of quantifying the impact of external factors such as pollution or even the use of public transportation on health was explored in a recent work that used GPS-tagged tweets from New York City [Sadilek and Kautz, 2013]. In a similar line of work, several researchers have tried to map “happiness” or, more generally, well-being in an attempt to identify offline factors that affect this variable [Schwartz et al., 2013, Quercia et al., 2012b, Quercia et al., 2012a, Mitchell et al., 2013].
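
To make the idea of “nowcasting” concrete, here is a minimal sketch that fits a straight line from the weekly fraction of flu-related tweets to an officially reported illness rate, then applies it to the current week. This is a simplification chosen for illustration and not the model of any cited paper; the function names and all numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def nowcast(slope, intercept, tweet_fraction):
    """Estimate the current illness rate from the current tweet fraction."""
    return slope * tweet_fraction + intercept


# Hypothetical history: fraction of tweets matching flu keywords per week,
# and the illness rate (in percent) later published for the same week.
flu_tweet_fraction = [0.01, 0.02, 0.03]
reported_rate = [1.0, 2.0, 3.0]

slope, intercept = fit_line(flu_tweet_fraction, reported_rate)
estimate = nowcast(slope, intercept, 0.04)  # this week's official rate is not yet published
```

The appeal of such a model is latency: the tweet fraction is available immediately, while official statistics typically arrive later.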

Event Detection

Volume bursts on Twitter do not always correspond to real-world events and can, for example, be caused by a video going “viral”. How general real-world events can be detected from the Twitter stream, despite the insufficiency of purely volume-based signals, has been investigated in [Becker et al., 2011] and [Ritter et al., 2012]. Some researchers have looked at more specific event types such as unusual gatherings of people [Lee and Sumiya, 2010], individual show acts during a music festival [Packer et al., 2012] and traffic events such as accidents, traffic jams or roadblocks [Daly et al., 2013, Ribeiro Jr et al., 2012, Schulz and Ristoski, 2013]. Another particular type of event that researchers have concentrated on is earthquakes [Okazaki and Matsuo, 2011, Sakaki et al., 2010, Crooks et al., 2012, Robinson et al., 2013]. As with the other event types, the focus is not on detecting events that are hard to detect in principle (earthquakes are quite exhaustively logged on a global scale) but rather on bringing down the latency and the cost of detection.
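
As a point of reference for the volume-based signals mentioned above, the following minimal sketch flags time bins whose tweet count is far above the recent average. It is a generic rolling z-score baseline, not the approach of any cited paper; the function name, the window size and the threshold are assumptions. A flagged bin is only a candidate: as noted above, a viral video produces the same kind of burst as a real event, so content-based filtering would still be needed.

```python
from statistics import mean, pstdev


def detect_bursts(counts, window=5, threshold=3.0):
    """Return indices of bins whose count is a burst relative to the
    preceding `window` bins, using a z-score against that history."""
    if window < 1:
        raise ValueError("window must be at least 1")
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        # Floor the deviation at 1 so a flat history does not cause a
        # division by zero or turn tiny fluctuations into bursts.
        sigma = max(pstdev(history), 1.0)
        if (counts[i] - mu) / sigma >= threshold:
            bursts.append(i)
    return bursts


# Hypothetical per-minute counts of tweets mentioning a keyword.
counts = [10, 11, 9, 10, 10, 50, 10, 10]
candidates = detect_bursts(counts)  # -> [5]
```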

All papers to be presented go well beyond qualitative, anecdotal studies and most involve techniques from machine learning, information retrieval or natural language processing.