Research Leader

Yelena Mejova is a Research Leader at the ISI Foundation in Turin, Italy, where she is part of the Digital Epidemiology Group. Her research concerns the use of social media in health informatics, especially for lifestyle diseases, as well as for tracking political speech and other cultural phenomena. Previously, as a scientist at the Qatar Computing Research Institute, Yelena was part of the Social Computing Group, working on computational social science, especially as applied to tracking real-life health signals. As a post-doc at Yahoo Research Barcelona, she was part of the Web Mining and User Engagement groups, working with Mounia Lalmas on the Linguistically Motivated Semantic Aggregation Engines project.

linkedIn | googleScholar | dblp | twitter | yelena.mejova@gmail.com | CV | Publications

Dec 15, 2019
Editing the Frontiers special issue "Responsible Big Data Solutions for Public Health"; submit abstracts by Feb 29!
Jan 29, 2019
Teaching on Day 2 of the Summer School Series on Methods for Computational Social Science in Berlin, on using social media to study lifestyle-related health.
Oct 08, 2018
Aug 01, 2018
Watch out for my invited talk at EPFL and a keynote at the Conference on Complex Systems at the end of September.
Mar 05, 2018
Will be giving a keynote at the Workshop on Online Social Networks and Media: Network Properties and Dynamics at The Web Conference.
Feb 02, 2018
Co-chairing the course program at the Russian Summer School in Information Retrieval (RuSSIR). The special topic is IR for Good. Submit course proposals by March 12!
Jan 30, 2018
Co-organizing a Workshop on Social Media for Health, with a focus on linking online and offline data, at ICWSM'18 at Stanford on June 25. Submit your work!
Jul 20, 2017
Invited to speak at the International Conference on Complex Networks (CompleNet), taking place in Boston, MA, on March 5-8, 2018.


Facebook Ads as a Demographic Tool to Measure the Urban-Rural Divide
Daniele Rama, Kyriaki Kalimeri, Michele Tizzoni (ISI Foundation)
Ingmar Weber (QCRI)


We examine the usefulness of the Facebook Advertising platform, which offers a digital "census" of over two billion users, in measuring potential rural-urban inequalities. We focus on Italy, a country where about 30% of the population lives in rural areas. First, we show that the population statistics Facebook produces suffer from instability over time and incomplete coverage of sparsely populated municipalities. To overcome these limitations, we propose an alternative methodology for estimating Facebook Ads audiences that nearly triples the coverage of rural municipalities. The findings of this study illustrate the need to improve existing tools and methodologies so that under-represented populations are included in digital demographic studies.


Effect of Values and Technology Use on Exercise
Kyriaki Kalimeri (ISI Foundation)

UMAP'19 [Best Paper] 

In this study, we present a unique, demographically representative dataset of 15k US residents that combines technology use logs with surveys on moral views, human values, and emotional contagion. Combining these data, we provide a holistic view of individuals to model their physical exercise behavior. We show which values determine the adoption of Health & Fitness mobile applications, achieve a weighted AUROC of .673 in predicting whether an individual exercises, and find a strong link between exercise and respondents' socioeconomic status, as well as the value of happiness. Informed by these findings, we propose actionable design guidelines for persuasive technologies targeting health behavior modification.


Fake Cures: User-centric Modeling of Health Misinformation in Social Media
Amira Ghenai (University of Waterloo)


This work examines social media users who post questionable health-related information, in particular those promoting cancer treatments that have been shown to be ineffective (making this a kind of misinformation, willful or not). Using a multi-stage user selection process, we study 4,212 Twitter users who have posted about one of 139 such "treatments" and compare them to a baseline of users generally interested in cancer. Using features that capture user attributes, writing style, and sentiment, we build a classifier to identify users prone to propagating such misinformation, providing a potential tool for public health officials to identify such individuals for preventive intervention.

Information Sources and Needs in the Obesity and Diabetes Twitter Discourse

Digital Health'18 | slides

We examine 1.5 million tweets mentioning obesity and diabetes in order to assess (1) the quality of information circulating in this conversation and (2) the behavior and information needs of the users engaged in it. The analysis of top cited domains shows a strong presence of health information sources that are not affiliated with a governmental or academic institution. On the user side, we estimate that over a quarter of non-informational obesity discourse contains fat-shaming. We also find great diversity in the questions asked in these datasets, spanning the definition of obesity as a disease, social norms, and governmental policies.


Online Health Monitoring using Facebook Advertisement Audience Estimates in the United States
Ingmar Weber (QCRI)
Luis Fernandez-Luque (QCRI)

JMIR Public Health and Surveillance Vol 4, No 1, 2018

We use the Facebook Marketing API to correlate estimated audience sizes for health-related interests with public health data. Through several case studies, we identify both potential benefits and challenges of using this tool. We introduce placebo interest estimates to control for the background level of user activity on the platform. Some Facebook interests, such as plus-size clothing, show encouraging levels of correlation (r=.74) across the 50 US states; however, we also sometimes find substantial correlations with placebo interests, such as r=.68 between interest in Technology and obesity prevalence.
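The placebo comparison can be illustrated with a minimal Python sketch. It assumes that audience estimates have already been retrieved for each region (the Marketing API call itself is not shown), and that dividing each interest's audience by the region's total audience is an acceptable way to normalize for overall platform use. All function names and numbers below are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def audience_share(interest_audience, total_audience):
    """Fraction of each region's estimated audience that has a given interest."""
    return [a / t for a, t in zip(interest_audience, total_audience)]


def placebo_check(marker, placebo, total, prevalence):
    """Correlate a health 'marker' interest and a 'placebo' interest with prevalence.

    All arguments are per-region lists in the same region order:
    marker, placebo, and total are audience estimates, and prevalence is
    the official health statistic. Returns (r_marker, r_placebo).
    """
    r_marker = pearson_r(audience_share(marker, total), prevalence)
    r_placebo = pearson_r(audience_share(placebo, total), prevalence)
    return r_marker, r_placebo


if __name__ == "__main__":
    # Toy numbers for four hypothetical regions (not real data).
    total = [1000, 2000, 1000, 4000]
    marker = [100, 300, 200, 1000]
    placebo = [80, 100, 60, 240]
    prevalence = [0.2, 0.3, 0.4, 0.5]
    r_m, r_p = placebo_check(marker, placebo, total, prevalence)
    print(f"marker r = {r_m:.2f}, placebo r = {r_p:.2f}")
```

In this toy example, the marker interest tracks prevalence perfectly and the placebo interest does not. A marker interest is worth using only if it clearly beats the placebo interests. A high placebo correlation, such as the Technology/obesity example above, indicates that background activity rather than the interest itself may be driving the signal.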


#Halal on Social Media: Religion, Commerce, Health
Benkhedda Youcef (Ecole Nationale Supérieure d’Informatique, Algeria)
Khairani (University of Indonesia)

Frontiers in Digital Humanities: Big Data'17 | Gulf Times | Khazanah

Halal is a religious term, a cultural staple, and a huge market. Here, we investigate the meaning of halal in three global communities speaking Arabic, Bahasa Indonesia, and English. Each has its own perception of this concept, identifying it more with trade, food, or cosmetics. Showing a complicated relationship with both religious and governmental authority, the concept of halal has a life of its own on social media, redefining its traditional and market space.

Tracking Health Misinformation on Twitter: case of Zika
Amira Ghenai (University of Waterloo)


Misinformation and rumors in the health domain can not only cause inconvenience but also increase medical care costs and even lead to loss of life. Here, we build a pipeline for tracking Zika misinformation during the first half of 2016, when its incidence spiked in South America, combining crowdsourcing with machine learning.

Using Facebook Ads Audiences for Global Lifestyle Disease Surveillance
Ingmar Weber (QCRI)
Matheus Araújo (Federal University of Minas Gerais)
Fabricio Benevenuto (Federal University of Minas Gerais)

ICWSM'17 | WebSci'17 | Demo

In this series of studies, we explore the use of demographically rich Facebook Ads audience estimates for tracking non-communicable diseases around the world, especially in the Middle East. We compute the audiences of health "marker" interests, evaluate their potential for tracking lifestyle-related health conditions associated with tobacco use, obesity, and diabetes, and compare their performance to that of "placebo" interests.


Revisiting the American Voter
Huyen Le, Bob Boynton, Zubair Shafiq, Padmini Srinivasan (University of Iowa)

CHI'17 | HT'17

The American Voter, a seminal work in political science, uncovered the multifaceted nature of voting behavior, which electoral research has corroborated in the decades since. In this work, we use The American Voter as an analysis framework for computational political science, employing the factors of party, personality, and policy to structure the analysis of public discourse on online social media.

Kissing Cuisines: Exploring Worldwide Culinary Habits on the Web
Sina Sajadmanesh, Sina Jafarzadeh, Seyed Ali Ossia, Hamid R. Rabiee, Hamed Haddadi,
Yelena Mejova, Mirco Musolesi, Emiliano De Cristofaro, Gianluca Stringhini

A large-scale study of recipes published on the Web and their content. Using a database of more than 157K recipes from over 200 different cuisines, we analyze ingredients, flavors, and nutritional values which distinguish dishes from different regions, and use this knowledge to assess the predictability of recipes from different cuisines. We then use country health statistics to understand the relation between these factors and health indicators of different nations, such as obesity, diabetes, migration, and health expenditure.

Cultural Pluralism: case of Charlie Hebdo
Jisun An (QCRI)
Haewoon Kwak (QCRI)
Sonia Alonso Saenz De Oger (Georgetown University, Qatar)
Braulio Gomez Fortes (Deusto University)


We ask whether stances on freedom of speech can be modeled using established sociological theories, including Huntington’s culturalist Clash of Civilizations, as well as theories that take social context into consideration, such as Density and Interdependence theories. At an individual level, we find that social context plays a significant role: non-Arabs living in Arab countries use #JeSuisAhmed (“I am Ahmed”) five times more often when they are embedded in a mixed Arab/non-Arab (mention) network.

Privacy and Twitter in Qatar: Traditional Values in the Digital World
Norah Abokhodair (University of Washington)
Sofiane Abbar (QCRI)
Sarah Vieweg (QCRI)


We explore the meaning of "privacy" from the perspective of Qatari nationals as it manifests in digital environments. Our mixed-methods analysis of 18K Twitter posts that mention "privacy" focuses on the online and offline contexts in which privacy is mentioned, and how those contexts lead to varied ideologies regarding privacy.

Crowdsourcing Health Labels
Ingmar Weber (QCRI)

Digital Health'16

Is it feasible to use profile pictures to infer a user's health, such as weight? We show that this is indeed possible and further show that the fraction of labeled-as-overweight users is higher in U.S. counties with higher obesity rates. As obesity-related conditions such as diabetes, heart disease, osteoarthritis, and even cancer are on the rise, this obese-or-not label could be one of the most useful for studies in public health.

#Foodporn and Health Around the World
Sofiane Abbar (QCRI)
Hamed Haddadi (Queen Mary University of London)

How is food redefined on social media? Is the fetishization of food through a plethora of images shifting our understanding of it? Our international study of the #foodporn hashtag on Instagram shows an obsession with chocolate and sweets, but it also reveals a tendency of communities to like and comment more on healthier content, suggesting a new avenue for healthy lifestyle interventions.

Health in Qatar
Hamed Haddadi (Queen Mary University of London)
Ingmar Weber (QCRI)
Sofiane Abbar (QCRI)
Azadeh Ghahghaei (Freie Universitat Berlin)

Using a near-complete dataset of Instagram check-ins in Qatar, we examine the behavior of Arabic- and English-speaking populations. We find behavior changes around major religious holidays, including Ramadan, which affects the dietary patterns of this highly diverse country.

View on Obesity through Instagram
Ingmar Weber (QCRI)
Hamed Haddadi (Queen Mary University of London)
Anastasios Noulas (University of Cambridge)

Using millions of Instagram posts from locations all over the US, we examine the social media signals surrounding obesity. Our analysis reveals a relationship between small businesses and local foods on the one hand and better dietary health on the other. It also reveals a tendency of social media users to reinforce unhealthy dietary habits through likes and comments, with donuts and cupcakes being the most "liked" foods.

Twitter: A Digital Socioscope
Ingmar Weber (QCRI)
Michael Macy (Cornell)

This book surveys how to use Twitter data to study human behavior and social interaction on a global scale. It is a reference for behavioral and social scientists who want to explore the use of online data in their research, and for non-professionals who follow the social impact of new technologies.

Relating Social Media Users to Little-known Content
Ingmar Weber (QCRI)
Javier Borge-Holthoefer (QCRI)

Long-tail content (news stories, music, even people) may need a little extra help to get noticed. We combine the notions of serendipity and explainability to build "bridges" between content and users, using high-quality knowledge bases as well as users' interest profiles, as estimated from their social media presence.

Controversy and Sentiment in News
Carlos Castillo (QCRI)
Nicholas Diakopoulos (University of Maryland)
Amy X. Zhang (MIT CSAIL)

In the news: CrowdFlower  Source

Using lexical resources, such as those on sentiment and bias, we explore the use of emotional language around controversial topics by mainstream news agencies. Our aim is to eventually detect these controversial topics and to automatically find the sides of the discussion.

Monitoring Dietary Health via Social Media
Ingmar Weber (QCRI)
Sofiane Abbar (QCRI)
Hamed Haddadi (QCRI)

Using online check-ins and food-related posts, we track diet-related diseases such as obesity and diabetes, and relate perceptions of food to the demographics of individuals. Social media also gives us an opportunity to explore the relationship between social connections and dietary habits; indeed, there seems to be a connection between your social-media-detected diet and that of your friends.

Linguistically Motivated Semantic Aggregation Engines (LiMoSINe)
Ilaria Bordino (Yahoo Labs Barcelona)
Mounia Lalmas (Yahoo London)
Olivier Van Laere (Yahoo Labs Barcelona)
Byungkyu Kang (UC Santa Barbara)

As a part of the LiMoSINe EU project, we are building search engines based on semantics found in large document collections. These semantics include entities, sentiment expressed about them, the quality of writing about them, their topical categorization, and, of course, the relationships between these entities. Our faceted search prototype allows the user to explore the web of entities, adjusting the data views along various metadata attributes.

Understanding Donation Behavior through Email
Ingmar Weber (Qatar Computing Research Institute)
Venkata Rama Kiran Garimella (Aalto University)
Michael C Dougal (UC Berkeley)

We analyze a two-month anonymized email log from several perspectives motivated by past studies on charitable giving: (i) demographics, (ii) user interests, (iii) external time-related factors, and (iv) social network influence. We show that email captures the demographic peculiarities of different interest groups, for instance by predicting demographic distributions found in the 2012 US Presidential Election exit polls. We also show the importance of social connections in predicting whether an individual donates: although annoying to most of us, email campaigns can be effective.

Economic, Social, and Cultural Boundaries in International Communication
Ruth Garcia (Universitat Pompeu Fabra Barcelona)
Daniele Quercia (Yahoo Labs Barcelona)

In this study, we show that the international Twitter communication landscape is not only still largely predetermined by physical distance, but also depends on countries' social, economic, and cultural attributes. This communication, as measured using @mentions, is correlated (r = 0.68) with the Gravity Model, which hypothesizes that the flow between two areas is proportional to their masses and inversely proportional to the distance between them. Our final model, which takes into consideration income, trade share, migration, language, and Hofstede's cultural variables, achieves an adjusted R² of 0.80.
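The Gravity Model described above can be written as a worked equation. In this sketch, F_{ij} is the @mention flow between countries i and j, M_i and M_j are their "masses", and d_{ij} is the distance between them. Which quantity serves as mass (for example, population or number of Twitter users) is an assumption here, as is the constant G. Taking logarithms turns the model into a linear form to which extra country attributes can be added. The extended form below is only an illustration and is not necessarily the exact specification used in the study.

```latex
% Basic gravity model: flow proportional to both masses, inversely proportional to distance
F_{ij} = G \, \frac{M_i \, M_j}{d_{ij}}

% Log-linear form of the same model
\log F_{ij} = \log G + \log M_i + \log M_j - \log d_{ij}

% Illustrative (assumed) extension with pairwise covariates x_{ij}^{(k)},
% e.g. income, trade share, migration, shared language, cultural distance
\log F_{ij} = \beta_0 + \beta_1 \log M_i + \beta_2 \log M_j - \beta_3 \log d_{ij}
              + \sum_k \gamma_k \, x_{ij}^{(k)} + \varepsilon_{ij}
```

Setting \beta_0 = \log G, \beta_1 = \beta_2 = \beta_3 = 1, and all \gamma_k = 0 (with no error term) recovers the basic model. Fitting the \beta and \gamma coefficients by least squares is one common way to measure how much the added attributes explain beyond mass and distance.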

Political Sentiment Classification & Tracking
Padmini Srinivasan (Computer Science, University of Iowa)
Bob Boynton (Political Science, University of Iowa)

Understanding the nature of political discourse on social media allows us to gauge the motivations of its constituent voices and the representativeness of its message. A thorough evaluation of sentiment classification algorithms applied to political writing shows this to be a non-trivial task.

Content Reuse in an Organization
Klaar De Schepper (Columbia University)
Lawrence Bergman (IBM T.J. Watson Research Center)
Jie Lu (IBM T.J. Watson Research Center)

CHI'11 [Honorable Mention]

How important is it for an organization to keep track of the content it generates? Very, it turns out. We track content reuse across a collection of slideshow presentations, modeling the flow of information within the organization. Our ethnographic survey and interviews resulted in a set of guidelines for a content management system that supports modular content search, team management, and provenance tracking.

Event Tracking in Social Media
Viet Ha-Thuc (University of Iowa)
Padmini Srinivasan (University of Iowa)

Large-scale analysis of social media, such as blogs, gives us a glimpse into the minds of a large segment of Earth's population. This record of people's thoughts can be used to track significant events and the discussion around them. Using Viet Ha-Thuc's new topic modeling approach, we were able to track major news events over time.