Publications

Abstract:

We investigate the predictive power behind the language of food on social media. We collect a corpus of over three million food-related posts from Twitter and demonstrate that many latent population characteristics can be directly predicted from this data: overweight rate, diabetes rate, political leaning, and home geographical location of authors.

For all tasks, our language-based models significantly outperform the majority-class baselines, and performance is further improved with more complex natural language processing, such as topic modeling. We analyze which textual features have the most predictive power for these datasets, providing insight into the connections between the language of food, geographic locale, and community characteristics.
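A majority-class baseline predicts the most frequent training label for every test example, so a language-based model is informative only if it beats that accuracy. The Python sketch below illustrates the baseline. It is not the paper's code: the function names (`majority_class`, `accuracy`, `majority_baseline_accuracy`) and the toy "high"/"low" labels are hypothetical, chosen only to show how the comparison point is computed.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_class(train_labels: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label in the training data."""
    # Counter.most_common(1) returns [(label, count)]; ties are broken
    # by the order in which labels were first encountered.
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predictions: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("gold must be non-empty")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def majority_baseline_accuracy(
    train_labels: Sequence[Hashable], test_labels: Sequence[Hashable]
) -> float:
    """Accuracy of always predicting the training set's majority class."""
    label = majority_class(train_labels)
    return accuracy([label] * len(test_labels), test_labels)


if __name__ == "__main__":
    # Hypothetical toy labels, e.g. a binarized population statistic per region.
    train = ["high", "low", "high", "high", "low"]
    test = ["high", "low", "low", "high"]
    print(majority_baseline_accuracy(train, test))  # 0.5
```

Any classifier trained on the text of the posts would then be judged by whether its test accuracy exceeds the value this baseline returns on the same split.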

Lastly, we design and implement an online system for real-time query and visualization of the dataset. Visualization tools, such as geo-referenced heatmaps, semantics-preserving wordclouds and temporal histograms, allow us to discover more complex, global patterns driven by the language of food.

Full Paper [PDF]

Other papers relevant to this work are listed below:

Towards Using Social Media to Identify Individuals at Risk for Preventable Chronic Illness

Detecting diabetes risk from social media activity

A Test of the Risk Perception Attitude Framework as a Message Tailoring Strategy to Promote Diabetes Screening