The Causal Effect for Correcting Flu Shot Misconceptions

The Problem (Can Twitter replies correct misconceptions about the flu shot?):

We measure the effectiveness of posting educational feedback as a reply to Tweets that contain misconceptions about the flu shot. Specifically, this project tests whether replying with a link to the CDC's educational resource page on flu shots slows the spread, on Twitter, of the misconception that flu shots cause illness or the flu. The results may be useful to public health officials.

The Solution (Develop a Tweet bot to run a randomized, controlled experiment and perform causal analysis):

We used the Twitter API to collect Tweets and applied semantic analysis to select those containing misconceptions about the flu shot. The experiment follows the ROXO design (Randomize-Observe-Experiment-Observe). Each day, the searched and filtered Tweets from the previous 24 hours were grouped into blocks of two (2); within each block, one Tweet was randomly assigned to the treatment group and the other to the control group. Treatment Tweets received a reply containing the educational flu shot link, and control Tweets received a reply containing a link unrelated to flu shots. As soon as the replies were sent, baseline retweet and favorite counts were recorded. We performed 10 daily administrations and collected 148 Tweets (subjects). The causal analysis shows a positive effect for Tweets from the U.S. and a negative effect for Tweets from outside the U.S.
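The pairwise blocking described above can be sketched as follows. This is a minimal illustration, not the project's actual bot code: the function name `assign_pairs`, the `seed` parameter, and the choice to drop an unpaired leftover Tweet are all assumptions made for the example.

```python
import random


def assign_pairs(tweet_ids, seed=None):
    """Randomly assign Tweets to treatment/control in blocks of two.

    Hypothetical helper: tweet_ids is the day's list of filtered Tweet IDs.
    A leftover Tweet that cannot be paired is left unassigned (assumption).
    """
    rng = random.Random(seed)
    ids = list(tweet_ids)
    assignments = {}
    # Walk through the list two Tweets at a time.
    for i in range(0, len(ids) - 1, 2):
        pair = [ids[i], ids[i + 1]]
        rng.shuffle(pair)  # a coin flip decides which Tweet is treated
        assignments[pair[0]] = "treatment"
        assignments[pair[1]] = "control"
    return assignments
```

Blocking in pairs guarantees equal group sizes within each daily administration, so day-to-day differences in Tweet volume or topic affect both groups equally.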

Compared with output measures such as retweet and favorite counts, the new Tweets a subject writes after treatment reveal more about whether he or she accepts or rejects our CDC treatment. Such data are hard to collect during the experiment, however, because of time constraints and other natural restrictions. Departing from the textbook approach, we adopt Bayesian methods, which provide a natural framework for handling missing data without relying on ad hoc imputation. The basic idea is to first collect another variable that is available: the Tweets positive rate (ALL), computed from ALL of a subject's previous Tweets and averaged by day. Here, positive Tweets are those containing predefined words associated with physical or mental well-being, e.g., strong, happy, healthy. Because we found a linear relationship between the Tweets positive rate (ALL) and the Tweets positive rate (Flu Shot) (positive Tweets that also contain the key words "flu shot"), we apply Gibbs sampling to estimate the coefficients of that linear model. With these coefficients, we can estimate the missing Tweets positive rate (Flu Shot) for each subject from his or her Tweets positive rate (ALL).
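To make the Gibbs sampling step concrete, here is a minimal sketch for a simple linear model y = a + b*x + noise, where x stands for the Tweets positive rate (ALL) and y for the Tweets positive rate (Flu Shot). The priors, iteration counts, and the names `gibbs_linear` and `impute_flu_rate` are assumptions chosen for illustration; the project's own model and settings may differ.

```python
import random


def gibbs_linear(x, y, n_iter=4000, burn_in=1000, seed=0):
    """Gibbs sampler for y = a + b*x + e, with e ~ Normal(0, 1/tau).

    Assumed weak priors: a, b ~ Normal(0, 1/prior_prec), tau ~ Gamma(shape0, rate0).
    Returns the posterior means of (a, b) over the post-burn-in draws.
    """
    rng = random.Random(seed)
    n = len(x)
    prior_prec = 1e-4            # assumption: nearly flat prior on a and b
    shape0, rate0 = 1e-2, 1e-2   # assumption: weak prior on the precision tau
    a, b, tau = 0.0, 0.0, 1.0
    sxx = sum(xi * xi for xi in x)
    draws = []
    for it in range(n_iter):
        # Sample intercept a given b and tau (normal full conditional).
        prec = prior_prec + n * tau
        mean = tau * sum(yi - b * xi for xi, yi in zip(x, y)) / prec
        a = rng.gauss(mean, prec ** -0.5)
        # Sample slope b given a and tau (normal full conditional).
        prec = prior_prec + tau * sxx
        mean = tau * sum(xi * (yi - a) for xi, yi in zip(x, y)) / prec
        b = rng.gauss(mean, prec ** -0.5)
        # Sample precision tau given a and b (gamma full conditional).
        sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        tau = rng.gammavariate(shape0 + n / 2, 1.0 / (rate0 + sse / 2))
        if it >= burn_in:
            draws.append((a, b))
    a_hat = sum(d[0] for d in draws) / len(draws)
    b_hat = sum(d[1] for d in draws) / len(draws)
    return a_hat, b_hat


def impute_flu_rate(x_missing, a, b):
    """Predict missing positive rate (Flu Shot) from positive rate (ALL)."""
    return [a + b * xi for xi in x_missing]
```

In use, `gibbs_linear` would be fit on the subjects for whom both rates are observed, and `impute_flu_rate` would then fill in the rate for the remaining subjects. Plugging in posterior means gives a point estimate only; drawing predictions from each retained (a, b, tau) sample would carry the uncertainty through.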

Project Website

For more details, check out the project's website here.

Important Figures


Experiment Design

API used for Tweet Bot

Location Effect: U.S. Area vs. Non-U.S. Area

Gibbs Sampling for Missing New Tweets