Sentiment Analysis Judgment Dataset

INTRODUCTION

Job Title: Judge emotions about weather from Twitter

In the CrowdFlower sentiment data, raters judge the sentiment of a tweet discussing the weather. The data comprises 98,979 tweets. Each tweet was evaluated by at least 5 raters, for a total of approximately 500,000 answers.

The possible answers are:

0 Negative
1 Neutral / author is just sharing information
2 Positive
3 Tweet not related to weather condition
4 I can't tell
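
When working with the judgments in code, it may help to give the five answer codes names. The following is a minimal sketch; the class name `Judgment`, its member names, and the helper `label` are illustrative assumptions and not part of the dataset.

```python
from enum import IntEnum


class Judgment(IntEnum):
    """Answer codes as listed above; the member names are hypothetical."""
    NEGATIVE = 0
    NEUTRAL = 1       # author is just sharing information
    POSITIVE = 2
    NOT_RELATED = 3   # tweet not related to weather condition
    CANT_TELL = 4


def label(code: int) -> str:
    """Return the symbolic name for a numeric answer code."""
    return Judgment(code).name
```

An unknown code such as 5 raises `ValueError`, which can catch malformed rows early.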

For your analysis, we have included a few views into the data:

1) Basic Data
2) Full Data (no longer available)
3) Reference (Gold) Data
4) CrowdFlower Aggregated Data (OPTIONAL, see details below)


OBJECTIVE

Your objective is to generate the best possible answer for each question, based on the judgments of five or more raters per question. The “solution file”, or reference data, that we’ll compare your answers against is on Kaggle! A link to the competition is coming soon.
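
One simple baseline for this objective is a per-question majority vote. The sketch below is only a starting point, not a recommended solution: the function name `majority_vote` is hypothetical, the input is assumed to be (question, rater, judgment) tuples, and ties are broken by taking the smallest answer code, which is an arbitrary choice.

```python
from collections import Counter, defaultdict


def majority_vote(rows):
    """Pick the most frequent judgment per question.

    rows: iterable of (question, rater, judgment) tuples (assumed layout).
    """
    votes = defaultdict(Counter)
    for question, _rater, judgment in rows:
        votes[question][judgment] += 1

    answers = {}
    for question, counts in votes.items():
        top = max(counts.values())
        # Assumption: break ties by choosing the smallest answer code.
        answers[question] = min(j for j, c in counts.items() if c == top)
    return answers
```

A stronger approach would likely weight raters by their estimated reliability rather than counting every vote equally.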


BASIC DATA DESCRIPTION

This is the exact same data as the Full Data, but stripped down. It provides only:
- question (the question ID)
- rater (the rater ID)
- judgment (the rater’s answer, from 0 through 4)
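
The Basic Data ships as a bz2-compressed CSV (cf-sentiment-basic.csv.bz2). A minimal way to read it with the Python standard library might look like the sketch below. It assumes the file has a header row with the column names `question`, `rater`, and `judgment` exactly as listed above; the function name `read_judgments` is hypothetical.

```python
import bz2
import csv


def read_judgments(path):
    """Yield (question, rater, judgment) tuples from a bz2-compressed CSV.

    Assumes a header row naming the columns question, rater, judgment.
    """
    with bz2.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            yield row["question"], row["rater"], int(row["judgment"])
```

Because this is a generator, the roughly 500,000 rows are streamed rather than loaded into memory at once.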


FULL DATA DESCRIPTION

This is the firehose of data. It provides the following:
- question (the question ID)
- rater (the rater ID)
- judgment (the rater’s answer, from 0 through 4)
- tweet_text (the tweet content itself)
- country (the country of the rater)
- region (the region of the rater)
- city (the city of the rater)
- started_at (the timestamp of when the rater started working on a page of 15 tweets)
- created_at (the timestamp of when the rater finished working on a page of 15 tweets)
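
Since started_at and created_at bracket a page of 15 tweets, their difference gives a rough measure of how long a rater spent per tweet, which may be a useful signal of rater care. The sketch below assumes timestamps formatted like `2012-01-01 10:00:00`; the actual format in the file is not documented here, so the `fmt` parameter may need adjusting. The function name `seconds_per_tweet` is hypothetical.

```python
from datetime import datetime


def seconds_per_tweet(started_at, created_at,
                      tweets_per_page=15, fmt="%Y-%m-%d %H:%M:%S"):
    """Average seconds a rater spent per tweet on one page.

    fmt is an assumed timestamp format; change it to match the data.
    """
    start = datetime.strptime(started_at, fmt)
    end = datetime.strptime(created_at, fmt)
    return (end - start).total_seconds() / tweets_per_page
```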


REFERENCE DATA DESCRIPTION

This is our reference data set. In it, you’ll see:
- question (the question ID)
- answer (the reference answer or “solution”, from 0 through 4)

You'll see that 300 of the questions are public to start. These are for you to tune your solution. We'll be evaluating your solution against a private set of 700 reference answers. 
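
To tune against the public reference answers, you can score your predictions locally. The sketch below uses plain accuracy, which is an assumption: the metric used for the final evaluation is not stated here. The function name `accuracy` is hypothetical, and both arguments are assumed to be dicts mapping question ID to answer code.

```python
def accuracy(predicted, reference):
    """Fraction of reference questions answered correctly.

    A question missing from predicted counts as incorrect.
    """
    if not reference:
        raise ValueError("reference answers are empty")
    correct = sum(1 for q, a in reference.items() if predicted.get(q) == a)
    return correct / len(reference)
```

With only 300 public answers, small differences in this score between two methods may not carry over to the private set.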


CROWDFLOWER AGGREGATED DATA DESCRIPTION

CrowdFlower automatically turns “Full Data” with answers from multiple raters into a single predicted correct answer, or aggregated answer. Using CrowdFlower Aggregated Data is COMPLETELY OPTIONAL. If you’d like to learn more, visit the “CrowdFlower Aggregated Data” page.


ACKNOWLEDGMENTS

Huge thanks to researchers at the University of Minnesota for letting us publish this data. Thanks to Emma Ferneyhough at CrowdFlower for compiling the data.


ATTACHED FILES

- cf-gold.zip (7k), Matt Lease, Nov 30, 2016, 12:29 PM
- cf-sentiment-basic.csv.bz2 (2461k), Matt Lease, Nov 30, 2016, 12:28 PM