Datasets

We encourage (but do not require) submissions describing experiments that use the following datasets. Researchers who elect to use any of these datasets are free to define their own relevant tasks on them.

GE2017: Tweets Dataset on the UK General Election 2017

GE2017 is a dataset of around 18M tweets about the 2017 UK General Election, collected between April 28th and June 8th 2017. The Twitter streaming API was used with a set of 56 keywords related to GE2017 to retrieve tweets containing any of these keywords over the period of study. The keywords consist of hashtags, accounts, and terms representing phrases about the election (e.g., #GE2017, general elections), politicians involved in the election (e.g., Theresa May, Corbyn, #jc4pm), and related topics (e.g., Brexit, NHS). Because of restrictions on redistributing tweets, we share only the tweet IDs of the dataset.

The dataset contains only original tweets (retweets are excluded). The format of the dataset is as follows:

tweet_ID     Date     list_of_matching_keywords

Each tweet ID is associated with the date of the tweet and the list of keywords in the streaming queries that match the tweet. This information allows participants to work on only the subsets or periods of the collection that focus on a certain topic, instead of having to download the whole collection.
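As an illustration of selecting such a subset, the following is a minimal Python sketch, not an official loader. It assumes that the three fields are tab-separated, that dates are written as YYYY-MM-DD, and that the matching keywords are comma-separated; the page does not specify any of these details, so adjust them to the actual file. The function name filter_tweet_ids and the sample records are made up for the example.

```python
import csv
import io
from datetime import date


def filter_tweet_ids(lines, keyword=None, start=None, end=None):
    """Yield tweet IDs whose record matches the keyword and the date window.

    Assumed (hypothetical) layout: tweet_ID <TAB> YYYY-MM-DD <TAB> kw1,kw2,...
    """
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        tweet_id, day_text, keyword_text = row[0], row[1], row[2]
        day = date.fromisoformat(day_text.strip())
        keywords = {k.strip().lower() for k in keyword_text.split(",")}
        if keyword is not None and keyword.lower() not in keywords:
            continue
        if start is not None and day < start:
            continue
        if end is not None and day > end:
            continue
        yield tweet_id.strip()


# Made-up records in the assumed layout (not real tweet IDs).
sample = io.StringIO(
    "111\t2017-05-01\t#GE2017,Brexit\n"
    "222\t2017-05-20\tNHS\n"
    "333\t2017-06-07\tBrexit,Corbyn\n"
)
brexit_ids = list(filter_tweet_ids(sample, keyword="brexit", start=date(2017, 5, 15)))
print(brexit_ids)
```

Only the IDs selected this way would then need to be retrieved from Twitter, rather than the full collection.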

A report on the dataset is available on arXiv: Cram L., R. Hill, C. Llewellyn, and W. Magdy. UK General Election 2017: a Twitter Analysis. 2017.

The dataset can be downloaded from here.

USPresElect2016: Labelled Tweets Dataset on the US Presidential Elections 2016

USPresElect2016 is a dataset of 3,450 labelled tweets: the 50 most retweeted tweets about the 2016 US presidential election for each day from 1 Sep 2016 to 8 Nov 2016 (election day). Together, these 3,450 tweets were retweeted over 26M times.
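The dataset size follows from the collection window, assuming both the first day and election day are counted. A short Python check of that arithmetic (the variable names are illustrative only):

```python
from datetime import date

first_day = date(2016, 9, 1)
election_day = date(2016, 11, 8)

# Count both endpoints, hence the +1 (assumption: the window is inclusive).
days = (election_day - first_day).days + 1
tweets_per_day = 50
total = days * tweets_per_day

print(days, total)  # 69 days, 3450 tweets
```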

Each tweet is labelled as supporting or attacking Trump or Clinton, both, or neither (neutral).

A full description of the data can be found in this paper: Darwish K., W. Magdy, and T. Zanouda. Trump vs. Hillary: What went Viral during the 2016 US Presidential Election. SocInfo 2017.

The dataset can be downloaded from here.