Dataset
The EmotionGIF 2020 Challenge has ended.
The big winner of both Rounds, with a total of participating teams, is Team Mojitok of Platfarm Inc., South Korea, with MR@6 of 0.6247 (Round 1) and 0.6255 (Round 2). Congratulations!
The shared task includes a first-of-its-kind dataset of 40,000 two-turn Twitter threads: the original tweet, and the response tweet (which includes an animated GIF).
See the example on the right: the original tweet ("Tomorrow looks like...") is shown together with its response tweet. The response tweet includes both reply text ("Hell yeah") and an animated GIF, which in this case belongs to the "applause" category.
The dataset is split into the following three files:
- train_labeled.json: 32,000 samples with gold labels, to be used for training the model
- dev_unlabeled.json: 4,000 unlabeled samples used for practice
- eval_unlabeled.json: 4,000 unlabeled samples used for final evaluation. The results for this file will determine the winners.
The files are all in JSON Lines format (each line in the file is a JSON value, representing one sample).
Each sample includes the text of the original tweet, and information about the response: the reply text and the category(ies) of the GIF response in the tweet.
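Since every line of these files is a separate JSON value, a file can be read line by line. The following is a minimal sketch, not an official loader; the helper name load_jsonl is an assumption made for illustration.

```python
import json

def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts (hypothetical helper)."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines defensively
                samples.append(json.loads(line))
    return samples
```

For example, load_jsonl("train_labeled.json") would return one dict per training sample, assuming the file is in the working directory.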
Here is the description of the JSON fields in each sample:
- idx: running index of the samples (0 to 39999)
- text: the text of the original tweet; may include mentions (@user), hashtags (#example), emojis etc. The dataset contains text-only tweets; tweets which contain links have been filtered out.
- reply: the text content of the response tweet. In cases where the reply only contained a GIF response, this field will be an empty string ("reply": "").
The training data also contains the gold label:
- categories: The GIF category (or categories) of the GIF response which was included in the reply tweet, from a list of 43 categories. This field contains between 1 and 6 categories.
In addition, the training data also contains the following:
- mp4: the file name of the MP4-version of the animated GIF response; a ZIP file with all MP4s is available for download. The MP4 files are provided for completeness only. We do not expect that the participants will use the video files as part of their model features. Download the ZIP file.
Here is a sample line:
{"idx": 32, "text": "Fell right under my trap", "reply": "Ouch!", "categories": ["awww", "yes", "oops"], "mp4": "fe6ec1cd04cd009f3a5975e2d288ff82.mp4"}
In the example above, the text of the original tweet is "Fell right under my trap", and the response included the text "Ouch!" as well as an animated GIF belonging to the following categories: awww, yes, and oops. The GIF is available as the MP4 file fe6ec1cd04cd009f3a5975e2d288ff82.mp4.
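The field descriptions above can be turned into a quick sanity check for a training sample. This is only a sketch: the function name check_training_sample is hypothetical, the checks cover only what is stated on this page (the five fields and the 1 to 6 category limit), and the sample is written with double-quoted strings, as JSON requires.

```python
import json

def check_training_sample(sample):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    # Fields described for the training data on this page.
    for field in ("idx", "text", "reply", "categories", "mp4"):
        if field not in sample:
            problems.append(f"missing field: {field}")
    # The categories field is described as holding between 1 and 6 entries.
    if "categories" in sample:
        count = len(sample["categories"])
        if not 1 <= count <= 6:
            problems.append(f"categories should hold 1 to 6 entries, found {count}")
    return problems

line = ('{"idx": 32, "text": "Fell right under my trap", "reply": "Ouch!", '
        '"categories": ["awww", "yes", "oops"], '
        '"mp4": "fe6ec1cd04cd009f3a5975e2d288ff82.mp4"}')
sample = json.loads(line)
print(check_training_sample(sample))
```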
Development and Evaluation datasets
The development and evaluation datasets are also in JSON Lines format, identical to the training data, but the categories and mp4 fields are missing. You will need to predict the GIF's categories by adding the categories field in your submission files. Read about the submission format.
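As a rough illustration of adding the categories field to unlabeled samples, the sketch below writes each sample back out as one JSON line. It makes several assumptions: write_predictions and the predict callable are hypothetical names, and the exact submission requirements (file name, number of categories per sample, packaging) are defined in the submission format description, not here.

```python
import json

def write_predictions(samples, predict, out_path):
    """Write samples to out_path as JSON Lines, each with a predicted categories list.

    predict is a hypothetical callable that takes one sample dict and
    returns an iterable of category names.
    """
    with open(out_path, "w", encoding="utf-8") as f:
        for sample in samples:
            row = dict(sample)  # copy so the input sample is not modified
            row["categories"] = list(predict(sample))
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Here predict stands in for whatever model produces the categories; each output line keeps the original idx, text, and reply fields alongside the predictions.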