Task 1 Data

 

Dataset


The task corpus comprises tweets in Spanish and English from diplomatic authorities representing four international actors: China, Russia, the United States, and the European Union. These authorities include government accounts, embassies, ambassadors, and other diplomatic profiles such as consuls and missions.


Task 1: Propaganda identification and characterization


Task 1 of DIPROMATS 2024 comprises two annotated datasets, one of tweets in English and one of tweets in Spanish. The tweets were collected through the Twitter API for Academic Research and were published between January 1, 2020 and March 11, 2021; the end date is the first anniversary of the declaration of the COVID-19 pandemic.


The dataset in Spanish includes 9,591 tweets published by 135 authorities and distributed as follows: 


The English dataset contains 12,012 tweets from 619 authorities, with the following distribution:


We split the data with a temporal criterion: for each dataset, we choose the date that divides the positive tweets in a 70/30 proportion, with the oldest 70% forming the training set and the newest 30% forming the test set. Test data will be kept private to prevent overfitting in post-campaign experiments.
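The temporal split described above can be sketched roughly as follows. This is only an illustration, not the organizers' actual procedure: the field names `created_at` and `is_propaganda`, the function `temporal_split`, and the reading of "positive" as "labelled as propaganda" are all assumptions, and the toy data is synthetic.

```python
from datetime import datetime, timedelta


def temporal_split(tweets, train_fraction=0.7):
    """Split tweets at the date that leaves ~train_fraction of positives in train.

    `tweets` is a list of dicts with the hypothetical keys `created_at`
    (a datetime) and `is_propaganda` (a bool); these names are assumptions.
    """
    # Only the positive tweets determine where the cutoff date falls.
    positive_dates = sorted(t["created_at"] for t in tweets if t["is_propaganda"])
    if not positive_dates:
        raise ValueError("no positive tweets to base the split on")

    # Number of positives that should land in the (older) training set.
    k = max(1, round(train_fraction * len(positive_dates)))
    cutoff = positive_dates[k - 1]

    # All tweets, positive or not, are assigned by the same cutoff date.
    train = [t for t in tweets if t["created_at"] <= cutoff]
    test = [t for t in tweets if t["created_at"] > cutoff]
    return cutoff, train, test


# Synthetic toy data: one tweet per day, every other tweet marked positive.
start = datetime(2020, 1, 1)
toy = [
    {"created_at": start + timedelta(days=i), "is_propaganda": i % 2 == 0}
    for i in range(20)
]

cutoff, train, test = temporal_split(toy)
print(cutoff.date(), len(train), len(test))
```

Because the cutoff is a date rather than an index, the overall train/test sizes need not be 70/30; only the positive tweets are balanced that way, and ties on the cutoff timestamp can shift the proportion slightly.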