Data

For the annotation process, we first collected a dataset of 1,292,885 tweets published by 163 Chinese, 217 Russian, 283 EU and 314 US authorities between January 1st, 2020, and March 11th, 2021. The latter date marks the first anniversary of the declaration of the COVID-19 pandemic by the World Health Organization. These authorities included governmental accounts, embassies, ambassadors, and other diplomatic profiles such as consuls and missions to international organizations. We then divided the dataset into two parts according to the language of the tweets: the 120,524 tweets written in Spanish were selected for the Spanish dataset, and the 704,609 tweets written in English for the English dataset.


In both datasets, the sample was further reduced on the basis of strategic narrative theory (Miskimmon et al., 2013). According to these authors, international actors disseminate three types of narratives: identity narratives, which convey the history, values and goals of a country; system narratives, which explain how the international system works, who the important actors are and how they are characterized; and issue narratives, which set events and policies in context, showing who the main actors are and how the matter will be resolved. We therefore searched for tweets containing terms associated with each type of narrative.
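
The term-based selection can be illustrated with a minimal sketch. The term lists below are hypothetical placeholders, not the terms actually used to build the datasets, and the function names are introduced here only for illustration. The sketch assumes a tweet is kept when it matches at least one term from any narrative type, using case-insensitive whole-word matching.

```python
import re

# Hypothetical example terms; the real term lists are not given in this section.
NARRATIVE_TERMS = {
    "identity": ["history", "values"],
    "system": ["multilateralism", "world order"],
    "issue": ["pandemic", "vaccine"],
}


def compile_patterns(terms_by_type):
    """Build one case-insensitive whole-word pattern per narrative type."""
    return {
        narrative: re.compile(
            r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
            re.IGNORECASE,
        )
        for narrative, terms in terms_by_type.items()
    }


def match_narratives(text, patterns):
    """Return the narrative types whose terms occur in the text."""
    return [name for name, pattern in patterns.items() if pattern.search(text)]


def filter_tweets(texts, patterns):
    """Keep only the texts that match at least one narrative type."""
    return [text for text in texts if match_narratives(text, patterns)]


PATTERNS = compile_patterns(NARRATIVE_TERMS)
```

For example, `filter_tweets(["Our shared values guide us", "Good morning"], PATTERNS)` keeps only the first text under these assumed terms.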


Finally, the dataset in Spanish was reduced to 9,591 tweets published by 135 authorities and distributed as follows: 


The final dataset in English consists of 14,747 tweets from 619 authorities, with the following distribution:


We split the data using a temporal criterion: for each dataset, we chose the date that divides the positive tweets in a 70/30 proportion, with the oldest 70% forming the training set and the newest 30% forming the test set. The test data will be kept private to prevent overfitting in post-campaign experiments.
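
The temporal split can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the procedure actually used: each tweet is assumed to be a dictionary with a `date` field and a binary `label` field (1 for positive), and the names `temporal_split` and `train_percent` are hypothetical. When several positive tweets share the cutoff date, the resulting proportion is only approximately 70/30.

```python
from datetime import date


def temporal_split(tweets, train_percent=70):
    """Split tweets at the date that leaves about train_percent of positives before it.

    tweets: list of dicts with keys "date" (datetime.date) and "label" (1 = positive).
    Returns (cutoff, train, test); tweets dated before the cutoff go to train,
    tweets dated on or after the cutoff go to test.
    """
    positive_dates = sorted(t["date"] for t in tweets if t["label"] == 1)
    if not positive_dates:
        raise ValueError("at least one positive tweet is needed to choose a cutoff")
    # Index of the first positive tweet assigned to the test set
    # (integer arithmetic avoids floating-point rounding issues).
    cut_index = min(len(positive_dates) * train_percent // 100, len(positive_dates) - 1)
    cutoff = positive_dates[cut_index]
    train = [t for t in tweets if t["date"] < cutoff]
    test = [t for t in tweets if t["date"] >= cutoff]
    return cutoff, train, test


# Toy data: ten positive tweets on consecutive days plus two negative ones.
example = [{"date": date(2020, 1, day), "label": 1} for day in range(1, 11)]
example += [{"date": date(2020, 1, 2), "label": 0}, {"date": date(2020, 1, 9), "label": 0}]
cutoff, train, test = temporal_split(example)
```

Choosing the cutoff from the positive tweets only, and then applying it to all tweets, keeps the two subsets strictly ordered in time while fixing the proportion for the positive class.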