Data and text processing

We manually collect, from the central banks' websites, a corpus of 591 monetary policy minutes, totalling 50,451 sentences, for the central banks of Czechia, Hungary, Poland, and Romania over the period from January 2007 to January 2022. The documents are labelled in the format YYYY/MM/DD, where the date corresponds to the publication of the document. The corpus of the minutes for the four central banks can be downloaded from here.

We randomly select a sample of sentences from the minutes corpus and manually annotate each sentence as hawkish, dovish, or neutral with respect to the monetary policy stance. The final sample consists of 1,998 labelled sentences, representing approximately 4.0% of the sentences in the corpus. The database can be downloaded from here.
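As an illustration of the sampling step and of the reported share, the following is a minimal sketch, not the procedure actually used for the dataset. The function name `sample_sentences`, the fixed seed, and the placeholder corpus are assumptions made for the example; only the two counts (50,451 and 1,998) come from the text.

```python
import random


def sample_sentences(sentences, k, seed=0):
    """Draw k sentences uniformly at random, without replacement.

    Hypothetical helper: the seed is fixed only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(sentences, k)


# Placeholder corpus with the same number of sentences as the minutes corpus.
corpus = [f"Sentence {i}" for i in range(50_451)]

sample = sample_sentences(corpus, 1_998)

# Share of the corpus covered by the annotated sample.
share = len(sample) / len(corpus)
print(f"{share:.1%}")
```

With these counts the share is 1,998 / 50,451, which rounds to the 4.0% quoted in the text.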

Alternatively, the complete dataset is available by accessing the Harvard Database.

The following links provide some text-processing analyses. Specifically, for the corpus of the minutes, we report the main topics, the readability scores, and the keyword co-occurrences.
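To make the notion of keyword co-occurrence concrete, here is a small sketch that counts how often two keywords appear in the same sentence. It is one plausible definition only; the page does not state how the linked analysis computes co-occurrences. The function `cooccurrences`, the keyword list, and the three toy sentences are all invented for illustration and are not taken from the corpus.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrences(sentences, keywords):
    """Count sentence-level co-occurrences of keyword pairs.

    Each pair is counted at most once per sentence and is stored
    in alphabetical order, e.g. ("inflation", "target").
    """
    keywords = {k.lower() for k in keywords}
    counts = Counter()
    for sentence in sentences:
        # Crude tokenizer (assumption): lowercase alphabetic words only.
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        present = sorted(tokens & keywords)
        counts.update(combinations(present, 2))
    return counts


# Toy sentences, made up for the example.
toy = [
    "The Board noted that inflation remained above the target.",
    "Inflation pressures and wage growth were discussed.",
    "The target for inflation was left unchanged.",
]

pairs = cooccurrences(toy, ["inflation", "target", "wage"])
for pair, n in pairs.most_common():
    print(pair, n)
```

Counting at the sentence level matches the unit used elsewhere on this page (sentences), but a document-level or sliding-window definition would be equally valid choices.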
