A German term-level dictionary of EU references for automated text analysis

Post date: Sep 9, 2014 8:34:29 PM

In a recently finished article, to be published in European Union Politics, I analyze the salience of EU affairs in 1,393 plenary debates of the German Bundestag held between 1991 and 2013. The empirical analysis relies on counts of term-level references to the polity, politics, and policies of the EU in the 148,869 individual MP statements made during this period.

These references were retrieved automatically using an encompassing and rather flexible dictionary of references to the EU. It covers mentions of the supranational polity as a whole, its major institutional actors, and various supranational policies and policy instruments. By using extended regular expressions as implemented in R (i.e., with perl = FALSE), it also captures the gradual name change from the European Communities to the EU and reliably detects such references regardless of the inflections, plurals, or compound terms that occur in natural German. For a validation of the dictionary, refer to the online appendix of the original article.
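To illustrate how a regex-based dictionary of this kind can tag statements, here is a minimal sketch in Python (using the standard `re` module rather than R, so regex syntax details may differ from the original implementation). The category names, the patterns, and the function `count_eu_references` are hypothetical and heavily simplified; they are NOT the patterns from the article's replication archive. The sketch shows three ideas: optional endings to absorb inflections (e.g. "Europäische/Europäischen"), alternatives covering both the older "Gemeinschaft(en)/EG" and the newer "Union/EU" names, and unanchored stems that also match compounds (e.g. "Binnenmarktrichtlinien").

```python
import re

# Hypothetical, simplified dictionary for illustration only; NOT the
# patterns published in the article's replication archive.
EU_DICTIONARY = {
    # Polity, covering both the older EC names and the newer EU names.
    "polity": [
        r"europäische[nmrs]?\s+(union|gemeinschaft(en)?)",
        r"\beu\b",  # also matches the prefix of compounds like "EU-Kommission"
        r"\beg\b",
    ],
    # Institutional actors, in inflected and hyphenated compound forms.
    "institutions": [
        r"(eu-|europäische[nmrs]?\s+)kommission",
        r"(europäische[nmrs]?\s+parlament|europaparlament)",
    ],
    # Policies: an unanchored stem also catches compounds such as
    # "Binnenmarktrichtlinien".
    "policies": [
        r"binnenmarkt",
        r"währungsunion",
    ],
}

# Compile once; IGNORECASE on str patterns also folds "Ä"/"ä".
COMPILED = {
    category: [re.compile(pattern, re.IGNORECASE) for pattern in patterns]
    for category, patterns in EU_DICTIONARY.items()
}


def count_eu_references(statement: str) -> dict[str, int]:
    """Count non-overlapping matches per dictionary category in one statement."""
    return {
        category: sum(
            sum(1 for _ in regex.finditer(statement)) for regex in regexes
        )
        for category, regexes in COMPILED.items()
    }


statement = (
    "Die Europäische Kommission und die EU-Mitgliedstaaten müssen den "
    "Binnenmarkt stärken; schon die EG hat die Binnenmarktrichtlinien beschlossen."
)
print(count_eu_references(statement))
# -> {'polity': 2, 'institutions': 1, 'policies': 2}
```

Counting per category rather than in one pooled total makes it explicit that a compound such as "EU-Kommission" can legitimately contribute to both the polity and the institutions category; how such overlaps should be handled is a design decision for the actual application.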

I believe that this dictionary can also be fruitfully used in other text-mining approaches that study EU references in other contexts of German political speech. Possible areas of application include large-scale newspaper corpora, blog posts and other online forums, as well as party manifestos or position papers of different political actors.

The dictionary is therefore publicly available in the article’s replication archive (see the Data and Resources section). Feel free to use it for any research purpose, but please cite the original article if you do so. If you need further details on implementing a corresponding tagging procedure in R, please do not hesitate to get in touch.