I often find myself in discussions about the extent to which academics can and
should contribute to public (often meaning: mediatized) debates or even advise
policy-makers directly (at my current institution particularly triggered
by this, and hotly debated here in the otherwise rather dozy
German Political Science Association). Thus the recently published 'F.A.Z.
Ökonomenranking', which annually crowns the top 100 'most influential'
German-speaking economists, caught my interest: it aims to combine measures of both public
and academic impact.
In contrast to some of the arguments frequently invoked in the
discussions with my colleagues, the different 'spheres' of academic impact seem
hardly related. Neither is there evidence for a trade-off between scholarly impact
on the one hand and public or political presence on the other (a claim often heard from
those particularly successful on the former dimension). Nor is there evidence
that scholarly impact is a sort of necessary condition for influence in public
and political debates (often the lowest common denominator agreement from a
somewhat more normative perspective).
In my opinion the outcome of the British referendum is wrong on so many counts. Personally, some of my most formative European experiences took place on British soil. Good friends as well as dear colleagues live and work in the UK. The ‘Leave’ majority has upset their way of living and shattered my beliefs about many future opportunities for me and my family. But beyond these personal motives, I think that the referendum outcome is wrong because many if not most contemporary societal challenges require a decent level of economic and political cooperation across national borders.
Don’t get me wrong, I am not one of those claiming that the EU constantly surfs the Pareto frontier, distributing nothing but goodies for everybody who is smart enough to recognize them. Rather, my academic writing has hopefully helped to clarify that cooperation across borders comes with institutions that make intrinsically political choices (see e.g. here, here or here). As in national capitals, collective decision-making in Brussels will at times benefit some while disadvantaging others. Thus it is true that the sacked worker in Birmingham, the frustrated fisherman in Cornwall, or the NHS-dependent old lady in Brighton can put forward pretty rational arguments against key EU policies. And it is far from trivial to adequately weigh their justified scepticism against the equally justified EU defences that can be expected from the Oxford student, the City banker or the Polish shop owner.
In this situation – I have argued e.g. here, here and here – public and controversial debates about EU issues become increasingly likely. And I have also argued that this is not a bad thing per se. Moving supranational decision-making into the public spotlight also gives a say to those groups that have so far not been well represented at the office and negotiation tables in Brussels. If their arguments and preferences become visible in public debates about European integration, it will be much clearer which kind of EU and which supranational policies are actually acceptable in the affected collectives.
But we should also not be naïve: politicisation will not automatically equal an enlightened debate. Indeed, most of the recent research emphasizes that EU politicisation often favours right-wing populists and can quickly deteriorate into a tyranny of the majority over well-balanced processes (good starting points are here, here and here). The UK referendum process we have just observed unfortunately supports this: the ‘Leave’ campaigners in particular successfully invoked diffuse cultural threats and addressed the gut feelings rather than the rational interests of those feeling left behind by the EU.
It did not matter whether the numbers on Boris’ bus were fact or fairy tale. A plummeting FTSE as well as revolting entrepreneurs and companies did not challenge the promise of more jobs and higher incomes in a UK outside of the EU. It did not matter that no ‘leave’ leader had to explain how the UK could maintain single market access without buying into the full set of rules. In sum, the ‘leave’ leaders actually saw no need to specify any meaningful economic or political alternatives. So while the Eurosceptic campaign successfully built on the not completely unjustified collywobbles of the worker, the fisherman and the old lady, it did not offer any substantial remedy or clue on how the lives of these people would improve without the EU.
So it is easy to blame demagoguery (or the half-hearted responses from the ‘remain’ campaign). But we cannot expect that self-righteous populists will take responsibility. Rather we should be reminded that voters are responsible, too – they took the ultimate decision. The lack of reasonable political alternatives for their country outside of the EU was apparently not a significant obstacle to voting ‘leave’. It seems that casting a simple protest vote against the political establishment in London and Brussels was motivation enough to make this fundamental choice.
This leaves us with very little to learn about the voters' sincere EU and policy preferences. Yet repeating or circumventing the referendum for that reason would be equally wrong. If one buys that European integration has created some societal losers that will have to be compensated at some stage, if one buys that containing public EU politicisation is highly unlikely in this context, and if one buys that so far mainly irresponsible populists have gained from politicisation, any attempt to sneak around the referendum outcome perpetuates protest voting against the EU. In contrast, accepting the majority vote in last week’s referendum now sends a clear message to UK and EU citizens. It says: your choices do count, but this power comes with responsibility. It says: your vote is consequential, not only for the political establishment but also for your very own life circumstances. This is not about punishing British voters for what I – personally – consider a wrong decision. Rather, it is about empowering them. In the light of equally irresponsible and populist platforms in various democracies, voters in Europe and beyond should be strongly encouraged to evaluate their options wisely.
So while I think that the Brexit decision is wrong, I am deeply convinced that it should now be implemented quickly. We are in a game of poker with the Johnson, Hofer, and Wilders types, and if we don’t call their bluffs as early as possible, we risk losing the whole evening.
Alexandros Tokhi and Christian Rauh
Big Data has become a ubiquitous buzzword. Social networks, smartphones, and various online applications and websites constantly produce and provide information on an unprecedented scale and level of detail. Data storage is cheap and ever-improving analytical tools seem to herald a revolution in the way we understand the world. Some argue that this also fundamentally transforms science. Just crunching more data, it is assumed, allows us to take better decisions and solve social and political problems. So is the Big Data revolution the end of the social sciences as we know them?
We contend that it is not. As social scientists we are particularly well trained to know that observations alone – no matter how many of them there are – hardly lead to meaningful inferences without carefully specifying the underlying assumptions and constructing valid research designs. It is the combination of the principles of social scientific inquiry with novel methods of automated data generation and processing that bears the potential to generate new insights into socially relevant questions.
Don’t believe the hype – at least not all of it
The bold statements on the transformative power of the Big Data revolution are often rooted in rather naïve views on data analysis and largely unclear specifications of what Big Data actually is. One major pitfall is the belief – popular amongst computer scientists, Internet pundits and data journalists – that Big Data signals the end of theory. Corresponding arguments essentially equate Big Data with an N=all approach in which simple correlations reveal the ultimate truth about the world. For social scientists this clearly misses a crucial point: data is meaningful only with regard to a particular theory, and we need to explicate the assumptions that drive our conclusions. Theory influences and even determines what we observe in a given data set, which aspects or phenomena we identify as relevant, and how we distinguish spurious correlations from causally meaningful relationships. Google’s search routines, for example, are built on the assumption that more incoming links indicate higher importance of a website. You might or might not buy into this assumption, but you should be aware of it when interpreting the results.
A related pitfall is reducing Big Data only to really, really large data sets. In the social sciences we judge the number (and the composition) of our observations relative to the population of interest. The Big Data hype usually invokes images of the billions of observations that e.g. social media produce. But individuals select themselves into these networks – do all of your friends use Twitter? – and if we ignore these processes we might quickly draw skewed conclusions. In other, socially equally relevant, areas – such as the political commitment to international treaties discussed below – a comparatively modest number of observations might already represent the universe of cases quite well. From a social science point of view, there is simply no absolute criterion for drawing the line between ‘big’ and ‘small’ data without having a specific social phenomenon in mind. And accordingly, big and small data are subject to the same analytical challenges when it comes to measurement validity or selection bias. The modern social sciences have developed an encompassing toolkit to draw valid inferences from observational data – and with the abundance of digitally available information this asset has become more rather than less important.
From our perspective, thus, Big Data does not put an end to the scientific method – but it holds significant potential to enhance it: What we indeed consider as ‘revolutionary’ is the ever expanding set of methods to automatically collect, process, and analyze digital information. In various research fields, these methods can help us in systematically analyzing only loosely structured data sources such as websites or text documents. Big Data, in other words, provides inexpensive and timesaving means to tap uncharted sources of empirical evidence. Combined with explicit theory and sound social scientific methods, these innovative and replicable means of data collection and analysis contribute to answering substantial questions.
Analyzing politics beyond the nation state with Big Data technologies
The study of international and supranational politics is probably not the first research area that comes to mind in the context of Big Data. Still, two examples from our own research are well suited to exemplify the potential of Big Data methods.
The first example concerns a central question in international relations research: do demanding legal obligations encourage or discourage the ratification of international treaties by states? Past scholarship has produced equivocal findings, because it analyzed only subsets of relevant treaties; as a result, the question remains unresolved.
The usual expectation is that states ignore demanding treaties in order to retain their freedom of action. We argue, however, that this decision varies with the substantive issue area a treaty covers. Where tougher treaty rules commit not only oneself but especially other states to a specific course of action (e.g. reduction of air pollution, lower trade barriers), states should be more willing to sign on to more demanding obligations. When demanding treaty obligations mainly constrain one’s own freedom of action without binding others, more demanding obligations discourage ratification. The latter mechanism prevails in international human rights law, while the former applies in areas where inter-state cooperation is needed to provide common public goods (e.g. clean air, security).
Our research question and theory inform our data choices: we need to capture individual states’ willingness to ratify treaties, here measured by the time needed to take this step, to isolate issue areas and their treaties from one another, and to capture the entire variation in demanding treaty obligations therein. We are quickly confronted with several thousand observations if we consider each of the 193 states for about 80 treaties from human rights and environmental regulation over the past 50 years.
This calls for Big Data methods. Accordingly, we program and implement a webscraping algorithm in the programming language Python to automatically extract and reshape data from the United Nations Treaty Collection Database. Within 2.5 minutes we gather around 140,000 observations. Speed, though, is not the only advantage. To isolate demanding obligations, we exploit the fact that different types of treaties, which regulate the same set of rights, differ only in their degree of obligations. Framework conventions impose fewer obligations on states than their optional protocols. The Python algorithm automatically recognizes the type of treaty and then codes our demandingness indicator. In the statistical analysis of these data, we provide consistent evidence in support of our argument and resolve the question about demanding obligations.
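To make the coding step more tangible, here is a minimal Python sketch of how a treaty type could be recognized and translated into a demandingness indicator, together with the time-to-ratification measure. It is not our actual scraper: the records, the field names, the treaty titles and the simple title-matching rule are all hypothetical stand-ins for rows extracted from a treaty database.

```python
import re
from datetime import date

# Hypothetical records standing in for scraped rows; titles, states and
# dates are invented for illustration only.
RECORDS = [
    {"treaty": "Framework Convention on Example Matters", "state": "A",
     "opened": date(1990, 1, 1), "ratified": date(1992, 1, 1)},
    {"treaty": "Optional Protocol to the Framework Convention on Example Matters",
     "state": "A", "opened": date(1995, 1, 1), "ratified": None},
]

# Assumption: the treaty type can be read off the title.
OPTIONAL_PROTOCOL = re.compile(r"\boptional\s+protocol\b", re.IGNORECASE)


def demandingness(title):
    """Return 1 for optional protocols (more obligations), 0 otherwise."""
    return 1 if OPTIONAL_PROTOCOL.search(title) else 0


def days_to_ratification(opened, ratified):
    """Days between opening for signature and ratification; None if unratified."""
    if ratified is None:
        return None
    return (ratified - opened).days


def code_records(records):
    """Attach the demandingness indicator and the duration to each record."""
    return [
        {**r,
         "demanding": demandingness(r["treaty"]),
         "days": days_to_ratification(r["opened"], r["ratified"])}
        for r in records
    ]


CODED = code_records(RECORDS)
```

Unratified treaties are kept with a missing duration rather than dropped, since they would enter a duration analysis as censored observations.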
The second example from our current research focuses on EU affairs in national parliaments. To the extent that parliaments provide publicly visible debates about the EU, they might dampen democratic deficits of supranational policy-making. While some argue that the incentives to politicize EU issues have increased with each transfer of political competences to the supranational level, others claim that selective partisan incentives drive public emphasis of EU topics. Extant research restricts itself to selected parliamentary debates where explicit EU questions figure on the formal agenda – ignoring the fact that the EU provides constraints and opportunities across almost the entire range of domestic issues possibly discussed in parliament.
For a systematic evaluation of these competing claims, we need information on the parliamentary salience of EU affairs over time across various issue areas, different levels of EU authority, and different settings of domestic partisan competition. To meet these considerable informational requirements, we scrape all plenary protocols between 1991 and 2013 from the document server of the German Bundestag. Using our method of automated pattern recognition, we split these texts into more than 148,000 individual speeches by parliamentarians from all parties. Finally, we count all term-level references to the policies, politics, and polity of the EU in each speech with a dictionary-based text-mining algorithm implemented in the R environment.
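The splitting step can be illustrated with a short Python sketch. It assumes a strongly simplified protocol layout in which every speech starts with a line of the form "Name (Party):"; real plenary protocols are messier, and the pattern, the function name and the sample text below are hypothetical rather than taken from our pipeline.

```python
import re

# Assumed, simplified speaker line: "Name (Party):" on a line of its own.
SPEAKER_LINE = re.compile(
    r"^(?P<speaker>[^\n():]+) \((?P<party>[^\n()]+)\):[ \t]*$",
    re.MULTILINE,
)


def split_speeches(protocol):
    """Split one protocol text into a list of speeches with speaker and party."""
    matches = list(SPEAKER_LINE.finditer(protocol))
    speeches = []
    for i, m in enumerate(matches):
        # A speech runs until the next speaker line or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(protocol)
        speeches.append({
            "speaker": m.group("speaker").strip(),
            "party": m.group("party"),
            "text": protocol[m.end():end].strip(),
        })
    return speeches


# Invented sample text in the assumed layout.
SAMPLE = """Anna Beispiel (Partei A):
Die Europäische Union ist wichtig.
Max Muster (Partei B):
Wir reden heute über den Haushalt.
"""

SPEECHES = split_speeches(SAMPLE)
```

Each resulting speech can then be passed on to a term-counting step, so that EU references are measured per speaker and party rather than per debate.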
The data reveal that the number of EU references in the German Bundestag has indeed systematically increased with each consecutive revision of EU treaties. Our results are robust to the inclusion of partisan differences and other control variables. Whether this holds beyond the German case remains to be seen, and we are currently extending this data collection strategy to other parliaments in EU member states.
Big Data is what we make of it
Big Data alone is hardly a panacea for all challenges modern societies face. If no meaningful theory is leveraged to contextualize and make sense of the abundant information, we at best only stare at huge piles of numbers. At worst, we infer policy recommendations from spurious correlations and skewed samples. That’s why the social sciences should make their voice heard in the current debates. Big Data does not end theory. It rather highlights the need for social scientists to critically reflect on and give meaning to the massive flows of digits.
But this also requires that the social sciences open up to Big Data technologies. First, we should be able to understand the assumptions that drive modern algorithms when assessing their societal implications. Second, we can make technologies such as webscraping, pattern recognition, and text mining part of our methodological toolkit in order to save time and other costs involved in procuring the information we need for the questions we consider relevant. Big Data will not transform the social sciences, but we have a lot to add to and a lot to gain from the technologies that it offers.
This post originally appeared in the December 2015 issue of the WZB-Mitteilungen.
In a recently finished article to be published in European Union Politics I analyze the salience of EU affairs in 1,393 plenary debates of the German Bundestag between 1991 and 2013. The empirical analysis relies on the counts of term-level references to the polity, politics, and policies of the EU in the 148,869 individual MP statements made during this period.
These references were automatically retrieved based on an encompassing and rather flexible dictionary of references to the EU. It covers mentions of the overall supranational polity, the major institutional actors, as well as various supranational policies and policy instruments. By exploiting extended regular expressions as implemented in the R software environment (perl = FALSE), it also covers the gradual name change from the European Communities to the EU and reliably detects respective references independent of inflections, plurals or compound terms that occur in natural German language. For validation of the dictionary refer to the online appendix of the original article.
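For readers who want a feel for how such a dictionary works, the following Python sketch shows the general idea with three hypothetical patterns. It is only an illustration: the published dictionary is far more encompassing, and Python's `re` module uses a Perl-like syntax rather than the POSIX extended expressions used in R with perl = FALSE, so the patterns would need adapting in either direction.

```python
import re

# Hypothetical mini-dictionary for illustration; not the published one.
EU_PATTERNS = [
    # Inflected polity names: "Europäische Union", "Europäischen Gemeinschaften"
    r"\bEurop\w+ (?:Union|Gemeinschaft(?:en)?)\b",
    # Abbreviations and their hyphenated compounds: "EU", "EG", "EU-Kommission"
    r"\bE[UG]\b(?:-\w+)*",
    # A policy term inside compounds and inflections: "Binnenmarktes"
    r"\b\w*[Bb]innenmarkt\w*",
]

# One alternation so that each stretch of text is counted at most once.
EU_REGEX = re.compile("|".join(EU_PATTERNS))


def count_eu_references(text):
    """Count non-overlapping dictionary hits in one speech or document."""
    return len(EU_REGEX.findall(text))


# Invented example sentence.
EXAMPLE = "Die Europäische Union und die EU-Kommission stärken den Binnenmarkt."
N_REFERENCES = count_eu_references(EXAMPLE)
```

Matching is deliberately case-sensitive here, so that the abbreviation "EU" is not confused with the first letters of ordinary words such as "Euro".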
I believe that this dictionary can also be fruitfully exploited in other text mining approaches that aim to study EU references in other contexts of German political speech. Possible areas of application include large-scale newspaper corpora, blog posts or other online fora, as well as party manifestos or position papers of different political actors.
The dictionary is thus publicly available in the article’s replication archive (see the Data and Resources section). Feel free to use it for any research purposes but please refer to the original article if you do so. If you require further details on the implementation of a respective tagging procedure in R, please do not hesitate to get in touch.