Quantifying Qualitative Data for Understanding Controversial Issues

Controversy is a state of sustained public debate on a topic or issue that evokes conflicting opinions, beliefs, claims, arguments, and points of view. Here, we refer to utterances of opinion, belief, claim, argument, or point of view relevant to an issue as assertions. People make assertions on a controversy (or controversial issue) both in the physical world and on social media, and others might agree or disagree with those assertions. An individual’s position on an issue is not simply a binary support-or-oppose stance, but rather the cumulative sum of many nuanced beliefs and opinions on various aspects of the issue. Thus, assertions are a useful means of capturing one’s position on a controversial issue. Examples of common controversial issues include the legalization of marijuana, government policy on refugees, and gun rights.

Here, we present a dataset that allows us to examine how complex positions on controversial issues are formed from relevant assertions. The data was created by combining qualitative and quantitative surveys launched on the crowdsourcing platform CrowdFlower. We first create a comprehensive and nuanced list of assertions relevant to an issue, and then rank these assertions both by the degree of agreement they receive and by how important they are to respondents’ overall stance on the issue.
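As a rough sketch of what this ranking step could look like in practice, the snippet below aggregates per-assertion ratings into mean agreement and mean importance scores. The column names, rating scales, and values are illustrative assumptions, not the released schema.

```python
# Illustrative only: aggregate per-assertion ratings into two rankings.
# Column names and rating scales are assumptions, not the real data schema.
import pandas as pd

ratings = pd.DataFrame({
    "assertion":  ["A1", "A1", "A2", "A2"],
    "agreement":  [1, -1, 1, 1],      # assumed scale: -1 = disagree, 1 = agree
    "importance": [4, 2, 5, 3],       # assumed scale: 1 (low) .. 5 (high)
})

summary = (ratings
           .groupby("assertion")
           .agg(mean_agreement=("agreement", "mean"),
                mean_importance=("importance", "mean"))
           .sort_values("mean_agreement", ascending=False))
print(summary)
```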

Figure 1 gives an overview of how we create a dataset that allows us to quantify qualitative information relevant to controversial issues.

The complete list of issues, explanations, and example assertions (four per issue) can be found here.

  • First, we use a crowdsourcing task in which we collectively brainstorm relevant assertions on an issue and demographic variables that describe relevant stakeholders. The survey used for this generation step can be found here.
  • Subsequently, we use a crowdsourcing task in which we ask people whether they personally agree or disagree with these assertions, and how important each assertion is for their overall stance on the issue. The full survey for rating the agreement with, and importance of, these assertions can be found here.
  • In addition, we collect demographic data from these contributors. The survey used for this collection can be found here. A sketch of how the outputs of these three tasks fit together follows below.
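The following is a minimal sketch of how the outputs of the three tasks above might be linked; all field names and scales are illustrative assumptions rather than the actual released format.

```python
# Hypothetical data model linking the three crowdsourcing outputs.
# Field names are assumptions for illustration, not the released schema.
from dataclasses import dataclass

@dataclass
class Assertion:            # from the brainstorming task
    issue: str
    text: str

@dataclass
class Contributor:          # from the demographics task
    contributor_id: str
    age_group: str
    country: str

@dataclass
class Rating:               # from the agreement/importance task
    contributor_id: str     # links the rating to a Contributor record
    assertion: Assertion
    agreement: int          # assumed: -1 = disagree, 1 = agree
    importance: int         # assumed: 1 (not important) .. 5 (very important)

# A contributor's position on an issue is then the set of their Rating
# records, which can be joined with demographics via contributor_id.
```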

We also link the created data to a large collection of tweets obtained by searching for appropriate hashtags. This allows us to define several natural language processing tasks that target the automated construction of the survey data from the collected tweets. The list of hashtags used can be found here.
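Purely as an illustration of how tweets can be grouped by issue via hashtag matching, the sketch below uses invented hashtags and tweet texts; the actual hashtag list is the one linked above.

```python
# Illustrative sketch: group tweets by issue using hashtag matching.
# Hashtags and tweets below are invented placeholders.
import re
from collections import defaultdict

ISSUE_HASHTAGS = {                          # assumed mapping, for illustration
    "marijuana legalization": {"#legalizeit", "#marijuana"},
    "gun rights":             {"#guncontrol", "#2a"},
}

tweets = [
    "Prohibition never worked #legalizeit",
    "Background checks now #guncontrol",
]

tweets_by_issue = defaultdict(list)
for tweet in tweets:
    hashtags = {h.lower() for h in re.findall(r"#\w+", tweet)}
    for issue, tags in ISSUE_HASHTAGS.items():
        if hashtags & tags:
            tweets_by_issue[issue].append(tweet)

print(dict(tweets_by_issue))
```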

This is a joint research project between the National Research Council Canada (NRC) and the Language Technology Lab of the University of Duisburg-Essen (LTL).

The following persons are involved:


This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).