Accepted Papers

We are excited to see the following papers accepted for our workshop at WebSci:
  • Sanja Štajner, Nicole Baerg, Simone Paolo Ponzetto and Heiner Stuckenschmidt. Automatic Detection of Speculation in Policy Statements.
    In this paper, we present the first study of automatic detection of speculative sentences in official monetary policy statements. We build two expert-annotated datasets. The first contains the transcripts of monetary policy meetings of the U.S. central bank’s monetary policy committee (Debates). The second contains the official monetary policy statements (Decisions). We use the first part of the Debates dataset to build dictionaries with lexical triggers for speculative and non-speculative sentences. We then test their performance on an in-domain test set (the second part of the same dataset) and on an out-of-domain test set (the Decisions dataset) using several rule-based and machine learning classifiers. Our best classifiers achieve an accuracy of 82.5% (0.70 F-score on the speculative class), comparable with automatic detection of speculative sentences in Wikipedia articles.

  • Federico Nanni, Simone Paolo Ponzetto and Laura Dietz. Entity Relatedness for Retrospective Analyses of Global Events.
    Tracking global events through time would ease many diachronic analyses which are currently carried out manually by social scientists and humanities scholars. While entity linking algorithms can be adapted to identify mentions of an event that goes by a common name, such a name is often not established in the early stages leading up to the event. This study evaluates the utility of entity relatedness for the task of identifying entities related to the event and textual resources that describe the involvement of the entity in the event. In a small study we find that simple relatedness methods obtain a MAP score of 0.74, outperforming many advanced baseline systems such as Stics and Wiki2Vec. A small adaptation of this method provides sufficient explanations of entity involvement for 68% of relevant entities.

  • Asmelash Teka Hadgu, Netaya Lotze and Robert Jäschke. Telling English Tweets Apart: the Case of US, GB, AU.
    Current state-of-the-art general-purpose language identification systems do not distinguish texts among different national varieties. However, this is an important NLP task that has far-reaching implications, for instance, in building machine translation systems that are adapted to the language variety of interest. In this paper, we study how to automatically tell different varieties of English on Twitter apart by taking samples from American (US), British (GB) and Australian (AU) English. We track cities and apply filters to generate ground-truth data. Subsequently, we perform expert evaluation to get a sense of the difficulty of the task. We then cast the problem as a classification task: given a tweet (or a set of tweets from a user) in English, the goal is to automatically identify whether the tweet (or set of tweets) is US, GB or AU English. We perform experiments to compare some linguistic features against simple statistical features and show that character n-grams are quite effective for the task. Our work is closely related to socio-linguistics, especially research on diatopic varieties, linguistic landscapes, and World Englishes.

  • Jens Bergmann, Asmelash Teka Hadgu and Robert Jäschke. Tweeting in times of exposure: A mixed-methods approach for exploring patterns of communication related to business scandals on Twitter.
    Currently, three trends mutually influence each other and can be observed using social media: (a) the growing use of social media, in particular Twitter, by organizations, (b) increased expectations of transparency towards organizations, and (c) massive public response to organizational crises via social media. Getting an understanding of how customers and organizations react to crises and crisis responses, as well as identifying different communication strategies, is difficult, since the large number of actors and the abundance of messages cannot be handled by traditional methods from the Social Sciences. These often rely on manual work, for instance, interviews, qualitative studies, or questionnaires. Even large parts of content analysis using computer-assisted qualitative data analysis software have to be supported by manual work. At the same time, the availability and accessibility of large volumes of messages on Twitter also open up possibilities for mixed-methods approaches to analyze this data. In particular, natural language processing can support the analysis of large sets of tweets. In this work we present first steps towards a large-scale analysis of Twitter communication during corporate crises by leveraging a mixed-methods approach. Such analyses can improve our understanding of organizational crises and their communication and can also prove beneficial in providing recommendations for successful reactions and interactions.