Where? Linköping University
When? 29th November 2024
What to do? Register for the conference until the 17th of November 2024.
The call for contributions to the workshop is closed. You can, however, still register as a participant in both the conference and the workshop until the 17th of November 2024. You can find the registration form for the main conference here.
The workshop aims to provide a platform where these perspectives can be exchanged, bridges built, research objectives linked, and existing connections strengthened.
Computational Social Science (CSS) is an interdisciplinary field concerned with computational approaches to studying phenomena of the social world, such as political polarisation, opinion dynamics, urban segregation, cultural dynamics, and labour market inequalities. One of the most powerful tools in CSS is computational text analysis, which draws on methods developed in computational linguistics, NLP, ML, and statistics. While social scientists borrow and adapt numerous methods from computational linguistics and NLP, their research goals often differ from those of traditional NLP research. Social scientists commonly view text data as a form of social sensor that can reveal important insights into social structures and individuals’ characteristics, while NLP researchers focus instead on, for example, debiasing training data, making predictions more precise, and performing language tasks better.
The target group is young and senior researchers working at the intersection of computational linguistics, NLP, CSS, and the social sciences.
9:00 – 10:00 Keynote speech by Andrea Voyer, Professor of Sociology, Stockholm University
10:00 – 10:30 Flash talks
10:30 – 11:00 Fika (coffee break)
11:00 – 12:00 Contributed talks
Meaning in the Machine: The Interpretive Turn in Computational Social Science
The rising popularity of computational sociological research has been marked by a strong positivistic orientation, where tools from computational linguistics, machine learning, and NLP have traditionally been seen as quantitative supplements to social statistics. However, as language technologies, including large language models (LLMs), become more prominent in Computational Social Science (CSS), their potential to contribute to a qualitatively focused, interpretive paradigm is gaining recognition. The computational future is increasingly qualitative, and this keynote explores how LLMs can be used to extend the interpretive framework of sociology, moving beyond prediction and quantification to reveal insights about social structures, cultural dynamics, and human meaning-making. By adopting the understanding of text as an artifact of everyday semantic spaces and applying a framework of technological reflexivity, computational social scientists can support a transformative, meaning-centered approach in CSS that preserves interpretive depth while addressing core concerns related to research quality and transparency.
Contributed talk by
Igor Ryazanov,
Umeå University
How ChatGPT Changed the Media’s Narratives on AI:
A Semi-Automated Narrative Analysis Through Frame Semantics
We perform a mixed-method, frame semantics-based analysis on a dataset of more than 49,000 sentences collected from 5,846 news articles that mention AI. The dataset covers the twelve-month period centred on the launch of OpenAI’s chatbot ChatGPT and is collected from the most visited open-access English-language news publishers. Our findings indicate that during the six months following the launch, media attention rose tenfold from already historically high levels. During this period, discourse became increasingly centred around experts and political leaders, and AI became more closely associated with dangers and risks. A deeper review of the data also suggests a qualitative shift in the types of threat AI is thought to represent, as well as in the anthropomorphic qualities ascribed to it.
Contributed talk by
Erik Nylander and
Håkan Sundblad,
Linköping University
From Farming to Fork: Mapping a Century of Educational Change in Swedish Folk High Schools
The project “The Folk High School as Mirrors of Society: Sociological Perspectives on the Educational Offerings of Folk High Schools 1868-2018” (2022-04460_VR), funded by the Swedish Research Council, aims to compile and publish a 150-year data archive on Sweden’s 155 folk high schools as an open educational resource. The archive currently includes more than 10,000 scanned documents, which have been digitised through Optical Character Recognition (OCR) to enhance accessibility and usability. To explore this extensive corpus, we are employing large-scale text analysis tools, such as topic modelling, which has already revealed latent structures in postwar folk high school catalogues (Nylander & Holmer, 2022). Additionally, we are utilising the Llama3 model via the Swedish National Supercomputer Centre (NSC) to automate text labelling, further streamlining data organisation and discovery. Future developments include the extraction and automatic labelling of embedded images to enhance searchability and insight generation. This presentation will report on the current progress and methodological advancements of the project, covering the full digitisation journey “from farming to fork”: building a national infrastructure while applying computational methods.
Flash talk by
Sara Stymne,
Uppsala University
Global disaster framing and climate action
I will present one task from the research project "Enabling climate-resilient development: How disasters can act as a pathway to a safer and more sustainable world". We want to identify mentions of natural disasters and climate policy in a representative sample of documents on a global scale. To do this, we are collecting a corpus of representative data and creating an annotation scheme for the categories of interest. For the corpus, we are collecting country statements from the COP conferences held from 1995 onwards. This is challenging for several reasons, including missing statements and documents of poor quality. We are currently mining the text from existing records and combining this with transcriptions of talks. In parallel, we are designing and evaluating an annotation scheme for mentions of relevant entities, including natural disasters, policy statements, beliefs, and norms. We will use this scheme to annotate a subset of the data, on which we will train automatic classifiers to label the full COP corpus. This will serve as a basis for investigating the framing of climate action with respect to natural disasters across time and countries.
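The final step, training classifiers on a small annotated subset and applying them to the unlabelled remainder, can be sketched as follows. The sentences, labels, and model choice here are hypothetical stand-ins, not the project's actual annotation scheme or classifier.

```python
# Sketch: train a text classifier on a hand-annotated subset, then label new documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A tiny, invented annotated subset
train_texts = [
    "the flood destroyed thousands of homes",
    "we commit to reducing emissions by 2030",
    "the drought has devastated crops this year",
    "our national adaptation plan sets new targets",
]
train_labels = ["disaster", "policy", "disaster", "policy"]

# TF-IDF features feeding a linear classifier
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

# Automatically label a previously unannotated statement
predictions = clf.predict(["hurricanes have become more frequent"])
print(predictions)
```

In practice the annotated subset would be far larger and the categories would follow the project's own scheme (disasters, policy statements, beliefs, norms).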
Flash talk by
Anika Binte Habib,
Lund University
Can Algorithm be Green? AI in Environmental Sustainability
Sustainability is now central to Artificial Intelligence (AI) as the field grapples with its environmental impact. However, sustainability is a complex, multi-faceted concept that varies by context. With increased environmental degradation due to industrial activity, AI is increasingly viewed as a tool to address these issues. For example, AI technologies can potentially improve energy efficiency by optimising energy production and consumption, predicting demand and responses, developing sustainable energy storage systems, detecting and diagnosing faults, and reducing energy waste. Critics, however, have questioned AI’s capacity to foster environmental sustainability. Some argue that the infrastructure supporting AI, such as resource extraction and waste streams, is often ignored. Despite these critiques, AI is widely trusted to solve environmental challenges. This research is motivated by the growing hype around AI’s role in environmental sustainability. While AI may address some environmental issues, are we aligned on what sustainability means and how to achieve it? This study explores whether AI merely provides short-term solutions while overlooking the underlying causes of environmental degradation. My contention is that AI will not be able to attain environmental sustainability without recognising and acknowledging the core drivers of current challenges. This study examines the prevailing narratives about AI and their inherent limitations through discourse analysis. While I take a critical stance on AI in this context, my methodology includes Natural Language Processing (NLP) and Machine Learning (ML), for which I will use the Swedish User and Project Repository (SUPR) supercomputers. This creates a tension that I need to address in my PhD.
Flash talk by
Denitsa Saynova,
Chalmers University of Technology
Replicating Social Behavioural Science Studies with LLMs
There is increasing concern about the reproducibility of social science experimental results. Since replication studies are expensive and time-consuming, recent work has explored whether large language models (LLMs) can support these efforts by simulating human responses. This research has indicated that some proprietary large models (such as OpenAI’s GPT) have the potential to be used as supplementary tools, as they have learned associations relevant to these studies (political, cognitive, economic, psychological) that, to a certain extent, reflect human behaviour. In this work, we aim to investigate whether LLM results can indicate the reproducibility of a study; that is, do LLMs replicate replicable studies and show no effect, or the opposite effect, for studies that are not replicable? To investigate this, we focus on 14 studies from the ManyLabs2 Project, of which eight have been replicated with human subjects. We also study how different LLM behaviours affect these results. First, we explore whether open-source models have the same capacity as proprietary ones. Second, we investigate how the intrinsic stochastic nature of LLMs affects results. We test the effects of prompt sensitivity, temperature, response scales, and refusal.
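Probing prompt sensitivity and temperature, as described above, amounts to repeating each survey item across paraphrases and sampling settings and tallying the responses. A minimal sketch of that loop, where `query_model` is a hypothetical placeholder for a real LLM API call:

```python
# Sketch of a prompt-sensitivity / temperature sweep over a simulated survey item.
from collections import Counter

def query_model(prompt: str, temperature: float) -> str:
    """Placeholder for an actual LLM call (e.g. via an API client).

    Returns a deterministic dummy response so the sketch runs without an API key;
    a real implementation would sample from the model at the given temperature.
    """
    return "agree" if "fair" in prompt else "disagree"

# Paraphrased prompts for the same underlying survey item
prompts = [
    "Is the proposed division fair? Answer agree or disagree.",
    "Do you find this split acceptable? Answer agree or disagree.",
]

# Repeat each paraphrase at several temperatures and tally the responses
results = Counter()
for prompt in prompts:
    for temperature in (0.0, 0.7, 1.0):
        results[query_model(prompt, temperature)] += 1

print(results)  # response distribution across prompts and temperatures
```

With a real model, the spread of this tally across paraphrases and temperatures (plus refusals) is what reveals how stable the simulated "subject" actually is.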
Flash talk by
Julia Gottstein,
Charles University Prague
Exploring Dark Participation: Analyzing Czech Online Political Discourse with Large Language Models
This study is a methodology-driven segment of a broader research project investigating dark participation in online political discourse. It explores the efficiency of Large Language Models (LLMs) in analyzing opinionated communication in medium-resource language contexts, specifically assessing how various LLMs perform in categorizing Czech-language text compared to human judgment. This focus is particularly relevant given the growing importance of online engagement and the increasing volume of user-generated data, which often exceeds human capacity for manual analysis. The dataset consists of a diverse sample of 35,000 comments from Facebook news posts related to the first round of the 2023 Czech Presidential Elections. To systematically assess the nature of discussions, including instances of hate speech, trolling and their targets, as well as vulgar language, we employed a two-tiered analysis. This approach combines inductive coding by human analysts with the application of three different LLMs for text classification using predetermined categories. We evaluated the performance of two monolingual models, RobeCzech and RetroMAE, alongside the multilingual XLM-RoBERTa model. Additionally, we translated the corpus into English to assess XLM-RoBERTa's performance on translated text classification. Preliminary results indicate that the RetroMAE model demonstrates the highest stability and accuracy, particularly in the classification of rhetoric, identification of key targets, and detection of pejorative framing of communism. Notably, none of the models performed strongly in categorizing vulgar language. We are currently training the RetroMAE model on an expanded dataset to further enhance its performance. While this study is methodology-driven, its outcomes have broader implications.
Given that monolingual LLMs trained on Czech corpora have only recently emerged, they may offer new opportunities for analyzing online discourse in local linguistic contexts. This research contributes to the growing field of computational social science by providing insights into the capabilities and limitations of LLMs in processing medium-resource languages.
Flash talk by
Richard Johansson, Chalmers University of Technology and University of Gothenburg
Countering bias in AI methods in the social sciences
AI techniques for text have recently produced some of the most spectacular success stories. In this project, we will investigate text-based AI techniques applied in causal inference scenarios, where researchers investigate cause-and-effect questions such as “What is the effect of an IMF program on poverty?” Recently, researchers have proposed methods for exploiting AI in such investigations, allowing them to include texts that might otherwise be difficult to use. However, previous research by the participants in this project has shown that naive use of text-based AI risks introducing biases that skew estimates of causal effects. The risks of drawing incorrect conclusions about effects are obvious and may lead to the implementation of harmful policies. In the project, we will investigate the challenges of bias when applying text-based AI in causal inference in the social and political sciences. We will investigate to what extent these problems can be measured, so that we know when we are on thin ice when applying the new methods. Furthermore, we will try to “correct” the biases and make text-based AI more robust when applied in research.
The organising committee:
Alexandra Rottenkolber 📧
Anastasia Menshikova 📧
Måns Magnusson 📧
with the support of the Swedish Excellence Centre for Computational Social Science (SweCSS).
Credits for the images: LiU