StanceEval

Stance Detection in Arabic Language Shared Task

Part of the Second Arabic Natural Language Processing Conference (ArabicNLP 2024), co-located with ACL 2024

The vast growth of social media platforms, online news outlets, and digital communication has increased user-generated content exponentially in recent years. This unprecedented surge in online discourse has sparked an urgent need to develop automated tools and techniques to effectively analyze the opinions and attitudes expressed within these expansive streams of text. Stance detection, a critical task within the field of Natural Language Processing (NLP), aims to identify the position or perspective of a writer towards a specific topic or entity by analyzing their written text and/or social media activity, such as preferences and connections. The applications of stance detection are diverse and encompass domains such as politics, marketing, and social media analysis.

Shared task description

The goal of this shared task is to propose models for detecting writers' stances (Favor, Against, or None) towards three selected topics (COVID-19 vaccine, digital transformation, and women empowerment). Participants can approach the stance detection task through single-task or multi-task learning (MTL). Single-task learning-based models depend only on the stance data for model development and training. MTL-based models can use other information, such as the sentiment and sarcasm of each tweet, to boost the performance of the stance detection system.

Classes

The possible stance labels are:

FAVOR means that we can infer from the tweet that the author supports the target (e.g., explicitly supporting the target or something aligned with the target, or if the tweet contains information such as news, a quote, or a story that reveals that the author is in favor of the target).
AGAINST means that we can infer from the tweet that the author is against the target (e.g., explicitly opposing the target or something aligned with the target, or if the tweet contains information such as news, a quote, or a story that reveals that the author is against the target).
NONE means that the tweet provides no hint as to the author's stance toward the target (e.g., there is no evidence in the tweet to judge the author's stance, such as inquiries, or news that does not express any positive or negative position).

Dataset

Participating teams will use the publicly available "Mawqif" dataset. Mawqif comprises 4,121 entries distributed across "COVID-19 vaccine" (1,373 entries), "digital transformation" (1,348 entries), and "women empowerment" (1,400 entries).

It is structured as a multi-label dataset with labels including stance (Favor, Against, None), sentiment (Positive, Negative, Neutral), and sarcasm (Sarcastic and Non-sarcastic).
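As a rough illustration of this label structure, one entry could be modeled as below. This is only a sketch: the class and field names (Entry, text, target, and so on) are hypothetical and are not taken from the Mawqif release, whose actual column names and file format may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    FAVOR = "Favor"
    AGAINST = "Against"
    NONE = "None"


class Sentiment(Enum):
    POSITIVE = "Positive"
    NEGATIVE = "Negative"
    NEUTRAL = "Neutral"


class Sarcasm(Enum):
    SARCASTIC = "Sarcastic"
    NON_SARCASTIC = "Non-sarcastic"


# The three targets named in the task description.
TARGETS = ("COVID-19 vaccine", "digital transformation", "women empowerment")


@dataclass(frozen=True)
class Entry:
    """One annotated tweet. Field names are assumptions, not the dataset's."""
    text: str
    target: str
    stance: Stance
    sentiment: Sentiment
    sarcasm: Sarcasm

    def __post_init__(self):
        # Reject targets outside the three covered by the task.
        if self.target not in TARGETS:
            raise ValueError(f"unknown target: {self.target!r}")


example = Entry(
    text="(placeholder tweet text)",
    target="women empowerment",
    stance=Stance("Favor"),
    sentiment=Sentiment("Positive"),
    sarcasm=Sarcasm("Non-sarcastic"),
)
```

A single-task model would read only the stance field, while an MTL model could also use sentiment and sarcasm as auxiliary labels.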

Access Links:

"Mawqif" dataset paper

"Mawqif" Dashboard

Evaluation

We will use the macro F1-score as the primary evaluation metric. The macro F1-score is computed as the average of the F1-scores for the "FAVOR" and "AGAINST" categories. This metric is computed for each target separately, and then the overall macro F1-score is computed across all targets.
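The metric can be sketched in a few lines of plain Python. This is not the official evaluation script. The function names and the (target, gold, predicted) row layout are hypothetical, and the sketch assumes that the overall score is the unweighted mean of the per-target scores, which is one reading of "computed across all targets".

```python
from collections import defaultdict

# Only these two classes enter the average, as described above.
SCORED_LABELS = ("FAVOR", "AGAINST")


def f1_for_label(gold, pred, label):
    """F1-score of a single label over parallel gold/pred lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1_per_target(rows):
    """rows: iterable of (target, gold_label, predicted_label) tuples."""
    by_target = defaultdict(lambda: ([], []))
    for target, gold, pred in rows:
        by_target[target][0].append(gold)
        by_target[target][1].append(pred)
    return {
        target: sum(f1_for_label(g, p, lab) for lab in SCORED_LABELS)
        / len(SCORED_LABELS)
        for target, (g, p) in by_target.items()
    }


def overall_macro_f1(rows):
    """Assumption: unweighted mean of the per-target macro F1-scores."""
    per_target = macro_f1_per_target(rows)
    if not per_target:
        raise ValueError("no rows to evaluate")
    return sum(per_target.values()) / len(per_target)
```

Although NONE is not averaged directly, it still affects the score: a FAVOR tweet predicted as NONE counts as a false negative for FAVOR, and a NONE tweet predicted as AGAINST counts as a false positive for AGAINST.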

An evaluation script has been provided. It will:

Check the format of your gold standard file and prediction file.
Report error messages if the format of the gold standard file or prediction file is incorrect or if there are invalid labels or targets.
Compute precision, recall, and F1-score for each target category separately.
Compute the overall macro F1-score across all target categories.

A blind test set will be used to evaluate the outputs of participating teams.