FadeIT is based on Faina (Ramponi et al., 2025), a dataset for fallacy detection in Italian social media posts spanning 4 years of public discourse on migration, climate change, and public health. It includes annotations at the fine-grained level of text segments, with potential overlaps, for 20 fallacy types (see the complete list below). The whole dataset has been labeled by two annotators who underwent multiple rounds of discussion to remove annotation errors (e.g., due to attention drops) while keeping signals of human label variation (Plank, 2022), such as genuine disagreement due to different plausible interpretations, as shown in the figure on the right.
Following previous work advocating the importance of going beyond the "single ground truth" assumption in NLP (Plank, 2022; Cabitza et al., 2023; Aroyo and Welty, 2015; inter alia), we provide participants with the individual annotators' labels for all instances. Participants are free to leverage this information (signal or noise? 🙂) or to aggregate labels following their own strategy. In either case, test set predictions will be compared against all gold standards (see task description).
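To make the choice concrete, here is a minimal sketch of two common aggregation strategies over the two annotators' post-level labels (the pipe-separated label format follows the task description; the function names are ours, not part of the task):

```python
# Two hedged aggregation strategies over individual annotators' labels.
# Label fields follow the task format: multiple labels joined by "|",
# and an empty string when the annotator assigned no label.

def parse_labels(field: str) -> set[str]:
    """Split a pipe-separated label field into a set (empty field -> empty set)."""
    return set(field.split("|")) if field else set()

def aggregate(ann_a: str, ann_b: str, strategy: str = "union") -> set[str]:
    """Aggregate two annotators' post-level labels.

    'union' keeps every label either annotator assigned (recall-oriented),
    'intersection' keeps only labels both agreed on (precision-oriented).
    """
    a, b = parse_labels(ann_a), parse_labels(ann_b)
    return a | b if strategy == "union" else a & b

# Example: one annotator marked both fallacies, the other only one.
union_labels = aggregate("Strawman|Vagueness", "Vagueness")
agreed_labels = aggregate("Strawman|Vagueness", "Vagueness", "intersection")
```

Keeping both annotators' labels as separate training signals (e.g., multi-task or soft-label training) is of course a third option left open by the task.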
The list of the 20 fallacy types is presented below. For extended descriptions and fallacy examples, please refer to Ramponi et al. (2025).
Ad hominem: A personal attack on an individual or a group that deviates from the main thesis;
Appeal to authority: The author appeals to an authority or a group consensus to support their thesis, without further evidence;
Appeal to emotion: It involves the use of negative or positive personal emotions (e.g., shame, indignation, pity, or affection) to intentionally or unintentionally influence the audience;
Causal oversimplification: It involves a simplified and fallacious causal relation;
Cherry picking: It consists of choosing evidence to support a thesis while ignoring any other contrary evidence;
Circular reasoning: An error of circularity in which the end of an argument comes back to the beginning without having proven itself;
Doubt: It is used to intentionally question the credibility of someone or something;
Evading the burden of proof: A thesis is advanced without any support as if it were self-evident, meaning that one or more arguments are missing in the argument structure;
False analogy: It occurs when two different things or situations are placed on the same level because they are supposed to share similar aspects;
False dilemma: It presents only two options or sides when there are many;
Flag waving: It occurs when the author intentionally plays on a sense of belonging to a country, a group, or an ideology to support an argument, as if waving a flag;
Hasty generalization: It occurs when a generalization is drawn from a sample which is too small, not representative, or not applicable to the whole situation if all the variables are taken into account;
Loaded language: It involves using words or phrases with strong emotional implications (either positive or negative) to influence the audience;
Name calling or labeling: It involves labeling something or someone positively or negatively to influence the audience, for example associating it with an ideology;
Red herring: The argument supporting the claim diverts attention to issues that are irrelevant to the claim at hand;
Slippery slope: It implies that an exaggerated consequence could result from a particular action;
Slogan: It consists of a brief and striking phrase that is used to provoke excitement of the audience;
Strawman: It consists of distorting someone else's argument and then tearing it down. The arguer misinterprets an opponent's argument for the purpose of more easily attacking it, demolishes it, and then concludes that the opponent's real argument has been demolished;
Thought-terminating cliché: It consists of a short and generic phrase that discourages critical thought and meaningful discussion;
Vagueness: It is found when ambiguous words are shifted in meaning in the process of arguing or are left vague, being potentially subject to skewed interpretations.
Subtask A: Coarse-grained fallacy detection
The data is in tab-separated format and contains a header line. Each line provides information about a post (i.e., id, date, topic, text, labels). Post-level annotations by each annotator are provided in separate columns, and multiple annotations for the same post and annotator are separated by a pipe (|). Specifically, each post is represented as shown below:
where:
$POST_ID: the identifier of the post (integer);
$POST_DATE: the date of the post (YYYY-MM);
$POST_TOPIC_KEYWORDS: the topic set to which the keyword that led to the post's selection belongs (migration, climate change, or public health);
$POST_TEXT: the text of the post (anonymized with [USER], [URL], [EMAIL], and [PHONE] placeholders);
$LABELS_BY_ANN_j: the fallacy label(s) assigned by annotator j for the post (e.g., "Vagueness", "Strawman"). In the case where multiple labels for the post are assigned by the same annotator j, these are separated by a pipe (|) and ordered lexicographically, e.g., "Strawman|Vagueness". In the case where no labels for the post are assigned by annotator j, the field is empty.
Example of a post for subtask A. The last two columns indicate multiple plausible post-level
annotations provided by annotators A and B due to different (equally valid) interpretations.
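The subtask A file can be read with the standard `csv` module. The sketch below assumes hypothetical header names (`post_id`, `date`, `topic_keywords`, `text`, `labels_ann_a`, `labels_ann_b`); the released file's actual headers may differ, so treat them as placeholders:

```python
import csv
import io

# Inline sample mimicking the subtask A TSV format described above;
# header names are placeholders, not the official ones.
SAMPLE = (
    "post_id\tdate\ttopic_keywords\ttext\tlabels_ann_a\tlabels_ann_b\n"
    "42\t2021-03\tmigration\tExample post text [USER]\tStrawman|Vagueness\tVagueness\n"
)

def read_posts(handle):
    """Yield one dict per post, turning each annotator's pipe-separated
    label field into a set (empty field -> empty set)."""
    for row in csv.DictReader(handle, delimiter="\t"):
        for col in ("labels_ann_a", "labels_ann_b"):
            row[col] = set(row[col].split("|")) if row[col] else set()
        yield row

posts = list(read_posts(io.StringIO(SAMPLE)))
```

In practice you would pass an open file handle (`open(path, encoding="utf-8", newline="")`) instead of the inline sample.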
Subtask B: Fine-grained fallacy detection
The data format is based on the CoNLL format. Each post is separated by a blank line and consists of a header with post information, followed by each token in the text (with tab-separated information) separated by newlines. Token annotations follow the BIO scheme (i.e., B: begin, I: inside, O: outside) and multiple annotations for the same token and annotator are separated by a pipe (|). Specifically, each post is represented as shown below:
where:
$POST_ID: the identifier of the post (integer);
$POST_DATE: the date of the post (YYYY-MM);
$POST_TOPIC_KEYWORDS: the topic set to which the keyword that led to the post's selection belongs (migration, climate change, or public health);
$POST_TEXT: the text of the post (anonymized with [USER], [URL], [EMAIL], and [PHONE] placeholders);
$TOKEN_i: the index of the token within the post (incremental integer);
$TOKEN_i_TEXT: the text of the i-th token within the post;
$TOKEN_i_LABELS_BY_ANN_j: the fallacy label(s) assigned by annotator j for the i-th token within the post. Each label follows the format $BIO-$LABEL, where $BIO is the BIO tag and $LABEL is the fallacy label (e.g., "Vagueness", "Strawman"), e.g., "B-Vagueness", "I-Strawman", and "O". In the case where multiple labels for the i-th token are assigned by the same annotator j, these are separated by a pipe (|) and ordered lexicographically by $LABEL, e.g., "I-Strawman|B-Vagueness". In the case where no labels for the i-th token are assigned by the same annotator j, the label is "O".
Example of a post for subtask B. The last two columns indicate multiple plausible span-level annotations provided by annotators A and B due to different (equally valid) interpretations.
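For span-level evaluation or error analysis it is often handy to turn one annotator's per-token BIO fields back into spans. A sketch under the format above, where overlapping annotations appear as pipe-separated tags on the same token (the helper name and the half-open span convention are our choices):

```python
# Recover (start, end, label) token spans from per-token BIO fields.
# Handles the task's multi-label case: a field like "I-Strawman|B-Vagueness"
# carries one BIO tag per overlapping annotation; "O" means no annotation.

def bio_to_spans(tags: list[str]) -> list[tuple[int, int, str]]:
    """Convert per-token BIO fields into half-open (start, end, label) spans."""
    open_spans: dict[str, int] = {}   # label -> start index of the open span
    spans: list[tuple[int, int, str]] = []
    for i, field in enumerate(tags + ["O"]):      # sentinel closes trailing spans
        parts = [] if field == "O" else field.split("|")
        seen = set()
        for part in parts:
            bio, label = part.split("-", 1)
            seen.add(label)
            if bio == "B" and label in open_spans:   # B- ends the previous span
                spans.append((open_spans.pop(label), i, label))
            if label not in open_spans:              # open a new span (lenient
                open_spans[label] = i                # about stray I- tags)
        for label in list(open_spans):
            if label not in seen:                    # span ended before this token
                spans.append((open_spans.pop(label), i, label))
    return spans

# Overlapping example: Strawman covers tokens 0-1, Vagueness tokens 1-2.
example = ["B-Strawman", "I-Strawman|B-Vagueness", "I-Vagueness", "O"]
```

Token indices here are positions in the token sequence, not character offsets; map them back through `$TOKEN_i` if needed.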
The FadeIT data consists of two splits: one for training/development and one for testing. These have been created paying particular attention to the time and topic distribution across splits, to ensure a reliable official evaluation:
train/dev set: released on September 22, 2025 (see important dates), along with the evaluation scorer, for designing your solution(s) and for training and assessing your model(s). It represents 80% of the posts and comes with gold labels. Participants are free to decide how to split this set into train and dev portions as part of their design decisions;
test set: released on November 3, 2025 (see important dates) without gold labels (20% of the posts). You will have to return predictions (results of the runs) to us for the official test set evaluation, following the subtask data formats described in the previous sections (instructions for submission are available in the FadeIT repository on GitHub).
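Since the train/dev division is left to participants, one simple option is a topic-stratified split that mirrors the organizers' attention to topic distribution. A hypothetical helper (the `topic` key and all names are ours):

```python
import random
from collections import defaultdict

def stratified_split(posts, dev_fraction=0.1, seed=42):
    """Split a list of post dicts (each with a 'topic' key) into train/dev,
    keeping the topic distribution roughly equal in both portions.

    This is one illustrative strategy, not the official split procedure.
    """
    by_topic = defaultdict(list)
    for post in posts:
        by_topic[post["topic"]].append(post)
    rng = random.Random(seed)           # fixed seed for reproducibility
    train, dev = [], []
    for topic_posts in by_topic.values():
        rng.shuffle(topic_posts)
        cut = max(1, int(len(topic_posts) * dev_fraction))
        dev.extend(topic_posts[:cut])   # at least one dev post per topic
        train.extend(topic_posts[cut:])
    return train, dev
```

A time-based split (holding out the most recent months for dev) is an equally reasonable alternative given the temporal dimension of the data.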
The FadeIT shared task is meant to study fallacious argumentation in social media texts. The Faina dataset underlying it can be used for non-commercial purposes only and is released upon request in an anonymized format, with no user information nor original post identifiers, to preserve anonymity. The user must agree not to deanonymize the data by any means and to use the dataset in compliance with current user protection regulations, thereby excluding data misuse. The dataset cannot be redistributed by the user to third parties or in online repositories without the consent of the authors.
⚠ Since the dataset consists of social media posts, it may contain profanities, slurs, and hateful content.