Description
The task is divided into two subtasks:
Subtask 1: 5W1Hs identification: Participants will be given a text and must identify its essential content by annotating the answers to the 5W1H questions in the document. To participate in this task, please use the following Kaggle link: FLARES: Subtask 1 --5W1Hs identification--.
Subtask 2: 5W1H-based reliability: For each 5W1H element detected, participants will have to determine whether the language used in it is "confiable" (reliable), "semiconfiable" (semi-reliable) or "no confiable" (unreliable), following the RUN-AS guideline. To participate in this task, please use the following Kaggle link: FLARES: Subtask 2 --Reliability classification--.
The RUN-AS annotation guideline enables the detection of the essential parts of a news item together with the reliability of its semantic elements, as well as other linguistic elements of interest that make it possible to find linguistic patterns of reliability in text, without using external knowledge. The goal of this annotation proposal is to analyze content through purely linguistic analysis, to find out whether the way in which a news item is structured or written influences its reliability. To determine whether a news item presents objective information and follows journalistic standards, the proposal defines a three-level annotation: Structure (Inverted Pyramid hypothesis), Content (5W1H technique), and Elements of Interest (key expressions, orthotypography, quotes, etc.). For the present task, only the second level will be used, that is, the 5W1H technique, to detect the most important information in a text along with its reliability.
The labels proposed for annotating this task are WHAT (fact), WHO (subject), WHEN (time), WHERE (place), WHY (cause), and HOW (manner).
Using the 5W1H technique, we can break down the sentence “The arrest of the Italian scientist took place by force yesterday in Milan for selling an unauthorized vaccine” as follows:
What: The arrest
Who: Italian scientist
How: by force
When: yesterday
Where: in Milan
Why: for selling an unauthorized vaccine
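As an illustration only, the breakdown above can be sketched as labelled character spans over the sentence. The dictionary keys (label, start, end, text) and the helper name to_spans are assumptions made for this sketch; they are not the official data format of the task.

```python
# Example sentence and its 5W1H breakdown, copied from the description above.
sentence = (
    "The arrest of the Italian scientist took place by force "
    "yesterday in Milan for selling an unauthorized vaccine"
)

breakdown = [
    ("WHAT", "The arrest"),
    ("WHO", "Italian scientist"),
    ("HOW", "by force"),
    ("WHEN", "yesterday"),
    ("WHERE", "in Milan"),
    ("WHY", "for selling an unauthorized vaccine"),
]


def to_spans(text, items):
    """Locate each annotated fragment in the text and return character spans.

    Hypothetical helper: it takes the first occurrence of each fragment,
    which is enough for this sentence but not for texts with repeated phrases.
    """
    spans = []
    for label, fragment in items:
        start = text.find(fragment)
        if start == -1:
            raise ValueError(f"fragment not found in text: {fragment!r}")
        spans.append(
            {
                "label": label,
                "start": start,
                "end": start + len(fragment),
                "text": fragment,
            }
        )
    return spans


spans = to_spans(sentence, breakdown)
for span in spans:
    print(span["label"], span["start"], span["end"], span["text"])
```

Storing offsets rather than only the text keeps each answer tied to its position in the document, which matters once the same word appears more than once.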
Along with these labels, the attribute “reliability” will be used to classify reliability in language with the values “confiable”, “semiconfiable” or “no confiable”, depending on the following two criteria:
Accuracy of the content (which takes into account aspects such as vagueness, ambiguity, lack of evidence or orthotypography).
Example: “A long time ago” (inaccurate = no confiable) vs. “On Friday 19 March” (accurate = confiable)
Example: “A scientist” (inaccurate = no confiable) vs. “The European Medicines Agency” (accurate = confiable)
Neutrality of the content (which considers personal remarks, emotionally charged content, quotes, author’s stance or the objectivity of the title).
Example: “In my opinion” (subjective = no confiable) vs. “According to the rector of the University of Alicante” (objective = confiable)
Example: “This news item can save your life” (subjective = no confiable) vs. “This news item talks about the positive effects” (objective = confiable)
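The examples above can be written down as a small data structure that pairs each expression with its reliability value and the criterion that motivates it. This is a minimal sketch under stated assumptions: the names Reliability, AnnotatedItem and criterion are hypothetical, chosen for illustration, and do not reflect the task's official schema. Only the three label strings and the example expressions come from the description.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Reliability(Enum):
    """The three reliability values named in the task description."""

    CONFIABLE = "confiable"
    SEMICONFIABLE = "semiconfiable"
    NO_CONFIABLE = "no confiable"


@dataclass(frozen=True)
class AnnotatedItem:
    """Hypothetical container: an expression, its label, and the criterion used."""

    text: str
    reliability: Reliability
    criterion: str  # "accuracy" or "neutrality" (assumed field, for illustration)


# The example pairs given above, one unreliable and one reliable per pair.
examples = [
    AnnotatedItem("A long time ago", Reliability.NO_CONFIABLE, "accuracy"),
    AnnotatedItem("On Friday 19 March", Reliability.CONFIABLE, "accuracy"),
    AnnotatedItem("A scientist", Reliability.NO_CONFIABLE, "accuracy"),
    AnnotatedItem("The European Medicines Agency", Reliability.CONFIABLE, "accuracy"),
    AnnotatedItem("In my opinion", Reliability.NO_CONFIABLE, "neutrality"),
    AnnotatedItem(
        "According to the rector of the University of Alicante",
        Reliability.CONFIABLE,
        "neutrality",
    ),
    AnnotatedItem(
        "This news item can save your life", Reliability.NO_CONFIABLE, "neutrality"
    ),
    AnnotatedItem(
        "This news item talks about the positive effects",
        Reliability.CONFIABLE,
        "neutrality",
    ),
]

# Count how many examples fall under each reliability value.
label_counts = Counter(item.reliability for item in examples)

# Parsing a label string back into the enum rejects values outside the three allowed.
parsed = Reliability("no confiable")
```

Using an enumeration instead of free strings means a misspelled label such as "no-confiable" raises an error immediately rather than silently creating a fourth class.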