Task description 📝

AUtomated TEXt IdenTIFICATION on languages of the Iberian Peninsula (IberAuTexTification) is the second edition of the AuTexTification shared task held at IberLEF 2023 (Sarvazyan et al., 2023). We extend the previous task along three dimensions: more models, more domains, and more languages from the Iberian Peninsula (in a multilingual fashion), aiming to build more generalizable detectors and attributors. In this task, participants must develop models that exploit clues about linguistic form and meaning to identify automatically generated texts from a wide variety of models, domains, and languages. We plan to include LLMs such as GPT-3.5, GPT-4, LLaMA, Coral, Command, Falcon, and MPT, among others. We also add new domains, such as essays and dialogues, and cover the most prominent languages from the Iberian Peninsula: Spanish, Catalan, Basque, Galician, Portuguese, and English (in Gibraltar).

The novelty of this edition is detecting, in a multilingual (languages from the Iberian Peninsula such as Spanish, English, Catalan, Galician, Basque, and Portuguese), multi-domain (news, reviews, emails, essays, dialogues, Wikipedia, wikiHow, tweets, etc.), and multi-model (GPT, LLaMA, Mistral, Cohere, Anthropic, MPT, Falcon, etc.) setup, whether a text has been automatically generated and, if so, identifying the model that generated it.

Subtask 1: Human or Generated

Participants will be given a text and must determine whether it has been automatically generated. To encourage models to learn features that generalize to new writing styles, five domains will be used for training and two different domains for testing.
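One way to mimic this cross-domain setup locally is to hold out whole domains when building a validation split. The sketch below is only an illustration: it assumes each example is a dict with hypothetical `text`, `label`, and `domain` keys, which may not match the released data format, and the domain names are toy values.

```python
def split_by_domain(examples, held_out_domains):
    """Split examples so that held-out domains never appear in training.

    `examples` is a list of dicts with hypothetical keys 'text', 'label'
    and 'domain' (an assumption, not the official data format).
    """
    held_out = set(held_out_domains)
    train = [ex for ex in examples if ex["domain"] not in held_out]
    valid = [ex for ex in examples if ex["domain"] in held_out]
    return train, valid


# Toy data: texts, labels and domain names are illustrative only.
examples = [
    {"text": "a", "label": "human", "domain": "news"},
    {"text": "b", "label": "generated", "domain": "news"},
    {"text": "c", "label": "human", "domain": "reviews"},
    {"text": "d", "label": "generated", "domain": "essays"},
    {"text": "e", "label": "human", "domain": "tweets"},
]

# Hold out two domains to imitate testing on unseen writing styles.
train, valid = split_by_domain(examples, held_out_domains=["essays", "tweets"])
train_domains = {ex["domain"] for ex in train}
valid_domains = {ex["domain"] for ex in valid}
```

Validating on domains that never occur in training gives a rougher but more honest estimate of how a detector may behave on the unseen test domains than a random split would.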

Subtask 2: Model Attribution

Participants will be given an automatically generated text and must determine which model generated it. To encourage models to learn features that generalize to new writing styles, five domains will be used for training and two different domains for testing.

The first subtask is a binary classification task with two classes, 👩🏻 (human) and 🤖 (generated), while the second is a multi-class classification task. The models used to generate the texts are instruction-tuned LLMs from a range of providers, such as OpenAI, Amazon Bedrock, Anthropic, Cohere, AI21, Google Vertex AI, and Meta. The datasets have been generated using TextMachina, a tool for creating machine-generated text (MGT) datasets through a wide variety of prompts while controlling for the classical biases present in this kind of dataset.
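Both subtasks share the same text-in, label-out interface and differ only in the label set. The following is a minimal sketch under stated assumptions: the string labels (`human`/`generated` for Subtask 1, letters for Subtask 2) are hypothetical, and the toy character n-gram classifier is an illustration only, not the organizers' baseline or any participant's system.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NgramProfileClassifier:
    """Toy classifier: one character n-gram profile per class."""

    def __init__(self, n=3):
        self.n = n
        self.profiles = {}

    def fit(self, texts, labels):
        # Accumulate n-gram counts of all training texts of each class.
        for text, label in zip(texts, labels):
            profile = self.profiles.setdefault(label, Counter())
            profile.update(char_ngrams(text, self.n))
        return self

    def predict(self, text):
        # Return the class whose profile is most similar to the text.
        grams = char_ngrams(text, self.n)
        return max(self.profiles,
                   key=lambda label: cosine(grams, self.profiles[label]))


# Subtask 1 (binary): hypothetical labels 'human' and 'generated'.
detector = NgramProfileClassifier().fit(
    ["aaaa aaaa", "bbbb bbbb"], ["human", "generated"])
binary_prediction = detector.predict("aaa")

# Subtask 2 (multi-class): hypothetical model labels 'A', 'B', 'C'.
attributor = NgramProfileClassifier().fit(
    ["xxxx", "yyyy", "zzzz"], ["A", "B", "C"])
model_prediction = attributor.predict("yyy")
```

The same class handles both settings because the number of classes is inferred from the training labels; only the label inventory changes between detection and attribution.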

The datasets will include texts from domains such as essays, news, social media (tweets, forums, dialogues), Wikipedia, and wikiHow. Texts from uncontrolled domains, extracted from OSCAR (Abadji et al., 2022) and the Colossal Cleaned Multilingual Common Crawl (Raffel, 2019), will also be included.

IMPORTANT: participation in the task is subject to restrictions. Please read the Task constraints section.

Examples

Subtask 1: Human or Generated

🤖 No le digas lo que tiene o no tiene, pero sí que puede ayudarte a verle con otros ojos y entender mejor sus sentimientos. No importa si es tu amigo o tu pareja, una persona tóxica es la que hace sentir mal a las demás personas y, aunque seas su amiga o compañero sentimental, también te afecta negativamente. Así pues, ante el primer signo de toxicidad, actúa rápido y ten cuidado al hablar con esa persona. Trata de alejarla de ti.

👩🏻 Las culatas de los coches, las cajas de cambios, los armazones de las bombas de aguas y los pistones son opciones estupendas. Otras fuentes más comunes son objetos como las latas de cerveza o refrescos, las estructuras de los muebles, los revestimientos de las casas, los moldes para tartas o las bandejas para hornear pavo. Sin embargo, estos objetos tienden a ser de aleaciones más frágiles con gran cantidad de impurezas.

🤖 It may take up to an hour or more to reach that temp. When the water is heated, you can add your soap (or other cleaning agent) into the tank. Then, mix everything with a bottle of shaker sprayer. I like this one because it's easy to use and comes with its own attachment for making really powerful streams of spray. The best part? You don't have to worry about running out of soap. Once you've filled the tank, there are no more refills needed!

👩🏻 Talk to your doctor and let him or her know about persistent side effects. Discuss ways to manage them and still get the benefit you need from the medication. Sometimes, doses can be adjusted, timing can be altered, different medications can be tried, or you can be switched to or from a longer acting product that can help reduce your side effects. You are the most important member of your healthcare team. Take an active role in monitoring your condition. This may lead to a lower dose.

Subtask 2: Model Attribution

🤖 (A) Hay personas que cuando se les pregunta qué es el amor, responden de manera inmediata amor romántico.

🤖 (B) @ElDatoDelDia: VIDEO: Dramático momento en el que el equipo de Riquelme intentó quitarle la pelota a Boca. En un video.

🤖 (C) Lo juro por las cenizas de mis antepasados que nunca más volveré a hablar de esto. ¡Basta!

🤖 (D) Massimo Bottura está en América. Uno de los restaurantes que no consiguió renovar en el circuito turístico del mundo es el restaurante italico

🤖 (E) @CuriosoDato: Perdonar es el primero paso. Lo mejor es no perdonar. Ricardo: Es verdad.

🤖 (F) Gómez, es el cuarto día de la semana. Se celebra el jueves. El jueves es un día de trabajo en la mayoría de los países

🤖 (A) 1 January 1996. This document is being distributed to all States Parties under the symbol C.N.15.E.8. The text reproduced below was adopted by the Administrative Committee at its meeting held on 30 September 1995, pursuant to paragraph 7 (b) (i) of Annex III to the Agreement Concerning the Adoption of Uniform Conditions of Approval and Reciprocal Recognition of wheeled vehicles and their equipment (1958). It shall be applicable mutatis mutandis to other UN Regulations annex

🤖 (B) Art. 2, paragraph 4. In addition to the requirements laid down in the technical regulations referred to in Annex I to this Regulation, the following additional provisions must apply: 4.1.2. the mechanical power transmitted to the axle must not exceed 3 kW. 4.2. it may also be necessary to restrict the number of wheels mountable on the vehicle at a given time and at any speed when such restriction is considered expedient for reasons of safety or good road conditions.

🤖 (C) The Commission shall decide annually on the allocation of these additional resources among the Member States. 3. The Commission shall publish a list of eligible beneficiaries within three months after the end of the previous calendar year. 4. In order to determine the eligibility of an applicant, the Commission shall verify that the applicant has complied with the requirements set out in article 16. Article 19. The Council shall adopt regulations implementing this Regulation. Article 20. 1. This Regulation shall come into force on 1 January 1999

🤖 (D) (2) This Decision shall enter into force on the date of its publication in the Official Journal of the European Union. (3) For the purposes of the procedure provided for in Article 114 of the Treaty on the Functioning of the European Union, the Council shall adopt acts in accordance with the ordinary legislative procedure. (4) The Government and the Commission shall, as appropriate, communicate the adoption of this Decision to the European Parliament and to the Council.

🤖 (E) The provisions of the third paragraph of Article 9(2) shall apply to references made by the Commission in accordance with the procedure referred to in paragraph 3, by the ECB in accordance with the procedure referred to in Article 10(2) and by the ESMA in accordance with the procedure referred to in Article 11(1). 5. In the case of a reference to the Court pursuant to Article 9(2) in respect of a measure adopted by a Member State or the ECB, the Court shall

🤖 (F) Article 1 Member States shall take all necessary measures to ensure that total emissions of nitrogen oxides in the Community, measured in terms of nitrogen dioxide (NO2) and nitrogen oxides (NOx), are reduced to levels that do not exceed those of 1990 by the end of the period 1995 to 1997. Article 2 Member States shall take all necessary measures to ensure that total emissions of all nitrogenized pollutants in the Community are reduced to levels that do not exceed