AuTexTification 🤖👩🏻

 Welcome 🤗 

The AuTexTification: Automated Text Identification shared task will take place as part of IberLEF 2023, the 5th Workshop on Iberian Languages Evaluation Forum at the SEPLN 2023 Conference, which will be held in Jaén, Spain on the 26th of September, 2023.

Introduction

A new era of automatic content generation has arrived with powerful causal language models such as Generative Pre-trained Transformer (GPT) (Radford et al., 2019; Ouyang et al., 2022), Pathways Language Model (PaLM) (Chowdhery et al., 2022), BLOOM (Scao et al., 2022), and ChatGPT. Most of these models are publicly available, which boosts research and development of cutting-edge applications. However, they can also be used by malicious users or bots to spread untruthful news, reviews, or opinions (Jawahar et al., 2020). Thus, it is imperative to develop technology that automatically detects generated text for content moderation, including the detection of fake news (Deng et al., 2022), bots in online environments (Tourille et al., 2022), and generated technical research text (Rodríguez et al., 2022). Moreover, in some legal and security applications, merely identifying machine-generated text may not be sufficient. Instead, the text may need to be attributed to a specific generation model, e.g., to notify the developers of the model, protect intellectual property, or assign responsibility (Uchendu et al., 2020). The malicious potential of generated text is already a reality, which has led some conferences such as ICML to explicitly ban content generated by language models. In the not-so-distant future, advances in automatic text generation could take opinion spam to the next level, posing an imminent threat to companies, consumers, and readers.

 What has already been done?

Several works have studied to what extent humans can detect automatically generated text, reporting that (i) trained evaluators reach an accuracy of ~70% on relatively large models like GPT-2 (Ippolito et al., 2020), (ii) their performance seems to decrease until they perform no better than random chance on larger models such as GPT-3 (Clark et al., 2021; Ethayarajh et al., 2022), (iii) human evaluators can improve their detection performance through errors inherent to the genre/domain of the text (Dugan et al., 2022), and (iv) the decoding strategy, along with the capacity of the models, has a great impact on detection performance (Ippolito et al., 2020). These works give us insights into how humans can detect generated text, but what about automatic approaches? Some of the biggest technology companies, such as Google, OpenAI, and Turnitin, are trying to develop detectors for AI-generated text. The purpose of the AuTexTification task is to boost research and development of automatic systems that detect text generated by state-of-the-art language models, in English and Spanish.

 Can you spot generated text? 🤔

Could you spot whether the following texts have been automatically generated? Keep in mind that automatically generated text can show factual, grammatical, or coherence artifacts (Massarelli et al., 2020), along with statistical abnormalities that make the distributions of automatic and human texts differ (Ippolito et al., 2020), despite being well-formed (Bender et al., 2020).

I recently purchased a pair of Celestron Skymaster Binoculars and I am quite impressed with their performance. They are solidly built and provide a good balance of size and weight. The optics are sharp and the image is bright. I was able to easily spot many celestial objects in the night sky, including planets, stars, and galaxies. The focusing mechanism is smooth and easy to use.

Después de la entrevista, Isabel Rodríguez se reunió con los miembros del PSOE para discutir su postura en relación a las enmiendas presentadas al proyecto de ley. La reunión fue productiva y los miembros del partido acordaron apoyar algunas de las enmiendas presentadas por el PP, siempre y cuando se ajustaran a los principios y valores del PSOE. Rodríguez señaló que el partido continuará trabajando en conjunto con el PP en el interés del bienestar del país.
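
To give a rough feel for what a difference between the distributions of two texts can mean, here is a minimal sketch that compares the word-frequency distributions of two texts with the Jensen-Shannon divergence. It is only an illustration under simplifying assumptions and not a method used or endorsed by the task: the function names `word_distribution` and `jensen_shannon_divergence` are hypothetical, the tokenization is a plain regular expression, and practical detectors typically rely on far richer signals than word counts.

```python
import math
import re
from collections import Counter


def word_distribution(text):
    """Return the relative frequency of each lowercased word in `text`."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    # An empty text yields an empty distribution.
    return {word: n / total for word, n in counts.items()} if total else {}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two word distributions.

    The value is 0.0 for identical distributions and 1.0 for distributions
    that share no words.
    """
    divergence = 0.0
    for word in set(p) | set(q):
        p_w = p.get(word, 0.0)
        q_w = q.get(word, 0.0)
        m_w = 0.5 * (p_w + q_w)  # mixture of the two distributions
        if p_w > 0:
            divergence += 0.5 * p_w * math.log2(p_w / m_w)
        if q_w > 0:
            divergence += 0.5 * q_w * math.log2(q_w / m_w)
    return divergence


# Toy usage: compare two short snippets (illustrative strings only).
text_a = "The optics are sharp and the image is bright."
text_b = "The focusing mechanism is smooth and easy to use."
score = jensen_shannon_divergence(word_distribution(text_a),
                                  word_distribution(text_b))
```

A score near 0 means the two texts use words with very similar frequencies, while a score near 1 means they barely overlap. On short texts like the ones above the score mostly reflects topic rather than authorship, so in practice such statistics would be estimated over large collections of human and generated text.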

 References