AuTexTification 🤖👩🏻
Welcome 🤗
The AuTexTification: Automated Text Identification shared task will take place as part of IberLEF 2023, the 5th Workshop on Iberian Languages Evaluation Forum, at the SEPLN 2023 Conference, which will be held in Jaén, Spain, on September 26th, 2023.
Introduction
The new era of automatic content generation has been driven by powerful causal language models such as the Generative Pre-trained Transformer (GPT) (Radford et al., 2019; Ouyang et al., 2022), the Pathways Language Model (PaLM) (Chowdhery et al., 2022), BLOOM (Scao et al., 2022), and ChatGPT. Most of these models are publicly available, which boosts research and the development of cutting-edge applications. However, they can also be used by malicious users or bots to spread untruthful news, reviews, or opinions (Jawahar et al., 2020). It is therefore imperative to develop technology that automatically detects generated text for content moderation, including detecting fake news (Deng et al., 2022), bots in online environments (Tourille et al., 2022), and machine-generated technical research (Rodríguez et al., 2022). Moreover, in some legal and security applications, merely identifying machine-generated text may not be sufficient: it may also be necessary to attribute the text to a specific generation model, e.g., to notify the model's developers, protect intellectual property, or assign responsibilities (Uchendu et al., 2020). The malicious potential of generated text is already a reality, and it has led conferences such as ICML to explicitly ban content generated by language models. In the not-so-distant future, advances in automatic text generation could take opinion spam to the next level, posing an imminent threat to companies, consumers, and readers.
What has already been done?
Several works have studied the extent to which humans can detect automatically generated text, reporting that (i) trained evaluators reach an accuracy of about 70% on relatively large models like GPT-2 (Ippolito et al., 2020); (ii) their performance drops until they perform no better than random chance on larger models such as GPT-3 (Clark et al., 2021; Ethayarajh & Jurafsky, 2022); (iii) human evaluators can improve their detection performance by exploiting errors inherent to the genre/domain of the text (Dugan et al., 2020); and (iv) the decoding strategy, along with the capacity of the models, has a great impact on detection performance (Ippolito et al., 2020). These works offer insight into how humans detect generated text, but what about automatic approaches? Companies such as Google, OpenAI, and Turnitin are working on detectors of AI-generated text. The purpose of the AuTexTification task is to boost the research and development of automatic systems that detect text generated by state-of-the-art language models, in English and Spanish.
Can you spot generated text? 🤔
Could you tell whether the following texts were automatically generated? Keep in mind that automatically generated text may exhibit factual, grammatical, or coherence artifacts (Massarelli et al., 2020), along with statistical abnormalities that make the distributions of automatic and human texts differ (Ippolito et al., 2020), despite being well-formed on the surface (Bender & Koller, 2020).
I recently purchased a pair of Celestron Skymaster Binoculars and I am quite impressed with their performance. They are solidly built and provide a good balance of size and weight. The optics are sharp and the image is bright. I was able to easily spot many celestial objects in the night sky, including planets, stars, and galaxies. The focusing mechanism is smooth and easy to use.
Después de la entrevista, Isabel Rodríguez se reunió con los miembros del PSOE para discutir su postura en relación a las enmiendas presentadas al proyecto de ley. La reunión fue productiva y los miembros del partido acordaron apoyar algunas de las enmiendas presentadas por el PP, siempre y cuando se ajustaran a los principios y valores del PSOE. Rodríguez señaló que el partido continuará trabajando en conjunto con el PP en el interés del bienestar del país.
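The statistical differences between human and generated text mentioned above can be exploited by even very simple supervised baselines. The sketch below trains a character n-gram logistic regression detector with scikit-learn; the tiny labeled corpus and the label names are illustrative placeholders (not AuTexTification data), and a real system would of course need far more training material:

```python
# Minimal baseline sketch: TF-IDF character n-grams + logistic regression
# to separate human-written from machine-generated text.
# The toy corpus below is an illustrative placeholder, not real task data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "honestly loved it, tiny scratch on the lens but works great!!",
    "meh. arrived late and the box was crushed, still usable tho",
    "The product delivers excellent performance and great overall value.",
    "The device is well built and provides a good user experience.",
]
train_labels = ["human", "human", "generated", "generated"]

# Character n-grams capture distributional artifacts without
# language-specific tokenization, which helps in a bilingual setting.
detector = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
detector.fit(train_texts, train_labels)

# Score an unseen sentence; the prediction is one of the two labels.
pred = detector.predict(["The optics are sharp and the image is bright."])[0]
print(pred)
```

Character-level features are only one possible signal; competitive detectors typically combine them with model-based statistics such as token log-likelihoods.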
References
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., & Lowe, R. (2022). Training language models to follow instructions with human feedback. arXiv preprint arXiv:2203.02155.
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., & Fiedel, N. (2022). Palm: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311.
Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., & Manica, M. (2022). Bloom: A 176b-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100.
Jawahar, G., Abdul-Mageed, M., & Lakshmanan, L. (2020). Automatic Detection of Machine Generated Text: A Critical Survey. In Proceedings of the 28th International Conference on Computational Linguistics (pp. 2296–2309). International Committee on Computational Linguistics.
Deng, R., & Duzhin, F. (2022). Topological Data Analysis Helps to Improve Accuracy of Deep Learning Models for Fake News Detection Trained on Very Small Training Sets. Big Data and Cognitive Computing, 6(3).
Tourille, J., Sow, B., & Popescu, A. (2022). Automatic Detection of Bot-Generated Tweets. In Proceedings of the 1st International Workshop on Multimedia AI against Disinformation (pp. 44–51). Association for Computing Machinery.
Rodriguez, J., Hay, T., Gros, D., Shamsi, Z., & Srinivasan, R. (2022). Cross-Domain Detection of GPT-2-Generated Technical Text. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1213–1233). Association for Computational Linguistics.
Ippolito, D., Duckworth, D., Callison-Burch, C., & Eck, D. (2020). Automatic Detection of Generated Text is Easiest when Humans are Fooled. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 1808–1822). Association for Computational Linguistics.
Uchendu, A., Le, T., Shu, K., & Lee, D. (2020). Authorship attribution for neural text generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 8384–8395). Association for Computational Linguistics.
Clark, E., August, T., Serrano, S., Haduong, N., Gururangan, S., & Smith, N. A. (2021). All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 7282–7296). Association for Computational Linguistics.
Ethayarajh, K., & Jurafsky, D. (2022). How human is human evaluation? Improving the gold standard for NLG with utility theory. arXiv preprint arXiv:2205.11930.
Dugan, L., Ippolito, D., Kirubarajan, A., & Callison-Burch, C. (2020). RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 189–196). Association for Computational Linguistics.
Massarelli, L., Petroni, F., Piktus, A., Ott, M., Rocktäschel, T., Plachouras, V., Silvestri, F., & Riedel, S. (2020). How Decoding Strategies Affect the Verifiability of Generated Text. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 223–235). Association for Computational Linguistics.
Bender, E., & Koller, A. (2020). Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 5185–5198). Association for Computational Linguistics.