A.I. Authorship Analysis

The A3 (A.I. Authorship Analysis) project @ PIKE of Penn State University, USA, investigates authorship-related issues in the generation, detection, perturbation, and obfuscation of AI-generated natural-language text, such as LLM-generated texts. In particular, we aim to answer research questions such as the following:

What characteristics, if any, distinguish LLM-generated texts from human-written texts?
How can we build efficient and effective Turing Testers that differentiate LLM-generated texts from human-written texts?
Is it possible to embed effective, robust, and undetectable watermarks in LLM-generated texts?
How can texts be obfuscated to disguise their true authorship?
What are innovative applications and scenarios in which a true understanding of AI authorship can benefit users?

Team

Current

Jooyoung Lee, PhD student @ Penn State
Nafis Tripto, PhD student @ Penn State
Saranya Venkatraman, PhD student @ Penn State
Dongwon Lee, Faculty @ Penn State

Alumni

Adaku Uchendu, PhD student @ Penn State --> Technical Staff @ MIT Lincoln Lab
Thai Le, PhD student @ Penn State --> Assistant Professor @ U. Mississippi
Eric Xing, undergraduate summer intern @ Western Kentucky Univ. --> PhD student @ WashU
Ziyao Wang, undergraduate summer intern @ Wuhan Univ. --> PhD student @ UMD
Jialin Shao, undergraduate summer intern @ Beijing Univ. of Technology --> MS student @ UIUC

Publications

2024

Authorship Obfuscation in Multilingual Machine-Generated Text Detection

Dominik Macko, Robert Moro, Adaku Uchendu, Ivan Srba, Jason Lucas, Michiharu Yamashita, Nafis Irtiza Tripto, Dongwon Lee, Jakub Simko, Maria Bielikova, arxiv.org/abs/2401.07867

A Ship of Theseus: Curious Cases of Paraphrasing in LLM-Generated Texts

Nafis Irtiza Tripto, Saranya Venkatraman, Dominik Macko, Robert Moro, Ivan Srba, Adaku Uchendu, Thai Le, Dongwon Lee, arXiv:2311.08374

TopRoBERTa: Topology-Aware Authorship Attribution of Deepfake Texts

Adaku Uchendu, Thai Le, Dongwon Lee, arXiv:2309.12934

Catch Me If You GPT: Tutorial on Deepfake Texts

Adaku Uchendu, Saranya Venkatraman, Thai Le, Dongwon Lee, Annual Conf. of the North American Chapter of the Asso. for Comp. Linguistics (NAACL), Mexico City, Mexico, June 2024 (Tutorial)

GPT-who: An Information Density-based Machine-Generated Text Detector

Saranya Venkatraman, Adaku Uchendu, Dongwon Lee, Annual Conf. of the North American Chapter of the Asso. for Comp. Linguistics (NAACL-Findings), Mexico City, Mexico, June 2024

ALISON: Fast and Effective Stylometric Authorship Obfuscation
Eric Xing, Saranya Venkatraman, Thai Le, Dongwon Lee, 38th AAAI Conf. on Artificial Intelligence (AAAI), Vancouver, Canada, February 2024

2023

MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark
Dominik Macko, Robert Moro, Adaku Uchendu, Jason Lucas, Michiharu Yamashita, Matus Pikuliak, Ivan Srba, Thai Le, Dongwon Lee, Jakub Simko, Maria Bielikova, Conf. on Empirical Methods in Natural Language Processing (EMNLP), Singapore, December 2023
UPTON: Preventing Authorship Leakage from Public Text Release via Data Poisoning
Ziyao Wang, Thai Le, Dongwon Lee, Findings of Conf. on Empirical Methods in Natural Language Processing (EMNLP-Findings), Singapore, December 2023
HANSEN: Human and AI Spoken Text Benchmark for Authorship Analysis
Nafis Irtiza Tripto, Adaku Uchendu, Thai Le, Mattia Setzu, Fosca Giannotti, Dongwon Lee, Findings of Conf. on Empirical Methods in Natural Language Processing (EMNLP-Findings), Singapore, December 2023
Does Human Collaboration Enhance the Accuracy of Identifying LLM-Generated Deepfake Texts?

Adaku Uchendu, Jooyoung Lee, Hua Shen, Thai Le, Ting-Hao Huang, Dongwon Lee, 11th AAAI Conf. on Human Computation and Crowdsourcing (HCOMP), Delft, Netherlands, November 2023

Do Language Models Plagiarize?

Jooyoung Lee, Thai Le, Jinghui Chen, Dongwon Lee, The ACM Web Conference (WWW), Austin, TX, April 2023

Catch Me If You GAN: Generation, Detection, and Obfuscation of Deepfake Texts

Adaku Uchendu, Thai Le, Dongwon Lee, The ACM Web Conference (WWW), Austin, TX, April 2023 (Tutorial)

Attribution and Obfuscation of Neural Text Authorship: A Data Mining Perspective

Adaku Uchendu, Thai Le, Dongwon Lee, SIGKDD Explorations, Vol. 25, No. 1, pages 1-18, June 2023

2022

SHIELD: Defending Textual Neural Networks against Multiple Black-Box Adversarial Attacks with Stochastic Multi-Expert Patcher
Thai Le, Noseong Park, Dongwon Lee, 60th Annual Meeting of the Asso. for Comp. Linguistics (ACL), Dublin, Ireland, May 2022
Perturbations in the Wild: Leveraging Human-Written Text Perturbations for Realistic Adversarial Attack and Defense
Thai Le, Jooyoung Lee, Kevin Yen, Yifan Hu, Dongwon Lee, Findings of 60th Annual Meeting of the Asso. for Comp. Linguistics (ACL-Findings), Dublin, Ireland, May 2022

2021

TuringBench: A Benchmark Environment for Turing Test in the Age of Neural Text Generation

Adaku Uchendu, Zeyu Ma, Thai Le, Rui Zhang, Dongwon Lee, Findings of Conf. on Empirical Methods in Natural Language Processing (EMNLP-Findings), November 2021

2020

Authorship Attribution for Neural Text Generation

Adaku Uchendu, Thai Le, Kai Shu, Dongwon Lee, Conf. on Empirical Methods in Natural Language Processing (EMNLP), November 2020

2019

Characterizing Man-made vs. Machine-made Chatbot Dialogs

Adaku Uchendu, Jeffrey Cao, Qiaozhi Wang, Bo Luo, Dongwon Lee, Int'l Conf. on Truth and Trust Online (TTO), London, UK, October 2019

A Reverse Turing Test for Detecting Machine-Made Texts

Jialin Shao, Adaku Uchendu, Dongwon Lee, 11th Int'l ACM Web Science Conf. (WebSci), Boston, MA, July 2019