A.I. Authorship Analysis
The A3 (A.I. Authorship Analysis) project @ PIKE of Penn State University, USA, investigates authorship-related issues in the generation, detection, perturbation, and obfuscation of AI-generated human language, such as LLM-generated texts. In particular, we aim to find good solutions to research questions such as the following:
What characteristics, if any, distinguish LLM-generated texts from human-written texts?
How can we build efficient and effective Turing Testers that differentiate LLM-generated texts from human-written texts?
Is it possible to embed effective, robust, and undetectable watermarks in LLM-generated texts?
How can we obfuscate texts to disguise their true authorship?
What are innovative applications and scenarios where a true understanding of AI authorship can benefit users?
Team
Current
Jooyoung Lee, PhD student @ Penn State
Nafis Tripto, PhD student @ Penn State
Saranya Venkatraman, PhD student @ Penn State
Dongwon Lee, Faculty @ Penn State
Alumni
Adaku Uchendu, PhD student @ Penn State --> Technical Staff @ MIT Lincoln Lab
Thai Le, PhD student @ Penn State --> Assistant Professor @ U. Mississippi
Eric Xing, undergraduate summer intern @ Western Kentucky Univ. --> PhD student @ WashU
Ziyao Wang, undergraduate summer intern @ Wuhan Univ. --> PhD student @ UMD
Jialin Shao, undergraduate summer intern @ Beijing Univ. of Technology --> MS student @ UIUC
Publications
2024
Dominik Macko, Robert Moro, Adaku Uchendu, Ivan Srba, Jason Lucas, Michiharu Yamashita, Nafis Irtiza Tripto, Dongwon Lee, Jakub Simko, Maria Bielikova, arxiv.org/abs/2401.07867
A Ship of Theseus: Curious Cases of Paraphrasing in LLM-Generated Texts
Nafis Irtiza Tripto, Saranya Venkatraman, Dominik Macko, Robert Moro, Ivan Srba, Adaku Uchendu, Thai Le, Dongwon Lee, arXiv:2311.08374
TopRoBERTa: Topology-Aware Authorship Attribution of Deepfake Texts
Adaku Uchendu, Thai Le, Dongwon Lee, arXiv:2309.12934
Adaku Uchendu, Saranya Venkatraman, Thai Le, Dongwon Lee, Annual Conf. of the North American Chapter of the Asso. for Comp. Linguistics (NAACL), Mexico City, Mexico, June 2024 (Tutorial)
Saranya Venkatraman, Adaku Uchendu, Dongwon Lee, Annual Conf. of the North American Chapter of the Asso. for Comp. Linguistics (NAACL-Findings), Mexico City, Mexico, June 2024
ALISON: Fast and Effective Stylometric Authorship Obfuscation
Eric Xing, Saranya Venkatraman, Thai Le, Dongwon Lee, 38th AAAI Conf. on Artificial Intelligence (AAAI), Vancouver, Canada, February 2024
2023
MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark
Dominik Macko, Robert Moro, Adaku Uchendu, Jason Lucas, Michiharu Yamashita, Matus Pikuliak, Ivan Srba, Thai Le, Dongwon Lee, Jakub Simko, Maria Bielikova, Conf. on Empirical Methods in Natural Language Processing (EMNLP), Singapore, December 2023
UPTON: Preventing Authorship Leakage from Public Text Release via Data Poisoning
Ziyao Wang, Thai Le, Dongwon Lee, Findings of Conf. on Empirical Methods in Natural Language Processing (EMNLP-Findings), Singapore, December 2023
HANSEN: Human and AI Spoken Text Benchmark for Authorship Analysis
Nafis Irtiza Tripto, Adaku Uchendu, Thai Le, Mattia Setzu, Fosca Giannotti, Dongwon Lee, Findings of Conf. on Empirical Methods in Natural Language Processing (EMNLP-Findings), Singapore, December 2023
Does Human Collaboration Enhance the Accuracy of Identifying LLM-Generated Deepfake Texts?
Adaku Uchendu, Jooyoung Lee, Hua Shen, Thai Le, Ting-Hao Huang, Dongwon Lee, 11th AAAI Conf. on Human Computation and Crowdsourcing (HCOMP), Delft, Netherlands, November 2023
Jooyoung Lee, Thai Le, Jinghui Chen, Dongwon Lee, The ACM Web Conference (WWW), Austin, TX, April 2023
Adaku Uchendu, Thai Le, Dongwon Lee, The ACM Web Conference (WWW), Austin, TX, April 2023 (Tutorial)
Adaku Uchendu, Thai Le, Dongwon Lee, SIGKDD Explorations, Vol. 25, No. 1, pages 1-18, June 2023
2022
SHIELD: Defending Textual Neural Networks against Multiple Black-Box Adversarial Attacks with Stochastic Multi-Expert Patcher
Thai Le, Noseong Park, Dongwon Lee, 60th Annual Meeting of the Asso. for Comp. Linguistics (ACL), Dublin, Ireland, May 2022
Perturbations in the Wild: Leveraging Human-Written Text Perturbations for Realistic Adversarial Attack and Defense
Thai Le, Jooyoung Lee, Kevin Yen, Yifan Hu, Dongwon Lee, Findings of 60th Annual Meeting of the Asso. for Comp. Linguistics (ACL-Findings), Dublin, Ireland, May 2022
2021
Adaku Uchendu, Zeyu Ma, Thai Le, Rui Zhang, Dongwon Lee, Findings of Conf. on Empirical Methods in Natural Language Processing (EMNLP-Findings), November 2021
2020
Adaku Uchendu, Thai Le, Kai Shu, Dongwon Lee, Conf. on Empirical Methods in Natural Language Processing (EMNLP), November 2020
2019
Adaku Uchendu, Jeffrey Cao, Qiaozhi Wang, Bo Luo, Dongwon Lee, Int'l Conf. on Truth and Trust Online (TTO), London, UK, October 2019
Jialin Shao, Adaku Uchendu, Dongwon Lee, 11th Int'l ACM Web Science Conf. (WebSci), Boston, MA, July 2019