A.I. Authorship Analysis
The A3 (A.I. Authorship Analysis) project @ PIKE of Penn State University, USA, investigates authorship-related issues in the generation, detection, perturbation, and obfuscation of AI-generated language, such as LLM-generated texts. In particular, we aim to answer research questions such as the following:
What characteristics, if any, distinguish LLM-generated texts from human-written texts?
How can we build efficient and effective Turing Testers that differentiate LLM-generated texts from human-written texts?
Is it possible to embed effective, robust, and undetectable watermarks in LLM-generated texts?
How can texts be obfuscated to disguise their true authorship?
What are innovative applications and scenarios where a true understanding of AI authorship can benefit users?
Team
Current
Mahjabin Naher, PhD student @ Penn State
Nafis Tripto, PhD student @ Penn State
Saranya Venkatraman, PhD student @ Penn State
Dongwon Lee, Faculty @ Penn State
Alumni
Jooyoung Lee, PhD student @ Penn State --> Applied Scientist @ Amazon
Adaku Uchendu, PhD student @ Penn State --> Technical Staff @ MIT Lincoln Lab
Thai Le, PhD student @ Penn State --> Assistant Professor @ Indiana Univ.
Eric Xing, undergraduate summer intern @ Western Kentucky Univ. --> PhD student @ WashU
Ziyao Wang, undergraduate summer intern @ Wuhan Univ. --> PhD student @ UMD
Jialin Shao, undergraduate summer intern @ Beijing Univ. of Technology --> MS student @ UIUC
Publications
2024
Dominik Macko, Robert Moro, Adaku Uchendu, Ivan Srba, Jason Lucas, Michiharu Yamashita, Nafis Irtiza Tripto, Dongwon Lee, Jakub Simko, Maria Bielikova
arxiv.org/abs/2401.07867
Mahjabin Naher, Haeseung Seo, Eun-Ju Lee, Aiping Xiong, Dongwon Lee
Conf. on Language Modeling (COLM), Philadelphia, PA, October 2024
Adaku Uchendu, Thai Le, Dongwon Lee
European Conf. on Artificial Intelligence (ECAI), Santiago de Compostela, Spain, October 2024
Nafis Irtiza Tripto, Saranya Venkatraman, Dominik Macko, Robert Moro, Ivan Srba, Adaku Uchendu, Thai Le, Dongwon Lee
62nd Annual Meeting of the Asso. for Comp. Linguistics (ACL), Bangkok, Thailand, August 2024
Adaku Uchendu, Saranya Venkatraman, Thai Le, Dongwon Lee
Annual Conf. of the North American Chapter of the Asso. for Comp. Linguistics (NAACL), Mexico City, Mexico, June 2024 (Tutorial)
Saranya Venkatraman, Adaku Uchendu, Dongwon Lee
Annual Conf. of the North American Chapter of the Asso. for Comp. Linguistics (NAACL-Findings), Mexico City, Mexico, June 2024
ALISON: Fast and Effective Stylometric Authorship Obfuscation
Eric Xing, Saranya Venkatraman, Thai Le, Dongwon Lee
38th AAAI Conf. on Artificial Intelligence (AAAI), Vancouver, Canada, February 2024
2023
MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark
Dominik Macko, Robert Moro, Adaku Uchendu, Jason Lucas, Michiharu Yamashita, Matus Pikuliak, Ivan Srba, Thai Le, Dongwon Lee, Jakub Simko, Maria Bielikova
Conf. on Empirical Methods in Natural Language Processing (EMNLP), Singapore, December 2023
UPTON: Preventing Authorship Leakage from Public Text Release via Data Poisoning
Ziyao Wang, Thai Le, Dongwon Lee
Findings of Conf. on Empirical Methods in Natural Language Processing (EMNLP-Findings), Singapore, December 2023
HANSEN: Human and AI Spoken Text Benchmark for Authorship Analysis
Nafis Irtiza Tripto, Adaku Uchendu, Thai Le, Mattia Setzu, Fosca Giannotti, Dongwon Lee
Findings of Conf. on Empirical Methods in Natural Language Processing (EMNLP-Findings), Singapore, December 2023
Adaku Uchendu, Jooyoung Lee, Hua Shen, Thai Le, Ting-Hao Huang, Dongwon Lee
11th AAAI Conf. on Human Computation and Crowdsourcing (HCOMP), Delft, Netherlands, November 2023
Jooyoung Lee, Thai Le, Jinghui Chen, Dongwon Lee
The ACM Web Conference (WWW), Austin, TX, April 2023
Adaku Uchendu, Thai Le, Dongwon Lee
The ACM Web Conference (WWW), Austin, TX, April 2023 (Tutorial)
Adaku Uchendu, Thai Le, Dongwon Lee
SIGKDD Explorations, Vol. 25, No. 1, pages 1-18, June 2023
2022
SHIELD: Defending Textual Neural Networks against Multiple Black-Box Adversarial Attacks with Stochastic Multi-Expert Patcher
Thai Le, Noseong Park, Dongwon Lee
60th Annual Meeting of the Asso. for Comp. Linguistics (ACL), Dublin, Ireland, May 2022
Perturbations in the Wild: Leveraging Human-Written Text Perturbations for Realistic Adversarial Attack and Defense
Thai Le, Jooyoung Lee, Kevin Yen, Yifan Hu, Dongwon Lee
Findings of 60th Annual Meeting of the Asso. for Comp. Linguistics (ACL-Findings), Dublin, Ireland, May 2022
2021
Adaku Uchendu, Zeyu Ma, Thai Le, Rui Zhang, Dongwon Lee
Findings of Conf. on Empirical Methods in Natural Language Processing (EMNLP-Findings), November 2021
2020
Adaku Uchendu, Thai Le, Kai Shu, Dongwon Lee
Conf. on Empirical Methods in Natural Language Processing (EMNLP), November 2020
2019
Adaku Uchendu, Jeffrey Cao, Qiaozhi Wang, Bo Luo, Dongwon Lee
Int'l Conf. on Truth and Trust Online (TTO), London, UK, October 2019
Jialin Shao, Adaku Uchendu, Dongwon Lee
11th Int'l ACM Web Science Conf. (WebSci), Boston, MA, July 2019