Special Session @ IEEE ASRU 2025
Background
With the rapid development of deep learning and generative AI, it has become increasingly easy to create or manipulate synthetic media. Even inexperienced users can now generate highly realistic content with minimal effort. While this progress brings exciting opportunities, it also introduces significant risks, particularly when these technologies are misused for malicious purposes such as spreading misinformation or compromising security systems.
Justification
Although researchers have made significant progress on conventional deepfake detection through well-known challenges and databases such as ASVspoof and ADD, there is currently no dedicated session or related topic at ASRU 2025 that addresses the emerging frontiers beyond conventional deepfake detection. Emerging topics such as Partial Spoof (in which synthesized or transformed speech segments are embedded into otherwise bona fide audio) [1][2], CodecFake (deepfake speech generated by neural codec-based speech generation) [3][4], and proactive defense have attracted growing interest, yet despite their relevance they have not received sufficient attention within the speech community. We also include singing voice deepfake detection (SVDD) to address challenges that were not explored in the first SVDD special session at SLT 2024 [5], such as detecting fake singing voices produced by full-song generation systems (e.g., Suno and Udio) and source singer identification.
Beyond these topics, many emerging challenges have received limited attention in conventional deepfake detection. For example, how can we proactively protect our voices as generated speech becomes increasingly realistic, especially if perfect detection is theoretically unattainable? What strategies are effective when domain shifts occur due to evolving generative models or changes in language, environment, or speaker? How can we trace the source or creator of a spoofed voice? Can humans reliably detect spoofed audio? How can we design reliable countermeasures? We believe now is the right time to bring together researchers from the anti-spoofing community to broaden perspectives, go beyond conventional deepfake detection, explore new frontiers, foster discussion, share insights, and advance the field.
Objectives
This special session aims to address the emerging challenges posed by recent advances in deepfake audio generation and to promote research beyond conventional anti-spoofing tasks. We seek to foster discussion and innovation in tackling new forms of deepfakes while maintaining robustness against existing threats and mitigating catastrophic forgetting. Specifically, we welcome contributions that explore novel detection techniques, evaluation protocols, datasets, and related resources.
Acknowledgment
We would like to thank Prof. Junichi Yamagishi, Prof. Tomi Kinnunen, and Prof. Thomas Fang Zheng for their valuable comments on the title and topics of this session.
References
[1] Zhang, L., Wang, X., Cooper, E., Evans, N., and Yamagishi, J., 2022. The PartialSpoof database and countermeasures for the detection of short fake speech segments embedded in an utterance. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31, pp. 813-825.
[2] Yi, J., Tao, J., Fu, R., Yan, X., Wang, C., Wang, T., Zhang, C.Y., Zhang, X., Zhao, Y., Ren, Y., and Xu, L., 2023. ADD 2023: The second audio deepfake detection challenge. arXiv preprint arXiv:2305.13774.
[3] Du, J., Chen, X., Wu, H., Zhang, L., Lin, I., Chiu, I., Ren, W., Tseng, Y., Tsao, Y., Jang, J.S.R., and Lee, H.Y., 2025. CodecFake+: A large-scale neural audio codec-based deepfake speech dataset. arXiv preprint arXiv:2501.08238.
[4] Xie, Y., Lu, Y., Fu, R., Wen, Z., Wang, Z., Tao, J., Qi, X., Wang, X., Liu, Y., Cheng, H., and Ye, L., 2025. The Codecfake dataset and countermeasures for the universally detection of deepfake audio. IEEE Transactions on Audio, Speech and Language Processing.
[5] Zhang, Y., Zang, Y., Shi, J., Yamamoto, R., Toda, T., and Duan, Z., 2024. SVDD 2024: The inaugural singing voice deepfake detection challenge. In 2024 IEEE Spoken Language Technology Workshop (SLT), pp. 782-787. IEEE.
Call for Papers
Partial Spoof Detection, Localization, and Diarization
Detection of Codec-Based Deepfake Speech (CodecFake)
Proactive Defense
Singing Voice Deepfake Detection
Multimodal Deepfake Detection
Adversarial Attack and Defenses
Generalization and/or Domain Adaptation
Source Tracing
Human Perception vs. Machine Detection
Analysis, Explainability, and Evaluation on Deepfake Detection
Defenses Against Other Attacks (e.g., Shallow Fakes, Tampering, Fake Emotion)
Submission Link: Submit Paper
Instruction: We follow the same Author Instructions as ASRU 2025. When submitting, please be sure to select "SS2. Frontiers in Deepfake Voice Detection and Beyond" as your primary subject area so that your paper is properly considered for inclusion in this session.
Tip: If you are interested in responsible generative AI, we encourage you to consider submitting to our partner special session at IEEE ASRU 2025: "Responsible Speech and Audio Generative AI." ⚔️ 🛡️
Scientific Committee
(sorted in alphabetical order by last name)
Nicholas Andrews, Johns Hopkins University
Jean-Francois Bonastre, Avignon University
Zexin Cai, Johns Hopkins University
Jianwu Dang, SIAT
Rohan Kumar Das, Fortemedia Singapore
Hector Delgado, Microsoft
Jiawei Du, National Taiwan University
Ruibo Fu, Chinese Academy of Sciences
Leibny Paola Garcia, Johns Hopkins University
Wanying Ge, National Institute of Informatics
Jee-weon Jung, Apple
Piotr Kawa, Resemble AI
Tomi Kinnunen, University of Eastern Finland
Ivan Kukanov, KLASS
Kong Aik Lee, The Hong Kong Polytechnic University
Menglu Li, Toronto Metropolitan University
Tianchi Liu, Tencent Singapore
Xuechen Liu, National Institute of Informatics
Xu Li, Xiaohongshu
Ming Li, DKU, Duke, WHU
Hieu-Thi Luong, Nanyang Technological University
Jagabandhu Mishra, University of Eastern Finland
Nicolas M. Müller, Fraunhofer AISEC
Oldrich Plchot, Brno University of Technology
Johan Rohdin, Brno University of Technology
Md Sahidullah, TCG CREST & AcSIR
Davide Salvi, Politecnico di Milano
Chng Eng Siong, Nanyang Technological University
Themos Stafylakis, AUEB | Omilia
Piotr Syga, Wrocław Tech
Hye-jin Shim, Carnegie Mellon University
Longbiao Wang, Tianjin University
Matthew Wiesner, Johns Hopkins University
Marcin Witkowski, AGH UST
Zhizheng Wu, Chinese University of Hong Kong
Junichi Yamagishi, National Institute of Informatics
Ruiteng Zhang, Tianjin University
Yi Zhu, Reality Defender & INRS
Organizers
Johns Hopkins University
University of Rochester
Microsoft
National Institute of Informatics