DEEPFOX 2.0: A Multi-Lingual Audio Deepfake Corpus
The team leader of each registered participating team will be provided the dataset via their registered email address within 2-4 working days, and only after the dates mentioned in the timeline.
The DeepFox 2.0 dataset is a large-scale multilingual resource designed for audio deepfake detection, encompassing 20 languages: 6 Indian (Sanskrit, Hindi, Gujarati, Bengali, Punjabi, Tamil) and 14 international (Arabic, French, Russian, Portuguese, Spanish, Vietnamese, German, Urdu, Bahasa Indonesia, Japanese, Swedish, Finnish, Mandarin Chinese, and English). The dataset is carefully structured, with separate training and testing sets to ensure reliable evaluation of detection models. The training and validation set covers 8 deepfake generation techniques: 2 GAN-based methods, 2 voice conversion (VC) methods, and 4 text-to-speech (TTS) synthesis methods. The testing set is significantly more diverse, incorporating 51 deepfake models, among which 6 employ VC and speech-to-speech synthesis techniques, 6 use GAN-based approaches, 3 are generated using large language models (LLMs), and the remaining 38 utilize TTS methods.
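Detection pipelines typically consume such a corpus through a metadata manifest that records each file's language, real/fake label, and generation-method family. The sketch below illustrates that pattern with a toy manifest; the field names and file paths are assumptions for illustration, not the official DeepFox 2.0 schema.

```python
import csv
import io

# Hypothetical manifest (schema and paths are illustrative assumptions,
# not the official DeepFox 2.0 release format).
MANIFEST = """path,language,label,method_family
train/hi_0001.wav,Hindi,fake,TTS
train/ta_0002.wav,Tamil,fake,GAN
train/fr_0003.wav,French,real,-
train/en_0004.wav,English,fake,VC
"""

def count_fakes_by_family(manifest_text):
    """Count fake samples per generation-method family (GAN, VC, TTS, ...)."""
    counts = {}
    for row in csv.DictReader(io.StringIO(manifest_text)):
        if row["label"] == "fake":
            family = row["method_family"]
            counts[family] = counts.get(family, 0) + 1
    return counts

print(count_fakes_by_family(MANIFEST))  # → {'TTS': 1, 'GAN': 1, 'VC': 1}
```

Grouping by method family in this way makes it straightforward to report per-technique detection accuracy, which matters here because the test set deliberately includes generation methods unseen during training.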
With a total of 249k files in the training and validation set and 160k files in the testing set, DeepFox 2.0 provides an extensive and diverse benchmark for deepfake detection. Additionally, the dataset maintains text symmetry between real and fake audio samples to ensure consistency in evaluation: an XLSR-based ASR model was employed to generate transcripts of the real audio, so that the text content is aligned across real and synthetic speech. As one of the largest and most comprehensive multilingual deepfake detection datasets, DeepFox 2.0 is an invaluable resource for researchers and practitioners working in AI security, speech forensics, and adversarial detection.
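Checking text symmetry in practice amounts to comparing an ASR transcript of a real utterance against the text used to synthesize its fake counterpart, after normalizing away case, punctuation, and whitespace differences. The helper below is a minimal sketch of such a check, assuming the transcripts are already available as strings (the normalization choices are our assumptions, not a documented DeepFox 2.0 procedure).

```python
import re
import unicodedata

def normalize(text):
    """Case-fold, drop punctuation, and collapse whitespace so ASR
    transcripts and TTS input texts can be compared fairly."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = re.sub(r"[^\w\s]", "", text)  # strip punctuation, keep letters/digits
    return " ".join(text.split())

def is_text_symmetric(real_transcript, fake_text):
    """True when a real utterance's transcript (e.g. produced by an
    XLSR ASR model) matches the text behind its synthetic counterpart."""
    return normalize(real_transcript) == normalize(fake_text)

print(is_text_symmetric("Hello, world!", "hello   world"))  # → True
print(is_text_symmetric("Hello, world!", "goodbye world"))  # → False
```

Normalizing with NFKC and `casefold()` rather than plain lowercasing is deliberate: the corpus spans 20 languages, and both operations behave more predictably on non-Latin scripts.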