Every Breath You Don't Take

News Articles

As we do not own the rights to the audio clips that we use, we provide links to each file for independent download.

We cannot guarantee that each news article will remain available; at the time of publication, all links were verified as working.

Some links are direct downloads, while others are pointers to the webpage of the news article itself.

Training Samples: link

Synthetic Samples: link

There are 56 real samples totaling 25.54 hours and 277 synthetic samples totaling 26.94 hours.

How to Use This Dataset

This dataset is intended as a test bed for evaluating pre-trained synthetic audio detectors.

The real and synthetic samples are divided into training and test splits.
Each line of the train/test files has the format: <CLASS (real/synthetic)> | <DOWNLOAD_URL>
Optionally, the training split may be used to augment the training of existing pre-trained models (we do this in the paper with the SSL-wav2vec ASVspoof model).
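A minimal sketch of loading one of the split files in Python. It assumes each non-empty line holds the class name and the download URL separated by a single `|`, with optional surrounding whitespace and without the angle brackets (which are placeholders in the format description); check this against the actual files. The function name `parse_split_file` and the example URLs are hypothetical. Labels follow the convention given in this guide: synthetic = 1, real = 0.

```python
from typing import Iterable, List, Tuple

# Label convention from this guide: synthetic audio is the positive class (1).
LABELS = {"real": 0, "synthetic": 1}


def parse_split_file(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Parse '<CLASS> | <DOWNLOAD_URL>' lines into (label, url) pairs.

    Hypothetical helper: assumes one entry per line and skips blank lines.
    """
    entries = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        # Split on the first '|' only, so a URL containing '|' stays intact.
        cls, sep, url = line.partition("|")
        if not sep:
            raise ValueError(f"line {lineno}: missing '|' separator")
        cls = cls.strip().lower()
        if cls not in LABELS:
            raise ValueError(f"line {lineno}: unknown class {cls!r}")
        entries.append((LABELS[cls], url.strip()))
    return entries


if __name__ == "__main__":
    # Hypothetical example lines standing in for a real split file.
    demo = [
        "real | https://example.com/clip_a.mp3",
        "synthetic | https://example.com/clip_b.wav",
    ]
    print(parse_split_file(demo))
    # [(0, 'https://example.com/clip_a.mp3'), (1, 'https://example.com/clip_b.wav')]
```

Because a file object iterates over its lines, the same function can be applied to a real split file with `with open(path) as f: entries = parse_split_file(f)`, where `path` is whichever train or test file was downloaded.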

Synthetic audio is the positive class (the class of interest) and should be labeled 1; real audio is the negative class and should be labeled 0.

The final output should be a single binary prediction for each audio sample.
If a model limits its input length, split each sample into subsamples, then perform a soft vote over the subsample predictions (i.e. combine their predicted probabilities rather than their hard decisions) to produce a single prediction per sample.
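A minimal sketch of the split-and-vote procedure, assuming the detector returns, for each subsample, a probability that it is synthetic. The soft vote is implemented here as averaging those probabilities and thresholding the mean at 0.5; the threshold, keeping the final shorter chunk as-is, and all helper names are assumptions, and `score_fn` is a hypothetical stand-in for a real model's inference call.

```python
from typing import Callable, List, Sequence


def split_into_chunks(samples: Sequence[float], chunk_len: int) -> List[Sequence[float]]:
    """Split a waveform into consecutive chunks of at most chunk_len samples.

    The final, shorter chunk is kept (an assumption; it could also be
    padded or dropped, depending on the model).
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def soft_vote(chunk_scores: Sequence[float], threshold: float = 0.5) -> int:
    """Average per-chunk P(synthetic) and threshold the mean.

    Returns 1 (synthetic) or 0 (real).
    """
    if not chunk_scores:
        raise ValueError("need at least one chunk score")
    mean_score = sum(chunk_scores) / len(chunk_scores)
    return int(mean_score >= threshold)


def predict_sample(
    samples: Sequence[float],
    chunk_len: int,
    score_fn: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
) -> int:
    """Chunk one full-length sample, score every chunk, and soft-vote."""
    chunks = split_into_chunks(samples, chunk_len)
    scores = [score_fn(chunk) for chunk in chunks]
    return soft_vote(scores, threshold)
```

For example, `predict_sample(waveform, chunk_len=64000, score_fn=my_detector)` would score 4-second windows of 16 kHz audio, where `waveform` and `my_detector` are hypothetical. Averaging probabilities lets confident chunks outweigh ambiguous ones, whereas a hard (majority) vote over binary chunk decisions discards that information; the mean score can also serve as the per-sample score for ranking metrics such as EER and AUPRC.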

We recommend reporting a suite of metrics to allow for proper comparison and contextualization:
EER (although the metric is deprecated, it is still the community standard)
AUPRC (because this metric focuses on the class of interest, synthetic speech)
Precision, recall, and F1
Total counts of true positives, false positives, true negatives, and false negatives
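The metrics listed above can be computed from per-sample labels (1 = synthetic, 0 = real), binary predictions, and continuous scores. Below is a minimal, dependency-free sketch, assuming a higher score means "more likely synthetic". AUPRC is estimated as step-wise average precision (one common estimator; trapezoidal interpolation gives slightly different values), and EER is approximated by sweeping the observed scores as thresholds and taking the point where the false positive and false negative rates are closest, without interpolation. These choices are assumptions for illustration and may differ from the evaluation code used in the paper.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP/FP/TN/FN with synthetic = 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t == 1 and p == 1),
        "FP": sum(1 for t, p in pairs if t == 0 and p == 1),
        "TN": sum(1 for t, p in pairs if t == 0 and p == 0),
        "FN": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def precision_recall_f1(c: Dict[str, int]) -> Tuple[float, float, float]:
    """Positive-class precision, recall, F1; 0.0 when undefined (assumed convention)."""
    tp, fp, fn = c["TP"], c["FP"], c["FN"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise average precision; tied scores form a single threshold."""
    n_pos = sum(1 for t in y_true if t == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive (synthetic) sample")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample whose score equals the current threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall, precision = tp / n_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def equal_error_rate(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Approximate EER at the threshold where FPR and FNR are closest.

    Predicts synthetic when score >= threshold. O(n^2), which is fine for
    a few hundred samples.
    """
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both real and synthetic samples")
    best_gap, eer = float("inf"), 1.0
    for th in sorted(set(scores)) + [float("inf")]:
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= th)
        fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < th)
        fpr, fnr = fp / n_neg, fn / n_pos
        if abs(fpr - fnr) < best_gap:
            best_gap, eer = abs(fpr - fnr), (fpr + fnr) / 2
    return eer


if __name__ == "__main__":
    y_true = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]  # hypothetical per-sample P(synthetic)
    y_pred = [int(s >= 0.5) for s in scores]
    counts = confusion_counts(y_true, y_pred)
    print(counts)  # {'TP': 2, 'FP': 1, 'TN': 1, 'FN': 0}
    print(precision_recall_f1(counts))
    print(average_precision(y_true, scores), equal_error_rate(y_true, scores))
```

Libraries such as scikit-learn offer `confusion_matrix`, `precision_recall_fscore_support`, and `average_precision_score` for the same quantities; the standalone version above only makes each definition explicit.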
