Software
Framework for assessing self-supervised learning (SSL) representations for speech processing
https://github.com/LeBenchmark
LeBenchmark (http://lebenchmark.com) is a reproducible benchmark for evaluating speech SSL models on four downstream speech tasks in French:
Speech Recognition (ASR)
Spoken Language Understanding (SLU)
Speech Translation (AST)
Emotion Recognition (AER).
Models: https://huggingface.co/LeBenchmark
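The released checkpoints are wav2vec 2.0-style models, whose convolutional front end downsamples 16 kHz audio to roughly 50 feature frames per second. As an illustration, the sketch below computes how many SSL frames a waveform yields, assuming the standard wav2vec 2.0 kernel/stride configuration (an assumption — check each checkpoint's config):

```python
# Sketch: number of feature frames a wav2vec 2.0-style encoder produces.
# Kernel sizes and strides below are the standard wav2vec 2.0 front-end
# configuration -- an assumption, not read from the released checkpoints.
KERNELS = (10, 3, 3, 3, 3, 2, 2)
STRIDES = (5, 2, 2, 2, 2, 2, 2)

def num_ssl_frames(num_samples: int) -> int:
    """Output length after the stack of unpadded 1-D convolutions."""
    length = num_samples
    for k, s in zip(KERNELS, STRIDES):
        length = (length - k) // s + 1
    return length

# One second of 16 kHz audio -> 49 frames (about 50 Hz, 20 ms hop).
print(num_ssl_frames(16000))  # 49
```

The overall stride is 320 samples, so downstream task heads see one representation vector per ~20 ms of audio.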
Papers:
Task Agnostic and Task Specific Self-Supervised Learning from Speech with LeBenchmark at NeurIPS 2021 Datasets and Benchmarks Track
LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech at Interspeech 2021
Privacy preserving speech processing software
Baseline voice anonymization systems and evaluation software for the VoicePrivacy 2022 Challenge.
Baseline voice anonymization systems and evaluation software for the VoicePrivacy 2020 Challenge:
Implementation of two baseline voice anonymization systems:
Baseline-1: Anonymization using x-vectors and neural waveform models
Baseline-2: Anonymization using McAdams coefficient
Metrics integrated in the setup to assess anonymization:
EER
Cllr and Cllr_min
Zero Evidence Biometric Recognition Assessment (ZEBRA) framework metrics: expected privacy disclosure (population) and worst-case privacy disclosure (individual)
Linkability
Voice Similarity Matrices
De-identification
Gain of Voice Distinctiveness
WER (utility metric)
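EER is the primary privacy metric: the operating point of the attacker's speaker verification system where the false-acceptance and false-rejection rates are equal. As a generic illustration (not the challenge's actual scoring code), it can be computed from genuine and impostor trial scores like this:

```python
def equal_error_rate(genuine, impostor):
    """Sketch of EER computation from ASV trial scores.
    Sweeps thresholds over the observed scores and returns
    (FAR + FRR) / 2 at the closest FAR/FRR crossing."""
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(g < t for g in genuine) / len(genuine)     # rejected targets
        far = sum(i >= t for i in impostor) / len(impostor)  # accepted impostors
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

# Perfectly separated scores -> EER of 0
print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # 0.0
```

For anonymization, a higher EER against the attacker means better privacy, while WER on the anonymized speech measures retained utility.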
Papers:
2020:
Design Choices for X-Vector Based Speaker Anonymization at Interspeech 2020
Speaker anonymisation using the McAdams coefficient at Interspeech 2021
The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment at Interspeech 2020
Speech Pseudonymisation Assessment Using Voice Similarity Matrices at Interspeech 2020
Supplementary material to the paper “The VoicePrivacy 2020 Challenge: Results and findings”
Automatic speech recognition software
Implementation of the mixup (between-class learning) technique for ASR training, and its extension to sequence-trained neural networks with lattice-free MMI (in Kaldi).
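At its core, mixup interpolates pairs of training examples and their targets with a coefficient λ drawn from a Beta distribution. The sketch below shows that core operation on feature matrices (names like `mixup_pair` are illustrative, and this is not the Kaldi implementation — in the sequence-trained lattice-free MMI case the supervision is combined at the sequence level rather than per-frame):

```python
import numpy as np

def mixup_pair(feats_a, feats_b, targets_a, targets_b,
               lam=None, alpha=0.5, rng=None):
    """Mix two same-shape feature matrices (frames x dims) and their
    soft targets.  lam may be fixed for reproducibility; otherwise it
    is drawn from Beta(alpha, alpha) as in the original mixup recipe."""
    if lam is None:
        rng = rng or np.random.default_rng()
        lam = rng.beta(alpha, alpha)
    mixed_feats = lam * feats_a + (1.0 - lam) * feats_b
    mixed_targets = lam * targets_a + (1.0 - lam) * targets_b
    return mixed_feats, mixed_targets
```

Interpolating targets as well as inputs is what distinguishes mixup from plain additive data augmentation: the model is trained to behave linearly between training examples.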
Data
Paper: TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation
Contents:
2351 audio talks in NIST SPHERE format (SPH), including talks from TED-LIUM 2
452 hours of audio
2351 aligned automatic transcripts in STM format
Dictionary with pronunciations (159,848 entries), same file as the one included in TED-LIUM 2
Two corpus distributions:
the legacy distribution, in which the dev and test sets are the same as in TED-LIUM 2 (and TED-LIUM 1);
the ‘speaker adaptation’ distribution, designed specifically for speaker adaptation experiments.
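The STM transcripts mentioned above follow the NIST layout: waveform id, channel, speaker, begin time, end time, a label field, then the transcript text. A minimal parser for one such line might look like this (the field names and the example line are illustrative, and the label field is assumed to be present on every line):

```python
def parse_stm_line(line):
    """Sketch: split one STM transcript line into its seven fields.
    Assumed layout: waveform channel speaker begin end <labels> transcript."""
    waveform, channel, speaker, begin, end, labels, transcript = \
        line.split(maxsplit=6)
    return {
        "waveform": waveform,
        "channel": channel,
        "speaker": speaker,
        "begin": float(begin),      # segment start, seconds
        "end": float(end),          # segment end, seconds
        "labels": labels,           # e.g. gender / bandwidth tags
        "transcript": transcript,
    }

# Hypothetical TED-LIUM-style line:
seg = parse_stm_line(
    "AlGore_2009 1 AlGore_2009 15.97 19.37 <o,f0,male> thank you very much")
print(seg["transcript"])           # thank you very much
print(seg["end"] - seg["begin"])   # segment duration in seconds
```

Splitting with `maxsplit=6` keeps the whole transcript, spaces included, in the final field.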
Related software links: