Software
Framework for assessing self-supervised learning (SSL) representations for speech processing
https://github.com/LeBenchmark
LeBenchmark (http://lebenchmark.com) is a reproducible benchmark for evaluating speech SSL models on four downstream speech tasks in French:
Speech Recognition (ASR)
Spoken Language Understanding (SLU)
Speech Translation (AST)
Emotion Recognition (AER).
Models: https://huggingface.co/LeBenchmark
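The released checkpoints are wav2vec 2.0-style models, whose convolutional front end downsamples 16 kHz audio to roughly 50 feature frames per second. As an illustration, the sketch below computes how many SSL frames a waveform yields, assuming the standard wav2vec 2.0 kernel/stride configuration (an assumption — check each checkpoint's config):

```python
# Sketch: number of feature frames a wav2vec 2.0-style encoder produces.
# Kernel sizes and strides below are the standard wav2vec 2.0 front-end
# configuration -- an assumption, not read from the released checkpoints.
KERNELS = (10, 3, 3, 3, 3, 2, 2)
STRIDES = (5, 2, 2, 2, 2, 2, 2)

def num_ssl_frames(num_samples: int) -> int:
    """Output length after the stack of unpadded 1-D convolutions."""
    length = num_samples
    for k, s in zip(KERNELS, STRIDES):
        length = (length - k) // s + 1
    return length

# One second of 16 kHz audio -> 49 frames (about 50 Hz, 20 ms hop).
print(num_ssl_frames(16000))  # 49
```

The overall stride is 320 samples, so downstream task heads see one representation vector per ~20 ms of audio.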
Papers:
Task Agnostic and Task Specific Self-Supervised Learning from Speech with LeBenchmark at NeurIPS 2021 Datasets and Benchmarks Track
LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech at Interspeech 2021
Privacy preserving speech processing software
Baseline voice anonymization systems and evaluation software for the VoicePrivacy 2022 Challenge.
Baseline voice anonymization systems and evaluation software for the VoicePrivacy 2020 Challenge:
Implementation of two baseline voice anonymization systems:
Baseline-1: Anonymization using x-vectors and neural waveform models
Baseline-2: Anonymization using McAdams coefficient
Metrics integrated in the setup to assess anonymization:
EER
Cllr and Cllr_min
Zero Evidence Biometric Recognition Assessment (ZEBRA) framework metrics: expected privacy disclosure (population) and worst-case privacy disclosure (individual)
Linkability
Voice Similarity Matrices
De-identification
Gain of Voice Distinctiveness
WER (utility metric)
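EER is the primary privacy metric: the operating point of the attacker's speaker verification system where the false-acceptance and false-rejection rates are equal. As a generic illustration (not the challenge's actual scoring code), it can be computed from genuine and impostor trial scores like this:

```python
def equal_error_rate(genuine, impostor):
    """Sketch of EER computation from ASV trial scores.
    Sweeps thresholds over the observed scores and returns
    (FAR + FRR) / 2 at the closest FAR/FRR crossing."""
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(g < t for g in genuine) / len(genuine)     # rejected targets
        far = sum(i >= t for i in impostor) / len(impostor)  # accepted impostors
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

# Perfectly separated scores -> EER of 0
print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # 0.0
```

For anonymization, a higher EER against the attacker means better privacy, while WER on the anonymized speech measures retained utility.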
Papers:
2020:
Design Choices for X-Vector Based Speaker Anonymization at Interspeech 2020
Speaker anonymisation using the McAdams coefficient at Interspeech 2021
The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment at Interspeech 2020
Speech Pseudonymisation Assessment Using Voice Similarity Matrices at Interspeech 2020
Supplementary material to the paper “The VoicePrivacy 2020 Challenge: Results and findings”
Automatic speech recognition software
Implementation of the mixup (between-class learning) technique for ASR training, and its extension to sequence-trained neural networks with lattice-free MMI (in Kaldi).
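At its core, mixup interpolates pairs of training examples and their targets with a coefficient λ drawn from a Beta distribution. The sketch below shows that core operation on feature matrices (names like `mixup_pair` are illustrative, and this is not the Kaldi implementation — in the sequence-trained lattice-free MMI case the supervision is combined at the sequence level rather than per-frame):

```python
import numpy as np

def mixup_pair(feats_a, feats_b, targets_a, targets_b,
               lam=None, alpha=0.5, rng=None):
    """Mix two same-shape feature matrices (frames x dims) and their
    soft targets.  lam may be fixed for reproducibility; otherwise it
    is drawn from Beta(alpha, alpha) as in the original mixup recipe."""
    if lam is None:
        rng = rng or np.random.default_rng()
        lam = rng.beta(alpha, alpha)
    mixed_feats = lam * feats_a + (1.0 - lam) * feats_b
    mixed_targets = lam * targets_a + (1.0 - lam) * targets_b
    return mixed_feats, mixed_targets
```

Interpolating targets as well as inputs is what distinguishes mixup from plain additive data augmentation: the model is trained to behave linearly between training examples.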
Data
Paper: TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation
Contents:
2351 audio talks in NIST SPHERE format (SPH), including talks from TED-LIUM 2
452 hours of audio
2351 aligned automatic transcripts in STM format
Dictionary with pronunciations (159,848 entries), same file as the one included in TED-LIUM 2
Two corpus distributions:
the legacy distribution, in which the dev and test sets are the same as in TED-LIUM 2 (and TED-LIUM 1);
the ‘speaker adaptation’ distribution, designed specifically for speaker adaptation experiments.
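The STM transcripts mentioned above follow the NIST layout: waveform id, channel, speaker, begin time, end time, a label field, then the transcript text. A minimal parser for one such line might look like this (the field names and the example line are illustrative, and the label field is assumed to be present on every line):

```python
def parse_stm_line(line):
    """Sketch: split one STM transcript line into its seven fields.
    Assumed layout: waveform channel speaker begin end <labels> transcript."""
    waveform, channel, speaker, begin, end, labels, transcript = \
        line.split(maxsplit=6)
    return {
        "waveform": waveform,
        "channel": channel,
        "speaker": speaker,
        "begin": float(begin),      # segment start, seconds
        "end": float(end),          # segment end, seconds
        "labels": labels,           # e.g. gender / bandwidth tags
        "transcript": transcript,
    }

# Hypothetical TED-LIUM-style line:
seg = parse_stm_line(
    "AlGore_2009 1 AlGore_2009 15.97 19.37 <o,f0,male> thank you very much")
print(seg["transcript"])           # thank you very much
print(seg["end"] - seg["begin"])   # segment duration in seconds
```

Splitting with `maxsplit=6` keeps the whole transcript, spaces included, in the final field.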
Related software links: