Resources

The SpeechBrain project

SpeechBrain is an open-source, all-in-one speech toolkit based on PyTorch.

The goal is to create a single, flexible, and user-friendly toolkit that can be used to easily develop state-of-the-art speech technologies, including systems for speech recognition (both end-to-end and HMM-DNN), speaker recognition, speech separation, multi-microphone signal processing (e.g., beamforming), self-supervised and unsupervised learning, speech contamination/augmentation, and many others. The toolkit will be designed as a stand-alone framework, but simple interfaces with well-known toolkits, such as Kaldi, will also be implemented.

SpeechBrain is currently under development and was announced in September 2019. A first alpha version will be available in the coming months.

https://speechbrain.github.io/

PyTorch-Kaldi:

The PyTorch-Kaldi project aims to bridge the gap between the Kaldi and PyTorch toolkits, trying to inherit the efficiency of Kaldi and the flexibility of PyTorch. PyTorch-Kaldi is not just a simple interface between the two toolkits: it embeds several useful features for developing modern speech recognizers. I think it could also be particularly useful for people who would like to try their models on a speech recognition task but are not fully familiar with the rather complex pipeline of a modern speech recognizer.
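
To give a flavor of the hybrid DNN-HMM setup the toolkit automates, here is a minimal, hypothetical PyTorch sketch (not PyTorch-Kaldi's actual plug-in interface, which is driven by its configuration files) of the kind of acoustic model involved: it maps Kaldi feature frames to log-posteriors over context-dependent HMM states, which a Kaldi decoder then turns into word sequences. All dimensions below are illustrative.

    import torch
    import torch.nn as nn

    class MLPAcousticModel(nn.Module):
        # Hypothetical acoustic model: Kaldi-extracted feature frames
        # (e.g., MFCCs spliced with neighboring frames) go in, and
        # log-posteriors over context-dependent HMM states come out;
        # Kaldi handles feature extraction and decoding on either side.
        def __init__(self, feat_dim=440, hidden_dim=1024, num_states=2000):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(feat_dim, hidden_dim), nn.ReLU(), nn.Dropout(0.15),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Dropout(0.15),
                nn.Linear(hidden_dim, num_states),
            )

        def forward(self, feats):  # feats: (batch, feat_dim)
            return torch.log_softmax(self.net(feats), dim=-1)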

https://github.com/mravanelli/pytorch-kaldi

SincNet:

SincNet is a neural architecture for processing raw audio samples. It is a novel Convolutional Neural Network (CNN) that encourages the first convolutional layer to discover more meaningful filters. SincNet is based on parametrized sinc functions, which implement band-pass filters: only the low and high cutoff frequencies of each filter are learned from data.
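
As a rough illustration of this idea, here is a minimal PyTorch sketch of a sinc-based convolutional layer in which only the cutoff frequencies are trainable. The parameter names and initialization are illustrative; the reference implementation in the repository below also adds mel-scale initialization and filter normalization.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SincBandPass(nn.Module):
        # Each kernel is a band-pass filter obtained as the difference of
        # two sinc low-pass filters, so only two scalars per filter (low
        # cutoff and band width) are learned instead of all kernel taps.
        def __init__(self, num_filters=80, kernel_size=251, sample_rate=16000):
            super().__init__()
            self.kernel_size = kernel_size
            # Learnable low cutoffs and band widths, in Hz (arbitrary init here).
            self.f_low = nn.Parameter(torch.linspace(30.0, 6000.0, num_filters))
            self.f_band = nn.Parameter(torch.full((num_filters,), 100.0))
            n = torch.arange(kernel_size) - (kernel_size - 1) / 2
            self.register_buffer("n", n / sample_rate)  # time axis in seconds
            self.register_buffer("window", torch.hamming_window(kernel_size))

        def forward(self, x):  # x: (batch, 1, samples), raw waveform
            f1 = torch.abs(self.f_low)        # low cutoff (Hz)
            f2 = f1 + torch.abs(self.f_band)  # high cutoff (Hz)
            def lowpass(f):                   # ideal low-pass with cutoff f
                return 2 * f.unsqueeze(1) * torch.sinc(2 * f.unsqueeze(1) * self.n)
            filters = (lowpass(f2) - lowpass(f1)) * self.window
            return F.conv1d(x, filters.unsqueeze(1), padding=self.kernel_size // 2)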

https://github.com/mravanelli/SincNet

DIRHA ENGLISH CORPUS

The corpus is composed of both real and simulated sequences recorded with 32 sample-synchronized microphones in a domestic environment. The database contains signals with different characteristics in terms of noise and reverberation, making it suitable for various multi-microphone signal processing and distant speech recognition tasks. The part of the dataset currently released is composed of 6 native US speakers (3 males, 3 females) uttering 409 WSJ sentences. The repositories below provide the related Kaldi recipes and the tools necessary to generate the training material for a Kaldi-based distant speech recognizer.

The corpus can be downloaded from the Linguistic Data Consortium. The baselines can be found here:

https://github.com/SHINE-FBK/DIRHA_English_wsj

https://github.com/SHINE-FBK/DIRHA_English_phrich

DIRHA SIMULATED CORPUS

The DIRHA II Simulated Corpus is a multi-microphone, multi-room and multi-language database generated in the context of the DIRHA project.

The overall corpus, which is now available in 4 different languages (Italian, German, Portuguese, and Greek), includes 675 simulated acoustic sequences of 60 seconds each, observed by 40 microphones distributed over 5 different rooms (living room, kitchen, bedroom, bathroom, and corridor) of a real apartment. The sampling rate is 48 kHz.

Each sequence consists of real background noise (actually recorded in the target apartment) with various localized acoustic events (speech and noise) superimposed, occurring randomly in time and space with various amplification gains.

All the occurrences, which were generated by means of a multi-microphone simulation tool (MMSS) implemented in Matlab, are completely documented and fully annotated in dedicated text files.
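
As a rough sketch of how one might assemble a full sequence for multi-microphone processing, using the soundfile library (the file names and layout below are hypothetical; the actual organization depends on the release):

    import numpy as np
    import soundfile as sf

    # Hypothetical layout: one 48 kHz mono WAV per microphone for a given
    # simulated sequence (see http://dirha.fbk.eu/simcorpora for the real one).
    mic_files = ["sim_seq_001/mic_%02d.wav" % i for i in range(1, 41)]

    channels = []
    for path in mic_files:
        signal, rate = sf.read(path)  # one 60 s channel at 48 kHz
        assert rate == 48000
        channels.append(signal)

    # (40, 2880000) array: 40 microphones x 60 s x 48000 samples/s.
    multichannel = np.stack(channels)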

Thanks to its realism and the large number of microphones and positions, this corpus is suitable for distant-talking speech recognition, acoustic localization, multi-microphone signal processing, acoustic echo cancellation (AEC), source separation, acoustic event detection and classification, and speech/non-speech discrimination.

Several tests have demonstrated the suitability of the corpus for experiments under the DIRHA project.

We plan to soon extend the corpus by including an English version.

Data will be made publicly available at the end of the DIRHA project (December 2014).

Parts of the corpus have been released under the following initiatives:

- HSCMA 2014

- EUSIPCO 2014

- EVALITA 2014

You can find some samples here:

- Single channel example

- 6 complete sequences (1GB, 40 channels)

For more info: http://dirha.fbk.eu/simcorpora

MAIN PAPER:

    • L. Cristoforetti, M. Ravanelli, M. Omologo, A. Sosi, A. Abad, M. Hagmueller, P. Maragos, "The DIRHA simulated corpus", in Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), 2014, pp. 2629-2634.

RELATED PAPERS:

    • A. Brutti, M. Ravanelli, P. Svaizer, M. Omologo, "A speech event detection and localization task for multiroom environments", in Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2014), Nancy, France.
    • M. Matassoni, R. Astudillo, A. Katsamanis, M. Ravanelli, "The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones", in Proceedings of INTERSPEECH 2014, Singapore.
    • M. Ravanelli, A. Sosi, P. Svaizer, M. Omologo, "Impulse response estimation for robust speech recognition in a reverberant environment", in Proceedings of EUSIPCO 2012.