Brazilian Speech Database

What is BrSD?

BrSD is a novel, freely available dataset created to support the development of speech-based recognition tasks. It is composed of 400 utterances from 80 different contributors, all of them Brazilian Portuguese speakers. It is recommended for speech-based classification tasks such as age, gender and accent recognition.

Who developed BrSD?

The database was developed by Marco Aurélio Deoldoto Paulino, Alisson Renan Svaigen and Yandre Maldonado e Gomes da Costa from the State University of Maringá (UEM), Maringá, Paraná, Brazil.

The following researchers also contributed to the development of BrSD: Linnyer Beatrys Ruiz Aylon, from the State University of Maringá (UEM); Alceu S. Britto Jr., from the Pontifical Catholic University of Paraná (PUC-PR); and Luiz E. S. Oliveira, from the Federal University of Paraná (UFPR).

Content

This website provides researchers with the following items:

BrSD files;
BrSD spectrograms;
Audio and Visual Features generated by the database creators;
Index of papers that use BrSD;
Contact information.

BrSD Specification

A comprehensive specification of the BrSD structure, speakers, utterances and the studies carried out with the dataset is available here.

BrSD Content Files

You can download BrSD version 1.0 here.

The zip file is organized as follows:

A directory named "utterances", containing the 400 utterances in WMA format;
A PDF file named "utterances_info", containing metadata about the utterances, such as speaker, language, age and gender.
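A quick way to check a downloaded copy against the layout described above is to inspect the archive with Python's standard `zipfile` module. This is only a sketch: the archive file name used below is a placeholder, and the function merely assumes that the audio files sit somewhere under a directory named `utterances` and that the metadata file is named `utterances_info.pdf`.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_brsd_zip(zip_source):
    """Count files per extension under any "utterances" directory in the archive
    and report whether an "utterances_info.pdf" file is present.

    zip_source may be a path or a binary file-like object (assumed layout, see text).
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = archive.namelist()

    extensions = Counter()
    has_info_pdf = False
    for name in names:
        if name.endswith("/"):
            continue  # skip explicit directory entries
        path = PurePosixPath(name)
        # Count the file if one of its parent directories is named "utterances".
        if "utterances" in path.parts[:-1]:
            extensions[path.suffix.lower()] += 1
        if path.name.lower() == "utterances_info.pdf":
            has_info_pdf = True
    return extensions, has_info_pdf
```

For the full release, calling `summarize_brsd_zip("BrSD_v1.0.zip")` (hypothetical file name) should report 400 files with the `.wma` extension and `True` for the metadata file if the archive matches the description above.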

BrSD Spectrograms

Spectrograms are a visual representation of how the frequency content of an audio signal varies over time, and they have been widely used in many different audio classification tasks.

We provide spectrograms for all 400 BrSD utterances. They were generated with SoX, a free software tool that can convert between many audio file formats. The spectrograms have a frequency limit of 3.4 kHz and an amplitude range from -60 dBFS to 0 dBFS.
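The settings above can be approximated with a SoX command such as the one built by the sketch below. It assumes each utterance has already been converted from WMA to WAV (SoX builds commonly lack WMA support), and the file names are placeholders. The 3.4 kHz limit is obtained by resampling to 6.8 kHz, because the SoX `spectrogram` effect plots frequencies from 0 Hz up to half the sample rate; `-z 60` sets a 60 dB amplitude range below the default 0 dBFS maximum. Other settings the creators may have used (image size, window function) are not stated here, so this is not the original invocation.

```python
import shlex


def sox_spectrogram_command(wav_path, png_path, max_freq_hz=3400, db_range=60):
    """Build a SoX command rendering a spectrogram limited to max_freq_hz
    with an amplitude range of -db_range dBFS to 0 dBFS."""
    # The frequency axis runs from 0 Hz to the Nyquist frequency (sample_rate / 2),
    # so resampling to 2 * max_freq_hz caps the plotted band at max_freq_hz.
    sample_rate = 2 * max_freq_hz
    return [
        "sox", wav_path,
        "-n",                      # null output: no audio file is written
        "rate", str(sample_rate),  # resample so that Nyquist equals max_freq_hz
        "spectrogram",
        "-z", str(db_range),       # amplitude (Z-axis) range in dB below 0 dBFS
        "-o", png_path,            # PNG image to write
    ]


cmd = sox_spectrogram_command("utterance.wav", "utterance.png")
print(shlex.join(cmd))
```

On a machine with SoX installed, the returned list can be passed directly to `subprocess.run(cmd, check=True)`; keeping the command as a list avoids shell quoting problems with file names.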

Spectrograms can be downloaded here.

Audio and Visual Features

We provide four acoustic features and one visual feature for researchers. The available features are:

Local Binary Pattern (LBP), computed with 8 neighboring pixels equally spaced on a circle with a radius of 2 pixels;
Mel Frequency Cepstral Coefficients (MFCC), computed over 50 equally spaced windows of 30 milliseconds each;
Rhythm Pattern (RP);
Rhythm Histogram (RH);
Statistical Spectrum Descriptor (SSD).
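To make the LBP parameters concrete, here is a minimal pure-Python sketch of a basic 256-bin LBP operator with 8 neighbors on a circle of radius 2, using bilinear interpolation for neighbors that fall between pixels. Whether the creators used this variant, uniform patterns or a particular library is not stated here; the function names are hypothetical, and the image is assumed to be a grey-level spectrogram given as a list of rows.

```python
import math


def bilinear(image, y, x):
    """Bilinearly interpolate a grey-level image (list of rows) at real coordinates (y, x)."""
    y0, x0 = math.floor(y), math.floor(x)
    y1 = min(y0 + 1, len(image) - 1)
    x1 = min(x0 + 1, len(image[0]) - 1)
    dy, dx = y - y0, x - x0
    top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
    bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def lbp_code(image, row, col, radius=2, points=8):
    """LBP code of pixel (row, col) using `points` neighbours on a circle of `radius` pixels."""
    if not (radius <= row < len(image) - radius and radius <= col < len(image[0]) - radius):
        raise ValueError("pixel is too close to the border for this radius")
    center = image[row][col]
    code = 0
    for p in range(points):
        angle = 2 * math.pi * p / points
        # Neighbour p lies on the circle; rounding removes tiny trigonometric noise
        # so that neighbours exactly on the pixel grid are read exactly.
        y = round(row - radius * math.sin(angle), 10)
        x = round(col + radius * math.cos(angle), 10)
        if bilinear(image, y, x) >= center:
            code |= 1 << p  # bit p is set when the neighbour is at least as bright as the centre
    return code


def lbp_histogram(image, radius=2, points=8):
    """Histogram of LBP codes over all pixels at least `radius` pixels from the border."""
    hist = [0] * (2 ** points)
    for r in range(radius, len(image) - radius):
        for c in range(radius, len(image[0]) - radius):
            hist[lbp_code(image, r, c, radius, points)] += 1
    return hist
```

The histogram, rather than the per-pixel codes, is what would typically serve as the feature vector for a classifier.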

RP, RH and SSD were generated with the RP Extract framework using its default parameters.
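The MFCC description can be read as 50 analysis windows of 30 ms each, with start points spread evenly across the utterance. The sketch below computes the sample ranges of such windows under that reading; it is an assumption rather than the creators' code, and the MFCC computation for each window (filterbank size, number of coefficients) is left to whichever MFCC implementation is used.

```python
def equally_spaced_windows(num_samples, sample_rate, num_windows=50, window_ms=30):
    """Return (start, end) sample indices of `num_windows` windows lasting `window_ms`
    milliseconds each, with start points spread evenly over the signal."""
    win_len = int(round(sample_rate * window_ms / 1000))
    if num_samples < win_len:
        raise ValueError("signal is shorter than one window")
    # Latest start position that still keeps the whole window inside the signal.
    last_start = num_samples - win_len
    if num_windows == 1:
        starts = [0]
    else:
        starts = [round(i * last_start / (num_windows - 1)) for i in range(num_windows)]
    return [(s, s + win_len) for s in starts]


# A hypothetical 1-second utterance sampled at 16 kHz.
windows = equally_spaced_windows(num_samples=16000, sample_rate=16000)
print(len(windows), windows[0], windows[-1])
```

Because the windows are anchored to the utterance length rather than to a fixed hop size, every utterance yields the same number of windows, which keeps the resulting feature vectors the same length regardless of duration.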

Features can be downloaded here.

Index of Papers

Paulino, M. A. et al. A Brazilian Speech Database. In: 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2018.

Contact

For suggestions, contributions or direct contact with the creators, you can send an e-mail to:

brazilian.speech.database@gmail.com
