The TaRSila project aims to grow speech datasets for Brazilian Portuguese, seeking state-of-the-art results in the following tasks:

(a) automatic speech recognition (ASR), which transcribes speech into text;

(b) multi-speaker text-to-speech synthesis (TTS), which generates speech in the voices of several different speakers;

(c) speaker identification/verification, which matches a voice against a set of predefined speakers, either in the closed-set scenario, where the speakers were seen during model training, or in the open-set scenario, where verification involves speakers not seen during training; and

(d) voice cloning, which uses only a few minutes or even seconds of recorded speech to train a synthesis model that can read any text in the target voice.
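To make the distinction in (c) concrete, speaker identification and verification are commonly reduced to comparing fixed-size speaker embeddings by cosine similarity. The sketch below assumes embeddings have already been extracted by some encoder; the vectors, speaker names, and threshold value are invented for illustration only:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(test_emb: np.ndarray, enrolled: dict) -> str:
    """Closed-set identification: return the enrolled speaker whose
    embedding is most similar to the test utterance's embedding."""
    return max(enrolled, key=lambda name: cosine_similarity(test_emb, enrolled[name]))

def verify(test_emb: np.ndarray, claimed_emb: np.ndarray, threshold: float = 0.7) -> bool:
    """Verification: accept the claimed identity only if the similarity
    exceeds a threshold tuned on held-out data (0.7 is arbitrary here)."""
    return cosine_similarity(test_emb, claimed_emb) >= threshold

# Toy 3-dimensional embeddings; real systems use e.g. 192- to 512-dim vectors.
enrolled = {
    "speaker_a": np.array([1.0, 0.1, 0.0]),
    "speaker_b": np.array([0.0, 1.0, 0.2]),
}
test = np.array([0.9, 0.2, 0.1])
print(identify(test, enrolled))            # → speaker_a
print(verify(test, enrolled["speaker_a"])) # → True
```

In the open-set scenario the test speaker may be enrolled in none of the models, which is why verification relies on a tuned threshold rather than simply picking the closest enrolled speaker.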

In TaRSila, we manually validated speech datasets from academic projects such as: (i) Nurc-Recife (OLIVEIRA JR, 2016); (ii) SP 2010 (MENDES, 2013); (iii) ALIP (GONÇALVES, 2019); and (iv) C-ORAL Brasil (RASO & MELLO, 2012).

A collection of 365 hours of Museu da Pessoa (MuPe) life stories was processed to become part of our large corpus CORAA (Corpus de Áudios Anotados), and the NURC-SP Audio Corpus was also processed for the purpose of training ASR models. See details of all the datasets created on CORAA Versions.

On the tooling side, we aim to investigate recent deep learning methods for training robust ASR and TTS models for Portuguese.

The project also foresees applications in semantic search over speech transcriptions, as well as sentiment analysis and the automatic organization of speech datasets into topics.