a speech processing project

Project description

This page presents Mr. Falante, a personal project on speech processing in Brazilian Portuguese. The project aims to research, develop, and train Deep Learning (DL) models for speech processing. These include models for speech synthesis (Text-to-Speech, TTS), speech transcription (Speech-to-Text, STT), speaker recognition, speaker diarization, denoising, upsampling, and more.

Artificial intelligence

Artificial Intelligence (AI) can be seen as the ability of a machine to reproduce human-like skills. It involves developing systems that learn autonomously, recognizing patterns and generating insights without being explicitly programmed to do so. Unlike traditional programs, AI-based models learn to extract patterns from examples, i.e., data. DL-based models need large amounts of data: by learning from experience, they become able to perform tasks much as humans do.

Neural Networks are computational models inspired by the human brain.

Speech transcription (STT) models based on artificial neural networks are the current state of the art.

Learn more here.

The Wav2Lip tool uses artificial neural networks for lip syncing in videos.

Learn more here.

Deep learning models for speech synthesis (TTS) produce results close to human speech.

Learn more here.

Voice cloning models allow you to clone a person's voice from a small sample.

Learn more here.


Get in touch by email
