Projects

Text-to-Speech (TTS) is speech synthesis: transforming text into audio. Deep Learning makes it possible to synthesize speech that sounds very close to human speech. So forget the monotonous, robotic voice typical of older speech synthesizers. With Deep Learning, the synthesized voice can be nearly indistinguishable from a human voice and can reproduce prosodic features such as intonation, rhythm, and stress.

Speech-to-Text (STT) refers to speech transcription, i.e. turning audio into text. Traditional techniques relied on statistical models such as Hidden Markov Models (HMMs). Techniques based on Deep Learning achieve much higher accuracy, but they require large amounts of data to train the models.

Wav2Lip is a neural network that takes a video of a face and lip-syncs it to a new speech audio track, different from the original one. It relies on state-of-the-art neural network models to synchronize the lip movements in the video with the audio. This kind of technology can also be used to create deepfakes.

Voice Cloning lets you clone a person's voice, for example that of a famous person, from a small sample. This makes it possible to synthesize speech in that person's voice from a recording of less than one minute.
