CS 639: Final Project - Text-to-Speech

Text-To-Speech Conversion

Why Text-To-Speech:

After using OCR to extract text from speeches, we pass the result to a text-to-speech converter. For people who might still have a tough time reading what OCR has extracted for them, hearing the text read aloud will help them understand the image contents even better.

Implementation in VRT:

We implement Text-to-Speech in VRT using a Python library called pyttsx3. With pyttsx3 we use the eSpeak speech engine, a compact open-source speech synthesis engine that supports a variety of languages. To have the text spoken in English, we explicitly select the voice at index 0 of the voices array provided by pyttsx3. The speech is then played aloud and stored to an mp3 file.
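The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact VRT code: the function name speak_and_save and its parameters are hypothetical, and we assume that index 0 of the voices array is an English voice, which depends on the voices installed on the machine. The optional engine argument is only there so the function can be tried without audio hardware.

```python
def speak_and_save(text, out_path, engine=None):
    """Speak `text` aloud and also write the audio to `out_path`.

    `speak_and_save` is a hypothetical helper name used for illustration.
    """
    if engine is None:
        # Imported lazily so the function can be exercised with a stand-in engine.
        import pyttsx3
        engine = pyttsx3.init()

    # Assumption: the first entry in the voices list is an English voice.
    voices = engine.getProperty("voices")
    engine.setProperty("voice", voices[0].id)

    # Queue the utterance for playback and for saving to a file.
    engine.say(text)
    engine.save_to_file(text, out_path)

    # Process the queued commands; this call blocks until they finish.
    engine.runAndWait()
    return out_path
```

Calling speak_and_save("Hello from VRT", "audio_1.mp3") would speak the sentence and write it to audio_1.mp3. The encoding actually written to the file may depend on the speech driver in use, so the .mp3 extension here simply follows the naming used in our sample output.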

Sample Output:

audio_1.mp3

audio_7.mp3
