Text-To-Speech Conversion
After using OCR to extract text from images, we pass that text to a text-to-speech converter. For people who may have difficulty reading even the text that OCR has extracted for them, hearing it read aloud makes the image contents easier to understand.
We implement text-to-speech in VRT using a Python library called pyttsx3. This library drives the eSpeak speech engine (its default backend on Linux), a compact open-source speech synthesizer that supports a variety of languages. To have the text read in English, we explicitly select the voice at index 0 of the voices array that pyttsx3 provides. The speech is then played aloud and saved to an MP3 file.
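
The steps described above can be sketched roughly as follows. This is a minimal illustration, not the actual VRT code: the function name speak_and_save, its parameters, and the output file name are hypothetical, and it assumes that the voice at index 0 is an English voice, which depends on the voices installed on the machine. The pyttsx3 calls used (init, getProperty, setProperty, say, save_to_file, runAndWait) are part of the library's engine interface.

```python
def speak_and_save(text, out_path="ocr_speech.mp3", voice_index=0, engine=None):
    """Read `text` aloud and also write the speech to `out_path`.

    Hypothetical helper for illustration. `engine` may be any object with the
    pyttsx3 engine interface; when omitted, a real pyttsx3 engine is created.
    """
    if engine is None:
        # Imported here so the sketch can be loaded without a speech backend.
        import pyttsx3
        engine = pyttsx3.init()

    # Pick a voice from the list the engine exposes. Index 0 is ASSUMED to be
    # English; the order of voices varies between systems.
    voices = engine.getProperty("voices")
    engine.setProperty("voice", voices[voice_index].id)

    # Queue two commands: speak the text, and render the same text to a file.
    engine.say(text)
    engine.save_to_file(text, out_path)

    # Process the queued commands and block until both have finished.
    engine.runAndWait()
    return out_path
```

A call such as speak_and_save(extracted_text), where extracted_text is the string returned by the OCR step, would speak the text and write the audio file. Note that the audio format actually written is determined by the underlying speech engine, so the .mp3 extension here follows the description above rather than a guarantee of the library.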