Tutorials

What are the components of a modern speech recognition system?

Building an automatic speech recognition (ASR) system with deep neural networks usually involves training a supervised acoustic model on speech paired with its corresponding text. Performance can be improved with a language model trained only on relevant text (i.e., text from the target language, domain, dialect, etc.). It can be boosted further by obtaining acoustic features from pre-trained self-supervised models such as Wav2Vec2.
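
As a rough illustration of how a language model can help, the sketch below rescores a list of candidate transcripts by adding a weighted language model score to each acoustic model score. This is only one common way of combining the two models, not necessarily what any particular toolkit does. The function names, the toy unigram probabilities, the acoustic scores and the weight of 0.5 are all made up for the example.

```python
import math

# Hypothetical unigram probabilities standing in for a real language model.
TOY_UNIGRAM = {
    "recognize": 0.05,
    "speech": 0.05,
    "wreck": 0.001,
    "a": 0.1,
    "nice": 0.02,
    "beach": 0.01,
}


def toy_lm_logprob(text, floor=1e-6):
    """Log-probability of `text` under the toy unigram model.

    Words missing from the table get a small floor probability.
    """
    return sum(math.log(TOY_UNIGRAM.get(word, floor)) for word in text.split())


def rescore(hypotheses, lm_logprob, lm_weight=0.5):
    """Sort (text, acoustic_logprob) pairs by combined score, best first.

    combined score = acoustic log-prob + lm_weight * language model log-prob
    """
    scored = []
    for text, am_logprob in hypotheses:
        total = am_logprob + lm_weight * lm_logprob(text)
        scored.append((total, text))
    scored.sort(reverse=True)
    return [text for _, text in scored]


# Made-up acoustic scores: the acoustic model alone slightly prefers
# the wrong transcript.
nbest = [("wreck a nice beach", -10.0), ("recognize speech", -10.5)]

best_with_lm = rescore(nbest, toy_lm_logprob)[0]
best_without_lm = rescore(nbest, toy_lm_logprob, lm_weight=0.0)[0]
```

With the language model weight set to zero the ranking follows the acoustic scores alone. With a positive weight, the transcript that the language model finds more plausible can move to the top. In practice the weight is usually tuned on held-out data.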


How do I build a speech recogniser?

There are many high-quality toolkits for training ASR models and decoding with them. Some of them are listed below:

Note that each toolkit has its own data preparation format and some toolkit-specific features. You will find installation instructions and training tutorials in the links above.


Where can I learn more about speech recognition systems?

There are many resources on ASR available online. Some of them are listed below: