We investigate various speech signal processing schemes for acoustic modeling so that more robust speech recognition can be achieved. Our aim is to perform state-of-the-art research that provides effective means for achieving:
Automatic Speech Recognition
Admin 2021-04-08
Contents
1. Introduction
2. Acoustic Modeling of Speech Recognition Unit
3. Statistical Language Modeling
4. Word Network
5. Lexical Decoding
6. Application Demos
1. Introduction
An automatic speech recognition system is composed of feature extraction, acoustic modeling, language modeling, and search. We estimate the parameters of the acoustic models from training data and estimate the language model from text corpora. We then decode the speech signal into a recognized word sequence using the acoustic models, the language model, and the word network.
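The way these components combine during decoding is commonly written as a maximum a posteriori search. The formulation below is a standard textbook sketch rather than a formula taken from this page, and the symbols are assumptions introduced for illustration: O is the sequence of feature vectors produced by feature extraction, W is a candidate word sequence, P(O | W) is the score given by the acoustic models, and P(W) is the score given by the language model. The word network determines which word sequences W are searched.

```latex
\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} P(O \mid W)\, P(W)
```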
2. Acoustic Modeling of Speech Recognition Unit
An acoustic model describes how each speech recognition unit is expressed in the speech signal. At present, the most frequently used acoustic model is the HMM (Hidden Markov Model). Each HMM models the temporal and spectral variation of one speech recognition unit. We estimate the parameters of the acoustic models using training data.
The choice of speech recognition units
Whole words: context-independent, context-dependent.
Subword segments: phone, syllable, semisyllable, triphone, diphone, etc.
The training of speech recognition unit model
Baum-Welch algorithm
Discriminative training
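Both Baum-Welch training and recognition rely on computing how likely an observation sequence is under a given HMM. The following is a minimal sketch of the scaled forward algorithm for a discrete-observation, left-to-right HMM. The three-state model, its probabilities, and the names forward_log_likelihood, PI, A and B are hypothetical and chosen only for illustration. Practical systems typically model continuous feature vectors rather than discrete symbols.

```python
import math

def forward_log_likelihood(pi, A, B, obs):
    """Return log P(obs | model) for a discrete HMM using the scaled forward pass.

    pi[i]   : probability of starting in state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    obs     : list of observed symbol indices
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate the forward probabilities one step, then emit obs[t].
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        # Rescale to avoid underflow on long sequences; accumulate the log.
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

# Hypothetical 3-state left-to-right model with 2 output symbols.
PI = [1.0, 0.0, 0.0]
A = [[0.6, 0.4, 0.0],
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]]
B = [[0.8, 0.2],
     [0.3, 0.7],
     [0.5, 0.5]]

if __name__ == "__main__":
    print(math.exp(forward_log_likelihood(PI, A, B, [0, 1])))
```

In an isolated-unit recognizer, one such model would be kept per recognition unit, and the unit whose model gives the highest likelihood would be chosen.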
3. Statistical Language Modeling
Statistical language models capture the probabilistic relationships among the words in a sequence, estimated directly from corpora. We mainly use bigram or trigram models as our n-gram language model.
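As an illustration of estimating an n-gram model from a corpus, here is a minimal bigram sketch. The toy corpus, the function names train_bigram and bigram_prob, and the use of add-one smoothing are assumptions made for this example. The text above does not say which smoothing method is used.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams over whitespace-tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        # Every token except the last one serves as a history for the next.
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return history_counts, bigram_counts, vocab

def bigram_prob(prev, word, history_counts, bigram_counts, vocab):
    """P(word | prev) with add-one smoothing (an assumption of this sketch)."""
    return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + len(vocab))

# Hypothetical toy corpus.
CORPUS = ["call home", "call the office", "open the door"]
HISTORY, BIGRAMS, VOCAB = train_bigram(CORPUS)

if __name__ == "__main__":
    print(bigram_prob("call", "home", HISTORY, BIGRAMS, VOCAB))
    print(bigram_prob("call", "door", HISTORY, BIGRAMS, VOCAB))
```

A trigram model follows the same pattern with two-word histories. Smoothing matters because most word pairs never occur in the training corpus, and an unsmoothed model would assign them zero probability.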
4. Word Network
We use two kinds of networks: the linear lexicon and the lexical tree. A linear lexicon arranges the words in parallel and is used for small-vocabulary recognition. A lexical tree shares the common initial portions of the listed pronunciations among words and is used for large-vocabulary recognition.
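The difference between the two networks can be sketched with a small prefix tree over phone sequences. The four-word lexicon, its phone symbols, and the helper names build_lexical_tree and count_nodes are hypothetical and serve only to show how shared pronunciation prefixes reduce the number of nodes that must be searched.

```python
def build_lexical_tree(lexicon):
    """Build a prefix tree (trie) keyed by phones; words are stored at their final node."""
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            # Reuse an existing branch when the prefix is already in the tree.
            node = node["children"].setdefault(phone, {"children": {}, "words": []})
        node["words"].append(word)
    return root

def count_nodes(node):
    """Count all nodes in the tree, including the root."""
    return 1 + sum(count_nodes(child) for child in node["children"].values())

# Hypothetical pronunciations, for illustration only.
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "spin": ["s", "p", "ih", "n"],
    "star": ["s", "t", "aa", "r"],
}

TREE = build_lexical_tree(LEXICON)
LINEAR_PHONE_NODES = sum(len(phones) for phones in LEXICON.values())
TREE_PHONE_NODES = count_nodes(TREE) - 1  # exclude the root

if __name__ == "__main__":
    print(LINEAR_PHONE_NODES, TREE_PHONE_NODES)
```

With this toy lexicon the linear arrangement needs one phone node per phone of every word, while the tree merges the shared initial phones. The saving grows with vocabulary size, which is why the tree suits large-vocabulary recognition.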
5. Lexical Decoding
Lexical decoding of continuous speech finds the highest-scoring word sequence among all possible word sequences, given the observation sequence, the acoustic model, and the language model, using the word network. In evaluation (recognition), Viterbi decoding and the forward-backward algorithm are used.
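The core of Viterbi decoding can be sketched on a single small HMM. The example below finds the most likely state path for a discrete observation sequence in the log domain. The model values and the names viterbi, PI, A and B are hypothetical. A full decoder would run the same recursion over the word network and add language model scores at word boundaries, which this sketch leaves out.

```python
import math

def safe_log(x):
    """Natural log that maps zero probability to negative infinity."""
    return math.log(x) if x > 0 else float("-inf")

def viterbi(pi, A, B, obs):
    """Return (best_state_path, log_probability) for a discrete HMM."""
    n = len(pi)
    delta = [safe_log(pi[i]) + safe_log(B[i][obs[0]]) for i in range(n)]
    backpointers = []
    for symbol in obs[1:]:
        prev = delta
        delta, pointers = [], []
        for j in range(n):
            # Best predecessor state for arriving in state j at this frame.
            best_i = max(range(n), key=lambda i: prev[i] + safe_log(A[i][j]))
            pointers.append(best_i)
            delta.append(prev[best_i] + safe_log(A[best_i][j]) + safe_log(B[j][symbol]))
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]

# Hypothetical 3-state left-to-right model with 2 output symbols.
PI = [1.0, 0.0, 0.0]
A = [[0.6, 0.4, 0.0],
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]]
B = [[0.8, 0.2],
     [0.3, 0.7],
     [0.5, 0.5]]

if __name__ == "__main__":
    best_path, log_prob = viterbi(PI, A, B, [0, 1, 1])
    print(best_path, math.exp(log_prob))
```

Working in the log domain replaces products of small probabilities with sums, which keeps scores numerically stable over long utterances.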
6. Application Demos
6.1 Voice Navigation
6.2 Keyword Recognition
6.4 LVCSR Demo (English)