RESEARCH

Acoustic & Speech

We investigate speech signal processing schemes for acoustic modeling so that more robust speech recognition can be achieved. Our aim is to carry out state-of-the-art research that provides effective means of achieving:

Automatic Speech Recognition

Admin 2021-04-08

Contents

1. Introduction

2. Acoustic Modeling of Speech Recognition Unit

3. Statistical Language Modeling

4. Word Network

5. Lexical Decoding

6. Application Demos

1. Introduction

An automatic speech recognition system is composed of feature extraction, acoustic modeling, language modeling, and search. We estimate the parameters of the acoustic models from training data and estimate the language model from text corpora. We then decode the speech signal into a recognized word sequence using the acoustic models, the language model, and the word network.
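
How the acoustic model and the language model are combined during decoding is usually written as the Bayes decision rule below. The symbols are notation assumed here, not taken from the original page: O is the sequence of feature vectors produced by feature extraction, W is a candidate word sequence, P(O | W) is the acoustic model score, and P(W) is the language model score.

```latex
\[
\hat{W} \;=\; \arg\max_{W} P(W \mid O)
        \;=\; \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        \;=\; \arg\max_{W} P(O \mid W)\, P(W)
\]
```

The denominator P(O) does not depend on W, so it can be dropped from the maximization. The search over W is restricted to the word sequences allowed by the word network.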

2. Acoustic Modeling of Speech Recognition Unit

An acoustic model describes how the speech signal is represented. In recent years, the most frequently used acoustic model has been the HMM (Hidden Markov Model). Each HMM models the temporal and spectral variation of one speech recognition unit. We estimate the parameters of the acoustic models from training data.

Choice of speech recognition units
- Whole words: context-independent, context-dependent
- Subword segments: phone, syllable, semisyllable, triphone, diphone, etc.
Training of speech recognition unit models
- Baum-Welch algorithm
- Discriminative training
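
As an illustration of the Baum-Welch algorithm listed above, the following is a minimal sketch of one re-estimation step for a small HMM. It makes several simplifying assumptions that are not from the original page: observations are discrete symbols rather than continuous feature vectors, there is a single training sequence, and no probability scaling is applied, so it is only suitable for short sequences. All function names and the toy parameters are hypothetical.

```python
def forward(obs, pi, A, B):
    """alpha[t][i] = P(o_0..o_t, state_t = i)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(obs, A, B):
    """beta[t][i] = P(o_{t+1}..o_{T-1} | state_t = i)."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def baum_welch_step(obs, pi, A, B):
    """One EM re-estimation step; returns (new_pi, new_A, new_B, likelihood)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    likelihood = sum(alpha[-1])  # P(obs | current model)
    # gamma[t][i]: posterior probability of being in state i at time t
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: posterior probability of the transition i -> j at time t
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, likelihood


# Toy two-state model with two observation symbols (made-up values).
obs = [0, 0, 1, 0, 1, 1, 1, 0]
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.8, 0.2], [0.3, 0.7]]

likelihoods = []
for _ in range(5):
    pi, A, B, lik = baum_welch_step(obs, pi, A, B)
    likelihoods.append(lik)
```

Each step should leave the likelihood of the training sequence unchanged or higher, which is the property that makes the procedure usable for parameter estimation. A practical system would replace the discrete emission table B with continuous densities and accumulate statistics over many utterances.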

3. Statistical Language Modeling

A statistical language model captures the probabilistic relationships among the words in a sequence, and it can be derived directly from text corpora. As n-gram language models, we mainly use bigram or trigram models.
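
To make the bigram case concrete, here is a minimal sketch that estimates bigram probabilities from a tiny corpus by maximum likelihood, that is, by relative frequency. The corpus, the function names, and the `<s>` and `</s>` boundary markers are assumptions made for illustration, and no smoothing is applied, which a practical system would need.

```python
from collections import Counter


def train_bigram(sentences):
    """Estimate maximum-likelihood bigram probabilities P(w2 | w1)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # <s> and </s> are hypothetical sentence-boundary markers.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            context_counts[w1] += 1
            bigram_counts[(w1, w2)] += 1
    return {pair: count / context_counts[pair[0]]
            for pair, count in bigram_counts.items()}


def sentence_probability(model, sentence):
    """Multiply the bigram probabilities along the sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for pair in zip(tokens, tokens[1:]):
        # Without smoothing, an unseen bigram makes the whole sentence impossible.
        prob *= model.get(pair, 0.0)
    return prob


corpus = ["call home", "call the office", "go home"]  # made-up training text
model = train_bigram(corpus)
p_seen = sentence_probability(model, "call home")        # 2/3 * 1/2 * 1 = 1/3
p_unseen = sentence_probability(model, "go the office")  # 0.0, "go the" never occurs
```

A trigram model follows the same pattern with two words of context instead of one.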

4. Word Network

We use two kinds of word networks: the linear lexicon and the lexical tree. A linear lexicon places the pronunciations of all words in parallel and is used for small-vocabulary recognition. A lexical tree shares the common initial parts of the listed pronunciations among words and is used for large-vocabulary recognition.
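
The difference between the two networks can be sketched with a prefix tree over phone sequences. The three-word lexicon, its phone symbols, and the function names below are hypothetical and serve only to show how shared pronunciation prefixes reduce the number of nodes to search.

```python
def build_lexical_tree(lexicon):
    """Build a prefix tree over phone sequences.

    Words whose pronunciations start with the same phones share nodes.
    """
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node["children"].setdefault(
                phone, {"children": {}, "words": []})
        node["words"].append(word)  # the word identity is known only at the leaf
    return root


def count_nodes(node):
    """Count the phone nodes below the given node."""
    return sum(1 + count_nodes(child) for child in node["children"].values())


# Hypothetical pronunciations, for illustration only.
lexicon = {
    "speech": ["s", "p", "iy", "ch"],
    "speak": ["s", "p", "iy", "k"],
    "spin": ["s", "p", "ih", "n"],
}

tree = build_lexical_tree(lexicon)
linear_size = sum(len(phones) for phones in lexicon.values())  # 12 phone nodes
tree_size = count_nodes(tree)                                  # 7 phone nodes
```

In the linear lexicon every word keeps its own copy of each phone, so the network grows in proportion to the vocabulary. In the tree, the shared prefix "s p" is evaluated once for all three words, which is why the tree form is preferred as the vocabulary becomes large.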

5. Lexical Decoding

Lexical decoding of continuous speech finds, among all word sequences allowed by the word network, the one with the highest score given the observation sequence, the acoustic model, and the language model. In evaluation (recognition), Viterbi decoding and the forward-backward algorithm are used.
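
The Viterbi step can be sketched on a toy HMM as follows. This is a simplified illustration under stated assumptions: the two states, the discrete observation symbols, and all probabilities are made up, whereas a real decoder would run over the HMM states of the word network and add language model scores at word boundaries. Scores are kept in the log domain to avoid numerical underflow.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_score, best_state_path) for an observation sequence."""
    # delta[s]: best log score of any state path that ends in s at the current frame
    delta = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backpointers = []
    for o in obs[1:]:
        new_delta, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: delta[r] + log_trans[r][s])
            new_delta[s] = delta[best_prev] + log_trans[best_prev][s] + log_emit[s][o]
            pointers[s] = best_prev
        delta = new_delta
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return delta[last], path


def to_log(table):
    """Convert a nested dict of probabilities to log probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Toy model (hypothetical values).
states = ["A", "B"]
log_start = {"A": math.log(0.6), "B": math.log(0.4)}
log_trans = to_log({"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}})
log_emit = to_log({"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}})

score, path = viterbi(["x", "x", "y", "y"], states, log_start, log_trans, log_emit)
# path is ["A", "A", "B", "B"]
```

Viterbi keeps only the single best path into each state at each frame. The forward-backward algorithm instead sums over all paths.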

6. Application Demos

6.1 Voice Navigation

6.2 Keyword Recognition

6.4 LVCSR Demo (English)
