readings:
- J&M, chapter 9 (don't worry about the signal processing too much, unless you find it interesting!)
- J&M, chapter 3
We've been talking a lot about the NOISY CHANNEL MODEL recently.
What do we write down for the noisy channel model for speech recognition?
Let's talk about that for a minute. Let's ask the question: what's hidden? What's observed? Can we reasonably say that the hidden thing is causing the observed thing?
Maybe this is a pretty good use of the noisy channel model!
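One standard way to write it down (a sketch, using A for the observed acoustics and W for the hidden word sequence):

```latex
\hat{W} = \operatorname*{argmax}_W P(W \mid A) = \operatorname*{argmax}_W \underbrace{P(A \mid W)}_{\text{acoustic model}} \; \underbrace{P(W)}_{\text{language model}}
```

The hidden words "cause" the observed audio through the channel P(A | W), and P(W) is our prior over what people are likely to say.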
http://tinyurl.com/alexr-asr-intro-2012
Let's write down the chart we've been using for POS tagging.
In POS tagging, what are the states?
In ASR, what are the states?
We kind of want to say that there are two kinds of transitions in the ASR case. There are transitions between phones, and then there are transitions between words.
And we have a model of the transitions between words... (WHAT IS IT?) (HOW DO WE MODEL TRANSITIONS OVER WORDS?)
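A minimal sketch of that word-transition model as a bigram language model, with MLE estimates over a toy corpus (the corpus and counts here are purely illustrative):

```python
from collections import Counter

# Toy corpus; these tokens are illustrative, not from any real dataset.
corpus = "the dog barks the dog runs the cat runs".split()

# Count bigrams and the contexts they condition on.
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])

def bigram_prob(prev, word):
    # MLE estimate: P(word | prev) = count(prev, word) / count(prev)
    return bigrams[(prev, word)] / contexts[prev]

print(bigram_prob("the", "dog"))  # 2 of the 3 "the" tokens are followed by "dog"
```

In a real system you'd smooth these estimates, but the idea is the same: word-to-word transitions come from an n-gram language model.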
Formally, what's an FST?
- a finite set of states
- an input alphabet
- an output alphabet
- a start state
- a set of final states
- transitions between states
- outputs on the transitions
- optionally, weights!!
What might you represent with the weights?
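As a concrete (hypothetical) encoding of that definition, here's a tiny weighted FST as plain Python data, plus a function that runs it over an input string. The machine itself and its weights are made up for illustration:

```python
# A tiny weighted FST: states 0 and 1, start state 0, final states {1},
# and weighted transitions keyed by (state, input symbol).
fst = {
    "start": 0,
    "finals": {1},
    # (state, input symbol) -> (next state, output symbol, weight)
    "trans": {
        (0, "a"): (0, "A", 0.5),
        (0, "b"): (1, "B", 1.0),
    },
}

def transduce(fst, s):
    """Run the FST on string s; return (output, total weight), or None if rejected."""
    state, out, weight = fst["start"], [], 1.0
    for ch in s:
        if (state, ch) not in fst["trans"]:
            return None  # no transition on this symbol: reject
        state, o, w = fst["trans"][(state, ch)]
        out.append(o)
        weight *= w  # weights multiply along the path
    if state not in fst["finals"]:
        return None  # ended in a non-final state: reject
    return "".join(out), weight

print(transduce(fst, "aab"))  # ('AAB', 0.25)
```

This machine is deterministic, so there's one path per input; in general an input can take many paths, and the weights let you score and compare them.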
Four things we can do with an FST:
- recognize things: is this string in this language, with a given output?
- generate things: just ask it to produce pairs of strings for us. Sample from it!
- translate things: read one string and produce another string (possibly many strings!)
- express relations over sets: a description of our beliefs about the relationship between one set and another set.
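The "generate things" use can be sketched by sampling paths from a weighted FST. This toy machine (states, symbols, and probabilities all illustrative) emits matched input/output string pairs:

```python
import random

# state -> list of (input symbol, output symbol, next state, probability).
# State 1 has no outgoing transitions, so paths stop there.
trans = {
    0: [("a", "A", 0, 0.5), ("b", "B", 1, 0.5)],
    1: [],
}

def sample_pair(trans, start=0, seed=None):
    """Sample one (input string, output string) pair by walking random transitions."""
    rng = random.Random(seed)
    state, ins, outs = start, [], []
    while trans[state]:
        # Pick an outgoing transition in proportion to its weight.
        i, o, nxt, _ = rng.choices(trans[state],
                                   weights=[t[3] for t in trans[state]])[0]
        ins.append(i)
        outs.append(o)
        state = nxt
    return "".join(ins), "".join(outs)

print(sample_pair(trans, seed=0))
```

Every sampled pair here looks like ("aa...ab", "AA...AB"): the machine loops on a/A with probability 0.5 and stops after emitting b/B.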