Important Terms
Chapter Notes
I. Introduction
A. you can use the probability (#1) of certain words appearing after others to correct spelling (#2), suggest likely next words (as in autocomplete), and provide more accurate translations
B. the simplest language model (#3) is the N-gram
1) 2-gram (bigram)
a two-word sequence such as "please turn" or "your homework"; a bigram model predicts each word from the one word before it
2) 3-gram (trigram)
a three-word sequence such as "please turn your" or "turn your homework"; a trigram model predicts each word from the two words before it (see the sketch below)
II. N-grams
A. P(w|h) is the probability of a certain word w appearing after some history h (the preceding words); a counting sketch follows the example below
ex) <s> I am Sam </s>
<s> Sam I am </s>
<s> I do not like green eggs and ham </s>
P(I|<s>) = 2/3
P(Sam|<s>) = 1/3
P(am|I) = 2/3
P(</s>|Sam) = 1/2
P(Sam|am) = 1/2
P(do|I) = 1/3
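a minimal sketch of how those estimates fall out of counting, assuming whitespace tokenization and maximum-likelihood estimation (the corpus is the one above; the function name is just illustrative):

```python
from collections import Counter

# toy corpus from the example above, with sentence-boundary markers
corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

def bigram_prob(word, prev):
    """Maximum-likelihood estimate: P(word | prev) = count(prev word) / count(prev)."""
    return bigram_counts[(prev, word)] / unigram_counts[prev]

print(bigram_prob("I", "<s>"))   # 2/3
print(bigram_prob("am", "I"))    # 2/3
print(bigram_prob("Sam", "am"))  # 1/2
```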
ex) P(i|<s>) = 0.25, P(want|I) = 0.33, P(english|want) = 0.0011, P(food|english) = 0.5, P(</s>|food) = 0.68
P(i want english food) = 0.25 * 0.33 * 0.0011 * 0.5 * 0.68 = 0.000031
or, use log probabilities: adding logs avoids numerical underflow when multiplying many small probabilities (see the sketch below)
def: P1 * P2 * P3 = exp(logP1 + logP2 + logP3)
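a minimal sketch of that sentence probability computed in log space (the five probabilities are the ones quoted above, assumed to come from a bigram model trained on a larger corpus):

```python
import math

# P(i|<s>), P(want|i), P(english|want), P(food|english), P(</s>|food)
probs = [0.25, 0.33, 0.0011, 0.5, 0.68]

# a direct product can underflow for long sentences; summing logs is numerically safer
log_prob = sum(math.log(p) for p in probs)
print(math.exp(log_prob))  # ~0.000031, same result as multiplying directly
```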
III. Evaluating Language Models
A. you can do it extrinsically (#5)
B. or you can do it intrinsically (#6), using a training set (#7) to estimate the model's probabilities and then measuring how well the model predicts a held-out test set (#8)
C. once you've tuned on a test set enough times it stops being a fair test; that set becomes the development set (or devset), and you use a fresh test set going forward
D. NEVER let the model see a sentence from the test set while training it
E. perplexity (#9)
1) you're looking to minimize it throughout the experiment; lower perplexity means the model assigns higher probability to the test set (see the sketch below)
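a minimal sketch of the perplexity computation for a bigram model: perplexity is the inverse probability of the test set, normalized by the number of words, i.e. exp of the average negative log probability per word (the `bigram_prob` argument and the test sentence here are just illustrative):

```python
import math

def perplexity(test_tokens, bigram_prob):
    """PP(W) = P(w1..wN)^(-1/N), computed in log space to avoid underflow."""
    log_prob = 0.0
    n = 0
    for prev, word in zip(test_tokens, test_tokens[1:]):
        log_prob += math.log(bigram_prob(word, prev))
        n += 1
    return math.exp(-log_prob / n)

# the bigram probabilities quoted earlier, treated as a tiny lookup-table model
quoted = {
    ("<s>", "i"): 0.25, ("i", "want"): 0.33, ("want", "english"): 0.0011,
    ("english", "food"): 0.5, ("food", "</s>"): 0.68,
}
tokens = ["<s>", "i", "want", "english", "food", "</s>"]
print(perplexity(tokens, lambda w, prev: quoted[(prev, w)]))  # ~8.0; lower is better
```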