Here are some suggested exercises for Jurafsky & Martin.
Exercises: 2.1, 2.3, 2.4, 2.5, 2.6, 2.8
The book has no good exercises for chapter 4, so we give some alternatives below.
The Restaurant Corpus
1. Use tables 4.1 / 4.2 to calculate the approximate probabilities for the following sentences:
P( <s> i want chinese food </s> )
P( <s> i want chinese chinese food </s> )
2. Now do the same calculations, but use the smoothed tables 4.5 / 4.6 instead.
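If you want to check your hand calculations, here is a minimal Python sketch. It assumes you have copied the bigram probabilities P(w_j | w_i) from table 4.2 (or 4.6 for the smoothed version) into a dictionary keyed by (w_i, w_j) pairs. You will also need entries such as P(i | <s>) and P(</s> | food), which the tables themselves do not list. The function names are our own. The sketch multiplies the bigram probabilities along the sentence, working in log space to avoid underflow. A bigram missing from the dictionary is treated as probability 0, so the whole sentence gets probability 0.

```python
import math


def sentence_logprob(sentence, bigram_prob):
    """Sum log P(w_j | w_i) over every adjacent token pair in the sentence.

    bigram_prob maps (w_i, w_j) to P(w_j | w_i). A pair missing from the
    dictionary counts as probability 0, making the sentence impossible.
    """
    tokens = sentence.split()
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        p = bigram_prob.get((prev, word), 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total


def sentence_prob(sentence, bigram_prob):
    """Bigram approximation of P(sentence): the product of P(w_j | w_i)."""
    return math.exp(sentence_logprob(sentence, bigram_prob))


# Usage (fill in the values from the book's tables yourself):
# table = {("<s>", "i"): ..., ("i", "want"): ..., ...}
# print(sentence_prob("<s> i want chinese food </s>", table))
```

With unsmoothed tables, any sentence that contains a bigram with count 0 comes out as probability 0. Smoothing is meant to avoid exactly that.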
My Teeny-Weeny Swedish Corpus
Assume we have the following corpus:
<s> en såg såg en såg en såg såg , en annan sågade sågen sågen såg . </s>
1. What is the vocabulary of this corpus?
2. Tabulate $C(w_i)$, $C(w_i, w_j)$ and $P(w_j \mid w_i)$ in the same way as in tables 4.1 / 4.2. Remember to include the start- and end-of-sentence markers, <s> and </s> (which J&M should have included in their tables, but didn't).
3. Use these tables to calculate the probability of the following sentence:
P( <s> en sågade en såg </s> )
4. Create add-one (Laplace) smoothed variants of $C(w_i)$, $C(w_i, w_j)$ and $P(w_j \mid w_i)$, like tables 4.5 / 4.6.
5. Use these new tables to calculate the probability of the previous sentence.
The solution to this exercise can be found here.
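To check your tables, the following Python sketch counts unigrams and bigrams in the corpus. It computes both the unsmoothed estimate $C(w_i, w_j) / C(w_i)$ and the add-one estimate $(C(w_i, w_j) + 1) / (C(w_i) + V)$. It uses `fractions.Fraction` so the results stay exact, as in hand-made tables.

The sketch assumes that $V$ counts every token type in the corpus, including <s>, </s>, "," and ".". If you define the vocabulary differently, for example by excluding <s>, the smoothed numbers will change. The function names are our own.

```python
from collections import Counter
from fractions import Fraction

CORPUS = "<s> en såg såg en såg en såg såg , en annan sågade sågen sågen såg . </s>"


def train_bigrams(text):
    """Return unigram counts C(w_i) and bigram counts C(w_i, w_j)."""
    tokens = text.split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def mle_prob(w_i, w_j, unigrams, bigrams):
    """Unsmoothed P(w_j | w_i) = C(w_i, w_j) / C(w_i); w_i must occur in the corpus."""
    return Fraction(bigrams[(w_i, w_j)], unigrams[w_i])


def add_one_prob(w_i, w_j, unigrams, bigrams):
    """Add-one P(w_j | w_i) = (C(w_i, w_j) + 1) / (C(w_i) + V)."""
    vocab_size = len(unigrams)  # assumption: V includes <s>, </s> and punctuation
    return Fraction(bigrams[(w_i, w_j)] + 1, unigrams[w_i] + vocab_size)


def sentence_prob(sentence, prob, unigrams, bigrams):
    """Multiply the chosen bigram estimate over all adjacent token pairs."""
    tokens = sentence.split()
    result = Fraction(1)
    for w_i, w_j in zip(tokens, tokens[1:]):
        result *= prob(w_i, w_j, unigrams, bigrams)
    return result


unigrams, bigrams = train_bigrams(CORPUS)
print(len(unigrams))  # 9 token types, counting <s>, </s>, "," and "."

# One row per observed bigram: counts and unsmoothed probability.
for (w_i, w_j), count in bigrams.items():
    print(w_i, w_j, count, mle_prob(w_i, w_j, unigrams, bigrams))

sentence = "<s> en sågade en såg </s>"
print(sentence_prob(sentence, mle_prob, unigrams, bigrams))      # 0, since C(en, sågade) = 0
print(sentence_prob(sentence, add_one_prob, unigrams, bigrams))  # 2/63375 with V = 9
```

`Counter` returns 0 for a missing key without storing it. Looking up unseen bigrams therefore does not change the vocabulary size used by `add_one_prob`.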
Exercises: 5.1, 5.2, 5.3, 5.4
Exercises: 12.1–12.10
NP grammar in another language
Write a grammar for noun phrases in a language other than English, such as your native language.
Exercises: 13.1, 13.3 (assume that the grammar is an nltk.ContextFreeGrammar)
Advanced exercises: 13.4, 13.5, 13.10 (you can use nltk.RegexpParser for this)
Exercises: 14.5
Advanced exercises: 14.1, 14.3, 14.4
Exercises: 15.1, 15.2, 15.3, 15.4, 15.5
Exercises: 19.2+19.3, 19.4, 19.6
Exercises: 20.1, 20.2, 20.3, 20.4
Exercises: 24.1, 24.2, 24.3