Here are some suggested exercises from the NLTK book.
Easy: 6, 7, 12, 13, 16
Intermediate: 19, 20, 24, 26, 27, 28
Easy: 4
Intermediate: 8, 15, 16, 17, 18, 22
Difficult: 23, 25
Random trigram generation
In the Nov. 7 lecture, we modified example 2.5 to do random bigram generation, like this:
import random

def generate_model(cfdist, word, num=15):
    for i in range(num):
        print(word, end=' ')
        word = generate_next(cfdist[word])

def generate_next(fdist):
    # Pick a word at random, weighted by its frequency in fdist.
    rnd = random.randrange(fdist.N())
    ctr = 0
    for word in fdist:
        ctr += fdist[word]
        if ctr > rnd:
            return word

>>> text = nltk.corpus.genesis.words('english-kjv.txt')
>>> bigrams = nltk.bigrams(text)
>>> cfd = nltk.ConditionalFreqDist(bigrams)
>>> generate_model(cfd, 'living')
The exercise is to modify the two functions to do trigram generation instead. You then need a CFD created from ((word1, word2), word3) tuples, and the generate_model function needs to remember the last two words.
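One possible shape for the trigram version is sketched below. This is only an illustration, not the expected solution: it uses plain Counter/defaultdict instead of NLTK's FreqDist/ConditionalFreqDist so it runs standalone, and the names (build_trigram_cfd, context) are my own.

```python
import random
from collections import Counter, defaultdict

def build_trigram_cfd(words):
    # Condition on the last two words: one ((w1, w2), w3) pair per trigram.
    cfd = defaultdict(Counter)
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        cfd[(w1, w2)][w3] += 1
    return cfd

def generate_next(fdist):
    # Weighted random choice, same idea as in the bigram version.
    rnd = random.randrange(sum(fdist.values()))
    ctr = 0
    for word, count in fdist.items():
        ctr += count
        if ctr > rnd:
            return word

def generate_model(cfd, context, num=15):
    # context is the (word1, word2) pair; shift it one word after each step.
    for i in range(num):
        print(context[0], end=' ')
        context = (context[1], generate_next(cfd[context]))
```

With NLTK you would instead build the CFD with nltk.ConditionalFreqDist over the ((w1, w2), w3) pairs and keep fdist.N() in generate_next.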
Easy: 6, 7, 8, 10, 14, 15
Intermediate: 19, 23, 25, 27, 29
Difficult: 39, 41
Word segmentation
Try out the word segmentation implementation in section 3.8 on a bigger example corpus, e.g. the first N words of the Brown corpus. (Start with a relatively small value of N, such as 100, then increase it successively until the search takes too long.)
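One way to build the input for the section 3.8 segmenter from a word list is sketched below. The helper name make_segmentation_problem is my own; the boundary-string convention follows the book, where segs[i] == '1' marks a word break after text[i], so segs is one character shorter than text.

```python
def make_segmentation_problem(words):
    # Join the words into an unsegmented character stream, and build the
    # gold-standard boundary string: '1' after the last character of each
    # word except the final one, '0' elsewhere.
    text = ''.join(words)
    segs = ''.join('0' * (len(w) - 1) + '1' for w in words)[:-1]
    return text, segs

# For example, with the first N Brown words (requires the NLTK data download):
#   import nltk
#   text, segs = make_segmentation_problem(nltk.corpus.brown.words()[:100])
```

The returned segs string gives you the gold segmentation to compare against, while text is what you hand to the annealing search from the book.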
Easy: 2, 3, 7, 9, 10
Intermediate: 12, 13, 15, 16, 17, 21
Difficult: 29
Intermediate: 14, 15, 17, 19, 20, 29
Easy: 2, 4, 5
Easy: 1, 2
Intermediate: 4, 5
Difficult: 11, 12
Easy: 4, 5, 6, 7, 9, 13
Intermediate: 24
Difficult: 30, 35
Easy: 1, 2
Intermediate: 6, 8, 9, 12