Here is an NLTK dry-run of the teeny-weeny corpus exercise (the corpus is Swedish wordplay: "en såg såg en såg" reads roughly as "a saw saw a saw"):
>>> import nltk
>>> from functools import reduce
>>> corpus = "<s> en såg såg en såg en såg såg , en annan sågade sågen sågen såg . </s>".split()
>>> sentence = "<s> en sågade en såg </s>".split()
>>> vocabulary = set(corpus)
>>> len(vocabulary)
9
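# For the record, the nine vocabulary types:
>>> sorted(vocabulary)
[',', '.', '</s>', '<s>', 'annan', 'en', 'såg', 'sågade', 'sågen']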
>>> cfd = nltk.ConditionalFreqDist(nltk.bigrams(corpus))
# The corpus counts of each bigram in the sentence:
>>> [cfd[a][b] for (a,b) in nltk.bigrams(sentence)]
[1, 0, 0, 3, 0]
# The corpus count of each bigram's first word (the context):
>>> [cfd[a].N() for (a,b) in nltk.bigrams(sentence)]
[1, 4, 1, 4, 6]
# The MLE probability for each bigram: P(w2|w1) = C(w1,w2) / C(w1)
>>> [cfd[a][b] / cfd[a].N() for (a,b) in nltk.bigrams(sentence)]
[1.0, 0.0, 0.0, 0.75, 0.0]
# There is already a FreqDist method for MLE probability:
>>> [cfd[a].freq(b) for (a,b) in nltk.bigrams(sentence)]
[1.0, 0.0, 0.0, 0.75, 0.0]
# The probability of the sentence is the product of all bigram probabilities:
>>> reduce(lambda x, y: x * y, _)
0.0
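# The zero comes from bigrams that never occur in the corpus:
>>> [(a, b) for (a, b) in nltk.bigrams(sentence) if cfd[a][b] == 0]
[('en', 'sågade'), ('sågade', 'en'), ('såg', '</s>')]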
# A single unseen bigram zeroes out the whole product, hence the need for
# smoothing. Laplace (add-one) smoothing of each bigram count:
>>> [1 + cfd[a][b] for (a,b) in nltk.bigrams(sentence)]
[2, 1, 1, 4, 1]
# Normalise by the word count plus the vocabulary size V = 9 (each of the V
# possible successors gains one extra count):
>>> [len(vocabulary) + cfd[a].N() for (a,b) in nltk.bigrams(sentence)]
[10, 13, 10, 13, 15]
# The Laplace-smoothed probability for each bigram: P(w2|w1) = (C(w1,w2) + 1) / (C(w1) + V)
>>> [(1 + cfd[a][b]) / (len(vocabulary) + cfd[a].N()) for (a,b) in nltk.bigrams(sentence)]
[0.2, 0.07692307692307693, 0.1, 0.3076923076923077, 0.06666666666666667]
# The smoothed probability of the sentence:
>>> reduce(lambda x, y: x * y, _)
3.155818540433926e-05
# Or in human-readable form:
>>> print("%.10f" % _)
0.0000315582
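# For longer sentences this product quickly underflows; the usual fix is to
# sum log probabilities instead (a quick sketch on the same data):
>>> import math
>>> logprobs = [math.log((1 + cfd[a][b]) / (len(vocabulary) + cfd[a].N()), 2)
...             for (a,b) in nltk.bigrams(sentence)]
>>> print("%.4f" % sum(logprobs))
-14.9516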
Here is a more compact dry-run using NLTK's built-in ConditionalProbDist, MLEProbDist, and LaplaceProbDist classes:
# MLEProbDist is the unsmoothed probability distribution:
>>> cpd_mle = nltk.ConditionalProbDist(cfd, nltk.MLEProbDist, bins=len(vocabulary))
# Now we can get the MLE probabilities by using the .prob method:
>>> [cpd_mle[a].prob(b) for (a,b) in nltk.bigrams(sentence)]
[1.0, 0.0, 0.0, 0.75, 0.0]
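# As before, one unseen bigram drives the unsmoothed sentence probability to zero:
>>> reduce(lambda x, y: x * y, _)
0.0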
# LaplaceProbDist is the add-one smoothed distribution; bins tells it the
# vocabulary size V used in the denominator:
>>> cpd_laplace = nltk.ConditionalProbDist(cfd, nltk.LaplaceProbDist, bins=len(vocabulary))
# Getting the Laplace probabilities is the same as for MLE:
>>> [cpd_laplace[a].prob(b) for (a,b) in nltk.bigrams(sentence)]
[0.2, 0.07692307692307693, 0.1, 0.3076923076923077, 0.06666666666666667]
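# Sanity check: multiplying these reproduces the smoothed sentence probability
# we computed by hand above:
>>> print("%.10f" % reduce(lambda x, y: x * y, _))
0.0000315582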