Publication date: Nov 24, 2011 10:06:10 AM
I have added a hint to part 2 of lab 2. Here it is:
Hint 3: You cannot flatten the list of sentences into one long list of words, because then you lose the beginnings and ends of sentences, and you would count ngrams that span sentence boundaries. Instead, you have to loop over one sentence at a time and update the frequency distribution inside the loop. Here is some pseudocode showing how you can calculate the frequency distribution of the ngrams:
create an empty frequency distribution
for each tagged sentence in the corpus:
    create a list of tags from the sentence (which is a list of (word, tag) pairs)
    create a list of the tag ngrams from the list of tags
    for each ngram in the list of ngrams:
        increase the count of the ngram in the frequency distribution
After this you should have a frequency distribution of the ngrams in the corpus. It is this distribution that contains the information you need to print the rows in the statistics table.
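As a rough illustration, the pseudocode above could be sketched in Python as follows. This is only one possible way to write it, not the required solution: it uses `collections.Counter` from the standard library as a stand-in for the frequency distribution, and the helper names `tag_ngrams` and `ngram_freqs` as well as the toy corpus are made up for this example. In the lab you would use the real tagged corpus and whatever frequency distribution class the lab asks for.

```python
from collections import Counter


def tag_ngrams(tags, n):
    """Return all n-grams (as tuples) over the tag list of ONE sentence."""
    # A sentence shorter than n yields an empty list.
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def ngram_freqs(tagged_sents, n):
    """Count tag n-grams sentence by sentence, never across sentence boundaries."""
    freqs = Counter()  # stand-in for the frequency distribution (assumption)
    for sent in tagged_sents:
        # Each sentence is a list of (word, tag) pairs; keep only the tags.
        tags = [tag for _word, tag in sent]
        for ngram in tag_ngrams(tags, n):
            freqs[ngram] += 1
    return freqs


# Toy corpus, invented for illustration only.
toy_corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("a", "DT"), ("cat", "NN")],
]

bigram_freqs = ngram_freqs(toy_corpus, 2)
for ngram, count in bigram_freqs.most_common():
    print(ngram, count)
```

Because the ngrams are built inside the loop over sentences, the pair ("VBZ", "DT") is never counted here, even though "barks" and "a" would be neighbours in a flattened word list.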