1). What is tagging?
a). Associating a class tag with a word, e.g. V for verb, NN for noun
2). How to identify words with different parts of speech
a). Regular expressions and list of commonly used parts of speech
3). Constituent Phrases
a). What they are, how they are formed, and why they are useful
4). Using Constituent Phrases to extract information from a text file
Part 1: Write a Python program that tags each word in a text file and writes out a new file consisting of tuples, each made up of a tag and a word. To help you write it, you will be provided with the following files:
a) Common verbs with tenses
b) Functions that return lists of determiners, adjectives, conjunctions, prepositions, and various pronouns.
Use regular expressions to search for Proper Nouns, Adverbs, Cardinals, Conditional Adjectives, or any other POS that you feel will be helpful.
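One possible shape for the Part 1 tagger is sketched below: look each word up in the word lists first, then fall back to regular expressions. The small sets `DETERMINERS`, `PREPOSITIONS`, and `CONJUNCTIONS` are hypothetical stand-ins for the provided files, the regex rules are deliberately crude, and the one-pair-per-line output format is an assumption, not a requirement stated above.

```python
import re

# Hypothetical stand-ins for the word lists supplied with the assignment.
DETERMINERS = {"the", "a", "an", "this", "that"}
PREPOSITIONS = {"in", "on", "at", "of", "to", "from"}
CONJUNCTIONS = {"and", "or", "but"}

# (pattern, tag) pairs tried in order; the first full match wins.
REGEX_RULES = [
    (re.compile(r"\d+(?:[.,]\d+)*"), "CD"),  # cardinals: 1897, 3.5, 1,000
    (re.compile(r"[A-Z][a-z]+"), "NNP"),     # capitalized word: proper noun
    (re.compile(r"[a-z]+ly"), "RB"),         # -ly ending: adverb
]


def tag_word(word):
    """Return a (tag, word) tuple for a single word."""
    lower = word.lower()
    # Word-list lookups come first, so "The" is a determiner, not an NNP.
    if lower in DETERMINERS:
        return ("DT", word)
    if lower in PREPOSITIONS:
        return ("IN", word)
    if lower in CONJUNCTIONS:
        return ("CC", word)
    for pattern, tag in REGEX_RULES:
        if pattern.fullmatch(word):
            return (tag, word)
    return ("NN", word)  # assumed default when nothing else matches


def tag_text(text):
    """Split text into words and numbers, then tag each token."""
    tokens = re.findall(r"[A-Za-z]+|\d+(?:[.,]\d+)*", text)
    return [tag_word(token) for token in tokens]


def tag_file(in_path, out_path):
    """Tag every word of in_path and write one 'TAG<tab>word' line per token."""
    with open(in_path, encoding="utf-8") as src:
        tagged = tag_text(src.read())
    with open(out_path, "w", encoding="utf-8") as dst:
        for tag, word in tagged:
            dst.write(f"{tag}\t{word}\n")
```

A known weakness of this sketch is that any capitalized sentence-initial word missing from the word lists is tagged NNP; the provided verb and adjective lists would reduce that.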
Due 3/11 (end of class)
Part 2:
1). Find all occurrences of either two or three consecutive NNP tags.
Concatenate the words associated with each occurrence and add this to a list.
2). Find all occurrences of tag pairs or triplets of the form NNP CD or NNP NNP
CD. Concatenate the words associated with each occurrence and add this to a list.
3). Using the tagged file that you wrote out, find all 5-gram and 6-gram tag occurrences
in the file and determine the count for each one. For the 5-gram and 6-gram tag
sequences that have the maximum count, concatenate the words associated with each
occurrence of those sequences, for both the 5-gram and 6-gram cases.
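The three Part 2 steps can be approached with two small helpers, sketched here under some assumptions: the tagged file has already been read back into a list of (tag, word) tuples, words are concatenated with a single space, and overlapping matches are all reported. The names `find_tag_sequences` and `max_count_ngrams` are illustrative, not part of any provided code.

```python
from collections import Counter


def find_tag_sequences(tagged, patterns):
    """Return the joined words for every position where a tag pattern matches.

    tagged   -- list of (tag, word) tuples
    patterns -- iterable of tag tuples, e.g. [("NNP", "NNP"), ("NNP", "NNP", "NNP")]
    """
    tags = [tag for tag, _ in tagged]
    found = []
    for i in range(len(tagged)):
        for pattern in patterns:
            n = len(pattern)
            if tuple(tags[i:i + n]) == tuple(pattern):
                found.append(" ".join(word for _, word in tagged[i:i + n]))
    return found


def max_count_ngrams(tagged, n):
    """Map each most frequent tag n-gram to the word strings of its occurrences.

    Ties are kept: every n-gram that reaches the maximum count is returned.
    """
    tags = [tag for tag, _ in tagged]
    grams = [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]
    counts = Counter(grams)
    if not counts:
        return {}  # fewer than n tokens
    top = max(counts.values())
    result = {gram: [] for gram, count in counts.items() if count == top}
    for i, gram in enumerate(grams):
        if gram in result:
            result[gram].append(" ".join(word for _, word in tagged[i:i + n]))
    return result
```

Steps 1 and 2 would then be calls such as `find_tag_sequences(tagged, [("NNP", "NNP"), ("NNP", "NNP", "NNP")])` and `find_tag_sequences(tagged, [("NNP", "CD"), ("NNP", "NNP", "CD")])`, and step 3 would be `max_count_ngrams(tagged, 5)` and `max_count_ngrams(tagged, 6)`. Because overlaps are reported, three NNP tags in a row yield two pairs and one triplet; filter those if only the longest match is wanted.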
Due 3/18