1. Tokenizing Text Files
a. What is tokenization?
2. Mapping Text Files into Python Dictionaries
3. Coming up with features that characterize a given text
a. Average sentence length
b. Word counts
c. Frequency Histograms
d. N-grams
4. Building a classifier to characterize a text
5. Comparing texts by two different authors or from two literary genres
a. How to make a classifier using text features
i. Definition and examples of a distance metric
ii. Making a classifier from the text features
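
The topics in this outline can be sketched in a few lines of Python. What follows is only a minimal illustration, not the method the outline itself prescribes: the function names (tokenize, word_counts, average_sentence_length, ngrams, relative_frequencies, distance, classify) are hypothetical, the regex tokenizer is a deliberately crude assumption, Euclidean distance over relative word frequencies is just one possible distance metric, and the nearest-sample rule is just one possible classifier.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(tokens):
    """Map each token to the number of times it occurs (a Python dictionary)."""
    return dict(Counter(tokens))


def average_sentence_length(text):
    """Mean number of tokens per sentence, splitting on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequencies(tokens):
    """Map each token to its share of all tokens (a normalized histogram)."""
    total = len(tokens)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(tokens).items()}


def distance(freq_a, freq_b):
    """Euclidean distance between two word-frequency dictionaries (assumed metric)."""
    vocabulary = set(freq_a) | set(freq_b)
    return math.sqrt(
        sum((freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2 for w in vocabulary)
    )


def classify(text, labeled_samples):
    """Return the label of the sample text closest to `text` under `distance`."""
    target = relative_frequencies(tokenize(text))
    return min(
        labeled_samples,
        key=lambda label: distance(
            target, relative_frequencies(tokenize(labeled_samples[label]))
        ),
    )
```

To use it, pass classify a new text and a dictionary that maps each label (an author or a genre) to a sample text; it returns the label whose sample has the most similar word-frequency profile. Real texts would likely need a more careful tokenizer and more features than word frequencies alone, such as the sentence lengths and n-grams listed above.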