Home
This is the site where we will build the lecture notes for Indiana University's CSCI-B 490, Natural Language Processing.
There won't be slides for the class (probably). Instead, there will be readings from the textbook and discussions in class! I'll structure discussions around the relevant notes, but if we get off-topic, then we get off-topic. Please consider these notes to be one reference of many: another good reference would be your own notes from class.
Topics
Part 1: Introduction
- welcome
- the state of NLP, CL, etc.
- a little bit about language
- basic text processing and regular expressions (a small sketch follows this list)
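To make that text-processing bullet a little more concrete, here's a minimal sketch of regex-based tokenization in Python. The pattern and the sample sentence are my own illustrative choices, not anything from the textbook:

```python
import re

# A deliberately simple tokenizer: grab runs of word characters
# (allowing one internal apostrophe) or any single non-space symbol.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text.lower())

print(tokenize("Don't panic; it's only B490!"))
# ["don't", 'panic', ';', "it's", 'only', 'b490', '!']
```

Real tokenizers have to handle many more edge cases (URLs, hyphenation, Unicode), which is exactly why this topic gets class time.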
Part 2: classifying individual things
- a bit about classification
- naive Bayes classifiers (a toy sketch follows this list)
- word sense disambiguation
- a bit about corpora and sources of data
- language identification
- logistic regression / "maxent" classifiers
- a very little bit of speech synthesis: homograph disambiguation
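Since naive Bayes anchors this unit, here's a minimal sketch of a word-count naive Bayes classifier with add-one smoothing. The spam/ham labels and documents are a made-up toy dataset of mine, not course data:

```python
import math
from collections import Counter, defaultdict

# Toy labeled documents; real training data would come from a corpus.
docs = [
    ("spam", "win money now"),
    ("spam", "win a prize now"),
    ("ham", "meeting at noon"),
    ("ham", "lunch meeting tomorrow"),
]

class_counts = Counter()
word_counts = defaultdict(Counter)
vocab = set()
for label, text in docs:
    class_counts[label] += 1
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def score(label, text):
    """log P(label) plus the sum of log P(word | label), add-one smoothed."""
    total = sum(word_counts[label].values())
    s = math.log(class_counts[label] / sum(class_counts.values()))
    for w in text.split():
        s += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return s

test = "win money tomorrow"
print(max(class_counts, key=lambda c: score(c, test)))  # 'spam'
```

The same scaffolding (count, smooth, take the argmax in log space) is what makes naive Bayes a natural first tool for word sense disambiguation and language identification, too.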
Part 3: sequences of things
- language models: n-grams and what they're good for
- POS tagging!
- IN-CLASS EXERCISE!!!
- speech recognition and FSTs
Part 4: structured things
- context free grammars, probabilistic CFGs
- parsing: constituency parsing
- parsing: dependency parsing... with classifiers.
Part 5: some other things
- FSTs for morphology (Mike-style)
- weighted FSTs
- representing semantics
Part 6: machine translation, because MT rules.
- history
- RBMT (rule-based machine translation)
- EBMT (example-based machine translation)
- SMT (statistical machine translation)
Also!
suggestions for topics:
- Greg and Karl think it's important to hit the Viterbi algorithm and HMMs!
- and most importantly, make sure everybody really understands n-gram models, probably as an introduction to why we use probability distributions at all in NLP (a tiny sketch follows)
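To go with that last suggestion, here's a minimal sketch of an unsmoothed bigram model over a made-up three-sentence corpus. The sentences and the `<s>`/`</s>` boundary markers are my own illustrative choices:

```python
from collections import Counter

# Toy corpus with sentence-boundary markers; real counts would come
# from a much larger collection of text.
sentences = [
    "<s> the cat sat </s>",
    "<s> the dog sat </s>",
    "<s> the cat ran </s>",
]

bigrams = Counter()
unigrams = Counter()
for s in sentences:
    tokens = s.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p(word, prev):
    """Maximum-likelihood bigram estimate P(word | prev), no smoothing."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p("cat", "the"))  # 2/3: "the" appears 3 times, "the cat" twice
print(p("dog", "the"))  # 1/3
```

Those conditional probabilities are the whole point: the model is nothing but a distribution over what word comes next, which is exactly the mindset the rest of the course builds on.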