Home
This is the site where we will build the lecture notes for Indiana University's CSCI-B 490, Natural Language Processing.
There won't be slides for the class (probably). Instead, there will be readings from the textbook and discussions in class! I'll structure discussions around the relevant notes, but if we get off-topic, then we get off-topic. Please consider these notes to be one reference of many: another good reference would be your own notes from class.
Topics
Part 1: Introduction
- welcome
- the state of NLP, CL, etc.
- a little bit about language
- basic text processing and regular expressions (a small sketch follows this list)
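To make that text-processing bullet a little more concrete, here's a minimal sketch of regex-based tokenization in Python. The pattern and the sample sentence are my own illustrative choices, not anything from the textbook:

```python
import re

# A deliberately simple tokenizer: grab runs of word characters
# (allowing one internal apostrophe) or any single non-space symbol.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text.lower())

print(tokenize("Don't panic; it's only B490!"))
# ["don't", 'panic', ';', "it's", 'only', 'b490', '!']
```

Real tokenizers have to handle many more edge cases (URLs, hyphenation, Unicode), which is exactly why this topic gets class time.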
Part 2: classifying individual things
- a bit about classification
- naive Bayes classifiers (a toy sketch follows this list)
- word sense disambiguation
- a bit about corpora and sources of data
- language identification
- logistic regression / "maxent" classifiers
- a very little bit of speech synthesis: homograph disambiguation
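Since naive Bayes anchors this unit, here's a minimal sketch of a word-count naive Bayes classifier with add-one smoothing. The spam/ham labels and documents are a made-up toy dataset of mine, not course data:

```python
import math
from collections import Counter, defaultdict

# Toy labeled documents; real training data would come from a corpus.
docs = [
    ("spam", "win money now"),
    ("spam", "win a prize now"),
    ("ham", "meeting at noon"),
    ("ham", "lunch meeting tomorrow"),
]

class_counts = Counter()
word_counts = defaultdict(Counter)
vocab = set()
for label, text in docs:
    class_counts[label] += 1
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def score(label, text):
    """log P(label) plus the sum of log P(word | label), add-one smoothed."""
    total = sum(word_counts[label].values())
    s = math.log(class_counts[label] / sum(class_counts.values()))
    for w in text.split():
        s += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return s

test = "win money tomorrow"
print(max(class_counts, key=lambda c: score(c, test)))  # 'spam'
```

The same scaffolding (count, smooth, take the argmax in log space) is what makes naive Bayes a natural first tool for word sense disambiguation and language identification, too.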
Part 3: sequences of things
- language models: n-grams and what they're good for
- POS tagging!
- IN-CLASS EXERCISE!!!
- speech recognition and FSTs
Part 4: structured things
- context free grammars, probabilistic CFGs
- parsing: constituency parsing
- parsing: dependency parsing... with classifiers.
Part 5: some other things
- FSTs for morphology (Mike-style)
- weighted FSTs
- representing semantics
Part 6: machine translation, because MT rules.
- history
- RBMT (rule-based machine translation)
- EBMT (example-based machine translation)
- SMT (statistical machine translation)
Also!
suggestions for topics:
- Greg and Karl think it's important to hit the Viterbi algorithm and HMMs!
- and most importantly, make sure everybody really understands n-gram models, probably as an introduction to why we use probability distributions at all in NLP (a tiny sketch follows)
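To go with that last suggestion, here's a minimal sketch of an unsmoothed bigram model over a made-up three-sentence corpus. The sentences and the `<s>`/`</s>` boundary markers are my own illustrative choices:

```python
from collections import Counter

# Toy corpus with sentence-boundary markers; real counts would come
# from a much larger collection of text.
sentences = [
    "<s> the cat sat </s>",
    "<s> the dog sat </s>",
    "<s> the cat ran </s>",
]

bigrams = Counter()
unigrams = Counter()
for s in sentences:
    tokens = s.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p(word, prev):
    """Maximum-likelihood bigram estimate P(word | prev), no smoothing."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p("cat", "the"))  # 2/3: "the" appears 3 times, "the cat" twice
print(p("dog", "the"))  # 1/3
```

Those conditional probabilities are the whole point: the model is nothing but a distribution over what word comes next, which is exactly the mindset the rest of the course builds on.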