Introduction to Natural Language Text Processing, Applications and Issues in NL text processing, Regular Expressions, Word Tokenization, Word Normalization and Stemming, Sentence Segmentation. Probabilistic Language Models: theory and methods for statistical NLP. Bi-grams, tri-grams and N-grams, estimating N-gram probabilities. Text Classification: Naïve Bayes, Multinomial Naïve Bayes. Part-of-speech tagging (POS tagging), Information Extraction, NER. Trigram Hidden Markov Model for parameter estimation, Viterbi Algorithm. Natural Language Parsing, Probabilistic CFGs, Parsing with PCFGs, Estimating model parameters, CKY parsing algorithm, Issues with PCFGs, Lexicalized PCFGs.
Course Project
Link for example projects [Here]
Useful Links
1. NLP course at Johns Hopkins University
2. Project Ideas from projects [Link-1] [Link-2]
3. NLP course at UMass Link-2
Evaluation Criteria
Quizzes 20%
End Sem 50%
Project 30%