CS 613: Natural Language Processing
IIT Gandhinagar
Autumn 2021
Instructor: Mayank Singh (email: singh.mayank@iitgn.ac.in)
Online Office Hours: Monday, 16:00 - 17:00
Class Schedule: Tuesday and Friday, 15:35 - 17:00
Location: Online (Zoom link will be shared over email)
Communication: Google group cs613-2021.pvtgroup@iitgn.ac.in
TAs
Jayesh Malaviya, PhD, malaviya_jayesh@iitgn.ac.in
Aarchi Agrawal, M.Tech., aarchia@iitgn.ac.in
Shoaib Alam, M.Tech., shoaibalam@iitgn.ac.in
Varun Jain, B.Tech., varun.jain@iitgn.ac.in
Praveen Venkatesh, B.Tech., praveen.venkatesh@iitgn.ac.in
Harsh Patel, B.Tech., harsh.patel@iitgn.ac.in
Shivam Sahni, B.Tech., shivam.sahni@iitgn.ac.in
Prerequisites (Optional)
Basic Probability & Statistics (ES 331/ MA 202) or equivalent
Basic understanding of Python programming (ES 102/ ES 112) or equivalent
Course Contents
Text processing: Tokenization, Stemming, Spell Correction, etc.
Language Modelling: N-grams, smoothing
Morphology, Parts of Speech Tagging
Syntax: PCFGs, Dependency Parsing
Distributional Semantics, Topic Models
Lexical Semantics, Word Sense Disambiguation
Information Extraction: Relation Extraction, Event Extraction
Applications: Text Classification, Sentiment Analysis, Opinion Mining, Summarization
Deep Learning for NLP, Representation Learning
Practical Sessions (Optional, only for interested students)
Basics of Python for NLP (file handling, case-folding, spell check, split, strip, regex, find, replace, etc.), NLTK, Anaconda installation, Python notebooks, basic GitHub knowledge (pull, push, fork, merge, etc.). [Colab] (A short illustrative sketch follows this list.)
Basics of ML, such as regression and classification; train/validation/test splits and cross-validation (why and how); using NumPy and scikit-learn to train linear regression, SVM, logistic regression, random forest, and decision tree models for NLP tasks. (See the second sketch after this list.)
Additional sessions can be conducted based on request.
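For a flavour of the first practical session, here is a minimal sketch of basic text handling in Python (case-folding, strip/split, a regular expression, find/replace, and NLTK tokenization); the sample sentence is invented for illustration and is not course material.

```python
# Minimal text-processing sketch: case-folding, strip/split, a regular
# expression, find/replace, and NLTK word tokenization.
# The example sentence is invented for illustration.
import re

from nltk.tokenize import word_tokenize  # needs: pip install nltk, then nltk.download('punkt')

raw = "  The Quick brown Fox emailed fox@example.com on 16th August!  "

# Case-folding and whitespace stripping
text = raw.strip().lower()

# Plain whitespace split vs. NLTK word tokenization
print(text.split())
print(word_tokenize(text))

# A regular expression to pull out email addresses
print(re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text))

# find and replace
print(text.find("fox"))
print(text.replace("fox", "dog"))
```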
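And for the second session, a sketch of a toy scikit-learn workflow: bag-of-words features, a train/test split, logistic regression, and cross-validation. The tiny labelled dataset below is made up purely for illustration.

```python
# Toy text classification with scikit-learn: bag-of-words features,
# a train/test split, logistic regression, and cross-validation.
# The small labelled dataset is invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

texts = [
    "loved the movie, great acting",
    "wonderful film and a touching story",
    "brilliant direction, would watch again",
    "terrible plot, waste of time",
    "boring and poorly acted",
    "awful script, fell asleep halfway",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

# Hold out a test set, stratified by label
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, random_state=42, stratify=labels
)

# Bag-of-words features feeding a logistic regression classifier
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(X_train, y_train)

print("held-out accuracy:", model.score(X_test, y_test))
print("3-fold CV accuracy:", cross_val_score(model, texts, labels, cv=3).mean())
```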
Projects
Will be updated soon.
Lecture Slides and Additional Materials
Basic Text Processing (Word and sentence tokenization, normalization, casefolding, and spelling correction) [Slides, Lecture I video, Lecture II video] [Stanford]
Advanced Text Processing (Regular expressions, Lemmatization, Stemming) [Slides, video]
Statistically Understanding Text (TTR, Zipf's law, Heaps' law) [Slides, Video]
Distributional Semantics (DSMs, context weighing, similarity metrics) [Slides, Video]
Introduction to basic word embeddings (Word2Vec, GloVe) [Slides, Video 1, 2, 3] [Word2Vec paper, Parameter Learning], [wevi], [GloVe Paper]
Traditional language modeling (n-gram models, perplexity) [Slides, Video], [Understanding Probabilities in NLP by Joakim Nivre], [Google N-grams blog, viewer] (a small worked bigram/perplexity sketch follows this list)
Smoothing techniques (Laplace, Good-Turing and Kneser-Ney Smoothing) [Slides, Video]
Neural language models (Bengio et al.) [Slides, Video] [Paper by Bengio et al.]
Computational Morphology (Types, processes, FSA, FST, POS tagging) [Slides, Video]
Sequence Labelling (POS Tagging, limitations, HMM) [Slides, Video 1 and 2]
Sequence-to-Sequence Modelling (RNN, LSTM, GRU, attentions and Transformers) [Slides, Video 1,2,3]
Text Classification (Approaches and Applications) [Slides, Video 1, 2]
Lexical Semantics (Types, WordNet and similarity metrics) [Slides, Video 1 and 2], [IndoWordnet], [Lesk tutorial]
Word Sense Disambiguation [Slides, Video] [Yarowsky's paper], [A nice hands-on]
Information Extraction (NER, Relation extraction) [Slides, Video]
Summarization (Centroid-based, LexRank, TextRank) [Slides, Video], LexRank [Paper, code]
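As a small companion to the language-modelling and smoothing lectures above, the sketch below builds a toy bigram model with add-one (Laplace) smoothing and computes perplexity on a sentence; the training corpus and test sentences are invented examples, not course data.

```python
# Toy bigram language model with add-one (Laplace) smoothing and perplexity.
# The training corpus and test sentences are made up for illustration.
import math
from collections import Counter

train = [
    "<s> i am sam </s>",
    "<s> sam i am </s>",
    "<s> i do not like green eggs and ham </s>",
]

unigrams = Counter()
bigrams = Counter()
for sent in train:
    tokens = sent.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

V = len(unigrams)  # vocabulary size (including <s> and </s>) for add-one smoothing

def bigram_prob(prev, word):
    # P(word | prev) = (c(prev, word) + 1) / (c(prev) + V)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

def perplexity(sentence):
    tokens = sentence.split()
    log_prob = sum(math.log(bigram_prob(p, w)) for p, w in zip(tokens, tokens[1:]))
    n = len(tokens) - 1  # number of bigram predictions made
    return math.exp(-log_prob / n)

print(perplexity("<s> i am sam </s>"))          # seen sentence: lower perplexity
print(perplexity("<s> sam do not like ham </s>"))  # unseen combinations: higher perplexity
```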
Assignments (All deadlines are 11:59 PM IST)
Assignment I: Crawling data (Deadline: 16th August) [Link]
Assignment II: Processing and Understanding data (Deadline: 30th August) [Link]
Assignment III: Language Modelling (Deadline: 13th September) [Link]
Grading Policy & Schedule
Assignments (15%)
Three assignments, each carrying 5 marks.
Surprise quizzes (10%)
Four quizzes of 2.5% each. These quizzes will assess your grasp of the content covered in class.
Mid-semester (15%) [A Sample Paper]
End-semester (15%)
Attendance (5%)
Ten surprise attendance checks of 0.5% each during lectures and guest lectures.
Project (40%):
Attendance at each weekly meeting: 5%
Project proposal abstract: 5% (Template, Deadline: 23rd August)
Phase-I presentation: 5% (22nd September)
Phase-II presentation: 5% (28th October)
Final Project 3-minute madness video + Poster: 15% (5% + 10%, 20th November) [See the posters and slides of the 2019 NLP course version here]
Final Project Demo: 5% (22nd November, Web platform)
Books
[DJ] Daniel Jurafsky and James H. Martin. 2000. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (1st ed.). Prentice Hall PTR, Upper Saddle River, NJ, USA. (Main Textbook)
[CH] Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, USA.
[SEE] Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python (1st ed.). O'Reilly Media, Inc.
[IYA] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. The MIT Press.