CS 613: Natural Language Processing
IIT Gandhinagar
Autumn 2022
Instructor: Mayank Singh (email: singh.mayank@iitgn.ac.in)
Office Hours: Monday 10AM-12PM (For any other day, email me for an appointment)
Class Schedule: Tuesday and Friday 3:30-5:00PM.
Location: AB 7/202
Communication Google group: https://groups.google.com/a/iitgn.ac.in/g/cs613-2022.pvtgroup
Teaching Assistant
Tarun Sharma (sharma_tarun@iitgn.ac.in)
Prerequisite (Optional)
Basic Probability & Statistics (ES 331/MA 202) or equivalent
Basic understanding of Python programming (ES 102/ES 112) or equivalent
Course Contents
Text processing: Tokenization, Stemming, Spell Correction, etc.
Language Modelling: N-grams, smoothing
Morphology, Parts of Speech Tagging
Syntax: PCFGs, Dependency Parsing
Distributional Semantics, Topic Models
Lexical Semantics, Word Sense Disambiguation
Information Extraction: Relation Extraction, Event Extraction
Applications: Text Classification, Sentiment Analysis, Opinion Mining, Summarization
Deep Learning for NLP, Representation Learning
Lecture Slides and Additional Materials
Introduction to NLP [Slides]
Distributional Semantics [scripts: word-to-doc-binary, word-to-word_binary, word-to-word-non-binary-count, word-to-word-non-binary-tf-idf]
Continuous representations [Word2Vec paper, Parameter Learning, Wevi, GloVe paper]
Language Modelling [LM using NLTK, Smoothing] [Section 3.8 for relation between entropy, cross-entropy and perplexity]
RNNs, LSTM, GRU and Transformers [Slides]
Contextual Language Modelling [BERT paper]
Computational Morphology (Types, processes, FSA, FST) [Slides]
Sequence Labelling (POS Tagging, limitations, HMM) [Slides]
Lexical Semantics (Types, wordnet and similarity metrics) [Slides]
Word Sense Disambiguation [Slides]
Text Classification [Slides]
Text Summarization [Slides]
Assignments (All deadlines are 11:59PM IST)
Assignment 1 [23 Aug - 31 Aug]
Assignment 2 [14 Sep - 30 Oct]
Assignment 3 [26 Oct - 5 Nov]
Assignment 4 [5 Nov - 10 Nov]
Paper Presentations
Presentation 1 [18 Oct - 21 Oct]
Presentation 2 [1 Nov - 4 Nov]
Grading Policy & Schedule
Assignments (40%)
Four assignments (each carrying 10 marks).
Surprise quizzes (20%)
Two surprise quizzes, each worth 10% of the marks. These quizzes will assess your grasp of the content covered in class. One will be held before the mid-semester exam and one after.
Mid-semester (20%) [A Sample Paper]
Paper Presentations (20%)
Two paper presentations (each carrying 10 marks).
Books
[DJ] Daniel Jurafsky and James H. Martin. 2000. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (1st ed.). Prentice Hall PTR, Upper Saddle River, NJ, USA. (Main Textbook)
[CH] Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, USA.
[SEE] Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python (1st ed.). O'Reilly Media, Inc.
[IYA] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. The MIT Press.