CS 613: Natural Language Processing
IIT Gandhinagar
Autumn 2023
Instructor: Mayank Singh (email: singh.mayank@iitgn.ac.in)
Office Hours: Monday 10 AM-12 PM (for any other day, email me for an appointment)
Class Schedule: Monday and Wednesday 2:00-3:20 PM.
Location: AB 1/103
Communication: Google group cs613-2023.pvtgroup@iitgn.ac.in
Teaching Assistants
Himanshu Beniwal (himanshubeniwal@iitgn.ac.in)
Pritam Kadasi (pritam.k@iitgn.ac.in)
Dishaben Suthar (dishaben.suthar@iitgn.ac.in)
Jayesh Malaviya (malaviya_jayesh@iitgn.ac.in)
Shriraj Sawant (sawant_shriraj@iitgn.ac.in)
Suraj Jaiswal (jaiswalsuraj@iitgn.ac.in)
Progyan Das (progyan.das@iitgn.ac.in)
Prerequisites (Optional)
Basic Probability & Statistics (ES 331/MA 202) or equivalent
Basic understanding of Python programming (ES 102/ES 112) or equivalent
Course Contents
Text processing: Tokenization, Stemming, Spell Correction, etc.
Language Modelling: N-grams, smoothing
Morphology, Part-of-Speech Tagging
Syntax: PCFGs, Dependency Parsing
Distributional Semantics, Topic Models
Lexical Semantics, Word Sense Disambiguation
Information Extraction: Relation Extraction, Event Extraction
Applications: Text Classification, Sentiment Analysis, Opinion Mining, Summarization
Deep Learning for NLP, Representation Learning
Lecture Slides and Additional Materials
Introduction to NLP [Slides]
Distributional Semantics [scripts: word-to-doc-binary, word-to-word_binary, word-to-word-non-binary-count, word-to-word-non-binary-tf-idf]
Continuous Representations [Slides, Word2Vec paper, Parameter Learning, Wevi, GloVe paper]
Language Modelling [Basics of LM Slides, Smoothing Slides, LM using NLTK, Section 3.8 for relation between entropy, cross-entropy, and perplexity, Good Turing Estimate]
Neural Language Model [Slides]
RNNs, LSTMs, GRUs, and Transformers [Slides]
Contextual Word Embeddings, BERT [Slides]
Calculating Parameters [Doc], Let's build GPT: from scratch [Video Link]
Computational Morphology [Slides]
Sequence Labelling (POS Tagging, limitations, HMM) [Slides]
Lexical Semantics (Types, WordNet, and similarity metrics) [Slides, NLTK's WordNet Tutorial]
Word Sense Disambiguation [Slides]
Text Classification [Slides]
Text Summarization [Slides]
Assignments (All deadlines are 11:59 PM IST)
Assignment 1 [link, deadline: 28th August]
Assignment 2 [link, deadline: 30th September]
Assignment 3 [link, deadline: 16th November]
Paper Presentations
Presentation 1: 5th and 6th October [slots]
Presentation 2: 8th and 9th November [slots]
Grading Policy & Schedule
Three Assignments (30%)
Three assignments, each worth 10%.
Three Surprise Quizzes (30%)
Three surprise quizzes, each worth 10%. These quizzes will assess your grasp of the content covered in class.
Exam (20%) [A Sample Paper]
One exam, held during the Examination I period.
Paper Presentations (20%)
Two paper presentations, each worth 10%.
Books
[DJ] Daniel Jurafsky and James H. Martin. 2000. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (1st ed.). Prentice Hall PTR, Upper Saddle River, NJ, USA. (Main Textbook)
[CH] Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, USA.
[SEE] Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python (1st ed.). O'Reilly Media, Inc.
[IYA] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. The MIT Press.