Spring semester 2025-26
Announcements !!
Attendance will be handled by TAs Amogh and Rohit. All communication regarding attendance, if any, should be directed to these TAs only.
First class on Monday, Jan 5, 2026
NLP is a very popular course, and there are many more registration requests than can be accommodated with the available resources. Hence, registration requests will be approved on ERP gradually in batches, taking into consideration the CGPA and seniority of the students. Please do not send emails to the faculty member or TAs regarding registration for the course; we regret that individual emails about registration cannot be answered.
This course will require students to understand several CS research papers. There will be assignments that will involve substantial programming in Python. It is advisable to take this course only if you have the necessary background (see below).
Instructor
Saptarshi Ghosh (saptarshi [at] cse.iitkgp.ac.in)
Teaching Assistants
Koyena Chowdhury (koyenachowdhury02 [at] gmail.com)
Rohit Dutta (rohitdutta2510 [at] gmail.com)
Amogh Joshi (amoghjoshi500 [at] gmail.com)
Class Timings and Venue
Monday 14:00 - 14:55 (1 hour)
Tuesday 16:00 - 17:55 (2 hours)
Classroom: NR123 (Nalanda Complex)
Pre-requisites for the course
Data structures and algorithms
Probability and Statistics
Machine Learning
Basics of graph algorithms
Programming in Python (there will be multiple programming-based assignments)
Course evaluation
Mid-semester exam (written): 30%
End-semester exam (written): 30%
Internal assessment: 30% (3 programming assignments)
Attendance: 10%
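As an illustration of how the weights above combine into a final score, here is a minimal Python sketch. It assumes each component is first expressed as a percentage (0-100); that scale, the component keys, and the function name `weighted_total` are hypothetical and not part of the official grading procedure. How the three programming assignments are combined into the internal assessment score is not specified here, so the sketch takes that score as a single input.

```python
# Component weights in percent, taken from the course evaluation list.
WEIGHTS = {
    "mid_semester": 30,
    "end_semester": 30,
    "internal_assessment": 30,
    "attendance": 10,
}


def weighted_total(scores):
    """Combine per-component percentages (0-100) into a final percentage.

    `scores` maps each key of WEIGHTS to a percentage. This scale is an
    assumption made for illustration only.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Integer weights keep the arithmetic exact until the final division.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student: 70% mid-sem, 80% end-sem, 90% assignments, full attendance.
example = {
    "mid_semester": 70,
    "end_semester": 80,
    "internal_assessment": 90,
    "attendance": 100,
}
total = weighted_total(example)
print(total)
```

For the hypothetical scores above, the contributions are 21 + 24 + 27 + 10, giving a total of 82.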
Broad topics
Challenges in NLP
Parts of Speech Tagging
Syntax, Dependency Parsing
Language Modeling: N-grams, smoothing
Distributional Semantics, Word Embeddings
RNNs and seq2seq models
Transformers, Attention, BERT
Large Language Models, Prompt Engineering
Applications and special topics (domain-specific NLP, interpretability, etc.)
Text and Reference Literature
Daniel Jurafsky and James Martin. Speech and Language Processing. https://web.stanford.edu/~jurafsky/slp3/
Christopher Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. https://nlp.stanford.edu/fsnlp/
Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing. https://arxiv.org/abs/1510.00726
Research papers and online materials to be pointed out in class, especially for the recent topics
Plagiarism policy
Plagiarism in any form, whether copying from other students or from online resources, will be severely penalized. Every assignment should be done individually or in designated groups. Also, you should not use or copy any code that is available online.
While you can discuss the concepts and assignments with other students/groups, you should NOT share your code/answers for any assignment with any other student (or with students outside your designated group) until the grading of the assignment is completed. It is your responsibility to ensure that your code/answers are not available to others.
We will use standard plagiarism detection software to check the similarity of submitted assignments. If we find submissions that are too similar (beyond what can be expected by chance or due to discussion), all such submissions will be severely penalized. We will NOT attempt to differentiate between who shared the code and who copied it; all involved students will be penalized equally. The minimum penalty for plagiarism in an assignment is a zero on that assignment. There can be more severe penalties for repeat offences, including grade reduction and de-registration from the course.