Spring semester 2024-25
Announcements
The end-semester syllabus will include Word Embeddings and all topics covered after the mid-semester exam.
The course will use the CSE department Moodle (https://moodlecse.iitkgp.ac.in/moodle/). All students must have an account on this Moodle and join the course "Natural Language Processing - CS60075" using the Student Key: NLP-Stu24
First class on Friday, Jan 3, 2025.
NLP is a very popular course, and there are far more registration requests than can be accommodated with the available resources. Hence, registration requests will be approved on ERP gradually, in batches, taking into consideration the CGPA and seniority of the students. Please do not email the instructor or the TAs about registration for the course; we regret that individual emails about registration cannot be answered.
This course will require students to understand several CS research papers. Some assignments will involve substantial programming in Python. We advise taking this course only if you have the necessary background (see below).
Instructor
Saptarshi Ghosh (saptarshi [at] cse.iitkgp.ac.in)
Teaching Assistants
Koyena Chowdhury (koyenachowdhury02 [at] gmail.com)
Soham Poddar (sohampoddar [at] kgpian.iitkgp.ac.in)
Shounak Paul (shounakpaul95 [at] gmail.com)
Class Timings and Venue
Wednesday 12:00-12:55
Thursday 11:00-11:55
Friday 09:00-09:55
Classroom: NC323 (Nalanda Complex)
Pre-requisites for the course
Data structures and algorithms
Probability and Statistics
Machine Learning
Basics of Graph algorithms
Programming in Python (there will be multiple programming-based assignments)
Course evaluation
Mid-semester exam: 30%
End-semester exam: 40%
Internal assessment: 30% (Programming assignments)
Broad topics
Challenges in NLP
Parts of Speech Tagging
Syntax, Dependency Parsing
Language Modeling: N-grams, smoothing
Distributional Semantics, Word Embeddings
RNNs and seq2seq models
Transformers, Attention, BERT
Large Language Models, Prompt Engineering
Applications and special topics (domain-specific NLP, interpretability, etc.)
Text and Reference Literature
Daniel Jurafsky and James Martin. Speech and Language Processing. https://web.stanford.edu/~jurafsky/slp3/
Christopher Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. https://nlp.stanford.edu/fsnlp/
Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing. https://arxiv.org/abs/1510.00726
Research papers and online materials, to be shared in class, especially for the recent topics
Plagiarism policy
Plagiarism in any form (copying from other students or from online resources) will be severely penalized. Every assignment should be done individually, unless otherwise specified. Also, you should not use or copy any code that is available online.
While you may discuss the concepts and assignments with other students, you should NOT share your code or answers for any assignment with any other student until the grading of that assignment is complete. It is your responsibility to ensure that your code and answers are not available to others.
We will use standard plagiarism detection software to check the similarity of submitted assignments. If we find submissions that are too similar (beyond what can be expected by chance or from discussion among students), all such submissions will be severely penalized. We will NOT attempt to distinguish between who shared the code and who copied it; all involved students will be penalized equally. The minimum penalty for plagiarism in an assignment is a zero on that assignment. Repeat offences may attract more severe penalties.