Natural Language Processing
Course Webpage for the Spring semester 2023-24
Announcements !!
[NEW] Assignment 3 declared on CSE Moodle. Deadline: 15th April EOD (extended from 10th April EOD)
Assignment 2 declared on CSE Moodle. Deadline: 26th March EOD (extended from 19th March)
Assignment 1 declared on CSE Moodle. Deadline: 4th February EOD. Marks uploaded on Moodle.
The course will use the CSE department Moodle (https://moodlecse.iitkgp.ac.in/moodle/). All students need to have an account and join the course Moodle. Use the Student Key: Student@NLP
First class on Wednesday, January 3
NLP is a very popular course, and there are usually many more requests for registration than can be accommodated with the available resources. Hence, registration requests will be approved on ERP gradually in batches, taking into consideration the CGPA and seniority of the students. Please do not send emails to the faculty member / TAs regarding registration for the course; we regret to say that individual emails about registration cannot be answered.
This is a research-oriented course that requires students to understand several CS research papers. There will be a term project / assignments that will involve substantial programming in Python. It is advisable to take this course only if you have the necessary background (see below).
Instructor
Saptarshi Ghosh (saptarshi@cse.iitkgp.ac.in)
Teaching Assistants
Koyena Chowdhury (koyenachowdhury02@gmail.com)
Soham Poddar (sohampoddar@kgpian.iitkgp.ac.in)
Shounak Paul (shounakpaul95@gmail.com)
Class Timings and Venue
Wednesday 12:00-12:55
Thursday 11:00-11:55
Friday 09:00-09:55
Classroom: NC323 (Nalanda Complex)
Pre-requisites for the course
Data structures and algorithms
Probability and Statistics
Basics of Machine Learning
Basics of Graph algorithms
Programming in Python (there will be programming-based assignments / term project)
Course evaluation
Mid-semester exam: 30%
End-semester exam: 40%
Internal assessment: 30% (term project and/or multiple assignments -- to be decided)
Broad topics
Introduction and challenges in NLP
Empirical laws
Language Modeling: N-grams, smoothing
Parts of Speech Tagging
Syntax, Dependency Parsing
Distributional Semantics, Word Embeddings
RNNs and seq2seq models
Transformers, Attention, BERT
Applications and special topics (LLMs, Prompt Engineering, domain-specific NLP, etc.)
Text and Reference Literature
Daniel Jurafsky and James Martin. Speech and Language Processing. https://web.stanford.edu/~jurafsky/slp3/
Christopher Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. https://nlp.stanford.edu/fsnlp/
Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing. https://arxiv.org/abs/1510.00726
Research papers and online materials to be pointed out in class
Plagiarism policy
Plagiarism in any form -- copying from other students or from online resources -- will be severely penalized. Every assignment should be done individually, unless otherwise specified. Also, you should not use or copy any code that is available online.
While you can discuss the concepts and assignments with other students, you should NOT share your code/answers for any assignment / term project with any other student until the grading of the assignment is completed. It is your responsibility to ensure that your code/answers are not available to others.
We will use standard plagiarism detection software to check the similarity of submitted assignments. If we find submissions that are too similar (beyond what can be expected by chance, or due to discussion among students), all such submissions will be severely penalized. We will NOT attempt to differentiate between who shared the code and who copied it; all involved students will be penalized equally. The minimum penalty for plagiarism in an assignment is a zero on that assignment. There can be more severe penalties for repeat offences.