CS458 Natural Language Processing
Class Timing: Wednesday 15.40 at C002; Thursday 14.00 at C002
Hybrid Class: https://meet.google.com/sek-jrsg-rqz
Quiz: 8.15 PM, 12th January
Assignment: Deadline 19th January; Submission Link
Midterm Exam: 11.00 AM - 12.30 PM, 16th February; Syllabus: Lecture 1-12 and 15
Project: Submission Window: 23rd March - 10th April
Credit Structure: 3-0-0-4-4
This course provides an introduction to NLP, focusing on core algorithms and methods used to analyze and understand human language. Students will learn foundational concepts, tools, and techniques for processing text and speech, with applications in translation, summarization, and conversation.
Unit 1: Foundations of NLP
Introduction; Regular Expressions, Tokenization; Minimum Edit Distance; N-gram Language Models
Unit 2: Fundamental Algorithms of NLP
Naive Bayes & Text Classification; Sentiment Analysis
Unit 3: Neural Structures for NLP
Logistic Regression; Vector Semantics and Embeddings; Neural Networks; RNNs and LSTMs; Transformers; Large Language Models; Masked Language Models; Model Alignment, Prompting, and In-Context Learning
Unit 4: NLP Applications
Machine Translation; Question Answering, Information Retrieval, & RAG
Unit 5: Annotating Linguistic Structure
Sequence Labeling for Parts-of-Speech (POS) Tagging and Named Entity Recognition (NER); Context-Free Grammars and Constituency Parsing; Dependency Parsing; Information Extraction: Relations, Events, and Time; Semantic Role Labeling and Argument Structure; Coreference Resolution and Entity Linking; Discourse Coherence
Week 1:
Lecture 1: Introduction; Source: Chapter 1, Jurafsky & Martin
Lecture 2: Regular Expressions, Tokenization; Source: Chapter 2, Jurafsky & Martin
Self-Study 1: Basic text processing using RegExPal
Week 2:
Lecture 3: Minimum Edit Distance; Source: Chapter 2, Jurafsky & Martin
Lecture 4: N-gram Language Models; Source: Chapter 3, Jurafsky & Martin
Self-Study 2: Basic text processing using NLTK and spaCy
Week 3:
Lecture 5: Naive Bayes & Text Classification; Source: Chapter 4, Jurafsky & Martin
Lecture 6: Sentiment Analysis; Source: Chapter 4, Jurafsky & Martin
Self-Study 3: Minimum Edit Distance
Week 4:
Lecture 7: Logistic Regression; Source: Chapter 5, Jurafsky & Martin
Lecture 8: Vector Semantics and Embeddings; Source: Chapter 6, Jurafsky & Martin
Self-Study 4: Language Modelling
Week 5:
Lecture 9: Neural Networks; Source: Chapter 7, Jurafsky & Martin; StatQuest
Lecture 10: Neural Networks; Source: Chapter 7, Jurafsky & Martin; StatQuest
Self-Study 5: Naive Bayes, Text Classification, and Sentiment Analysis
Week 6:
Lecture 11: RNNs; Source: Chapter 8, Jurafsky & Martin; StatQuest
Lecture 12: LSTMs; Source: Chapter 8, Jurafsky & Martin; StatQuest
Self-Study 6: Logistic Regression
Week 7:
Lecture 13: Information Retrieval; Source: Chapter 14, Jurafsky & Martin
Lecture 14: Extra Class
Self-Study 7: Neural Networks
Week 8:
Midsem
Week 9:
Lecture 15: Sentiment Analysis II; Source: Slide 7, Jurafsky & Martin
Lecture 16: Maximum Entropy Classification; Source: Slide 8, Jurafsky & Martin
Self-Study 8: RNNs & LSTMs
Week 10:
Lecture 17: Semantics II; Source: Slide 6, Jurafsky & Martin
Lecture 18: Statistical Natural Language Parsing; Source: Slide 11, Jurafsky & Martin
Self-Study 9: Parts-of-Speech (POS) Tagging
Week 11:
Lecture 19: POS Tagging; Source: Slide 10, Jurafsky & Martin
Lecture 20: Probabilistic Parsing; Source: Slide 12, Jurafsky & Martin
Self-Study 10: Information Extraction
Week 12:
Lecture 21: Lexicalized Parsing; Source: Slide 13, Jurafsky & Martin
Lecture 22: Dependency Parsing; Source: Slide 13, Jurafsky & Martin
Self-Study 11: Named Entity Recognition (NER)
Week 13:
Lecture 23: Information Extraction & Named Entity Recognition; Source: Slide 9, Jurafsky & Martin
Lecture 24: Relation Extraction; Source: Slide 9, Jurafsky & Martin
Self-Study 12: Machine Translation
Week 14:
Lecture 25: Question Answering; Source: Chapter 14, Jurafsky & Martin
Lecture 26: Summarization in Question Answering; Source: Chapter 14, Jurafsky & Martin
Self-Study 13: Discourse & Coherence
Week 15:
Lecture 27: Machine Translation; Source: Chapter 13, Jurafsky & Martin
Lecture 28: RAG; Source: Chapter 14, Jurafsky & Martin
Self-Study 14: Extra Class
Week 16:
Endsem
Extras:
Lecture A: Transformers; Source: Chapter 9, Jurafsky & Martin
Lecture B: Large Language Models; Source: Chapter 10, Jurafsky & Martin
Lecture C: Masked Language Models; Source: Chapter 10, Jurafsky & Martin
Lecture D: Model Alignment, Prompting & In-Context Learning; Source: Chapter 10, Jurafsky & Martin
"Speech and Language Processing" by Daniel Jurafsky and James H. Martin, Prentice Hall, 2024.
"Natural Language Understanding" by James Allen, Benjamin/Cummings Publishing Company, 1987.
"Foundations of Statistical Natural Language Processing" by Christopher D. Manning and Hinrich Schütze, MIT Press, 1999.
"A Primer on Neural Network Models for Natural Language Processing" by Yoav Goldberg, Online.
"Natural Language Processing with Python" by Steven Bird, Ewan Klein, Edward Loper, O'Reilly Media, Inc., 2009.
Attendance (10%)
Quiz (10%)
Theoretical Assignment (10%)
Mid Sem (20%)
Course Project (50%): Survey (10%); Implementation (20%); Analysis (10%); Innovation (10%)