CS458 Natural Language Processing
Class Timing: Monday 9.00-10.30 AM and Thursday 9.00-10.30 AM at C303
Self-study Timing: Wednesday 11.00 AM-12.00 PM at C303
Mid Sem Examination: 22 February, 2025, 10.00 AM-12.00 PM
Theoretical Assignments:
TA1: 5 February, 2025
TA2: 10 April, 2025
Practical Assignments:
PA1: 15 January, 2025 on Basic Text Processing
PA2: 22 January, 2025 on Minimum Edit Distance & Language Modelling
PA3: 29 January, 2025 on Naive Bayes, Text Classification & Sentiment Analysis
PA4: 12 March, 2025 on Neural Networks / Machine Translation / Parts-of-Speech (POS) Tagging
End Sem Examination: 13 April, 2025 (BTech) / 20 April, 2025 (MTech & PhD)
Credit Structure: 3-0-0-4-4
This course provides an introduction to NLP, focusing on core algorithms and methods used to analyze and understand human language. Students will learn foundational concepts, tools, and techniques for processing text and speech, with applications in translation, summarization, and conversation.
Unit 1: Foundations of NLP
Introduction; Regular Expressions, Tokenization; Minimum Edit Distance; N-gram Language Models
Unit 2: Fundamental Algorithms of NLP
Naive Bayes & Text Classification; Sentiment Analysis
Unit 3: Neural Structures for NLP
Logistic Regression; Vector Semantics and Embeddings; Neural Networks; RNNs and LSTMs; Transformers; Large Language Models; Masked Language Models; Model Alignment, Prompting, and In-Context Learning
Unit 4: NLP Applications
Machine Translation; Question Answering, Information Retrieval, & RAG
Unit 5: Annotating Linguistic Structure
Sequence Labeling for Parts-of-Speech (POS) Tagging and Named Entity Recognition (NER); Context-Free Grammars and Constituency Parsing; Dependency Parsing; Information Extraction: Relations, Events, and Time; Semantic Role Labeling and Argument Structure; Coreference Resolution and Entity Linking; Discourse Coherence
Week 1:
Lecture 1: Introduction; Source: Chapter 1, Jurafsky & Martin
Lecture 2: Regular Expressions, Tokenization; Source: Chapter 2, Jurafsky & Martin
Self-Study 1: Basic text processing using RegExPal
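The same patterns explored interactively in RegExPal can be tried in Python's `re` module. A minimal sketch (the sample text and patterns below are illustrative, not from the course materials):

```python
import re

text = "Dr. Smith's email is smith@univ.edu. Call 555-1234!"

# A simple word tokenizer: runs of letters, digits, or apostrophes
tokens = re.findall(r"[A-Za-z0-9']+", text)

# A basic (not RFC-complete) email pattern
emails = re.findall(r"\b[\w.]+@[\w.]+\.\w+\b", text)
```

Tokenizing with `findall` like this keeps `Smith's` as one token but splits on punctuation; real tokenizers handle many more edge cases.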
Week 2:
Lecture 3: Minimum Edit Distance; Source: Chapter 2, Jurafsky & Martin
Lecture 4: N-gram Language Models; Source: Chapter 3, Jurafsky & Martin
Self-Study 2: Basic text processing using NLTK and spaCy
Week 3:
Lecture 5: Naive Bayes & Text Classification; Source: Chapter 4, Jurafsky & Martin
Lecture 6: Sentiment Analysis; Source: Chapter 4, Jurafsky & Martin
Self-Study 3: Minimum Edit Distance
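The dynamic-programming algorithm from Chapter 2 can be sketched as follows; deletion and insertion cost 1, and the substitution cost is a parameter since Jurafsky & Martin present both the cost-1 (Levenshtein) and cost-2 variants:

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimum edit distance via dynamic programming."""
    n, m = len(source), len(target)
    # D[i][j] = distance between source[:i] and target[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i
    for j in range(1, m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else sub_cost
            D[i][j] = min(D[i - 1][j] + 1,        # deletion
                          D[i][j - 1] + 1,        # insertion
                          D[i - 1][j - 1] + sub)  # substitution or match
    return D[n][m]
```

On the textbook's example pair, `min_edit_distance("intention", "execution")` is 5 with substitution cost 1, and 8 with substitution cost 2.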
Week 4:
Lecture 7: Logistic Regression; Source: Chapter 5, Jurafsky & Martin
Lecture 8: Vector Semantics and Embeddings; Source: Chapter 6, Jurafsky & Martin
Self-Study 4: Language Modelling
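A bigram language model with maximum-likelihood estimates, using the toy corpus from Chapter 3 (lowercased), can be sketched as:

```python
from collections import Counter

corpus = ["<s> i am sam </s>",
          "<s> sam i am </s>",
          "<s> i do not like green eggs and ham </s>"]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    words = sent.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def bigram_prob(w1, w2):
    # Maximum-likelihood estimate: P(w2 | w1) = C(w1 w2) / C(w1)
    return bigrams[(w1, w2)] / unigrams[w1]
```

This reproduces the chapter's estimates, e.g. P(i | &lt;s&gt;) = 2/3 and P(sam | am) = 1/2; smoothing (add-one, interpolation) would be the next self-study step.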
Week 5:
Lecture 9: Neural Networks; Source: Chapter 7, Jurafsky & Martin
Lecture 10: Neural Networks; Source: Chapter 7, Jurafsky & Martin
Self-Study 5: Naive Bayes, Text Classification, and Sentiment Analysis
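A multinomial Naive Bayes sentiment classifier with add-one smoothing, in the spirit of Chapter 4's worked example (the tiny training corpus below is illustrative):

```python
import math
from collections import Counter, defaultdict

# Toy labeled corpus (hypothetical documents)
train = [("fun couple love love", "pos"),
         ("fast furious shoot", "neg"),
         ("couple fly fast fun fun", "pos"),
         ("furious shoot shoot fun", "neg"),
         ("fly fast shoot love", "neg")]

class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
vocab = set()
for text, label in train:
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def predict(text):
    scores = {}
    for c in class_counts:
        # log prior + add-one smoothed log likelihoods
        logp = math.log(class_counts[c] / len(train))
        total = sum(word_counts[c].values())
        for w in text.split():
            if w in vocab:  # ignore out-of-vocabulary words
                logp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = logp
    return max(scores, key=scores.get)
```

Working in log space avoids underflow from multiplying many small probabilities.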
Week 6:
Lecture 11: RNNs; Source: Chapter 8, Jurafsky & Martin
Lecture 12: LSTMs; Source: Chapter 8, Jurafsky & Martin
Self-Study 6: Logistic Regression
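Binary logistic regression trained by stochastic gradient descent on the cross-entropy loss (Chapter 5's setup) can be sketched on hypothetical two-feature data:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: label depends only on the first feature (hypothetical)
data = [((0.0, 0.0), 0), ((0.0, 1.0), 0),
        ((1.0, 0.0), 1), ((1.0, 1.0), 1)]

w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(1000):
    for (x1, x2), y in data:
        p = sigmoid(w[0] * x1 + w[1] * x2 + b)
        # Gradient of cross-entropy loss w.r.t. weights: (p - y) * x
        err = p - y
        w[0] -= lr * err * x1
        w[1] -= lr * err * x2
        b -= lr * err
```

After training, the learned weight on the first feature dominates, and every training point is classified correctly.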
Week 7:
Lecture 13: Extra Class
Lecture 14: Extra Class
Self-Study 7: Neural Networks
Week 8:
Midsem
Week 9:
Lecture 15: Information Retrieval; Source: Chapter 14, Jurafsky & Martin
Lecture 16: RAG; Source: Chapter 14, Jurafsky & Martin
Self-Study 8: RNNs & LSTMs
Week 10:
Lecture 17: Question Answering; Source: Chapter 14, Jurafsky & Martin
Lecture 18: Machine Translation; Source: Chapter 13, Jurafsky & Martin
Self-Study 9: Machine Translation
Week 11:
Lecture 19: Sentiment Analysis II; Source: Slide 7, Jurafsky & Martin
Lecture 20: Maximum Entropy Classification; Source: Slide 8, Jurafsky & Martin
Self-Study 10: Named Entity Recognition (NER)
Week 12:
Lecture 21: Information Extraction, Named Entity Recognition, & Relation Extraction; Source: Slide 9, Jurafsky & Martin
Lecture 22: POS Tagging; Source: Slide 10, Jurafsky & Martin
Self-Study 11: Parts-of-Speech (POS) Tagging
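A useful warm-up before HMM/Viterbi tagging is the most-frequent-tag baseline, which assigns each word its most common training tag. A minimal sketch on a hypothetical tagged corpus (Penn-style tags):

```python
from collections import Counter, defaultdict

# Tiny tagged corpus (illustrative sentences, not course data)
tagged = [[("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
          [("the", "DT"), ("cat", "NN"), ("runs", "VBZ")],
          [("a", "DT"), ("dog", "NN"), ("barks", "VBZ")]]

tag_counts = defaultdict(Counter)
for sent in tagged:
    for word, tag_ in sent:
        tag_counts[word][tag_] += 1

def tag(word):
    # Most-frequent-tag baseline; unseen words default to NN
    if word in tag_counts:
        return tag_counts[word].most_common(1)[0][0]
    return "NN"
```

On real corpora this baseline is surprisingly strong, which is why it is the standard point of comparison for HMM and neural taggers.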
Week 13:
Lecture 23: Statistical Natural Language Parsing; Source: Slide 11, Jurafsky & Martin
Lecture 24: Probabilistic Parsing; Source: Slide 12, Jurafsky & Martin
Self-Study 12: Context-Free Grammars, Constituency Parsing & Dependency Parsing
Week 14:
Lecture 25: Lexicalized Parsing; Source: Slide 13, Jurafsky & Martin
Lecture 26: Extra Class
Self-Study 13: Information Extraction
Week 15:
Lecture 27: Extra Class
Lecture 28: Extra Class
Self-Study 14: Discourse & Coherence
Week 16:
Endsem
Extras:
Lecture A: Transformers; Source: Chapter 9, Jurafsky & Martin
Lecture B: Large Language Models; Source: Chapter 10, Jurafsky & Martin
Lecture C: Masked Language Models; Source: Chapter 10, Jurafsky & Martin
Lecture D: Model Alignment, Prompting & In-Context Learning; Source: Chapter 10, Jurafsky & Martin
"Speech and Language Processing" by Daniel Jurafsky and James H. Martin, Prentice Hall, 2024.
"Natural Language Understanding" by James Allen, Benjamin/Cummings Publishing Company, 1987.
"Foundations of Statistical Natural Language Processing" by Christopher D. Manning and Hinrich Schütze, MIT Press, 1999.
"A Primer on Neural Network Models for Natural Language Processing" by Yoav Goldberg, Online.
"Natural Language Processing with Python" by Steven Bird, Ewan Klein, Edward Loper, O'Reilly Media, Inc., 2009.
2 Theoretical Assignments (20%); Mid Sem (20%); End Sem (30%); 4 Practical Assignments (5% + 5% + 5% + 15%)