DS651 Introduction to Speech & Natural Language Processing
Class Timing: Wednesday 6.30-7.30 PM
Credit Structure: 1-0-0-0-1
This course provides a foundational understanding of Speech Processing and Natural Language Processing (NLP), focusing on the core principles, techniques, and applications of both fields. It covers text processing, language models, speech signal processing, automatic speech recognition (ASR), and text-to-speech synthesis (TTS).
Unit 1: Lexical Processing in NLP (2 Hours)
What is NLP & Speech Processing
Text Normalization: Tokenization, Stemming, Lemmatization.
Text Similarity using Minimum Edit Distance (MED)
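Illustrative sketch for Unit 1 (not part of the official course material): minimum edit distance between two strings can be computed with dynamic programming. The function name `min_edit_distance` and the default substitution cost of 2 are assumptions made for this example; a substitution cost of 1 gives the plain Levenshtein distance.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 2) -> int:
    """Edit distance with insertion/deletion cost 1 and substitution cost `sub_cost`."""
    n, m = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i characters
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                             # deletion
                dist[i][j - 1] + 1,                             # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost)  # match or substitution
            )
    return dist[n][m]


print(min_edit_distance("intention", "execution"))
```

A smaller distance means the two strings are more similar, which is how MED serves as a text similarity measure in tasks such as spelling correction.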
Unit 2: Syntactic Processing in NLP (2 Hours)
Context-Free Grammars and Constituency Parsing.
Dependency Parsing.
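Illustrative sketch for Unit 2 (not part of the official course material): constituency parsing with a context-free grammar is often introduced through the CKY algorithm, which requires the grammar to be in Chomsky Normal Form. The toy grammar, the rule tables, and the function name `cky_recognize` below are hypothetical, and the code only recognizes sentences rather than building parse trees.

```python
from itertools import product

# Toy grammar in Chomsky Normal Form (hypothetical, for illustration only).
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICAL_RULES = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
}


def cky_recognize(words, start="S"):
    """Return True if the toy grammar can derive `words` from `start`."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the non-terminals that can span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICAL_RULES.get(word, ()))
    for span in range(2, n + 1):            # width of the span
        for i in range(0, n - span + 1):    # start of the span
            j = i + span
            for k in range(i + 1, j):       # split point
                for left, right in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n]


print(cky_recognize("the dog chased the cat".split()))
```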
Unit 3: Semantic Processing in NLP (2 Hours)
Word Senses
Thesaurus-based algorithms
Distributional algorithms
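Illustrative sketch for Unit 3 (not part of the official course material): distributional algorithms estimate word similarity from the contexts in which words occur. The example below builds raw co-occurrence count vectors and compares them with cosine similarity. The helper names, the window size, and the two-sentence corpus are assumptions made for this example; practical systems typically use much larger corpora and weighting schemes such as PPMI.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


corpus = ["the cat drinks milk", "the dog drinks water"]  # hypothetical toy corpus
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["cat"], vecs["dog"]))
```

Here "cat" and "dog" share two of their three context words, so their vectors point in similar directions even though the two words never co-occur.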
Unit 4: Phonetics & Speech Signal Processing (3 Hours)
Basics of Speech Production & Phonetics.
Feature Extraction: MFCC, Spectrograms, PLP Features.
Deep Learning for Speech: Introduction to wav2vec, WavLM.
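Illustrative sketch for Unit 4 (not part of the official course material): MFCC extraction relies on a mel filterbank whose filters are spaced evenly on the mel scale rather than in Hz. The example uses one common form of the mel formula, m = 2595 · log10(1 + f / 700); other variants exist, and the function names here are assumptions made for this example.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (one common variant of the formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(low_hz: float, high_hz: float, n_points: int):
    """Return n_points frequencies (Hz) equally spaced on the mel scale, endpoints included."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_points - 1)
    return [mel_to_hz(low_mel + i * step) for i in range(n_points)]


for f in mel_spaced_frequencies(0.0, 8000.0, 10):
    print(round(f, 1))
```

The printed frequencies are closer together at the low end and farther apart at the high end, reflecting the finer frequency resolution of human hearing at low frequencies.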
Unit 5: Automatic Speech Recognition (ASR) & Text-to-Speech (TTS) (3 Hours)
ASR Pipeline: Feature extraction → Acoustic modeling → Decoding.
HMM vs. DNN-based ASR systems.
End-to-End ASR Models: wav2vec, Whisper API.
TTS Pipeline: Text preprocessing → Prosody → Synthesis.
Deep Learning-based TTS Models: Tacotron, FastSpeech, WaveNet.
Challenges in Speech Synthesis (Low-Resource Languages, Prosody Control).
Lecture 1: Information Layers of SNLP
Lecture 2: Lexical Processing in NLP: Regular Expression
Lecture 3: Lexical Processing in NLP: Tokenization, Stemming, Lemmatization
Lecture 4: Lexical Processing in NLP: Minimum Edit Distance
Lecture 5: Lexical Processing Applications
Lecture 6: Syntactic Processing in NLP: Constituency Parsing
Lecture 7: Syntactic Processing in NLP: Dependency Parsing
Lecture 10: Speech Feature Extraction
Lecture 11: ASR Systems
Lecture 12: TTS Systems
"Speech and Language Processing" by Daniel Jurafsky and James H. Martin, Prentice Hall, 2024.
"Springer Handbook of Speech Processing" by Jacob Benesty, M. Mohan Sondhi, Yiteng Arden Huang, 2008.
"Natural Language Understanding" by James Allen, Benjamin/Cummings Publishing Company, 1987.
"Foundations of Statistical Natural Language Processing" by Christopher D. Manning and Hinrich Schütze, MIT Press, 1999.
"A Primer on Neural Network Models for Natural Language Processing" by Yoav Goldberg, Online.
"Natural Language Processing with Python" by Steven Bird, Ewan Klein, Edward Loper, O'Reilly Media, Inc., 2009.
2 Theoretical Assignments (14%)
12 Quizzes (36%)
1 End Term (30%)
Classroom Notes (5%)
Activeness in Classes (5%)
Attendance (5%)
X-Factor (5%)