EE6307 - Speech Systems

Course Description:

Speech Systems offers a theoretical and practical understanding of how computers process human speech. The course teaches how statistical and machine learning techniques are applied to speech data, covering speech recognition, speaker recognition, speech synthesis, and keyword spotting systems. It includes practical sessions in which students build working speech systems using existing toolkits. State-of-the-art algorithms and their limitations are also discussed.

Prerequisites:

Basic Calculus, Probability and Random Processes, Linear Algebra, Digital Signal Processing, and Machine Learning

Course contents:

Acoustic Theory of Feature Extraction: Acoustic theory of speech production and speech signal processing
Automatic Speech Recognition: Template matching approaches, hidden Markov models, deep acoustic modeling, language modeling
Speaker Recognition: Gaussian mixture modeling, universal background models, minimum divergence criteria, probabilistic LDA, system building
Speech Synthesis: Text analysis, pronunciation, prosody, waveform generation using unit selection, HTS, and WaveNet, voice building and modification

References:

Huang, Acero, and Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, Prentice Hall
Rabiner and Schafer, Theory and Applications of Digital Speech Processing, Prentice Hall
Bishop, Pattern Recognition and Machine Learning, Springer
