11-751 Speech Recognition and Understanding

This 12-unit LTI course introduces the theoretical background and the experimental practice that have made the field what it is today. We will cover the theoretical foundations, essential algorithms, and experimental strategies needed to turn speech into text, and to go beyond that (i.e., do something really useful, and do it right). We will discuss state-of-the-art research, including end-to-end speech recognition with deep learning, and show links to related fields such as machine learning, machine translation, dialog systems, robotics, and user interfaces. A term project will give students the opportunity to conduct hands-on research.

This course is primarily for graduate students in LTI, ECE, HCI, and Robotics. Others, for example (undergraduate) students of CS, psychology, or computational linguistics, may enroll with prior permission of the instructor. No prior experience with speech recognition is necessary, but a solid background in mathematics, computer science, or signal processing will help. The course is suitable for graduate students with some background in computer science, electrical engineering, human-computer interaction, or natural language processing, as well as for advanced undergraduates.

The course involves written and programming assignments. Some reading of papers may also be required. This course is listed in LTI as 11-751 and in ECE as 18-781. This course combines well with 11-753 Advanced Speech Lab and 11-783 Rich Interaction in Virtual Worlds.

More (up-to-date) information is available on Blackboard or Piazza (sign-up may be required, or contact instructor).


Instructors: Florian Metze (Pittsburgh) and Ian Lane (SV)

TA(s): Yun Wang

Time and location (Fall Semester 2016)

Pittsburgh: Monday and Wednesday, 4:30 – 5:50 (WeH4623)
SV: Monday and Wednesday, 1:30 – 2:50 (B23 211)