Deep Learning for Speech and Language

Language and speech technologies are evolving rapidly thanks to current advances in Artificial Intelligence. Applications such as Machine Translation or Speech Recognition can be tackled from a neural perspective with novel architectures that combine Convolutional (CNN) and/or Recurrent (RNN) models with Attention mechanisms. This course surveys the state of the art in Deep Learning for Speech and Language and introduces the programming skills and techniques required to train these systems.

The course webpage for each edition can be found here: [2018], [2017]

DLSL Lectures (2018)

dlsl_2018_d1l2_TrainingAnMLP.pdf

Lecture 1.1: Training a Multi Layer Perceptron

Training

Instructor: Xavier Giró-i-Nieto

dlsl_2018_d1l3_CNN-RNN-Attention.pdf

Lecture 1.2: CNN, RNN and Attention models

Architectures

Instructor: Xavier Giró-i-Nieto

dlsl_2018_d1l4_embeddings.pdf

Lecture 1.3: Embeddings

Natural Language Processing

Instructor: Antonio Bonafonte

dlsl_2018_d2l1_LanguageModels.pdf

Lecture 2.1: Language Models

Natural Language Processing

Instructor: Marta R. Costa-Jussà

dlsl_2018_d2l3_NeuralMachineTranslation.pdf

Lecture 2.2: Neural Machine Translation

Natural Language Processing

Instructor: Marta R. Costa-Jussà

dlsl_2018_d2l3_Seq2seqNLP.pdf

Lecture 2.3: Sequence to Sequence (Seq2Seq)

Natural Language Processing

Instructor: Marta R. Costa-Jussà

dlsl_2018_d2l4_LanguageAndVision.pdf

Lecture 2.4: Language and Vision

Multimodal Learning

Instructor: Xavier Giró-i-Nieto

dlsl_2018_d3l1_SpeechRecognition.pdf

Lecture 3.1: Speech Recognition

Speech Processing

Instructor: José A. R. Fonollosa

dlsl_2018_d3l2_SpeakerRecognition.pdf

Lecture 3.2: Speaker Recognition

Speech Processing

Instructor: Javier Hernando

dlsl_2018_d4l1_Text2Speech.pdf

Lecture 4.1: Text to Speech

Speech Processing

Instructor: Antonio Bonafonte

dlsl_2018_d4l2 Speech2Speech.pdf

Lecture 4.2: Speech to Speech

Speech Processing

Instructor: Santiago Pascual

dlsl_2018_d4l3_AudioAndVision.pdf

Lecture 4.3: Audio and Vision

Speech Processing

Instructor: Xavier Giró-i-Nieto