If you plan to attend our tutorial, please fill out this form: FORM
Slides for Part 1 are now available: SLIDES
The tutorial will take place on Friday, January 10 from 1:45 PM to 5:00 PM ET in Salon D
There will also be a presentation of the tutorial at the 5 Minute Linguist event on Friday, January 10 from 8:30 PM to 10:00 PM ET in Salon E-F
Also, drop by our poster on Saturday, January 11, 2025, from 10:45 AM to 12:15 PM ET in Franklin Hall B
This tutorial will introduce participants to the basics of developing and training realistic deep learning models of human spoken language learning (ciwGAN/fiwGAN), as well as techniques for uncovering linguistically meaningful representations in deep neural networks trained on raw speech. These deep generative networks not only model speech with remarkably high fidelity under a fully unsupervised objective, but have also been shown to capture a wide range of linguistic phenomena in their latent and intermediate representations.
While many neural models have shown success at modeling raw speech, we will focus on a particular family of models known as generative adversarial networks (GANs; Goodfellow et al., 2014). These models learn to produce speech in a fully unsupervised manner by pitting two neural networks against each other in a zero-sum game: one network must learn to produce realistic speech without ever accessing the training data, guided only by feedback from a second network that tries to tell real speech from generated speech. Both networks are built from convolutional layers (CNNs; LeCun et al., 1989), a biologically inspired architecture with parallels to the human visual system that also has implications for speech.
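To make the adversarial setup concrete, the sketch below shows a minimal PyTorch version of the two networks and one training step on raw waveforms. It is a toy illustration under our own assumptions: the layer sizes, the 16,384-sample output length, and the binary cross-entropy objective are placeholders rather than the exact WaveGAN/ciwGAN architecture and loss used in the tutorial.

```python
# Toy GAN on raw waveforms (illustrative only; not the tutorial's exact models).
import torch
import torch.nn as nn

LATENT_DIM = 100   # size of the random latent vector z
AUDIO_LEN = 16384  # ~1 s of 16 kHz audio (placeholder length)

class Generator(nn.Module):
    """Maps a latent vector z to a waveform; never sees real data directly."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 256 * 16),
            nn.ReLU(),
            nn.Unflatten(1, (256, 16)),                              # (B, 256, 16)
            nn.ConvTranspose1d(256, 128, 32, stride=8, padding=12),  # 16 -> 128
            nn.ReLU(),
            nn.ConvTranspose1d(128, 64, 32, stride=8, padding=12),   # 128 -> 1024
            nn.ReLU(),
            nn.ConvTranspose1d(64, 1, 32, stride=16, padding=8),     # 1024 -> 16384
            nn.Tanh(),                                               # samples in [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Scores a waveform as real or generated; its feedback trains the Generator."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 64, 32, stride=16, padding=8),     # 16384 -> 1024
            nn.LeakyReLU(0.2),
            nn.Conv1d(64, 128, 32, stride=8, padding=12),   # 1024 -> 128
            nn.LeakyReLU(0.2),
            nn.Conv1d(128, 256, 32, stride=8, padding=12),  # 128 -> 16
            nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.Linear(256 * 16, 1),                         # realness logit
        )

    def forward(self, x):
        return self.net(x)

def train_step(G, D, real_batch, opt_g, opt_d):
    """One adversarial update: D learns to separate real from generated speech,
    G learns to fool D using only D's feedback."""
    bce = nn.BCEWithLogitsLoss()
    b = real_batch.size(0)
    z = torch.rand(b, LATENT_DIM) * 2 - 1   # uniform noise in [-1, 1]

    # Discriminator step: push real toward 1, generated toward 0.
    fake = G(z).detach()
    d_loss = bce(D(real_batch), torch.ones(b, 1)) + bce(D(fake), torch.zeros(b, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: G is rewarded when D labels its output as real.
    g_loss = bce(D(G(z)), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

Published WaveGAN/ciwGAN implementations typically use a Wasserstein-style objective with a gradient penalty and deeper networks rather than this cross-entropy toy, but the division of labor between the two networks is the same.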
The goal of this tutorial is to familiarize participants with the GAN framework and the linguistic relevance of these models’ representations. Participants will learn how to train and interpret these models, allowing them to pursue their own research questions on new datasets and languages.
We will begin with an introduction to the basics of deep neural networks, with a focus on ciw/fiwGAN, among the most realistic models of human language learning built with deep neural networks. We will go over the motivation behind these models’ architecture, objective, and operation, and discuss their relevance for linguistics and human language learning.
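As a preview of what distinguishes ciwGAN/fiwGAN from a plain GAN, the sketch below shows one way the latent space can be assembled: a small code (one-hot "categorical" variables in ciwGAN, independent binary "featural" variables in fiwGAN) concatenated with continuous noise, with an auxiliary network trained to recover the code from the generated audio. The dimensions and sampling choices here are illustrative assumptions, not the tutorial's reference implementation.

```python
# Illustrative ciwGAN/fiwGAN-style latent sampling (dimensions are placeholders).
import torch

def sample_latent(batch, code_dim=5, noise_dim=95, featural=False):
    if featural:
        # fiwGAN-style code: each variable is an independent binary "feature".
        code = torch.randint(0, 2, (batch, code_dim)).float()
    else:
        # ciwGAN-style code: a one-hot vector picking out one lexical class.
        code = torch.eye(code_dim)[torch.randint(0, code_dim, (batch,))]
    noise = torch.rand(batch, noise_dim) * 2 - 1   # uniform noise in [-1, 1]
    return torch.cat([code, noise], dim=1)         # shape: (batch, 100)

z = sample_latent(8)   # feed to the Generator; an auxiliary "Q" network is
                       # trained to recover the code variables from G(z), which
                       # pressures the code to carry meaningful information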
We will then discuss techniques for testing the causal relationship between the models’ learned representations and their outputs (Begus, 2021a), and use them to examine how these models handle rule-based symbolic computation. We will show that these networks do not simply imitate the training data; rather, in their internal representations, they capture the causal structure of a wide range of linguistic phenomena, including phonological rules, allophonic alternations, and reduplication.
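One such technique is to intervene directly on the latent space: hold everything else fixed, sweep a single latent variable (including values well outside its training range), and observe how the generated output changes. The sketch below is a minimal version of that idea; the specific sweep values and the name sweep_latent are our own.

```python
# Minimal latent-intervention probe (illustrative; names and values are ours).
import torch

@torch.no_grad()
def sweep_latent(G, base_z, dim, values=(-5.0, -1.0, 0.0, 1.0, 5.0)):
    """Set one latent variable to each value while keeping the rest of z fixed,
    then generate audio so the outputs can be compared across the sweep."""
    outputs = []
    for v in values:
        z = base_z.clone()
        z[:, dim] = v          # intervene on a single latent variable
        outputs.append(G(z))   # save/play/measure these waveforms downstream
    return outputs
```

If, say, the amplitude or presence of a particular segment scales systematically with the manipulated variable, that is evidence of a causal link between that variable and the corresponding phonetic property.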
We will introduce several interpretability and visualization techniques (Begus & Zhou, 2022) and show how to interpret these raw neural representations. We will further discuss how to apply traditional linguistic analyses to these representations and point out striking similarities between them and neuroimaging measurements of how speech is represented in the human auditory system.
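A simple starting point for this kind of interpretability work is to pull out the activations of an intermediate convolutional layer and visualize them alongside the generated audio. The sketch below uses a PyTorch forward hook for this; the choice of layer and the channel-averaging step are illustrative assumptions, not a prescription from the cited work.

```python
# Capture intermediate activations of a generator layer for visualization.
import torch

@torch.no_grad()
def get_intermediate(G, z, layer):
    """Run G(z) once and return both the audio and the chosen layer's
    channel-averaged activations (one time series per item in the batch)."""
    captured = {}

    def hook(module, inputs, output):
        captured["act"] = output.detach()

    handle = layer.register_forward_hook(hook)
    audio = G(z)
    handle.remove()
    act = captured["act"]               # shape: (batch, channels, time)
    return audio, act.mean(dim=1)       # average over channels for plotting
```

With the toy generator sketched earlier, layer could be, for example, G.net[3] (the first transposed convolution); its channel-averaged activations can then be plotted against time next to the output waveform or spectrogram.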
Lastly, we will survey the breadth of languages and linguistic phenomena that these models can be, and already have been, used to study. For example, we will demonstrate how articulation can be incorporated into the model (Begus et al., 2023). We will also showcase these models’ ability to capture complex linguistic phenomena, such as vowel harmony in Assamese (Barman et al., 2024) and contrastive nasality in French (Chen & Elsner, 2023).
Part 1: January 10, 2025, 1:45 PM-3:15 PM ET, Salon D
- Building realistic models of human language
- Introduction to fiwGAN/ciwGAN
- Basics of PyTorch/TensorFlow

Part 2: January 10, 2025, 3:30 PM-5:00 PM ET, Salon D
- Applications to:
  - Different languages
  - Neuroscience
  - Articulatory modeling