ANALYSIS OF SELF-SUPERVISED SPEECH MODELS ON CHILDREN’S SPEECH AND INFANT VOCALIZATIONS

Jialu Li1,2, Mark Hasegawa-Johnson1,2, Nancy L. McElwain2,3

1Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign

2Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign

3Department of Human Development and Family Studies, University of Illinois Urbana-Champaign


This site contains supplementary materials for the original paper, including:

CCA scores

CCA scores for three pretrained self-supervised learning models

W2V2-LL4300h

W2V2-Base

HuBERT-Base

CCA scores for four fine-tuned W2V2 models

W2V2-Libri100h

W2V2-MyST

W2V2-Libri100h-Pro

W2V2-MyST-Pro

CCA scores for four fine-tuned HuBERT models

HuBERT-Libri100h

HuBERT-MyST

HuBERT-Libri100h-Pro

HuBERT-MyST-Pro

t-SNE Plots

Methods: 

We randomly selected 200 samples per phone from each dataset to draw the t-SNE plots. The sample plots below show layers 1, 4, 8, and 12 of W2V2-Libri100h, W2V2-MyST, and W2V2-MyST-Pro for two vowels (/i/ and /ɑ/) and two consonants (/p/ and /d/). W2V2-Libri100h-Pro follows a pattern similar to that of W2V2-MyST-Pro.
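The sampling step described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the paper: it assumes the layer representations have already been extracted into a NumPy array with one phone label and one dataset label per row, and the function names (subsample_indices, embed_2d), the random seed, and the use of scikit-learn's TSNE with its default settings are all assumptions made for this example.

```python
import numpy as np

SAMPLES_PER_GROUP = 200  # from the text: 200 samples per phone per dataset


def subsample_indices(phones, datasets, n_per_group=SAMPLES_PER_GROUP, seed=0):
    """Pick up to n_per_group random row indices for every (phone, dataset) pair.

    Hypothetical helper: `phones` and `datasets` are equal-length label
    sequences, one entry per feature row.
    """
    phones = np.asarray(phones)
    datasets = np.asarray(datasets)
    rng = np.random.default_rng(seed)
    chosen = []
    for phone in np.unique(phones):
        for dataset in np.unique(datasets):
            idx = np.flatnonzero((phones == phone) & (datasets == dataset))
            if idx.size == 0:
                continue  # this phone does not occur in this dataset
            take = min(n_per_group, idx.size)
            chosen.append(rng.choice(idx, size=take, replace=False))
    if not chosen:
        return np.empty(0, dtype=int)
    return np.sort(np.concatenate(chosen))


def embed_2d(features, seed=0):
    """Project (n_samples, dim) features to 2-D with scikit-learn's t-SNE.

    Uses the library defaults (an assumption; the page does not state the
    t-SNE settings). The default perplexity of 30 needs more than 30 rows.
    """
    from sklearn.manifold import TSNE  # imported lazily; only needed here

    return TSNE(n_components=2, random_state=seed).fit_transform(features)
```

A typical use would be to call subsample_indices once per layer-independent label set, index the features of each layer (1, 4, 8, 12) with the returned indices, and pass each slice to embed_2d before plotting the points colored by dataset.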


Observations: 

plots of /i/

plots of /ɑ/

plots of /p/

plots of /d/

Code and model weights

You may find the code and model weights at the following links:

Paper/BibTex Citation

If you found our paper helpful, please cite us as:

@inproceedings{li2024analysis,

  title={Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations},

  author={Li, Jialu and Hasegawa-Johnson, Mark and McElwain, Nancy L},

  booktitle={IEEE Workshop on Self-Supervision in Audio, Speech and Beyond (SASB)},

  year={2024}

}

Contact

Jialu Li (she, her, hers)

Ph.D. candidate @ Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign

E-mail: jialuli3@illinois.edu

Homepage: https://sites.google.com/view/jialuli/

Our team: https://littlebeats.hdfs.illinois.edu/team/