Program

The SASB 2023 workshop will be held on 10 June 2023 at the same venue as the ICASSP 2023 conference, in the Jupiter Ballroom. Detailed poster session information can be found at the end of this page. Information on the invited talks is available on their dedicated page.

Morning (8.30 am - 12 pm)

8.30 am - 8.40 am Workshop opening remarks
8.40 am - 9.20 am Keynote Talk - Hung-Yi Lee (Chair Paola Garcia)
- - - - "Advancing Universal Speech Models Through Self-Supervised Learning: Progress, Challenges, and Future Direction" [SLIDES]
9.20 am - 10.30 am Poster Session 1 (Chair Chenda Li)
10 am - 10.30 am Coffee Break
10.30 am - 12 pm Morning Panel (Chair Marcely Zanon Boito)
- - - - Karen Livescu "What Do Self-Supervised Speech Representation Models Know? A Layer-Wise Analysis" [SLIDES]
        Themos Stafylakis "Extracting speaker and emotion information from self-supervised speech models" [SLIDES]
        Shinji Watanabe "Attempts to reproduce large pre-trained models on an academic computing scale" [SLIDES]

Afternoon (1.30 pm - 5.30 pm)

1.50 pm - 2.30 pm Keynote Talk - David Harwath (Chair Titouan Parcollet)
- - - - "Multimodal and Multilingual Self-Supervised Learning for Speech and Audio" [SLIDES]
2.30 pm - 3.10 pm Keynote Talk - Ankur Bapna (Chair Jinyu Li)
- - - - "Improving Self-Supervised Models of Speech by learning from text and NLP" [SLIDES]
3.10 pm - 4.10 pm Poster Session 2 (Chair Marcely Zanon Boito)
3.30 pm - 4.00 pm Coffee Break
4.10 pm - 5.40 pm Afternoon Panel (Chair Paola Garcia)
- - - - Odette Scharenborg "Building speech technology for unwritten languages using visual information" [SLIDES]
        Sanjeev Khudanpur "What Will It Take to Get Past the SSL Hype?"
        David Harwath
5.40 pm - 5.45 pm Workshop closing remarks

Poster Session 1 (9.20 am - 10.15 am)

Location: "WP-A" at Jupiter Lobby (right outside Jupiter Ballroom)

A Reference-less Quality Metric for Automatic Speech Recognition via Contrastive-Learning of a Multi-Language Model with Self-Supervision, Yuksel, Kamer A; Ferreira, Thiago; Gündüz, Ahmet; Elbadrashiny, Mohamed; Javadi, Golara
Channel-Aware Pretraining of Joint Encoder-Decoder Self-Supervised Model for Telephonic-Speech ASR, Sukhadia, Vrunda N; Umesh, S
CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models, Chen, Zih-Ching; Sung, Yu-Shun; Lee, Hung-yi
Efficient Utilization of Large Pre-Trained Models for Low Resource ASR, Vieting, Peter; Lüscher, Christoph M.; Dierkes, Julian; Schlüter, Ralf; Ney, Hermann
Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study, Zaiem, Salah; Algayres, Robin; Parcollet, Titouan; Essid, Slim; Ravanelli, Mirco
Improving Label-deficient Keyword Spotting Through Self-supervised Pretraining, Bovbjerg, Holger S; Tan, Zheng-Hua
Measuring the Impact of Domain Factors in Self-Supervised Pre-Training, Sanabria, Ramon S; Hsu, Wei-Ning; Baevski, Alexei; Auli, Michael
Phone and speaker spatial organization in self-supervised speech representations, Riera, Pablo E; Cerdeiro, Manuela; Pepino, Leonardo D; Ferrer, Luciana
Specialized semantic enrichment of speech representations, Laperrière, Gaëlle; Nguyen, Ha; Ghannay, Sahar; Jabaian, Bassam; Estève, Yannick
UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation, Seth, Ashish; Ghosh, Sreyan; Umesh, S; Manocha, Dinesh

AUTHORS: Please do not forget to remove your poster before 1.30 pm.

Poster Session 2 (2.50 pm - 3.45 pm)

Location: "WP-A" at Jupiter Lobby (right outside Jupiter Ballroom)

A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS, Wang, Siyang; Henter, Gustav Eje; Gustafson, Joakim; Szekely, Eva
A vector quantized masked autoencoder for speech emotion recognition, Sadok, Samir; Leglaive, Simon; Seguier, Renaud
AudioSlots: A slot-centric generative model for audio separation, Reddy, Pradyumna; Wisdom, Scott; Greff, Klaus; Hershey, John; Kipf, Thomas
Deep Investigation of Intermediate Representations in Self-Supervised Learning Models for Speech Emotion Recognition, Zhu, Zhi; Sato, Yoshinao
Improving DINO-based self-supervised speaker verification with progressive cluster-aware training, Han, Bing; Huang, Wen; Chen, Zhengyang; Qian, Yanmin
Investigation of the quality of pseudo-labels for the self-supervised speaker verification task, Fathan, Abderrahim; Alam, Jahangir; Kang, Woohyun
Pac-HuBERT: Self-Supervised Music Source Separation via Primitive Auditory Clustering and Hidden-Unit BERT, Chen, Ke; Wichern, Gordon; Germain, François G; Le Roux, Jonathan
Self-supervised audio encoder with contrastive pretraining for Respiratory Anomaly Detection, Kulkarni, Shubham; Watanabe, Hideaki; Homma, Fuminori
Self-supervised learning for infant cry analysis, Gorin, Arsenii; Subakan, Cem; Abdoli, Sajjad; Wang, Junhao; Latremouille, Samantha; Onu, Charles C
Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model, Fujita, Kenichi; Ashihara, Takanori; Kanagawa, Hiroki; Moriya, Takafumi; Ijima, Yusuke (best paper award)

AUTHORS: Please do not forget to remove your poster before leaving the venue.
