Accepted Papers
The following contributions were accepted to the ICASSP SASB 2023 workshop (alphabetically ordered):
A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS, Wang, Siyang; Henter, Gustav Eje; Gustafson, Joakim; Szekely, Eva
A Reference-less Quality Metric for Automatic Speech Recognition via Contrastive-Learning of a Multi-Language Model with Self-Supervision, Yuksel, Kamer A; Ferreira, Thiago; Gündüz, Ahmet; Elbadrashiny, Mohamed; Javadi, Golara
A vector quantized masked autoencoder for speech emotion recognition, Sadok, Samir; Leglaive, Simon; Seguier, Renaud
AudioSlots: A slot-centric generative model for audio separation, Reddy, Pradyumna; Wisdom, Scott; Greff, Klaus; Hershey, John; Kipf, Thomas
Channel-Aware Pretraining of Joint Encoder-Decoder Self-Supervised Model for Telephonic-Speech ASR, Sukhadia, Vrunda N; Umesh, S
CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models, Chen, Zih-Ching; Sung, Yu-Shun; Lee, Hung-yi
Deep Investigation of Intermediate Representations in Self-Supervised Learning Models for Speech Emotion Recognition, Zhu, Zhi; Sato, Yoshinao
Efficient Utilization of Large Pre-Trained Models for Low Resource ASR, Vieting, Peter; Lüscher, Christoph M.; Dierkes, Julian; Schlüter, Ralf; Ney, Hermann
Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study, Zaiem, Salah; Algayres, Robin; Parcollet, Titouan; Essid, Slim; Ravanelli, Mirco
Improving DINO-based self-supervised speaker verification with progressive cluster-aware training, Han, Bing; Huang, Wen; Chen, Zhengyang; Qian, Yanmin
Improving Label-deficient Keyword Spotting Through Self-supervised Pretraining, Bovbjerg, Holger S; Tan, Zheng-Hua
Investigation of the quality of pseudo-labels for the self-supervised speaker verification task, Fathan, Abderrahim; Alam, Jahangir; Kang, Woohyun
Measuring the Impact of Domain Factors in Self-Supervised Pre-Training, Sanabria, Ramon S; Hsu, Wei-Ning; Baevski, Alexei; Auli, Michael
Pac-HuBERT: Self-Supervised Music Source Separation via Primitive Auditory Clustering and Hidden-Unit BERT, Chen, Ke; Wichern, Gordon; Germain, François G; Le Roux, Jonathan
Phone and speaker spatial organization in self-supervised speech representations, Riera, Pablo E; Cerdeiro, Manuela; Pepino, Leonardo D; Ferrer, Luciana
Self-supervised audio encoder with contrastive pretraining for Respiratory Anomaly Detection, Kulkarni, Shubham; Watanabe, Hideaki; Homma, Fuminori
Self-supervised learning for infant cry analysis, Gorin, Arsenii; Subakan, Cem; Abdoli, Sajjad; Wang, Junhao; Latremouille, Samantha; Onu, Charles C
Specialized semantic enrichment of speech representations, Laperrière, Gaëlle; Nguyen, Ha; Ghannay, Sahar; Jabaian, Bassam; Estève, Yannick
UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation, Seth, Ashish; Ghosh, Sreyan; Umesh, S; Manocha, Dinesh
Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model, Fujita, Kenichi; Ashihara, Takanori; Kanagawa, Hiroki; Moriya, Takafumi; Ijima, Yusuke