IEEE ICASSP 2024 Workshop
Accepted Papers
Seoul, South Korea
The following contributions were accepted to the ICASSP SASB 2024 workshop (ordered alphabetically by title):
A Study on the Impact of Self-Supervised Learning on Automatic Dysarthric Speech Assessment
Xavier Cadet, Ranya Aloufi, Sara Ahmadi-Abhari, Hamed Haddadi
ACOUSTIC-TO-ARTICULATORY INVERSION FOR DYSARTHRIC SPEECH: ARE PRE-TRAINED SELF-SUPERVISED REPRESENTATIONS FAVORABLE?
Sarthak Kumar Maharana, Krishna Kamal Adidam, Shoumik Nandi, Ajitesh Srivastava
Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations
Jialu Li, Mark Hasegawa-Johnson, Nancy McElwain
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency
Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel
Benchmarking Representations for Speech, Music, and Acoustic Events
Moreno La Quatra, Alkis Koudounas, Lorenzo Vaiani, Elena Baralis, Luca Cagliero, Paolo Garza, Sabato Marco Siniscalchi
CochCeps-Augment: A Novel Self-Supervised Contrastive Learning Using Cochlear Cepstrum-Based Masking for Speech Emotion Recognition
Ioannis Ziogas, Hessa Alfalahi, Ahsan Khandoker, Leontios Hadjileontiadis
CROSS-LINGUAL TRANSFER LEARNING FOR LOW-RESOURCE SPEECH TRANSLATION
Sameer Khurana, Nauman Dawalatabad, Antoine Laurent, Luis Vicente, Pablo Gimeno, Victoria Mingote, James Glass
Enhancing GAN-Based Vocoders with Contrastive Learning Under Data-limited Condition
Haoming Guo, Seth Zhao, Jiachen Lian, Gerald Friedland, Gopala Anumanchipalli
EXPLORING FEDERATED SELF-SUPERVISED LEARNING FOR GENERAL-PURPOSE AUDIO UNDERSTANDING
Yasar Abbas UR Rehman, Kin Wai Lau, Yuyang Xie, Lan Ma, Jiajun Shen
INTEGRATING SELF-SUPERVISED SPEECH MODEL WITH PSEUDO WORD-LEVEL TARGETS FROM VISUALLY-GROUNDED SPEECH MODEL
Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-yi Lee, David Harwath
Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning
Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters
Investigating Self-Supervised Features for Expressive, Multilingual Voice Conversion
Álvaro Martín-Cortinas, Daniel Sáez-Trigueros, Grzegorz Beringer, Iván Vallés-Pérez, Roberto Barra-Chicote, Biel Tura-Vecino, Adam Gabryś, Thomas Merritt, Piotr Biliński, Jaime Lorenzo-Trueba
INVESTIGATING ZERO-SHOT GENERALIZABILITY ON MANDARIN-ENGLISH CODE-SWITCHED ASR AND SPEECH-TO-TEXT TRANSLATION OF RECENT FOUNDATION MODELS WITH SELF-SUPERVISION AND WEAK SUPERVISION
Chih-Kai Yang, Kuan Po Huang, Ke-Han Lu, Chun-Yi Kuan, Chi-Yuan Hsiao, Hung-yi Lee
Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition
Alexandra Saliba, Yuanchao Li, Ramon Sanabria, Catherine Lai
Low-resource Cross-domain Singing Voice Synthesis via Reduced Self-supervised Speech Representations
Panagiotis Kakoulidis, Nikolaos Ellinas, Georgios Vamvoukakis, Myrsini Christidou, Alexandra Vioni, Georgia Maniati, Junkwang Oh, Gunu Jho, Inchul Hwang, Pirros Tsiakoulis, Aimilios Chalamandaris
Low-Resourced Phonetic and Prosodic Feature Estimation with Self-Supervised-Learning-Based Acoustic Modeling
Kiyoshi Kurihara, Masanori Sano
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification
Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu
Noise Robust Distillation of Self-Supervised Speech Models via Correlation Metrics
Fabian Ritter-Gutierrez, Kuan-Po Huang, Dianwen Ng, Jeremy Wong, Hung-yi Lee, Eng Siong Chng, Nancy Chen
On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification (Best Paper Award)
Calum Heggan, Sam Budgett, Timothy Hospedales, Mehrdad Yaghoobi
Open Implementation and Study of BEST-RQ for Speech Processing
Ryan Whetten, Titouan Parcollet, Marco Dinarelli, Yannick Estève
PEFT for Speech: Unveiling Optimal Placement, Merging Strategies, and Ensemble Techniques
Tzu-Han Lin, How-Shing Wang, Hao-Yung Weng, Kaung-Chen Peng, Zih-Ching Chen, Hung-yi Lee
Positive and Negative Sampling Strategies for Self-supervised Learning on Audio-Video Data
Shanshan Wang, Soumya Tripathy, Toni Heittola, Annamaria Mesaros
Probing Self-supervised Learning Models with Target Speech Extraction
Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Takanori Ashihara, Shoko Araki, Honza Černocký
Self-Supervised Learning for Few-Shot Bird Sound Classification
Ilyass Moummad, Romain Serizel, Nicolas Farrugia
SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning
Luca Zampierin, Ghouthi Boukli Hacene, Bac Nguyen, Mirco Ravanelli
SOA: Reducing domain mismatch in SSL Pipeline by Speech Only Adaptation for low resource ASR
Natarajan Balaji Shankar, Ruchao Fan, Abeer Alwan
SPEECHCLIP+: SELF-SUPERVISED MULTI-TASK REPRESENTATION LEARNING FOR SPEECH VIA CLIP AND SPEECH-IMAGE DATA
Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-yi Lee, Hsin-Min Wang, David Harwath
SYNTHETIC SPEECH DETECTION WITH WAV2VEC 2.0 IN VARIOUS LANGUAGE SETTINGS
Branimir Dropuljić, Miljenko Šuflaj, Andrej Jertec, Leo Obadić
Training Early-Exit Architectures for Automatic Speech Recognition: Fine-Tuning Pre-Trained Models or Training from Scratch
George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Mohamed Nabih Ali, Alessio Brutti
Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction
Aditya Ravuri, Erica Cooper, Junichi Yamagishi
VICMUS: VARIANCE-INVARIANCE-COVARIANCE REGULARIZATION FOR MUSIC REPRESENTATION LEARNING
Sebastian Löf, Cody Hesse, Carl Thomé, Carlos Lordelo, Jens Ahrens