IEEE ICASSP 2024 workshop
Seoul, South Korea
Program:
The SASB 2024 workshop will be held on the 14th of April 2024 (morning and afternoon) at the same venue as the ICASSP 2024 conference (COEX), in Room 104. Detailed poster session information can be found at the end of this page. Information regarding the invited talks is available on their dedicated page.
Morning (8.30 am - 12 pm)
8.30 am - 8.40 am Workshop opening remarks
8.40 am - 9.20 am Keynote Talk: Nancy F. Chen - A*STAR (Chair Paola Garcia)
"SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning"
9.20 am - 10 am Keynote Talk: Ann Lee - Meta (Chair Chao-Han Huck Yang)
"Self-Supervised Learning in Real-life Speech and Audio Technology"
10 am - 10.30 am Coffee Break
10.10 am - 12 pm Poster Session 1 (Chair Marcely Zanon Boito)
Afternoon (2 pm - 5.45 pm)
2 pm - 2.40 pm Keynote Talk: Sanjeev Khudanpur - Johns Hopkins University (Chair Shinji Watanabe)
"What Will It Take to Get Past the SSL Hype?"
2.40 pm - 3.50 pm Poster Session 2 (Chair Salima Mdhaffar)
3.30 pm - 4.00 pm Coffee Break
4.00 pm - 5.40 pm Panel (Chair Hung-yi Lee)
Joon Son Chung - KAIST - "Multi-modal learning of audio representations"
Mark A. Hasegawa-Johnson - University of Illinois - "Unsupervised and Self-Supervised Learning in Theory and Practice"
Kyungmin Lee - Samsung Research - "Self-Supervised Learning for the Commercial Voice Recognition Systems"
5.40 pm - 5.45 pm Workshop closing remarks
Poster Session 1 (10.10 am - 12 pm)
Location: Room 104 (same as the workshop).
Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations
Jialu Li, Mark Hasegawa-Johnson, Nancy McElwain
CochCeps-Augment: A Novel Self-Supervised Contrastive Learning Using Cochlear Cepstrum-Based Masking for Speech Emotion Recognition
Ioannis Ziogas, Hessa Alfalahi, Ahsan Khandoker, Leontios Hadjileontiadis
CROSS-LINGUAL TRANSFER LEARNING FOR LOW-RESOURCE SPEECH TRANSLATION
Sameer Khurana, Nauman Dawalatabad, Antoine Laurent, Luis Vicente, Pablo Gimeno, Victoria Mingote, James Glass
Enhancing GAN-Based Vocoders with Contrastive Learning Under Data-limited Condition
Haoming Guo, Seth Zhao, Jiachen Lian, Gerald Friedland, Gopala Anumanchipalli
INTEGRATING SELF-SUPERVISED SPEECH MODEL WITH PSEUDO WORD-LEVEL TARGETS FROM VISUALLY-GROUNDED SPEECH MODEL
Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-yi Lee, David Harwath
Investigating Self-Supervised Features for Expressive, Multilingual Voice Conversion
Álvaro Martín-Cortinas, Daniel Sáez-Trigueros, Grzegorz Beringer, Iván Vallés-Pérez, Roberto Barra-Chicote, Biel Tura-Vecino, Adam Gabryś, Thomas Merritt, Piotr Biliński, Jaime Lorenzo-Trueba
INVESTIGATING ZERO-SHOT GENERALIZABILITY ON MANDARIN-ENGLISH CODE-SWITCHED ASR AND SPEECH-TO-TEXT TRANSLATION OF RECENT FOUNDATION MODELS WITH SELF-SUPERVISION AND WEAK SUPERVISION
Chih-Kai Yang, Kuan Po Huang, Ke-Han Lu, Chun-Yi Kuan, Chi-Yuan Hsiao, Hung-yi Lee
Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition
Alexandra Saliba, Yuanchao Li, Ramon Sanabria, Catherine Lai (presented by a non-author)
Low-resource Cross-domain Singing Voice Synthesis via Reduced Self-supervised Speech Representations
Panagiotis Kakoulidis, Nikolaos Ellinas, Georgios Vamvoukakis, Myrsini Christidou, Alexandra Vioni, Georgia Maniati, Junkwang Oh, Gunu Jho, Inchul Hwang, Pirros Tsiakoulis, Aimilios Chalamandaris
Low-Resourced Phonetic and Prosodic Feature Estimation with Self-Supervised-Learning-Based Acoustic Modeling
Kiyoshi Kurihara, Masanori Sano
Noise Robust Distillation of Self-Supervised Speech Models via Correlation Metrics
Fabian Ritter-Gutierrez, Kuan-Po Huang, Dianwen Ng, Jeremy Wong, Hung-yi Lee, Eng Siong Chng, Nancy Chen
Open Implementation and Study of BEST-RQ for Speech Processing
Ryan Whetten, Titouan Parcollet, Marco Dinarelli, Yannick Estève
PEFT for Speech: Unveiling Optimal Placement, Merging Strategies, and Ensemble Techniques
Tzu-Han Lin, How-Shing Wang, Hao-Yung Weng, Kaung-Chen Peng, Zih-Ching Chen, Hung-yi Lee
Probing Self-supervised Learning Models with Target Speech Extraction
Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Takanori Ashihara, Shoko Araki, Honza Černocký
SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning
Luca Zampierin, Ghouthi Boukli Hacene, Bac Nguyen, Mirco Ravanelli
SOA: Reducing domain mismatch in SSL Pipeline by Speech Only Adaptation for low resource ASR
Natarajan Balaji Shankar, Ruchao Fan, Abeer Alwan
SYNTHETIC SPEECH DETECTION WITH WAV2VEC 2.0 IN VARIOUS LANGUAGE SETTINGS
Branimir Dropuljić, Miljenko Šuflaj, Andrej Jertec, Leo Obadić
Training Early-Exit Architectures for Automatic Speech Recognition: Fine-Tuning Pre-Trained Models or Training from Scratch
George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Mohamed Nabih Ali, Alessio Brutti
Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction
Aditya Ravuri, Erica Cooper, Junichi Yamagishi
AUTHORS: Please do not forget to remove your poster before 1.30 pm.
Poster Session 2 (2.40 pm - 3.50 pm)
Location: Room 104 (same as the workshop).
A Study on the Impact of Self-Supervised Learning on Automatic Dysarthric Speech Assessment
Xavier Cadet, Ranya Aloufi, Sara Ahmadi-Abhari, Hamed Haddadi
ACOUSTIC-TO-ARTICULATORY INVERSION FOR DYSARTHRIC SPEECH: ARE PRE-TRAINED SELF-SUPERVISED REPRESENTATIONS FAVORABLE?
Sarthak Kumar Maharana, Krishna Kamal Adidam, Shoumik Nandi, Ajitesh Srivastava (presented by a non-author)
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency
Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel
Benchmarking Representations for Speech, Music, and Acoustic Events
Moreno La Quatra, Alkis Koudounas, Lorenzo Vaiani, Elena Baralis, Luca Cagliero, Paolo Garza, Sabato Marco Siniscalchi
EXPLORING FEDERATED SELF-SUPERVISED LEARNING FOR GENERAL-PURPOSE AUDIO UNDERSTANDING
Yasar Abbas Ur Rehman, Kin Wai Lau, Yuyang Xie, Lan Ma, Jiajun Shen
Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning
Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification
Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu
On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification (best paper award)
Calum Heggan, Sam Budgett, Timothy Hospedales, Mehrdad Yaghoobi
Positive and Negative Sampling Strategies for Self-supervised Learning on Audio-Video Data
Shanshan Wang, Soumya Tripathy, Toni Heittola, Annamaria Mesaros
Self-Supervised Learning for Few-Shot Bird Sound Classification
Ilyass Moummad, Romain Serizel, Nicolas Farrugia
SPEECHCLIP+: SELF-SUPERVISED MULTI-TASK REPRESENTATION LEARNING FOR SPEECH VIA CLIP AND SPEECH-IMAGE DATA
Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-yi Lee, Hsin-Min Wang, David Harwath
VICMUS: VARIANCE-INVARIANCE-COVARIANCE REGULARIZATION FOR MUSIC REPRESENTATION LEARNING
Sebastian Löf, Cody Hesse, Carl Thomé, Carlos Lordelo, Jens Ahrens
AUTHORS: Please do not forget to remove your poster before leaving the venue.