IEEE ICASSP 2024 workshop
Seoul, South Korea
Program:
The SASB 2024 workshop will be held on the 14th of April 2024 (morning and afternoon) at the same venue as the ICASSP 2024 conference (COEX), in Room 104. Detailed poster session information can be found at the end of this page. Information regarding the invited talks is available on their dedicated page.
Morning (8.30 am - 12 pm)
8.30 am - 8.40 am Workshop opening remarks
8.40 am - 9.20 am Keynote Talk: Nancy F. Chen - A*STAR (Chair Paola Garcia)
"SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning"
9.20 am - 10 am Keynote Talk: Ann Lee - Meta (Chair Chao-Han Huck Yang)
"Self-Supervised Learning in Real-life Speech and Audio Technology"
10 am - 10.30 am Coffee Break
10.10 am - 12 pm Poster Session 1 (Chair Marcely Zanon Boito)
Afternoon (2 pm - 5.45 pm)
2 pm - 2.40 pm Keynote Talk: Sanjeev Khudanpur - Johns Hopkins University (Chair Shinji Watanabe)
"What Will It Take to Get Past the SSL Hype?"
2.40 pm - 3.50 pm Poster Session 2 (Chair Salima Mdhaffar)
3.30 pm - 4.00 pm Coffee Break
4.00 pm - 5.40 pm Panel (Chair Hung-yi Lee)
Joon Son Chung - KAIST - "Multi-modal learning of audio representations"
Mark A. Hasegawa-Johnson - University of Illinois - "Unsupervised and Self-Supervised Learning in Theory and Practice"
Kyungmin Lee - Samsung Research - "Self-Supervised Learning for the Commercial Voice Recognition Systems"
5.40 pm - 5.45 pm Workshop closing remarks
Poster Session 1 (10.10 am - 12 pm)
Location: Room 104 (same as the workshop).
Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations
Jialu Li, Mark Hasegawa-Johnson, Nancy McElwain
CochCeps-Augment: A Novel Self-Supervised Contrastive Learning Using Cochlear Cepstrum-Based Masking for Speech Emotion Recognition
Ioannis Ziogas, Hessa Alfalahi, Ahsan Khandoker, Leontios Hadjileontiadis
CROSS-LINGUAL TRANSFER LEARNING FOR LOW-RESOURCE SPEECH TRANSLATION
Sameer Khurana, Nauman Dawalatabad, Antoine Laurent, Luis Vicente, Pablo Gimeno, Victoria Mingote, James Glass
Enhancing GAN-Based Vocoders with Contrastive Learning Under Data-limited Condition
Haoming Guo, Seth Zhao, Jiachen Lian, Gerald Friedland, Gopala Anumanchipalli
INTEGRATING SELF-SUPERVISED SPEECH MODEL WITH PSEUDO WORD-LEVEL TARGETS FROM VISUALLY-GROUNDED SPEECH MODEL
Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-yi Lee, David Harwath
Investigating Self-Supervised Features for Expressive, Multilingual Voice Conversion
Álvaro Martín-Cortinas, Daniel Sáez-Trigueros, Grzegorz Beringer, Iván Vallés-Pérez, Roberto Barra-Chicote, Biel Tura-Vecino, Adam Gabryś, Thomas Merritt, Piotr Biliński, Jaime Lorenzo-Trueba
INVESTIGATING ZERO-SHOT GENERALIZABILITY ON MANDARIN-ENGLISH CODE-SWITCHED ASR AND SPEECH-TO-TEXT TRANSLATION OF RECENT FOUNDATION MODELS WITH SELF-SUPERVISION AND WEAK SUPERVISION
Chih-Kai Yang, Kuan Po Huang, Ke-Han Lu, Chun-Yi Kuan, Chi-Yuan Hsiao, Hung-yi Lee
Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition
Alexandra Saliba, Yuanchao Li, Ramon Sanabria, Catherine Lai (presented by a non-author)
Low-resource Cross-domain Singing Voice Synthesis via Reduced Self-supervised Speech Representations
Panagiotis Kakoulidis, Nikolaos Ellinas, Georgios Vamvoukakis, Myrsini Christidou, Alexandra Vioni, Georgia Maniati, Junkwang Oh, Gunu Jho, Inchul Hwang, Pirros Tsiakoulis, Aimilios Chalamandaris
Low-Resourced Phonetic and Prosodic Feature Estimation with Self-Supervised-Learning-Based Acoustic Modeling
Kiyoshi Kurihara, Masanori Sano
Noise Robust Distillation of Self-Supervised Speech Models via Correlation Metrics
Fabian Ritter-Gutierrez, Kuan-Po Huang, Dianwen Ng, Jeremy Wong, Hung-yi Lee, Eng Siong Chng, Nancy Chen
Open Implementation and Study of BEST-RQ for Speech Processing
Ryan Whetten, Titouan Parcollet, Marco Dinarelli, Yannick Estève
PEFT for Speech: Unveiling Optimal Placement, Merging Strategies, and Ensemble Techniques
Tzu-Han Lin, How-Shing Wang, Hao-Yung Weng, Kaung-Chen Peng, Zih-Ching Chen, Hung-yi Lee
Probing Self-supervised Learning Models with Target Speech Extraction
Junyi Peng, Marc Delcroix, Tsubasa Ochiai, Oldrich Plchot, Takanori Ashihara, Shoko Araki, Honza Černocký
SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning
Luca Zampierin, Ghouthi Boukli Hacene, Bac Nguyen, Mirco Ravanelli
SOA: Reducing domain mismatch in SSL Pipeline by Speech Only Adaptation for low resource ASR
Natarajan Balaji Shankar, Ruchao Fan, Abeer Alwan
SYNTHETIC SPEECH DETECTION WITH WAV2VEC 2.0 IN VARIOUS LANGUAGE SETTINGS
Branimir Dropuljić, Miljenko Šuflaj, Andrej Jertec, Leo Obadić
Training Early-Exit Architectures for Automatic Speech Recognition: Fine-Tuning Pre-Trained Models or Training from Scratch
George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Mohamed Nabih Ali, Alessio Brutti
Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction
Aditya Ravuri, Erica Cooper, Junichi Yamagishi
AUTHORS: Please do not forget to remove your poster before 1.30 pm.
Poster Session 2 (2.40 pm - 3.50 pm)
Location: Room 104 (same as the workshop).
A Study on the Impact of Self-Supervised Learning on Automatic Dysarthric Speech Assessment
Xavier Cadet, Ranya Aloufi, Sara Ahmadi-Abhari, Hamed Haddadi
ACOUSTIC-TO-ARTICULATORY INVERSION FOR DYSARTHRIC SPEECH: ARE PRE-TRAINED SELF-SUPERVISED REPRESENTATIONS FAVORABLE?
Sarthak Kumar Maharana, Krishna Kamal Adidam, Shoumik Nandi, Ajitesh Srivastava (presented by a non-author)
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency
Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel
Benchmarking Representations for Speech, Music, and Acoustic Events
Moreno La Quatra, Alkis Koudounas, Lorenzo Vaiani, Elena Baralis, Luca Cagliero, Paolo Garza, Sabato Marco Siniscalchi
EXPLORING FEDERATED SELF-SUPERVISED LEARNING FOR GENERAL-PURPOSE AUDIO UNDERSTANDING
Yasar Abbas Ur Rehman, Kin Wai Lau, Yuyang Xie, Lan Ma, Jiajun Shen
Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning
Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification
Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu
On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification (best paper award)
Calum Heggan, Sam Budgett, Timothy Hospedales, Mehrdad Yaghoobi
Positive and Negative Sampling Strategies for Self-supervised Learning on Audio-Video Data
Shanshan Wang, Soumya Tripathy, Toni Heittola, Annamaria Mesaros
Self-Supervised Learning for Few-Shot Bird Sound Classification
Ilyass Moummad, Romain Serizel, Nicolas Farrugia
SPEECHCLIP+: SELF-SUPERVISED MULTI-TASK REPRESENTATION LEARNING FOR SPEECH VIA CLIP AND SPEECH-IMAGE DATA
Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-yi Lee, Hsin-Min Wang, David Harwath
VICMUS: VARIANCE-INVARIANCE-COVARIANCE REGULARIZATION FOR MUSIC REPRESENTATION LEARNING
Sebastian Löf, Cody Hesse, Carl Thomé, Carlos Lordelo, Jens Ahrens
AUTHORS: Please do not forget to remove your poster before leaving the venue.