SPARKS Workshop [1]
Workshop on Speech Foundation Models and their Performance Benchmarks
The Gaia Hotel, Beitou, Taipei (台北市北投區大地酒店)
Scenic View Ballroom (7F) 環景宴會廳
2023/12/16
Transportation
1️⃣ By MRT (Metro) and Walking
Take the MRT to Xinbeitou Station
Leave the station via Exit 1
Follow the route shown in the video below; the walk takes about 10 minutes
2️⃣ Walk from the Asia Pacific Hotel (about 15 minutes)
3️⃣ By Shuttle Bus
Morning Shuttle: Departs the Asia Pacific Hotel at 09:20 for The Gaia Hotel.
Evening Shuttle: Departs The Gaia Hotel at 17:35 for the Asia Pacific Hotel.
Description & Motivation
As artificial intelligence continues to evolve, foundation models have taken center stage, playing crucial roles in Computer Vision (CV) and Natural Language Processing (NLP). These models, which include renowned examples such as BERT and GPT for NLP and SimCLR and BYOL for CV, have significantly advanced machine capabilities in these areas. An equivalent surge in popularity has occurred in speech, where self-supervised foundation models have demonstrated proficiency across a diverse range of tasks [2]. This enthusiasm has been echoed in numerous academic events: workshops at ICML 2020 [3], NeurIPS 2020 [4], AAAI 2022 [5], and ICASSP 2023 [6] have all embraced the promise of these models, eliciting positive feedback and widespread participation [7].
Building on this momentum, this workshop aims to fill a particular niche. Instead of following the broad approach of its predecessors, it will zoom in on a crucial aspect of the field: benchmarks for speech foundation models. In today's landscape, several benchmarks are used to assess these models' performance. SUPERB [8] and SUPERB-SG [9] examine a wide range of speech tasks, while SLUE [10][11] concentrates on spoken language understanding. However, the scope of these benchmarks is primarily limited to English. To enable a more inclusive analysis, LeBenchmark [12] and IndicSUPERB [13] assess foundation models on French and Indian languages, respectively, while XTREME-S [14] and ML-SUPERB [15] evaluate speech foundation models on over 100 languages. Several benchmarks are also works in progress, including ones for audio-visual foundation models and for instruction fine-tuning of speech foundation models. Improving and extending these benchmarks remains an active, ongoing conversation.
We have therefore dedicated this workshop to providing a forum for the community to come together and exchange ideas about speech foundation model technology, focusing on how to evaluate these models accurately. By fostering these discussions, we hope to catalyze further progress in this exciting and rapidly evolving field.
Paper Submission
The workshop concentrates on soliciting papers that use the benchmarks above, but submissions addressing the broader themes of speech foundation model technology are also welcome. We particularly encourage the submission of papers originally intended for the ML-SUPERB challenge but not accepted to ASRU 2023. We also invite papers that either propose a new benchmark for evaluating speech foundation models or critique the current benchmarks. Papers should follow the ASRU 2023 format and may be up to 6 pages; shorter submissions are acceptable.
Paper Submission Deadline: October 19, 2023
Paper Acceptance Notification: November 9, 2023
Time & Location
Venue: The Gaia Hotel, Beitou, Taipei (台北市北投區大地酒店)
Scenic View Ballroom (7F) 環景宴會廳
No. 1, Qiyan Rd, Beitou District, Taipei City
台北市北投區奇岩路一號
2023/12/16
References
[1] The acronym "SPARKS" is formed from the first two letters of the first word, "Speech," and the final four letters of the last word, "Benchmarks."
[2] Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe, "Self-Supervised Speech Representation Learning: A Review," in IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 6, pp. 1179-1210, Oct. 2022
[3] https://icml-sas.gitlab.io/
[4] https://neurips-sas-2020.github.io/
[5] https://aaai-sas-2022.github.io/
[6] https://sites.google.com/view/icassp-sasb-2023/
[7] Most of the organizers either served on the organizing committees of, or were invited speakers at, these previous workshops.
[8] Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee, SUPERB: Speech Processing Universal PERformance Benchmark. Proc. Interspeech 2021, 1194-1198
[9] Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee. 2022. SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8479–8492
[10] Suwon Shon, Ankita Pasad, Felix Wu, Pablo Brusco, Yoav Artzi, Karen Livescu, Kyu J. Han, "SLUE: New Benchmark Tasks For Spoken Language Understanding Evaluation on Natural Speech," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 7927-7931
[11] Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan S Sharma, Wei-Lun Wu, Hung-yi Lee, Karen Livescu, and Shinji Watanabe. 2023. SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8906–8937, Toronto, Canada. Association for Computational Linguistics.
[12] Solène Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Estève, Benjamin Lecouteux, François Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier, LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech. Proc. Interspeech 2021, 1439-1443
[13] Tahir Javed, Kaushal Santosh Bhogale, Abhigyan Raman, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra, "IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian Languages," AAAI, vol. 37, no. 11, pp. 12942-12950, Jun. 2023.
[14] Alexis Conneau, Ankur Bapna, Yu Zhang, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan van Esch, Vera Axelrod, Simran Khanuja, Jonathan Clark, Orhan Firat, Michael Auli, Sebastian Ruder, Jason Riesa, Melvin Johnson, XTREME-S: Evaluating Cross-lingual Speech Representations. Proc. Interspeech 2022, 3248-3252
[15] Jiatong Shi, Dan Berrebbi, William Chen, Ho-Lam Chung, En-Pei Hu, Wei Ping Huang, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe, ML-SUPERB: Multilingual Speech Universal PERformance Benchmark, Proc. Interspeech 2023