Publications

For an updated list of my publications, please see my Google Scholar.

2026

L. Della Libera, C. Subakan, M. Ravanelli, "FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation", In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026, [pdf], [code].

M. Cervera, F. Paissan, M. Ravanelli, C. Subakan, "Virtual Consistency for Audio Editing", In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026, [pdf].

M. Elrashid, A. Deschênes, C. Subakan, M. Ravanelli, R. Georges, M. Morin, "Toward Faithful Explanations in Acoustic Anomaly Detection", In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026, [pdf].

A. J. Zimmer, V. Ravi, P. Espinoza-Lopez, G. P. Kafentzis, M. Ravanelli, S. A. Rahimi, M. Pai, C. Ugarte-Gil, S. Grandjean Lapierre, "Cough Acoustic Analysis Using Artificial Intelligence for COVID-19 Detection: A Comparative Study of Patient Cohorts from Lima, Peru and Montreal, Canada", Annals of Epidemiology, 2026, [pdf].

2025

L. Della Libera, F. Paissan, C. Subakan, M. Ravanelli, "FocalCodec: Low-bitrate speech coding via focal modulation networks", In Proceedings of NeurIPS 2025, [pdf], [code], [pretrained model], [demo].

P. Mousavi, G. Maimon, A. Moumen, D. Petermann, J. Shi, H. Wu, H. Yang, A. Kuznetsova, A. Ploujnikov(*), R. Marxer, B. Ramabhadran, B. Elizalde, L. Lugosch, J. Li, C. Subakan, P. Woodland, M. Kim, H-Y. Lee, S. Watanabe, Y. Adi, M. Ravanelli, “Discrete Audio Tokens: More Than a Survey!”, Transactions on Machine Learning Research (TMLR), 2025, [pdf].

L. Della Libera, J. Andreoli, D. Dalle Pezze, M. Ravanelli, G. A. Susto, "Bayesian Deep Learning for Remaining Useful Life Estimation via Stein Variational Gradient Descent", IEEE Transactions on Automation Science and Engineering, 2025, [pdf].

A. K. Z. Tehrani, A. Tang, M. Ravanelli, G. Cloutier, I. Rafati, B. N. Nguyen, Q.-H. Trinh, I. Rosado-Mendez, H. Rivaz, "From Speech to Sonography: Spectral Networks for Ultrasound Microstructure Classification", IEEE Transactions on Biomedical Engineering (TBME), 2025, [pdf].

D. Borra, E. Magosso, M. Ravanelli, "A protocol for trustworthy EEG decoding with neural networks", Neural Networks, Vol. 182, 2025, [pdf].

S. Zaiem, Y. Kemiche, T. Parcollet, S. Essid, M. Ravanelli, "Speech self-supervised representations benchmarking: A case for larger probing heads", Computer Speech & Language, Volume 89, January 2025, [pdf].

E. Mancini, F. Paissan, M. Ravanelli, C. Subakan, "LMAC-TD: Producing Time Domain Explanations for Audio Classifiers", In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025, [pdf].

Y. Wang, P. Mousavi, A. Ploujnikov, M. Ravanelli, "What Are They Doing? Joint Audio-Speech Co-Reasoning", In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025, [pdf].

P. Plantinga, B. Cordelle, D. Louër, M. Ravanelli, D. Klein, "Does Language Matter for Early Detection of Parkinson's Disease from Speech?", In Proceedings of the IEEE 35th International Workshop on Machine Learning for Signal Processing (MLSP), 2025, [pdf].

F. Öncel, E. Penaloza, H. Wu, S. Gupta, M. Ravanelli, L. Charlin, "Audio Prototypical Network for Controllable Music Recommendation", In Proceedings of the IEEE 35th International Workshop on Machine Learning for Signal Processing (MLSP), 2025, [pdf].

P. Mousavi, S. Gupta, C. Subakan, M. Ravanelli, "Listen: Learning soft token embeddings for neural audio LLMs", In Proceedings of Interspeech, 2025, [pdf].

Y. Wang, A. Alhmoud, S. Alsahly, M. Alqurishi, M. Ravanelli, "Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down", In Proceedings of Interspeech, 2025, [pdf].

E. Mancini, F. Paissan, P. Torroni, M. Ravanelli, C. Subakan, "Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech", ICASSP 2025 (SPADE Workshop). [pdf].

2024

M. Ravanelli, T. Parcollet, A. Moumen, S. de Langen, C. Subakan, P. Plantinga, Y. Wang, P. Mousavi, L. Della Libera, A. Ploujnikov, F. Paissan, D. Borra, S. Zaiem, Z. Zhao, S. Zhang, G. Karakasidis, S.-L. Yeh, P. Champion, A. Rouhe, R. Braun, F. Mai, J. Zuluaga-Gomez, S. M. Mousavi, A. Nautsch, H. Nguyen, X. Liu, S. Sagar, J. Duret, S. Mdhaffar, G. Laperrière, M. Rouvier, R. De Mori, Y. Estève, "Open-Source Conversational AI with SpeechBrain 1.0", Journal of Machine Learning Research, vol. 25, no. 333, pp. 1-11, 2024. [pdf] [code]

F. Paissan, M. Ravanelli, C. Subakan, "Listenable Maps for Audio Classifiers", In Proc. of the International Conference on Machine Learning (ICML), 2024, oral session (1.5% acceptance) [pdf] [code]

D. Borra, F. Paissan, M. Ravanelli, "SpeechBrain-MOABB: An open-source Python library for benchmarking deep neural networks applied to EEG signals", Computers in Biology and Medicine, 2024. [pdf] [code]

L. Della Libera, P. Mousavi, S. Zaiem, C. Subakan, M. Ravanelli, "CL-MASR: A continual learning benchmark for multilingual ASR", IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024. [pdf] [code]

G. A. D’Inverno, S. Brugiapaglia, M. Ravanelli, "Generalization Limits of Graph Neural Networks in Identity Effects Learning", Neural Networks, 2025. [pdf].

F. Öncel, M. Bethge, B. Ermis, M. Ravanelli, C. Subakan, Ç. Yıldız, "Adaptation Odyssey in LLMs: Why Does Additional Pretraining Sometimes Fail to Improve?", Proc. of EMNLP 2024. [pdf]

P. Mousavi, J. Duret, S. Zaiem, L. Della Libera, A. Ploujnikov, C. Subakan, M. Ravanelli, "How Should We Extract Discrete Audio Tokens from Self-Supervised Models?", Proc. of Interspeech, 2024. [pdf] [code]

U. Cappellazzo, D. Falavigna, A. Brutti, M. Ravanelli, "Parameter-efficient transfer learning of audio spectrogram transformers", 2024 IEEE 34th International Workshop on Machine Learning for Signal Processing (MLSP). [pdf]

A. Tur, A. Moumen, M. Ravanelli, "Progres: Prompted generative rescoring on ASR n-best", 2024 IEEE Spoken Language Technology Workshop (SLT). [pdf]

S. Gupta, M. Ravanelli, P. Germain, C. Subakan, "Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice", Proc. of Interspeech, 2024. [pdf]

F. Paissan, L. Della Libera, Z. Wang, M. Ravanelli, P. Smaragdis, C. Subakan, "Audio Editing with Non-Rigid Text Prompts", Proc. of Interspeech, 2024. [pdf]

L. Della Libera, C. Subakan, M. Ravanelli, "Focal Modulation Networks for Interpretable Sound Classification", Proc. of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024. [pdf] [code]

L. Della Libera, C. Subakan, M. Ravanelli, S. Cornell, F. Lepoutre, F. Grondin, "Resource-Efficient Separation Transformer", Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024 [pdf] [code]

L. Zampierin, G. B. Hacene, B. Nguyen, M. Ravanelli, "SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning", Proc. of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 [pdf]

S.M. Mousavi, G. Roccabruna, S. Alghisi, M. Rizzoli, M. Ravanelli, G. Riccardi, "Are LLMs Robust for Spoken Dialogues?", In Proc. of the International Workshop on Spoken Dialogue Systems Technology (IWSDS), 2024 [pdf] [code]

S. Mdhaffar, F. Bougares, R. De Mori, S. Zaiem, M. Ravanelli, Y. Estève, "TARIC-SLU: A Tunisian Benchmark Dataset for Spoken Language Understanding", Proc. of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING), 2024. [pdf]

D. Borra, M. Ravanelli, "Explaining Network Decision Provides Insights on the Causal Interaction Between Brain Regions in a Motor Imagery Task", 2024 IAPR Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR 2024).

D. Borra, M. Fraternali, M. Ravanelli, E. Magosso, "Multi-modal Decoding of Reach-to-Grasping from EEG and EMG via Neural Networks", 2024 IAPR Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR 2024).

S. Gupta, I. N. Gomez-Sarmiento, F. A. Mezdari, M. Ravanelli, C. Subakan, "Dynamic HumTrans: Humming Transcription Using CNNs and Dynamic Programming", 2024 IAPR Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR 2024).

2023

C. Subakan, M. Ravanelli, S. Cornell, F. Grondin, M. Bronzi, "Exploring self-attention mechanisms for speech separation", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2169-2180, 2023 [pdf]

Y. Wang, M. Ravanelli, A. Yacoubi, "Speech emotion diarization: Which emotion appears when?", In Proc. of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023, [pdf] [code]

S. Sagar, M. Ravanelli, B. Kiefer, I. Kruijff-Korbayová, J. van Genabith, "RescueSpeech: A German Corpus for Speech Recognition in Search and Rescue Domain", In Proc. of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023, [pdf] [code]

J. Hwang, M. Hira, C. Chen, X. Zhang, Z. Ni, G. Sun, P. Ma, R. Huang, V. Pratap, Y. Zhang, A. Kumar, C.-Y. Yu, C. Zhu, C. Liu, J. Kahn, M. Ravanelli, P. Sun, S. Watanabe, Y. Shi, Y. Tao, "TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch", In Proc. of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023, [pdf] [code]

S. Zaiem, Y. Kemiche, T. Parcollet, S. Essid, M. Ravanelli, "Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?", In Proc. of Interspeech 2023, [pdf] [code]

A. Sarfi, Z. Karimpour, M. Chaudhary, N. Khalid, M. Ravanelli, S. Mudur, E. Belilovsky, "Simulated Annealing in Early Layers Leads to Better Generalization", Proc. of the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2023 [pdf]

S. Zaiem, R. Algayres, T. Parcollet, S. Essid, M. Ravanelli, "Fine-tuning strategies for faster inference using speech self-supervised models: a comparative study", Proc. of ICASSP 2023 [pdf]

D. Beaini, S. Huang, J. A. Cunha, G. Moisescu-Pareja, O. Dymov, S. Maddrell-Mander, C. McLean, F. Wenkel, L. Müller, J. Hussein Mohamud, A. Parviz, M. Craig, M. Koziarski, J. Lu, Z. Zhu, C. Gabellini, K. Klaser, J. Dean, C. Wognum, M. Sypetkowski, G. Rabusseau, R. Rabbany, J. Tang, C. Morris, I. Koutis, M. Ravanelli, G. Wolf, P. Tossou, H. Mary, T. Bois, A. Fitzgibbon, B. Banaszewski, C. Martin, D. Masters, "Towards foundational models for molecular learning on large-scale multi-task datasets", Proc. of ICLR 2023 [pdf]

2022

A. Ploujnikov, M. Ravanelli, "SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation", Proc. of Interspeech 2022 [pdf][code][model]

Z. Wang, C. Subakan, X. Jiang, J. Wu, E. Tzinis, M. Ravanelli, P. Smaragdis, "Learning Representations for New Sound Classes With Continual Self-Supervised Learning", IEEE Signal Processing Letters, Vol. 29, pp. 2607-2611, 2022 [pdf]

C. Subakan, M. Ravanelli, S. Cornell, F. Grondin, "REAL-M: Towards speech separation on real mixtures", Proc. of ICASSP 2022 [pdf][code][model]

S.-W. Fu, C. Yu, K.-H. Hung, M. Ravanelli, Y. Tsao, "MetricGAN-U: Unsupervised speech enhancement/dereverberation based only on noisy/reverberated speech", Proc. of ICASSP 2022 [pdf][code]

2021

M. Ravanelli and T. Parcollet and P. Plantinga and A. Rouhe and S. Cornell and L. Lugosch and C. Subakan and N. Dawalatabad and A. Heba and J. Zhong and J-C. Chou and S-L. Yeh and S-W. Fu and C-F Liao and E. Rastorgueva and F. Grondin and W. Aris and H. Na and Y. Gao and R. De Mori and Y. Bengio, "SpeechBrain: A General-Purpose Speech Toolkit", arXiv, 2021 [pdf] [website] [code]

J. M. Mayor-Torres, M. Ravanelli, S. E. Medina-DeVilliers, M. D. Lerner, G. Riccardi, "Interpretable SincNet-based Deep Learning for Emotion Recognition from EEG brain activity", accepted at the International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2021 [pdf]

S-W. Fu, C. Yu, T-A. Hsieh, P. Plantinga, M. Ravanelli, X. Lu, Y. Tsao, "MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement", accepted at Interspeech 2021 [pdf] [code]

N. Dawalatabad, M. Ravanelli, F. Grondin, J. Thienpondt, B. Desplanques, H. Na, "ECAPA-TDNN Embeddings for Speaker Diarization", accepted at Interspeech 2021 [pdf] [code]

T. Parcollet, M. Ravanelli, "The Energy and Carbon Footprint of Training End-to-End Speech Recognizers", accepted at Interspeech 2021 [pdf]

L. Lugosch, P. Papreja, M. Ravanelli, A. Heba, T. Parcollet, "Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers", arXiv, 2021 [pdf] [code]

C. Subakan, M. Ravanelli, S. Cornell, M. Bronzi, J. Zhong, “Attention is All You Need in Speech Separation”, accepted at ICASSP 2021, [pdf] [code]

A. Lamb, D. He, A. Goyal, G. Ke, C-F Liao, M. Ravanelli, Y. Bengio, "Transformers with Competitive Ensembles of Independent Mechanisms", 2021 [pdf]

2020

M. Ravanelli, J. Zhong, S. Pascual, P. Swietojanski, J. Monteiro, J. Trmal, Y. Bengio, "Multi-task self-supervised learning for Robust Speech Recognition", Proc. of ICASSP 2020 [pdf] [code]

L. Lugosch, B. Meyer, D. Nowrouzezahrai, M. Ravanelli, "Using Speech Synthesis to Train End-to-End Spoken Language Understanding Models", Proc. of ICASSP 2020 [pdf] [code]

X. Qiu, T. Parcollet, M. Ravanelli, N. Lane, M. Morchid, "Quaternion Neural Networks for Multi-channel Distant Speech Recognition", In Proc. of Interspeech 2020. [pdf]

2019

S. Pascual, M. Ravanelli, J. Serrà, A. Bonafonte, Y. Bengio, "Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks", in Proc. of Interspeech 2019. [pdf] [code] [video]

M. Ravanelli, Y. Bengio, "Learning Speaker Representations with Mutual Information", in Proc. of Interspeech 2019. [pdf] [video]

L. Lugosch, M. Ravanelli, P. Ignoto, V. S. Tomar, Y. Bengio, "Speech Model Pre-training for End-to-End Spoken Language Understanding", in Proc. of Interspeech 2019. [pdf] [code]

M. Ravanelli, T. Parcollet, Y. Bengio, "The PyTorch-Kaldi Speech Recognition Toolkit", in Proc. of ICASSP 2019. [pdf] [code] [video]

T. Parcollet, M. Ravanelli, M. Morchid, G. Linarès, C. Trabelsi, R. De Mori, Y. Bengio, "Quaternion Recurrent Neural Networks", in Proc. of ICLR 2019. [pdf]

2018

M. Ravanelli, Y. Bengio, "Interpretable Convolutional Filters with SincNet", in Proc. of NIPS@IRASL 2018. [pdf] [code] [video]

T. Parcollet, M. Ravanelli, M. Morchid, G. Linarès, R. De Mori, “Speech Recognition with Quaternion Neural Networks”, in Proc. of NIPS@IRASL 2018 [pdf]

M. Ravanelli, Y. Bengio, "Speaker Recognition from raw waveform with SincNet", in Proc. of SLT 2018 [pdf] [code] [video]

M. Ravanelli, D. Serdyuk, Y. Bengio, "Twin Regularization for online speech recognition", In Proc. of Interspeech 2018. [pdf] [video]

M. Ravanelli, M. Omologo, "Automatic context window composition for distant speech recognition", Speech Communication, 2018. [published] [preprint] [bib]

M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, "Light Gated Recurrent Units for Speech Recognition", in IEEE Transactions on Emerging Topics in Computational Intelligence, 2018. [pdf] [bib]

2017

M. Ravanelli, "Deep Learning for Distant Speech Recognition", PhD Thesis, University of Trento, 2017 [pdf]

M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, "Improving speech recognition by revising gated recurrent units", in Proceedings of Interspeech 2017 [pdf] [bib]

M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, "A network of deep neural networks for distant speech recognition", in Proceedings of ICASSP 2017 (best IBM student paper award) [pdf] [bib] [video]

2016

M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, "Batch-normalized joint training for DNN-based distant speech recognition", in Proceedings of SLT 2016 [pdf] [bib]

M. Ravanelli, P. Svaizer, M. Omologo, "Realistic Multi-Microphone Data Simulation for Distant Speech Recognition", in Proceedings of Interspeech 2016. [pdf] [bib]

M. Matassoni, M. Ravanelli, S. Jalalvand, A. Brutti, "The FBK system for the CHiME-4 challenge", in Proceedings of the CHiME-4 challenge 2016. [pdf]

2015

M. Ravanelli, L. Cristoforetti, R. Gretter, M. Pellin, A. Sosi, M. Omologo, "The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments", in Proceedings of ASRU 2015. [pdf] [bib]

M. Ravanelli, B. Elizalde, J. Bernd, G. Friedland, "Insights into Audio-Based Multimedia Event Classification with Neural Networks", in Proceedings of ACM@MMCOMMONS. [pdf]

M. Ravanelli, M. Omologo, "Contaminated speech training methods for robust DNN-HMM distant speech recognition", in Proceedings of INTERSPEECH 2015, Dresden, pp. 756-759. [pdf]

E. Zwyssig, M. Ravanelli, P. Svaizer, M. Omologo, "A multi-channel corpus for distant-speech interaction in presence of known interferences", in Proceedings of ICASSP 2015, Brisbane, Australia.

2014

M. Ravanelli, M. Omologo, "On the selection of the impulse responses for distant-speech recognition based on contaminated speech training", in Proceedings of INTERSPEECH 2014, Singapore, pp. 1028-1032. [pdf]

M. Matassoni, R. Astudillo, A. Katsamanis, M. Ravanelli, “The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones”, in Proceedings of INTERSPEECH 2014, Singapore, pp. 1613-1617. [pdf] [short example1] [short example2]

A. Brutti, M. Ravanelli, M. Omologo, "SASLODOM: Speech Activity detection and Speaker LOcalization in DOMestic environments", in Proceedings of Evalita 2014, Pisa, Italy. [pdf]

M. Ravanelli, V.H. Do, A. Janin, "TANDEM-Bottleneck Feature Combination using Hierarchical Deep Neural Networks", in Proceedings of the International Symposium on Chinese Spoken Language Processing, ISCSLP-2014, Singapore. [pdf]

M. Ravanelli, B. Elizalde, K. Ni, G. Friedland, "Audio Concept Classification with Hierarchical Deep Neural Networks", in Proceedings of the European Signal Processing Conference, EUSIPCO 2014, Lisbon, Portugal. [pdf]

B. Elizalde, M. Ravanelli, K. Ni, D. Borth, G. Friedland, "Audio-Concept Features and Hidden Markov Models for Multimedia Event Detection", in Proceedings of the Interspeech Workshop on Speech, Language and Audio in Multimedia, SLAM 2014, Penang, Malaysia. [pdf]

A. Brutti, M. Ravanelli, P. Svaizer, M. Omologo, "A speech event detection and localization task for multiroom environments", in Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Array, HSCMA 2014, Nancy, France.

L. Cristoforetti, M. Ravanelli, M. Omologo, A. Sosi, A. Abad, M. Hagmueller, P. Maragos, "The DIRHA simulated corpus", in Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014, Reykjavik, Iceland. [pdf] [short example] [6 multi-channel examples]

2013

A. Sosi, F. Brugnara, L. Cristoforetti, M. Matassoni, M. Ravanelli, M. Omologo, "Embedding speech recognition to control lights", in Proceedings of INTERSPEECH 2013, Lyon, France. [pdf] [video]

B. Elizalde, M. Ravanelli, G. Friedland, "Audio Concept Ranking for Video Event Detection on User-Generated Content", in Proceedings of the Interspeech Workshop on Speech, Language and Audio in Multimedia, SLAM 2013, Marseille, France. [pdf]

M. Ravanelli, A. Sosi, M. Matassoni, M. Omologo, M. Benetti, G. Pedrotti, "Distant Talking Speech Recognition in Surgery Room: the DOMHOS project", in Proceedings of AISV 2013, Venice, Italy. [pdf] [awarded]

M. Ravanelli, A. Sosi, P. Svaizer, M. Omologo, "Impulse response estimation for robust speech recognition in a reverberant environment", in Proceedings of the European Signal Processing Conference, EUSIPCO 2012, Bucharest, Romania. [pdf]