Publications
For an updated list of my publications, please see my Google Scholar.
2024
S. Gupta, M. Ravanelli, P. Germain, C. Subakan, "Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice", Proc. of Interspeech, 2024. [pdf]
F. Paissan, L. Della Libera, Z. Wang, M. Ravanelli, P. Smaragdis, C. Subakan, "Audio Editing with Non-Rigid Text Prompts", Proc. of Interspeech, 2024. [pdf]
L. Zampierin, G. B. Hacene, B. Nguyen, M. Ravanelli, "SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning", Proc. of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 [pdf]
S. Mdhaffar, F. Bougares, R. De Mori, S. Zaiem, M. Ravanelli, Y. Estève, "TARIC-SLU: A Tunisian Benchmark Dataset for Spoken Language Understanding", Proc. of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING), 2024. [pdf]
2023
C. Subakan, M. Ravanelli, S. Cornell, F. Grondin, M. Bronzi, "Exploring self-attention mechanisms for speech separation", IEEE/ACM Transactions on Audio, Speech, and Language Processing 31, 2169-2180, Vol 12, 2023 [pdf]
J. Hwang, M. Hira, C. Chen, X. Zhang, Z. Ni, G. Sun, P. Ma, R. Huang, V. Pratap, Y. Zhang, A. Kumar, C.-Y. Yu, C. Zhu, C. Liu, J. Kahn, M. Ravanelli, P. Sun, S. Watanabe, Y. Shi, Y. Tao, "TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch", In Proc. of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023. [pdf] [code]
A. Sarfi, Z. Karimpour, M. Chaudhary, N. Khalid, M. Ravanelli, S. Mudur, E. Belilovsky, "Simulated Annealing in Early Layers Leads to Better Generalization", Proc. of the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2023 [pdf]
S. Zaiem, R. Algayres, T. Parcollet, S. Essid, M. Ravanelli, "Fine-tuning strategies for faster inference using speech self-supervised models: a comparative study", Proc. of ICASSP 2023 [pdf]
D. Beaini, S. Huang, J. A. Cunha, G. Moisescu-Pareja, O. Dymov, S. Maddrell-Mander, C. McLean, F. Wenkel, L. Müller, J. Hussein Mohamud, A. Parviz, M. Craig, M. Koziarski, J. Lu, Z. Zhu, C. Gabellini, K. Klaser, J. Dean, C. Wognum, M. Sypetkowski, G. Rabusseau, R. Rabbany, J. Tang, C. Morris, I. Koutis, M. Ravanelli, G. Wolf, P. Tossou, H. Mary, T. Bois, A. Fitzgibbon, B. Banaszewski, C. Martin, D. Masters, "Towards foundational models for molecular learning on large-scale multi-task datasets", Proc. of ICLR 2023 [pdf]
2022
Z. Wang, C. Subakan, X. Jiang, J. Wu, E. Tzinis, M. Ravanelli, P. Smaragdis, "Learning Representations for New Sound Classes With Continual Self-Supervised Learning", IEEE Signal Processing Letters, Vol. 29, pp. 2607-2611, 2022 [pdf]
2021
M. Ravanelli, T. Parcollet, P. Plantinga, A. Rouhe, S. Cornell, L. Lugosch, C. Subakan, N. Dawalatabad, A. Heba, J. Zhong, J-C. Chou, S-L. Yeh, S-W. Fu, C-F. Liao, E. Rastorgueva, F. Grondin, W. Aris, H. Na, Y. Gao, R. De Mori, Y. Bengio, "SpeechBrain: A General-Purpose Speech Toolkit", arXiv, 2021 [pdf] [website] [code]
J. M. Mayor-Torres, M. Ravanelli, S. E. Medina-DeVilliers, M. D. Lerner, G. Riccardi, "Interpretable SincNet-based Deep Learning for Emotion Recognition from EEG brain activity", accepted at the International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2021 [pdf]
T. Parcollet, M. Ravanelli, "The Energy and Carbon Footprint of Training End-to-End Speech Recognizers", accepted at Interspeech 2021 [pdf]
A. Lamb, D. He, A. Goyal, G. Ke, C-F Liao, M. Ravanelli, Y. Bengio, "Transformers with Competitive Ensembles of Independent Mechanisms", 2021 [pdf]
2020
X. Qiu, T. Parcollet, M. Ravanelli, N. Lane, M. Morchid, "Quaternion Neural Networks for Multi-channel Distant Speech Recognition", In Proc. of Interspeech 2020. [pdf]
2019
T. Parcollet, M. Ravanelli, M. Morchid, G. Linarès, C. Trabelsi, R. De Mori, Y. Bengio, "Quaternion Recurrent Neural Networks", in Proc. of ICLR 2019. [pdf]
2018
T. Parcollet, M. Ravanelli, M. Morchid, G. Linarès, R. De Mori, “Speech Recognition with Quaternion Neural Networks”, in Proc. of NIPS@IRASL 2018 [pdf]
M. Ravanelli, M. Omologo, "Automatic context window composition for distant speech recognition", Speech Communication, 2018. [published] [preprint] [bib]
2017
2016
M. Matassoni, M. Ravanelli, S. Jalalvand, A. Brutti, "The FBK system for the CHiME-4 challenge", in Proceedings of the CHiME 4 challenge 2016. [pdf]
2015
M. Ravanelli, B. Elizalde, J. Bernd, G. Friedland, "Insights into Audio-Based Multimedia Event Classification with Neural Networks", in Proceedings of ACM@MMCOMMONS. [pdf]
M. Ravanelli, M. Omologo, "Contaminated speech training methods for robust DNN-HMM distant speech recognition", in Proceedings of INTERSPEECH 2015, Dresden, pp. 756-759. [pdf]
E. Zwyssig, M. Ravanelli, P. Svaizer, M. Omologo, "A multi-channel corpus for distant-speech interaction in presence of known Interferences", in Proceedings of ICASSP 2015, Brisbane, Australia.
2014
M. Ravanelli, M. Omologo, "On the selection of the impulse responses for distant-speech recognition based on contaminated speech training", in Proceedings of INTERSPEECH 2014, Singapore, pp. 1028-1032. [pdf]
M. Matassoni, R. Astudillo, A. Katsamanis, M. Ravanelli, “The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones”, in Proceedings of INTERSPEECH 2014, Singapore, pp. 1613-1617. [pdf] [short example1] [short example2]
A. Brutti, M. Ravanelli, M. Omologo, "SASLODOM: Speech Activity detection and Speaker LOcalization in DOMestic environments", in Proceedings of Evalita 2014, Pisa, Italy. [pdf]
M. Ravanelli, V.H. Do, A. Janin, "TANDEM-Bottleneck Feature Combination using Hierarchical Deep Neural Networks", in Proceedings of the International Symposium on Chinese Spoken Language Processing, ISCSLP-2014, Singapore. [pdf]
M. Ravanelli, B. Elizalde, K. Ni, G. Friedland, "Audio Concept Classification with Hierarchical Deep Neural Networks", in Proceedings of the European Signal Processing Conference, EUSIPCO 2014, Lisbon, Portugal. [pdf]
B. Elizalde, M. Ravanelli, K. Ni, D. Borth, G. Friedland, "Audio-Concept Features and Hidden Markov Models for Multimedia Event Detection", in Proceedings of the Interspeech Workshop on Speech, Language and Audio in Multimedia, SLAM 2014, Penang, Malaysia. [pdf]
A. Brutti, M. Ravanelli, P. Svaizer, M. Omologo, "A speech event detection and localization task for multiroom environments", in Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Array, HSCMA 2014, Nancy, France.
L. Cristoforetti, M. Ravanelli, M. Omologo, A. Sosi, A. Abad, M. Hagmueller, P. Maragos, "The DIRHA simulated corpus", in Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014, Reykjavik, Iceland. [pdf] [short example] [6 multi-channel examples]
2013
A. Sosi, F. Brugnara, L. Cristoforetti, M. Matassoni, M. Ravanelli, M. Omologo, "Embedding speech recognition to control lights", in Proceedings of INTERSPEECH 2013, Lyon, France. [pdf] [video]
B. Elizalde, M. Ravanelli, G. Friedland, "Audio Concept Ranking for Video Event Detection on User-Generated Content", in Proceedings of the Interspeech Workshop on Speech, Language and Audio in Multimedia, SLAM 2013, Marseille, France. [pdf]
M. Ravanelli, A. Sosi, M. Matassoni, M. Omologo, M. Benetti, G. Pedrotti, "Distant Talking Speech Recognition in Surgery Room: the DOMHOS project", in Proceedings of AISV 2013, Venice, Italy. [pdf] [awarded]
M. Ravanelli, A. Sosi, P. Svaizer, M. Omologo, "Impulse response estimation for robust speech recognition in a reverberant environment", in Proceedings of the European Signal Processing Conference, EUSIPCO 2012, Bucharest, Romania. [pdf]