Publications

For an up-to-date list of my publications, please see my Google Scholar profile.

2023

S. Zaiem, Y. Kemiche, T. Parcollet, S. Essid, M. Ravanelli, "Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?", accepted at Interspeech 2023 [pdf]

S. Zaiem, R. Algayres, T. Parcollet, S. Essid, M. Ravanelli, "Fine-tuning strategies for faster inference using speech self-supervised models: a comparative study", Proc. of ICASSP 2023 [pdf]

A. Sarfi, Z. Karimpour, M. Chaudhary, N. Khalid, M. Ravanelli, S. Mudur, E. Belilovsky, "Simulated Annealing in Early Layers Leads to Better Generalization", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) [pdf]

C. Subakan, M. Ravanelli, S. Cornell, F. Grondin, M. Bronzi, "Exploring Self-Attention Mechanisms for Speech Separation", IEEE/ACM Transactions on Audio, Speech, and Language Processing [pdf]

2022

A. Ploujnikov, M. Ravanelli, "SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation", Proc. of Interspeech 2022 [pdf][code][model]

Z. Wang, C. Subakan, X. Jiang, J. Wu, E. Tzinis, M. Ravanelli, P. Smaragdis, "Learning Representations for New Sound Classes With Continual Self-Supervised Learning", IEEE Signal Processing Letters, Vol. 29, pp. 2607-2611 [pdf]

C. Subakan, M. Ravanelli, S. Cornell, F. Grondin, "REAL-M: Towards speech separation on real mixtures", Proc. of ICASSP 2022 [pdf][code][model]

S.-W. Fu, C. Yu, K.-H. Hung, M. Ravanelli, Y. Tsao, "MetricGAN-U: Unsupervised speech enhancement/dereverberation based only on noisy/reverberated speech", Proc. of ICASSP 2022 [pdf][code]

2021

M. Ravanelli, T. Parcollet, P. Plantinga, A. Rouhe, S. Cornell, L. Lugosch, C. Subakan, N. Dawalatabad, A. Heba, J. Zhong, J-C. Chou, S-L. Yeh, S-W. Fu, C-F. Liao, E. Rastorgueva, F. Grondin, W. Aris, H. Na, Y. Gao, R. De Mori, Y. Bengio, "SpeechBrain: A General-Purpose Speech Toolkit", arXiv, 2021 [pdf] [website] [code]

J. M. Mayor-Torres, M. Ravanelli, S. E. Medina-DeVilliers, M. D. Lerner, G. Riccardi, "Interpretable SincNet-based Deep Learning for Emotion Recognition from EEG brain activity", accepted at the International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2021 [pdf] 

S-W. Fu, C. Yu, T-A. Hsieh, P. Plantinga, M. Ravanelli, X. Lu, Y. Tsao, "MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement", accepted at Interspeech 2021 [pdf] [code] 

N. Dawalatabad, M. Ravanelli, F. Grondin, J. Thienpondt, B. Desplanques, H. Na, "ECAPA-TDNN Embeddings for Speaker Diarization",  accepted at Interspeech 2021 [pdf] [code]

T. Parcollet, M. Ravanelli, "The Energy and Carbon Footprint of Training End-to-End Speech Recognizers", accepted at Interspeech 2021 [pdf]

L. Lugosch, P. Papreja, M. Ravanelli, A. Heba, T. Parcollet, "Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers", arXiv, 2021 [pdf] [code]

C. Subakan, M. Ravanelli, S. Cornell, M. Bronzi, J. Zhong, "Attention is All You Need in Speech Separation", accepted at ICASSP 2021 [pdf] [code]

A. Lamb, D. He, A. Goyal, G. Ke, C-F. Liao, M. Ravanelli, Y. Bengio, "Transformers with Competitive Ensembles of Independent Mechanisms", 2021 [pdf]

2020

M. Ravanelli, J. Zhong, S. Pascual, P. Swietojanski, J. Monteiro, J. Trmal, Y. Bengio, "Multi-task self-supervised learning for Robust Speech Recognition", Proc. of ICASSP 2020 [pdf] [code]

L. Lugosch, B. Meyer, D. Nowrouzezahrai, M. Ravanelli, "Using Speech Synthesis to Train End-to-End Spoken Language Understanding Models", Proc. of ICASSP 2020 [pdf] [code]

X. Qiu, T. Parcollet, M. Ravanelli, N. Lane, M. Morchid, "Quaternion Neural Networks for Multi-channel Distant Speech Recognition", In Proc. of Interspeech 2020. [pdf]

2019

S. Pascual, M. Ravanelli, J. Serrà, A. Bonafonte, Y. Bengio, "Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks", in Proc. of Interspeech 2019. [pdf] [code] [video]

M. Ravanelli, Y. Bengio, "Learning Speaker Representations with Mutual Information",  in Proc. of Interspeech 2019. [pdf] [video]

L. Lugosch, M. Ravanelli, P. Ignoto, V. S. Tomar, Y. Bengio, "Speech Model Pre-training for End-to-End Spoken Language Understanding", in Proc. of Interspeech 2019. [pdf] [code]

M. Ravanelli, T. Parcollet, Y. Bengio, "The PyTorch-Kaldi Speech Recognition Toolkit", in Proc. of ICASSP 2019. [pdf] [code] [video]

T. Parcollet, M. Ravanelli, M. Morchid, G. Linarès, C. Trabelsi, R. De Mori, Y. Bengio, "Quaternion Recurrent Neural Networks", in Proc. of ICLR 2019. [pdf]

2018

M. Ravanelli, Y. Bengio, "Interpretable Convolutional Filters with SincNet", in Proc. of NIPS@IRASL 2018. [pdf] [code] [video]

T. Parcollet, M. Ravanelli, M. Morchid, G. Linarès, R. De Mori, "Speech Recognition with Quaternion Neural Networks", in Proc. of NIPS@IRASL 2018. [pdf]

M. Ravanelli, Y. Bengio, "Speaker Recognition from raw waveform with SincNet", in Proc. of  SLT 2018 [pdf] [code] [video]

M. Ravanelli, D. Serdyuk, Y. Bengio, "Twin Regularization for online speech recognition", in Proc. of Interspeech 2018. [pdf] [video]

M. Ravanelli, M. Omologo, "Automatic context window composition for distant speech recognition", Speech Communication, 2018. [published] [preprint] [bib]

M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, "Light Gated Recurrent Units for Speech Recognition", in IEEE Transactions on Emerging Topics in Computational Intelligence, 2018.  [pdf] [bib]

2017

M. Ravanelli, "Deep Learning for Distant Speech Recognition", PhD Thesis, Unitn 2017 [pdf]

M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, "Improving speech recognition by revising gated recurrent units", in Proceedings of Interspeech 2017 [pdf] [bib]

M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, "A network of deep neural networks for distant speech recognition", in Proceedings of ICASSP 2017 (best IBM student paper award) [pdf] [bib] [video]

2016

M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, "Batch-normalized joint training for DNN-based distant speech recognition", in Proceedings of SLT 2016 [pdf] [bib]

M. Ravanelli, P. Svaizer, M. Omologo, "Realistic Multi-Microphone Data Simulation for Distant Speech Recognition",  in Proceedings of Interspeech 2016. [pdf] [bib]

M. Matassoni, M. Ravanelli, S. Jalalvand, A. Brutti, "The FBK system for the CHiME-4 challenge", in Proceedings of the CHiME 4 challenge 2016. [pdf]

2015

M. Ravanelli, L. Cristoforetti, R. Gretter, M. Pellin, A. Sosi, M. Omologo, "The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments", in Proceedings of ASRU 2015. [pdf] [bib]

M. Ravanelli, B. Elizalde, J. Bernd, G. Friedland,  "Insights into Audio-Based Multimedia Event Classification with Neural Networks", in Proceedings of ACM@MMCOMMONS. [pdf]

M. Ravanelli, M. Omologo, "Contaminated speech training methods for robust DNN-HMM distant speech recognition", in Proceedings of  INTERSPEECH 2015, Dresden, pp. 756-759. [pdf]

E. Zwyssig, M. Ravanelli, P. Svaizer, M. Omologo, "A multi-channel corpus for distant-speech interaction in presence of known Interferences", in Proceedings of  ICASSP 2015,  Brisbane, Australia.

2014

M. Ravanelli, M. Omologo, "On the selection of the impulse responses for distant-speech recognition based on contaminated speech training", in Proceedings of  INTERSPEECH 2014, Singapore, pp. 1028-1032. [pdf]

M. Matassoni, R. Astudillo, A. Katsamanis, M. Ravanelli, "The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones", in Proceedings of INTERSPEECH 2014, Singapore, pp. 1613-1617. [pdf] [short example1] [short example2]

A. Brutti, M. Ravanelli, M. Omologo, "SASLODOM: Speech Activity detection and Speaker LOcalization in DOMestic environments", in Proceedings of Evalita 2014, Pisa, Italy. [pdf]

M. Ravanelli, V. H. Do, A. Janin, "TANDEM-Bottleneck Feature Combination using Hierarchical Deep Neural Networks", in Proceedings of the International Symposium on Chinese Spoken Language Processing, ISCSLP-2014, Singapore. [pdf]

M. Ravanelli, B. Elizalde, K. Ni, G. Friedland, "Audio Concept Classification with Hierarchical Deep Neural Networks", in Proceedings of the European Signal Processing Conference, EUSIPCO 2014, Lisbon, Portugal. [pdf]

B. Elizalde, M. Ravanelli, K. Ni, D. Borth, G. Friedland, "Audio-Concept Features and Hidden Markov Models for Multimedia Event Detection",  in Proceedings of the Interspeech Workshop on Speech, Language and Audio in Multimedia, SLAM 2014, Penang, Malaysia. [pdf]

A. Brutti, M. Ravanelli, P. Svaizer, M. Omologo, "A speech event detection and localization task for multiroom environments",  in Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Array, HSCMA 2014, Nancy, France.

L. Cristoforetti, M. Ravanelli, M. Omologo, A. Sosi, A. Abad, M. Hagmueller, P. Maragos, "The DIRHA simulated corpus",  in Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014, Reykjavik, Iceland. [pdf] [short example] [6 multi-channel examples]

2013

A. Sosi, F. Brugnara, L. Cristoforetti, M. Matassoni, M. Ravanelli, M. Omologo, "Embedding speech recognition to control lights", in Proceedings of INTERSPEECH 2013, Lyon, France. [pdf] [video]

B. Elizalde, M. Ravanelli, G. Friedland, "Audio Concept Ranking for Video Event Detection on User-Generated Content", in Proceedings of the Interspeech Workshop on Speech, Language and Audio in Multimedia, SLAM 2013, Marseille, France. [pdf]

M. Ravanelli, A. Sosi, M. Matassoni, M. Omologo, M. Benetti, G. Pedrotti, "Distant Talking Speech Recognition in Surgery Room: the DOMHOS project", in Proceedings of AISV 2013, Venice, Italy. [pdf] [awarded]

2012

M. Ravanelli, A. Sosi, P. Svaizer, M. Omologo, "Impulse response estimation for robust speech recognition in a reverberant environment", in Proceedings of the European Signal Processing Conference, EUSIPCO 2012, Bucharest, Romania. [pdf]