Publications

My Ph.D. Thesis

Auditory Representation Learning (PDF)

Journal Publications

Q Wang, H. B. Sailor, KA Lee, K Ma, KH Goh, WF Boh, “Using Twitter Dataset for Social Listening in Singapore”, IEEE Access [Data release paper].
Kamble, M., Sailor, H., Patil, H., & Li, H. (2020). Advances in anti-spoofing: From the perspective of ASVspoof challenges. APSIPA Transactions on Signal and Information Processing.
Hardik B. Sailor and Hemant A. Patil, "Auditory feature representation using convolutional restricted Boltzmann machine and Teager energy operator for speech recognition", The Journal of the Acoustical Society of America Express Letters (JASA-EL), Volume: 141, Issue: 6, June 2017. (JASA-EL online link) (CODE)
Hardik B. Sailor and Hemant A. Patil, "Novel unsupervised auditory filterbank learning using convolutional RBM for speech recognition," in IEEE/ACM Transactions on Audio, Speech and Language Processing (IEEE TASLP), Volume: 24, Issue: 12, Page(s): 2341 - 2353, December 2016. IEEE Xplore link. (Detailed and extended version of our ICASSP 2016 paper) (CODE)

Conference Publications

Qiongqiong Wang, Hardik B. Sailor, Tianchi Liu, Wenyu Zhang, Muhammad Huzaifah, Nattadaporn Lertcheva, Shuo Sun, Nancy F. Chen, Jinyang Wu, AiTi Aw, "Benchmarking Contextual and Paralinguistic Reasoning in Speech-LLMs: A Case Study with In-the-Wild Data", accepted at EMNLP 2025 Findings.
Qiongqiong Wang*, Hardik Sailor*, Jeremy Wong, Tianchi Liu, Shuo Sun, Wenyu Zhang, Muhammad Huzaifah, Nancy Chen and Ai Ti Aw, “Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models”, to appear in IEEE ASRU 2025. [Equal contribution]
Jeremy Wong, Muhammad Huzaifah, Hardik Sailor, Shuo Sun, Kye Min Tan, Bin Wang, Qiongqiong Wang, Wenyu Zhang, Xunlong Zou, Nancy Chen and Ai Ti Aw, “Diversity and complementarity of speech encoders across diverse tasks in a multi-modal large language model”, to appear in IEEE ASRU 2025.
Zihan Pan, Hardik Sailor and Jinyang Wu, “MoLEx: Mixture of LoRA Experts in Speech Self-Supervised Models for Audio Deepfake Detection”, to appear in IEEE ASRU 2025.
Qiongqiong Wang, Hardik B Sailor, Tianchi Liu, and Ai Ti Aw, “Contextual paralinguistic data creation for multi-modal Speech-LLM: Data condensation and spoken QA generation,” to appear in Proc. Interspeech 2025.
MERaLiON Team, "Towards a speech foundation model for Singapore and beyond." arXiv. https://arxiv.org/abs/2412.11538, 2024
T Liu, I Kukanov, Z Pan, Q Wang, H. B Sailor, K A Lee, “Quantifying Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing,” IEEE SLT 2024.
A Guragain, T Liu, Z Pan, H. B Sailor, Q Wang, “Speech Foundation Model Ensembles for Singing Voice Deepfake Detection,” IEEE SLT 2024.
Z Pan, T Liu, H. B Sailor, Q Wang, "Attentive Merging of Hidden Embeddings from Pre-trained Speech Model for Anti-spoofing Detection", Interspeech 2024.
A. Kachhi, A. Therattil, A. T. Patil, Hardik B. Sailor and H. A. Patil, "Teager Energy Cepstral Coefficients For Classification of Dysarthric Speech Severity-Level," 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Chiang Mai, Thailand, 2022.
S. S. Chaturvedi, Hardik B. Sailor and H. A. Patil, "Noisy Student Teacher Training with Self Supervised Learning for Children ASR," 2022 IEEE International Conference on Signal Processing and Communications (SPCOM), Bangalore, India, 2022
Pratap Singh, V., Hardik Sailor, Bhattacharya, S. and Pandey, A., 2022. Spectral Modification Based Data Augmentation For Improving End-to-End ASR For Children's Speech. Interspeech 2022.
Kiran Praveen, Hardik Sailor, Abhishek Pandey, "Warped Ensembles: A Novel Technique for improving CTC Based End-To-End Speech Recognition", accepted in IEEE ASRU 2021.
Hardik B. Sailor, Kiran Praveen T, Vikas Agrawal, Abhinav Jain, Abhishek Pandey, "SRI-B End-to-End System for Multilingual and Code-Switching ASR Challenges for Indian Languages", in Interspeech 2021, Czech Republic.
Dipesh K. Singh, Preet P. Amin, Hardik B. Sailor, and Hemant A. Patil, "Data Augmentation Using CycleGAN for End-to-End Children ASR", accepted in EUSIPCO 2021, Dublin, Ireland.
Hardik B. Sailor and Thomas Hain, "Multilingual Speech Recognition Using Language-Specific Phoneme Recognition as Auxiliary Task for Indian Languages", Interspeech 2020, Shanghai, China.
Hardik B. Sailor, Salil Deena, Md Asif Jalal, Rasa Lileikyte, Thomas Hain, "Unsupervised Adaptation of Acoustic Models for ASR using Utterance-level Embeddings from Squeeze-and-Excitation Networks", accepted in IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2019, Singapore.
Nirmesh J. Shah, Hardik B. Sailor, and Hemant A. Patil, "Whether To Pretrain DNN or Not?: An Empirical Analysis for Voice Conversion", INTERSPEECH 2019, Graz, Austria.
Ankur Patil, Siva Krishna Maddala, Mehak Piplani, Aditya Sai Pulikonda, Hardik B. Sailor and Hemant Patil, "DA-IICT/IIITV System for the 5th CHiME 2018 Challenge", accepted in 5th CHiME 2018 Challenge, a satellite event of INTERSPEECH 2018.
Hardik B. Sailor, "Auditory Representation Learning", accepted in Fourth Doctoral Consortium at INTERSPEECH 2018, Hyderabad, India. (CODE)
Hardik B. Sailor and Hemant A. Patil, "Neural Networks-based Automatic Speech Recognition for Agricultural Commodity in Gujarati Language," accepted in 6th Int. Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU) (Satellite event of INTERSPEECH 2018), Gurugram, India, 29-31 August 2018.
Hardik B. Sailor, Ankur T. Patil and Hemant A. Patil, "Advances in Low Resource ASR: A Deep Learning Perspective," accepted in 6th Int. Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU) (Satellite event of INTERSPEECH 2018), Gurugram, India, 29-31 August 2018.
Hardik B. Sailor, Maddala Venkata Siva Krishna, Diksha Chhabra, Ankur Patil, Madhu Kamble and Hemant Patil, "DA-IICT/IIITV System for Low Resource Speech Recognition Challenge 2018", accepted in INTERSPEECH 2018, Hyderabad, India, September 2018 (Received ISCA grant). (PDF)
Hardik B. Sailor, Madhu Kamble and Hemant Patil, "Auditory Filterbank Learning for Temporal Modulation Features in Replay Spoof Speech Detection", accepted in INTERSPEECH 2018, Hyderabad, India, September 2018 (Received ISCA grant) (CODE). (PDF)
Hardik B. Sailor and Hemant A. Patil, "Auditory Filterbank Learning Using ConvRBM for Infant Cry Classification", accepted in INTERSPEECH 2018, Hyderabad, India, September 2018 (Received ISCA grant) (CODE). (PDF)
Hardik B. Sailor and Hemant A. Patil, “Representation learning for speech recognition system in agricultural commodity for Gujarati”, accepted in Global Conference on Cyberspace (GCCS), Organized by MeitY, Govt. of India under National e-Governance Division (NeGD), New Delhi, India, 23-24 November 2017 (Poster presentation).
Hardik B. Sailor, Dharmesh Agrawal and Hemant A. Patil, "Unsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound Classification," in INTERSPEECH 2017, Stockholm, Sweden, August 20-24, 2017. PDF of INTERSPEECH 2017 Proceeding version (CODE)
Hardik B. Sailor, Madhu Kamble and Hemant A. Patil, "Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection," in INTERSPEECH 2017, Stockholm, Sweden, August 20-24, 2017. PDF of INTERSPEECH 2017 proceeding version (CODE)
Dharmesh M. Agrawal, Hardik B. Sailor, Meet H. Soni, and Hemant A. Patil, "Novel TEO-based Gammatone Features for Environmental Sound Classification", accepted in EUSIPCO 2017, Kos Island, Greece.
Hardik B. Sailor and Hemant A. Patil, "Unsupervised Deep Auditory Model Using Stack of Convolutional RBMs For Speech Recognition", accepted in INTERSPEECH 2016, San Francisco, California. PDF of INTERSPEECH 2016 proceeding version
Avni Rajpal, Tanvina Patel, Hardik Sailor, Maulik Madhavi, Hemant Patil and Hiroya Fujisaki, "Native Language Identification Using Spectral and Source-Based Features", accepted in INTERSPEECH 2016, San Francisco, California. PDF
Hardik B. Sailor and Hemant A. Patil, "Unsupervised Learning of Temporal Receptive Fields Using Convolutional RBM For ASR Task," accepted in 24th European Signal Processing Conference (EUSIPCO), Hilton Budapest, Hungary, 29 August to 2 September, 2016. PDF of proceeding version
Mohammadi Zaki, Hardik B. Sailor and Hemant A. Patil, "Analysis of Hierarchical Bottleneck Framework for Improved Phoneme Recognition," accepted in International Conference on Signal Processing and Communications (SPCOM), IISc Bangalore, India, 12-15 June, 2016. PDF
Hardik B. Sailor and Hemant A. Patil, “Filterbank Learning Using Convolutional Restricted Boltzmann Machine For Speech Recognition”, in Proc. Int. Conf. Acoust., Speech and Signal Process., (ICASSP) 2016, Shanghai, China. IEEE Xplore link (CODE)
Anshu Chittora, Hemant A. Patil and Hardik B. Sailor, “Spectro-temporal Analysis of HIE and Asthma Infant Cries Using Auditory Spectrogram,” in International Conference on BioSignal Analysis, Processing and System (ICBAPS 2015), Kuala Lumpur, Malaysia, 26-28 May 2015.
Hardik B. Sailor, Maulik C. Madhavi and Hemant A. Patil, "Significance of Phase-Based Features for Person Recognition using Humming", in PerMIn '15, Saha Institute of Nuclear Physics (SINP), Kolkata, West Bengal, India, February 26-27, 2015.
Hardik B. Sailor and Hemant A. Patil, “Fusion of Magnitude and Phase-based Features for Objective Evaluation of TTS Voice”, 9th International Symposium on Chinese Spoken Language Processing (ISCSLP) 2014, Singapore.
Nirmesh J. Shah, Hemant Patil, Maulik Madhavi, Hardik Sailor and Tanvina Patel, “Deterministic Annealing EM Algorithm for Developing TTS System in Gujarati", 9th International Symposium on Chinese Spoken Language Processing (ISCSLP) 2014, Singapore.
Nirmesh J. Shah, Bhavik B. Vachhani, Hardik B. Sailor and Hemant A. Patil, “Effectiveness of Phonetic Segmentation Algorithms for Speech Synthesis”, accepted for publication in Proc. Int. Conf. Acoust., Speech and Signal Process., ICASSP’14, Florence, Italy, May 4-9, 2014.
Talesara, Swati, Patil, Hemant A., Patel, Tanvina, Sailor, Hardik and Shah, Nirmesh, “A novel Gaussian filter-based automatic labeling of speech data for TTS system in Gujarati language,” in International Conference on Asian Language Processing (IALP), Urumqi, China, August 17-19, 2013.
Hemant Patil, Tanvina Patel, Swati Talesara, Nirmesh Shah, Hardik Sailor, Bhavik Vachhani, Janaki Akhani, Bhargav Kankariya, Yashesh Gaur and Vibha Prajapati, “Algorithm for Speech Segmentation at Syllable-Level for Text-to-Speech Synthesis System in Gujarati”, in the Oriental International Committee for the Co-Ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA) Conference, Gurgaon, India, November 25-27, 2013.
[TTS Consortium paper] Hemant A Patil, Tanvina B Patel, Nirmesh J Shah, Hardik B Sailor, Raghava Krishnan, G R Kasthuri, T. Nagarajan, Lilly Christina, Naresh Kumar, Veera Raghavendra, S P Kishore, S R M Prasanna, Nagaraj Adiga, Sanasam Ranbir Singh, Konjengbam Anand, Pranaw Kumar, Bira Chandra Singh, S L Binil Kumar, T G Bhadran, T Sajini, Arup Saha, Tulika Basu, K Sreenivasa Rao, N P Narendra, Anil Kumar Sao, Rakesh Kumar, Pranhari Talukdar, Purnendu Acharyaa, Somnath Chandra, Swaran Lata and Hema Murthy, “A Syllable-Based framework for Unit Selection Synthesis in 13 Indian Languages”, in the Oriental International Committee for the Co-Ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA) Conference, Gurgaon, India, November 25-27, 2013.

Google Scholar Publication citations
