Welcome!

Maulik Madhavi has been working in the field of speech signal processing since 2010. He received the Ph.D. degree (Information and Communication Systems) from Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, India, in 2017, and the M.Tech. (ICT) degree with a specialization in Communication Systems, also from DA-IICT. He was part of the consortium project “Development of Prosodically Guided Phonetic Engine for Searching Speech Databases in Indian Languages”, sponsored by the Department of Electronics and Information Technology (DeitY), India, from April 2012 to June 2014 (two years and three months). During his master's and doctoral studies at DA-IICT, he was a teaching assistant/tutor for eight different courses (August 2009-April 2012 and July 2014-May 2017). He was a Research Fellow at the National University of Singapore (NUS) from December 2017 to April 2021, where he worked on several research projects, including a spoken dialogue system for autonomous vehicles and speech recognition for healthcare. He has mentored seven NUS students on their final year projects (FYP) and one Master's student. Please refer to this link for relevant materials. Since April 2021, he has been a Video Analytics Researcher at NCS Pte Ltd, where he works on algorithms for video analytics.

He received the IAPR (International Association for Pattern Recognition) Travel Scholarship for presenting a joint paper at the International Conference on Biometrics, ICB'12, Delhi, India. His research interests are spoken information retrieval, applications of spoken language technology, spoken language understanding, and dialogue systems.

Research interests: speech signal processing, speech information retrieval, dialogue understanding, spoken language processing technology, and feature indexing for large-scale search.

Tools: Bash, C/C++, Python, MATLAB, Git, Docker, Kaldi, and deep learning frameworks (TensorFlow, PyTorch)

E-mail: maulikmadhavi[AT] gmail[DOT]com

Research Projects

Autonomous Bus Chatbot


  • Assists onboard passengers with navigation and emergency query requests

  • Android application connected to a spoken dialogue system

  • ASR interface and chatbot interface

Skills involved:

  • API access

  • Server-client communication

  • Python Flask server (a minimal sketch follows this list)

  • Android UI client (basic Java in Android Studio)

  • Git/GitHub for version control across the team
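
A minimal sketch of the Flask server side of this client-server setup, assuming a single /dialogue endpoint that the Android client calls over HTTP; the route name, port, and reply logic are illustrative assumptions, not the project's actual code:

    # Hypothetical Flask back-end for the spoken dialogue system.
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def get_bot_reply(utterance):
        # Placeholder dialogue logic; the deployed system would query the chatbot engine.
        if "emergency" in utterance.lower():
            return "Emergency noted. Contacting the operator."
        return "You said: " + utterance

    @app.route("/dialogue", methods=["POST"])
    def dialogue():
        # The Android client POSTs JSON such as {"utterance": "where is the next stop?"}.
        data = request.get_json(force=True)
        reply = get_bot_reply(data.get("utterance", ""))
        return jsonify({"reply": reply})

    if __name__ == "__main__":
        # The Android UI (client) reaches this server at http://<server-ip>:5000/dialogue.
        app.run(host="0.0.0.0", port=5000)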

Wake-up Word in Android UI



Skills involved:

  • Deep learning on mobile

  • Porting a model trained on a computer to mobile (see the conversion sketch after this list)

  • Customizing the wake-up word with hundreds of examples
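
A minimal sketch of the model-porting step, assuming a Keras wake-up word classifier saved as wakeword.h5 and TensorFlow Lite as the on-device runtime; the file names and the quantization choice are assumptions, not the project's actual pipeline:

    # Hypothetical conversion of a trained wake-up word model to TensorFlow Lite.
    import tensorflow as tf

    # Load the model trained on the computer (assumed to be in Keras HDF5 format).
    model = tf.keras.models.load_model("wakeword.h5")

    # Convert to a .tflite flatbuffer; DEFAULT optimization applies weight quantization.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    with open("wakeword.tflite", "wb") as f:
        f.write(tflite_model)

    # The .tflite file is then bundled into the Android app and run with the TFLite Interpreter.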

Speech Interface-Chatbot



  • Android speech interface to interact with a health chatbot

  • Automatic speech recognition for medical calls

Skills involved:

  • Server-client communication (a client-side sketch follows this list)

  • Extracting and evaluating conversational data

  • ASR model adaptation in Kaldi: acoustic model and language model

  • Python Flask server

  • Android UI client (basic Java in Android Studio)
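
A minimal client-side sketch of the server-client flow above, assuming separate HTTP endpoints for the Kaldi-backed ASR service and the health chatbot; the URLs, JSON field names, and file name are assumptions, not the deployed interfaces:

    # Hypothetical client: send audio to the ASR service, then pass the transcript to the chatbot.
    import requests

    ASR_URL = "http://localhost:5000/asr"       # assumed ASR (Kaldi-backed) endpoint
    CHATBOT_URL = "http://localhost:5001/chat"  # assumed chatbot endpoint

    def transcribe(wav_path):
        # Upload the audio file; expect JSON like {"transcript": "..."} in return.
        with open(wav_path, "rb") as f:
            resp = requests.post(ASR_URL, files={"audio": f})
        resp.raise_for_status()
        return resp.json()["transcript"]

    def chat(transcript):
        # Forward the recognized text; expect JSON like {"reply": "..."} in return.
        resp = requests.post(CHATBOT_URL, json={"utterance": transcript})
        resp.raise_for_status()
        return resp.json()["reply"]

    if __name__ == "__main__":
        text = transcribe("call.wav")
        print("ASR :", text)
        print("Bot :", chat(text))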

Research Publications

International Journals

  1. M. C. Madhavi and H. A. Patil, “Vocal Tract Length Normalization using a Gaussian mixture model framework for query-by-example spoken term detection,” in Computer Speech & Language, Elsevier, vol. 58, pp. 175-202, November 2019.

  2. M. C. Madhavi, and H. A. Patil, ''Design of Mixture of GMMs for Query-by-Example Spoken Term Detection," in Computer Speech & Language, Elsevier, vol. 52, pp. 41-55, November 2018.

  3. H. A. Patil, and M. C. Madhavi, ''Combining from Magnitude and Phase Information using VTEO for Person Recognition using Humming,'' in Computer Speech & Language (special issue on Recent Advances in Speaker and Language Recognition and Characterization), Elsevier, vol. 52, pp. 225-256, November 2018.

  4. M. C. Madhavi, and H. A. Patil, ''Partial Matching and Search Space Reduction for QbE-STD,'' in Computer Speech & Language, Elsevier, vol. 45, pp. 58-82, September 2017.

  5. H. A. Patil, M. C. Madhavi, K. K. Parhi, ''Static and dynamic information derived from source and system features for person recognition from humming,'' Int. J. Speech Technology, vol. 15, no. 3, pp. 393-406, 2012.

Book Chapters

    1. M. C. Madhavi, and H. A. Patil, ''Spoken Keyword Retrieval using Source and System Features,'' Int. Conf. on Pattern Recognition and Machine Intelligence (PReMI), Kolkata, India, Dec. 05 - 08, 2017.

    2. M. C. Madhavi, S. Sharma, H. A. Patil, ''VTLN Using Different Warping Functions for Template Matching,'' Machine Intelligence and Big Data in Industry, Springer International Publishing, D. Ryżko, P. Gawrysiak, M. Kryszkiewicz, H. Rybiński, (Eds.), pp. 111-121, 2016.

    3. M. C. Madhavi, S. Sharma, and H. A. Patil, ''Vocal tract length normalization features for audio search,'' in Int. Conf. Text, Speech, and Dialogue, TSD, P. Král, V. Matoušek (Eds.), Pilsen, Czech Republic, pp. 387-395, 2015.

    4. Y. Gaur, M. C. Madhavi, and H. A. Patil, ''Speaker recognition using sparse representation via superimposed features,'' in P. Maji et al. (Eds.), Lecture Notes in Computer Science (LNCS), vol. 8251, pp. 140-147, Springer-Verlag, Berlin Heidelberg, Germany, 2013.

    5. H. A. Patil, M. C. Madhavi, R. Jain, and A. Jain, ''Combining from temporal and spectral features for person recognition from humming,'' in Malay K. Kundu et al. (Eds.) PerMIn, Lecture Notes in Computer Science (LNCS), vol. 7143, pp. 321-328, Springer-Verlag, 2012.

International Conferences


    1. B. Sharma, M. Madhavi, X. Zhou, H. Li, "Exploring teacher-student learning approach for multi-lingual speech-to-intent classification,'' in IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2021.

    2. R. Das, M. Madhavi, H. Li, "Diagnosis of COVID-19 using Auditory Acoustic Cues,'' in Proc. Interspeech, Brno, Czech Republic, Aug-Sep. 2021, pp. 921-925.

    3. Y. Jiang, B. Sharma, M. Madhavi, H. Li, "Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification,'' in Proc. Interspeech, Brno, Czech Republic, Aug-Sep. 2021, pp. 4713-4717.

    4. X. Qian, M. Madhavi, Z. Pan, J. Wang, H. Li, "Multi-target DoA estimation with an audio-visual fusion mechanism," in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, June 2021, pp. 4280-4284.

    5. B. Sharma, M. Madhavi, H. Li, "Leveraging Acoustic and Linguistic Embeddings from Pretrained Speech and Language Models for Intent Classification," in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada, June 2021, pp. 7498-7502.

    6. Y. Ong, M. Madhavi, and K. Chan, “OPENNLU: Open-Source Web-Interface NLU Toolkit for Development of Conversational Agent”, in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2020, Auckland, New Zealand, December 2020, pp. 381-385.

    7. N. Shah, Sreeraj R, M. Madhavi, N. Shah, and H. Patil, “Query-by-Example Spoken Term Detection using Generative Adversarial Network”, in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2020, Auckland, New Zealand, December 2020, pp. 644-648.

    8. W. Lin, M. Madhavi, R. Das and H. Li, “Transformer-based Arabic Dialect Identification,” in Proc. International Conference on Asian Language Processing (IALP), Kuala Lumpur, Malaysia, December 2020, pp. 192-196.

    9. T. Liu, R. Das, M. Madhavi, S. Shen and H. Li, “Speaker-Utterance Dual Attention for Speaker and Utterance Verification,” in Interspeech, Shanghai, China, October 2020, pp. 4293-4297.

    10. R. Sheelvant, B. Sharma, M. Madhavi, R. Das, S.R.M. Prasanna and H. Li, “RSL2019: A Realistic Speech Localization Corpus,” in Proc. International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (COCOSDA), Cebu City, Philippines, October 2019.

    11. T. Liu, M. Madhavi, R. Das and H. Li, “A Unified Framework for Speaker and Utterance Verification,” in INTERSPEECH 2019, Graz, Austria, September 2019, pp. 4320-4324. [link] [recipe]

    12. M. Madhavi, T. Zhan, H. Li and M. Yuan, “First Leap Towards Development of Dialogue System for Autonomous Bus”, in Proc. International Workshop on Spoken Dialogue Systems Technology (IWSDS), Sicily, Italy, April 2019, pp. 1-6.

    13. R. Das, M. Madhavi, and H. Li, "Compensating Utterance Information in Fixed Phrase Speaker Verification," in Asia Pacific Signal and Information Processing Association (APSIPA), 12-15 Nov. 2018, Honolulu, Hawaii, USA.

    14. P. Tapkir, M. Kamble, H. Patil, and M. Madhavi, "Replay Spoof Detection using Power Function Based Features," in Asia Pacific Signal and Information Processing Association (APSIPA), 12-15 Nov. 2018, Honolulu, Hawaii, USA.

    15. N. Shah, M. Madhavi, and H. Patil, "Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion," INTERSPEECH 2018, Hyderabad, India, 2-6 September 2018, pp. 1968-1972.

    16. M. C. Madhavi, and H. A. Patil, ''VTLN-Warped Gaussian for QbE-STD,'' in 25th European Signal Process. Conf., EUSIPCO, Kos Island, Greece, Aug. 28-Sep. 2, 2017, pp. 563-567.

    17. M. C. Madhavi, and H. A. Patil, ''Two Stage Zero-resource Approaches for QbE-STD,'' in Ninth International Conference on Advances in Pattern Recognition (ICAPR-2017), Kolkata, India, December 28-30, 2017.

    18. M. C. Madhavi, and H. A. Patil, ''Combining evidences from detection sources for query-by-example spoken term detection,'' in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), Kuala Lumpur, Malaysia, December 12-15, 2017, pp. 563-568.

    19. A. Rajpal, T. B. Patel, H. B. Sailor, M. C. Madhavi, H. A. Patil, and H. Fujisaki, ''Native Language Identification Using Spectral and Source-Based Features,'' in Proc. 17th Annual Conf. of Int. Speech Communication Association (ISCA), INTERSPEECH, San Francisco, USA, 8-12 Sept. 2016, pp. 2383-2387.

    20. M. C. Madhavi, and H. A. Patil, ''Modification in Sequential Dynamic Time Warping for Fast Computation of Query-by-Example Spoken Term Detection Task,'' in Int. Conf. on Signal Processing and Communications (SPCOM), IISc Bangalore, India, June 12-15, 2016, pp. 1-6.

    21. M. C. Madhavi, H. A. Patil, and B. B. Vachhani, ''Spectral transition measure for detection of obstruents," in 23rd European Signal Process. Conf., EUSIPCO, Nice, France, Aug 31 - Sept. 4, 2015, pp. 330-334.

    22. H. B. Sailor, M. C. Madhavi, and H. A. Patil, ''Significance of phase-based features for person recognition using humming,'' in 2nd Int. Conf. on Perception and Machine Intelligence (PerMin), C-DAC, Kolkata, Feb. 26-27, 2015, pp. 99-103.

    23. B. Vachhani, K. D. Malde, M. C. Madhavi and H. A. Patil, ''A spectral transition measure based mel-cepstral features for obstruent detection,'' in Int. Conf. on Asian Lang. Process. (IALP '14), Kuching, Malaysia, 2014, pp. 50-53.

    24. S. Sharma, M. C. Madhavi and H. A. Patil, ''Vocal Tract Length Normalization for Vowel Recognition in Low Resource Languages,'' in Int. Conf. on Asian Lang. Process. (IALP '14), Kuching, Malaysia, 2014, pp. 54-57.

    25. M. C. Madhavi, S. Sharma, and H. A. Patil, ''Development of language resources for speech application in Gujarati and Marathi,'' in Int. Conf. on Asian Lang. Process., (IALP), Kuching, Malaysia, 2014, pp. 115-118.

    26. A. Undhad, H. Patil, and M. C. Madhavi, ''Exploiting speech source information for vowel landmark detection for low resource language,'' in the 9th Int. Symposium on Chinese Spoken Language Processing, ISCSLP'14, Singapore, Sep. 12-14, 2014, pp. 546-550.

    27. N. Shah, H. Patil, M. Madhavi, H. Sailor and T. Patel, ''Deterministic Annealing EM Algorithm for Developing TTS System in Gujarati,'' in the 9th Int. Symposium on Chinese Spoken Language Processing, ISCSLP'14, Singapore, Sep. 12-14, 2014, pp. 526-530.

    28. M. C. Madhavi, and H. A. Patil, ''Exploiting Variable length Teager Energy Operator in features for person recognition from humming,'' in the 9th Int. Symposium on Chinese Spoken Language Processing, ISCSLP'14, Singapore, Sep. 12-14, 2014, pp. 624-628.

    29. S. Sharma, M. C. Madhavi and H. A. Patil, ''Development of Vocal Tract Length Normalized Phonetic Engine for Gujarati and Marathi Languages,'' in The 17th Oriental COCOSDA'14, Phuket, Thailand, Sept. 10-12, 2014.

    30. K. D. Malde, B. B. Vachhani, M. C. Madhavi, N. H. Chhayani, and H. A. Patil, ''Development of speech corpora in Gujarati and Marathi for phonetic transcription,'' in Int. Conf. Oriental COCOSDA held jointly with 2013 Conf. on Asian Spoken Lang. Research and Evaluation (O-COCOSDA/CASLRE), Gurgaon, India, 2013, pp. 1-6.

    31. H. A. Patil, M. C. Madhavi, K. D. Malde, and B. B. Vachhani, ''Phonetic Transcription of Fricatives and Plosives for Gujarati and Marathi Languages, '' in Int. Conf. on Asian Lang. Process. (IALP), Hanoi, Vietnam, November 13-15, 2012, pp. 177-180.

    32. H. A. Patil, M. C. Madhavi, and N. H. Chhayani, ''Person Recognition Using Humming and Speech,'' in Int. Conf. on Asian Lang. Process. (IALP), Hanoi, Vietnam, November 13-15, 2012, pp. 149-152.

    33. H. A. Patil, and M. C. Madhavi, ''Significance of magnitude and phase information via VTEO for humming based biometrics,'' in Proc. Int. Conf. on Biometrics (ICB), New Delhi, India, 2012, pp. 372-377.

    34. H. A. Patil, M. C. Madhavi, and K. K. Parhi,''Combining Evidence from Spectral and Source-Like Features for Person Recognition from Humming,'' in 12th Proc. Annual Conf. of Int. Speech Communication Association (ISCA), INTERSPEECH, Florence, Italy, August 27-31, 2011, pp. 369-372.