Rohan Kumar Das received his Ph.D. degree from the Indian Institute of Technology (IIT) Guwahati in 2017. His Ph.D. work focused on speaker verification using short utterances from the perspective of practical, application-oriented systems. Prior to his research in speech processing, he was a Project Scientist at the Assam Science Technology and Environment Council from 2010 to 2011. After completing his doctoral studies, he worked in 2017 as a Data Scientist at Kovid Research Labs, a multinational company (since acquired by Kaliber Labs), where he was involved in speech-analytics-based application services. Later that year, he joined the Human Language Technology Laboratory, National University of Singapore, as a Research Fellow and led the speaker verification group's research until March 2021. Currently, he is a Research and Development (R&D) Manager at Fortemedia, Singapore division, where he leads the research and product development unit.
He was one of the organizers of the special sessions “The Attacker’s Perspective on Automatic Speaker Verification” and “Far-Field Speaker Verification Challenge 2020” at Interspeech 2020, the Voice Conversion Challenge 2020, and the Face-voice Association in Multilingual Environments (FAME) Challenge 2024 at ACM Multimedia 2024. He served as Publication Chair of the IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop 2019, as one of the Chairs of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, and as one of the Special Session Chairs of the International Symposium on Chinese Spoken Language Processing (ISCSLP) 2022. He has been awarded several travel fellowships from organizations such as the IEEE Signal Processing Society, the International Speech Communication Association, Microsoft Research India, Xerox Research Centre India and the Science and Engineering Research Board (SERB), Government of India, to present research at top-tier conferences such as ICASSP and Interspeech. He has published over 100 research papers in peer-reviewed journals and conferences. He is a Senior Member of IEEE and a member of ISCA and APSIPA.
Research Interests: Speech & audio signal processing, speaker verification, anti-spoofing, social signal processing, machine learning and pattern recognition.
E-mail: ecerohan [AT] gmail.com, rohankd [AT] fortemedia.com
Google Scholar LinkedIn ResearchGate ORCID Scopus Publon dblp Semantic Scholar GitHub
News:
[2024-05-24] 1 paper accepted at Interspeech 2024 to be held in Kos Island, Greece during 1-5 September 2024. Check the details in the Conference Publications section.
[2024-04-17] Co-organizing Face-voice Association in Multilingual Environments (FAME) 2024 challenge as a part of ACM Multimedia 2024 Grand Challenges.
[2024-04-16] 1 paper accepted at The Speaker and Language Recognition Workshop (Odyssey 2024) to be held in Quebec, Canada during 18-21 June 2024. Check the details in the Conference Publications section.
[2024-02-02] 1 paper accepted at IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024 Satellite Workshop on “Deep Neural Network Model Compression” to be held in Seoul, South Korea on 14 April 2024. Check the details in the Conference Publications section.
[2023-12-14] 1 paper accepted at IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024 conference to be held in Seoul, South Korea during 14-19 April 2024. Check the details in the Conference Publications section.
Special Sessions/Challenges Organized:
Co-organizer of the special session on “The Attacker’s Perspective on Automatic Speaker Verification” at Interspeech 2020, Shanghai, China, during 25-29 October 2020. Check the details at this link. [Overview Paper]
Co-organizer of a challenge and special session on “Far-Field Speaker Verification Challenge 2020” at Interspeech 2020, Shanghai, China, during 25-29 October 2020. Check the details at this link. [Evaluation Plan] [Overview Paper] [Database]
Co-organizer of “Voice Conversion Challenge 2020” and a Satellite Workshop of Interspeech 2020 on 30 October 2020. Check the details at this link. [Summary Paper] [Objective Assessment Paper] [Database] [Listening Test Data]
Co-organizer of the special session on “Data Augmentation in Speech Technologies” at the International Symposium on Chinese Spoken Language Processing (ISCSLP) 2022, Singapore, during 11-14 December 2022. Check the details at this link.
Co-organizer of Face-voice Association in Multilingual Environments (FAME) 2024 challenge as a part of ACM Multimedia 2024 Grand Challenges.
Journal Publications:
Tanmay Khandelwal, Rohan Kumar Das, and Eng Siong Chng, “Sound Event Detection: A Journey Through DCASE Challenge Series”, in APSIPA Transactions on Signal and Information Processing, vol. 13, no. 1, 2024. [post-print]
Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki and Haizhou Li, “Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs”, in IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 31, pp. 1706-1719, 2023. [pre-print] [post-print]
Tianchi Liu, Rohan Kumar Das, Kong Aik Lee and Haizhou Li, “Neural Acoustic-Phonetic Approach for Speaker Verification with Phonetic Attention Mask”, in IEEE Signal Processing Letters, vol. 29, pp. 782-786, 2022. [post-print]
Longting Xu, Daiyu Huang, Syed Faham Ali Zaidi, Abdul Rauf and Rohan Kumar Das, “Graph Fourier Transform based Audio Zero-watermarking”, in IEEE Signal Processing Letters, vol. 28, pp. 1943-1947, 2021. [pre-print] [post-print]
Jichen Yang, Hongji Wang, Rohan Kumar Das and Yanmin Qian, “Modified Magnitude-phase Spectrum Information for Spoofing Detection”, in IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 29, pp. 1065-1078, 2021. [post-print]
Jichen Yang, Rohan Kumar Das and Haizhou Li, “Significance of Subband Features for Synthetic Speech Detection”, in IEEE Transactions on Information Forensics and Security, vol. 15, pp. 2160-2170, 2020. [pre-print] [post-print]
Jichen Yang, Rohan Kumar Das and Nina Zhou, “Extraction of Octave Spectra Information for Spoofing Attack Detection”, in IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 27, Issue 12, pp. 2373-2384, December 2019. [post-print] [codes]
Jichen Yang and Rohan Kumar Das, “Long-term High Frequency Features for Synthetic Speech Detection”, in Digital Signal Processing, Elsevier, vol. 97, February 2020. [post-print]
Hrishikesh Dutta, Rohan Kumar Das, Sukumar Nandi and S. R. M. Prasanna, “An Overview of Digital Audio Steganography”, in IETE Technical Review, vol. 37, Issue 6, pp. 632-650, December 2020. [pre-print] [post-print]
Jichen Yang and Rohan Kumar Das, “Improving Anti-spoofing with Octave Spectrum and Short-term Spectral Statistics Information”, in Applied Acoustics, Elsevier, vol. 157, January 2020. [post-print]
Rohan Kumar Das and S. R. M. Prasanna, “Investigating Text-independent Speaker Verification Systems Under Varied Data Conditions”, in Circuits, Systems and Signal Processing, Springer, vol. 38, Issue 8, pp. 3778-3801, August 2019. [post-print]
Jichen Yang and Rohan Kumar Das, “Low Frequency Frame-wise Normalization over Constant-Q Transform for Playback Speech Detection”, in Digital Signal Processing, Elsevier, vol. 89, pp. 30-39, June 2019. [post-print]
Rohan Kumar Das, Sarfaraz Jelil and S. R. M. Prasanna, “Exploring Text-constraint Models and Source Information for Long-enrollment with Short-test Speaker Verification”, in Circuits, Systems and Signal Processing, Springer, vol. 38, Issue 4, pp. 1775-1792, April 2019. [post-print]
Rohan Kumar Das and S. R. M. Prasanna, “Speaker Verification from Short Utterance Perspective: A Review”, in IETE Technical Review, vol. 35, Issue 6, pp. 599-617, December 2018. [pre-print] [post-print]
Rohan Kumar Das, Sarfaraz Jelil and S. R. M. Prasanna, “Multi-style Speaker Recognition Database in Practical Conditions”, International Journal of Speech Technology, Springer, vol. 21, Issue 3, pp. 409-419, September 2018. [post-print] [database]
Rohan Kumar Das, Bidisha Sharma and S. R. M. Prasanna, “Significance of Duration Modification for Speaker Verification under Mismatch Speech Tempo Condition”, International Journal of Speech Technology, Springer, vol. 21, Issue 3, pp. 401-408, September 2018. [post-print]
Rohan Kumar Das, Akhil Babu Manam and S. R. M. Prasanna, “Exploring Kernel Discriminant Analysis for Speaker Verification with Limited Test Data”, in Pattern Recognition Letters (PRL), Elsevier, vol. 98, pp. 26-31, October 2017. [pre-print] [post-print]
Rohan Kumar Das, Sarfaraz Jelil and S. R. M. Prasanna, “Development of Multi-Level Speech based Person Authentication System”, Journal of Signal Processing Systems, Springer, vol. 88, Issue 3, pp. 259-271, September 2017. [post-print]
R. Sharma, S. R. M. Prasanna, Ramesh K. Bhukya and Rohan Kumar Das, “Analysis of the Intrinsic Mode Functions for Speaker Information”, Speech Communication, Elsevier, vol. 91, pp. 1-16, July 2017. [post-print]
Rohan Kumar Das and S. R. M. Prasanna, “Exploring Different Attributes of Source Information for Speaker Verification with Limited Test Data”, in Journal of the Acoustical Society of America (JASA), vol. 140, no. 1, pp. 184-190, July 2016. [post-print]
Debmalya Chakrabarty, S. R. M. Prasanna and Rohan Kumar Das, “Development and Evaluation of Online Text-independent Speaker Verification System for Remote Person Authentication”, International Journal of Speech Technology, Springer, vol. 16, Issue 1, pp. 75-88, March 2013. [post-print]
Haris B. C., Gayadhar Pradhan, Abhinav Misra, S. R. M. Prasanna, Rohan Kumar Das, and Rohit Sinha, “Multivariability Speaker Recognition Database in Indian Scenario”, International Journal of Speech Technology, Springer, vol. 15, pp. 441-453, December 2012. [pre-print] [post-print] [database]
Book Chapters:
Rohan Kumar Das and S. R. M. Prasanna, “Speaker Verification for Variable Duration Segments and the Effect of Session Variability”, Lecture Notes in Electrical Engineering, Springer, vol. 347, Chapter 16, pp. 193-200, 2015. [post-print]
Conference/Workshop Publications:
2024
Tianchi Liu, Lin Zhang, Rohan Kumar Das, Yi Ma, Ruijie Tao, Haizhou Li, “How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?”, Interspeech 2024, Kos Island, Greece, September 2024. [pre-print]
Yang Xiao, Han Yin, Jisheng Bai and Rohan Kumar Das, “FMSG-JLESS Submission for DCASE 2024 Task4 on Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels”, in DCASE 2024 Challenge, Tech. Rep., July 2024. [post-print]
Mingrui He, Longting Xu, Han Wang, Mingjun Zhang and Rohan Kumar Das, “Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks”, in Proc. The Speaker and Language Recognition Workshop (Odyssey 2024), Quebec, Canada, June 2024. [pre-print] [post-print]
Yang Xiao and Rohan Kumar Das, “Dual Knowledge Distillation for Efficient Sound Event Detection”, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024 Satellite Workshop on Deep Neural Network Model Compression, Seoul, South Korea, April 2024. [pre-print]
Jichen Yang, Fangfan Chen, Rohan Kumar Das, Zhengyu Zhu and Shunsi Zhang, “Adaptive-avg-pooling based Attention Vision Transformer for Face Anti-spoofing”, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024, Seoul, South Korea, April 2024, pp. 3875-3879. [pre-print] [post-print]
2023
Tanmay Khandelwal and Rohan Kumar Das, “Exploring Multi-Task Learning with Weighted Soft Label Loss for Sound Event Detection with Soft Labels”, in Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge Workshop, September 2023. [post-print]
Tanmay Khandelwal and Rohan Kumar Das, “Cross-dimensional Interaction with Inverted Residual Triplet Attention for Low-complexity Sound Event Detection”, in Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge Workshop, September 2023. [post-print]
Tanmay Khandelwal and Rohan Kumar Das, “A Multi-Task Learning Framework for Sound Event Detection using High-level Acoustic Characteristics of Sounds”, in Proc. Interspeech 2023, Dublin, Ireland, August 2023, pp. 1214-1218. [pre-print] [recipe] [post-print]
Tanmay Khandelwal, Rohan Kumar Das, Andrew Koh and Eng Siong Chng, “Leveraging Audio-Tagging Assisted Sound Event Detection using Weakified Strong Labels and Frequency Dynamic Convolutions”, in Proc. IEEE Statistical Signal Processing Workshop 2023, Hanoi, Vietnam, July 2023, pp. 329-333. [pre-print] [post-print]
Yang Xiao, Tanmay Khandelwal and Rohan Kumar Das, “FMSG Submission for DCASE 2023 Challenge Task 4 on Sound Event Detection with Weak Labels and Synthetic Soundscapes”, in DCASE 2023 Challenge, Tech. Rep., June 2023. [post-print]
2022
Tanmay Khandelwal and Rohan Kumar Das, “Dynamic Thresholding on FixMatch with Weak and Strong Data Augmentations for Sound Event Detection”, in Proc. International Symposium on Chinese Spoken Language Processing (ISCSLP) 2022, Singapore, December 2022, pp. 428-432. [post-print] [presentation video]
Rohith Mars and Rohan Kumar Das, “On the Use of Absolute Threshold of Hearing-based Loss for Full-band Speech Enhancement”, in Proc. International Symposium on Chinese Spoken Language Processing (ISCSLP) 2022, Singapore, December 2022, pp. 458-462. [post-print]
Tanmay Khandelwal, Rohan Kumar Das, and Eng Siong Chng, “Is Your Baby Fine at Home? Baby Cry Sound Detection in Domestic Environments”, in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC), Chiang Mai, Thailand, November 2022, pp. 275-280. [post-print] [database]
Rohith Mars and Rohan Kumar Das, “A Device Classification-aided Multi-task Framework for Low-complexity Acoustic Scene Classification”, in Proc. Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge Workshop, November 2022. [post-print]
Tanmay Khandelwal, Rohan Kumar Das, Andrew Koh and Eng Siong Chng, “FMSG-NTU Submission for DCASE 2022 Task 4 on Sound Event Detection in Domestic Environments”, in DCASE 2022 Challenge, Tech. Rep., June 2022. [post-print]
Longting Xu, Mianxin Tian, Xing Guo, Zhiyong Shan, Jie Jia, Yiyuan Peng, Jichen Yang and Rohan Kumar Das, “A Novel Feature Based on Graph Signal Processing for Detection of Physical Access Attacks”, in Proc. The Speaker and Language Recognition Workshop (Odyssey 2022), Beijing, China, June 2022, pp. 107-111. [post-print] [codes]
Teck Kai Chan and Rohan Kumar Das, “Cross-stitch Network with Adaptive Loss Weightage for Sound Event Localization and Detection”, in Proc. L3DAS22: Machine Learning for 3D Audio Signal Processing, May 2022, pp. 11-15. [post-print]
Tianchi Liu, Rohan Kumar Das, Kong Aik Lee and Haizhou Li, “MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances”, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2022, Singapore, May 2022, pp. 7517-7521. [pre-print] [post-print]
Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki and Haizhou Li, “Self-supervised Speaker Recognition with Loss-gated Learning”, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2022, Singapore, May 2022, pp. 6142-6146. [pre-print] [post-print]
2021
Rohan Kumar Das, Ruijie Tao and Haizhou Li, “HLT-NUS Submission for 2020 NIST Conversational Telephone Speech SRE”, in NIST SRE Workshop 2021, December 2021. [pre-print] [recipe]
Kong Aik Lee, Tomi Kinnunen, Daniele Colibro, Claudio Vair, Andreas Nautsch, Hanwu Sun, Liang He, Tianyu Liang, Qiongqiong Wang, Mickael Rouvier, Pierre-Michel Bousquet, Rohan Kumar Das, Ignacio Vinals Bailo, Meng Liu, Héctor Delgado, Xuechen Liu, Md Sahidullah, Sandro Cumani, Boning Zhang, Koji Okabe, Hitoshi Yamamoto, Ruijie Tao, Haizhou Li, Alfonso Ortega Giménez, Longbiao Wang, Luis Buera, “I4U System Description for NIST SRE’20 CTS Challenge”, in NIST SRE Workshop 2021, December 2021. [post-print]
Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha and S. R. M. Prasanna, “Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition”, in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2021, Tokyo, Japan, December 2021, pp. 484-490. [pre-print] [post-print]
Rohan Kumar Das, “Known-unknown Data Augmentation Strategies for Detection of Logical Access, Physical Access and Speech Deepfake Attacks: ASVspoof 2021”, in Proc. 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge Workshop, September 2021, pp. 29-36. [post-print]
Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou and Haizhou Li, “Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection”, in Proc. ACM Multimedia 2021, Chengdu, China, October 2021, pp. 3927-3935. [pre-print] [post-print] [recipe]
Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou and Haizhou Li, “NUS-HLT Report for ActivityNet Challenge 2021 AVA (Speaker)”, in International Challenge on Activity Recognition (ActivityNet) Workshop, CVPR 2021, June 2021. [post-print]
Rohan Kumar Das, Maulik Madhavi and Haizhou Li, “Diagnosis of COVID-19 using Auditory Acoustic Cues”, in Proc. Interspeech 2021, Brno, Czech Republic, August 2021, pp. 921-925. [post-print]
Rohan Kumar Das, Jichen Yang and Haizhou Li, “Data Augmentation with Signal Companding for Detection of Logical Access Attacks” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2021, Toronto, Ontario, Canada, June 2021, pp. 6349-6353. [pre-print] [post-print]
Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha and S. R. M. Prasanna, “Enhancing the Intelligibility of Cleft Lip and Palate Speech using Cycle-consistent Adversarial Networks” in Proc. IEEE Spoken Language Technology (SLT) 2021, Shenzhen, China, January 2021, pp. 720-727. [pre-print] [post-print]
Meidan Ouyang, Rohan Kumar Das, Jichen Yang and Haizhou Li, “Capsule Network based End-to-end System for Detection of Replay Attacks”, in Proc. International Symposium on Chinese Spoken Language Processing (ISCSLP) 2021, Hong Kong, January 2021, pp. 1-5. [post-print]
2020
Rohan Kumar Das, Ruijie Tao, Jichen Yang, Wei Rao, Cheng Yu and Haizhou Li, “HLT-NUS Submission for NIST 2019 Multimedia Speaker Recognition Evaluation”, in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2020, Auckland, New Zealand, December 2020, pp. 605-609. [pre-print] [post-print]
Rohan Kumar Das and Haizhou Li, “Classification of Speech with and without Face Mask using Acoustic Features” in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2020, Auckland, New Zealand, December 2020, pp. 747-752. [pre-print] [post-print]
Biswajit Dev Sarma and Rohan Kumar Das, “Emotion Invariant Speaker Embeddings for Speaker Identification with Emotional Speech” in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2020, Auckland, New Zealand, December 2020, pp. 610-615. [pre-print] [post-print]
Wanqiu Lin, Maulik Madhavi, Rohan Kumar Das and Haizhou Li, “Transformer-based Arabic Dialect Identification”, in Proc. International Conference on Asian Language Processing (IALP) 2020, Kuala Lumpur, Malaysia, December 2020, pp. 192-196. [pre-print] [recipe] [post-print]
Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen and Haizhou Li, “The Attacker's Perspective on Automatic Speaker Verification: An Overview” in Proc. Interspeech 2020, Shanghai, China, October 2020, pp. 4213-4217. [pre-print] [post-print]
Zhenzong Wu, Rohan Kumar Das, Jichen Yang and Haizhou Li, “Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks” in Proc. Interspeech 2020, Shanghai, China, October 2020, pp. 1101-1105. [pre-print] [post-print]
Ruijie Tao, Rohan Kumar Das and Haizhou Li, “Audio-visual Speaker Recognition with a Cross-modal Discriminative Network” in Proc. Interspeech 2020, Shanghai, China, October 2020, pp. 2242-2246. [pre-print] [post-print]
Tianchi Liu, Rohan Kumar Das, Maulik Madhavi, Shengmei Shen and Haizhou Li, “Speaker-Utterance Dual Attention for Speaker and Utterance Verification” in Proc. Interspeech 2020, Shanghai, China, October 2020, pp. 4293-4297. [pre-print] [post-print]
Xiaoyi Qin, Ming Li, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li, “The INTERSPEECH 2020 Far-Field Speaker Verification Challenge” in Proc. Interspeech 2020, Shanghai, China, October 2020, pp. 3456-3460. [pre-print] [post-print] [database]
Zhao Yi, Wen-Chin Huang, Xiaohai Tian, Junichi Yamagishi, Rohan Kumar Das, Tomi Kinnunen, Zhenhua Ling and Tomoki Toda, “Voice Conversion Challenge 2020 – Intra-lingual Semiparallel and Cross-lingual Voice Conversion –” in Proc. ISCA Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, October 2020, pp. 80-98. [pre-print] [post-print] [database]
Rohan Kumar Das, Tomi Kinnunen, Wen-Chin Huang, Zhenhua Ling, Junichi Yamagishi, Zhao Yi, Xiaohai Tian and Tomoki Toda, “Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions” in Proc. ISCA Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, October 2020, pp. 99-120. [pre-print] [post-print]
Xiaohai Tian, Rohan Kumar Das and Haizhou Li, “Black-box Attacks on Automatic Speaker Verification using Feedback-controlled Voice Conversion” in Proc. The Speaker and Language Recognition Workshop (Odyssey 2020), Tokyo, Japan, November 2020, pp. 159-164. [pre-print] [post-print]
Xiaoxue Gao, Xiaohai Tian, Yi Zhou, Rohan Kumar Das and Haizhou Li, “Personalized Singing Voice Generation Using WaveRNN” in Proc. The Speaker and Language Recognition Workshop (Odyssey 2020), Tokyo, Japan, November 2020, pp. 252-258. [samples] [post-print]
Rohan Kumar Das and Haizhou Li, “On the Importance of Vocal Tract Constriction for Speaker Characterization: The Whispered Speech Study” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020, Barcelona, Spain, May 2020, pp. 7119-7123. [pre-print] [post-print]
Rohan Kumar Das, Jichen Yang and Haizhou Li, “Assessing the Scope of Generalized Countermeasures for Anti-spoofing” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020, Barcelona, Spain, May 2020, pp. 6589-6593. [pre-print] [post-print]
Xuehao Zhou, Xiaohai Tian, Grandee Lee, Rohan Kumar Das and Haizhou Li, “End-to-end Code-switching TTS with Cross-lingual Language Model” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020, Barcelona, Spain, May 2020, pp. 7614-7618. [post-print] [samples]
2019
Rohan Kumar Das, Jichen Yang and Haizhou Li, “Long Range Acoustic and Deep Features Perspective on ASVspoof 2019” in Proc. IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop 2019, Sentosa Island, Singapore, December 2019, pp. 1018-1025. [pre-print] [post-print]
Yi Zhou, Xiaohai Tian, Emre Yılmaz, Rohan Kumar Das and Haizhou Li, “A Modularized Neural Network with Language-specific Output Layers for Cross-lingual Voice Conversion” in Proc. IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop 2019, Sentosa Island, Singapore, December 2019, pp. 160-167. [pre-print] [post-print] [samples]
Rohan Kumar Das, Jichen Yang and Haizhou Li, “Speaker Clustering with Penalty Distance for Speaker Verification with Multi-Speaker Speech” in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2019, Lanzhou, China, November 2019, pp. 1630-1635. [pre-print] [post-print]
Yitong Liu, Rohan Kumar Das and Haizhou Li, “Multi-band Spectral Entropy Information for Detection of Replay Attacks” in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2019, Lanzhou, China, November 2019, pp. 838-843. [pre-print] [post-print]
Yi Zhou, Xiaohai Tian, Rohan Kumar Das and Haizhou Li, “Many-to-many Cross-lingual Voice Conversion with a Jointly Trained Speaker Embedding Network” in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2019, Lanzhou, China, November 2019, pp. 1282-1287. [pre-print] [post-print] [samples]
Xiaoxue Gao, Xiaohai Tian, Rohan Kumar Das, Yi Zhou and Haizhou Li, “Speaker-independent Spectral Mapping for Speech-to-Singing Conversion” in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2019, Lanzhou, China, November 2019, pp. 159-164. [pre-print] [post-print] [samples]
Rohan Sheelvant, Bidisha Sharma, Maulik Madhavi, Rohan Kumar Das, S. R. M. Prasanna and Haizhou Li, “RSL2019: A Realistic Speech Localization Corpus” in Proc. Oriental COCOSDA 2019, Cebu City, Philippines, October 2019, pp. 1-6. [pre-print] [post-print] [database]
Rohan Kumar Das and Haizhou Li, “Instantaneous Phase and Long-term Acoustic Cues for Orca Activity Detection” in Proc. Interspeech 2019, Graz, Austria, September 2019, pp. 2418-2422. [post-print]
Rohan Kumar Das, Jichen Yang and Haizhou Li, “Long Range Acoustic Features for Spoofed Speech Detection” in Proc. Interspeech 2019, Graz, Austria, September 2019, pp. 1058-1062. [post-print]
Bidisha Sharma, Rohan Kumar Das and Haizhou Li, “On the Importance of Audio-source Separation for Singer Identification in Polyphonic Music” in Proc. Interspeech 2019, Graz, Austria, September 2019, pp. 2020-2024. [post-print]
Bidisha Sharma, Rohan Kumar Das and Haizhou Li, “Multi-level Adaptive Speech Activity Detector for Speech in Naturalistic Environments” in Proc. Interspeech 2019, Graz, Austria, September 2019, pp. 2015-2019. [post-print] [codes]
Tianchi Liu, Maulik Madhavi, Rohan Kumar Das and Haizhou Li, “A Unified Framework for Speaker and Utterance Verification” in Proc. Interspeech 2019, Graz, Austria, September 2019, pp. 4320-4324. [post-print] [recipe]
Kong Aik Lee, Ville Hautamäki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Héctor Delgado and Massimiliano Todisco, “I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences” in Proc. Interspeech 2019, Graz, Austria, September 2019, pp. 1497-1501. [post-print]
Jibin Wu, Zihan Pan, Malu Zhang, Rohan Kumar Das, Yansong Chua and Haizhou Li, “Robust Sound Recognition: A Neuromorphic Approach” in Proc. Interspeech 2019, Graz, Austria, September 2019, pp. 3667-3668. [post-print] [demo video]
Sarfaraz Jelil, Abhishek Shrivastava, Rohan Kumar Das, S. R. M. Prasanna and Rohit Sinha, “SpeechMarker: A Voice based Multi-level Attendance Application” in Proc. Interspeech 2019, Graz, Austria, September 2019, pp. 3665-3666. [post-print] [demo video]
Yi Zhou, Xiaohai Tian, Haihua Xu, Rohan Kumar Das and Haizhou Li, “Cross-Lingual Voice Conversion with Bilingual Phonetic Posteriorgram and Average Modeling” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2019, Brighton, United Kingdom, May 2019, pp. 6790-6794. [pre-print] [post-print] [samples]
2018
Longting Xu, Rohan Kumar Das, Emre Yılmaz, Jichen Yang and Haizhou Li, “Generative x-vectors for Text-independent Speaker Verification” in Proc. IEEE Spoken Language Technology (SLT) 2018, Athens, Greece, December 2018, pp. 1014-1020. [pre-print] [post-print]
Rohan Kumar Das and Haizhou Li, “Instantaneous Phase and Excitation Source Features for Detection of Replay Attacks” in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2018, Honolulu, Hawaii, USA, November 2018, pp. 1030-1037. [pre-print] [post-print]
Rohan Kumar Das, Maulik Madhavi and Haizhou Li, “Compensating Utterance Information in Fixed Phrase Speaker Verification” in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2018, Honolulu, Hawaii, USA, November 2018, pp. 1708-1712. [pre-print] [post-print]
Jichen Yang, Rohan Kumar Das and Haizhou Li, “Extended Constant-Q Cepstral Coefficients for Detection of Spoofing Attacks” in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2018, Honolulu, Hawaii, USA, November 2018, pp. 1024-1029. [pre-print] [post-print]
Rohan Kumar Das and S. R. M. Prasanna, “Investigating Text-independent Speaker Verification from Practically Realizable System Perspective” in Proc. Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2018, Honolulu, Hawaii, USA, November 2018, pp. 1483-1487. [pre-print] [post-print]
Kantheti Srinivas, Rohan Kumar Das and Hemant A. Patil, “Combining Phase-based Features for Replay Spoof Detection” in Proc. International Symposium on Chinese Spoken Language Processing (ISCSLP) 2018, Taipei, Taiwan, November 2018, pp. 151-155. [post-print]
Xiaoxue Gao, Berrak Sisman, Rohan Kumar Das and Karthika Vijayan, “NUS-HLT Spoken Lyrics and Singing (SLS) Corpus” in Proc. International Conference on Orange Technologies (ICOT) 2018, Bali, Indonesia, October 2018, pp. 1-6. [pre-print] [post-print]
Biswajit Dev Sarma, Rohan Kumar Das, Abhishek Dey and Risto Haukioja, “Analysis of Speech Emotions in Realistic Environments” in Proc. Speech, Music and Mind (SMM) 2018, a satellite event of Interspeech 2018, Hyderabad, India, September 2018, pp. 11-15. [post-print]
2017
Anupama Paul, Deepshikha Mahanta, Rohan Kumar Das, Ramesh K. Bhukya and S. R. M. Prasanna, “Presence of Speech Region Detection using Vowel-like Regions and Spectral Slope Information” in Proc. INDICON 2017, IIT Roorkee, December 2017, pp. 1-5. [pre-print] [post-print]
Rohan Kumar Das, “Incorporating Source Features, Acoustic-phonetic Information and Suitable Pattern Recognition Approach for Limited Test Data Speaker Verification” in Proc. 3rd Doctoral Consortium, Interspeech 2017, KTH Sweden, Stockholm, Sweden, August 2017. [post-print]
Sarfaraz Jelil, Rohan Kumar Das, S. R. M. Prasanna and Rohit Sinha, “Spoof Detection Using Source, Instantaneous Frequency and Cepstral Features” in Proc. Interspeech 2017, Stockholm, Sweden, August 2017, pp. 22-26. [post-print]
Nagendra Kumar, Rohan Kumar Das, Sarfaraz Jelil, Dhanush B. K., H. Kashyap, K. Sri Rama Murty, Sriram Ganapathy, Rohit Sinha and S. R. M. Prasanna, “IITG-Indigo System for NIST 2016 SRE Challenge” in Proc. Interspeech 2017, Stockholm, Sweden, August 2017, pp. 2859-2863. [post-print]
Sarfaraz Jelil, Rohan Kumar Das, S. R. M. Prasanna and Rohit Sinha, “Role of Voice Activity Detection Methods for the Speakers in the Wild Challenge” in Proc. 23rd National Conference on Communications (NCC) 2017, IIT Madras, March 2017, pp. 1-6. [post-print]
2016
Kuruvachan K. George, Rohan Kumar Das, Sarfaraz Jelil, K. Arun Das, C. Santhosh Kumar, S. R. M. Prasanna and Ashish Panda, “AMRITATCS-IITGUWAHATI Combined System for the Speakers in the Wild (SITW) Speaker Recognition Challenge” in Proc. IEEE TENCON 2016, Singapore, November 2016, pp. 2842-2846. [post-print] [slides]
Akhil Babu Manam, Tummala Sai Revanth, Rohan Kumar Das and S. R. M. Prasanna, “Speaker Verification using Acoustic Factor Analysis with Phonetic Content Compensation in Limited and Degraded Test Conditions” in Proc. IEEE TENCON 2016, Singapore, November 2016, pp. 1402-1406. [post-print]
Salil Mamodiya, Lav Kumar, Rohan Kumar Das and S. R. M. Prasanna, “Exploring Acoustic Factor Analysis for Limited Test Data Speaker Verification” in Proc. IEEE TENCON 2016, Singapore, November 2016, pp. 1397-1401. [post-print]
Rohan Kumar Das and S. R. M. Prasanna, “Text-independent Speaker Verification with Limited Test Data from the Perspective of Practical Systems” in Proc. 2nd Doctoral Consortium, Interspeech 2016, ICSI, Berkeley, California, September 2016. [post-print]
Rohan Kumar Das, Sarfaraz Jelil and S. R. M. Prasanna, “Exploring Session Variability and Template Aging in Speaker Verification for Fixed Phrase Short Utterances” in Proc. Interspeech 2016, San Francisco, September 2016, pp. 445-449. [post-print]
Rohan Kumar Das, Sarfaraz Jelil and S. R. M. Prasanna, “Significance of Constraining Text in Limited Data Text-independent Speaker Verification” in Proc. International Conference on Signal Processing and Communications (SPCOM) 2016, IISc Bangalore, June 2016, pp. 1-5. [post-print]
Anupama Paul, Rohan Kumar Das, Rohit Sinha and S. R. M. Prasanna, “Countermeasure to Handle Record and Replay Attacks in Practical Speaker Verification Systems” in Proc. International Conference on Signal Processing and Communications (SPCOM) 2016, Bangalore, June 2016, pp. 1-5. [post-print]
Deepshikha Mahanta, Anupama Paul, Ramesh K. Bhukya, Rohan Kumar Das, Rohit Sinha and S. R. M. Prasanna, “Warping Path and Gross Spectrum Information for Speaker Verification under Degraded Condition” in Proc. 22nd National Conference on Communications (NCC) 2016, IIT Guwahati, March 2016, pp. 1-6. [post-print]
2015
Ashutosh Pandey, Rohan Kumar Das, Nagaraj Adiga, Naresh Gupta and S. R. M. Prasanna, “Significance of Glottal Activity Detection for Speaker Verification in Degraded and Limited Data Condition” in Proc. IEEE TENCON 2015, Macao, November 2015, pp. 1-6. [post-print]
Sarfaraz Jelil, Rohan Kumar Das, Rohit Sinha and S. R. M. Prasanna, “Speaker Verification Using Gaussian Posteriorgrams on Fixed Phrase Short Utterances” in Proc. Interspeech 2015, Germany, September 2015, pp. 1042-1046. [post-print]
Sarfaraz Jelil, Rohan Kumar Das, Khwairakpam Amitab, Fidalizia Pyrtuh, L. Joyprakash Singh and S. R. M. Prasanna, “Exploring Speaker Modeling Techniques for Short Pass-Phrase Based Person Authentication System” in Proc. International Conference on Computing and Communication Systems (I3CS’15), NEHU Shillong, April 2015.
Rohan Kumar Das, Debadatta Pati and S. R. M. Prasanna, “Different Aspects of Source Information for Limited Data Speaker Verification”, in Proc. 21st National Conference on Communications (NCC) 2015, IIT Bombay, February 2015, pp. 1-6. [post-print]
2014
Rohan Kumar Das, Abhiram B., S. R. M. Prasanna and A. G. Ramakrishnan, “Combining Source and System Information for Limited Data Speaker Verification”, in Proc. Interspeech 2014, Singapore, September 2014, pp. 1836-1840. [post-print]
Ramesh K., S. R. M. Prasanna and Rohan Kumar Das, “Significance of Glottal Activity Detection and Glottal Signatures for Text-Dependent Speaker Verification”, in Proc. International Conference on Signal Processing and Communications (SPCOM) 2014, IISc Bangalore, July 2014, pp. 1-5. [post-print]
Subhadeep Dey, Sujit Barman, Ramesh K. Bhukya, Rohan Kumar Das, Haris B. C., S. R. M. Prasanna and Rohit Sinha, “Speech Biometric Based Attendance System”, in Proc. 20th National Conference on Communications (NCC) 2014, IIT Kanpur, February 2014, pp. 1-6. [post-print]
Thesis:
Rohan Kumar Das, “Speaker Verification using Sufficient Train and Limited Test Data”, Ph.D. Thesis, September 2017. [link]
Miscellaneous:
Rohan Kumar Das and Haizhou Li, “IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop 2019 Program Book”, 2019. [link]