List of Publications

Journals

Alam, J., Praveen, R.G. (2025). On the Use of Cross-Attentive Fusion Techniques for Audio-Visual Speaker Verification. In: Passban, P., Way, A., Rezagholizadeh, M. (eds) Enhancing LLM Performance. Machine Translation: Technologies and Applications, vol 7. Springer, Cham. https://doi.org/10.1007/978-3-031-85747-8_9
Fathan, A., Alam, J. (2025). An Efficient Clustering Algorithm for Self-Supervised Speaker Recognition. In: Passban, P., Way, A., Rezagholizadeh, M. (eds) Enhancing LLM Performance. Machine Translation: Technologies and Applications, vol 7. Springer, Cham. https://doi.org/10.1007/978-3-031-85747-8_10
Sultana, S., Hossain, A. B. M. A., & Alam, J., “COVID‑19 detection from optimized features of breathing audio signals using explainable ensemble machine learning,” Results in Control and Optimization, 18, 100538, 2025. https://doi.org/10.1016/j.rico.2025.100538
Rajasekhar, G. P., and Alam, J., "Incongruity-Aware Cross-Modal Attention for Audio-Visual Fusion in Dimensional Emotion Recognition," in the IEEE Journal of Selected Topics in Signal Processing (JSTSP), Vol. 18 (03), pp. 444-458, April, 2024.
Fathan, A. and Alam, J., "An Analytic Study on Clustering Driven Self-Supervised Speaker Verification," accepted for publication in the Pattern Recognition Letters, Elsevier, January, 2024. https://www.sciencedirect.com/science/article/abs/pii/S0167865524000242
Kang, W., Alam, J., and Fathan, A., "l-mix: a latent-level instance mixup regularization for robust self-supervised speaker representation learning," in the IEEE Journal of Selected Topics in Signal Processing (JSTSP), August, 2022, doi: 10.1109/JSTSP.2022.3196562. online: https://ieeexplore.ieee.org/document/9850385
Monteiro, J., Alam, J., Falk, T., "Multi-level Self-attentive TDNN: A General and Efficient Approach to Summarize Speech Into Discriminative Utterance-level Representations," in Speech Communication (Elsevier), vol. 140, pp. 42-49, May, 2022. https://doi.org/10.1016/j.specom.2022.03.008
J. Monteiro, I. Albuquerque, Alam, J., T. Falk, "TEMPLE: defining versatile TEMPlate LEarners via prototypical classifiers with learned similarities," submitted to IEEE Transactions on Artificial Intelligence (TAI), December, 2024.
Monteiro, J., Alam, J., and Falk, T., "Generalized End-to-End Detection of Spoofing Attacks to Automatic Speaker Recognizers," in Computer Speech & Language journal, Vol. 63, September 2020.
Dahmane, M., Alam, J., St-Charles, P., Lalonde, M., Heffner, K., and Foucher, S., "A Multi-modal Non-Intrusive Stress Monitoring from the Pleasure-Arousal Emotional Dimensions," in IEEE Transactions on Affective Computing (TAFFC), April, 2020.
Avila, A. R., Alam, J., O’Shaughnessy, D., Falk, T. H., "On the Use of the I-vector Speech Representation for Instrumental Quality Measurement," in Quality and User Experience Journal, Springer, June, 2020.
Avila, A. R., Alam, J., Fabiano O. Costa Prado, O’Shaughnessy, D., Falk, T. H., "On the Use of Blind Channel Response Estimation and a Residual Neural Network to Detect Physical Access Attacks to Speaker Verification Systems," accepted for publication in Computer Speech & Language journal, Elsevier, October, 2020.
Monteiro, J., Alam, J., and Falk, T., "Residual Convolutional Neural Network with Attentive Feature Pooling for End-To-End Language Identification from Short-Duration Speech," in Computer Speech & Language journal, vol. 54, pp. 364-376, November 2019.
Monteiro, J., Alam, J., and Falk, T., "Multi-task Training of Speaker Embedding Models for Text-independent Automatic Speaker Verification," in Computer Speech & Language journal, (submitted revision), May, 2019.
Stafylakis, T., Alam, J. and Kenny, P. "Text-dependent speaker recognition with random digit strings" IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24 (7) 2016 : 1195-1204.
Stafylakis, T., Kenny, P., Alam, J. and Kockmann, M. "Speaker and Channel Factors in Text-Dependent Speaker Recognition" IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24 (1) 2016 : 65-78.
Alam, J., Kenny, P., O’Shaughnessy, D., Regularized Minimum Variance Distortionless Response-Based Cepstral Features for Robust Continuous Speech Recognition, Speech Communication, 73, pp. 28-46, October 2015.
Alam, J., Gupta, V., Kenny, P., and Dumouchel, P., Speech Recognition in Reverberant and Noisy Environments Employing Multiple Feature Extractors and I-Vector Speaker Adaptation, EURASIP Journal on Advances in Signal Processing, pp. 2015-50, 2015.
Alam, J., Kenny, P., O’Shaughnessy, D., Robust Feature Extraction Based on an Asymmetric Level-Dependent Auditory Filterbank and a Subband Spectrum Enhancement Technique, Digital Signal Processing, 29, pp. 147-157, June 2014.
Alam, J., Kenny, P., and O’Shaughnessy, D., Low-variance Multitaper Mel-frequency Cepstral Coefficient Features for Speech and Speaker Recognition Systems, Cognitive Computation, Springer, December 2012.
Alam, J., Kinnunen, T., Kenny, P., Ouellet, P., and O’Shaughnessy, D., Multitaper MFCC and PLP Features for Speaker Verification Using I-Vectors, Speech Communication, 55(2), pp. 237-251, February 2013. [ISCA-Award for the best paper published in Speech Communication (2013 - 2015)]
Alam, J., Chowdhury, M. F. A., and Alam, M. F., "Wiener Denoising based on Perceptual Frequency Weighting and Noise Spectrum Shaping," in Istanbul University Journal of Electrical and Electronic Engineering (IU-JEEE), pp. 1197-1203, Vol. 25, No.1, 2013.
Alam, J., O’Shaughnessy, D., “Perceptual improvement of Wiener filtering employing a post-filter,” Digital Signal Processing, vol. 21(1), pp. 54-65, January 2011.
Alam, J., Md. Faqrul Alam Chowdhury, Md. Fasiul Alam, “Comparative study of a priori signal-to-noise ratio (SNR) estimation approaches for speech enhancement,” Istanbul University–Journal of Electrical and Electronic Engineering (IU-JEEE), vol. 9 (1), pp. 809-817, 2009.
Mohiuddin AHMAD, Mostafa Zaman CHOWDHURY, and Jahangir ALAM, “Tissue-Motion Analysis of Artery Pulsation in Cranial Ultrasonogram of Newborn Baby”, Istanbul University–Journal of Electrical and Electronic Engineering (IU-JEEE), pp.53-59, Vol. 6 No.1, 2006.

Conferences

Alam, J. and Md Shahidul Alam, "Text-Independent Speaker Verification Employing A Novel Hybrid Neural Embedding Extractor," in proc. of the International Joint Conference on Biometrics (IJCB), Osaka, Japan, 8-11 September, 2025.
Fathan, Abderrahim, Alam, J., "Automatic Labeling and Correction of Noisy Labels for Robust Self-Supervised Speaker Verification," accepted for publication in the ISCA INTERSPEECH, Rotterdam, The Netherlands, 17-21 August 2025.
Fathan, Abderrahim, Alam, J., Zhu, Xiaolin, "An Investigative Study on Recent Sharpness- and Flatness-Based Optimizers for Enhanced Self-Supervised Speaker Verification," accepted for publication in the ISCA INTERSPEECH, Rotterdam, The Netherlands, 17-21 August 2025.
Alam, J., Fathan, Abderrahim, and Md Shahidul Alam, "A Hybrid Neural Approach to Speaker Verification with an Improved Additive Angular Margin Loss," accepted for publication in International Joint Conference on Neural Networks (IJCNN), Rome, Italy, June 30-July 5, 2025.
Alam, J. and Md Shahidul Alam, "A Novel Hybrid Neural Embedding Extractor for Text Independent Speaker Verification," in proc. of the International Workshop on Biometrics and Forensics (IWBF), Munich, Germany, 24-25 April, 2025.
Gnana Praveen Rajasekhar, Alam, J. and Eric Charton, "United we stand, Divided we fall: Handling Weak Complementary Relationships for Audio-Visual Emotion Recognition in Valence-Arousal Space," in the IEEE Computer Vision and Pattern Recognition (IEEE CVPR) Workshop (8th ABAW), Nashville, USA, 10-17 June 2025.
Abderrahim Fathan, Xiaolin Zhu and Alam, J., "AdaptiveDrop: A Simple Adaptive Label Noise Filtering Scheme for Enhanced Self-supervised Speaker Verification," in the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hyderabad, India, 06-10 April 2025.
Gnana Praveen Rajasekhar and Alam, J., "LAVViT: Latent Audio-Visual Vision Transformers for Speaker Verification," in the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hyderabad, India, 06-10 April 2025.
Hossein Zeinali, Kong Aik Lee, Jahangir Alam and Lukas Burget, "Text-dependent Speaker Verification Challenge 2024: Exploring Shared and User-defined Passphrases," in the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hyderabad, India, 06-10 April 2025.
Rajasekhar, Gnana Praveen, and Alam, J. "SSAVSV: Towards Unified Model for Self-Supervised Audio-Visual Speaker Verification." arXiv preprint arXiv:2506.17694 (2025).
Alam, J. et al., "ABC SYSTEM DESCRIPTION FOR NIST SRE 2024," in NIST SRE 2024 Workshop, San Juan, Puerto Rico, 3-4 December 2024.
Gnana Praveen Rajasekhar and Alam, J., "Less is Enough: Adapting Pre-trained Vision Transformers for Audio-Visual Speaker Verification," in the NeurIPS 4th Efficient Natural Language and Speech Processing (ENLSP-IV) workshop, Vancouver, British Columbia, Canada, 10-26 Dec 2024.
A. Fathan, X. Zhu and J. Alam, "On the influence of regularization techniques on label noise robustness: Self-supervised speaker verification as a use case," 2024 IEEE International Joint Conference on Biometrics (IJCB), Buffalo, NY, USA, 2024, pp. 1-23, doi: 10.1109/IJCB62174.2024.10744521.
Abderrahim Fathan, Xiaolin Zhu and Alam, J., "Enhanced label noise robustness through early adaptive filtering for the self-supervised speaker verification task," in the NeurIPS 4th Efficient Natural Language and Speech Processing (ENLSP-IV) workshop, Vancouver, British Columbia, Canada, 10-26 Dec 2024.
Alam, J. and Md Shahidul Alam, "On the Influence of CNN-based Feature Learning Modules in Neural Speaker Verification Framework," in proc. of the SPECOM, Belgrade, Serbia, 25-29 November, 2024.
Gnana Praveen Rajasekhar and Alam, J., "Dynamic Cross-Attention for Audio-Visual Continuous Emotion Recognition" in FRQ-S Co-Chair in AI and Digital Health Symposium, Montreal, Quebec, Canada, 18 October 2024.
Abderrahim Fathan and Alam, J., "On the impact of several regularization techniques on label noise robustness of self-supervised speaker verification systems," in proc. of the ISCA INTERSPEECH, September 1 - 5, Kos Island, Greece, 2024.
Abderrahim Fathan and Alam, J., "Contrastive Information Maximization Clustering for Self-Supervised Speaker Recognition," in proc. of the IEEE Conference on Artificial Intelligence (IEEE CAI), Singapore, 25-27 June 2024.
Abderrahim Fathan and Alam, J., "On the influence of metric learning loss functions for robust self-supervised speaker verification to label noise," in proc. of the IEEE Conference on Artificial Intelligence (IEEE CAI), Singapore, 25-27 June 2024.
Abderrahim Fathan and Alam, J., "An investigative study of the effect of several regularization techniques on label noise robustness of self-supervised speaker verification systems," in proc. of the ISCA ODYSSEY Speaker and Language Recognition Workshop, Quebec City, Quebec, Canada, 18-21 June 2024.
Gnana Praveen Rajasekhar and Alam, J., "Cross-Modal Transformers for Audio-Visual Person Verification," accepted for publication in the ISCA Odyssey Speaker and Language Recognition Workshop, Quebec City, Quebec, Canada, 18-21 June 2024.
Gnana Praveen Rajasekhar and Alam, J., "Recursive Joint Cross-Modal Attention for Multimodal Fusion in Dimensional Emotion Recognition" in the IEEE Computer Vision and Pattern Recognition (IEEE CVPR) Workshop (6th ABAW), Seattle, USA, 17-21 June 2024.
Gnana Praveen Rajasekhar and Alam, J., "CROSS-ATTENTION IS NOT ALWAYS NEEDED: DYNAMIC CROSS-ATTENTION FOR AUDIO-VISUAL DIMENSIONAL EMOTION RECOGNITION," accepted for publication in the IEEE Conference on Multimedia and Expo (IEEE ICME), Niagara Falls, Canada, 15-19 July 2024.
Gnana Praveen Rajasekhar and Alam, J., "Dynamic Cross Attention for Audio-Visual Person Verification," accepted for publication in the IEEE Conference on Automatic Face and Gesture Recognition, Istanbul, Turkey, 27-31 May 2024.
Gnana Praveen Rajasekhar and Alam, J., "Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention," accepted for publication in the IEEE Conference on Automatic Face and Gesture Recognition, Istanbul, Turkey, 27-31 May 2024.
Abderrahim Fathan and Alam, J., "Self-supervised Speaker Verification Employing a Novel Clustering Algorithm," accepted for publication in IEEE ICASSP, Seoul, Korea, April 14-19, 2024.
Gnana Praveen Rajasekhar and Alam, J., "Recursive Joint Cross-Attention for Audio-Visual Speaker Verification," accepted for publication in the NeurIPS 3rd edition of the Efficient Natural Language and Speech Processing (ENLSP-III) workshop, New Orleans, USA, December 2023. https://neurips2023-enlsp.github.io/papers/paper_58.pdf
Abderrahim Fathan and Alam, J., "An efficient clustering algorithm for self-supervised speaker recognition," accepted for publication in the NeurIPS 3rd edition of the Efficient Natural Language and Speech Processing (ENLSP-III) workshop, New Orleans, USA, December 2023. https://neurips2023-enlsp.github.io/papers/paper_89.pdf
Abderrahim Fathan and Alam, J., "CAMSAT: Augmentation Mix and Self-Augmented Training Clustering for Self-Supervised Speaker Recognition," accepted for publication in IEEE ASRU, Taipei, Taiwan, December 16-20, 2023. [acceptance rate 45.00%] doi: 10.1109/ASRU57964.2023.10389758
Abderrahim Fathan and Alam, J., "Self-Supervised Speaker Verification Employing Augmentation Mix and Self-Augmented Training-Based Cluster," accepted for publication in the SPECOM conference, Nov. 29 - Dec. 01, Hubli-Dharwad, India, 2023. https://link.springer.com/chapter/10.1007/978-3-031-48312-7_44
Gnana Praveen Rajasekhar and Alam, J., "Audio-Visual Speaker Verification via Joint Cross Attention," accepted for publication in the SPECOM conference, Nov 29 - Dec 01, Hubli-Dharwad, India, 2023. https://arxiv.org/abs/2309.16569 / https://link.springer.com/chapter/10.1007/978-3-031-48312-7_2
Abderrahim Fathan, Jahangir Alam and Xiaolin Zhu, "Multi-task learning over mixup variants for the speaker verification task," accepted for publication in the SPECOM conference, Nov 29 - Dec 01, Hubli-Dharwad, India, 2023. https://link.springer.com/chapter/10.1007/978-3-031-48312-7_36
Md Shahidul Alam, Abderrahim Fathan and Jahangir Alam, "Audio DeepFake Detection Employing Multiple Parametric Exponential Linear Units," accepted for publication in the SPECOM conference, Nov 29 - Dec 01, Hubli-Dharwad, India, 2023. https://link.springer.com/chapter/10.1007/978-3-031-48312-7_25
Alam, J., "On the Use of Cross- and Self-Module Attentive Statistics Pooling Techniques for Text-Independent Speaker Verification," accepted for publication in the IEEE International Joint Conference on Biometrics (IJCB), Ljubljana, Slovenia, 25-28 September 2023. [acceptance rate 36.7%]
Alam, J., Kang, W., and Fathan, A., "Hybrid Neural Network With Cross- and Self-Module Attention Pooling for Text-Independent Speaker Verification," accepted for publication in IEEE ICASSP, Rhodes Island, Greece, 4-10 June, 2023.
Fathan, A., Alam, J. and Kang W., "Investigation of the quality of pseudo-labels for the self-supervised speaker verification task," accepted for publication in the SASB 2023: Self-Supervision in Audio, Speech and Beyond, a satellite Workshop of ICASSP, Rhodes Island, Greece, 4-10 June, 2023 [https://sites.google.com/view/icassp-sasb-2023/accepted-papers].
Fathan, A., and Alam, J., "On the influence of the quality of pseudo-labels on the self-supervised speaker verification task: a thorough analysis," accepted for publication in IEEE IWBF, Barcelona, Spain, 19-20 April, 2023 [https://ieeexplore.ieee.org/document/10157651].
Alam, J., and Fathan, A., "On the Use of Cross-module Attention Statistics Pooling for Speaker Verification," accepted for publication in IEEE IWBF, Barcelona, Spain, 19-20 April, 2023 [https://ieeexplore.ieee.org/document/10157564].
Fathan, A., Alam, J., and Kang, W., "On the impact of the quality of pseudo-labels on the self-supervised speaker verification task," in the proc. of the NeurIPS 2022 Efficient Natural Language and Speech Processing (ENLSP) Workshop, December, 2022. https://neurips2022-enlsp.github.io/papers/paper_51.pdf
Kang, W., Alam, J., and Fathan, A., "Flow-ER: a Flow-based Embedding Regularization Strategy for Robust Speech Representation Learning," in proc. of the IEEE Spoken Language Technology (SLT) Workshop, Doha, Qatar, 9-12 January 2022.
Kang, W., Alam, J., and Fathan, A., "An analytic study on clustering-based pseudo-labels for self-supervised deep speaker verification," accepted for publication in Proc. of the 24th SPECOM conference, GURUGRAM, INDIA, November 14-16, 2022.
Fathan, A., Alam, J., and Kang, W., "Multiresolution Decomposition Analysis via Wavelet Transforms for Audio Deepfake Detection," accepted for publication in Proc. of the 24th SPECOM conference, GURUGRAM, INDIA, November 14-16, 2022.
Alam, J., Kang, W., and Fathan, A., "Neural Embedding Extractors for Text-Independent Speaker Verification," accepted for publication in Proc. of the 24th SPECOM conference, GURUGRAM, INDIA, November 14-16, 2022.
Kang, W., Alam, J., and Fathan, A., "End-to-end framework for spoof-aware speaker verification," in Proc. of the ISCA INTERSPEECH, Incheon Korea (Hybrid), September 18 - 22, 2022.
Kang, W., Alam, J., and Fathan, A., "Mixup regularization strategies for spoofing countermeasure system," in Proc. of the ISCA INTERSPEECH, Incheon Korea (Hybrid), September 18 - 22, 2022.
Kang, W., Alam, J., and Fathan, A., "MIM-DG: Mutual information minimization-based domain generalization for speaker verification," in Proc. of the ISCA INTERSPEECH, Incheon Korea (Hybrid), September 18 - 22, 2022.
Kang, W., Alam, J., "Investigation on deep speaker embedding extraction methods for multi-genre speaker verification," in Proc. of the ODYSSEY 2022 Speaker and Language Recognition Workshop, CNSRC 2022 special session, Beijing, China (Virtual), June 28 - July 1, 2022.
Kang, W., Alam, J., and Fathan, A., "Investigation on mixup strategies for end-to-end voice spoof detection system," in Proc. of ODYSSEY 2022 Speaker and Language Recognition Workshop, Beijing, China (Virtual), June 28 - July 1, 2022.
Kang, W., Alam, J., and Fathan, A., "Domain generalized speaker embedding learning via mutual information minimization," in Proc. of ODYSSEY 2022 Speaker and Language Recognition Workshop, Beijing, China (Virtual), June 28 - July 1, 2022.
Alam, J., Kang, W., and Fathan, A., "Hybrid Neural Network-based Deep Embedding Extractors for Text-Independent Speaker Verification," in Proc. of ODYSSEY 2022 Speaker and Language Recognition Workshop, Beijing, China (Virtual), June 28 - July 1, 2022.
Alam, J., et al., "Development of ABC Systems for the 2021 Edition of NIST Speaker Recognition Evaluation," in Proc. of ODYSSEY 2022 Speaker and Language Recognition Workshop, Beijing, China (Virtual), June 28 - July 1, 2022.
Fathan, A., Alam, J., and Kang, W., "Mel-spectrogram image-based end-to-end audio deepfake detection under channel-mismatched conditions," in Proc. of the ICME conference, July 18-22, Taipei, Taiwan, 2022.
Kang, W., Alam, J., and Fathan, A., "Deep learning-based end-to-end spoken language identification system for domain-mismatched scenario," in Proc. of the International Conference on Language Resources and Evaluation (LREC), Marseille, France, June 20-25, 2022.
Kang, W., Alam, J., and Fathan, A., “Robust self-supervised speaker representation learning via instance mix regularization,” accepted for publication in IEEE ICASSP, Singapore, May 22-27, 2022.
Kang, W., Alam, J., and Fathan, A., "Investigation on instance mixup regularization strategies for robust self-supervised speaker representation learning," in the proc. of the AAAI 2022 Self-supervision in Audio and Speech (AAAI 2022 SAS), Virtual, February 28 - March 01, 2022. https://aaai-sas-2022.github.io/static/media/AAAISAS2022_vmix_cameraready-compressed.e0312ee8.pdf
Kang, W., Alam, J., and Fathan, A., "Attentive activation function for improving end-to-end spoofing countermeasure systems," in arXiv, May, 2022 [https://arxiv.org/pdf/2205.01528.pdf].
Kang, W., Alam, J., and Fathan, A., "Robust Speech Representation Learning via Flow-based Embedding Regularization," in arXiv, 7 December 2021 [https://arxiv.org/pdf/2112.03454.pdf].
Kang, W., Alam, J., and Fathan, A., "Hybrid network with multi-level global-local statistics pooling for robust text-independent speaker recognition," accepted for publication in the Proc. of Automatic Speech Recognition and Understanding (ASRU), December 15-17, 2021.
Monteiro, J., Alam, J., Falk, T., "A versatile and efficient approach to summarize speech into utterance-level representations," in the proc. of the NeurIPS 2021 Efficient Natural Language and Speech Processing (ENLSP) Workshop, December, 2021. https://neurips2021-nlp.github.io/papers/2/CameraReady/ENLSP_MLTDNN.pdf
Alam, J., Fathan, A., and Kang, W., "Text-Independent Speaker Verification Employing CNN-LSTM-TDNN Hybrid Networks," in Proc. of the 23rd SPECOM conference, September 27-30, St. Petersburg, Russia, 2021.
Alam, J., Fathan, A., and Kang, W., "End-to-End Voice Spoofing Detection Employing Time Delay Neural Networks and Higher Order Statistics," in Proc. of the 23rd SPECOM conference, September 27-30, St. Petersburg, Russia, 2021.
Fathan, A., Alam, J., and Kang, W., "An Ensemble Approach for the Diagnosis of COVID-19 from Speech and Cough Sounds," in Proc. of the 23rd SPECOM conference, September 27-30, St. Petersburg, Russia, 2021. online: https://link.springer.com/chapter/10.1007/978-3-030-87802-3_18
Kang, W., Alam, J., and Fathan, A., "Investigation on activation functions for robust end-to-end spoofing attack detection system," Proc. of the ASVspoof2021 Workshop - a satellite workshop of INTERSPEECH 2021, September 16th, Online, 2021.
Kang, W., Alam, J., and Fathan, A., "CRIM's System Description for the ASVSpoof2021 Challenge," Proc. of the ASVspoof2021 Workshop - a satellite workshop of INTERSPEECH 2021, September 16th, Online, 2021.
Kang, W., Kim, N., S., "Team02 Text-Independent Speaker Verification System for SdSV Challenge 2021," in the Proceedings of INTERSPEECH, Brno, Czech Republic, 30 August - 3 September, 2021.
J. Monteiro, I. Albuquerque, Alam, J., R. D. Hjelm, T. Falk, "An End-to-End Approach for the Verification Problem: Learning The Right Distance," in the Proceedings of ICML, July 12-18, 2020.
Monteiro, J., Alam, J., and Falk, T., "An ensemble Based Approach for Generalized Detection of Spoofing Attacks to Automatic Speaker Recognizers," in proceedings of ICASSP, Barcelona, Spain, May, 2020.
Zeinali, H., Lee, K. A., Alam, J., Burget, L., "SdSV Challenge 2020: Large-Scale Evaluation of Short-Duration Speaker Verification," in proceedings of INTERSPEECH, China, October 25-29, 2020.
Monteiro, J., Alam, J., and Falk, T., "On The Performance of Time-Pooling Strategies for End-to-End Spoken Language Identification," in Proceedings of LREC, May, 2020.
Alam, J., Boulianne, G., et al., "Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge," in Proceedings of Odyssey Speaker and Language Recognition Workshop, Tokyo, Japan, November, 2020.
Monteiro, J., Alam, J., and Falk, T., "A Multi-condition Training Strategy for Countermeasures Against Spoofing Attacks to Speaker Recognizers," accepted in proceedings of Odyssey Speaker and Language Recognition Workshop, Tokyo, Japan, November, 2020.
Gupta, V., Alam, J., Boulianne, G., “CRIM's Automatic Speech Recognition and Speech Activity Detection Systems Description for the 2020 edition of NIST Open Speech Analytic Technologies Evaluation,” in proceedings of OpenSAT 2020 Evaluation Workshop, USA, September 2020.
Alam, J., Boulianne, G., Gupta, V., Fathan, A., "An Ensemble Approach to Unsupervised Anomalous Sound Detection," in DCASE 2020 Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring Challenge Workshop, November, 2020.
Alam, J., Boulianne, G., et al., "ABC System Description for NIST Multimedia Speaker Recognition Evaluation 2019," in NIST SRE 2019 Workshop, Singapore, December 12-13, 2019.
Alam, J., "On the use of Fisher Vector Encoding for Voice Spoofing Detection," in proceedings of the 13th International Conference on Ubiquitous Computing and Ambient Intelligence (UCAmI2019), Spain, December 2-5, 2019.
Alam, J., Boulianne, G., et al., "ABC NIST Speaker Recognition Evaluation 2019 CTS System Description," in NIST SRE 2019 Workshop, Singapore, December 12-13, 2019.
J. Monteiro, I. Albuquerque, Alam, J., T. Falk, "An End-to-End Approach for the Verification Problem Through Learned Metric-like Spaces," in LXAI workshop at the 33rd NeurIPS, 2019.
Monteiro, J., Alam, J., "Development of Voice Spoofing Detection Systems for 2019 edition of Automatic Speaker Verification and Countermeasures Challenge," accepted for presentation at Automatic Speech Recognition and Understanding (ASRU) workshop, Singapore, December 14-18, 2019.
Alam, J., and Monteiro, J., "CRIM's Voice Spoofing Detection Systems for ASVspoof2019: End-to-end and Fisher-Vector Representations as Countermeasures," in Automatic Speaker Verification and Countermeasures Challenge (ASVspoof) 2019, February, 2019.
Alam, J., Gupta, V., and Boulianne, G., "Supervised and Unsupervised SAD Algorithms for the 2019 edition of NIST Open Speech Analytic Technologies Evaluation," in NIST Open Speech Analytic Technologies Evaluation 2019 (OpenSAT2019) Workshop, Maryland, USA, August 2019.
Avila, A. R., Alam, J., O’Shaughnessy, D., Falk, T. H., "Intrusive Quality Measurement of Noisy and Enhanced Speech based on i-Vector Similarity," in Proc. of Eleventh International Conference on Quality of Multimedia Experience (QoMEX), June 2019.
Avila, A. R., Alam, J., O’Shaughnessy, D., Falk, T. H., "Blind Channel Response Estimation for Replay Attack Detection," in proc. of INTERSPEECH, Graz, Austria, September 2019.
Bhattacharya, G., Monteiro, J., Alam, J., and Kenny, P., "Generative Adversarial Speaker Embedding Networks for Domain-Robust End-to-End Speaker Verification" in Proceedings of ICASSP, Brighton, UK, May 12-17, 2019.
Bhattacharya, G., Alam, J. and Kenny, P., "Adapting End-to-End Neural Speaker Verification to New Languages and Recording Conditions with Adversarial Training," in Proceedings of ICASSP 2019, May 12-17, Brighton, UK, 2019.
Bhattacharya, G., Alam, J. and Kenny, P., "Deep Speaker Recognition: Modular or Monolithic?," in proc. of INTERSPEECH, Graz, Austria, September 2019.
Gupta, V., Rebout, L., Boulianne, G., Ménard, P. A., Alam, J., "CRIM's Speech Transcription and Call Sign Detection System for the ATC Airbus Challenge task," in proc. of INTERSPEECH, Graz, Austria, September 2019.
Monteiro, J., Alam, J., Bhattacharya, G. and Falk, T. "End-to-End Language Identification using a Residual Convolutional Neural Network with Attentive Temporal Pooling," in proc. of European Signal Processing Conference (EUSIPCO), Spain, September 2019.
Monteiro, J., Alam, J. and Falk, T. "Combining Speaker Recognition and Metric Learning for Speaker-Dependent Representation Learning" in proc. of INTERSPEECH, Graz, Austria, September 2019.
Monteiro, J., Alam, J. and Falk, T. "End-to-end Detection of Attacks to Automatic Speaker Recognizers with Time-attentive Light Convolutional Neural Networks," in proc. of MLSP, Pittsburgh, PA, USA, 2019.
Monteiro, J., Alam, J. and Falk, T. "Performance Comparison of Time-Pooling Strategies for End-to-End Speech-Based Language Identification," submitted to MLSP 2019.
Monteiro, J., Alam, J. and Falk, T. "Latent variable models with implicit priors: an application on speaker-dependent representation learning" submitted to European Signal Processing Conference (EUSIPCO) 2019.
Alam, J., Bhattacharya, G. and Kenny, P. "Boosting the Performance of Spoofing Detection Systems on Replay Attacks Using q-Logarithm Domain Feature Normalization" in Proc. of Odyssey 2018 - The Speaker and Language Recognition Workshop (Odyssey 2018), pp. 393-398. Les Sables d'Olonne, France, 26-29 June 2018.
Alam, J., Bhattacharya, G. and Kenny, P. "Speaker Verification in Mismatched Conditions with Frustratingly Easy Domain Adaptation" in Proc. of Odyssey 2018 - The Speaker and Language Recognition Workshop (Odyssey 2018), pp. 176-180. Les Sables d'Olonne, France, 26-29 June 2018.
Alam, J. et al., "ABC NIST SRE 2018 SYSTEM DESCRIPTION," NIST Speaker Recognition Evaluation Workshop, Athens, Greece, December 2018.
Avila, A. R., Alam, J. and Falk, T. "Investigating Speech Enhancement and Perceptual Quality for Speech Emotion Recognition" in proceedings of INTERSPEECH 2018, pp. 3663-3667. Hyderabad, India, 2-6 September 2018.
Bhattacharya, G., Monteiro, J., Alam, J., and Kenny, P., "SpeakerGAN: Recognizing Speakers in New Languages with Generative Adversarial Networks" in Proceedings of NIPS 2018 IRASL Workshop, Montreal, Canada, December 8, 2018.
Bhattacharya, G., Alam, J., Gupta, V. and Kenny, P. "Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification" in proceedings of INTERSPEECH 2018, pp. 3588-3592. Hyderabad, India, 2-6 September 2018.
Gupta, V., Alam, J. CRIM’s Speaker Diarization System for the DIHARD Diarization Challenge, in proceedings of Interspeech Special Session: The First DIHARD Speech Diarization Challenge, Hyderabad, India, 5 September 2018.
Alam, J. and Kenny, P. "Spoofing Detection Employing Infinite Impulse Response - Constant Q Transform-based Feature Representations" in Proc. of the 25th European Signal Processing Conference (EUSIPCO 2017), pp. 111-115. Kos Island, Greece, 28 August - 2 September 2017.
Alam, J., Kenny, P., Bhattacharya, G. and Kockmann, M. "Speaker Verification Under Adverse Conditions Using I-Vector Adaptation and Neural Networks" in Proceedings of INTERSPEECH 2017 - Situated interaction (Interspeech 2017), pp. 3732-3736. Stockholm, Sweden, 20-24 August 2017.
Bhattacharya, G., Alam, J. and Kenny, P. "Deep Speaker Embeddings for Short-Duration Speaker Verification" in Proceedings of INTERSPEECH 2017 - Situated interaction (Interspeech 2017), pp. 1517-1521. Stockholm, Sweden, 20-24 August 2017.
Plchot, O., Matejka, P., Silnova, A., Novotny, O., Diez Sanchez, M., Rohdin, J., Glembek, O., Brümmer, N., Swart, A., Jorrin-Prieto, J., Garcia, P., Buera, L., Kenny, P., Alam, J. and Bhattacharya, G. "Analysis and Description of ABC Submission to NIST SRE 2016" in Proceedings of INTERSPEECH 2017 - Situated interaction (Interspeech 2017), pp. 1348-1352. Stockholm, Sweden, 20-24 August 2017.
Brummer, N., Swart, A., Jorrin-Prieto, J., Garcia, P., Buera, L. et al., “ABC NIST SRE 2016 System Description,” NIST SRE 2016 Workshop, San Diego, CA, December 2016.
Kenny, P., Stafylakis, T., Alam, J., Gupta, V. and Kockmann, M. "Uncertainty modeling without subspace methods in text-dependent speaker recognition" in Proc. of the Odyssey Speaker and Language Recognition Workshop (Odyssey 2016). Bilbao, Spain, 21-24 June 2016.
Alam, J., Kenny, P. and Gupta, V. "Tandem Features for Text-Dependent Speaker Verification on the RedDots Corpus" in Proc. of the 17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016), pp. 420-424. San Francisco, USA, 8-12 September 2016.
Jahangir Alam, Vishwa Gupta and Patrick Kenny, "CRIM's Speech Recognition System for the 4th CHIME Challenge," Proc. 4th CHIME Challenge, San Francisco, CA 2016.
Bhattacharya, G., Alam, J., Kenny, P. and Gupta, V. "Modelling Speaker and Channel Variability Using Deep Neural Networks for Robust Speaker Verification" in Proc. of the 2016 IEEE Workshop on Spoken Language Technology (IEEE SLT 2016). San Diego, California, USA, December 13-16, 2016.
Alam, J., Kenny, P., Gupta, V. and Stafylakis, T. "Spoofing Detection on the ASVSpoof2015 Challenge Corpus Employing Deep Neural Networks" in Proc. of the Odyssey Speaker and Language Recognition Workshop (Odyssey 2016). Bilbao, Spain, June 21-24, 2016.
Themos Stafylakis, Patrick Kenny, Vishwa Gupta, Jahangir Alam and Marcel Kockmann, "Compensation for Phonetic Nuisance Variability in Speaker Recognition Using DNNs," Proc. Odyssey Speaker and Language Recognition Workshop, Bilbao, Spain, June 2016.
Bhattacharya, G., Alam, J., Stafylakis, T. and Kenny, P. "Deep Neural Network based Text-Dependent Speaker Recognition: Preliminary Results" in Proc. of Odyssey Speaker and Language Recognition Workshop (Odyssey 2016), pp. 9-15. Bilbao, Spain, June 21-24, 2016.
Jahangir Alam, Patrick Kenny and Themos Stafylakis, "Combining Amplitude and Phase-Based Features for Speaker Verification with Short Duration Utterances," Proc. Interspeech, Dresden, Germany, Sept. 2015.
Jahangir Alam, Patrick Kenny, Gautam Bhattacharya and Themos Stafylakis, Development of CRIM System for the Automatic Speaker Verification Spoofing and Countermeasures Challenge 2015, Proc. Interspeech, Dresden, Germany, Sept. 2015.
Themos Stafylakis, Patrick Kenny, Jahangir Alam and Marcel Kockmann, JFA for Speaker Recognition with Random Digit Strings, Proc. Interspeech, Dresden, Germany, Sept. 2015.
Patrick Kenny, Themos Stafylakis, Jahangir Alam and Marcel Kockmann, An I-Vector Backend for Speaker Verification, Proc. Interspeech, Dresden, Germany, Sept. 2015.
Patrick Kenny, Themos Stafylakis, Jahangir Alam and Marcel Kockmann, JFA Modeling with Left-to-Right Structure and a New Backend for Text-Dependent Speaker Recognition, Proc. ICASSP, Brisbane, Australia, April 2015.
Patrick Cardinal, Najim Dehak, Alessandro Lameiras Koerich, Jahangir Alam, Patrice Boucher, "ETS System for AV+EC 2015 challenge," Proceedings of AVEC 2015, Brisbane, Australia, October 2015.
Alam, Jahangir, Yazid Attabi, Patrick Kenny, Pierre Dumouchel, and Douglas O’Shaughnessy. "Automatic Emotion Recognition from Cochlear Implant-Like Spectrally Reduced Speech." In Ambient Assisted Living and Daily Activities, pp. 332-340. Springer International Publishing, 2014.
Patrick Kenny, Themos Stafylakis, Jahangir Alam, Pierre Ouellet and Marcel Kockmann, In-Domain versus Out-of-Domain Training for Text-Dependent JFA, Proc. INTERSPEECH, Singapore, September 2014.
Jahangir Alam, Patrick Kenny, Pierre Dumouchel and Douglas O'Shaughnessy, Noise Spectrum Estimation using Gaussian Mixture Model-based Speech Presence Probability for Robust Speech Recognition, Proc. INTERSPEECH, Singapore, September 2014.
Jahangir Alam, Patrick Kenny, Pierre Dumouchel and Douglas O'Shaughnessy, Robust Feature Extractors for Continuous Speech Recognition, Proc. EUSIPCO, Lisbon, Portugal, September 2014.
Jahangir Alam, Patrick Kenny, Pierre Dumouchel and Douglas O'Shaughnessy, Robust Speech Recognition Using Warped DFT-Based Cepstral Features in Clean and Multistyle Training, Proc. of EUSIPCO, Lisbon, Portugal, 2014.
Kenny, P., Stafylakis, T., Alam, J., Ouellet, P., and Kockmann, M., Joint Factor Analysis for Text-Dependent Speaker Verification, Proc. Odyssey Speaker and Language Recognition Workshop, Joensuu, Finland, June 2014.
Kenny, P., Gupta, V., Stafylakis, T., Ouellet, P. and Alam, J., Deep Neural Networks for Extracting Baum-Welch Statistics for Speaker Recognition, Proc. Odyssey Speaker and Language Recognition Workshop, Joensuu, Finland, June 2014.
Alam, J., Kenny, P., Ouellet, P., Stafylakis, T. and Dumouchel, P., Supervised/Unsupervised Voice Activity Detectors for Text-Dependent Speaker Recognition on the RSR2015 Corpus, Proc. Odyssey Speaker and Language Recognition Workshop, Joensuu, Finland, June 2014.
Kenny, P., Stafylakis, T., Ouellet, P., and Alam, J., JFA-Based Front Ends for Speaker Recognition, Proc. ICASSP, May 2014.
Alam, J., Gupta, V., Kenny, P., Dumouchel, P., Use of Multiple Front-Ends and I-Vector Based Speaker Adaptation for Robust Speech Recognition, Proc. REVERB Challenge, Florence, Italy, May 2014.
Alam, J., Attabi, Y., Dumouchel, P., Kenny, P., O'Shaughnessy, D., Amplitude Modulation Features for Emotion Recognition from Speech, Proc. Interspeech, Lyon, France, August 2013.
Alam, J., Kenny, P., O'Shaughnessy, D., Regularized MVDR Spectrum Estimation-Based Robust Feature Extractors for Speech Recognition, Proc. Interspeech, Lyon, France, August 2013.
Alam, J., Kenny, P., O'Shaughnessy, D., Smoothed Nonlinear Energy Operator Based Amplitude Modulation Features for Robust Speech Recognition, Proc. NOLISP, Mons, Belgium, June 2013.
Kinnunen, T., Alam, J., Matejka, P., Kenny, P., Cernocky, J., O'Shaughnessy, D., Frequency Warping and Robust Speaker Verification: A Comparison of Alternative Mel-Scale Representations, Proc. Interspeech, Lyon, France, August 2013.
Alam, J., Kenny, P., O'Shaughnessy, D., Speech Recognition using Regularized Minimum Variance Distortionless Response Spectrum Estimation Based Cepstral Features, Proc. ICASSP, Vancouver, Canada, May 2013.
Alam, J., O'Shaughnessy, D., Kenny, P., A Novel Feature Extractor Employing Regularized MVDR Spectrum Estimator and Subband Spectrum Enhancement Technique, Proc. WOSSPA, Algiers, Algeria, May 2013. [Nominated for best student paper.]
Kenny, P., Stafylakis, T., Ouellet, P., Alam, J., Dumouchel, P., PLDA for Speaker Verification with Utterances of Arbitrary Duration, Proc. ICASSP, Vancouver, Canada, May 2013.
Attabi, Y., Alam, J., Dumouchel, P., Kenny, P., O'Shaughnessy, D., Multiple Windowed Spectral Features for Emotion Recognition, Proc. ICASSP, Vancouver, Canada, May 2013.
Alam, J., Kenny, P., and O'Shaughnessy, D., Robust Speech Recognition under Noisy Environments using Asymmetric Tapers, Proc. EUSIPCO, 2012.
Alam, J., Kenny, P., and O'Shaughnessy, D., Robust Feature Extraction for Speech Recognition by Enhancing Auditory Spectrum, Proc. Interspeech, Portland, Oregon, September 2012.
Alam, J., Kenny, P., and O'Shaughnessy, D., On the Use of Asymmetric-shaped Tapers for Speaker Verification using I-Vectors, Proc. Odyssey Speaker and Language Recognition Workshop, Singapore, June 2012.
Alam, J., Kinnunen, T., Kenny, P., Ouellet, P., and O'Shaughnessy, D., Multi-Taper MFCC Features for Speaker Verification Using I-Vectors, Proc. ASRU 2011, Hawaii, December 2011.
Alam, J., Ouellet, P., Kenny, P., O'Shaughnessy, D., Comparative Evaluation of Feature Normalization Techniques for Speaker Verification, Proc. NOLISP 2011, LNAI 7015, pp. 246-253, Las Palmas, Spain, November 2011.
Alam, J., Kenny, P., O'Shaughnessy, D., A Study of Low-Variance Multi-Taper Features for Distributed Speech Recognition, Proc. NOLISP 2011, LNAI 7015, pp. 239-245, Las Palmas, Spain, November 2011.
Alam, M. J., Selouani, S-A., and O'Shaughnessy, D., "An improved perceptual speech enhancement technique employing a psychoacoustically motivated weighting factor", Proceedings of IEEE Automatic Speech Recognition and Understanding (ASRU), pp. 266-270, December 2009.
Alam, Jahangir, Douglas O'Shaughnessy, and Sid-Ahmed Selouani. "Novel objective criteria for perceptual separation of two kinds of distortion in speech enhancement applications." In Computers and Information Technology, 2009. ICCIT'09. 12th International Conference on, pp. 483-487. IEEE, 2009.
Jahangir Alam, Sid-Ahmed Selouani, and Douglas O’Shaughnessy, “Speech enhancement based on novel two-step a priori SNR estimators,” Proceedings of INTERSPEECH’08, pp. 565-568, Brisbane, Australia, September 2008.
Jahangir Alam, Sid-Ahmed Selouani, Douglas O’Shaughnessy and Sofia Ben Jebara, “Speech Enhancement using a Wiener denoising technique and musical noise reduction,” Proceedings of INTERSPEECH’08, pp. 407-410, Brisbane, Australia, September 2008.
Jahangir Alam, Sid-Ahmed Selouani, and Douglas O’Shaughnessy, “Speech enhancement based on a hybrid a priori signal-to-noise ratio (SNR) estimator and a self-adaptive Lagrange multiplier,” Proceedings of European Signal Processing Conference (EUSIPCO), Lausanne, Switzerland, August 2008.
Jahangir Alam, Douglas O’Shaughnessy, and Sid-Ahmed Selouani, “Speech Enhancement employing Sigmoid-type gain function with a modified a priori signal-to-noise ratio estimation”, Proceedings of IEEE CCECE’08, pp. 631-636, Ontario, Canada, May 2008.
Jahangir Alam, Douglas O’Shaughnessy, and Sid-Ahmed Selouani, “A new perceptual post-filter for single channel speech enhancement,” Proc. of IEEE ICECE’08, Dhaka, Bangladesh, 2008.
Md. Faqrul Alam Chowdhury, Jahangir Alam and Md. Fasiul Alam, “Perceptually weighted multiband spectral subtraction speech enhancement technique,” Proc. of IEEE ICECE’08, Dhaka, Bangladesh, 2008.
