Research and Publications
I currently work as a Principal Applied Scientist at Amazon. Prior to that, I was a Senior Research Scientist at the Human Language Technology Center of Excellence (HLTCOE), Johns Hopkins University. My research interests are in the broad areas of speech processing, deep learning, and multi-modal person identification. For the past few years I have been working on deep neural networks for speaker recognition, language recognition, and diarization. I am a proud co-inventor of the x-vector embeddings that have set the state of the art in these fields. I enjoy providing hands-on technical leadership that brings new ideas from inception to production.
Education
Ph.D. in Electrical and Computer Engineering, University of Maryland, College Park, 2012.
M.S. in Electrical and Computer Engineering, Universidad Politecnica de Madrid, Spain, 2004.
B.S. in Electrical and Computer Engineering, Universidad Politecnica de Madrid, Spain, 2000.
Publications
2020
Daniel Garcia-Romero, Gregory Sell, Alan McCree, "MagNetO: X-vector Magnitude Estimation Network plus Offset for Improved Speaker Recognition", Odyssey, 2020. (Best paper award)
Daniel Garcia-Romero, Alan McCree, David Snyder, Gregory Sell, "JHU-HLTCOE System for the VoxSRC Speaker Recognition Challenge", ICASSP, 2020.
Jesús Villalba, Nanxin Chen, David Snyder, Daniel Garcia-Romero, Alan McCree, Gregory Sell, Jonas Borgstrom, Leibny Paola García-Perera, Fred Richardson, Réda Dehak, Pedro A Torres-Carrasquillo, Najim Dehak, "State-of-the-art speaker recognition with neural network embeddings in NIST SRE18 and speakers in the wild evaluations", Computer Speech & Language, 2020.
2019
Daniel Garcia-Romero, David Snyder, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur, "X-vector DNN Refinement with Full-Length Recordings for Speaker Recognition", Interspeech, 2019.
Daniel Garcia-Romero, David Snyder, Shinji Watanabe, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur, "Speaker Recognition Benchmark Using the CHiME-5 Corpus", Interspeech, 2019.
Alan McCree, Gregory Sell, Daniel Garcia-Romero, "Speaker Diarization Using Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings", Interspeech, 2019.
Jesús Villalba, Nanxin Chen, David Snyder, Daniel Garcia-Romero, Alan McCree, Gregory Sell, Jonas Borgstrom, Fred Richardson, Suwon Shon, François Grondin, Réda Dehak, Leibny Paola García-Perera, Daniel Povey, Pedro A Torres-Carrasquillo, Sanjeev Khudanpur, Najim Dehak, "State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18", Interspeech, 2019.
David Snyder, Daniel Garcia-Romero, Gregory Sell, Alan McCree, Daniel Povey, Sanjeev Khudanpur, "Speaker recognition for multi-speaker conversations using x-vectors", ICASSP, 2019.
Gregory Sell, David Etter, Daniel Garcia-Romero, Alan McCree, "Script Identification using Across- and Within-Image Distribution Estimation", ICDAR, 2019.
2018
David Snyder, Daniel Garcia-Romero, Gregory Sell, Daniel Povey, Sanjeev Khudanpur, "X-vectors: Robust dnn embeddings for speaker recognition", ICASSP, 2018.
Gregory Sell, Kevin Duh, David Snyder, Dave Etter, Daniel Garcia-Romero, "Audio-visual person recognition in multimedia data from the IARPA Janus program", ICASSP, 2018.
Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe, Sanjeev Khudanpur, "Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge", Interspeech, 2018.
Anna Silnova, Niko Brummer, Daniel Garcia-Romero, David Snyder, Lukas Burget, "Fast Variational Bayes for Heavy-tailed PLDA Applied to i-vectors and x-vectors", Interspeech, 2018.
David Snyder, Daniel Garcia-Romero, Alan McCree, Gregory Sell, Daniel Povey, Sanjeev Khudanpur, "Spoken language recognition using x-vectors", Odyssey, 2018. (Best paper award)
Alan McCree, David Snyder, Greg Sell, Daniel Garcia-Romero, "Language Recognition for Telephone and Video Speech: The JHU HLTCOE Submission for NIST LRE17", Odyssey, 2018.
2017
Daniel Garcia-Romero, David Snyder, Gregory Sell, Daniel Povey, Alan McCree, "Speaker diarization using deep neural network embeddings", ICASSP, 2017.
David Snyder, Daniel Garcia-Romero, Daniel Povey, Sanjeev Khudanpur, "Deep Neural Network Embeddings for Text-Independent Speaker Verification", Interspeech, 2017.
Alan McCree, Gregory Sell, Daniel Garcia-Romero, "Extended Variability Modeling and Unsupervised Adaptation for PLDA Speaker Recognition", Interspeech, 2017.
2016
David Snyder, Pegah Ghahremani, Daniel Povey, Daniel Garcia-Romero, Yishay Carmiel, Sanjeev Khudanpur, "Deep neural network-based speaker embeddings for end-to-end speaker verification", SLT, 2016.
Daniel Garcia-Romero, Alan McCree, "Stacked Long-Term TDNN for Spoken Language Recognition", Interspeech, 2016.
Gregory Sell, Alan McCree, Daniel Garcia-Romero, "Priors for Speaker Counting and Diarization with AHC", Interspeech, 2016.
Audrey Tong, Craig Greenberg, Alvin Martin, Desire Banse, John Howard, George Doddington, Danilo Romero, Douglas Reynolds, Lisa Mason, Tina Kohler, Jaime Hernandez-Cordero, Elliot Singer, Alan McCree, "Summary of the 2015 NIST Language Recognition i-Vector Machine Learning Challenge", Odyssey, 2016.
Alan McCree, Gregory Sell, Daniel Garcia-Romero, "Augmented Data Training of Joint Acoustic/Phonotactic DNN i-vectors for NIST LRE15", Odyssey, 2016.
2015
Daniel Garcia-Romero, Alan McCree, "Insights into deep neural networks for speaker recognition", Interspeech, 2015.
Gregory Sell, Daniel Garcia-Romero, Alan McCree, "Speaker diarization with i-vectors from DNN senone posteriors", Interspeech, 2015.
Alan McCree, Daniel Garcia-Romero, "DNN senone MAP multinomial i-vectors for phonotactic language recognition", Interspeech, 2015.
Désiré Bansé, George Doddington, Daniel Garcia-Romero, John Godfrey, Craig Greenberg, Jaime Hernández-Cordero, John Howard, Alvin Martin, Lisa Mason, Alan McCree, Douglas Reynolds, "Analysis of the second phase of the 2013-2014 i-Vector machine learning challenge", Interspeech, 2015.
Gregory Sell, Daniel Garcia-Romero, "Diarization resegmentation in the factor analysis subspace", ICASSP, 2015.
Jonathan Wintrode, Gregory Sell, Aren Jansen, Michelle Fox, Daniel Garcia-Romero, Alan McCree, "Content-based recommender systems for spoken documents", ICASSP, 2015.
Chandler May, Francis Ferraro, Alan McCree, Jonathan Wintrode, Daniel Garcia-Romero, Benjamin Van Durme, "Topic identification and discovery on text and speech", EMNLP, 2015.
David Snyder, Daniel Garcia-Romero, Daniel Povey, "Time delay deep neural network-based universal background models for speaker recognition", ASRU, 2015.
2014
Daniel Garcia-Romero, Alan McCree, Stephen Shum, Niko Brümmer, Carlos Vaquero, "Unsupervised Domain Adaptation for i-vector Speaker Recognition", Odyssey, 2014. (Best paper award)
Stephen Shum, Douglas Reynolds, Daniel Garcia-Romero, Alan McCree, "Unsupervised Clustering Approaches for Domain Adaptation in Speaker Recognition Systems", Odyssey, 2014. (Best student paper award)
Alan McCree, Douglas Reynolds, Daniel Garcia-Romero, Tomi Kinnunen, Craig Greenberg, Désiré Bansé, George Doddington, John Godfrey, Alvin Martin, Mark Przybocki, "The NIST 2014 Speaker Recognition i-vector Machine Learning Challenge", Odyssey, 2014.
Désiré Bansé, George R Doddington, Daniel Garcia-Romero, John J Godfrey, Craig S Greenberg, Alvin F Martin, Alan McCree, Mark Przybocki, Douglas A Reynolds, "Summary and initial results of the 2013-2014 speaker recognition i-vector machine learning challenge", Interspeech, 2014.
Daniel Garcia-Romero, Alan McCree, "Supervised domain adaptation for i-vector based speaker recognition", ICASSP, 2014.
Niko Brümmer, Daniel Garcia-Romero, "Generative modelling for unsupervised score calibration", ICASSP, 2014.
Aren Jansen, Daniel Garcia-Romero, Pascal Clark, Jaime Hernandez-Cordero, "Unsupervised idiolect discovery for speaker recognition", ICASSP, 2014.
Gregory Sell, Daniel Garcia-Romero, "Speaker diarization with PLDA i-vector scoring and unsupervised calibration", SLT, 2014.
Daniel Garcia-Romero, Xiaohui Zhang, Alan McCree, Daniel Povey, "Improving speaker recognition performance in the domain adaptation challenge using deep neural networks", SLT, 2014.
2013
Daniel Garcia-Romero, Alan McCree, "Subspace-constrained supervector PLDA for speaker verification", Interspeech, 2013.
Balaji Srinivasan, Yuancheng Luo, Daniel Garcia-Romero, Dmitry Zotkin, Ramani Duraiswami, "A symmetric kernel partial least squares framework for speaker recognition", IEEE Transactions on Audio, Speech, and Language Processing, 2013.
2012
Daniel Garcia-Romero, Xinhui Zhou, Carol Y Espy-Wilson, "Multicondition training of Gaussian PLDA models in i-vector space for noise and reverberation robust speaker recognition", ICASSP, 2012.
Daniel Garcia-Romero, Xinhui Zhou, D Zotkin, B Srinivasan, Yuancheng Luo, Sriram Ganapathy, Samuel Thomas, S Nemala, Garimella SVS Sivaram, Majid Mirbagheri, SH Mallidi, Thomas Janu, Padmanabhan Rajan, Nima Mesgarani, Mounya Elhilali, Hynek Hermansky, S Shamma, Ramani Duraiswami, "The UMD-JHU 2011 speaker recognition system", ICASSP, 2012.
Xinhui Zhou, Daniel Garcia-Romero, Nima Mesgarani, Maureen Stone, Carol Espy-Wilson, Shihab Shamma, "Automatic intelligibility assessment of pathologic speech in head and neck cancer based on auditory-inspired spectro-temporal modulations", Interspeech, 2012.
2011
Daniel Garcia-Romero, Carol Y Espy-Wilson, "Analysis of i-vector length normalization in speaker recognition systems", Interspeech, 2011.
Xinhui Zhou, Daniel Garcia-Romero, Ramani Duraiswami, Carol Espy-Wilson, Shihab Shamma, "Linear versus mel frequency cepstral coefficients for speaker recognition", ASRU, 2011.
2010 - 2003
Daniel Garcia-Romero, Carol Y Espy-Wilson, "Joint Factor Analysis for Speaker Recognition Reinterpreted as Signal Coding Using Overcomplete Dictionaries", Odyssey, 2010.
Daniel Garcia-Romero, Carol Y Espy-Wilson, "Automatic acquisition device identification from speech recordings", ICASSP, 2010.
Daniel Garcia-Romero, Carol Y Espy-Wilson, "Intersession variability in speaker recognition: A behind the scene analysis", Interspeech, 2008.
Vikramjit Mitra, Daniel Garcia-Romero, Carol Y Espy-Wilson, "Language and genre detection in audio content analysis", Interspeech, 2008.
Vikramjit Mitra, Daniel Garcia-Romero, Carol Y Espy-Wilson, "Language detection in audio content analysis", ICASSP, 2008.
Daniel Garcia-Romero, Julian Fierrez-Aguilar, Joaquin Gonzalez-Rodriguez, Javier Ortega-Garcia, "Using quality measures for multilevel speaker recognition", Computer Speech & Language, 2006.
Julian Fierrez-Aguilar, Daniel Garcia-Romero, Javier Ortega-Garcia, Joaquin Gonzalez-Rodriguez, "Adapted user-dependent multimodal biometric authentication exploiting general information", Pattern Recognition Letters, 2005.
Julian Fierrez-Aguilar, Daniel Garcia-Romero, Javier Ortega-Garcia, Joaquin Gonzalez-Rodriguez, "Bayesian adaptation for user-dependent multimodal biometric authentication", Pattern Recognition, 2005.
Julian Fierrez-Aguilar, Daniel Garcia-Romero, Javier Ortega-Garcia, Joaquin Gonzalez-Rodriguez, "Speaker verification using adapted user-dependent multilevel fusion", International Workshop on Multiple Classifier Systems, 2005.
Julian Fierrez-Aguilar, Daniel Garcia-Romero, Javier Ortega-Garcia, Joaquin Gonzalez-Rodriguez, "Exploiting general knowledge in user-dependent fusion strategies for multimodal biometric verification", ICASSP, 2004.
Daniel Garcia-Romero, Julian Fierrez-Aguilar, Joaquin Gonzalez-Rodriguez, Javier Ortega-Garcia, "On the use of quality measures for text-independent speaker recognition", Odyssey, 2004.
Daniel Garcia-Romero, Julian Fierrez-Aguilar, Joaquin Gonzalez-Rodriguez, Javier Ortega-Garcia, "Support vector machine fusion of idiolectal and acoustic speaker information in Spanish conversational speech", IEEE ICME, 2003.
Joaquin Gonzalez-Rodriguez, Daniel Garcia-Romero, Marta Garcia-Gomar, D Ramos-Castro, Javier Ortega-Garcia, "Robust likelihood ratio estimation in Bayesian forensic speaker recognition", Eurospeech, 2003.
Daniel Garcia-Romero, Joaquin Gonzalez-Rodriguez, Julian Fierrez-Aguilar, Javier Ortega-Garcia, "U-norm likelihood normalization in PIN-based speaker verification systems", AVBPA, 2003.