My Publications

This is no longer my official publications page and will eventually become out of date.

These are listed in reverse chronological order.


"Krylov Subspace Descent for Deep Learning", Oriol Vinyals and D. Povey, AISTATS 2012 (pdf)

"Generating exact lattices in the WFST framework", D. Povey, M. Hannemann et. al, ICASSP 2012 (pdf)

"Revisiting Semi-continuous Hidden Markov Models", K. Reidhammer, T. Bocklet, A. Ghoshal and D. Povey, ICASSP 2012  (pdf)

"Modeling Gender Dependency in the Subspace GMM Framework", Ngoc Thang Vu, Tanja Schultz and D. Povey, ICASSP 2012 (pdf)

"Revisiting Recurrent Neural Networks for Robust ASR", Oriol Vinyals, Suman V. Ravuri, Daniel Povey, ICASSP 2012 (pdf)


"The Kaldi Speech Recognition Toolkit", D. Povey, A. Ghoshal et. al, ASRU 2011 (accepted) (pdf)

"Speaker Adaptation with an Exponential Transform", Daniel Povey, Geoffrey Zweig and Alex Acero, ASRU 2011 (accepted) (pdf) (+techreport)

"The Subspace Gaussian Mixture Model– a Structured Model for Speech Recognition", D. Povey, Lukas Burget et. al Computer Speech and Language, 2011 (pdf)

"A basis representation of constrained MLLR transforms for robust adaptation", Daniel Povey and Kaisheng Yao, Computer Speech and Language, 2011. (pdf)

"Minimum Bayes Risk decoding and system combination based on a recursion for edit distance", Haihua Xu, Daniel Povey, Lidia Mangu and Jie Zhu, Computer Speech and Language, 2011. (pdf)

"A Basis Method for Robust Estimation of Constrained MLLR", Daniel Povey and Kaisheng Yao, ICASSP 2011 (pdf)

"A Symmetrization of the Subspace Gaussian Mixture Model", Daniel Povey, Martin Karafiat, Arnab Ghoshal, Petr Schwarz, ICASSP 2011 (pdf)

"State-Level Data Borrowing for Low-Resource Speech Recognition Based on Subspace GMMs", Yanmin Qian, Daniel Povey and Jia Lu, Interspeech 2011 (pdf)


The Symmetric Subspace Gaussian Mixture Model: Microsoft Research technical report MSR-TR-2010-138 (pdf)

"Subspace Gaussian Mixture Models for Speech Recognition", D. Povey, Lukas Burget et al., ICASSP 2010. (pdf)

"A Novel Estimation of feature-space MLLR for Full-covariance Models", Arnab Ghoshal, D. Povey et al., ICASSP 2010 (pdf)

"An Improved Consensus-like Method for Minimum Bayes Risk Decoding and Lattice Combination", Haihua Xu, D. Povey, L. Mangu, Jie Zhu,  ICASSP 2010 (pdf)

"Multilingual Acoustic Modeling For Speech Recognition Based On Subspace Gaussian Mixture Models", Lukas Burget, Petr Schwarz et. al, ICASSP 2010 (pdf)

"Approaches To Automatic Lexicon Learning With Limited Training Examples", Nagendra Goel, Samuel Thomas et. al, ICASSP 2010 (pdf)

"Computational Issues In Principal Components Analysis", D. Povey and Ariya Rastrow, rejected, ICASSP 2010 (pdf) . This paper is tutorial in nature and not really novel, the point was to describe a practical way to do fast PCA using the Lanczos method, that's available via Web search (these things tend to be described only in hard-to-obtain books).

Stephen Chu, Daniel Povey et al., “The 2009 IBM GALE Mandarin Broadcast News Transcription System”, ICASSP 2010

Hagen Soltau, George Saon et al., “The IBM 2008 GALE Arabic Speech Transcription System”, ICASSP 2010.

Stephen Chu and Daniel Povey, “Speaking Rate Adaptation using Continuous Frame Rate Normalization”, ICASSP 2010 (pdf)

"Approaches to Speech Recognition based on Speaker Recognition Techniques", chapter in forthcoming GALE book (pdf)


For closing presentations from the JHU 2009 workshop, see here

"A Tutorial-Style Introduction To Subspace Gaussian Mixture Models For Speech Recognition", Microsoft Research technical report MSR-TR-2009-111 (pdf)

Lecture on "estimation techniques in speech recognition" given at JHU CLSP summer school (pdf)

Lecture on "Subspace based/Universal Background Model (UBM) based speech modeling" given at JHU CLSP summer school (pdf)

Lab tutorial on estimation for speech in Octave/Matlab, given at JHU CLSP summer school (pdf)

"Subspace Gaussian Mixture Models for Speech Recognition", D. Povey, Microsoft Research technical report MSR-TR-2009-64 (pdf)

"Minimum Hypothesis Phone Error as a Decoding Method for Speech Recognition", Haihua Xu, Daniel Povey, Jie Zhu and Guanyong Wu, Interspeech 2009 (pdf) (slides,pdf)


Dan Povey & Brian Kingsbury, "Monte Carlo Model-Space Noise Adaptation for Speech Recognition", Interspeech 2008 (pdf) 

Daniel Povey, Hong-Kwang J. Kuo, Hagen Soltau, "Fast Speaker Adaptive Training for Speech Recognition", Interspeech 2008 (pdf)  

 Daniel Povey, Hong-Kwang J. Kuo, "XMLLR for Improved Speaker Adaptation in Speech Recognition", Interspeech 2008 (pdf) 

George Saon and Daniel Povey, "Penalty Function Maximization for Large Margin HMM Training", Interspeech 2008 (pdf)

Daniel Povey, Dimitri Kanevsky, Brian Kingsbury, Bhuvana Ramabhadran, George Saon & Karthik Visweswariah, “Boosted MMI for Model and Feature Space Discriminative Training”, ICASSP 2008 (pdf)

Balakrishnan Varadarajan & Daniel Povey, “Quick FMLLR for Speaker Adaptation in Speech Recognition”, ICASSP 2008 (pdf)

Daniel Povey, Stephen M Chu & Balakrishnan Varadarajan, “Universal Background Model Based Speech Recognition”, ICASSP 2008 (pdf)


Daniel Povey & Brian Kingsbury, "Evaluation of Proposed Modifications to MPE for Large Scale Discriminative Training", ICASSP 2007 (pdf)


D. Povey & George Saon, "Feature and model space speaker adaptation with full covariance Gaussians," ICSLP 2006. (pdf)

D. Povey, "SPAM and full covariance for speech recognition," ICSLP 2006. (pdf)

J. Pelecanos, Daniel Povey, Ganesh Ramaswamy, "Secondary Classification for GMM Based Speaker Recognition," ICASSP 2006.  (pdf)

Ghinwa Choueiter, Daniel Povey, Stanley Chen & Geoffrey Zweig, "Morpheme-based language modeling for Arabic LVCSR", ICASSP 2006. (pdf)

 Geoffrey Zweig, Olivier Siohan, George Saon, Bhuvana Ramabhadran, Daniel Povey, Lidia Mangu and Brian Kingsbury, "Automated Quality Monitoring in the Call Center with ASR and Maximum Entropy", ICASSP 2006.  (pdf)

Stanley Chen, Brian Kingsbury, Lidia Mangu, Daniel Povey, George Saon, Hagen Soltau & Geoffrey Zweig, "Advances in Speech Transcription at IBM under the DARPA EARS Program," 2006, IEEE Transactions on Audio, Speech and Language Processing, Vol. 14, Issue 5, pp. 1596-1608 (pdf)


George Saon, Daniel Povey & Geoffrey Zweig, "Anatomy of an extremely fast LVCSR decoder," Interspeech 2005. (pdf) (poster,pdf)

Hagen Soltau, Brian Kingsbury, Lidia Mangu, Daniel Povey, George Saon & Geoffrey Zweig, "The IBM 2004 Conversational Telephony System for Rich Transcription," ICASSP 2005 (pdf)

Daniel Povey, Brian Kingsbury, Lidia Mangu, George Saon, Hagen Soltau & Geoffrey Zweig, "fMPE: Discriminatively Trained Features for Speech Recognition," ICASSP 2005 (pdf)

Jing Huang & Daniel Povey, "Discriminatively Trained Features using fMPE for Multi-Stream Audio-Visual Speech Recognition," Interspeech 2005 (pdf)

T. Hain, P.C. Woodland, G. Evermann, M.J.F. Gales, Xunying Liu, G.L. Moore, D. Povey, Lan Wang, "Automatic transcription of conversational telephone speech", IEEE Trans. on Speech and Audio Processing, Nov. 2005, vol. 13, Issue 6, pp. 1173-1185. (pdf)

Daniel Povey, "Improvements to fMPE for discriminative training of features," Interspeech 2005 (pdf)


Daniel Povey, Brian Kingsbury, Lidia Mangu, George Saon, Hagen Soltau, Geoffrey Zweig, "fMPE: Discriminatively trained features for speech recognition," RT'04 meeting, 2004. (pdf)

D. Povey, "Phone Duration Modeling for LVCSR," ICASSP 2004 (pdf)

G. Saon, S. Dharanipragada, D. Povey, "Feature space Gaussianization", ICASSP 2004. (pdf)


Roongroj Nopsuwanchai & D. Povey, "Discriminative training for HMM-based offline handwritten character recognition", Proc. Int'l Conf. on Document Analysis and Recognition, 2003. (pdf)

D. Povey, M.J.F. Gales, D.Y. Kim & P.C. Woodland, "MMI-MAP and MPE-MAP for Acoustic Model Adaptation," Eurospeech 2003. (pdf) (slides,pdf)

Daniel Povey, "Recent work on Discriminative Training," Talk given to one day meeting for young speech researchers, London, Apr 24th 2003 (pdf)

Daniel Povey, "Discriminative Training for Large Vocabulary Speech Recognition," PhD thesis, Cambridge University Engineering Dept, 2003 (pdf)

D. Povey, P.C. Woodland and M.J.F. Gales, "Discriminative MAP for Acoustic Model Adaptation," ICASSP 2003. (ps)

Daniel Povey, "Minimum Phone Error - Better than MMI," talk given at IBM, 2003 (pdf)

M.J.F. Gales, Y. Dong, D. Povey and P.C. Woodland. "Porting: SwitchBoard to the VoiceMail Task." ICASSP 2003. (ps)


D. Povey & P.C. Woodland, "Minimum Phone Error and I-Smoothing for Improved Discrimative Training," ICASSP 2002 (pdf) (slides,long version,pdf) (slides,short version,pdf)

Phil Woodland, Gunnar Evermann, Mark Gales, Thomas Hain, Andrew Liu, Gareth Moore, Dan Povey & Lan Wang: "CU-HTK April 2002 Switchboard System", Rich Transcription Workshop 2002. (pdf)

Young, S., Evermann, G., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P., The HTK book (for HTK version 3.2). Technical Report, Cambridge University, Engineering Department, 2002. (pdf)


T. Hain, P.C. Woodland, G. Evermann and D. Povey. "New features in the CU-HTK system for transcription of conversational telephone speech", ICASSP 2001. (pdf)

D. Povey & P.C. Woodland, "Improved Discriminative Training Techniques for Large Vocabulary Speech Recognition," ICASSP 2001 (pdf)


D. Povey and P. C. Woodland, "Large-scale MMIE Training for Conversational Telephone Speech Recognition", Proc. NIST Speech Transcription Workshop, College Park, MD, 2000. (pdf)

P.C. Woodland and D. Povey, "Large Scale Discriminative Training for Speech Recognition", in ASR 2000. (pdf)


D. Povey & P.C. Woodland, "Frame Discrimination Training of HMMs for Large Vocabulary Speech Recognition," Technical report, Cambridge University Engineering Dept., 1999. (pdf)

D. Povey & P.C. Woodland, "Frame Discrimination training of HMMs for Large Vocabulary Speech Recognition," ICASSP 1999 (pdf) (slides,pdf)

Daniel Povey, "Implementation of Frame Discrimination on a large task," MPhil thesis, Cambridge University Engineering Dept, 1999 (pdf)


(circa 2003) Unfinished paper on MPE that was to be submitted to the IEEE Transactions. It has some otherwise unpublished experiments on a different (but not-any-better) way to compute the Levenshtein distance in the MPE computation (pdf)

(circa 2005) Unfinished notes on HLDA-MMI, never published, probably totally wrong (pdf)