Georg Heigold

Biography

Georg Heigold received the Diplom degree in physics from ETH Zurich, Zurich, Switzerland, in 2000. He was a software engineer at De La Rue, Berne, Switzerland, from 2000 to 2003. From 2004 to 2010, he was a research assistant with the Computer Science Department, RWTH Aachen University, Aachen, Germany. In summer 2008, he was an intern at Microsoft Research, Redmond, USA. From 2010 until 2015, he was a research scientist in the speech team at Google, Mountain View, USA. Since 2015, he has been a research scientist in the multilingual technologies group at DFKI.

Research

Research interests:
  • Machine translation, speech recognition, speaker recognition, etc.
  • Deep learning
  • Statistical modeling and learning
  • Discriminative and log-linear techniques

Publications

Theses

G. Heigold. A Log-Linear Discriminative Modeling Framework for Speech Recognition. PhD Thesis, Aachen, Germany, June 2010.

G. Heigold. A Neutron Spectroscopic Investigation of the Magnetic Excitations in TlCuCl3. Diplomarbeit, Zurich, Switzerland, February 2000.

Journal papers

S.J. Wright, D. Kanevsky, L. Deng, X. He, G. Heigold, and H. Li. Selected Topics in Optimization Algorithms and Applications for Speech and Language Processing. IEEE Transactions on Audio, Speech and Language Processing – Special Issue on Large-Scale Optimization for Audio, Speech and Language Processing.


G. Heigold, H. Ney, and R. Schlüter. Investigations on an EM-style optimization algorithm for discriminative training of HMMs. IEEE Transactions on Audio, Speech and Language Processing, 2013.


G. Heigold, H. Ney, R. Schlüter, and S. Wiesler. Discriminative Training for ASR: Modeling, Criteria, Optimization, Implementation, and Performance. IEEE Signal Processing Magazine (Special Issue on Fundamental Technologies in Modern Speech Recognition), 2012.

T. Deselaers, T. Gass, G. Heigold, and H. Ney. Latent Log-Linear Models for Handwritten Digit Classification. IEEE Transactions on Pattern Analysis and Machine Intelligence. In Press.

B. Hoffmeister, G. Heigold, D. Rybach, R. Schlüter, and H. Ney. WFST Enabled Solutions to ASR Problems: Beyond HMM Decoding. IEEE Transactions on Audio, Speech, and Language Processing, vol. PP, July 2011.

P. Dreuw, G. Heigold, and H. Ney. Confidence and Margin-Based MMI/MPE Discriminative Training for Offline Handwriting Recognition. International Journal on Document Analysis and Recognition, accepted for publication, Aachen, Germany, March 2011. DOI 10.1007/s10032-011-0160-x.

G. Heigold, H. Ney, P. Lehnen, T. Gass, and R. Schlüter. Equivalence of generative and log-linear models. IEEE Transactions on Audio, Speech, and Language Processing (TASL), to appear, 2011.

G. Heigold, P. Dreuw, S. Hahn, R. Schlüter, and H. Ney. Margin-based discriminative training for string recognition. Journal of Selected Topics in Signal Processing - Statistical Learning Methods for Speech and Language Processing, December 2010.

P. Nguyen, G. Heigold, and G. Zweig. Speech Recognition with Flat Direct Models. Journal of Selected Topics in Signal Processing - Statistical Learning Methods for Speech and Language Processing, December 2010.

T. Deselaers, G. Heigold, and H. Ney. Object classification by fusing SVMs and Gaussian mixtures. Pattern Recognition, volume 43, number 7, pages 2476-2484, July 2010.

Book chapters

G. Heigold, T. Deselaers, R. Schlüter, H. Ney, G. Saon, and D. Povey. Integration of large margin concept into standard discriminative training criteria. In J. Olive, C. Christianson, and J. McCary: Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation (GALE), Springer, New York, USA, 2011.

P. Dreuw, D. Rybach, G. Heigold, and H. Ney. RWTH OCR: A large vocabulary optical character recognition system for Arabic scripts. In Volker Märgner and Haikal El Abed: Guide to OCR for Arabic Scripts, Springer, London, UK, April 2011.

Conference proceedings

Research scientist, Google, USA

G. Heigold, I. Moreno, S. Bengio, N. Shazeer. End-to-end text-dependent speaker verification. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Shanghai, China, March, 2016.


E. Variani, E. McDermott, G. Heigold. A Gaussian mixture model layer jointly optimized with discriminative features within a deep neural network architecture. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brisbane, Australia, April, 2015.


M. Bacchiani, A. Senior, G. Heigold. Asynchronous, online, GMM-free training of a context-dependent acoustic model for speech recognition. In Proceedings of the International Conference on Spoken Language Processing (Interspeech), Singapore, September 2014.


E. McDermott, G. Heigold, P. Moreno, A. Senior, M. Bacchiani. Asynchronous stochastic optimization for sequence training of deep neural networks: Towards big data. In Proceedings of the International Conference on Spoken Language Processing (Interspeech), Singapore, September 2014.


H. Sak, O. Vinyals, G. Heigold, A. Senior, E. McDermott, R. Monga, M. Mao. Sequence discriminative distributed training of long short-term memory recurrent neural networks. In Proceedings of the International Conference on Spoken Language Processing (Interspeech), Singapore, September 2014.


S. Bengio and G. Heigold. Word embeddings for speech recognition. In Proceedings of the International Conference on Spoken Language Processing (Interspeech), Singapore, September 2014.


G. Heigold, E. McDermott, V. Vanhoucke, A. Senior, and M. Bacchiani. Asynchronous stochastic optimization for sequence training of deep neural networks. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, Italy, May, 2014.


G. Chen, C. Parada, and G. Heigold. Small-footprint keyword spotting using deep neural networks. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, Italy, May, 2014.


A. Senior, G. Heigold, M. Bacchiani, and H. Liao. GMM-free DNN training. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, Italy, May, 2014.


G. Heigold, V. Vanhoucke, A. Senior, P. Nguyen, M. Ranzato, M. Devin, and J. Dean. Multilingual acoustic models using distributed deep neural networks. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, Canada, May, 2013.


V. Vanhoucke, M. Devin, and G. Heigold. Multiframe deep neural networks for acoustic modeling. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, Canada, May, 2013.


X. Lei, H. Lin, and G. Heigold. Deep neural networks with auxiliary Gaussian mixture models for real-time speech recognition. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, Canada, May, 2013.


A. Senior, G. Heigold, M. Ranzato, and K. Yang. An empirical study of learning rates in deep neural networks for speech recognition. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, Canada, May, 2013.

G. Heigold, P. Nguyen, M. Weintraub, and V. Vanhoucke. Investigations on exemplar-based features for speech recognition towards thousands of hours of unsupervised, noisy data. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan, March, 2012.

PhD student, RWTH Aachen

D. Kanevsky, G. Heigold, S. Wright, and H. Ney. Overview of large scale optimization for discriminative training in speech recognition. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan, March, 2012.

G. Heigold. Eine Formulierung für den log-linearen, diskriminativen Ansatz in der Spracherkennung. In Ausgezeichnete Informatikdissertationen 2010. Gesellschaft für Informatik (GI), 2011.

G. Heigold, S. Hahn, P. Lehnen, and H. Ney. EM-style optimization of hidden conditional random fields for grapheme-to-phoneme conversion. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Prague, Czech Republic, May, 2011.

S. Wiesler, G. Heigold, M. Nußbaum-Thom, R. Schlüter, and H. Ney. A discriminative splitting criterion for phonetic decision trees. In Interspeech, Makuhari, Japan, September 2010.

G. Heigold, S. Wiesler, M. Nussbaum, P. Lehnen, R. Schlüter, and H. Ney. Discriminative HMMs, log-linear models, and CRFs: What is the Difference? In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Dallas, Texas, USA, March 2010.

S. Wiesler, M. Nußbaum-Thom, G. Heigold, R. Schlüter, and H. Ney. Investigations on Features for Log-Linear Acoustic Models in Continuous Speech Recognition. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Merano, Italy, December 2009.

M. Tahir, G. Heigold, C. Plahl, R. Schlüter, and H. Ney. Log-Linear Framework for Linear Feature Transformations in Speech Recognition. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Merano, Italy, December 2009.

G. Heigold, D. Rybach, R. Schlüter, and H. Ney. Investigations on convex optimization using log-linear HMMs for digit string recognition. In Proceedings of the International Conference on Spoken Language Processing (Interspeech), Brighton, England, September 2009.

S. Hahn, P. Lehnen, G. Heigold, and H. Ney. Optimizing CRFs for SLU tasks in various languages using modified training criteria. In Proceedings of the International Conference on Spoken Language Processing (Interspeech), Brighton, England, September 2009.

D. Rybach, C. Gollan, G. Heigold, B. Hoffmeister, J. Lööf, R. Schlüter, and H. Ney. The RWTH Aachen University open source speech recognition system. In Proceedings of the International Conference on Spoken Language Processing (Interspeech), Brighton, England, September 2009.

C. Plahl, B. Hoffmeister, G. Heigold, J. Lööf, R. Schlüter, and H. Ney. Development of the GALE 2008 Mandarin LVCSR system. In Proceedings of the International Conference on Spoken Language Processing (Interspeech), Brighton, England, September 2009.

P. Dreuw, G. Heigold, and H. Ney. Confidence-based discriminative training for model adaptation in offline Arabic handwriting recognition. In International Conference on Document Analysis and Recognition (ICDAR), Barcelona, Spain, July 2009.

G. Heigold, R. Schlüter, and H. Ney. Modified MPE/MMI in a transducer-based framework. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, April 2009.

G. Heigold, G. Zweig, X. Li, and P. Nguyen. A flat direct model for speech recognition. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, April 2009.

T. Deselaers, G. Heigold, and H. Ney. SVMs, Gaussian mixtures, and their generative/discriminative fusion. In International Conference on Pattern Recognition (ICPR), Tampa, Florida, USA, December 2008.

G. Heigold, P. Lehnen, R. Schlüter, and H. Ney. On the equivalence of Gaussian and log-linear HMMs. In Proceedings of the International Conference on Spoken Language Processing (Interspeech), Brisbane, Australia, September 2008.

C. Plahl, B. Hoffmeister, M. Hwang, D. Lu, G. Heigold, J. Lööf, R. Schlüter, and H. Ney. Recent improvements of the RWTH GALE Mandarin LVCSR system. In Interspeech, Brisbane, Australia, September 2008.

G. Heigold, T. Deselaers, R. Schlüter, and H. Ney. Modified MMI/MPE: A direct evaluation of the margin in speech recognition. In International Conference on Machine Learning (ICML), Helsinki, Finland, July 2008.

G. Heigold, T. Deselaers, R. Schlüter, and H. Ney. GIS-like estimation of log-linear models with hidden variables. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, NV, USA, April 2008.

B. Hoffmeister, C. Plahl, P. Fritz, G. Heigold, J. Lööf, R. Schlüter, and H. Ney. Development of the 2007 RWTH Mandarin LVCSR system. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Kyoto, Japan, December 2007.

G. Heigold, R. Schlüter, and H. Ney. On the equivalence of Gaussian HMM and Gaussian HMM-like hidden conditional random fields. In Proceedings of the International Conference on Spoken Language Processing (Interspeech), Antwerp, Belgium, August 2007.

T. Deselaers, G. Heigold, and H. Ney. Speech recognition with state-based nearest neighbour classifiers. In Proceedings of the International Conference on Spoken Language Processing (Interspeech), Antwerp, Belgium, August 2007.

J. Lööf, C. Gollan, S. Hahn, G. Heigold, B. Hoffmeister, C. Plahl, D. Rybach, R. Schlüter, and H. Ney. The RWTH 2007 TC-STAR evaluation system for European English and Spanish. In Proceedings of the International Conference on Spoken Language Processing (Interspeech), Antwerp, Belgium, August 2007.

J. Lööf, M. Bisani, C. Gollan, G. Heigold, B. Hoffmeister, C. Plahl, R. Schlüter, and H. Ney. The 2006 RWTH parliamentary speeches transcription system. In Proceedings of the International Conference on Spoken Language Processing (Interspeech), Pittsburgh, PA, September 2006.

J. Lööf, M. Bisani, C. Gollan, G. Heigold, B. Hoffmeister, C. Plahl, R. Schlüter, and H. Ney. The 2006 RWTH parliamentary speeches transcription system. In TC-STAR Workshop on Speech-to-Speech Translation, Barcelona, Spain, June 2006.

G. Heigold, W. Macherey, R. Schlüter, and H. Ney. Minimum exact word error training. In IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), San Juan, Puerto Rico, November 2005.

Community

Co-organizer (with D. Kanevsky, X. He, H. Li, H. Ney, S. Wright) of Special Issue on Large-Scale Optimization for Audio, Speech, and Language Processing, IEEE Transactions on Audio, Speech, and Language Processing.
Co-organizer (with F. Bach, M. Wainwright, L. Deng, F. Sha, H. Ney, P. Olsen) of workshop on Log-Linear Models, NIPS, Lake Tahoe, USA, December, 2012.
Co-organizer (with D. Kanevsky, S. Wright, and H. Ney) of special session on Large-Scale Optimization for Signal Processing and Speech Recognition, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan, March, 2012.


Honors

IEEE ICASSP 2015 Best Student Paper Award (with E. Variani, E. McDermott)

Borchers-Plakette for PhD thesis

Google best student paper award, IEEE ASRU 2009 (with S. Wiesler, M. Nussbaum, R. Schlüter, and H. Ney)

ISCA best student paper award, Interspeech 2008 (with P. Lehnen, R. Schlüter, and H. Ney)