# Tara Sainath

## Biography

Dr. Tara Sainath holds S.B., M.Eng., and Ph.D. degrees in Electrical Engineering and Computer Science from MIT. She has many years of experience in speech recognition and deep neural networks, including five years at IBM T.J. Watson Research Center and more than ten years at Google. She is currently a Distinguished Research Scientist and co-lead of the Gemini Audio Pillar at Google DeepMind, where she focuses on integrating audio capabilities with large language models (LLMs).

She is a Fellow of both the IEEE and ISCA, and her work has been recognized with awards including the 2021 IEEE SPS Industrial Innovation Award and the 2022 IEEE SPS Signal Processing Magazine Best Paper Award. She has served as a member of the IEEE Speech and Language Processing Technical Committee (SLTC) and as an Associate Editor for IEEE/ACM Transactions on Audio, Speech, and Language Processing. She has also served as Program Chair for ICLR (2017, 2018) and has co-organized numerous influential conferences and workshops, including Interspeech (2010, 2016, 2019), ICML (2013, 2017), and NeurIPS (2020). Her primary research interests are in deep neural networks for speech and audio processing.

## Invited Talks

End-to-End Speech Recognition: The Journey from Research to Production, Invited Talk, EEML Summer School, July 2022 (also given at SANE, October 2022, and Nvidia, November 2022).

End-to-End Speech Recognition, Invited Talk, Samsung AI Forum, October 2020.

Towards End-to-End Speech Recognition, Invited Talk, LxMLS Workshop, July 2019.

Towards End-to-End Speech Recognition, Interspeech Tutorial, September 2018.

Multichannel Raw-Waveform Neural Network Acoustic Models toward Google Home, ASRU Keynote, December 2017.

Multichannel Raw-Waveform Neural Network Acoustic Models, Invited Talk at NIPS End-to-End Workshop, December 2016.

Towards End-To-End Speech Recognition Using Deep Neural Networks, Invited Talk at ICML Deep Learning Workshop, July 2015.

Advancements in Deep Learning, SLT Keynote, December 2014.

Talks on Deep Learning, WiSSAP Winter School on Speech and Audio Processing, January 2014:

- Deep Neural Networks for Acoustic Modeling
- Improvements to Deep Neural Networks for Large Vocabulary Continuous Speech Recognition Tasks
- Techniques for Improving Training Time of Deep Neural Networks with Applications to Speech Recognition

Deep Learning Talk, Speech and Audio in the Northeast (SANE) Workshop, October 2012

## Publications

### 2023

T.N. Sainath, R. Prabhavalkar, D. Caseiro, P. Rondon, C. Allauzen, "Improving Contextual Biasing with Text Injection," in Proc. ICASSP, 2023.

C. Yang, B. Li, Y. Zhang, N. Chen, R. Prabhavalkar, T.N. Sainath, "From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition," in Proc. ICASSP, 2023.

C. Peyser, M. Picheny, K. Cho, T.N. Sainath, W.R. Huang, R. Prabhavalkar, "A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale," in Proc. ICASSP, 2023.

W.R. Huang, S. Chang, T.N. Sainath, Y. He, D. Rybach, R. David, R. Prabhavalkar, C. Allauzen, C. Peyser, T. Strohman, "E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model," in Proc. ICASSP, 2023.

W. Wang, D. Zhao, S. Ding, H. Zhang, S. Chang, D. Rybach, T.N. Sainath, Y. He, I. McGraw, S. Kumar, "Multi-output RNN-T Joint Networks for Multi-task Learning of ASR and Auxiliary Tasks," in Proc. ICASSP, 2023.

S. Chang, C. Zhang, T.N. Sainath, B. Li, T. Strohman, "Context-Aware End-to-End ASR Using Self-Attentive Embedding and Tensor Fusion," in Proc. ICASSP, 2023.

C. Yang, B. Li, Y. Zhang, N. Chen, T.N. Sainath, S. Siniscalchi, C. Lee, "A Quantum Kernel Learning Approach to Low-Resource Spoken Command Recognition," in Proc. ICASSP, 2023.

R. Botros, R. Prabhavalkar, J. Schalkwyk, C. Chelba, T.N. Sainath, "Lego-Features: Exporting modular encoder features for streaming and deliberation ASR," in Proc. ICASSP, 2023.

S.M. Hernandez, D. Zhao, S. Ding, A. Bruguier, R. Prabhavalkar, T.N. Sainath, Y. He, I. McGraw, "Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models," in Proc. ICASSP, 2023.

Z. Meng, W. Wang, R. Prabhavalkar, T.N. Sainath, T. Chen, E. Variani, Y. Zhang, B. Li, A. Rosenberg, B. Ramabhadran, "JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition," in Proc. ICASSP, 2023.

Z. Huo, K. Sim, B. Li, D. Hwang, T.N. Sainath, T. Strohman, "Resource-Efficient Transfer Learning From Speech Foundation Model Using Hierarchical Feature Fusion," in Proc. ICASSP, 2023.

K. Hu, T.N. Sainath, B. Li, N. Du, Y. Huang, A. Dai, Y. Zhang, R. Cabrera, Z. Chen, T. Strohman, "Massively Multilingual Shallow Fusion with Large Language Models," in Proc. ICASSP, 2023.

B. Li, D. Hwang, Z. Huo, J. Bai, G. Arumugam, T.N. Sainath, K. Sim, Y. Zhang, W. Han, T. Strohman, F. Beaufays, "Efficient Domain Adaptation for Speech Foundation Models," in Proc. ICASSP, 2023.

C. Zhang, B. Li, T.N. Sainath, T. Strohman, S. Chang, "UML: A Universal Monolingual Output Layer for Multilingual ASR," in Proc. ICASSP, 2023.

T.N. Sainath, R. Prabhavalkar, A. Bapna, Y. Zu, Z. Huo, Z. Chen, B. Li, W. Wang and T. Strohman, "JOIST: A Joint Speech and Text Streaming Model for ASR," in Proc. SLT, 2023.

S. Bijwadia, S. Chang, B. Li, T.N. Sainath, C. Zhang, Y. He, "Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems," in Proc. SLT, 2023.

T. Munkhdalai, Z. Wu, G. Pundak, K. Sim, J. Li, P. Rondon, T. N. Sainath, "NAM+: Towards Scalable End-to-End Contextual Biasing for Adaptive ASR," in Proc. SLT, 2023.

S. Mavandadi, B. Li, C. Zhang, B. Farris, T.N. Sainath, T. Strohman, "A Truly Multilingual First Pass and Monolingual Second Pass Streaming On-Device ASR System," in Proc. SLT, 2023.

K. Hu, B. Li, T. N. Sainath, "Scaling Up Deliberation for Multilingual ASR," in Proc. SLT, 2023.

C. Peyser, W.R. Huang, T.N. Sainath, R. Prabhavalkar, M. Picheny, K. Cho, "Dual Learning for Large Vocabulary On-Device ASR," in Proc. SLT, 2023.

### 2022

S. Chang, B. Li, T.N. Sainath, C. Zhang, T. Strohman, Q. Liang, Y. He, "Turn-Taking Prediction for Natural Conversational Speech," in Proc. Interspeech, 2022.

S. Chang, G. Prakash, Z. Wu, T.N. Sainath, B. Li, Q. Liang, A. Stambler, S. Upadhyay, M. Faruqui, T. Strohman, "Streaming Intended Query Detection Using E2E Modeling for Continued Conversation," in Proc. Interspeech, 2022.

B. Li, T.N. Sainath, R. Pang, S. Chang, Q. Xu, T. Strohman, V. Chen, Q. Liang, H. Liu, Y. He, P. Haghani, S. Bidichandani, "A Language Agnostic Multilingual Streaming On-Device ASR System," in Proc. Interspeech, 2022.

C. Zhang, B. Li, T.N. Sainath, T. Strohman, S. Mavandadi, S. Chang, P. Haghani, "Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification," in Proc. Interspeech, 2022.

C. Peyser, W.R. Huang, A. Rosenberg, T.N. Sainath, M. Picheny, K. Cho, "Towards Disentangled Speech Representations," in Proc. Interspeech, 2022.

K. Hu, T.N. Sainath, Y. He, R. Prabhavalkar, "Improving Deliberation by Text-Only and Semi-Supervised Training," in Proc. Interspeech, 2022.

W.R. Huang, S. Chang, D. Rybach, T.N. Sainath, R. Prabhavalkar, C. Peyser, Z. Lu, "E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR," in Proc. Interspeech, 2022.

T.N. Sainath, Y. He, A. Narayanan, R. Botros, W. Wang, D. Qiu, C.C. Chiu, R. Prabhavalkar, A. Gruenstein, A. Gulati, B. Li, D. Rybach, E. Guzman, I. McGraw, J. Qin, K. Choromanski, Q. Liang, R. David, R. Pang, S. Chang, T. Strohman, W.R. Huang, W. Han, Y. Wu, Y. Zhang, "Improving the Latency and Quality of Cascaded Encoder," in Proc. ICASSP, 2022.

B. Li, R. Pang, Y. Zhang, T.N. Sainath, T. Strohman, P. Haghani, Y. Zhu, B. Farris, N. Gaur, M. Prasad, "Massively Multilingual ASR: A Lifelong Learning Solution," in Proc. ICASSP, 2022.

C. Zhang, B. Li, Z. Lu, T.N. Sainath, S. Chang, "Improving the Fusion of Acoustic and Text Representations in RNN-T," in Proc. ICASSP, 2022.

J. Bai, B. Li, Y. Zhang, A. Bapna, N. Siddhartha, K. Sim, T.N. Sainath, "Joint Unsupervised and Supervised Training for Multilingual ASR," in Proc. ICASSP, 2022.

W. Wang, K. Hu, T.N. Sainath, "Deliberation of Streaming RNN-Transducer by Non-Autoregressive Decoding," in Proc. ICASSP, 2022.

K. Hu, T.N. Sainath, A. Narayanan, R. Pang, T. Strohman, "Transducer-Based Streaming Deliberation For Cascaded Encoders," in Proc. ICASSP, 2022.

### 2021

B. Li, R. Pang, T.N. Sainath, A. Gulati, Y. Zhang, J. Qin, P. Haghani, W.R. Huang, M. Ma, J. Bai, "Scaling End-to-End Models for Large-Scale Multilingual ASR," in Proc. ASRU, 2021.

T.N. Sainath, Y. He, A. Narayanan, R. Botros, R. Pang, D. Rybach, C. Allauzen, E. Variani, J. Qin, Q. Le-The, S. Chang, B. Li, A. Gulati, J. Yu, C.C. Chiu, D. Caseiro, W. Li, Q. Liang, P. Rondon, "An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling," in Proc. Interspeech, 2021.

S. Mavandadi, T.N. Sainath, K. Hu and Z. Wu, "A Deliberation-Based Joint Acoustic and Text Decoder," in Proc. Interspeech, 2021.

W.R. Huang, T.N. Sainath, C. Peyser, S. Kumar, D. Rybach, T. Strohman, "Lookup-Table Recurrent Language Modeling for Long Tail Speech Recognition," in Proc. Interspeech, 2021.

R. Botros, T.N. Sainath, R. David, E. Guzman, W. Li, Y. He, "Tied and Reduced RNN-T Decoder," in Proc. Interspeech, 2021.

P. Wang, T.N. Sainath, R.J. Weiss, "Multitask Training with Text Data for End-to-End Speech Recognition," in Proc. Interspeech, 2021.

B. Li, A. Gulati, J. Yu, T.N. Sainath, C.C. Chiu, A. Narayanan, S. Chang, R. Pang, Y. He, J. Qin, W. Han, Q. Liang, Y. Zhang, T. Strohman, Y. Wu, "A Better and Faster End-to-End Model for Streaming ASR," in Proc. ICASSP, 2021.

A. Narayanan, T.N. Sainath, R. Pang, J. Yu, C.C. Chiu, R. Prabhavalkar, E. Variani, T. Strohman, "Cascaded Encoders for Unifying Streaming and Non-Streaming ASR," in Proc. ICASSP, 2021.

J. Yu, C.C. Chiu, B. Li, S. Chang, T.N. Sainath, Y. He, A. Narayanan, W. Han, A. Gulati, Y. Wu, R. Pang, "FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization," in Proc. ICASSP, 2021.

R. Prabhavalkar, Y. He, D. Rybach, S. Campbell, A. Narayanan, T. Strohman, T.N. Sainath, "Less is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging," in Proc. ICASSP, 2021.

H. Shrivastava, A. Garg, Y. Cao, Y. Zhang, T.N. Sainath, "Echo State Speech Recognition," in Proc. ICASSP, 2021.

D. Qiu, Q. Li, Y. He, Y. Zhang, B. Li, L. Cao, R. Prabhavalkar, D. Bhatia, W. Li, K. Hu, T.N. Sainath, I. McGraw, "Learning Word-Level Confidence for Subword End-to-End ASR," in Proc. ICASSP, 2021.

J. Yu, W. Han, A. Gulati, C.C. Chiu, B. Li, T.N. Sainath, Y. Wu, R. Pang, "Universal ASR: Unify and Improve Streaming ASR with Full-context Modeling," in Proc. ICLR, 2021.

K. Hu, R. Pang, T.N. Sainath and T. Strohman, "Transformer Based Deliberation for Two-Pass Speech Recognition," in Proc. SLT, 2021.

C.C. Chiu, A. Narayanan, W. Han, R. Prabhavalkar, Y. Zhang, N. Jaitly, R. Pang, T.N. Sainath, P. Nguyen, L. Cao, Y. Wu, "RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions," in Proc. SLT, 2021.

### 2020

T.N. Sainath, R. Pang, D. Rybach, B. Garcia and T. Strohman, "Emitting Word Timings with End-to-End Models," in Proc. Interspeech 2020.

C. Peyser, S. Mavandadi, T.N. Sainath, J. Apfel, R. Pang, S. Kumar, "Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus," in Proc. Interspeech, 2020.

S. Chang, B. Li, D. Rybach, Y. He, W. Li, T.N. Sainath and T. Strohman, "Low Latency Speech Recognition using End-to-End Prefetching," in Proc. Interspeech, 2020.

T.N. Sainath, Y. He, B. Li, A. Narayanan, R. Pang, A. Bruguier, S. Chang, W. Li, R. Alvarez, Z. Chen, C.C. Chiu, D. Garcia, A. Gruenstein, K. Hu, M. Jin, A. Kannan, Q. Liang, I. McGraw, C. Peyser, R. Prabhavalkar, G. Pundak, D. Rybach, Y. Shangguan, Y. Sheth, T. Strohman, M. Visontai, Y. Wu, Y. Zhang and D. Zhao, "A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency," in Proc. ICASSP, 2020.

T.N. Sainath, R. Pang, R. Weiss, Y. He, C.C. Chiu, T. Strohman, "An Attention-Based Joint Acoustic and Text On-Device End-to-End Model," in Proc. ICASSP, 2020.

B. Li, S. Chang, T.N. Sainath, R. Pang, Y. He, T. Strohman and Y. Wu, "Towards Fast and Accurate Streaming End-to-End ASR," in Proc. ICASSP, 2020.

K. Hu, T.N. Sainath, R. Pang and R. Prabhavalkar, "Deliberation Model Based Two-Pass End-to-End Speech Recognition," in Proc. ICASSP, 2020.

C. Peyser, T.N. Sainath and G. Pundak, "Improving Proper Noun Recognition in End-to-End ASR by Customization of the MWER Loss Criteria," in Proc. ICASSP, 2020.

Z. Wu, B. Li, Y. Zhang, P.S. Aleksic and T.N. Sainath, "Multistate Encoding with End-to-End Speech RNN Transducer Network," in Proc. ICASSP, 2020.

### 2019

A. Narayanan, R. Prabhavalkar, C.C. Chiu, D. Rybach, T.N. Sainath and T. Strohman, "Recognizing Long-Form Speech using Streaming End-to-End Models," in Proc. ASRU, 2019.

C.C. Chiu, W. Han, Y. Zhang, R. Pang, S. Kishchenko, P. Nguyen, H. Soltau, A. Narayanan, H. Liao, S. Zhang, A. Kannan, R. Prabhavalkar, Z. Chen, T.N. Sainath, Y. Wu, "A Comparison of End-to-End Models for Long-Form Speech Recognition," in Proc. ASRU, 2019.

T.N. Sainath, R. Pang, D. Rybach, Y. He, R. Prabhavalkar, W. Li, M. Visontai, Q. Liang, T. Strohman, Y. Wu, I. McGraw and C.C Chiu, "Two-Pass End-to-End Speech Recognition," in Proc. Interspeech, 2019.

D. Zhao, T.N. Sainath, D. Rybach, P. Rondon, D. Bhatia, B. Li and R. Pang, "Shallow-Fusion End-to-End Contextual Biasing," in Proc. Interspeech, 2019.

C. Peyser, H. Zhang, T.N. Sainath and Z. Wu, "Improving Performance of End-to-End ASR on Numeric Sequences," in Proc. Interspeech, 2019.

K. Hu, A. Bruguier, T.N. Sainath, R. Prabhavalkar, G. Pundak, "Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models," in Proc. Interspeech, 2019.

A. Kannan, A. Datta, T.N. Sainath, E. Weinstein, B. Ramabhadran, Y. Wu, A. Bapna, Z. Chen and S. Lee, "Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model," in Proc. Interspeech, 2019.

Y. He, T. N. Sainath, R. Prabhavalkar, I. McGraw, R. Alvarez, D. Zhao, D. Rybach, A. Kannan, Y. Wu, R. Pang, Q. Liang, D. Bhatia, Y. Shangguan, B. Li, G. Pundak, K. Sim, T. Bagby, S. Chang, K. Rao, A. Gruenstein, "Streaming End-to-end Speech Recognition For Mobile Devices," in Proc. ICASSP, 2019.

S. Chang, R. Prabhavalkar, Y. He, T.N. Sainath and G. Simko, "Joint Endpointing and Decoding with End-to-End Models," in Proc. ICASSP, 2019.

J. Guo, T.N. Sainath and R.J. Weiss, "A Spelling Correction Model for End-to-End Speech Recognition," in Proc. ICASSP, 2019.

A. Bruguier, R. Prabhavalkar, G. Pundak and T.N. Sainath, "Phoebe: Pronunciation-Aware Contextualization for End-to-End Speech Recognition," in Proc. ICASSP, 2019.

U. Alon, G. Pundak and T.N. Sainath, "Contextual Speech Recognition with Difficult Negative Training Examples," in Proc. ICASSP, 2019.

B. Li, T.N. Sainath, R. Pang and Z. Wu, "Semi-Supervised Training for End-to-End Models via Weak Distillation," in Proc. ICASSP, 2019.

B. Li, Y. Zhang, T.N. Sainath, Y. Wu and W. Chan, "Bytes Are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes," in Proc. ICASSP, 2019.

### 2018

G. Pundak, T.N. Sainath, R. Prabhavalkar, A. Kannan, D. Zhao, "Deep context: end-to-end contextual speech recognition," in Proc. SLT, 2018.

S. Toshniwal, A. Kannan, C.C. Chiu, Y. Wu, T.N. Sainath and K. Livescu, "A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition," in Proc. SLT, 2018.

R. Pang, T.N. Sainath, R. Prabhavalkar, S. Gupta, Y. Wu, S. Zhang and C.C. Chiu, "Compression of End-to-End Models," in Proc. Interspeech, 2018.

K. Sim, A. Narayanan, A. Misra, A. Tripathi, G. Pundak, T.N. Sainath, P. Haghani, B. Li and M. Bacchiani, "Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition," in Proc. Interspeech, 2018.

I. Williams, A. Kannan, P. Aleksic, D. Rybach and T.N. Sainath, "Contextual Speech Recognition in End-to-end Neural Network Systems Using Beam Search," in Proc. Interspeech, 2018.

C.C. Chiu, T. N. Sainath, Y. Wu, R. Prabhavalkar, P. Nguyen, Z. Chen, A. Kannan, R. J. Weiss, K. Rao, E. Gonina, N. Jaitly, B. Li, J. Chorowski and M. Bacchiani, "State-of-the-art Speech Recognition With Sequence-to-Sequence Models," in Proc. ICASSP, 2018.

T. N. Sainath, R. Prabhavalkar, S. Kumar, S. Lee, A. Kannan, D. Rybach, V. Schogol, P. Nguyen, B. Li, Y. Wu, Z. Chen, C.C. Chiu, "No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models," in Proc. ICASSP, 2018.

T. N. Sainath, C.C. Chiu, R. Prabhavalkar, A. Kannan, Y. Wu, P. Nguyen, Z. Chen, "Improving the Performance of Online Neural Transducer Models," in Proc. ICASSP, 2018.

R. Prabhavalkar, T. N. Sainath, Y. Wu, P. Nguyen, Z. Chen, C.C. Chiu, A. Kannan, "Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models," in Proc. ICASSP, 2018. [Best Industry Paper Award]

A. Kannan, Y. Wu, P. Nguyen, T. N. Sainath, Z. Chen, R. Prabhavalkar, "An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model," in Proc. ICASSP, 2018.

B. Li, T. N. Sainath, K. Sim, M. Bacchiani, E. Weinstein, P. Nguyen, Z. Chen, Y. Wu, K. Rao, "Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model," in Proc. ICASSP, 2018.

S. Toshniwal, T. N. Sainath, R. J. Weiss, B. Li, P. Moreno, E. Weinstein, K. Rao, "Multilingual Speech Recognition With A Single End-To-End Model," in Proc. ICASSP, 2018.

J. Heymann, M. Bacchiani and T. N. Sainath, "Performance of mask based statistical beamforming in a smart home scenario," in Proc. ICASSP, 2018.

S. Chang, B. Li, G. Simko, T.N. Sainath, A. Tripathi, A. Oord, O. Vinyals, "Temporal Modeling Using Dilated Convolution and Gating for Voice Activity Detection," in Proc. ICASSP, 2018.

C. Kim, T. N. Sainath, A. Narayanan, A. Misra, R. Nongpiur and M. Bacchiani, "Spectral Distortion Model for Training Phase-Sensitive Deep Neural Networks for Far-field Speech Recognition," in Proc. ICASSP, 2018.

### 2017

T. N. Sainath, V. Peddinti, O. Siohan and A. Narayanan, "Annealed F-smoothing as a Mechanism to Speed up Neural Network Training," in Proc. Interspeech, 2017.

R. Prabhavalkar, T. N. Sainath, B. Li, K. Rao and N. Jaitly, "An Analysis of "Attention" in Sequence-to-Sequence Models," in Proc. Interspeech, 2017.

R. Prabhavalkar, K. Rao, T.N. Sainath, B. Li, L. Johnson and N. Jaitly, "A Comparison of Sequence-to-Sequence Models for Speech Recognition," in Proc. Interspeech, 2017.

G. Pundak and T. N. Sainath, "Highway-LSTM and Recurrent Highway Networks for Speech Recognition," in Proc. Interspeech, 2017.

B. Li, T. N. Sainath, J. Caroselli, A. Narayanan, M. Bacchiani, A. Misra, I. Shafran, H. Sak, G. Pundak, K. Chin, K. Sim, R. J. Weiss, K. W. Wilson, E. Variani, C. Kim, O. Siohan, M. Weintraub, E. McDermott, R. Rose and M. Shannon, "Acoustic Modeling for Google Home," in Proc. Interspeech, 2017.

B. Li and T. N. Sainath, "Reducing the Computational Complexity of Two-Dimensional LSTMs," in Proc. Interspeech, 2017.

S. Chang, B. Li, T. N. Sainath, G. Simko and C. Parada, "Endpoint Detection using Grid Long Short-term Memory Networks for Streaming Speech Recognition," in Proc. Interspeech, 2017.

C. Kim, A. Misra, K. Chin, T. Hughes, A. Narayanan, T. N. Sainath and M. Bacchiani, "Generation of Simulated Utterances in Virtual Rooms to Train Deep Neural Networks for Far-field Speech Recognition in Google Home," in Proc. Interspeech, 2017.

T. N. Sainath, R. J. Weiss, K. W. Wilson, B. Li, A. Narayanan, E. Variani, M. Bacchiani, I. Shafran, A. Senior, K. Chin, A. Misra and C. Kim "Multichannel Signal Processing with Deep Neural Networks for Automatic Speech Recognition," in IEEE Transactions on Speech and Language Processing, 2017.

T. N. Sainath, R. J. Weiss, K. W. Wilson, B. Li, A. Narayanan, E. Variani, M. Bacchiani, I. Shafran, A. Senior, K. Chin, A. Misra and C. Kim "Raw Multichannel Processing Using Deep Neural Networks," chapter in New Era for Robust Speech Recognitino: Exploiting Deep Learning, 2017.

### 2016

T. N. Sainath and B. Li, "Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks," in Proc. Interspeech, 2016.

B. Li, T. N. Sainath, R. Weiss, K. Wilson and M. Bacchiani, "Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition," in Proc. Interspeech, 2016.

E. Variani, T. N. Sainath, I. Shafran and M. Bacchiani, "Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling," in Proc. Interspeech, 2016.

R. Zazo, T. N. Sainath, G. Simko and C. Parada, "Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection," in Proc. Interspeech, 2016.

T. N. Sainath, R. J. Weiss, K. W. Wilson, A. Narayanan and M. Bacchiani, "Factored Spatial and Spectral Multichannel Raw Waveform CLDNNs," in Proc. ICASSP, March 2016.

Z. Lu, V. Sindhwani and T. N. Sainath, "Learning Compact Recurrent Neural Networks," in Proc. ICASSP, March 2016.

T. N. Sainath, A. Narayanan, R. Weiss, E. Variani, K. Wilson, M. Bacchiani and I. Shafran, "Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction," in Proc. Interspeech, 2016.

G. Pundak and T. N. Sainath, "Lower Frame Rate Neural Network Acoustic Models," in Proc. Interspeech, 2016.

### 2015

T. N. Sainath, R. J. Weiss, K. W. Wilson, A. Narayanan, M. Bacchiani and A. Senior, "Speaker Location and Microphone Spacing Invariant Acoustic Modeling from Raw Multichannel Waveforms," in Proc. ASRU, December 2015.

A. Senior, H. Sak, F. de Chaumont Quitry, T. N. Sainath and K. Rao, "Acoustic Modelling with CD-CTC-sMBR LSTM RNNs," in Proc. ASRU, December 2015.

V. Sindhwani, T. N. Sainath and S. Kumar, "Structured Transforms for Small-footprint Deep Learning," in Proc. NIPS, December 2015.

T. N. Sainath, R. J. Weiss, A. Senior, K. W. Wilson and O. Vinyals, "Learning the Speech Front-end with Raw Waveform CLDNNs," in Proc. Interspeech 2015.

T. N. Sainath and C. Parada, "Convolutional Neural Networks for Small-Footprint Keyword Spotting," in Proc. Interspeech 2015.

Y. Chen, I. Lopez-Moreno, T. N. Sainath, M. Visontai, R. Alvarez and C. Parada, "Locally-Connected and Convolutional Neural Networks for Small Footprint Speaker Recognition," in Proc. Interspeech 2015.

H. Liao, G. Pundak, O. Siohan, M. Carroll, N. Coccaro, Q. Jiang, T. N. Sainath, A. Senior, F. Beaufays and M. Bacchiani, "Large Vocabulary Automatic Speech Recognition for Children," in Proc. Interspeech 2015.

T. N. Sainath, O. Vinyals, A. Senior and H. Sak, "Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks," in Proc. ICASSP 2015.

G. Chen, C. Parada and T. N. Sainath, "Query-by-Example Keyword Spotting Using Long Short-Term Memory Networks," in Proc. ICASSP 2015.

R. Prabhavalkar, R. Alvarez, C. Parada, P. Nakkiran and T. N. Sainath, "Automatic Gain Control and Multi-style Training for Robust Small-Footprint Keyword Spotting with Deep Neural Networks," in Proc. ICASSP 2015.

### 2014

T. N. Sainath, B. Kingsbury, G. Saon, H. Soltau, A. Mohamed, G. Dahl and B. Ramabhadran, "Deep Convolutional Neural Networks for Large-Scale Speech Tasks," in Neural Networks, Elsevier Special Issue on Deep Learning, November 2014.

I. Chung, T. N. Sainath, B. Ramabhadran, M. Picheny, J. Gunnels, V. Austel, U. Chaudhari and B. Kingsbury, "Parallel Deep Neural Network Training for Big Data on Blue Gene/Q," in Proc. of the International Conference on High Performance Computing, Networking, Storage and Analysis, November 2014.

T. N. Sainath, V. Peddinti, B. Kingsbury, P. Fousek, D. Nahamoo and B. Ramabhadran, "Deep Scattering Spectra with Deep Neural Networks for LVCSR Tasks," in Proc. Interspeech, September 2014.

T. N. Sainath, I. Chung, B. Ramabhadran, M. Picheny, J. Gunnels, B. Kingsbury, G. Saon, V. Austel and U. Chaudhari, "Parallel Deep Neural Network Training for LVCSR using Blue Gene/Q," in Proc. Interspeech, September 2014.

T. N. Sainath, B. Kingsbury, A. Mohamed, G. Saon and B. Ramabhadran, "Improvements to Filterbank and Delta Learning within a Deep Neural Network Framework," in Proc. ICASSP, May 2014.

V. Peddinti, T. N. Sainath, S. Maymon, B. Ramabhadran, D. Nahamoo, V. Goel, "Deep Scattering Spectrum with Deep Neural Networks," in Proc. ICASSP, May 2014.

P. Huang, H. Avron, T. N. Sainath, V. Sindhwani and B. Ramabhadran, "Kernel Methods Match Deep Neural Networks on TIMIT: Scalable Learning in High-Dimensional Random Fourier Spaces," in Proc. ICASSP, May 2014. [Best Student Paper Award]

H. Soltau, G. Saon and T. N. Sainath, "Joint Training of Convolutional and Non-Convolutional Neural Networks," in Proc. ICASSP, May 2014.

### 2013

T. N. Sainath, B. Kingsbury, A. Mohamed and B. Ramabhadran, "Learning Filter Banks within a Deep Neural Network Framework," in Proc. ASRU, December 2013.

T. N. Sainath, L. Horesh, B. Kingsbury, A. Aravkin and B. Ramabhadran, "Accelerating Hessian-Free Optimization for Deep Neural Networks by Implicit Preconditioning and Sampling," in Proc. ASRU, December 2013.

T. N. Sainath, B. Kingsbury, A. Mohamed, G. Dahl, G. Saon, H. Soltau, T. Beran, A. Aravkin and B. Ramabhadran, "Improvements to Deep Convolutional Neural Networks for LVCSR," in Proc. ASRU, December 2013.

T. N. Sainath, B. Kingsbury, H. Soltau and B. Ramabhadran, "Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks," in IEEE Transactions on Audio, Speech, and Language Processing, November 2013.

T. N. Sainath, A. Mohamed, B. Kingsbury and B. Ramabhadran, "Deep Convolutional Neural Networks for LVCSR," in Proc. ICASSP, May 2013.

T. N. Sainath, B. Kingsbury, V. Sindhwani, E. Arisoy and B. Ramabhadran, "Low-Rank Matrix Factorization for Deep Neural Network Training with High-Dimensional Output Targets," in Proc. ICASSP, May 2013.

G. Dahl, T. N. Sainath and G. Hinton, "Improving Deep Neural Networks for LVCSR using Rectified Linear Units and Dropout," in Proc. ICASSP, May 2013.

R. Prabhavalkar, T. N. Sainath, D. Nahamoo, B. Ramabhadran and D. Kanevsky, "An Evaluation Of Posterior Modeling Techniques for Phonetic Recognition," in Proc. ICASSP, May 2013.

J. Cui, X. Cui, B. Ramabhadran, J. Kim, B. Kingsbury, J. Mamou, L. Mangu, M. Picheny, T. N. Sainath, A. Sethy, "Developing Speech Recognition Systems for Corpus Indexing Under the IARPA Babel Program," in Proc. ICASSP, May 2013.

### 2012

T. N. Sainath, B. Kingsbury and B. Ramabhadran, "Improving Training Time of Deep Belief Networks Through Hybrid Pre-Training And Larger Batch Sizes," in Proc. NIPS Workshop on Log-linear Models, Dec. 2012.

G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath and B. Kingsbury, "Deep Neural Networks for Acoustic Modeling in Speech Recognition," in IEEE Signal Processing Magazine, vol. 29, November 2012.

T. N. Sainath, B. Ramabhadran, D. Nahamoo, D. Kanevsky, D. Van Compernolle, K. Demuynck, J. F. Gemmeke, J. R. Bellegarda, S. Sundaram, "Exemplar-Based Processing for Speech Recognition," in IEEE Signal Processing Magazine, vol. 29, November 2012.

T. N. Sainath, D. Nahamoo, B. Ramabhadran and D. Kanevsky, "Enhancing Exemplar-Based Posteriors for Speech Recognition Tasks," in Proc. Interspeech, September 2012.

B. Kingsbury, T. N. Sainath, and H. Soltau, "Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization," in Proc. Interspeech, September 2012.

E. Arisoy, T. N. Sainath, B. Kingsbury, and B. Ramabhadran, "Deep Neural Network Language Models," in Proc. NAACL, June 2012.

T. N. Sainath, B. Kingsbury, and B. Ramabhadran, "Auto-Encoder Bottleneck Features Using Deep Belief Networks," in Proc. ICASSP, March 2012.

C. Plahl, T. N. Sainath, B. Ramabhadran and D. Nahamoo, "Improved Pre-Training of Deep Belief Networks Using Sparse Encoding Symmetric Machines," in Proc. ICASSP, March 2012.

N. Itoh, T. N. Sainath, D. Jiang, J. Zhou and B. Ramabhadran, "N-best Entropy Based Data Selection for Acoustic Modeling," in Proc. ICASSP, March 2012.

### 2011

T. N. Sainath, B. Kingsbury, B. Ramabhadran, P. Fousek, P. Novak and A. Mohamed, "Making Deep Belief Networks Effective for Large Vocabulary Continuous Speech Recognition," in Proc. ASRU, December 2011.

T. N. Sainath, D. Nahamoo, D. Kanevsky, B. Ramabhadran and P. M. Shah, “A Convex Hull Approach to Sparse Representations for Exemplar-Based Speech Recognition,” in Proc. ASRU, December 2011.

T. N. Sainath, B. Ramabhadran, M. Picheny, D. Nahamoo and D. Kanevsky, "Exemplar-Based Sparse Representation Features: From TIMIT to LVCSR," in IEEE Transactions on Audio, Speech, and Language Processing, November 2011.

T. N. Sainath, B. Ramabhadran, D. Nahamoo and D. Kanevsky, “Reducing Computational Complexities of Exemplar-Based Sparse Representations With Applications to Large Vocabulary Speech Recognition,” in Proc. Interspeech, August 2011.

D. Kanevsky, D. Nahamoo, T. N. Sainath and B. Ramabhadran, "Convergence of Line Search A-Function Methods," in Proc. Interspeech, August 2011.

T. N. Sainath, D. Nahamoo, D. Kanevsky, B. Ramabhadran and P. M. Shah, “A Convex Hull Approach to Sparse Representations for Exemplar-Based Speech Recognition,” Technical Report, Speech and Language Algorithm Group, IBM, April 2011.

T. N. Sainath, D. Nahamoo, D. Kanevsky, B. Ramabhadran and P. M. Shah, “Exemplar-Based Sparse Representation Phone Identification Features,” in Proc. ICASSP, May 2011.

A. Mohamed, T. N. Sainath, G. Dahl, B. Ramabhadran, G. Hinton and M. Picheny, "Deep Belief Networks using Discriminative Features for Phone Recognition," in Proc. ICASSP, May 2011.

D. Kanevsky, D. Nahamoo, T. N. Sainath, B. Ramabhadran and P. A. Olsen, "A-Functions: A Generalization of Extended Baum-Welch Transformations to Convex Optimization," in Proc. ICASSP, May 2011.

B. Zhang, A. Sethy, T. N. Sainath and B. Ramabhadran, "Application Specific Loss Minimization Using Gradient Boosting," in Proc. ICASSP, May 2011.

### 2010

T. N. Sainath, B. Ramabhadran, D. Nahamoo, D. Kanevsky and A. Sethy, "Exemplar-Based Sparse Representation Features for Speech Recognition," in Proc. Interspeech, September 2010.

T. N. Sainath, S. Maskey, D. Kanevsky, B. Ramabhadran, D. Nahamoo and J. Hirschberg, “Sparse Representations for Text Categorization,” in Proc. Interspeech, September 2010.

V. Goel, T. N. Sainath, B. Ramabhadran, P. A. Olsen, D. Nahamoo and D. Kanevsky, “Incorporating Sparse Representation Phone Identification Features in Automatic Speech Recognition Using Exponential Families,” in Proc. Interspeech, September 2010.

D. Kanevsky, T. N. Sainath, B. Ramabhadran and D. Nahamoo, "An Analysis of Sparseness and Regularization in Exemplar-Based Methods for Speech Classification,” in Proc. Interspeech, September 2010.

A. Sethy, T. N. Sainath, B. Ramabhadran and D. Kanevsky, “Data Selection for Language Modeling Using Sparse Representations,” in Proc. Interspeech, September 2010.

D. Kanevsky, A. Carmi, L. Horesh, P. Gurfil, B. Ramabhadran and T.N. Sainath, "Kalman Filtering for Compressed Sensing," in Proc. Information Fusion, Edinburgh, UK, July 2010.

T. N. Sainath, D. Nahamoo, B. Ramabhadran and D. Kanevsky, “Sparse Representation Phone Identification Features for Speech Recognition,” Technical Report, Speech and Language Algorithm Group, IBM, April 2010.

S. Teller, M. Walter, M. Antone, A. Correa, R. Davis, L. Fletcher, E. Frazzoli, J. Glass, J. How, A. Huang, J. Jeon, S. Karaman, B. Luders, N. Roy and T. N. Sainath, "A Voice-Commandable Robotic Forklift Working Alongside Humans in Minimally-Prepared Outdoor Environments," in Proc. ICRA, Anchorage, Alaska, May 2010.

T. N. Sainath, A. Carmi, D. Kanevsky and B. Ramabhadran, “Bayesian Compressive Sensing for Phonetic Classification,” in Proc. ICASSP, Dallas, Texas, March 2010.

A. Carmi, T. N. Sainath, P. Gurfil, D. Kanevsky, D. Nahamoo and B. Ramabhadran, “The Use Of Isometric Transformations and Bayesian Estimation In Compressive Sensing for fMRI Classification,” in Proc. ICASSP, Dallas, Texas, March 2010.

### 2009

T. N. Sainath, B. Ramabhadran and M. Picheny, “An Exploration of Large Vocabulary Tools for Small Vocabulary Phonetic Recognition,” Proceedings of ASRU, Merano, Italy, December 2009.

T. N. Sainath, “Island-Driven Search Using Broad Phonetic Classes,” Proceedings of ASRU, Merano, Italy, December 2009.

D. Kanevsky, T. N. Sainath and B. Ramabhadran, "A Generalized Family of Parameter Estimation Techniques," Proceedings of ICASSP, Taipei, Taiwan, April 2009.

### 2008

T. N. Sainath and V. Zue, “A Comparison of Broad Phonetic and Acoustic Units for Noise Robust Segment-Based Speech Recognition,” Proceedings of Interspeech, September 2008.

D. Kanevsky, T. N. Sainath, B. Ramabhadran and D. Nahamoo, “Generalization of Extended Baum-Welch Parameter Estimation for Discriminative Training and Decoding,” Proceedings of Interspeech, September 2008.

T. N. Sainath, D. Kanevsky and B. Ramabhadran, “Gradient Steepness Metrics using Extended Baum-Welch Transformations for Universal Pattern Recognition Tasks,” Proceedings of ICASSP, Las Vegas, USA, April 2008.

### 2007

T. N. Sainath, D. Kanevsky and B. Ramabhadran, “Broad Phonetic Class Recognition in a Hidden Markov Model Framework using Extended Baum-Welch Transformations,” Proceedings of ASRU, Kyoto, Japan, December 2007.

T. N. Sainath, V. Zue and D. Kanevsky, “Audio Classification using Extended Baum-Welch Transformations,” Proceedings of Interspeech, Antwerp, Belgium, August 2007.

T. N. Sainath, D. Kanevsky and G. Iyengar, “Unsupervised Audio Segmentation using Extended Baum-Welch Transformations,” Proceedings of ICASSP, Honolulu, USA, April 2007.

### 2006

T. N. Sainath and T.J. Hazen, “A Sinusoidal Model Approach to Acoustic Landmark Detection and Segmentation for Robust Segment-Based Speech Recognition,” Proceedings of ICASSP, Toulouse, France, May 2006.

## Theses

T. N. Sainath, "Applications of Broad Class Knowledge for Noise Robust Speech Recognition," PhD Thesis, MIT Department of Electrical Engineering and Computer Science, June 2009.

T. N. Sainath, "Acoustic Landmark Detection and Segmentation using the McAulay-Quatieri Sinusoidal Model," M.Eng. Thesis, MIT Department of Electrical Engineering and Computer Science, August 2005.