George Tucker

Google Scholar - LinkedIn - GitHub

I am a research scientist on the Google Brain team. My current focus is on sequential models and reinforcement learning. We want to learn rich models and complex policies from as few training samples as possible.

Previously, I was a research scientist on the Amazon Speech team in Boston, where I designed deep neural network acoustic models for small-footprint keyword spotting. Before joining Amazon, I was a visiting Postdoctoral Research Fellow in the Price lab at the Harvard School of Public Health, where I worked on methods for genetic risk prediction and association testing in genome-wide association studies (GWAS) with related individuals. I conducted my PhD research in the MIT Mathematics Department in Professor Bonnie Berger's research group.

Google Brain (selected publications)

Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives. G Tucker, D Lawson, S Gu, CJ Maddison.

Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion. J Buckman, D Hafner, G Tucker, E Brevdo, H Lee.

  • NIPS 2018 (Oral Presentation, <1% acceptance rate)

The Mirage of Action-Dependent Baselines in Reinforcement Learning. G Tucker, S Bhupatiraju, S Gu, RE Turner, Z Ghahramani, S Levine.

Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling. C Riquelme, G Tucker, J Snoek.

REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models. G Tucker, A Mnih, CJ Maddison, D Lawson, J Sohl-Dickstein.

Filtering Variational Objectives. CJ Maddison*, D Lawson*, G Tucker*, N Heess, M Norouzi, A Doucet, A Mnih, YW Teh.

Confidence Penalties

We propose regularizing neural networks by penalizing low-entropy output distributions. Penalizing low-entropy output distributions has been shown to improve exploration in reinforcement learning; we show that it also acts as a strong regularizer in supervised learning. We connect our confidence penalty to label smoothing through the direction of the KL divergence between the network's output distribution and the uniform distribution. We exhaustively evaluate our proposed confidence penalty and label smoothing (uniform and unigram) on 6 common benchmarks: image classification (MNIST and CIFAR-10), language modeling (Penn Treebank), machine translation (WMT'14 English-to-German), and speech recognition (TIMIT and WSJ). We find that both label smoothing and our confidence penalty improve state-of-the-art models across benchmarks without modifying existing hyperparameters.
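To make the KL connection concrete, here is a minimal single-example sketch in pure Python. It assumes the confidence-penalized loss is the negative log-likelihood minus a weight `beta` times the output entropy, and that uniform label smoothing trains against `(1 - eps) * one_hot + eps * uniform`. The function names and the `beta`/`eps` values are illustrative assumptions, not the paper's code, and any additional training details from the paper are not reproduced here.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax over a single example's logits
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: Sequence[float]) -> float:
    # H(p) = -sum_i p_i log p_i
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl(q: Sequence[float], p: Sequence[float]) -> float:
    # KL(q || p) = sum_i q_i log(q_i / p_i)
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def confidence_penalty_loss(logits: Sequence[float], target: int, beta: float) -> float:
    # Negative log-likelihood minus beta * H(p): confident (low-entropy)
    # outputs receive a smaller entropy bonus, i.e. a relative penalty.
    p = softmax(logits)
    return -math.log(p[target]) - beta * entropy(p)


def label_smoothing_loss(logits: Sequence[float], target: int, eps: float) -> float:
    # Cross-entropy against (1 - eps) * one_hot + eps * uniform
    p = softmax(logits)
    k = len(p)
    ce_uniform = -sum(math.log(pi) for pi in p) / k  # CE(uniform, p)
    return (1 - eps) * (-math.log(p[target])) + eps * ce_uniform
```

The two regularizers differ only in the direction of the KL divergence to the uniform distribution u over K classes: -H(p) = KL(p || u) - log K for the confidence penalty, while the smoothing term is CE(u, p) = KL(u || p) + log K. Both identities can be checked numerically with `kl` and `entropy` above.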

Amazon Speech - Small-footprint acoustic models

Compacting Neural Network Classifiers via Dropout Training. Y Kubo, G Tucker, S Wiesler. NIPS 2016 Workshop on Efficient Methods for Deep Neural Networks.

Max-pooling Loss Training of Long Short-Term Memory Networks for Small-footprint Keyword Spotting. M Sun, A Raju, G Tucker, S Panchapagesan, G Fu, A Mandal, et al. SLT 2016.

Model compression applied to small-footprint keyword spotting

Recently, a number of devices and services have enabled fully voice-based interfaces, such as Google Now, the iPhone 6s, and the Amazon Echo. For privacy reasons, these devices rely on the user to preface their commands with a keyword, such as "Alexa". Accurate on-device keyword spotting is therefore critical to usability. In this work, we focused on keyword spotting (KWS) systems for small-footprint devices. In particular, we investigated the use of low-rank weight matrices and knowledge distillation applied to a deep neural network (DNN) based KWS system. We found that these techniques combine to give significant reductions in both false alarms (FAs) and misses (~10% reduction in FAs at a fixed miss rate).
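The two techniques above can be sketched in pure Python as follows. The low-rank part factors an m x n weight matrix W as U V with inner rank r and applies it as U (V x), cutting parameters from m * n to r * (m + n). The distillation part uses the common Hinton-style recipe: a temperature-softened KL term to the teacher's outputs, scaled by T^2 and mixed with the hard-label cross-entropy. The layer sizes, rank, `temperature`, and `alpha` are hypothetical choices for illustration; the exact loss and configuration used in this work are not stated here.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> List[float]:
    # Plain matrix-vector product
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def dense_params(m: int, n: int) -> int:
    # Parameters in a full m x n weight matrix
    return m * n


def low_rank_params(m: int, n: int, r: int) -> int:
    # Parameters when W ~= U @ V with U: m x r and V: r x n
    return r * (m + n)


def low_rank_layer(u: Matrix, v: Matrix, x: Sequence[float]) -> List[float]:
    # Compute (U V) x as U (V x) so the m x n product is never formed
    return matvec(u, matvec(v, x))


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    mx = max(scaled)
    exps = [math.exp(z - mx) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      target: int, temperature: float = 2.0, alpha: float = 0.5) -> float:
    # Soft term: KL(teacher_T || student_T) on temperature-softened outputs,
    # scaled by T^2 so its gradient magnitude is comparable across temperatures.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_t, p_s) if t > 0)
    # Hard term: ordinary cross-entropy with the true label at T = 1
    hard = -math.log(softmax(student_logits)[target])
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard
```

For a hypothetical 512 x 512 layer, a rank-64 factorization keeps 65,536 of the original 262,144 weights. Using KL rather than cross-entropy for the soft term changes the loss only by a constant (the teacher's entropy), so it gives the same gradients but is zero when the student matches the teacher.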

HSPH & MIT - Medical Genetics


Genetic prediction based on either identity-by-state (IBS) sharing or pedigree information has been investigated extensively using Best Linear Unbiased Prediction (BLUP) methods. Such methods were pioneered in the plant and animal breeding literature and have since been applied to predict human traits with the aim of eventual clinical utility. However, methods that combine IBS sharing and pedigree information for genetic prediction in humans have not been explored. We introduce a two-variance-component model for genetic prediction: one component for IBS sharing and one for approximate pedigree structure, both estimated using genetic markers. In simulations using real genotypes from the CARe and FHS family cohorts, we demonstrate that the two-variance-component model achieves gains in prediction r^2 over standard BLUP at current sample sizes, and we project, based on simulations, that these gains will continue to hold at larger sample sizes. Accordingly, in analyses of four quantitative phenotypes from CARe and two quantitative phenotypes from FHS, the two-variance-component model significantly improves prediction r^2 in each case, with up to a 20% relative improvement. We also find that standard mixed-model association tests can produce inflated test statistics in data sets with related individuals, whereas the two-variance-component model corrects for this inflation.
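The prediction step of a two-variance-component model can be sketched as a BLUP with a summed covariance. The sketch assumes the phenotype covariance is V = s2_ibs * K_ibs + s2_ped * K_ped + s2_env * I. It also assumes the variance components are already estimated and the phenotypes are mean-centered with no fixed effects. The predicted genetic value for test individuals is then C V^{-1} y, where C is the test-by-train genetic covariance. How K_ped is derived from markers, and how the variance components are fitted, are outside this sketch; the function names and toy kinship values are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    # Solve a x = b by Gaussian elimination with partial pivoting
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def two_component_blup(k_ibs_train: Matrix, k_ped_train: Matrix,
                       k_ibs_test: Matrix, k_ped_test: Matrix,
                       y_train: Sequence[float],
                       s2_ibs: float, s2_ped: float, s2_env: float) -> List[float]:
    # Training covariance: V = s2_ibs*K_ibs + s2_ped*K_ped + s2_env*I
    n = len(y_train)
    v = [[s2_ibs * k_ibs_train[i][j] + s2_ped * k_ped_train[i][j]
          + (s2_env if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(v, y_train)  # V^{-1} y
    preds = []
    # Each test row of K_ibs / K_ped holds kinship to the training individuals
    for row_ibs, row_ped in zip(k_ibs_test, k_ped_test):
        cov = [s2_ibs * a + s2_ped * b for a, b in zip(row_ibs, row_ped)]
        preds.append(sum(c * w for c, w in zip(cov, alpha)))
    return preds
```

Setting `s2_ped = 0` recovers a standard single-component BLUP on the IBS kinship. The second component lets close relatives contribute extra predictive signal beyond genome-wide IBS sharing.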


Using a reduced subset of SNPs in a linear mixed model can improve power for genome-wide association studies, yet this can result in insufficient correction for population stratification. We propose a hybrid approach using principal components that does not inflate statistics in the presence of population stratification and improves power over standard linear mixed models.