Shixiang Shane Gu
Google Scholar Twitter Profile
Shixiang Shane Gu is a Research Scientist at Google DeepMind. Previously, he was a researcher on the ChatGPT team at OpenAI, a Research Scientist at Google Research, Brain Team, and a Visiting Associate Professor (Adjunct Professor) at the University of Tokyo, researching deep learning, reinforcement learning, probabilistic machine learning, and robotics. Shane holds a PhD in Machine Learning from the University of Cambridge and the Max Planck Institute for Intelligent Systems, supervised by Richard E. Turner, Zoubin Ghahramani, and Bernhard Schölkopf. He holds a B.ASc. in Engineering Science from the University of Toronto, where his thesis advisor was Geoffrey E. Hinton. Shane was also a visiting scholar at the Department of Computer Science at Stanford University, hosted by Emma Brunskill. His academic work received the Best Paper Award at CoRL 2019, a Google Focused Research Award, the Cambridge-Tübingen PhD Fellowship, and an NSERC Scholarship, and was featured on the Google Research Blog and in MIT Technology Review. Shane is a Japan-born Chinese Canadian, and he speaks, reads, and writes in three languages.
News
Sept 16, 2022: 2 Papers accepted at NeurIPS 2022!
May 14, 2022: 2 Papers accepted at ICML 2022!
Jan 30, 2022: 1 Paper accepted as a Spotlight (top ~5% of 3391 submissions) at ICLR 2022!
Nov 30, 2021: 3 computer vision papers (VaxNeRF, Tool-As-Embodiment, Amortized Prompt) posted on arXiv!
Oct 2, 2021: 2 Papers accepted at NeurIPS 2021! (1 Spotlight paper, top 3% of 9122 submissions)
May 2021: 3 Papers accepted at ICML 2021!
Jan 13, 2021: 1 Paper accepted at ICLR 2021!
Nov 1, 2020: 1 Paper accepted at NeurIPS 2020!
Sept 15, 2020: 1 Paper accepted at EMNLP 2020!
May 1, 2020: 1 Paper accepted at RSS 2020!
Dec 19, 2019: 1 Paper accepted as Oral at ICLR 2020!
Dec 3, 2019: My talk listed as one of "30 Influential AI Presentations from 2019" by Re:Work
Nov 1, 2019: Our paper on analyzing imitation learning algorithms received Best Paper Award at CoRL 2019!
Sept 8, 2019: 2 Papers accepted as Orals at CoRL 2019 in Osaka, Japan!
Sept 3, 2019: 2 Papers accepted at NeurIPS 2019 in Vancouver, Canada!
Pre-prints
Machel Reid, Yutaro Yamada, Shixiang Shane Gu. "Can Wikipedia Help Offline Reinforcement Learning?". [Arxiv] [Github]
Naruya Kondo, Yuya Ikeda, Andrea Tagliasacchi, Yutaka Matsuo, Yoichi Ochiai, Shixiang Shane Gu. "VaxNeRF: Revisiting the Classic for Voxel-Accelerated Neural Radiance Field". [Arxiv] [Github]
Yuki Noguchi, Tatsuya Matsushima, Yutaka Matsuo, Shixiang Shane Gu. "Tool as Embodiment for Recursive Manipulation". [Arxiv] [Project Page] [Github]
Xin Zhang, Yusuke Iwasawa, Yutaka Matsuo, Shixiang Shane Gu. "Amortized Prompt: Lightweight Fine-Tuning for CLIP in Domain Generalization". [Arxiv]
Shixiang Shane Gu, Manfred Diaz, C. Daniel Freeman, Hiroki Furuta, Seyed Kamyar Seyed Ghasemipour, Anton Raichuk, Byron David, Erik Frey, Erwin Coumans, Olivier Bachem. "Braxlines: Fast and Interactive Toolkit for RL-driven Behavior Generation Beyond Reward Maximization". [Arxiv] [Github]
Conference Publications
Ruibo Liu, Jason Wei, Shixiang Shane Gu, Te-Yen Wu, Soroush Vosoughi, Claire Cui, Denny Zhou, Andrew M. Dai. "Mind's Eye: Grounded Language Model Reasoning through Simulation". ICLR 2023. [Openreview]
Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa. "Large Language Models are Zero-Shot Reasoners". NeurIPS 2022. [Arxiv]
Seyed Kamyar Seyed Ghasemipour, Shixiang Shane Gu, Ofir Nachum. "Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters." NeurIPS 2022. [Arxiv]
Scott Fujimoto, David Meger, Doina Precup, Ofir Nachum, Shixiang Shane Gu. "Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error". ICML 2022. [Arxiv]
Seyed Kamyar Seyed Ghasemipour, Daniel Freeman, Byron David, Shixiang Shane Gu, Satoshi Kataoka, Igor Mordatch. "Blocks Assemble! Learning to Assemble with Large-Scale Structured Reinforcement Learning". ICML 2022. [Arxiv]
Hiroki Furuta, Yutaka Matsuo, Shixiang Shane Gu. "Generalized Decision Transformer for Offline Hindsight Information Matching". ICLR 2022 [Spotlight, 5.2% of 3391 submissions]. [Arxiv] [Project Page] [Github]
Scott Fujimoto, Shixiang Shane Gu. "A Minimalist Approach to Offline Reinforcement Learning". NeurIPS 2021 [Spotlight, 3% of 9122 submissions]. [Arxiv] [Code]
Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang Shane Gu. "Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning". NeurIPS 2021. [Arxiv]
Hiroki Furuta, Tatsuya Matsushima, Tadashi Kozuno, Yutaka Matsuo, Sergey Levine, Ofir Nachum, Shixiang Shane Gu. "Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning". ICML 2021 (also a contributed talk at ICLR 2021 Workshop on Never-Ending RL). [Arxiv]
Jongwook Choi, Archit Sharma, Honglak Lee, Sergey Levine, Shixiang Shane Gu. "Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning". ICML 2021. [Arxiv]
Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Shane Gu. "EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL". ICML 2021. [Arxiv]
Tatsuya Matsushima*, Hiroki Furuta*, Yutaka Matsuo, Ofir Nachum, Shixiang Gu. "Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization". ICLR 2021. *equal contribution [Arxiv]
Lisa Lee, Benjamin Eysenbach, Ruslan Salakhutdinov, Shixiang Shane Gu, Chelsea Finn. "Weakly-Supervised Reinforcement Learning for Controllable Behavior". NeurIPS 2020. [Arxiv]
Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Shane Gu, Rosalind Picard. "Human-centric dialog training via offline reinforcement learning". EMNLP 2020. [Arxiv]
Archit Sharma, Michael Ahn, Sergey Levine, Vikash Kumar, Karol Hausman, Shixiang Gu. "Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning". RSS 2020. [Arxiv] [Videos]
Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman. "Dynamics-Aware Unsupervised Discovery of Skills". ICLR 2020 [Oral, 1.8% of 2594 submissions]. [Arxiv] [Videos]
Seyed Kamyar Seyed Ghasemipour, Richard Zemel, Shixiang Gu. "A Divergence Minimization Perspective on Imitation Learning Methods". CoRL 2019 [Best Paper Award, 0.25% of 398 submissions]. [Arxiv] [Videos]
Ofir Nachum, Michael Ahn, Hugo Ponte, Shixiang Gu, Vikash Kumar. "Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real". CoRL 2019 [Oral]. [Arxiv]
Seyed Kamyar Seyed Ghasemipour, Shixiang Gu, Richard Zemel. "SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies". NeurIPS 2019. [PDF]
Yiding Jiang, Shixiang Gu, Kevin Murphy, Chelsea Finn. "Language as an Abstraction for Hierarchical Deep Reinforcement Learning". NeurIPS 2019. [Arxiv] [Videos]
Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine. "Near-Optimal Representation Learning for Hierarchical Reinforcement Learning". ICLR 2019. [Arxiv] [Videos]
George Tucker, Dieterich Lawson, Shixiang Gu, Chris J. Maddison. "Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives". ICLR 2019. [Arxiv]
Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine. "Data-Efficient Hierarchical Reinforcement Learning". NeurIPS 2018. [Arxiv] [Videos]
George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine. "The Mirage of Action-Dependent Baselines in Reinforcement Learning". ICML 2018. [Arxiv]
Vitchyr Pong*, Shixiang Gu*, Murtaza Dalal, Sergey Levine. "Temporal Difference Models: Model-Free Deep RL for Model-Based Control". ICLR 2018. *equal contribution [Paper] [Arxiv] [Blogpost]
Benjamin Eysenbach, Shixiang Gu, Julian Ibarz, Sergey Levine. "Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning". ICLR 2018. [Paper] [Arxiv] [Videos]
Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Bernhard Schölkopf, Sergey Levine. "Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning". NIPS 2017. [Paper]
Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck. "Sequence Tutor: Conservative fine-tuning of sequence generation models with KL-control". ICML 2017. [Paper]
Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Sergey Levine. "Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic". ICLR 2017 [Oral, ~3%]. [Paper]
Eric Jang, Shixiang Gu, Ben Poole. "Categorical Reparameterization with Gumbel-Softmax". ICLR 2017. [Paper]
Shixiang Gu*, Ethan Holly*, Timothy Lillicrap, Sergey Levine. "Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates". ICRA 2017. [Paper] [Google Blogpost] [MIT Technology Review] [ZDNet] [Video] *equal contribution
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine. "Continuous Deep Q-Learning with Model-based Acceleration". ICML 2016. [Paper] [Arxiv]
Shixiang Gu, Sergey Levine, Ilya Sutskever, Andriy Mnih. "MuProp: Unbiased Backpropagation for Stochastic Neural Networks". ICLR 2016. [Paper]
Shixiang Gu, Zoubin Ghahramani, Richard E. Turner. "Neural Adaptive Sequential Monte Carlo". NIPS 2015. [Paper] [Arxiv] [Supplementary]
Nilesh Tripuraneni*, Shixiang Gu*, Hong Ge, Zoubin Ghahramani. "Particle Gibbs for Infinite Hidden Markov Models". NIPS 2015. [Paper] [Arxiv] *equal contribution
Steve Mann, Raymond Chun Hing Lo, Kalin Ovtcharov, Shixiang Gu, David Dai, Calvin Ngan, Tao Ai. "Realtime HDR (High Dynamic Range) Video for EyeTap Wearable Computers, FPGA-Based Seeing Aids, and GlassEyes". IEEE CCECE 2012, Montreal, April 29 – May 2, 2012; indexed in IEEE Xplore. Also presented at the ACM SIGGRAPH 2012 Emerging Technologies Exhibition. [Paper] [BibTex] [Video]
Workshop Papers
Ofir Nachum, Haoran Tang, Xingyu Lu, Shixiang Gu, Honglak Lee, Sergey Levine. "Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?". [Arxiv]
Abhishek Bhatia, Jaan Altosaar, Shixiang Gu. "Proximity-Constrained Reinforcement Learning". Advances in Approximate Bayesian Inference. NIPS 2017. [Paper]
Brandon Amos, Shixiang Gu, J. Zico Kolter. "End-to-End Model-Predictive Control". Deep Reinforcement Learning Symposium, NIPS 2017. [Paper]
Chen Wang, Xiangyu Chen, Zelin Ye, Jialu Wang, Ziruo Cai, Shixiang Gu, Cewu Lu. "TRL: Discriminative Hints for Scalable Reverse Curriculum Learning". Deep Reinforcement Learning Symposium, NIPS 2017. [Paper] [Videos]
Vitchyr Pong, Shixiang Gu, Sergey Levine. "Learning Long-term Dependencies with Deep Memory States". Lifelong Learning: A Reinforcement Learning Approach Workshop, ICML 2017. [Paper]
Shixiang Gu, Luca Rigazio. “Toward Deep Neural Network Architectures Robust to Adversarial Examples”. ICLR 2015 Workshop. [Paper]
Invited Talks
Shixiang Shane Gu. "Continual Learning Machine". SEMICON Japan 2021 (26,626 in-person attendees).
Shixiang Gu. "Mutual Information in Deep RL: Toward a Single Tractable Reward Function for General Intelligence". Qingyuan Salon (青源沙龙) by Beijing Academy of Artificial Intelligence (BAAI), 2021.
Shixiang Gu. "Mutual Information in Deep RL: Toward a Single Tractable Reward Function for General Intelligence". Computational Sensorimotor Learning (CSL) Seminar at MIT, 2021.
Shixiang Gu. "The Future of Robotics". TKS (non-technical talk for 2000+ high school students), 2021. [Video, Website]
Shixiang Gu. "Empowerment as an Intelligence Measure". Re:Work Deep Learning Summit 2020. [Video]
Shixiang Gu. "Intelligence, Robotics, and Predictability Maximization". University of Tokyo, Japan, 2019. [Slides]
Shixiang Gu. "Model-based RL with Predictability Maximization". RIKEN AIP, Japan, 2019; University of Kyoto, Japan, 2019. [Slides]
Shixiang Gu. "Rewards, Resets, Exploration: Bottlenecks in Scaling Deep RL for Robotics". Re:Work Deep RL Summit, San Francisco, USA, 2019. [Slides] "30 Influential AI Presentations from 2019" by Re:Work
Shixiang Gu. “Sample-Efficient Deep RL for Robotics”. Vector Institute, Canada, 2017; CUHK, Hong Kong, China, 2017 (hosted by Xiaoou Tang and Xiaogang Wang); University of Tokyo, Japan, 2016 (hosted by Masashi Sugiyama and Yutaka Matsuo).
Timothy Lillicrap, Shixiang Gu. “Deep RL methods in Robotics”. Reinforcement Learning Forum. Google, USA, 2016.
Shixiang Gu. “Generalized Backprop, Neural Particle Filter, and Guided Q-Learning”. Hosted by Pieter Abbeel. UC Berkeley, USA, 2015.
Shixiang Gu. “Algorithms for Training Deep Stochastic Neural Networks”. Hosted by Noah Goodman. Stanford University, USA, 2015.
Shixiang Gu, Andrey Malinin. “Long Short-Term Memory Networks”. Machine Learning RCC. Cambridge University, UK, 2015.
Academic Activities
Action Editor for Transactions on Machine Learning Research (TMLR).
Area Chair for ICML 2023.
Area Chair for NeurIPS 2021, 2022.
Co-organizer for Ecological Theory of RL Workshop at NeurIPS 2021.
Co-organizer for Exploration in RL Workshop at ICML 2018 and 2019.
Program Committee for ICLR A Roadmap To Never-ending RL workshop 2021.
Senior Program Committee for IJCAI 2021.
Expert Reviewer (ER) for International Conference on Machine Learning (ICML) 2021.
Reviewer for JMLR, 2022.
Program Committee for Learning for Dynamics and Control (L4DC), 2020, 2021.
Program Committee for NeurIPS Deep Reinforcement Learning Symposium, 2017, 2018, 2019.
Reviewer for Nature, 2020.
Reviewer for International Conference on Robotics and Automation (ICRA), 2018, 2022.
Reviewer for Neural Computation (NECO), 2018.
Reviewer for Conference on Robot Learning (CoRL), 2017, 2018, 2019.
Reviewer for AISTATS 2018.
Reviewer for Neural Information Processing Systems (NeurIPS), 2017, 2018, 2019, 2020.
Reviewer for International Conference on Machine Learning (ICML), 2017, 2018, 2019, 2020.
Reviewer for International Conference on Learning Representations (ICLR), 2016, 2017, 2018, 2019.
Reviewer for IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017.
Reviewer for IEEE Conference on Decision and Control (CDC), 2017.
Contact
Email: <my-english-first-name-><my-last-name> at <the-company-i-work-at> dot com