Alexander Toshev

Research Scientist / Tech Lead

Robotics @ Google

toshev at google / alex.t.toshev at gmail

Google AI:


Google Scholar:


My work lies at the intersection of AI, Computer Vision and Robotics.

Most recently, I have been working on building perceptual capabilities for autonomous agents, initiating and leading an effort on Semantic Visual Navigation, as part of the robotics research effort at Google.

Prior to that, I worked extensively on a wide range of computer vision problems. Notable achievements:

  • human pose estimation: first deep-learning-based approach, SOTA results over the years

  • object detection: first deep-learning-based approach, SOTA results ca. 2015, widely deployed at Google

  • language and computer vision: co-initiated a stream on language and vision in the computer vision community, with one of the first works on neural image captioning.


CVPR'20, CVPR'21 Workshop on Embodied AI

CVPR'19 Workshop on Deep Learning for Semantic Visual Navigation

Area Chair, CVPR 2017, CVPR 2020, ECCV 2020

Program committee, CVPR, ICCV, ECCV, NIPS

Recent Talks

iGibson Sim2Real Challenge, Embodied AI Workshop, CVPR 2020.

Robot Learning Workshop, NSF & Lehigh University, 2019.



Ayzaan Wahid, Austin Stone, Kevin Chen, Brian Ichter, Alexander Toshev, Learning Object-conditioned Exploration using Distributed Soft Actor Critic, CoRL 2020.

Fei Xia, Chengshu Li, Or Litany, Roberto Martin-Martin, Alexander Toshev, Silvio Savarese, ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation, 2020.

Sören Pirk, Karol Hausman, Alexander Toshev, Mohi Khansari, Modeling Long-horizon Tasks as Sequential Interaction Landscapes, CoRL 2020.

Fei Xia, William Chen, Chengshu Li, Priya Kasimbeg, Micael Tchampi, Alexander Toshev, Roberto Martin-Martin, Silvio Savarese, Interactive Gibson: A Benchmark in Navigation in Cluttered Environments, RA-Letters, 2020.

Kuan Fang, Alexander Toshev, Silvio Savarese, Li Fei-Fei, Scene Memory Transformer for Embodied Agents in Long Horizon Tasks, CVPR 2019.

Ayzaan Wahid, Alexander Toshev, Marek Fiser, Edward Lee, Long Range Neural Navigation Policies for the Real World, IROS 2019.

Arsalan Mousavian, Alexander Toshev, Marek Fiser, Jana Kosecka, James Davidson, Visual Representations for Semantic Target Driven Navigation, ICRA 2019.

Fereshteh Sadeghi, Alexander Toshev, Eric Jang, Sergey Levine, Sim2Real Viewpoint Invariant Visual Servoing by Recurrent Control, CVPR 2018.

Language and Vision

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge, IEEE Transactions on PAMI, 2017.

Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan L Yuille, Kevin Murphy, Generation and Comprehension of Unambiguous Object Descriptions, CVPR 2016.

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, Show and tell: A neural image caption generator, CVPR 2015 (oral, 3100+ citations).

Human Pose Estimation

AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S. Ryoo, Adversarial Generative Grammars for Human Activity Prediction, ECCV 2020.

George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy, Towards accurate multi-person pose estimation in the wild, CVPR 2017 (best result on COCO human pose estimation).

Georgia Gkioxari, Alexander Toshev, Navdeep Jaitly, Chained Predictions Using Convolutional Neural Networks, ECCV 2016.

Alexander Toshev, Christian Szegedy, DeepPose: Human Pose Estimation via Deep Neural Networks, CVPR 2014 (oral, 1300+ citations).

Benjamin Sapp, Alexander Toshev, Ben Taskar, Cascaded Models for Articulated Pose Estimation, ECCV 2010.

Object Detection

Etienne Pot, Alexander Toshev, Jana Kosecka, Self-supervisory Signals for Object Discovery and Detection, 2018.

Dumitru Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov, Scalable Object Detection Using Deep Neural Networks, CVPR 2014 (700+ citations).

Christian Szegedy, Alexander Toshev, Dumitru Erhan, Deep Neural Networks for Object Detection, NIPS 2013 (800+ citations).


AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S Ryoo, Evolving Space-Time Neural Architectures for Videos, In Submission, 2019.

Yair Movshovitz-Attias, Alexander Toshev, Thomas K Leung, Sergey Ioffe, Saurabh Singh, No Fuss Distance Metric Learning via Proxies, ICCV 2017.

Jonathan Krause, Benjamin Sapp, Andrew Howard, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei, The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition, ECCV 2016.

Yunchao Gong, Yangqing Jia, Thomas Leung, Alexander Toshev, Sergey Ioffe, Deep Convolutional Ranking for Multi-label Image Annotation, ICLR 2013.

Alexander Toshev, Philippos Mordohai, Ben Taskar, Detecting and Parsing Architecture at City Scale from Range Data, CVPR 2010.

Alexander Toshev, Ben Taskar, Kostas Daniilidis, Object Detection via Boundary Structure Segmentation, CVPR 2010.

Alexander Toshev, Ameesh Makadia, Kostas Daniilidis, Shape-based object recognition in videos using 3D synthetic object models, CVPR 2009.

Alexander Toshev, Jianbo Shi, Kostas Daniilidis, Image Matching via Saliency Region Correspondences, CVPR 2007 (oral).

Alexander Toshev, Submodular Function Minimization, University of Pennsylvania, 2010.


Generating natural language descriptions of images, Samy Bengio, Oriol Vinyals, Alexander Toshev, Dumitru Erhan, US Patent 9,858,524, 2018.

Automatic translation of digital graphic novels, Greg Don Hartrell, Debajit Ghosh, Matthew William Vaughan-Vail, John Michael Rivlin, US Patent 9,881,003, 2018.

Sublinear time classification via feature padding and hashing, Sergey Ioffe, Alexander Toshev, US Patent 9,940,552, 2018.

Ranking approach to train deep neural nets for multilabel image annotation, Yunchao Gong, King Hong Thomas Leung, Alexander Toshev, Sergey Ioffe, US Patent 9,552,549, 2017.

Object detection using deep neural networks, Christian Szegedy, Dumitru Erhan, Alexander Toshev, US Patent 9,275,308, 2016.

System and method for using segmentation to identify object location in images, Vivek Kwatra, Jay Yagnik, Alexander Toshev, US Patent 9,483,701, 2016.

Object recognition, Alexander Toshev, King Hong Thomas Leung, Jiwoong Jack Sim, US Patent 8,942,468, 2015.

Perceptually-driven representation for object recognition, Alexander Toshev, Jay Yagnik, Vivek Kwatra, US Patent 9,008,356, 2015.

Discriminitive learning for object detection, Dragomir Anguelov, Alexander Toshkov Toshev, Deva K Ramanan, Xiangxin Zhu, US Patent 9,098,741, 2015.

System and method for exploiting segment co-occurrence relationships to identify object location in images, Vivek Kwatra, Jay Yagnik, Alexander Toshev, Poonam Suryanarayan, US Patent 8,768,048, 2014.

Segmentation-based feature pooling for object models, Alexander Toshev, Jay Yagnik, Vivek Kwatra, US Patent 8,467,607, 2013.