I am a Research Scientist and Manager at Apple ML Research, leading research efforts in Multimodal Learning and Embodied AI.
Prior to that, I was a Research Scientist and Tech Lead at Robotics@Google, where I initiated and led the Robot Navigation effort. Notable achievements include defining and pushing the limits of Semantic Navigation and Social Navigation and, more recently, initiating language + robotics efforts.
Before that, I was fortunate to work with an exceptional group of researchers in defining the first deep learning approaches to fundamental computer vision problems:
human pose estimation: first deep learning based approach, SOTA results over the years
object detection: first deep learning based approach, SOTA results ca. 2015, widely deployed at Google
language and computer vision: co-initiated a stream on language and vision in the computer vision community, one of the first works on neural image captioning.
Symposium on Social Navigation Benchmarking, Feb 2022.
Area Chair, CVPR 2017, 2020, 2023; ECCV 2020, 2022; NIPS 2021, 2023
Program committee, CVPR, ICCV, ECCV, NIPS
Georgia Tech / Google Robotics Workshop, May 2021.
iGibson Sim2Real Challenge, Embodied AI Workshop, CVPR 2020.
Robot Learning Workshop, NSF & Lehigh University, 2019.
Anthony Francis, Claudia Perez-D'Arpino, Chengshu Li, Fei Xia, Alexandre Alahi, Aniket Bera, Abhijat Biswas, Joydeep Biswas, Hao-Tien Lewis Chiang, Michael Everett, Sehoon Ha, Justin Hart, Haresh Karnan, Tsang-Wei Edward Lee, Luis Manso, Reuth Mirsky, Soren Pirk, Phani Teja Singamaneni, Peter Stone, Ada Taylor, Peter Trautman, Nathan Tsoi, Marynel Vazquez, Xuesu Xiao, Peng Xu, Naoki Yokoyama, Roberto Martin-Martin, and Alexander Toshev, Benchmarking Robot Social Navigation across Academia and Industry, Symposium on HRI in Academia and Industry, March, 2023.
Chen Chen, Bowen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Jonathon Shlens, Ruoming Pang, Yinfei Yang, STAIR: Learning Sparse Text and Image Representation in Grounded Tokens, In Submission.
Kanchana Ranasinghe, Brandon McKinzie, Sachin Ravi, Yinfei Yang, Alexander Toshev, Jon Shlens, Perceptual Grouping in Vision-Language Models, ICCV, 2023.
Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind, GAUDI: A Neural Architect for Immersive 3D Scene Generation, NeurIPS, 2022.
M. Deitke, et al., Retrospectives on the Embodied AI Workshop, 2022, Position Paper.
M. Ahn, A. Brohan, N. Brown, Y. Chebotar, O. Cortes, B. David, Ch. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, D. Ho, J. Hsu, J. Ibarz, B. Ichter, A. Irpan, E. Jang, R. Jauregui Ruano, K. Jeffrey, S. Jesmonth, N. J Joshi, R. Julian, D. Kalashnikov, Y. Kuang, K.-H. Lee, S. Levine, Y. Lu, L. Luu, C. Parada, P. Pastor, J. Quiambao, K. Rao, J. Rettinghouse, D. Reyes, P. Sermanet, N. Sievers, Cl. Tan, A. Toshev, V. Vanhoucke, F. Xia, T. Xiao, P. Xu, S. Xu, M. Yan, Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, CoRL, 2022, (oral), Special Innovation Award.
Haresh Karnan, Anirudh Nair, Xuesu Xiao, Garrett Warnell, Soeren Pirk, Alexander Toshev, Justin Hart, Joydeep Biswas, Peter Stone, Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation, IROS, 2022.
Soeren Pirk, Edward Lee, Xuesu Xiao, Anthony Francis, Leila Takayama, Alexander Toshev, A Protocol for Evaluating Social Navigation Policies, ICRA Workshop on Social Robot Navigation: Advances and Evaluation, 2022.
Dhruv Shah, Peng Xu, Yao Lu, Ted Xiao, Alexander Toshev, Sergey Levine, Brian Ichter, Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning, ICLR, 2022.
Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans, ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects, position paper, 2020.
Fei Xia, Chengshu Li, Or Litany, Roberto Martin-Martin, Alexander Toshev, Silvio Savarese, ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation, 2020.
Fei Xia, William Chen, Chengshu Li, Priya Kasimbeg, Micael Tchampi, Alexander Toshev, Roberto Martin-Martin, Silvio Savarese, Interactive Gibson: A Benchmark in Navigation in Cluttered Environments, RA-Letters, 2020.
Kuan Fang, Alexander Toshev, Silvio Savarese, Li Fei-Fei, Scene Memory Transformer for Embodied Agents in Long Horizon Tasks, CVPR 2019.
Ayzaan Wahid, Alexander Toshev, Marek Fiser, Edward Lee, Long Range Neural Navigation Policies for the Real World, IROS 2019.
Arsalan Mousavian, Alexander Toshev, Marek Fiser, Jana Kosecka, James Davidson, Visual Representations for Semantic Target Driven Navigation, ICRA 2019.
Fereshteh Sadeghi, Alexander Toshev, Eric Jang, Sergey Levine, Sim2Real Viewpoint Invariant Visual Servoing by Recurrent Control, CVPR 2018.
Language and Vision
Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge, IEEE Transactions on PAMI, 2017.
Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan L Yuille, Kevin Murphy, Generation and Comprehension of Unambiguous Object Descriptions, CVPR 2016.
Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, Show and tell: A neural image caption generator, CVPR 2015 (oral, 3100+ citations).
Human Pose Estimation
George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy, Towards accurate multi-person pose estimation in the wild, CVPR 2017 (best on human pose estimation on COCO).
Georgia Gkioxari, Alexander Toshev, Navdeep Jaitly, Chained Predictions Using Convolutional Neural Networks, ECCV 2016.
Alexander Toshev, Christian Szegedy, DeepPose: Human Pose Estimation via Deep Neural Networks, CVPR 2014 (oral, 1300+ citations).
Benjamin Sapp, Alexander Toshev, Ben Taskar, Cascaded Models for Articulated Pose Estimation, ECCV 2010.
Etienne Pot, Alexander Toshev, Jana Kosecka, Self-supervisory Signals for Object Discovery and Detection, 2018.
Dumitru Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov, Scalable Object Detection Using Deep Neural Networks, CVPR 2014 (700+ citations).
Christian Szegedy, Alexander Toshev, Dumitru Erhan, Deep Neural Networks for Object Detection, NIPS 2013 (800+ citations).
AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S Ryoo, Evolving Space-Time Neural Architectures for Videos, In Submission, 2019.
Yair Movshovitz-Attias, Alexander Toshev, Thomas K Leung, Sergey Ioffe, Saurabh Singh, No Fuss Distance Metric Learning via Proxies, ICCV 2017.
Jonathan Krause, Benjamin Sapp, Andrew Howard, Howard Zhou, Alexander Toshev, Tom Duerig, James Philbin, Li Fei-Fei, The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition, ECCV 2016.
Yunchao Gong, Yangqing Jia, Thomas Leung, Alexander Toshev, Sergey Ioffe, Deep Convolutional Ranking for Multi-label Image Annotation, ICLR 2013.
Alexander Toshev, Philippos Mordohai, Ben Taskar, Detecting and Parsing Architecture at City Scale from Range Data, CVPR 2010.
Alexander Toshev, Ben Taskar, Kostas Daniilidis, Object Detection via Boundary Structure Segmentation, CVPR 2010.
Alexander Toshev, Ameesh Makadia, Kostas Daniilidis, Shape-based object recognition in videos using 3D synthetic object models, CVPR 2009.
Alexander Toshev, Jianbo Shi, Kostas Daniilidis, Image Matching via Saliency Region Correspondences, CVPR 2007 (oral).
Alexander Toshev, Submodular Function Minimization, University of Pennsylvania, 2010.
Distance Metric Learning Using Proxies, Yair Movshovitz-Attias, Thomas Leung, Sergey Ioffe, Saurabh Singh, Alexander Toshev, US Patent 10,387,749, 2019.
Generating natural language descriptions of images, Samy Bengio, Oriol Vinyals, Alexander Toshev, Dumitru Erhan, US Patent 9,858,524, 2018.
Automatic translation of digital graphic novels, Greg Don Hartrell, Debajit Ghosh, Matthew William Vaughan-Vail, John Michael Rivlin, US Patent 9,881,003, 2018.
Sublinear time classification via feature padding and hashing, Sergey Ioffe, Alexander Toshev, US Patent 9,940,552, 2018.
Ranking approach to train deep neural nets for multilabel image annotation, Yunchao Gong, King Hong Thomas Leung, Alexander Toshev, Sergey Ioffe, US Patent 9,552,549, 2017.
Object detection using deep neural networks, Christian Szegedy, Dumitru Erhan, Alexander Toshev, US Patent 9,275,308, 2016.
System and method for using segmentation to identify object location in images, Vivek Kwatra, Jay Yagnik, Alexander Toshev, US Patent 9,483,701, 2016.
Object recognition, Alexander Toshev, King Hong Thomas Leung, Jiwoong Jack Sim, US Patent 8,942,468, 2015.
Perceptually-driven representation for object recognition, Alexander Toshev, Jay Yagnik, Vivek Kwatra, US Patent 9,008,356, 2015.
Discriminitive learning for object detection, Dragomir Anguelov, Alexander Toshkov Toshev, Deva K Ramanan, Xiangxin Zhu, US Patent 9,098,741, 2015.
System and method for exploiting segment co-occurrence relationships to identify object location in images, Vivek Kwatra, Jay Yagnik, Alexander Toshev, Poonam Suryanarayan, US Patent 8,768,048, 2014.
Segmentation-based feature pooling for object models, Alexander Toshev, Jay Yagnik, Vivek Kwatra, US Patent 8,467,607, 2013.
Kanchana Ranasinghe, Stony Brook, co-advised with Jon Shlens
Dhruv Shah, Student at UC Berkeley, co-advised with Brian Ichter
Fei Xia, Robotics @ Google
Joe Campbell, Postdoc at CMU
Chengshu Li, Student at Stanford University
Kevin Chen, Apple
Fereshteh Sadeghi, DeepMind
Arsalan Mousavian, NVIDIA Robotics, co-advised with Jana Kosecka
Oana-Maria Camburu, Postdoc at Oxford University
Georgia Gkioxari, FAIR, co-advised with Navdeep Jaitly
Andre Araujo, Google, co-advised with Sergey Ioffe
Jonathan Krause, Google, co-advised with Howard Zhou
Kota Yamaguchi, Assist. Prof. at Tohoku University
Ling-Ling Tao, Facebook AI
Yunchao Gong, Verkada
Jack Sim, Waymo, co-advised with Thomas Leung