5th Workshop on

Semantic Policy and Action Representations for Autonomous Robots (SPAR)

September 27, 2021 - Prague, Czech Republic (online)

at IROS 2021


Invited Speakers

All talks are now available on YouTube via the SPAR 2021 talks playlist.

Karthik Desingh

University of Washington

Aleksandra Faust

Google Brain

Nakul Gopalan

Georgia Tech

Gayane Kazhoyan

University of Bremen

George Konidaris

Brown University

Oliver Kroemer

Carnegie Mellon

Tetsuya Ogata

Waseda University

Matteo Saveriano

University of Innsbruck

Zhou Yu

Columbia University

Caelan Garrett

NVIDIA

Kalesha Bullard

Facebook AI Research

Information about the talks from the invited speakers

Aleksandra Faust

Google Brain Research

Learning to Learn for RL

Training reinforcement learning agents to perform complex tasks in real-world environments is a difficult process that requires heavy engineering. In fact, we can formulate the interaction between the human engineer and the RL agent under training as a decision-making process performed by the human, and consequently automate RL training by learning that decision-making policy. In this talk we will cover several examples that illustrate the process: learning intrinsic rewards, learning RL loss functions, neural network architecture search, curricula for continual learning, and even learning accelerator parameters. We show that, across different applications, learning-to-learn methods improve RL agents' generalization and performance.
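As a rough, hypothetical illustration of this framing (not the specific systems from the talk), the sketch below casts the engineer's training decisions as an outer-loop search: an outer agent proposes hyperparameters for an inner RL run and is rewarded by the resulting agent's performance. The stubbed training function, the two decision variables, and the random-search outer loop are all assumptions for illustration only.

```python
import random

def train_rl_agent(config):
    """Inner loop: train an RL agent with the proposed config and return
    its final return. Stubbed here with a synthetic score for illustration."""
    lr, intrinsic_w = config["lr"], config["intrinsic_weight"]
    return -abs(lr - 3e-4) * 1e3 - abs(intrinsic_w - 0.1) + random.gauss(0, 0.05)

def outer_loop(num_trials=20):
    """Outer loop: the 'engineer-as-agent' proposes training decisions
    (here, hyperparameters) and is rewarded by inner-loop performance."""
    best_config, best_score = None, float("-inf")
    for _ in range(num_trials):
        config = {
            "lr": 10 ** random.uniform(-5, -2),        # decision 1: learning rate
            "intrinsic_weight": random.uniform(0, 1),  # decision 2: intrinsic reward weight
        }
        score = train_rl_agent(config)  # outer reward = inner agent's performance
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

if __name__ == "__main__":
    config, score = outer_loop()
    print("best config:", config, "score:", round(score, 3))
```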

Biography


Aleksandra Faust is a Staff Research Scientist and Reinforcement Learning research team co-founder at Google Brain Research. Previously, Aleksandra founded and led Task and Motion Planning research in Robotics at Google, led machine learning for self-driving car planning and controls at Waymo, and was a senior researcher at Sandia National Laboratories. She earned a Ph.D. in Computer Science at the University of New Mexico (with distinction) and a Master's in Computer Science from the University of Illinois at Urbana-Champaign. Her research interests include safe and scalable reinforcement learning, learning to learn, motion planning, decision-making, and robot behavior. Aleksandra won the IEEE RAS Early Career Award for Industry and the Tom L. Popejoy Award for the best doctoral dissertation at the University of New Mexico in the period 2011-2014, and was named a Distinguished Alumna by the University of New Mexico School of Engineering. Her work has been featured in the New York Times, PC Magazine, ZDNet, and VentureBeat, and was awarded Best Paper in Service Robotics at ICRA 2018, Best Paper in Reinforcement Learning for Real Life (RL4RL) at ICML 2019, and Best Paper of IEEE Computer Architecture Letters in 2020.

Nakul Gopalan

Georgia Tech

Learning Transferable Symbols and Language Groundings from Perceptual Data for Instruction Following

A collaborative robot should be able to learn novel task specifications from its users to be a general-purpose, programmable device. To learn novel tasks from people, we must enable robots to learn 1) knowledge representations that can be leveraged for efficient planning and skill learning, and 2) mechanisms for natural language communication that enable the robot to understand a human partner's intent. In this work, I solve both of these problems. I show how representations for planning and language grounding can be learned together to follow commands in novel environments. This approach provides a framework for teaching robots unstructured tasks via language, enabling the deployment of cooperative robots in homes, offices and industries.

Biography


Nakul Gopalan is a postdoctoral researcher in the CORE Robotics Lab with Prof. Matthew Gombolay at Georgia Tech. He completed his PhD at Brown University's Computer Science department in 2019. Previously he was a graduate student in Prof. Stefanie Tellex's H2R lab at Brown. His research interests lie at the intersection of language grounding and robot learning. Nakul has developed algorithms and methods that allow robots to be trained by leveraging demonstrations and natural language descriptions. Such learning would improve the usability of robots within homes and offices. His other research interests are in hierarchical reinforcement learning and planning. His work has received a best paper award at the RoboNLP workshop at ACL 2017.

George Konidaris

Brown University

Signal to Symbol (via Skills)

I will address the question of how a robot should learn an abstract, task-specific representation of an environment. I will present a constructivist approach in which the computation the representation is required to support (here, planning using a given set of motor skills) is precisely defined, and its properties are then used to build a representation that is capable of supporting that computation by construction. The result is a formal link between the skills available to a robot and the symbols it should use to plan with them. I will present an example of a robot autonomously learning a (sound and complete) abstract representation directly from sensorimotor data, and then using it to plan. I will also discuss ongoing work on making the resulting abstractions portable across tasks.
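As a loose, illustrative sketch of the skills-to-symbols idea (greatly simplified relative to the formalism in the talk), the snippet below grounds a propositional precondition symbol for a skill from sensorimotor states in which the skill could be executed; the bounding-box representation of the symbol, the "open-door" skill, and the data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

State = Tuple[float, ...]  # low-level sensorimotor state vector

@dataclass
class Symbol:
    """A learned propositional symbol: a set of states, represented
    here (very coarsely) as an axis-aligned bounding box."""
    lo: State
    hi: State

    def holds(self, s: State) -> bool:
        return all(l <= x <= h for x, l, h in zip(s, self.lo, self.hi))

def ground_precondition(success_states: List[State]) -> Symbol:
    """Ground a precondition symbol for a skill from states in which
    executing the skill succeeded (its initiation set)."""
    dims = range(len(success_states[0]))
    lo = tuple(min(s[d] for s in success_states) for d in dims)
    hi = tuple(max(s[d] for s in success_states) for d in dims)
    return Symbol(lo, hi)

# Hypothetical data: states (gripper height, distance to handle) in which
# an "open-door" skill could be executed successfully.
can_open_door = ground_precondition([(0.9, 0.10), (1.0, 0.05), (0.95, 0.12)])
print(can_open_door.holds((0.97, 0.08)))  # True: a planner may apply the skill here
print(can_open_door.holds((0.50, 0.40)))  # False: precondition not satisfied
```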

Biography


George Konidaris is the John E. Savage Assistant Professor of Computer Science at Brown University and the Chief Roboticist of Realtime Robotics, a startup commercializing his work on hardware-accelerated motion planning. He holds a BScHons from the University of the Witwatersrand, an MSc from the University of Edinburgh, and a PhD from the University of Massachusetts Amherst. Prior to joining Brown, he held a faculty position at Duke and was a postdoctoral researcher at MIT. George is a recent recipient of an NSF CAREER award, young faculty awards from DARPA and the AFOSR, and the IJCAI-JAIR Best Paper Prize.

Tetsuya Ogata

Waseda University

Toward Embodied Intelligence with Predictive Learning – From Data to Experiences

In order to adapt to the complex real world, it is essential not only to acquire an optimal behavior policy through machine learning, but also to adjust the behavior itself in real time based on the principle of prediction-error minimization, grounded in experience, i.e., the interaction between the body and the environment. In this talk, I will give an overview of deep predictive learning (DPL), which we proposed to realize such "embodied intelligence". I will also introduce examples of our work with several companies using DPL, our latest research results on tool use and flexible-object handling, and an overview of our proposal AIREC (AI-driven Robot for Embrace and Care) within the "Moonshot" program, a large-scale R&D program in Japan.

Biography


Tetsuya Ogata received the B.S., M.S., and D.E. degrees in mechanical engineering from Waseda University in 1993, 1995, and 2000, respectively. He was a Research Associate with Waseda University from 1999 to 2001. From 2001 to 2003, he was a Research Scientist with the RIKEN Brain Science Institute. From 2003 to 2012, he was an Associate Professor with the Graduate School of Informatics, Kyoto University. Since 2012, he has been a Professor with the Faculty of Science and Engineering, Waseda University. Since 2017, he has also been a Joint-Appointed Research Fellow with the Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology. Since 2020, he has been the director of the Institute of AI and Robots, Waseda University. His current research interests include human-robot interaction, dynamics of human-robot mutual adaptation, and inter-sensory translation in robot systems with neuro-dynamical models.

Gayane Kazhoyan

University of Bremen

Semantic Representations for Scalability and Transferability to Novel Domains in Mobile Manipulation

This talk presents an approach to plan, parameterize and execute actions on mobile manipulation robots. The approach utilizes semantic representations to allow the system to scale to large execution domains and enable transfer to novel domains. The examined domains have four main dimensions of variation: (1) the types of the manipulated objects, (2) the configurations of the robot's environment, (3) the specifics of the robot's hardware, and (4) the application-specific requirements. One of the core concepts of the proposed approach is scalable hierarchical models of robot actions and their implementation as generalized reactive plans. The plans are implemented using the operators of the "robot programming language" CPL, developed specifically for writing robot action plans. In order to generalize the action plans over multiple objects, environments, robot platforms and applications, the concept of symbolic action descriptions is proposed. These are underspecified descriptions of an action that are augmented during execution with subsymbolic parameter values specific to the context at hand. The proposed approach is evaluated on multiple physical and simulated robots. The demonstration applications involve variations of mobile pick-and-place actions and opening and closing doors and drawers in the robot's environment.
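CPL itself is a Lisp-based plan language; purely as an illustration of the idea of an underspecified symbolic action description that is augmented with subsymbolic parameter values at execution time, here is a hypothetical Python sketch. The object names, the resolution heuristics, and the execution stub are assumptions for illustration and not the actual CRAM/CPL API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
import random

Pose = Tuple[float, float, float]

@dataclass
class ActionDescription:
    """An underspecified, symbolic description of an action. Missing
    (subsymbolic) parameters are filled in at execution time."""
    action_type: str                  # e.g. "fetching"
    object_name: str                  # e.g. "cup-1"
    arm: Optional[str] = None         # resolved from the current context
    grasp_pose: Optional[Pose] = None

def resolve(desc: ActionDescription, scene: dict) -> ActionDescription:
    """Augment the description with context-specific parameter values."""
    obj_pose = scene[desc.object_name]
    # Hypothetical heuristics standing in for real reasoning and sampling:
    desc.arm = "left" if obj_pose[1] > 0 else "right"
    desc.grasp_pose = (obj_pose[0], obj_pose[1], obj_pose[2] + 0.05)
    return desc

def execute(desc: ActionDescription) -> None:
    print(f"{desc.action_type} {desc.object_name} with {desc.arm} arm "
          f"at grasp pose {desc.grasp_pose}")

# The same symbolic plan runs in different contexts; the subsymbolic
# values are resolved anew for each scene.
scene = {"cup-1": (0.6, random.choice([-0.3, 0.3]), 0.75)}
execute(resolve(ActionDescription("fetching", "cup-1"), scene))
```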

Biography


Gayane (Gaya for short) is a PhD student supervised by Michael Beetz at the University of Bremen and currently an intern at Intrinsic Innovation, the newest Google X robotics spin-off. Her expertise is in plan executives and reactive planning for mobile manipulation, mostly in the household domain. In summer 2019 she was a visiting student in the LIS group at MIT, led by Leslie Kaelbling and Tomás Lozano-Pérez, where she worked on integrating TAMP with reactive planning. Before starting her PhD, she worked for a year at Franka Emika. She received her Master's degree, majoring in Robotics and AI, from the Technical University of Munich.

Karthik Desingh

University of Washington

Learning Object-centric Representations for Robot Manipulation Tasks

A crucial question for complex multi-step robotic tasks is how to represent relationships between entities in the world, particularly as they pertain to preconditions for the various skills the robot might employ. In goal-directed sequential manipulation tasks with long-horizon planning, it is common to use a state estimator followed by a task and motion planner or another model-based system. A variety of powerful approaches exist for explicitly estimating the state of objects in the world. However, it is challenging to generalize these approaches to an arbitrary collection of objects. In addition, objects are often in contact in manipulation scenarios, where explicit state estimation struggles to generalize to unseen objects.

Fortunately, knowing exact poses of objects may not be necessary for manipulation. End-to-end methods leverage that fact and build networks that generate actions directly without explicitly representing objects. Nevertheless, these networks are very specific to the tasks they are trained on. For example, it is non-trivial to use a network trained on stacking blocks to unstack blocks.

In this talk, I will present our recent work, which takes an important step towards a manipulation framework that generalizes few-shot to unseen tasks with unseen objects. Specifically, we propose a neural network that extracts implicit object embeddings directly from raw RGB images. Trained on large amounts of simulated robotic manipulation data, the object-centric embeddings produced by our network can be used to predict spatial relationships between the entities in the scene, providing a task and motion planner with relevant implicit state information for goal-directed sequential manipulation tasks.
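Purely as an illustration of how object-centric embeddings could feed relational predicates to a planner (not the network proposed in this work), here is a minimal PyTorch-style sketch. The embedding dimension, relation vocabulary, and architecture are assumptions, and the RGB-to-embedding encoder is stubbed out with random vectors.

```python
import torch
import torch.nn as nn

class RelationPredictor(nn.Module):
    """Scores pairwise spatial relations (e.g. left-of, on-top-of) from
    implicit per-object embeddings; the encoder that produces the
    embeddings from raw RGB is stubbed out below."""
    def __init__(self, embed_dim=64, num_relations=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_relations),
        )

    def forward(self, emb_a, emb_b):
        # Concatenate the two object embeddings and score each relation.
        return self.mlp(torch.cat([emb_a, emb_b], dim=-1))

# Hypothetical embeddings for two objects in a scene (batch of 1):
emb_cup, emb_plate = torch.randn(1, 64), torch.randn(1, 64)
logits = RelationPredictor()(emb_cup, emb_plate)
relations = ["left_of", "right_of", "on_top_of", "inside"]
# Per-relation scores that a task and motion planner could consume as
# implicit state information.
print(dict(zip(relations, logits.sigmoid().squeeze(0).tolist())))
```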

Biography


Karthik Desingh is a Postdoctoral Scholar at the University of Washington (UW), working with Professor Dieter Fox. Before joining UW, he received his Ph.D. in Computer Science and Engineering from the University of Michigan, working with Professor Chad Jenkins. During his Ph.D. he was closely associated with the Robotics Institute and Michigan AI. He earned his B.E. in Electronics and Communication Engineering at Osmania University, India, and his M.S. in Computer Science at IIIT-Hyderabad and Brown University. His research lies at the intersection of robotics, computer vision, and machine learning, and focuses primarily on providing perceptual capabilities to robots, using deep learning and probabilistic techniques, so that they can perform goal-directed tasks in unstructured environments.

Zhou Yu

Columbia University

Teaching Machines through Natural Language Interactions

Humans routinely learn new concepts through natural language communication. Learning to ask good questions is a key step towards effective learning. Can machines do the same? In this talk, we will discuss how a machine can learn to ask good natural language questions and dynamically plan which questions to ask next in order to learn more effectively in low-resource learning settings. We will use a fine-grained classification task and a simulated robotics task as our applications.

Biography


Zhou Yu joined the CS department at Columbia University in January 2021 as an Assistant Professor. Before that, she was an Assistant Professor at UC Davis. She obtained her Ph.D. from Carnegie Mellon University in 2017. Zhou has built various dialog systems with real-world impact, such as a job interview training system, a depression screening system, and a second language learning system. Her research interests include dialog systems, language understanding and generation, vision and language, human-computer interaction, and social robots. Zhou received an ACL 2019 best paper nomination, was featured in Forbes' 2018 30 Under 30 in Science, and won the 2018 Amazon Alexa Prize.

Caelan Garrett

NVIDIA

Task and Motion Planning using Mixed Discrete, Continuous, Probabilistic, and Learned Representations

We seek to program a robot to autonomously complete complex tasks in a variety of real-world settings involving different environments, objects, manipulation skills, degrees of observability, initial states, and goal objectives. In order to generalize successfully across these settings, we take a model-based approach to building the robot's policy, which enables it to reason about the effects of executing different sequences of parameterized manipulation skills. Specifically, we introduce a general-purpose hybrid planning framework that uses streams, modules that encode sampling procedures, to generate continuous parameter-value candidates. We present several domain-independent algorithms that efficiently combine streams in order to solve for parameter values that jointly satisfy the constraints necessary for a sequence of skills to achieve the goal. Each stream can be either engineered to perform a standard robotics subroutine, such as inverse kinematics or collision checking, or learned from data to capture difficult-to-model behaviors, such as pouring, scooping, and grasping. Streams are also able to represent probabilistic inference operations, which enables our framework to plan in belief space and intentionally select actions that reduce the robot's uncertainty about the unknown world. We demonstrate the generality of our approach by applying it to several real-world tabletop, kitchen, and construction tasks and show that it can even be effective in settings involving objects that the robot has never seen before.
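Purely as an illustration of the stream concept (a module encoding a sampling procedure, whose outputs are combined with others to satisfy a skill's constraints), here is a hypothetical Python sketch; the grasp sampler, the stand-in inverse-kinematics check, and the naive combination loop are illustrative assumptions rather than the framework's actual API.

```python
import random
from itertools import islice

def grasp_stream(obj):
    """Stream: yields candidate grasp angles for an object."""
    while True:
        yield {"obj": obj, "grasp": random.uniform(0, 6.283)}

def ik_stream(grasp, reachable=lambda g: g["grasp"] < 3.0):
    """Stream: yields arm configurations that realize a grasp, if any
    exist (a toy stand-in for an inverse-kinematics solver)."""
    if reachable(grasp):
        yield {"grasp": grasp,
               "q": [round(random.uniform(-1, 1), 2) for _ in range(7)]}

def plan_pick(obj, max_candidates=20):
    """Naively compose streams: search for parameter values that jointly
    satisfy the constraints of a 'pick' skill."""
    for grasp in islice(grasp_stream(obj), max_candidates):
        for sol in ik_stream(grasp):
            # Found a grasp with a reachable arm configuration.
            return {"skill": "pick", **sol}
    return None  # no satisfying parameter values within the budget

print(plan_pick("mug"))
```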


Biography

Caelan Garrett is a research scientist at NVIDIA's Seattle Robotics Lab, which is led by Professor Dieter Fox. He received his PhD at MIT in the Learning and Intelligent Systems group within CSAIL, where he was advised by Professors Tomás Lozano-Pérez and Leslie Pack Kaelbling. His research is on integrating robot motion planning, discrete AI planning, and machine learning to flexibly and efficiently plan for autonomous mobile manipulators operating in human environments. He recently authored the first survey paper on integrated task and motion planning. He is a recipient of the NSF Graduate Research Fellowship. He previously interned in the autonomous vehicle industry at Optimus Ride and in the autonomous fulfillment industry at Amazon Robotics.


Oliver Kroemer

Carnegie Mellon

Learning to Structure Manipulation Skills

In the future, we want to create robots with the robustness and versatility to operate in unstructured and everyday environments. To achieve this goal, robots will need to learn manipulation skills that can be applied to a wide range of objects and task scenarios. In this talk, I will be presenting recent work from my lab on structuring manipulation tasks for more efficient learning. I will discuss how modularity can be used to break down challenging manipulation tasks to learn general object-centric solutions.

Biography

Oliver Kroemer received the bachelor's and master's degrees in engineering from the University of Cambridge, Cambridge, U.K., in 2008, and the Ph.D. degree in computer science from the Technische Universitaet Darmstadt, Darmstadt, Germany, in 2014. He was a Postdoctoral Researcher at the University of Southern California (USC), Los Angeles, CA, USA, for two and a half years. He is currently an Assistant Professor at the Robotics Institute, Carnegie Mellon University (CMU), Pittsburgh, PA, USA, where he leads the Intelligent Autonomous Manipulation Lab. His research focuses on developing algorithms and representations to enable robots to learn versatile and robust manipulation skills.


Matteo Saveriano

University of Innsbruck

Hierarchical Action Decomposition and Motion Learning for the Execution of Manipulation Tasks

The execution of robotic manipulation tasks requires sophisticated task and motion planning. In this domain, the problem of generating physically feasible plans arises. This problem has typically been addressed in the robotics community by exploiting geometric reasoning and intensive, physics-based simulation. In this talk, I present recent work that tackles this problem.


An object-centered description of geometric constraints is used for task planning, which allows the system to generate physically plausible plans in changing domains. Action grounding is implemented using a task and motion planning approach that hierarchically decomposes symbolic actions into executable robot commands. The talk describes the developed approach and shows promising results on complex manipulation tasks.

Kalesha Bullard

Facebook AI Research

Towards Zero-Shot Emergent Communication for Embodied Agents

Effective communication is an important skill for enabling information exchange and cooperation in multi-agent settings, in which AI agents coexist in shared environments with other agents (artificial or human). Indeed, emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels. One limitation of this setting, however, is that it does not allow the emergent protocols to generalize beyond the training partners. Furthermore, the typical problem setting of discrete cheap-talk channels may be less appropriate for embodied agents that communicate implicitly through physical action. This talk presents research that investigates methods for enabling AI agents to learn general communication skills through interaction with other artificial agents. In particular, the talk will focus on my ongoing work within multi-agent reinforcement learning, investigating emergent communication protocols inspired by communication in more realistic settings. We present a novel problem setting and a general approach that allows for zero-shot communication (ZSC), i.e., the emergence of communication protocols that can generalize to independently trained agents. We also explore and analyze specific difficulties associated with finding globally optimal ZSC protocols as the complexity of the communication task increases or the modality for communication changes (e.g., from symbolic communication to implicit communication through physical movement by an embodied artificial agent). Overall, this work opens up exciting avenues for learning general communication protocols in more complex domains.

Biography

Kalesha Bullard recently completed a Postdoctoral Fellowship at Facebook AI Research and will soon begin as a Research Scientist on the Multi-Agent team at DeepMind. Her research is broadly in the space of multi-agent artificial intelligence, focusing on principled methods for interactive and reinforcement learning for artificial agents in cooperative multi-agent settings. Over the course of her career, Kalesha's work has enabled learning in shared environments with both human partners (PhD) and other artificial agents (postdoc). Kalesha received her PhD in Computer Science from the Georgia Institute of Technology in 2019; her doctoral research was in interactive robot learning and focused on active learning with human teachers. Beyond research, Kalesha has taken on a number of service roles throughout her research career: she is currently serving as a Program Chair for the 2021 NeurIPS Workshop on Cooperative AI. She also recently served as an organizing committee member for the 2020 NeurIPS Workshop on Zero-Shot Emergent Communication, a Program Committee member for the 2020 NeurIPS Cooperative AI Workshop, and an Area Chair for the 2019 NeurIPS Women in Machine Learning Workshop. This past year, Kalesha was selected as one of the 2020 Electrical Engineering and Computer Science (EECS) Rising Stars.