All talks are now available on YouTube via the SPAR 2021 talks playlist.
DeepMind
Univ of Washington
Google Brain
NVIDIA
Georgia Tech
University of Bremen
Brown University
Carnegie Mellon
Training reinforcement learning agents to perform complex tasks in real-world environments is a difficult process that requires heavy engineering. In fact, we can formulate the interaction between the human engineer and the RL agent under training as a decision-making process performed by the human, and consequently automate RL training by learning that decision-making policy. In this talk we will cover several examples that illustrate the process: learning intrinsic rewards, RL loss functions, neural network architecture search, curricula for continual learning, and even accelerator parameters. We show that, across different applications, learning-to-learn methods improve RL agents' generalization and performance.
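To make the "training as decision-making" framing concrete, here is a minimal, hypothetical sketch in which a bandit-style meta-controller chooses one training decision (here, an intrinsic-reward coefficient) for each phase of RL training and is scored by the resulting evaluation return; the training function and coefficient set are invented stand-ins, not the speaker's actual system.

```python
# Toy sketch: treat a training decision as an arm of a bandit (meta-controller).
import numpy as np

rng = np.random.default_rng(0)
CANDIDATE_COEFS = [0.0, 0.01, 0.1, 1.0]      # hypothetical intrinsic-reward weights

def train_one_phase(coef, phase):
    """Placeholder for one phase of RL training; returns an evaluation return."""
    # Toy simulation: pretend a moderate coefficient works best.
    return -abs(coef - 0.1) + rng.normal(0, 0.02)

counts = np.zeros(len(CANDIDATE_COEFS))
values = np.zeros(len(CANDIDATE_COEFS))      # running mean return per decision

for phase in range(50):
    if phase < len(CANDIDATE_COEFS):         # try every decision once
        arm = phase
    else:                                    # then select by a UCB1 rule
        arm = int(np.argmax(values + np.sqrt(2 * np.log(phase) / counts)))

    meta_reward = train_one_phase(CANDIDATE_COEFS[arm], phase)
    counts[arm] += 1
    values[arm] += (meta_reward - values[arm]) / counts[arm]

print("estimated best intrinsic-reward weight:", CANDIDATE_COEFS[int(np.argmax(values))])
```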
A collaborative robot should be able to learn novel task specifications from its users in order to be a general-purpose, programmable device. To learn novel tasks from people, we must enable robots to learn 1) knowledge representations that can be leveraged for efficient planning and skill learning; and 2) mechanisms for natural language communication that enable the robot to understand a human partner's intent. In this work, I solve both of these problems. I show how representations for planning and language grounding can be learned together to follow commands in novel environments. This approach provides a framework for teaching robots unstructured tasks via language, enabling the deployment of cooperative robots in homes, offices, and industrial settings.
I will address the question of how a robot should learn an abstract, task-specific representation of an environment. I will present a constructivist approach: the computation the representation is required to support (here, planning with a given set of motor skills) is precisely defined, and its properties are then used to build a representation that is capable of supporting that computation by construction. The result is a formal link between the skills available to a robot and the symbols it should use to plan with them. I will present an example of a robot autonomously learning a (sound and complete) abstract representation directly from sensorimotor data, and then using it to plan. I will also discuss ongoing work on making the resulting abstractions portable across tasks.
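As a toy illustration of how symbols can follow from skills by construction (a drastic simplification of the approach described in the talk), the sketch below characterizes each skill by its initiation set and image over a small discrete state space and uses those sets directly as planning symbols; the skills and states are invented.

```python
# Toy sketch: the sets that characterize the skills are the symbols needed to plan.
SKILLS = {
    # hypothetical skills over abstract states 0..3
    "walk_to_door": {"initiation": {0, 1}, "image": {2}},
    "open_door":    {"initiation": {2},    "image": {3}},
}

def plan_is_feasible(start_states, plan):
    """Sound-by-construction check: at each step the current symbol must lie
    inside the skill's initiation symbol; the image symbol becomes the new
    current symbol."""
    current = set(start_states)
    for skill in plan:
        spec = SKILLS[skill]
        if not current <= spec["initiation"]:
            return False
        current = set(spec["image"])
    return True

print(plan_is_feasible({0}, ["walk_to_door", "open_door"]))  # True
print(plan_is_feasible({3}, ["open_door"]))                  # False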
Waseda University
Adapting to the complex real world requires not only acquiring an optimal behavior policy through machine learning, but also adjusting behavior in real time according to the principle of prediction-error minimization, grounded in experience gained through interaction between the body and the environment. In this talk, I will give an overview of deep predictive learning (DPL), which we have proposed to realize such "embodied intelligence". I will also present examples of our work with several companies using DPL, our latest research results on tool use and flexible-object handling, and an overview of our proposed AIREC (AI-driven Robot for Embrace and Care) within the "Moonshot", a large-scale R&D program in Japan.
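The sketch below conveys the general flavor of predictive learning for robot control, assuming a simple recurrent model trained to minimize one-step sensorimotor prediction error and then used online to generate the next motor command; it is an illustrative simplification, not the authors' actual DPL architecture.

```python
# Minimal sketch of prediction-error-driven robot control (illustrative only).
import torch
import torch.nn as nn

SENSOR_DIM, JOINT_DIM, HIDDEN = 16, 7, 64

class PredictiveModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTMCell(SENSOR_DIM + JOINT_DIM, HIDDEN)
        self.head = nn.Linear(HIDDEN, SENSOR_DIM + JOINT_DIM)

    def forward(self, x, state):
        h, c = self.rnn(x, state)
        return self.head(h), (h, c)

model = PredictiveModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training: minimize one-step prediction error on recorded robot experience
# (random tensors stand in for a real teleoperation dataset).
demo = torch.randn(50, 1, SENSOR_DIM + JOINT_DIM)
state, loss = None, 0.0
for t in range(49):
    pred, state = model(demo[t], state)
    loss = loss + nn.functional.mse_loss(pred, demo[t + 1])
opt.zero_grad()
loss.backward()
opt.step()

# Execution: the predicted next joint angles serve as the motor command, so the
# behavior is adjusted online as predictions and observations interact.
with torch.no_grad():
    obs = torch.randn(1, SENSOR_DIM + JOINT_DIM)
    pred, _ = model(obs, None)
    next_joint_command = pred[:, SENSOR_DIM:]
```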
University of Bremen
This talk presents an approach to plan, parametrize, and execute actions on mobile manipulation robots. The approach utilizes semantic representations to allow the system to scale to large execution domains and to enable transfer to novel domains. The examined domains vary along four main dimensions: (1) the types of the manipulated objects, (2) the configurations of the robot's environment, (3) the specifics of the robot's hardware, and (4) the application-specific requirements. One of the core concepts of the proposed approach is scalable hierarchical models of robot actions and their implementation as generalized reactive plans. The plans are implemented using the operators of the "robot programming language" CPL, developed specifically for writing robot action plans. In order to generalize the action plans over multiple objects, environments, robot platforms, and applications, the concept of symbolic action descriptions is proposed: these are underspecified descriptions of an action that are augmented during execution with subsymbolic parameter values specific to the context at hand. The proposed approach is evaluated on multiple physical and simulated robots. The demonstration applications involve variations of mobile pick-and-place actions and opening and closing doors and drawers in the robot's environment.
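To illustrate the idea of an underspecified symbolic action description that is completed at execution time, here is a small Python sketch (not CPL itself); the resolver functions and robot description are hypothetical.

```python
# Sketch: the plan only says WHAT to do; context-specific resolvers fill in the
# subsymbolic parameters (grasp, arm) at execution time.
def resolve_grasp(obj, robot):
    # e.g. query perception / a knowledge base for a context-specific grasp
    return {"type": "top", "pose": robot["reachable_poses"][obj]}

def resolve_arm(obj, robot):
    return "left" if robot["reachable_poses"][obj][0] < 0 else "right"

def execute(action, robot):
    if action["type"] == "pick":
        grasp = resolve_grasp(action["object"], robot)   # filled in at runtime
        arm = resolve_arm(action["object"], robot)
        print(f"picking {action['object']} with {arm} arm, {grasp['type']} grasp")

# The same underspecified plan can run on different robots and environments.
plan = [{"type": "pick", "object": "cup"}]
pr2 = {"reachable_poses": {"cup": (-0.3, 0.6, 0.8)}}
for step in plan:
    execute(step, pr2)
```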
Univ of Washington
A crucial question for complex multi-step robotic tasks is how to represent relationships between entities in the world, particularly as they pertain to preconditions for the various skills the robot might employ. In goal-directed sequential manipulation tasks with long-horizon planning, it is common to use a state estimator followed by a task and motion planner or other model-based system. A variety of powerful approaches exist for explicitly estimating the state of objects in the world. However, it is challenging to generalize these approaches to an arbitrary collection of objects. In addition, the objects are often in contact in manipulation scenarios, where explicit state estimation struggles to generalize to unseen objects.
Fortunately, knowing exact poses of objects may not be necessary for manipulation. End-to-end methods leverage that fact and build networks that generate actions directly without explicitly representing objects. Nevertheless, these networks are very specific to the tasks they are trained on. For example, it is non-trivial to use a network trained on stacking blocks to unstack blocks.
In this talk, I will present our recent work, which takes an important step towards a manipulation framework that generalizes few-shot to unseen tasks with unseen objects. Specifically, we propose a neural network that extracts implicit object embeddings directly from raw RGB images. Trained on large amounts of simulated robotic manipulation data, the object-centric embeddings produced by our network can be used to predict spatial relationships between the entities in the scene, providing a task and motion planner with relevant implicit state information for goal-directed sequential manipulation tasks.
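A rough sketch of this kind of architecture (with assumed slot count, embedding size, and relation set, not the authors' exact network) might look like the following:

```python
# Simplified sketch: per-object embeddings from RGB, plus pairwise relation scores.
import torch
import torch.nn as nn

NUM_SLOTS, EMB, RELATIONS = 6, 32, 3   # e.g. {on, left_of, in_contact} (assumed)

class RelationalEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_slots = nn.Linear(64, NUM_SLOTS * EMB)
        self.relation_head = nn.Sequential(
            nn.Linear(2 * EMB, 64), nn.ReLU(), nn.Linear(64, RELATIONS),
        )

    def forward(self, rgb):
        slots = self.to_slots(self.backbone(rgb)).view(-1, NUM_SLOTS, EMB)
        # Score every ordered pair of object embeddings.
        a = slots.unsqueeze(2).expand(-1, NUM_SLOTS, NUM_SLOTS, EMB)
        b = slots.unsqueeze(1).expand(-1, NUM_SLOTS, NUM_SLOTS, EMB)
        logits = self.relation_head(torch.cat([a, b], dim=-1))
        return slots, torch.sigmoid(logits)   # relations: (B, N, N, RELATIONS)

model = RelationalEncoder()
slots, relations = model(torch.randn(1, 3, 128, 128))
print(relations.shape)   # torch.Size([1, 6, 6, 3]) -- implicit state for the planner
```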
Humans routinely learn new concepts through natural language communication. Learning to ask good questions is a key step towards effective learning. Can machines do the same? In this talk, we will discuss how a machine can learn to ask good natural language questions and dynamically plan which questions to ask next in order to learn more effectively in low-resource settings. We will use a fine-grained classification task and a simulated robotics task as our applications.
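As a minimal illustration of question selection, the sketch below assumes a simple Bayesian view in which the agent asks the yes/no question with the largest expected entropy reduction over its current belief; the concepts, questions, and likelihoods are invented, and the learned question-asking policies in the talk are considerably more sophisticated.

```python
# Toy sketch: pick the question that most reduces expected uncertainty.
import math

concepts = {"cardinal": 0.4, "robin": 0.35, "blue jay": 0.25}   # current belief
questions = {                                                   # P(answer=yes | concept)
    "Is it mostly red?":     {"cardinal": 1.0, "robin": 0.3, "blue jay": 0.0},
    "Does it have a crest?": {"cardinal": 1.0, "robin": 0.0, "blue jay": 1.0},
}

def entropy(belief):
    return -sum(p * math.log(p) for p in belief.values() if p > 0)

def expected_entropy_after(question, belief):
    likelihood = questions[question]
    total = 0.0
    for answer in (True, False):
        joint = {c: p * (likelihood[c] if answer else 1 - likelihood[c])
                 for c, p in belief.items()}
        p_answer = sum(joint.values())
        if p_answer == 0:
            continue
        posterior = {c: v / p_answer for c, v in joint.items()}
        total += p_answer * entropy(posterior)
    return total

best = min(questions, key=lambda q: expected_entropy_after(q, concepts))
print("ask:", best)
```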
NVIDIA
We seek to program a robot to autonomously complete complex tasks in a variety of real-world settings involving different environments, objects, manipulation skills, degrees of observability, initial states, and goal objectives. In order to successfully generalize across these settings, we take a model-based approach to building the robot's policy, which enables it to reason about the effects of executing different sequences of parameterized manipulation skills. Specifically, we introduce a general-purpose hybrid planning framework that uses streams, modules that encode sampling procedures, to generate continuous parameter-value candidates. We present several domain-independent algorithms that efficiently combine streams in order to solve for parameter values that jointly satisfy the constraints necessary for a sequence of skills to achieve the goal. Each stream can be either engineered to perform a standard robotics subroutine, such as inverse kinematics or collision checking, or learned from data to capture difficult-to-model behaviors, such as pouring, scooping, and grasping. Streams are also able to represent probabilistic inference operations, which enables our framework to plan in belief space and intentionally select actions that reduce the robot's uncertainty about the unknown world. We demonstrate the generality of our approach by applying it to several real-world tabletop, kitchen, and construction tasks and show that it can even be effective in settings involving objects that the robot has never seen before.
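The following toy sketch conveys the stream idea only (it is not the API of the framework described in the talk): each stream is a conditional generator of parameter candidates, and the planner composes streams until a jointly feasible combination is found; the grasp, inverse-kinematics, and collision functions are hypothetical stand-ins.

```python
# Toy sketch: streams as conditional generators composed by a planner.
import itertools, random

def grasp_stream(obj):
    """Yield candidate grasp parameters for an object (e.g. from a learned model)."""
    while True:
        yield ("grasp", obj, random.uniform(-3.14, 3.14))

def ik_stream(robot, grasp):
    """Yield joint configurations intended to reach a grasp, if any exist."""
    for _ in range(5):                       # a few IK seeds per grasp
        q = [random.uniform(-1, 1) for _ in range(7)]
        yield ("conf", robot, q)

def collision_free(conf):
    """Boolean test, standing in for a collision checker."""
    return random.random() > 0.3

def solve_pick(robot, obj, max_candidates=50):
    # Compose streams: grasp -> IK -> collision check, until the constraints hold.
    for grasp in itertools.islice(grasp_stream(obj), max_candidates):
        for conf in ik_stream(robot, grasp):
            if collision_free(conf):
                return {"grasp": grasp, "conf": conf}
    return None

print(solve_pick("panda", "mug"))
```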
Caelan Garrett is a research scientist at NVIDIA's Seattle Robotics Lab which is led by Professor Dieter Fox. He received his PhD at MIT in the Learning and Intelligent Systems group within CSAIL where he was advised by Professors Tomás Lozano-Pérez and Leslie Pack Kaelbling. His research is on integrating robot motion planning, discrete AI planning, and machine learning to flexibly and efficiently plan for autonomous mobile manipulators operating in human environments. He recently authored the first survey paper on integrated task and motion planning. He is a recipient of the NSF Graduate Research Fellowship. He has previously interned in the autonomous vehicle industry while at Optimus Ride and in the autonomous fulfillment industry while at Amazon Robotics.
Carnegie Mellon
In the future, we want to create robots with the robustness and versatility to operate in unstructured and everyday environments. To achieve this goal, robots will need to learn manipulation skills that can be applied to a wide range of objects and task scenarios. In this talk, I will present recent work from my lab on structuring manipulation tasks for more efficient learning. I will discuss how modularity can be used to break down challenging manipulation tasks and learn general object-centric solutions.
Oliver Kroemer received the bachelor's and master's degrees in engineering from the University of Cambridge, Cambridge, U.K., in 2008, and the Ph.D. degree in computer science from the Technische Universitaet Darmstadt, Darmstadt, Germany, in 2014. He was a Postdoctoral Researcher with the University of Southern California (USC), Los Angeles, CA, USA, for two and a half years. He is currently an Assistant Professor with the Robotics Institute, Carnegie Mellon University (CMU), Pittsburgh, PA, USA, where he leads the Intelligent Autonomous Manipulation Lab. His research focuses on developing algorithms and representations to enable robots to learn versatile and robust manipulation skills.
University of Innsbruck
The execution of robotic manipulation tasks requires sophisticated task and motion planning. A central problem in this domain is generating physically feasible plans, which the robotics community has typically addressed through geometric reasoning and intensive, physics-based simulation. In this talk, I present recent work to tackle this problem.
An object-centered description of geometric constraints is used for task planning, allowing the system to generate physically plausible plans in changing domains. Action grounding is implemented using a task and motion planning approach that hierarchically decomposes symbolic actions into executable robot commands. The talk describes the developed approach and shows promising results in complex manipulation tasks.
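An illustrative sketch of the general pattern (not the speaker's system): a symbolic action is grounded by hierarchically decomposing it into motion-level commands, guarded by an object-centered geometric feasibility check; the scene description and thresholds are made up.

```python
# Toy sketch: symbolic action -> geometric feasibility check -> motion commands.
SCENE = {"table": {"top_z": 0.75}, "reach_limit": 0.85}

def feasible_place(obj, surface, scene):
    # object-centered constraint: the support surface must be within reach
    return scene[surface]["top_z"] + 0.05 < scene["reach_limit"]

def ground(symbolic_action, scene):
    """Decompose a symbolic action into executable robot commands."""
    act, obj, surface = symbolic_action
    if act == "place" and feasible_place(obj, surface, scene):
        return [("move_arm_above", surface),
                ("lower_until_contact", obj),
                ("open_gripper",)]
    return None   # physically infeasible grounding -> replan at the task level

print(ground(("place", "mug", "table"), SCENE))
```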
DeepMind
Effective communication is an important skill for enabling information exchange and cooperation in multi-agent settings, in which AI agents coexist in shared environments with other agents (artificial or human). Indeed, emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels. One limitation of this setting, however, is that the emergent protocols do not generalize beyond the training partners. Furthermore, the typical problem setting of discrete cheap-talk channels may be less appropriate for embodied agents that communicate implicitly through physical action. This talk presents research that investigates methods for enabling AI agents to learn general communication skills through interaction with other artificial agents. In particular, the talk will focus on my ongoing work within Multi-Agent Reinforcement Learning, investigating emergent communication protocols inspired by communication in more realistic settings. We present a novel problem setting and a general approach that allows for zero-shot communication (ZSC), i.e., the emergence of communication protocols that can generalize to independently trained agents. We also explore and analyze specific difficulties associated with finding globally optimal ZSC protocols as the complexity of the communication task increases or the modality of communication changes (e.g., from symbolic communication to implicit communication through physical movement by an embodied artificial agent). Overall, this work opens up exciting avenues for learning general communication protocols in more complex domains.
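A toy example of why zero-shot communication is difficult (illustration only, not the methods discussed in the talk): independently trained sender-receiver pairs can each solve the same referential game perfectly while converging to arbitrary, incompatible protocols, so cross-play with an unseen partner fails.

```python
# Toy sketch: self-play succeeds, cross-play with an independently trained partner does not.
import random

CONCEPTS = MESSAGES = range(4)

def train_pair(seed):
    """Each pair 'agrees' on a random bijection from concepts to messages."""
    rng = random.Random(seed)
    msgs = list(MESSAGES)
    rng.shuffle(msgs)
    sender = dict(zip(CONCEPTS, msgs))             # concept -> message
    receiver = {m: c for c, m in sender.items()}   # message -> concept
    return sender, receiver

def accuracy(sender, receiver):
    return sum(receiver[sender[c]] == c for c in CONCEPTS) / len(CONCEPTS)

s1, r1 = train_pair(seed=1)
s2, r2 = train_pair(seed=2)
print("self-play :", accuracy(s1, r1), accuracy(s2, r2))   # 1.0, 1.0
print("cross-play:", accuracy(s1, r2))                     # usually < 1.0
```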
Kalesha Bullard recently completed a Postdoctoral Fellowship at Facebook AI Research and will soon begin as a Research Scientist with the Multi-Agent team at DeepMind. Her research is generally in the space of multi-agent artificial intelligence, focusing on developing principled methods for interactive and reinforcement learning for artificial agents in cooperative multi-agent settings. Over the course of her career, Kalesha’s work has enabled learning in shared environments with both human partners (PhD) and other artificial agents (Postdoc). Kalesha received her PhD in Computer Science from the Georgia Institute of Technology in 2019; her doctoral research was in interactive robot learning and focused on active learning with human teachers. Beyond research, Kalesha has taken on a number of service roles throughout her research career: she currently serves as Program Chair for the 2021 NeurIPS Workshop on Cooperative AI, and she recently served as an organizing committee member for the 2020 NeurIPS Workshop on Zero-Shot Emergent Communication, a Program Committee member for the 2020 NeurIPS Cooperative AI Workshop, and an Area Chair for the 2019 NeurIPS Women in Machine Learning Workshop. This past year, Kalesha was selected as one of the 2020 Electrical Engineering and Computer Science (EECS) Rising Stars.