4th Workshop on

Semantic Policy and Action Representations for Autonomous Robots (SPAR)

November 8th, 2019 - Macau, China

at IROS 2019


Invited Speakers

Kei Okada, The University of Tokyo.

http://www.jsk.t.u-tokyo.ac.jp/~k-okada/index-e.html

Stefanos Nikolaidis, University of Southern California.

https://stefanosnikolaidis.net/

Joseph Lim, University of Southern California.

https://viterbi-web.usc.edu/~limjj/

Darius Burschka, Technical University of Munich.

http://robvis01.informatik.tu-muenchen.de/

Chris Paxton, Nvidia Robotics lab.

https://research.nvidia.com/person/chris-paxton

Jesse Thomason, University of Washington.

https://jessethomason.com/

Information about the talks from the invited speakers

Kei Okada, The University of Tokyo.

Task Instantiation from Life-long Episodic Memories of Service Robots

In this talk, I will give a quick overview of JSK robotics research and present task instantiation based on long-term experience memory, which aims to describe household robotics tasks in a more abstract way, independent of particular environments and contexts. Task instantiation formulates a concrete task description, which can be processed by a conventional symbolic and geometric planner, from an abstract one, taking into account the user's preferences and the usual state of the environment. The complete robotic systems are presented, and experimental results on a tidy-up-the-table task based on over 1,500 hours of experience memory are shown.
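
The following toy Python sketch illustrates one way such task instantiation could look (this is not the speaker's actual system; all class, function, and location names are hypothetical): an abstract goal is grounded into concrete pick-and-place subgoals by querying episodic memory for each object's usual storage location.

# Hypothetical sketch of task instantiation: the abstract goal "tidy up the
# table" is grounded into concrete, planner-ready subgoals by consulting
# long-term episodic memory for the usual storage place of each object.
from collections import Counter

class EpisodicMemory:
    """Stores past observations of where objects were put away."""
    def __init__(self):
        self.placements = []  # list of (object_name, location) tuples

    def record(self, obj, location):
        self.placements.append((obj, location))

    def usual_location(self, obj):
        """Return the most frequent storage location seen for this object."""
        counts = Counter(loc for o, loc in self.placements if o == obj)
        return counts.most_common(1)[0][0] if counts else "default_shelf"

def instantiate_tidy_up(objects_on_table, memory):
    """Turn the abstract goal into pick-and-place subgoals a symbolic or
    geometric planner could process further."""
    plan = []
    for obj in objects_on_table:
        plan.append(("pick", obj))
        plan.append(("place", obj, memory.usual_location(obj)))
    return plan

memory = EpisodicMemory()
memory.record("mug", "cupboard")
memory.record("mug", "cupboard")
memory.record("mug", "sink")
print(instantiate_tidy_up(["mug"], memory))
# [('pick', 'mug'), ('place', 'mug', 'cupboard')]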

Chris Paxton, Nvidia Robotics lab.

From Pixels to Task Planning and Execution

In order for robots to act intelligently in human environments, they must be able to compose multiple primitive actions to accomplish new long-term goals. We aim to learn representations that can be used for task planning, so that learned behaviors generalize to new tasks and new environments using sensor data. We propose Deep Planning Domain Learning (DPDL) to capture the preconditions and effects of various actions, which allows us to search for goal states. The representation learned by DPDL can be used for high-level task planning, and also gives us a set of low-level policies for task execution, all from sensor data. These methods bring us closer to the goal of general-purpose robots operating on sensor data in human environments.
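
The abstract does not give implementation details; as a generic illustration of planning over action preconditions and effects (not DPDL itself), the following Python sketch runs a forward search over symbolic states. The action definitions here are hand-written stand-ins for what would, in such an approach, be learned predictors.

# Generic forward search over actions described by preconditions and effects.
# States are frozensets of symbolic predicates; the action models below are
# illustrative stand-ins, not learned ones.
from collections import deque

ACTIONS = {
    "open_drawer":  {"pre": {"drawer_closed"},
                     "add": {"drawer_open"}, "del": {"drawer_closed"}},
    "grasp_object": {"pre": {"drawer_open", "obj_in_drawer"},
                     "add": {"obj_in_hand"}, "del": {"obj_in_drawer"}},
}

def applicable(state, action):
    return ACTIONS[action]["pre"] <= state

def apply_action(state, action):
    a = ACTIONS[action]
    return frozenset((state - a["del"]) | a["add"])

def plan(start, goal):
    """Breadth-first search for an action sequence reaching the goal set."""
    queue, seen = deque([(frozenset(start), [])]), set()
    while queue:
        state, path = queue.popleft()
        if goal <= state:
            return path
        if state in seen:
            continue
        seen.add(state)
        for action in ACTIONS:
            if applicable(state, action):
                queue.append((apply_action(state, action), path + [action]))
    return None

print(plan({"drawer_closed", "obj_in_drawer"}, {"obj_in_hand"}))
# ['open_drawer', 'grasp_object']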

Tanja Schultz, University of Bremen.

Biosignal Processing for Modeling Human Everyday Activities

To facilitate robotic mastery of everyday activities (Collaborative Research Center EASE “Everyday Activities Science and Engineering,” http://ease-crc.org), we are creating a large-scale empirical data and knowledge basis of how humans plan, structure, and execute everyday activities. For this purpose we recently established the Biosignal Acquisition Space and Environment (BASE), an interaction space equipped with a wide variety of recording sensors and devices, including cameras, microphones, a motion capture system, as well as body-worn sensors for inertia, electromyography, electroencephalography, and eye gaze. In my talk, I will describe our efforts to process and interpret the synchronously recorded high-dimensional biosignal data, which represent human motion, gaze, speech, muscle, and brain activities while performing everyday household chores such as setting a table. In particular, I will present an analysis of two think-aloud protocols used by the human subjects to describe their activities during and after performance; these data inform machine learning methods that automatically segment, structure, and label the high-dimensional biosignal data to provide a valuable resource for EASE.

Darius Burschka, Technical University of Munich.

Understanding the Static and Dynamic Scene Context for Human-Robot Collaboration in Households

A successful collaboration between robots and humans requires good knowledge of the current scene, which, in service scenarios, does not follow any procedural description. While industrial setups are structurally well defined and generally follow a fixed task description, it is very difficult for a system to understand what is currently happening in a household environment. The system not only needs to understand the dynamic actions happening in the scene, so that it can match them to known tasks and ground them in the current scene observation; it also needs to understand how the objects detected in the current scene are used. We refer to the former as dynamic context, describing the dynamic changes in the environment, and to the latter as static context, describing how the 3D structure of the scene is used.

Dynamic context requires lifting the observed actions to human-conform descriptions, such as eating, cleaning, or watching. This makes it easier to follow human descriptions of actions and enables natural communication with humans about the current scene. Static context analysis allows the robot to infer how the current scene is being used. We will show in the talk that the same set of recognised objects, arranged in different ways, can imply different contexts, e.g. a meeting or a dinner. The robot requires this second type of context information to understand how to help clean or set up a scene and how to detect failures that need to be fixed or reported.

Jesse Thomason, University of Washington.

Action Learning from Realistic Environments with Directives

A robot assistant given the high-level instruction "Put a plate of toast on the table" must infer many steps, from finding a knife to operating a toaster. We are creating the first large-scale dataset of simulated plans including cooking, cleaning, and tidying in interactive home environments (i.e., requiring navigation, pick-and-place, toggling appliances, and opening/closing), where each plan is annotated with high- and low-level language instructions. The agent uses low-level language instructions like "Walk forward to the counter, turn to your left and get the knife from the second drawer" to build hierarchical planning models that infer actions directly from high-level commands.
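
As a rough illustration of such hierarchical annotations (this is not the actual dataset format; the action names and the second instruction below are invented for the example), a plan might pair a high-level directive with subgoals, each carrying a low-level instruction and the primitive actions that accomplish it:

# Illustrative data structure pairing a high-level directive with annotated
# subgoals; the agent can learn from either level of language.
plan = {
    "directive": "Put a plate of toast on the table",
    "subgoals": [
        {"instruction": "Walk forward to the counter, turn to your left and "
                        "get the knife from the second drawer",
         "actions": ["MoveAhead", "RotateLeft", "OpenObject drawer_2",
                     "PickupObject knife", "CloseObject drawer_2"]},
        {"instruction": "Put the bread in the toaster and turn it on",
         "actions": ["PickupObject bread", "PutObject toaster",
                     "ToggleObjectOn toaster"]},
    ],
}

def low_level_actions(plan):
    """Flatten the hierarchy into the primitive action sequence an agent
    would execute to satisfy the high-level directive."""
    return [a for sub in plan["subgoals"] for a in sub["actions"]]

print(low_level_actions(plan))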

Stefanos Nikolaidis, University of Southern California.

Learning Collaborative Action Plans from YouTube Videos

People often watch videos on the web to learn how to cook new recipes, assemble furniture, or repair a computer. We wish to give robots this very same capability. I will present a framework for learning and executing individual and collaborative actions from full-length, unconstrained videos on the web. The framework leverages the temporal and spatial information that hands provide about tools, actions, and manipulated objects. I will then show how the actions performed by two humans in a cooking task are reproduced by two robotic arms using our open-source platform for robotic manipulation, and I will address current challenges and opportunities.

Georg von Wichert, Siemens.

Top-down and bottom-up: Formal representations of things and procedures in autonomous systems

Classical automation is good for mass production, but the world of production is currently changing massively. Ever shorter product life cycles and low-volume/high-mix production require unprecedented flexibility and versatility of production equipment. We therefore intend to create autonomous automation systems that can adapt to changing products and production volumes without human intervention, based on formal specifications of the products to be manufactured and the capabilities of the production systems used. In this setup, the expressiveness of the state and action representations is crucial. We need both abstract, symbolic representations (for scalability) and subsymbolic representations (for everything related to performance), applied and exploited in a consistent manner. In this talk, we will show our approach to doing this and present some real-world examples.

Joseph Lim, University of Southern California.

Towards Solving Complex Physical Tasks Via Learning Methods

Many robotics tasks, even seemingly simple procedural tasks like assembly and cleaning, require a continuous cycle of planning, learning, adapting, and executing diverse skills and sub-tasks. It is thus hard to scale and generalize either learning agents or hard-programmed agents to long-horizon, complex tasks. To this end, my research is about enabling autonomous agents to perform long-horizon, complex physical tasks. More specifically, I propose a hybrid paradigm that augments classic rule-based methods (programs) with the flexibility of learning-based approaches. The key insight is that the program representation enforces an explicit split between the hierarchical subtask structure of any long-horizon task and the exact execution of each subskill, so learning methods can focus on elemental components, such as skill acquisition, program inference, and task execution, rather than learning everything end-to-end. When combined with recent learning methods, such as deep reinforcement learning and meta-learning, our approach can generate flexible and interpretable long-horizon plans and adaptively follow these plans using a set of learned subskills. In this talk, I will cover three layers of my work: (1) learning to infer a program, (2) developing learning methods for skill acquisition and generalization, and (3) learning to execute a program-guided task.
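
As a toy illustration of the program-guided split described above (this is not the speaker's code; all names are hypothetical, and the scripted functions stand in for what would be learned subskills), a "program" can be represented as a list of subtask calls, each dispatched to its own subskill policy:

# Toy program-guided execution: the program fixes the subtask structure,
# while each subtask is carried out by its own (here, scripted) policy.
def move_to(target, state):
    state["position"] = target
    return state

def grasp(obj, state):
    state["holding"] = obj
    return state

def place(location, state):
    state.setdefault("placed", []).append((state.pop("holding", None), location))
    return state

# In the hybrid paradigm, learned policies would replace these callables.
SUBSKILLS = {"move_to": move_to, "grasp": grasp, "place": place}

def execute_program(program, state):
    """Run a program, i.e. a list of (subskill_name, argument) pairs."""
    for skill_name, arg in program:
        state = SUBSKILLS[skill_name](arg, state)
    return state

program = [("move_to", "table"), ("grasp", "cup"),
           ("move_to", "shelf"), ("place", "shelf")]
print(execute_program(program, {}))
# {'position': 'shelf', 'placed': [('cup', 'shelf')]}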