Graph-based Hierarchical Knowledge Representation for Robot Task Transfer from Virtual to Physical World

Zhenliang Zhang, Yixin Zhu, Song-Chun Zhu

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020

We study the hierarchical knowledge transfer problem using a cloth-folding task, wherein the agent is first given a set of human demonstrations in the virtual world using an Oculus headset, and is later transferred to and validated on a physical Baxter robot. We argue that such an intricate robot task transfer across different embodiments is only realizable if an abstract and hierarchical knowledge representation is formed to facilitate the process, in contrast to the prior sim2real literature in a reinforcement learning setting. Specifically, the knowledge in both the virtual and physical worlds is measured by information entropy built on top of a graph-based representation, so that the problem of task transfer becomes the minimization of the relative entropy between the two worlds. An And-Or-Graph (AOG) is introduced to represent the knowledge, induced from the human demonstrations performed across six virtual scenarios in Virtual Reality (VR). During the transfer, the success of a physical Baxter robot platform across all six tasks demonstrates the efficacy of the graph-based hierarchical knowledge representation.
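To make the relative-entropy objective concrete, the following is a minimal sketch of the Kullback-Leibler divergence between two discrete distributions, for instance distributions over parse graphs in the virtual and physical worlds. The function name, the parse-graph labels, the probability values, and the direction of the divergence are all illustrative assumptions, not taken from the paper.

```python
import math


def relative_entropy(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions given as dicts.

    Keys are outcome labels; values are probabilities. `eps` is a small
    floor (an assumption of this sketch) that avoids log(0) when q
    assigns zero mass to an outcome that p supports.
    """
    total = 0.0
    for key, p_k in p.items():
        if p_k == 0.0:
            continue  # the term 0 * log(0 / q) is taken to be 0
        q_k = max(q.get(key, 0.0), eps)
        total += p_k * math.log(p_k / q_k)
    return total


# Hypothetical distributions over three parse graphs (made-up numbers).
virtual = {"pg_a": 0.5, "pg_b": 0.3, "pg_c": 0.2}
physical_far = {"pg_a": 0.1, "pg_b": 0.1, "pg_c": 0.8}
physical_near = {"pg_a": 0.45, "pg_b": 0.35, "pg_c": 0.2}

kl_far = relative_entropy(virtual, physical_far)
kl_near = relative_entropy(virtual, physical_near)
```

Under these assumptions, `kl_near` is smaller than `kl_far`, which is the sense in which a transfer that minimizes relative entropy brings the knowledge in the two worlds closer together.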

Pipeline for robot learning using virtual reality systems

Overview of the proposed framework for learning abstract knowledge for robot task transfer. (a) Using the Oculus headset and Touch controller, (b) a subject can demonstrate a sequence for the task of clothes folding in a physically realistic VR environment. Our algorithm is able to (c) induce a hierarchical graph-based knowledge representation based on human demonstrations, and (d) transfer it to a physical Baxter robot for execution by minimizing the relative entropy between the two worlds.

High-level abstraction with And-Or Graph

Illustration of the knowledge representation by an STC-And-Or-Graph (AOG). A parse, or an instance, of an AOG is termed a parse graph (pg). Spatial-pg (S-pg) models the entities and their relations in the scene, Temporal-pg (T-pg) represents the action sequence, and Causal-pg (C-pg) extracts the perceived causality from the human demonstrations. In this example, the probability p5 determines the order of the sub-tasks, whereas p1, ..., p4 denote the probabilities of the corresponding nodes being executed.
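As a rough illustration of how a parse graph can be drawn from an And-Or Graph, the sketch below uses the common convention that an And-node expands all of its children in order, while an Or-node selects one child according to its branching probabilities. The `Node` class, the node names, and the probabilities 0.7 and 0.3 are hypothetical, and the sketch covers only a temporal ordering choice, not the full STC-AOG described above.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str  # "and", "or", or "terminal"
    children: list = field(default_factory=list)
    probs: list = field(default_factory=list)  # branching probabilities (Or-nodes only)


def sample_parse_graph(node, rng):
    """Return the sequence of terminal actions in one sampled parse graph."""
    if node.kind == "terminal":
        return [node.name]
    if node.kind == "and":
        # An And-node decomposes into all of its children, in order.
        actions = []
        for child in node.children:
            actions.extend(sample_parse_graph(child, rng))
        return actions
    if node.kind == "or":
        # An Or-node picks exactly one child by branching probability.
        child = rng.choices(node.children, weights=node.probs, k=1)[0]
        return sample_parse_graph(child, rng)
    raise ValueError(f"unknown node kind: {node.kind}")


# Hypothetical two-step folding task whose sub-task order is chosen by an Or-node.
left = Node("fold_left", "terminal")
right = Node("fold_right", "terminal")
left_first = Node("left_then_right", "and", [left, right])
right_first = Node("right_then_left", "and", [right, left])
root = Node("fold_cloth", "or", [left_first, right_first], [0.7, 0.3])

plan = sample_parse_graph(root, random.Random(0))
```

Each call returns one of the two possible orderings; over many samples the first ordering would appear in roughly 70% of them under the assumed probabilities.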

Low-level abstraction with atomic action

Trajectory analysis using Gaussian fitting. Given a human demonstration of cloth-folding sequences in terms of (a) grasp points and (b) trajectories, our algorithm aggregates the raw data and fits the (c) start points and (d) end points with a Gaussian distribution; (e) folding trajectories are further estimated.
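The Gaussian fitting step can be sketched as a maximum-likelihood estimate of the mean and covariance of the aggregated points. The version below assumes 2-D points on the cloth plane and uses made-up coordinates; the function name and the choice of the maximum-likelihood (divide by n) estimator are assumptions of this sketch rather than details from the paper.

```python
def fit_gaussian_2d(points):
    """Maximum-likelihood mean and covariance of a list of (x, y) points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to estimate a covariance")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Second central moments, normalized by n (maximum-likelihood estimate).
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    return (mean_x, mean_y), ((var_x, cov_xy), (cov_xy, var_y))


# Hypothetical start (grasp) points aggregated from several demonstrations.
start_points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mean, cov = fit_gaussian_2d(start_points)
# mean is (1.0, 1.0) and cov is ((1.0, 0.0), (0.0, 1.0)) for these points
```

The same function could be applied separately to the start points and the end points, giving one Gaussian for each.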

Virtual scenes for experiments

Six virtual environments with different levels of fidelity for learning and evaluating the knowledge of folding clothes. (a) Grasp areas are recorded. (b) The grasp point is visible and recorded. (c) Full physics-based simulation, but with only grasp points recorded. (d) The grasp point is recorded, but the trajectory is a pre-defined line. (e) The grasp point is recorded, but the trajectory is estimated. (f) Full physics-based simulation with the complete trajectory recorded.

Knowledge visualization for six different virtual scenes

Visualization of the learned knowledge represented by AOGs in six different scenes with various fidelity of physics and interaction. The horizontal axis is the fidelity of interaction, and the vertical axis is the fidelity of physics. In every block, parameterized by alpha (physics) and beta (interaction), the left column is the induced C-AOG based on the observed data, and the number next to an edge is the branching probability. The middle column shows the grasp points collected in the virtual scenes. The right column is the final knowledge of atomic actions.

Factors that affect knowledge transfer

Utility landscape of folding clothes based on human subject ratings. (a) The fitted mesh based on the ranking results of users, which shows the basic landscape of human utility. (b) Fixing the realism of the interaction (beta), the mean score varies with the realism of the physics. (c) Fixing the realism of the physics (alpha), the mean score varies with the realism of the interaction.