Leveraging Demonstrator-perceived Precision for Safe Interactive Imitation Learning of Clearance-limited Tasks

Hanbit Oh and Takamitsu Matsubara

Paper: [LINK] | arXiv: [LINK] | YouTube: [LINK]

Abstract

Interactive imitation learning (IIL) is an efficient, model-free method through which a robot learns a task by repeatedly alternating between executing its learning policy and collecting data by querying human demonstrations. However, deploying immature policies in clearance-limited tasks, such as industrial insertion, poses significant collision risks. For such tasks, a robot should detect collision risks and request intervention by ceding control to a human when collisions are imminent. Detecting such risks, however, requires an accurate model of the environment, a need that significantly limits the scope of IIL applications. In contrast, humans implicitly demonstrate environmental precision by adjusting their behavior to avoid collisions while performing tasks. Inspired by this human behavior, this paper presents a novel interactive learning method, Demonstrator-perceived Precision-aware Interactive Imitation Learning (DPIIL), that uses demonstrator-perceived precision as a criterion for human intervention. DPIIL captures precision by observing the speed-accuracy trade-off exhibited in human demonstrations and cedes control to a human in states where high precision is estimated, thereby avoiding collisions. DPIIL improves the safety of interactive policy learning and maintains efficiency without requiring precise information about the environment to be provided explicitly. We assessed DPIIL's effectiveness through simulations and real-robot experiments in which a UR5e 6-DOF robotic arm was trained to perform assembly tasks. Our results showed significantly improved training safety, and DPIIL's best performance compared favorably with that of other learning methods.

Fig1. In clearance-limited tasks, demonstrator-perceived precision is in the mind of humans. By capturing this precision level from demonstration data and incorporating it into IIL, a robot can cede control to a human (expert mode, bottom) in high-precision areas while executing its policy (auto mode, top) in low-precision areas, thus enhancing safety.

The key contributions of this paper are as follows:

Aperture-passing in Simulation

An aperture-passing task involving multiple narrow apertures was first performed in the OpenAI Gym environment. This experiment evaluates interactive and robot-autonomous performance in a challenging environment containing states where physical contact is likely, such as while passing through narrow apertures, although no contact is allowed for task success. The goal is to move the agent (a circle with a 0.25 cm radius) clockwise from the starting position through the apertures (3.0 cm and 1.5 cm wide, in sequence) to the goal without colliding with the walls (gray). The state is the agent's position (x, y coordinates) and the action is its velocity (x, y components).
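The aperture geometry above makes the safety constraint concrete: the agent's disc must fit entirely inside an opening to pass without contact. Below is a minimal collision check for a single horizontal wall with one aperture; the function name and the single-wall geometry are simplifications of the multi-aperture layout used in the experiment.

```python
AGENT_RADIUS = 0.25  # cm, as in the task description

def collides_with_wall(pos, wall_y, aperture_center_x, aperture_width):
    """Check whether a circular agent at `pos` overlaps a horizontal wall
    (at height `wall_y`) containing a single aperture of `aperture_width`.
    A simplified stand-in for the experiment's multi-aperture layout."""
    x, y = pos
    # The agent's disc does not reach the wall's height: no collision.
    if abs(y - wall_y) > AGENT_RADIUS:
        return False
    half = aperture_width / 2.0
    # Contact-free passage requires the whole disc inside the opening.
    fits = (aperture_center_x - half + AGENT_RADIUS) <= x <= \
           (aperture_center_x + half - AGENT_RADIUS)
    return not fits
```

With the 3.0 cm aperture, the contact-free corridor is only 2.5 cm wide (aperture width minus the agent's diameter), and just 1.0 cm for the 1.5 cm aperture, which is why these states demand high precision.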

Initial Demonstration

Precision

Uncertainty

DAgger

Intervention criteria: Random

❌ Fail

EnsembleDAgger

Intervention criteria: Uncertainty of policy

❌ Fail

DPIIL (Ours)

Intervention criteria: Uncertainty of policy + Precision

✅ Success

Ring-threading in Simulation

To evaluate DPIIL's scalability, a second experiment learned a ring-threading task with a 6-DOF UR5e robot in the Robosuite environment. The goal is to grasp a ring from a random initial position and insert it onto a peg at a fixed position, regardless of physical contact. The state is 51-dimensional, consisting of the robot's joint angles and the ring's position, as described in Table 1. The action is 6-dimensional, specifying the end-effector translation (x, y, z axes), rotation (y, z axes), and gripper manipulation (open or closed).
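The 6-D action described above can be packed as a flat vector, as is common for Robosuite-style controllers. The argument names, ordering, and the ±1 gripper encoding are illustrative assumptions; the experiment's actual controller interface may differ.

```python
import numpy as np

def make_action(dx, dy, dz, ry, rz, gripper_open):
    """Pack the 6-D action: end-effector translation (x, y, z axes),
    rotation (y, z axes), and a binary gripper command. Ordering and the
    +/-1 gripper convention are illustrative, not the paper's exact spec."""
    return np.array([dx, dy, dz, ry, rz, 1.0 if gripper_open else -1.0])
```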

Initial Demonstration

DAgger

Intervention criteria: Random

❌ Fail

EnsembleDAgger

Intervention criteria: Uncertainty of policy

❌ Fail

DPIIL (Ours)

Intervention criteria: Uncertainty of policy + Precision

✅ Success

Real-Robot Experiments with Human Experts

Shaft-reaching Task

We assessed the robot's skill in reaching and grasping a shaft while avoiding fixed obstacles. Successfully performing this task within the time limit (150 steps) is challenging because the environment is prone to physical contact (e.g., between the robot and the obstacles).

EnsembleDAgger

Intervention criteria: Uncertainty of policy

❌ Fail

DPIIL (Ours)

Intervention criteria: Uncertainty of policy + Precision

✅ Success

Ring-threading Task

We assessed the robot's skill in inserting a ring onto a peg without bumping into another peg during assembly. This scenario is more complicated than the shaft-reaching task because the clearance for inserting the ring is smaller (only 2 mm), requiring more precise control and a longer time limit (200 steps).

EnsembleDAgger

Intervention criteria: Uncertainty of policy

❌ Fail

DPIIL (Ours)

Intervention criteria: Uncertainty of policy + Precision

✅ Success

DPIIL's Robot-autonomous Performance in Real-robot Experiments

Shaft-reaching Task

Ring-threading Task

Hyperparameters of Comparison Methods