Temporal Logic Imitation: Learning Plan-Satisficing Motion Policies from Demonstrations

Abstract: Learning from demonstration (LfD) has shown promising results for learning multi-step tasks. However, state-of-the-art approaches still do not guarantee successful reproduction of a task under perturbations, nor can they recover from failures. In this work, we identify the root of this challenge as the failure of a learned continuous policy to satisfy the discrete plan implicit in the demonstration. By utilizing modes (rather than subgoals) as the discrete abstraction, and motion policies with both mode-invariance and goal-reachability properties, we prove that our learned continuous policy can simulate any discrete plan specified by a linear temporal logic (LTL) formula. Consequently, the imitator is robust to both task- and motion-level perturbations and guaranteed to achieve task success.

Our method (LTL-DS) takes as input (1) an LTL formula that specifies all valid mode transitions for a task and (2) demonstrations that successfully complete the task. It outputs (1) a task automaton that can reactively sequence (2) a set of learned dynamical systems (one DS per mode) to replay the task despite perturbations.
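At run time, the two outputs interact in a simple sense-plan-act loop: the automaton observes the currently sensed mode and selects which DS to run next, replanning whenever a perturbation causes an unexpected mode transition. Below is a minimal Python sketch of this loop; the helper names (automaton.step, sense_mode, etc.) are hypothetical placeholders, not the paper's actual API.

    # Hypothetical sketch of the LTL-DS execution loop (placeholder helper names).
    def run_ltl_ds(automaton, ds_policies, sense_mode, get_state, send_velocity, dt=0.01):
        # automaton.step maps (automaton state, sensed mode) to
        # (next automaton state, mode to pursue);
        # ds_policies maps each mode to its learned DS, x -> x_dot.
        q = automaton.initial_state
        while not automaton.is_accepting(q):
            mode = sense_mode()                        # discrete abstraction of the continuous state
            q, target = automaton.step(q, mode)        # reactively replan on unexpected transitions
            x_dot = ds_policies[target](get_state())   # goal-reaching, mode-invariant DS flow
            send_velocity(dt * x_dot)                  # track the flow for one control step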

Neural-Network-Based Behavior Cloning (BC) vs Dynamical Systems (DS)

Supplementary videos showing the difference between state-based BC methods with and without a stability guarantee when learned from a few demonstrations (red trajectories).

bc_policy.mp4

BC without stability leads to spurious attractors and diverging flows

ds_policy.mp4

DS with global stability guarantee ensures goal reachability
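The contrast above can be reproduced with a toy example. The snippet below illustrates only the stability property (the paper learns richer nonlinear DS from data): for the linear system x_dot = A(x - x_goal) with A + A^T negative definite, V(x) = ||x - x_goal||^2 is a Lyapunov function, so every rollout reaches the goal no matter where a perturbation displaces the state, leaving no room for spurious attractors.

    import numpy as np

    # Toy globally stable DS: x_dot = A (x - x_goal), with A + A^T negative definite.
    A = np.array([[-1.0,  0.5],
                  [-0.5, -1.0]])              # eigenvalues -1 +/- 0.5i
    x_goal = np.array([1.0, 1.0])

    def ds(x):
        return A @ (x - x_goal)

    x = np.array([-2.0, 3.0])                 # arbitrary (post-perturbation) start state
    for _ in range(2000):                     # forward-Euler rollout
        x = x + 0.01 * ds(x)
    print(np.allclose(x, x_goal, atol=1e-3))  # True: the attractor is always reached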

Iterative Boundary Estimation of an Unknown Mode with Cutting Planes

Supplementary video showing the construction of cutting planes and boundary estimation.

boundary_estimation.mp4

Invariance failures are used to find cutting planes that bound the mode, and DS flows are modulated to stay inside the estimated boundary
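The procedure in the video can be made concrete with a small sketch: each invariance failure is observed at a point x_f just outside the true mode; a cutting plane through x_f separates it from the mode interior, and the running estimate is the intersection of all such halfspaces. Using the demonstration mean as the interior reference point is a simplifying assumption here; see the paper for the exact construction.

    import numpy as np

    # Sketch of iterative boundary estimation from invariance failures.
    class CuttingPlaneBoundary:
        def __init__(self, interior_point):
            self.c = np.asarray(interior_point, dtype=float)  # known in-mode point
            self.normals, self.offsets = [], []               # halfspaces: n . x <= b

        def add_failure(self, x_f):
            x_f = np.asarray(x_f, dtype=float)
            n = x_f - self.c
            n /= np.linalg.norm(n)           # normal points from interior toward failure
            self.normals.append(n)
            self.offsets.append(n @ x_f)     # plane passes through the failure point

        def inside(self, x):
            x = np.asarray(x, dtype=float)
            return all(n @ x <= b for n, b in zip(self.normals, self.offsets))

    est = CuttingPlaneBoundary(interior_point=[0.0, 0.0])
    est.add_failure([1.0, 0.0])              # an invariance failure observed here
    print(est.inside([0.5, 0.0]), est.inside([1.5, 0.0]))  # True False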

DS (without modulation) vs DS-mod (with modulation)

Supplementary video showing the necessity of mode invariance when sequencing DS with a soup-scooping task automaton (shown above in the method figure). The task is to transition through the white, yellow, pink, and green regions consecutively. The pink region can only be entered from the yellow region, and the green region can only be entered from the pink region.

no_mod_stuck.mp4

Without boundary estimation and modulation, DS does not ensure mode invariance, leading to looping despite the LTL's reactivity

with_mod_no_stuck.mp4

DS with modulation enjoys both reachability and invariance, which are necessary to simulate any discrete plan output by the reactive LTL
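One simple way to obtain the invariance shown above is to modulate the nominal DS velocity near the estimated boundary: remove the component that points out through a nearby cutting plane, so the flow slides along the boundary instead of exiting the mode. The projection below is a minimal sketch; the paper's full modulation construction may differ.

    import numpy as np

    # Sketch of velocity modulation for mode invariance: zero out the
    # outward-normal velocity component at each nearby cutting plane.
    def modulate(x_dot, boundary_normals, near_boundary):
        v = np.asarray(x_dot, dtype=float)
        for n, near in zip(boundary_normals, near_boundary):
            outward = n @ v
            if near and outward > 0:     # heading out through a nearby boundary
                v = v - outward * n      # keep only the tangential component
        return v

    n = np.array([1.0, 0.0])             # estimated cutting-plane unit normal
    print(modulate([1.0, 1.0], [n], near_boundary=[True]))  # [0. 1.]: flow slides along the plane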

Generalization to New Tasks by Reusing DS Skills

Supplementary videos showing LTL-DS's generalization to new task structures (encoded by LTL) by flexibly recombining the individual DS skills learned from demonstrations. Consider a demonstration of adding chicken (visiting the yellow region) and then broccoli (visiting the green region) to a pot (visiting the gray region). Once the individual DS for visiting the yellow, green, and gray regions are learned, they can be recombined given a new LTL formula (refer to the paper) to solve new tasks such as (1) adding broccoli and then chicken, (2) adding only chicken, and (3) continuously adding chicken; a minimal recombination sketch follows the videos below. Note that the white region represents an empty spoon, and crossing from yellow/green to white means spilling the food.

get_chicken_first.mp4

Adding chicken and broccoli: The order of yellow -> gray -> green -> gray is guaranteed despite perturbations

get_broccoli_first.mp4

Adding broccoli and chicken: The order of green -> gray -> yellow -> gray is guaranteed despite perturbations

getting_only_chicken.mp4

Adding only chicken: The order of yellow -> gray is guaranteed despite perturbations

continuously_getting_chicken.mp4

Adding chicken continuously: The order of yellow -> gray -> yellow... is guaranteed despite perturbations
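All four behaviors above reuse the same three DS skills; only the discrete plan changes. The sketch below captures this reuse with hard-coded plans for readability; in LTL-DS the plans come from automata synthesized from the new LTL formulas, and the "continuously adding chicken" task simply cycles its plan forever.

    # Sketch of reusing one DS skill library under different task plans.
    # (Hard-coded plans are for illustration; LTL-DS derives them from LTL.)
    skills = {"yellow": "ds_reach_yellow",   # add chicken
              "green":  "ds_reach_green",    # add broccoli
              "gray":   "ds_reach_gray"}     # pour into the pot

    plans = {"chicken_then_broccoli": ["yellow", "gray", "green", "gray"],
             "broccoli_then_chicken": ["green", "gray", "yellow", "gray"],
             "only_chicken":          ["yellow", "gray"]}

    def execute(name):
        for mode in plans[name]:
            print(f"run {skills[mode]} until the {mode} region is reached")

    execute("broccoli_then_chicken")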

Robot Scooping Experiment

Supplementary video showing the soup-scooping task implemented on a real robot. The task is to scoop at least one red bead from one bowl and transfer it to the other bowl. We collected fewer than five demonstrations and used the task automaton shown above for this experiment.

LTLDS.mp4

NEW Human-Robot Experiments

[New results: additional human experiments] Since external perturbations are an integral part of our task complexity, we recruited five human subjects without prior knowledge of our LTL-DS system to perturb the robot scooping setup. Each subject was given five trials of perturbations, for a total of 25 trials, each of which serves as an unbiased, i.i.d. source of perturbations. In our newly submitted video, all 25 trials eventually succeed. We did not cherry-pick the results, and all videos were shot in one take. This empirical 100% success rate further corroborates our theoretical success guarantee.

Interestingly, common perturbation patterns (annotated with the same-colored text) emerge across participants. Specifically, we observe adversarial perturbations, where humans fight against the robot, and collaborative perturbations, where humans help the robot achieve the goal of transferring at least one bead from one bowl to the other. Under adversarial perturbations, the DS reacts and the LTL replans. Under collaborative perturbations, the DS is compliant and allows humans to guide the motion. In cases where humans are not perturbing but the robot makes a mistake on its own (e.g., during scooping), the LTL replans the scooping DS until the robot successfully enters the transferring mode.

The fact that we do not need to hard-code separate rules for invariance failures caused by perturbations versus the robot's own execution failures in the absence of perturbations highlights the strength of our LTL-powered, sensor-based task reactivity.

human_perturb.mp4

NEW Line Inspection Task

exp2.mp4

NEW Color Tracing Task

exp3.mp4