Temporal Logic Imitation: Learning Plan-Satisficing Motion Policies from Demonstrations
NEW PROJECT WEBSITE https://yanweiw.github.io/tli/
Paper link https://arxiv.org/abs/2206.04632
Abstract: Learning from demonstration (LfD) has shown promising results for learning multi-step tasks. However, state-of-the-art approaches still do not guarantee successful reproduction of a task in the event of perturbations, nor do they provide the ability to recover from failures. In this work, we identify the root of this challenge as the failure of a learned continuous policy to satisfy the discrete plan implicit in the demonstration. By utilizing modes (rather than subgoals) as the discrete abstraction, together with motion policies that have both mode invariance and goal reachability properties, we prove that our learned continuous policy can simulate any discrete plan specified by a linear temporal logic (LTL) formula. Consequently, an imitator is robust to both task- and motion-level perturbations and is guaranteed to achieve task success.
Our method (LTL-DS) takes as input (1) an LTL formula that specifies all valid mode transitions for a task and (2) demonstrations that successfully complete the task, and outputs (1) a task automaton that can reactively sequence (2) a set of learned dynamical systems (one DS per mode) to replay the task despite perturbations.
Neural-Network-Based Behavior Cloning (BC) vs Dynamical Systems (DS)
Supplementary videos showing the difference between state-based BC methods with and without a stability guarantee when learned from a few demonstrations (red trajectories).
BC without stability leads to spurious attractors and diverging flows
DS with global stability guarantee ensures goal reachability
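The goal-reachability property above can be illustrated with a minimal sketch: a linear DS x_dot = -A(x - goal) with A positive definite makes the goal a globally asymptotically stable attractor, so every rollout converges to it regardless of where a perturbation displaces the state. (The paper learns a more expressive DS under Lyapunov constraints; this linear form and the names below are illustrative, not the paper's implementation.)

```python
import numpy as np

def stable_ds(x, goal, A=None):
    """Velocity of the linear DS x_dot = -A (x - goal).

    With A positive definite, the goal is a globally asymptotically
    stable attractor, so flows from any start reach it (goal
    reachability). Learned DS methods keep this guarantee via Lyapunov
    constraints; this identity-gain form is a minimal sketch.
    """
    if A is None:
        A = np.eye(len(x))  # identity gain: straight-line flow to goal
    return -A @ (np.asarray(x, float) - np.asarray(goal, float))

# Euler rollout: even after a mid-rollout perturbation, the state
# still converges to the goal attractor.
x, goal = np.array([2.0, -1.0]), np.array([0.0, 0.0])
for t in range(200):
    if t == 50:
        x = x + np.array([1.0, 1.0])  # external perturbation
    x = x + 0.1 * stable_ds(x, goal)
print(np.linalg.norm(x - goal) < 1e-3)  # True
```

In contrast, a BC policy fit without such a constraint can have spurious attractors off the demonstrations, which is exactly the failure shown in the videos above.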
Iterative Boundary Estimation of an Unknown Mode with Cutting Planes
Supplementary video showing the construction of cutting planes and boundary estimation.
Invariance failures are used to find cutting planes that bound the mode, and DS flows are modulated to stay inside the estimated boundary
DS (without modulation) vs DS-mod (with modulation)
Supplementary video showing the necessity of mode invariance when sequencing DS with a soup-scooping task automaton (shown above in the method figure). The task is to transition through the white, yellow, pink, and green regions consecutively. The pink region can only be entered from the yellow region, and the green region can only be entered from the pink region.
DS does not ensure mode invariance without boundary estimation and modulation, leading to looping despite LTL's reactivity
DS with modulation enjoys both reachability and invariance, both of which are necessary to simulate any discrete plan output by the reactive LTL
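The reactive sequencing above can be sketched as a loop that re-senses the current mode at every step and runs that mode's DS toward its successor: a perturbation that throws the system back to an earlier mode simply makes the automaton replan, with no special-case recovery logic. The transition table and sensing function below are a hypothetical 1-D stand-in for the soup-scooping automaton, not the paper's synthesized automaton.

```python
def reactive_sequence(transitions, sense_mode, ds_step, state,
                      goal="green", max_steps=1000):
    """Reactively sequence per-mode DS policies with a task automaton.

    transitions[mode] names the next mode to drive toward. The mode is
    re-sensed every step, so after a perturbation the automaton simply
    replans from whatever mode the system actually lands in.
    """
    for _ in range(max_steps):
        mode = sense_mode(state)              # sensor-based task state
        if mode == goal:
            return state, True
        target = transitions[mode]            # automaton picks successor
        state = ds_step(target, state)        # one step of that mode's DS
    return state, False

# 1-D stand-in: white = [0,1), yellow = [1,2), pink = [2,3), green = [3,4]
names = ["white", "yellow", "pink", "green"]

def sense_mode(x):
    return names[min(3, max(0, int(x)))]

def ds_step(target, x):
    goal_x = names.index(target) + 0.5        # attractor inside target mode
    return x + 0.3 * (goal_x - x)

transitions = {"white": "yellow", "yellow": "pink", "pink": "green"}
state, ok = reactive_sequence(transitions, sense_mode, ds_step, 0.2)
print(ok)  # True: white -> yellow -> pink -> green reached in order
```

Without invariance (e.g. if `ds_step` could exit a mode sideways into white), the sensed mode would regress and the loop would replan indefinitely, which is the looping behavior shown in the video above.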
Generalization to New Tasks by Reusing DS Skills
Supplementary videos showing LTL-DS's generalization to new task structures (encoded by LTL) by flexibly combining individual DS skills learned from demonstrations. Consider a demonstration of adding chicken (visiting the yellow region) and then broccoli (visiting the green region) to a pot (visiting the gray region). After the individual DS for visiting the yellow, green, and gray regions are learned, they can be recombined given a new LTL (refer to the paper) to solve new tasks such as (1) adding broccoli and then chicken, (2) adding only chicken, (3) continuously adding chicken. Note that the white region represents an empty spoon, and crossing from yellow/green to white means spilling the food.
Adding chicken and broccoli: The order of yellow -> gray -> green -> gray is guaranteed despite perturbations
Adding broccoli and chicken: The order of green -> gray -> yellow -> gray is guaranteed despite perturbations
Adding only chicken: The order of yellow -> gray is guaranteed despite perturbations
Adding chicken continuously: The order of yellow -> gray -> yellow... is guaranteed despite perturbations
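The recombination can be sketched as chaining the same learned per-region skills under a new visit order: a new LTL changes only the sequence, not the skills. The 1-D regions, centers, and function names below are hypothetical and purely illustrative.

```python
# Hypothetical 1-D stand-in: each region is a point attractor.
centers = {"white": 0.0, "yellow": 1.0, "green": 2.0, "gray": 3.0}

def skill(region, x):
    """DS skill learned for visiting `region`: flow toward its attractor."""
    return x + 0.5 * (centers[region] - x)

def run_plan(visit_order, x, tol=0.05):
    """Chain learned per-region DS skills in the order a new LTL demands."""
    for region in visit_order:
        while abs(x - centers[region]) > tol:
            x = skill(region, x)
    return x

# Demonstrated order: chicken (yellow) -> pot (gray) -> broccoli (green) -> pot
run_plan(["yellow", "gray", "green", "gray"], 0.0)

# New task, same skills, new LTL: broccoli first, then chicken
x = run_plan(["green", "gray", "yellow", "gray"], 0.0)
print(abs(x - centers["gray"]) < 0.05)  # True: plan ends at the pot
```

The "adding chicken continuously" case corresponds to a cyclic visit order (yellow -> gray -> yellow -> ...), which the same skills satisfy indefinitely.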
Robot Scooping Experiment
Supplementary video showing the soup-scooping task implemented on a real robot. The task is to scoop at least one red bead from one bowl and transfer it to the other bowl. We collected fewer than five demonstrations and used the task automaton shown above for this experiment.
NEW HUMAN-ROBOT EXPERIMENTS
[New results: additional human experiments] Since external perturbations are an integral part of our task complexity, we recruited five human subjects with no prior knowledge of our LTL-DS system to perturb the robot scooping setup. Each subject was given five trials of perturbations, for a total of 25 trials, each of which can be seen as an unbiased i.i.d. source of perturbations. In our newly submitted video, all 25 trials eventually succeed. We did not cherry-pick the results, and all videos were shot in one take. This empirical 100% success rate further corroborates our theoretical success guarantee. Interestingly, common perturbation patterns (annotated with the same colored text) emerged across different participants. Specifically, we observe adversarial perturbations, where humans fight against the robot, and collaborative perturbations, where humans help the robot achieve the goal of transferring at least one bead from one bowl to the other. Under adversarial perturbations, the DS reacts and the LTL replans. Under collaborative perturbations, the DS is compliant and allows humans to guide the motion. When humans are not perturbing yet the robot makes a mistake on its own (e.g., during scooping), the LTL replans the scooping DS until the robot successfully enters the transferring mode. The fact that we do not need to hard-code different rules to handle invariance failures caused by perturbations versus the robot's own execution failures in the absence of perturbations highlights the strength of our LTL-powered, sensor-based task reactivity.
NEW Line Inspection Task
NEW Color Tracing Task