Zihan Zhou¹,², Animesh Garg¹,³, Dieter Fox¹, Caelan Garrett¹*, Ajay Mandlekar¹*
¹NVIDIA, ²University of Toronto, Vector Institute, ³Georgia Institute of Technology
*Equal contribution
We propose SPIRE, a system that first uses Task and Motion Planning (TAMP) to decompose tasks into smaller learning subproblems and then combines imitation learning and reinforcement learning to leverage the strengths of both. We develop novel strategies for training learning agents that are deployed within a planning system. We evaluate SPIRE on a suite of long-horizon, contact-rich robot manipulation tasks. We find that SPIRE outperforms prior approaches that integrate imitation learning, reinforcement learning, and planning by 35% to 50% in average task performance, is 6 times more data efficient in the number of human demonstrations needed to train proficient agents, and learns to complete tasks nearly twice as efficiently.
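The two-stage structure described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `Segment`, `train_segment_policy`, `build_agents`, and the toy learners are our own names, and we assume for simplicity that each subproblem is either solved by the planner or handed to a learning agent. It shows only the ordering (imitation learning first, then reinforcement learning initialized from the imitation policy), not SPIRE's actual training strategies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a policy maps an observation to an action (floats for brevity),
# and Transitions are pooled (observation, action) pairs from human demonstrations.
Policy = Callable[[float], float]
Transitions = List[Tuple[float, float]]


@dataclass
class Segment:
    """One subproblem produced by the (hypothetical) task decomposition."""
    name: str
    learned: bool  # True: handed to a learning agent; False: solved by the planner


def train_segment_policy(
    demos: Transitions,
    imitation: Callable[[Transitions], Policy],
    reinforce: Callable[[Policy], Policy],
) -> Policy:
    """Imitation learning first, then reinforcement learning initialized from it."""
    initial_policy = imitation(demos)   # e.g. behavioral cloning on human demos
    return reinforce(initial_policy)    # e.g. RL fine-tuning starting from that policy


def build_agents(
    segments: List[Segment],
    demos_by_segment: Dict[str, Transitions],
    imitation: Callable[[Transitions], Policy],
    reinforce: Callable[[Policy], Policy],
) -> Dict[str, Policy]:
    """Train one policy per learned segment; planner segments need no training."""
    return {
        seg.name: train_segment_policy(demos_by_segment[seg.name], imitation, reinforce)
        for seg in segments
        if seg.learned
    }


# Toy stand-ins (hypothetical): "imitation" outputs the mean demonstrated action,
# and "reinforce" shifts that action by a fixed amount in place of real RL updates.
def toy_imitation(demos: Transitions) -> Policy:
    mean_action = sum(action for _, action in demos) / len(demos)
    return lambda obs: mean_action


def toy_reinforce(policy: Policy) -> Policy:
    return lambda obs: policy(obs) + 1.0


segments = [Segment("reach", learned=False), Segment("insert", learned=True)]
agents = build_agents(segments, {"insert": [(0.0, 1.0), (0.0, 3.0)]}, toy_imitation, toy_reinforce)
print(sorted(agents))         # ['insert']
print(agents["insert"](0.0))  # 3.0
```

Passing the learners in as functions keeps the decomposition independent of any particular imitation or reinforcement learning algorithm.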
SPIRE reaches a success rate of at least 80% in 8 of the 9 tasks, whereas BC and RL each do so in only 3 tasks.
In Tool Hang, our method reaches a 94% success rate even though its BC counterpart achieves only 10%, an improvement of more than 9 times. Remarkably, this low-performing BC agent is enough to help address the exploration burden, which RL alone fails to overcome (0% success), and to train a near-perfect agent. Across all 9 tasks, SPIRE averages an 87.8% success rate, while BC and RL average only 52.9% and 37.6%, respectively.
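As a quick check on the figures above, the following computes the Tool Hang improvement ratio and the gaps between the average success rates. The gaps are in percentage points; reading the abstract's "35% to 50%" as these percentage-point gaps is our assumption, not something the text states.

```python
# Success rates (%) copied from the text above.
tool_hang_spire, tool_hang_bc = 94.0, 10.0
avg_spire, avg_bc, avg_rl = 87.8, 52.9, 37.6

improvement = tool_hang_spire / tool_hang_bc  # relative improvement over BC in Tool Hang
gap_bc = avg_spire - avg_bc                   # absolute gap vs BC, in percentage points
gap_rl = avg_spire - avg_rl                   # absolute gap vs RL, in percentage points

print(f"Tool Hang: {improvement:.1f}x")       # Tool Hang: 9.4x
print(f"Gap vs BC: {gap_bc:.1f} points")      # Gap vs BC: 34.9 points
print(f"Gap vs RL: {gap_rl:.1f} points")      # Gap vs RL: 50.2 points
```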
SPIRE agents have lower average completion times than their BC and RL counterparts. Even in tasks where BC policies already achieve high success rates, such as Square, Square Broad, Coffee, and Three Piece, our method improves efficiency, requiring on average only 59% of the BC policies' completion time.
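One way to compute a relative completion-time figure like the 59% above is sketched below. The function name and the convention (mean of per-task ratios rather than a ratio of summed times) are our assumptions, and the task names and times in the usage lines are hypothetical placeholders, not results from the paper.

```python
from statistics import mean
from typing import Dict


def mean_relative_completion_time(ours: Dict[str, float], baseline: Dict[str, float]) -> float:
    """Average over tasks of (our completion time / baseline completion time).

    Assumes both dicts map the same task names to mean completion times
    (e.g. in environment steps); this averaging convention is an assumption.
    """
    if ours.keys() != baseline.keys():
        raise ValueError("both policies must be evaluated on the same tasks")
    return mean(ours[task] / baseline[task] for task in ours)


# Hypothetical placeholder times (steps), NOT results from the paper.
ours = {"task_a": 120.0, "task_b": 300.0}
baseline = {"task_a": 200.0, "task_b": 500.0}
print(f"{mean_relative_completion_time(ours, baseline):.0%}")  # 60%
```

Averaging per-task ratios keeps a single long task from dominating the result; a ratio of total times would weight longer tasks more heavily.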
We note that in Coffee, the RL policy has a much shorter completion time than our method, but this comes at the cost of safety. The RL-trained policy attempts to close the lid by knocking the coffee machine with its arm, which could damage the robot and the coffee machine and even endanger humans. Our method instead preserves safety awareness by following the demonstrations' practice of closing the lid with the fingers.