Play-LMP imitates only the end-effector position and fails to accomplish the task. In contrast, TACO-RL reasons about how to perform the task correctly and even completes an additional task.
As in the previous example, Play-LMP fails to open the drawer. Here, it follows a trajectory commonly used to open the drawer but cannot correct for the sensor error. In contrast, our method opens the drawer correctly.
Both algorithms perform the task successfully: Play-LMP succeeds through retrying behavior, whereas TACO-RL succeeds on the first attempt.