CDMPC learns to chain short-horizon skills from long-horizon trajectories across demonstrations from diverse source domains, including various skill combinations. The policy learned by CDMPC adapts to tasks from any source domain and enables the agent to tackle new tasks that require novel skill combinations.
[Panels: Room Order [2,1,3] (three panels); Room Order [4,2,1] (two panels)]
CDMPC enables agents to follow Room ID and MiniGrid demonstrations, avoid obstacles, and reach the correct rooms.