The agent is required to complete a set of 4 out of 7 possible tasks in a kitchen environment. The official documentation can be found here. We use the goal-conditioned image-based variant, where the environment is perceived from a single view and the goal is specified by the last image in the demonstration trajectory.
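As a rough illustration of this setup, the sketch below shows how a goal image could be taken from a demonstration and how success on a 4-task goal could be checked. The function names (`goal_image_from_demo`, `rollout_succeeds`) and the `ALL_TASKS` set are hypothetical and are not part of the environment's API; the task names are simply those appearing in the goals listed on this page.

```python
from typing import Iterable, Sequence, TypeVar

Frame = TypeVar("Frame")

# Assumption: the seven task names, collected from the goals shown on this page.
ALL_TASKS = frozenset({
    "microwave", "kettle", "stove light", "top burner",
    "bottom burner", "slide cabinet", "hinge cabinet",
})


def goal_image_from_demo(frames: Sequence[Frame]) -> Frame:
    """Return the last frame of a demonstration, used as the goal image."""
    if len(frames) == 0:
        raise ValueError("demonstration has no frames")
    return frames[-1]


def rollout_succeeds(completed: Iterable[str], goal_tasks: Iterable[str]) -> bool:
    """True if every one of the 4 goal tasks was completed in the rollout."""
    goal = set(goal_tasks)
    if len(goal) != 4 or not goal <= ALL_TASKS:
        raise ValueError("goal must contain 4 distinct known tasks")
    return goal <= set(completed)


# Hypothetical usage with placeholder frames (any indexable sequence works).
demo = ["frame_0", "frame_1", "frame_2"]
goal_image = goal_image_from_demo(demo)
goal = {"top burner", "bottom burner", "slide cabinet", "hinge cabinet"}
done = ["top burner", "bottom burner", "slide cabinet", "hinge cabinet"]
success = rollout_succeeds(done, goal)
```

The check is order-agnostic here; whether the benchmark requires the subtasks in a particular order is not stated above, so this sketch does not enforce one.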
Below, we visualize trajectory rollouts of our method completing all 4 subtasks specified by the goal image.
Goal: {top burner, bottom burner, slide cabinet, hinge cabinet}
Goal: {microwave, bottom burner, kettle, hinge cabinet}
Goal: {microwave, stove light, slide cabinet, bottom burner}
We also visualize the top-16 particles (shown with varying transparency) alongside the execution.
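One way such an overlay could be produced is sketched below: keep the 16 highest-scoring particles and map each score linearly to an opacity value. This is only an assumption about how transparency might encode ranking; the function name `top_k_with_alpha`, the per-particle `scores`, and the `min_alpha` floor are all hypothetical and not taken from our implementation.

```python
from typing import List, Sequence, Tuple


def top_k_with_alpha(
    scores: Sequence[float], k: int = 16, min_alpha: float = 0.1
) -> List[Tuple[int, float]]:
    """Pick the k best particles and assign each an opacity in [min_alpha, 1].

    Assumption: a higher score means a better particle, drawn more opaque.
    Returns (particle_index, alpha) pairs, best particle first.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    if not order:
        return []
    best, worst = scores[order[0]], scores[order[-1]]
    span = best - worst
    picks = []
    for i in order:
        # All-equal scores get full opacity to avoid dividing by zero.
        frac = 1.0 if span == 0 else (scores[i] - worst) / span
        picks.append((i, min_alpha + (1.0 - min_alpha) * frac))
    return picks


# Hypothetical usage: 20 particles with increasing scores.
example_scores = [0.1 * i for i in range(20)]
picks = top_k_with_alpha(example_scores)
```

The resulting alpha values could then be passed to any plotting routine that accepts per-trajectory opacity.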