Above are the rollouts for the sequential behavior composition plots shown below. The policies in the left video were generated using measure conditioning, while the policies in the right video were generated from language prompts.
Visualization of the measure values resulting from a sequence of four randomly selected desired measures (left) and text labels (right). The desired behavior in both cases is changed four times throughout the episode. The measure values from a policy sequence (Section 4 of the paper) are shown on the left as a function of time. Each sequence was run 10 times, and the corresponding error plots are shown. For both measure and language conditioning, the generated policies and their resulting behavior sequence closely match the desired behavior sequence. The model generates and sequences diverse policies despite large changes in foot-contact time and sudden changes in behavior between trajectory segments. Because policies that fall over are labeled as such, the language-conditioned diffusion model can prune away poorly performing policies, which dramatically improves performance in sequential behavior composition.
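As a rough illustration of how such a plot could be assembled, the sketch below builds a piecewise-constant schedule of desired measures, aggregates repeated rollouts into a per-timestep mean and standard deviation, and reports how closely the mean tracks the schedule. It is a minimal sketch under stated assumptions, not the paper's evaluation code: the function names, the equal-length segments, the target values, and the noisy synthetic rollouts are all hypothetical stand-ins.

```python
import random
import statistics


def desired_schedule(targets, episode_len):
    """Piecewise-constant target: each desired measure holds for an equal share of the episode.

    Equal-length segments are an assumption made for this sketch.
    """
    seg_len = episode_len // len(targets)
    return [targets[min(t // seg_len, len(targets) - 1)] for t in range(episode_len)]


def aggregate_runs(runs):
    """Per-timestep mean and population standard deviation across repeated rollouts."""
    mean = [statistics.fmean(step) for step in zip(*runs)]
    std = [statistics.pstdev(step) for step in zip(*runs)]
    return mean, std


def tracking_error(mean, schedule):
    """Mean absolute gap between the averaged measure and the desired measure."""
    return statistics.fmean(abs(m - s) for m, s in zip(mean, schedule))


# Synthetic stand-in data, NOT results from the paper.
rng = random.Random(0)
targets = [0.2, 0.8, 0.4, 0.6]  # hypothetical desired foot-contact fractions
episode_len = 200
schedule = desired_schedule(targets, episode_len)

# Ten noisy rollouts that follow the schedule, mimicking 10 repeated runs.
runs = [[s + rng.gauss(0.0, 0.05) for s in schedule] for _ in range(10)]

mean, std = aggregate_runs(runs)
err = tracking_error(mean, schedule)
```

Plotting `mean` with a band of `mean ± std` against `schedule` would give a figure of the same general shape as the one described, with the band width indicating run-to-run variability.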
Temporal behavior sequencing from text labels. A humanoid (left) controlled by a policy sequence beginning with “slide forward on your right foot while kicking with your left foot” (top left), then “run forward on left foot while dragging right foot”, then “quickly shuffle forward on your left foot”, and finally “wildly hop forward on left foot while lifting your right foot up” (bottom right). Heatmap of the archive (right) showing the sequence of text labels overlaid on the measure space.