A ball is attached to an actuated cup by a string; the agent must move the cup to catch the ball inside it.
These visualizations illustrate how the VoSI metric is realized: the performance of mixed loop (MixL) execution strategies, which forgo sensory information for a horizon h and subsequently operate closed loop, is compared against that of a closed loop execution strategy (denoted CL). The open loop phase of each rollout is marked with a gray indicator at the top right, and the closed loop phase with a green indicator.
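The comparison described above can be sketched in a few lines. This is a minimal illustration, not the evaluation code behind the videos: it assumes VoSI(s, h) is the mean CL return minus the mean MixL return at horizon h, and it replaces the ball-in-cup task and the learned agent with hypothetical stand-ins (`ToyPointEnv`, `policy`, `plan_open_loop`).

```python
import numpy as np

class ToyPointEnv:
    """Hypothetical 1D stand-in for the task: drive x toward 0 under noise."""
    def __init__(self, x0, seed=0):
        self.x = float(x0)
        self.rng = np.random.default_rng(seed)

    def observe(self):
        return self.x

    def step(self, action):
        self.x += action + self.rng.normal(scale=0.05)
        return -abs(self.x)  # reward: closer to 0 is better

def policy(obs):
    """Hypothetical closed loop controller: proportional feedback."""
    return -0.5 * obs

def plan_open_loop(obs, h):
    """Hypothetical open loop plan: roll the policy forward on a noise-free model."""
    actions, x = [], obs
    for _ in range(h):
        a = policy(x)
        actions.append(a)
        x += a
    return actions

def rollout_return(x0, h, total_steps=50, seed=0):
    """Return of a MixL rollout: h open loop steps, then closed loop (h=0 is CL)."""
    env = ToyPointEnv(x0, seed)
    plan = plan_open_loop(env.observe(), h)
    ret = 0.0
    for t in range(total_steps):
        # Open loop phase replays the plan; closed loop phase senses the state.
        a = plan[t] if t < h else policy(env.observe())
        ret += env.step(a)
    return ret

def vosi(x0, h, n_seeds=20):
    """Assumed definition: mean CL return minus mean MixL(h) return."""
    cl = np.mean([rollout_return(x0, 0, seed=s) for s in range(n_seeds)])
    mixl = np.mean([rollout_return(x0, h, seed=s) for s in range(n_seeds)])
    return cl - mixl
```

Evaluating `vosi(x0, h)` over a range of horizons for a fixed start state gives one VoSI profile of the kind plotted in the media below; sharing seeds between the CL and MixL rollouts keeps the comparison paired.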
Media 1(a). We observe a sharp degradation in performance at a MixL execution horizon of around 5, at which point open loop execution of the swing phase of the ball results in a collision with the lip of the cup. The degradation then grows linearly with the horizon, since the agent takes longer to observe the failure and recover with closed loop execution.
Media 1(b). As in Media 1(a), a sharp degradation occurs when the swing reaches the phase where the ball is close to the lip of the cup.
Media 1(c). The observations from Media 1(a-b) also hold here. In addition, from this state we observe that committing to open loop execution can reduce the total achievable return because the goal configuration is reached later, as seen in the MixL trajectories with horizons 10 and 20.
Media 1(d). Once the catch is complete, the agent suffers close to no performance degradation (the small performance drops that remain arise from the stochasticity of the policy after task completion).
Media 2. Visualization of the VoSI profiles over the course of a rollout. The VoSI profiles exhibit a temporal structure: they shift to the left over the course of the swing, indicating that it is beneficial for the agent to sense the state of the environment when the ball is close to the lip of the cup. Once the agent observes that the ball has cleared the wall of the cup, the catch can be completed open loop and there is no additional value to sensing.
Following a rationale similar to the one we developed for swingup, we project the VoSI profiles VoSI(s, h) obtained for the TD-MPC2 agent onto a 1D component and, in Figure 1(a), visualize all the profiles sorted in descending order of the projected component. We observe that at most states there is no additional value gained from sensing, i.e., when the catch is straightforward to complete or is already complete. Figure 1(b) re-highlights the representative states discussed in Media 1.
Figure 1(a). Illustration of the VoSI profiles for all the evaluated states.
Figure 1(b). Highlights of the VoSI profiles at a few representative states from the full collection on the left. Shown in gray are illustrations of representative states at which some change in the VoSI profile is observed.
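The projection and sorting used to arrange the profiles in Figure 1(a) can be sketched as follows. The text does not specify which 1D projection is used, so this sketch assumes the first principal component (PCA via SVD); the function name `sort_profiles_by_projection` and the array layout are likewise assumptions made for illustration.

```python
import numpy as np

def sort_profiles_by_projection(profiles):
    """Sort VoSI profiles by their score on an assumed first principal component.

    profiles: array of shape (n_states, n_horizons), one VoSI profile per row.
    Returns the row order (descending score) and the sorted scores.
    """
    profiles = np.asarray(profiles, dtype=float)
    centered = profiles - profiles.mean(axis=0, keepdims=True)
    # First right-singular vector of the centered data = first principal direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    # The sign of a singular vector is arbitrary; orient it so that
    # profiles with larger VoSI values receive larger scores.
    if direction.sum() < 0:
        direction = -direction
    scores = centered @ direction
    order = np.argsort(-scores)  # descending order of the projected component
    return order, scores[order]
```

Calling `order, scores = sort_profiles_by_projection(profiles)` and then plotting `profiles[order]` row by row gives a layout in which states with high value of sensing appear first and the many states with near-zero VoSI are grouped at the end.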