On this page, we use RL training to assess responsiveness. We consider two experimental settings. In the Action Horizon setting, the RL agent directly outputs the next Ta actions (similar to the DP setting); in the Action Repeat setting, the RL agent outputs a single action that is then executed in the environment for Ta steps (a strong challenge to responsiveness). The two settings are sketched below. We experiment with Adroit Relocate (classified as response-sensitive), Adroit Pen (classified as regular), and ManiSkill2 StackCube (classified as regular).
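The difference between the two settings can be illustrated with a minimal rollout sketch. This is not the original implementation; it assumes a Gym-style `env.reset()`/`env.step()` interface, and the `policy.predict_chunk` / `policy.predict` methods are hypothetical names for "output Ta actions" versus "output one action".

```python
def rollout_action_horizon(env, policy, Ta, max_steps=200):
    # Action Horizon setting: the agent outputs a chunk of Ta actions,
    # which are executed one by one before the policy is queried again.
    obs, total_reward, step = env.reset(), 0.0, 0
    while step < max_steps:
        actions = policy.predict_chunk(obs, horizon=Ta)  # hypothetical; shape (Ta, act_dim)
        for a in actions:
            obs, reward, done, info = env.step(a)
            total_reward += reward
            step += 1
            if done or step >= max_steps:
                return total_reward
    return total_reward


def rollout_action_repeat(env, policy, Ta, max_steps=200):
    # Action Repeat setting: the agent outputs a single action that is
    # repeated for Ta environment steps, so intermediate observations
    # cannot influence the executed actions.
    obs, total_reward, step = env.reset(), 0.0, 0
    while step < max_steps:
        a = policy.predict(obs)  # hypothetical; a single action
        for _ in range(Ta):
            obs, reward, done, info = env.step(a)
            total_reward += reward
            step += 1
            if done or step >= max_steps:
                return total_reward
    return total_reward
```

In both settings the policy is only queried once every Ta environment steps; the Action Repeat setting is strictly harder because the Ta executed actions are identical rather than a planned sequence.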
Figure 19: RL training results on Adroit Relocate (classified as a response-sensitive task). Both the action horizon and action repeat experiments fail to achieve any success on Adroit Relocate, indicating that the task has very low tolerance to outdated actions and strongly depends on the most recent observations. This supports our classification of Adroit Relocate as a response-sensitive task.
Figure 20: RL training results on ManiSkill2 StackCube (classified as a regular task). Action horizon runs achieve a decent success rate, while action repeat runs do not, indicating that this task has some tolerance to outdated actions. This supports our classification of ManiSkill2 StackCube as a regular task.
Figure 21: RL training results on Adroit Pen (classified as a regular task). Both action horizon and action repeat runs achieve reasonable success rates, indicating that the task has very high tolerance to outdated actions. This supports our classification of Adroit Pen as a regular task.