Standing Still
Moving forward
More Complex Tasks with Larger Dynamics Gaps
Standing Still with a Larger Dynamics Gap
(the mass of the robot in simulation is changed from 12.1 kg to 4.6 kg, while the mass of the real robot is 12.0 kg)
Moving Forward with a Larger Dynamics Gap
Control Performance Comparisons
Standing Still
Moving Forward
Only the H2O+ and IQL policies successfully maintain the balance of the robot for over 30 seconds.
H2O+ keeps the robot's displacement within 0.2 m, whereas IQL only barely maintains balance, swinging over a large range that reaches 1.6 m.
H2O+ achieves the desired forward movement with precise velocity control, smooth speed changes, and a steady pitch angle.
In contrast, the IQL policy manages to maintain balance but drifts backward and expends considerable effort in doing so, producing a shaky period lasting over 7 seconds.
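The comparisons above rest on simple trajectory statistics: peak displacement from the start, mean forward speed, and speed jitter. A minimal sketch of how such metrics could be computed from a logged position trace is shown below; the function name, the 1-D position log, and the 50 Hz toy rollout are illustrative assumptions, not the paper's actual evaluation code.

```python
import numpy as np

def trajectory_metrics(positions, dt):
    """Summarize a 1-D base-position log from a rollout.

    `positions` is a hypothetical array of the robot's forward
    displacement (m) sampled every `dt` seconds.
    """
    positions = np.asarray(positions, dtype=float)
    velocities = np.diff(positions) / dt
    return {
        # peak deviation from the starting point (m)
        "max_displacement": float(np.max(np.abs(positions - positions[0]))),
        # mean forward speed over the rollout (m/s)
        "mean_velocity": float(np.mean(velocities)),
        # speed jitter: standard deviation of the velocity trace
        "velocity_std": float(np.std(velocities)),
    }

# Toy rollout: a policy holding roughly 0.2 m/s forward speed at 50 Hz,
# with a small oscillation on top.
dt = 0.02
t = np.arange(0, 10, dt)
pos = 0.2 * t + 0.005 * np.sin(5 * t)
m = trajectory_metrics(pos, dt)
print(m["mean_velocity"])  # close to 0.2
```

A displacement bound like the 0.2 m figure for "standing still" would then be a threshold on `max_displacement`, and the 0.2 m/s forward speed a target for `mean_velocity`.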
More Complex Tasks with Larger Dynamics Gaps
Standing Still with a Larger Dynamics Gap
H2O+ outperforms IQL in the robot's capacity to maintain a stationary stance. H2O+ effectively confines the robot's displacement to within approximately 1 m (drifting backward), whereas IQL causes the robot to oscillate between its original position and a broader range of approximately 1.5 m.
Moving Forward with a Larger Dynamics Gap
None of the methods except H2O+ are able to control the robot to move forward; among the baselines, only IQL maintains equilibrium for a long period of time. H2O+, despite moving backward at first, moves forward at a steady speed close to 0.2 m/s for an extended period.
Online Simulation Data Quality
Standing Still
Comparison of H2O+ and H2O simulated data quality on the real-world robot "standing still" task. We visualize the coverage and the normalized values of reward, displacement, velocity, angle, angular velocity, and action. In the "standing still" task, we observe that H2O explores a more focused high-value area, whereas H2O+ spans a broader high-value area, demonstrating superior diversity in its simulated data, which benefits overall performance.
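The coverage claims above can be made concrete with a simple occupancy measure: min-max normalize a 2-D projection of the state-action samples, bin it, and count the fraction of occupied bins. The sketch below is a generic illustration under these assumptions (uniform binning, min-max normalization); it is not the paper's actual visualization code.

```python
import numpy as np

def occupancy_coverage(samples, bins=20):
    """Fraction of occupied bins in a normalized 2-D projection.

    `samples` is an (N, 2) array, e.g. one state dimension paired with
    one action dimension. Each axis is min-max normalized to [0, 1],
    then discretized into a `bins` x `bins` grid; coverage is the share
    of cells containing at least one sample.
    """
    x = np.asarray(samples, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    x = (x - lo) / np.where(hi - lo > 0, hi - lo, 1.0)  # min-max normalize
    hist, _, _ = np.histogram2d(x[:, 0], x[:, 1],
                                bins=bins, range=[[0, 1], [0, 1]])
    return float(np.count_nonzero(hist)) / hist.size

rng = np.random.default_rng(0)
broad = rng.uniform(size=(5000, 2))                     # diverse samples
focused = 0.5 + 0.02 * rng.standard_normal((5000, 2))   # concentrated samples
cb, cf = occupancy_coverage(broad), occupancy_coverage(focused)
```

Here the broadly distributed data yields a higher coverage score than the concentrated data, mirroring the qualitative difference between H2O+'s wider exploration and H2O's more focused sampling.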
Moving Forward
Comparison of H2O+ and H2O simulated data quality on the real-world robot "moving forward" task. We visualize the coverage and the normalized values of reward, displacement, velocity, angle, angular velocity, and action. In the "moving forward" task, H2O+ provides wider coverage across the state-action space and displays better diversity in its data, reflecting a more robust and thorough exploration.
Distribution Analysis of Offline Dataset
We visualize the state, action, and reward distributions of the two real-world robot tasks. For the standing-still task, we collect 16588 transitions from the real robot; for the offline dataset of moving forward, we collect 16588 transitions from the moving process of the real robot.
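Such distribution plots typically reduce to per-dimension histograms over the recorded transitions. A minimal sketch under assumed shapes and field names (the actual dataset format is not specified here) could look like this:

```python
import numpy as np

# Hypothetical offline dataset of transitions; the dimensionalities and
# dict keys are illustrative assumptions, not the paper's real format.
rng = np.random.default_rng(1)
dataset = {
    "states": rng.standard_normal((16588, 4)),
    "actions": rng.uniform(-1.0, 1.0, size=(16588, 2)),
    "rewards": rng.uniform(0.0, 1.0, size=16588),
}

def per_dimension_histograms(dataset, bins=30):
    """Histogram every state and action dimension plus the reward,
    returning {name: (counts, bin_edges)} ready for plotting."""
    hists = {}
    for key in ("states", "actions"):
        arr = np.atleast_2d(dataset[key])
        for d in range(arr.shape[1]):
            hists[f"{key}[{d}]"] = np.histogram(arr[:, d], bins=bins)
    hists["rewards"] = np.histogram(dataset["rewards"], bins=bins)
    return hists

hists = per_dimension_histograms(dataset)
# Each histogram's counts sum to the dataset size (16588 transitions).
```

Plotting each `(counts, bin_edges)` pair (e.g. with matplotlib's `plt.stairs`) yields the kind of per-dimension distribution figures referenced below.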
State, action and reward distribution of the standing still dataset
State, action and reward distribution of the moving forward dataset