Ball in a Cup in 4 Minutes

MuJoCo Simulation

Description:

Simulation studies of the Ball in a Cup experiments. The red ball is simulated by MuJoCo, while the yellow ball shows the optimal ball trajectory planned with the internal model. The green and red frames indicate whether the transfer was successful or not. Each roll-out shows the optimal trajectory of a different reinforcement learning seed using the identical model. The data acquisition is identical to the real robot experiments. In simulation, we do not average across ball trajectories, as they are mostly deterministic.

Results:

Only the learned whitebox model successfully transfers solutions learned via offline model-based reinforcement learning. The other learned dynamics models, i.e., the LSTM network and the feed-forward network, do not achieve a single successful transfer across 50 different reinforcement learning seeds. The ball trajectories predicted by these models (yellow) are not plausible and immediately leave the scene. At the end of the trajectory, the planned ball implausibly jumps back to the cup. This shows that the policy optimization exploited the out-of-distribution dynamics of the local LSTM and FFNN models. In contrast, the ball trajectory predicted by the learned whitebox model is plausible and matches the simulation well. The models cannot be perfect, because MuJoCo simulates the rope as many small connected rigid bodies, whereas the whitebox model uses an inequality constraint.
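To illustrate what an inequality constraint for the rope could look like, the following is a minimal sketch and not the model used in the experiments. It assumes the constraint takes the common form g = L - ||p_ball - p_cup|| >= 0, where L is the string length. The function names, the tolerance, and the 0.4 m string length are hypothetical and chosen only for illustration.

```python
import math


def string_slack(ball_pos, cup_pos, string_length):
    """Return g = L - ||p_ball - p_cup||.

    g >= 0 means the assumed string constraint is satisfied.
    """
    return string_length - math.dist(ball_pos, cup_pos)


def string_state(ball_pos, cup_pos, string_length, tol=1e-6):
    """Classify the string as 'slack', 'taut', or 'violated'.

    The tolerance is a hypothetical numerical margin, not a value
    taken from the experiments.
    """
    g = string_slack(ball_pos, cup_pos, string_length)
    if g > tol:
        return "slack"      # ball is in free flight, string exerts no force
    if g >= -tol:
        return "taut"       # ball is on the sphere of radius L around the cup
    return "violated"       # ball is farther away than the string allows


if __name__ == "__main__":
    cup = (0.0, 0.0, 0.0)
    length = 0.4  # hypothetical string length in metres
    for ball in [(0.0, 0.0, -0.2), (0.0, 0.0, -0.4), (0.0, 0.0, -0.5)]:
        print(ball, string_state(ball, cup, length))
```

Under this assumed form, the ball moves freely while the string is slack and is constrained only once the string becomes taut. A chain of many small rigid bodies, as simulated by MuJoCo, behaves differently, which is one reason the two models cannot agree exactly.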

Learned Whitebox Model

LSTM Network

Feed-Forward Neural Network
