Appendix for the paper
Doggo Goal (panels: SPP-TD3 agent, vanilla TD3 agent)
Doggo Button (panels: SPP-TD3 agent, vanilla TD3 agent)
Doggo Columns (panels: SPP-TD3 agent, vanilla TD3 agent)
Car Push (panels: SPP-TD3 agent, vanilla TD3 agent)
Example trajectories obtained from trained agents in the AntPush environment
Example of an optimal Ant path solving the AntPush task (the left path in the figure above). The Ant moves to the left and pushes the movable block away, opening the entrance to the goal.
Example of a sub-optimal Ant path (the right path in the figure above). The Ant moves to the right and blocks the entrance to the goal with the movable block.
Experiment result plots from MuJoCo benchmark environments
Ant, (SPP-)DDPG
Ant, (SPP-)TD3
Ant, (SPP-)SAC
Humanoid, (SPP-)DDPG
Humanoid, (SPP-)TD3
Humanoid, (SPP-)SAC
Experimental evaluation of the TD3 shadow agent
Doggo Goal TD3 shadow agent
Doggo Button TD3 shadow agent
Doggo Columns TD3 shadow agent
Car Push TD3 shadow agent
Experiment results from the ablation study
Ablation study of SPP-TD3 features in Ant
Ablation study of SPP-TD3 features in Doggo Goal
Investigating SPP-RL & vanilla RL replay buffers
Encoded state density from vanilla SAC buffer
Encoded state density from SPP-SAC buffer
Encoded state density from vanilla TD3 buffer
Encoded state density from SPP-TD3 buffer