Supplementary material for the paper entitled
State Planning Policies Online RL

Appendix for the paper

cdc_appendix.pdf

SPP-RL code GitHub repository

goal_spptd3_530.mp4

SPP-TD3 agent

goal_td3_496.mp4

vanilla TD3 agent

Doggo Goal

buttonfull_spptd3_554.mp4

SPP-TD3 agent

button_td3_501.mp4

vanilla TD3 agent

Doggo Button

spp-td3-Safexp-DoggoCustom30-v0-1614109055.46159-0_6.mp4

SPP-TD3 agent

td3-Safexp-DoggoCustom30-v0-1614015303.883001-0.mp4

vanilla TD3 agent

Doggo Columns

spp-td3-Safexp-CarPush0-v0-1615543287.0482926-0_6.mp4

SPP-TD3 agent

td3-Safexp-CarPush0-v0-1615644722.7111344-0.mp4

vanilla TD3 agent

Car Push

Example trajectories obtained from trained agents in the AntPush environment

antpush_good.mp4

Example of an optimal Ant path solving the AntPush task (the left path in the figure above). The Ant moves to the left and pushes the movable block away, opening the entrance to the goal.

antpush_bad.mp4

Example of a sub-optimal Ant path (the right path in the figure above). The Ant moves to the right and blocks the entrance to the goal with the movable block.

Experiment result plots from MuJoCo benchmark environments

Ant, (SPP-)DDPG

Ant, (SPP-)TD3

Ant, (SPP-)SAC

Humanoid, (SPP-)DDPG

Humanoid, (SPP-)TD3

Humanoid, (SPP-)SAC

Experimental evaluation of the TD3 shadow agent

Doggo Goal TD3 shadow agent

Doggo Button TD3 shadow agent

Doggo Columns TD3 shadow agent

Car Push TD3 shadow agent

Experiment results from the ablation study

Ablation study of SPP-TD3 features in Ant

Ablation study of SPP-TD3 features in Doggo Goal

Investigating SPP-RL & vanilla RL replay buffers

Encoded state density from vanilla SAC buffer

Encoded state density from SPP-SAC buffer

Encoded state density from vanilla TD3 buffer

Encoded state density from SPP-TD3 buffer
