Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards

Performance of DTSIL on Montezuma's Revenge and Pitfall


DTSIL+EXP (30900)

PPO+EXP (10500)

ours_exp.mp4

ppo_exp.mp4

[0:11~0:17] With a sword in hand, the agent kills the spider far away from where it got the sword, receiving a reward of 3000.

[0:07~0:09] With a sword in hand, the agent kills the nearby skull to get a reward of 2000.

[0:30~0:40] With two keys in the inventory, the agent holds on to both keys to open the last two doors and finish this level.

[0:35~0:37] With two keys in the inventory, the agent is distracted by the reward for opening the nearby door.
