Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards
Performance of DTSIL on Montezuma's Revenge and Pitfall
DTSIL+EXP (30900)
PPO+EXP (10500)