Estimating Q(s, s') with Deep Deterministic Dynamics Gradients
[Paper] [Code]
Ashley D. Edwards, Himanshu Sahni, Rosanne Liu, Jane Hung, Ankit Jain, Rui Wang, Adrien Ecoffet, Thomas Miconi, Charles Isbell, Jason Yosinski
A D3G-trained dynamics model gradually learns to make grid-like predictions from the start state to the goal when trained to solve a grid world task.
Given a sequence of states, rewards, and termination conditions (i.e., no actions!) obtained from a random policy, a D3G-trained dynamics model imagines balancing a pole*.
...it also imagines moving a reacher arm to a target location.
* The model predicts state vectors which we render in the MuJoCo simulator.
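The core idea behind D3G is learning values over state *transitions*, Q(s, s'), rather than state-action pairs, so a policy can be learned without action labels. As a rough intuition pump only, here is a hypothetical tabular sketch of that idea on a tiny grid world like the one above; the grid size, reward, and value-iteration loop are our illustrative assumptions, not the paper's deep model:

```python
# Hypothetical tabular sketch of the Q(s, s') idea (NOT the paper's D3G model):
# values are defined over state transitions, so no action labels ever appear.
import itertools

SIZE = 4                      # assumed toy grid size
GOAL = (SIZE - 1, SIZE - 1)   # assumed goal corner
GAMMA = 0.9                   # assumed discount factor

def neighbors(s):
    """Reachable next states: the four grid moves, clipped to the board."""
    x, y = s
    cand = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in cand if 0 <= a < SIZE and 0 <= b < SIZE]

states = list(itertools.product(range(SIZE), repeat=2))
Q = {(s, sp): 0.0 for s in states for sp in neighbors(s)}

# Value iteration over transitions: Q(s, s') = r(s') + gamma * max_{s''} Q(s', s'').
for _ in range(100):
    for (s, sp) in Q:
        r = 1.0 if sp == GOAL else 0.0
        future = 0.0 if sp == GOAL else max(Q[(sp, spp)] for spp in neighbors(sp))
        Q[(s, sp)] = r + GAMMA * future

# The "policy" simply proposes the best next state -- actions never appear.
s, path = (0, 0), [(0, 0)]
while s != GOAL:
    s = max(neighbors(s), key=lambda sp: Q[(s, sp)])
    path.append(s)
print(path)  # a shortest path of grid states from (0, 0) to the goal corner
```

In the full method, the greedy argmax over next states is replaced by a learned model that proposes a reachable next state, and a separate dynamics model (as in the videos above) turns those proposals into executable behavior.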
Paper to appear at the 2020 International Conference on Machine Learning (ICML).