On the model-based stochastic value gradient for continuous RL

Supplementary Videos