Variational Reparametrized Policy Learning with Differentiable Physics