Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning