Deisenroth & Rasmussen (ICML, 2011): PILCO
Deisenroth et al. (PAMI): Gaussian processes for data-efficient learning in robotics and control
How do you control systems for which you do not have a good model?
- Bayesian Machine Learning
We want to learn policies fully autonomously:
- Infinite number of combinations of state and control
- Millions of experiments are impractical
We could use three approaches to learn a controller:
- Reinforcement learning
- Imitation learning
  - Inverse RL - Ng & Russell 2000
  - Behavioral cloning - Pomerleau 1989
  - Probabilistic imitation learning: interestingly, the "guidance" can actually affect the learning
- Bayesian optimization
  - Jones 2011
  - Brochu et al. 2010
  - Hennig & Schuler 2012
  - Similar to active learning
  - Calandra & Seyfarth: bipedal robot
  - Limited to about 10-20 parameters
  - Build a model of the objective function
  - Find the minimum of the model (in practice, of an acquisition function built from it)
  - Evaluate the true objective function at that point
  - Update the model of the objective function
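The four-step Bayesian optimization loop above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the method used in the cited works: a single parameter searched on a grid over [0, 1], a zero-mean GP surrogate with a squared-exponential kernel and fixed hyperparameters, and a lower-confidence-bound acquisition (posterior mean minus `kappa` times posterior standard deviation) standing in for "find the minimum". The names `se_kernel`, `gp_posterior`, and `bayes_opt` are hypothetical.

```python
import numpy as np

def se_kernel(a, b, lengthscale=0.3, signal_var=1.0):
    """Squared-exponential covariance between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-6):
    """Zero-mean GP posterior mean and standard deviation at x_test."""
    K = se_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = se_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = se_kernel(x_test, x_test).diagonal() - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))  # clip tiny negative round-off

def bayes_opt(objective, n_iter=15, kappa=2.0, seed=0):
    """Hypothetical BO loop minimizing a 1-D objective on [0, 1]."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 501)        # candidate parameters
    x = rng.uniform(0.0, 1.0, size=2)        # two random initial evaluations
    y = np.array([objective(xi) for xi in x])
    for _ in range(n_iter):
        # 1. Build a model of the objective function (GP posterior)
        mean, std = gp_posterior(x, y, grid)
        # 2. Find the minimum of the acquisition function (lower confidence bound)
        x_next = grid[np.argmin(mean - kappa * std)]
        # 3. Evaluate the true (expensive) objective function
        y_next = objective(x_next)
        # 4. Update the model's data set
        x = np.append(x, x_next)
        y = np.append(y, y_next)
    best = np.argmin(y)
    return x[best], y[best]
```

For example, `bayes_opt(lambda x: (x - 0.3) ** 2)` should return a parameter close to 0.3 after 17 evaluations. A dense candidate grid only works in very low dimension; practical implementations optimize the acquisition function with a continuous optimizer instead.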
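Model-based policy search (PILCO) setup:
- Dynamics: $x_{t+1} = f(x_t, u_t) + w$, with unknown transition function $f$ and noise $w$
- Policy: $u_t = p(x_t, z)$ with policy parameters $z$
- Objective: $\min_z J(z)$, where $J$ is the expected long-term cost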
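To make the objective $J(z)$ concrete, the sketch below estimates it by Monte Carlo rollouts on a hypothetical toy system and minimizes it by grid search. Everything here is an assumption for illustration: the scalar dynamics $f(x, u) = x + u$ with Gaussian noise $w$, the linear policy $p(x, z) = -z x$, the quadratic cost $c(x) = x^2$, and the names `dynamics`, `policy`, and `expected_cost`. For this system the optimum is $z = 1$, where $J(1) = T \sigma_w^2$.

```python
import numpy as np

def dynamics(x, u, rng, noise_std=0.1):
    """Hypothetical toy transition x_{t+1} = f(x_t, u_t) + w with f(x, u) = x + u."""
    return x + u + noise_std * rng.standard_normal(x.shape)

def policy(x, z):
    """Hypothetical linear policy u_t = p(x_t, z) = -z * x_t."""
    return -z * x

def expected_cost(z, x0=1.0, horizon=20, n_samples=2000, seed=0):
    """Monte Carlo estimate of J(z) = sum_t E[c(x_t)] with c(x) = x^2."""
    rng = np.random.default_rng(seed)  # same seed for every z: common random numbers
    x = np.full(n_samples, x0)
    total = 0.0
    for _ in range(horizon):
        x = dynamics(x, policy(x, z), rng)
        total += np.mean(x ** 2)       # sample average approximates E[c(x_t)]
    return total

# Crude policy search: evaluate J(z) on a grid of parameters and keep the best
z_grid = np.linspace(0.0, 2.0, 41)
costs = np.array([expected_cost(z) for z in z_grid])
z_best = z_grid[np.argmin(costs)]
```

PILCO itself does not sample rollouts like this: it propagates Gaussian state distributions through the learned GP model analytically (moment matching) and improves $z$ with gradients of $J(z)$ rather than a grid.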
- Use a probabilistic model of the transition function f to be robust to model errors
  - Gaussian process: mean and covariance function
- Compute long-term predictions of the state distribution under the policy p(x, z)
- Policy improvement
- Apply controller
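The Gaussian-process dynamics model can be sketched as plain GP regression from state-control pairs $(x_t, u_t)$ to the next state. This is a minimal sketch assuming a scalar state and control, a squared-exponential kernel with fixed (not learned) hyperparameters, and a hypothetical `true_dynamics` function that only generates training data; like PILCO, it models the state difference $x_{t+1} - x_t$. The class `GPDynamics` and its methods are hypothetical names. The posterior variance is what lets the controller account for model errors: it is small near the data and reverts to the prior far from it.

```python
import numpy as np

def se_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential covariance between row-wise inputs A (n, d) and B (m, d)."""
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / lengthscale ** 2)

class GPDynamics:
    """GP model of the state change x_{t+1} - x_t as a function of (x_t, u_t)."""

    def __init__(self, noise_var=1e-4):
        self.noise_var = noise_var

    def fit(self, X, U, X_next):
        self.inputs = np.column_stack([X, U])
        targets = X_next - X  # model state differences
        K = se_kernel(self.inputs, self.inputs) + self.noise_var * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, targets))
        return self

    def predict(self, x, u):
        """Predictive mean and variance of x_{t+1} at query points (x, u)."""
        query = np.column_stack([x, u])
        K_s = se_kernel(self.inputs, query)
        mean = x + K_s.T @ self.alpha
        v = np.linalg.solve(self.L, K_s)
        # prior variance of the SE kernel is signal_var = 1.0; add observation noise
        var = 1.0 - np.sum(v ** 2, axis=0) + self.noise_var
        return mean, var

def true_dynamics(x, u):
    """Hypothetical 'unknown' system, used only to generate training data."""
    return x + 0.1 * np.sin(x) + 0.2 * u

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, 100)
U = rng.uniform(-1.0, 1.0, 100)
X_next = true_dynamics(X, U) + 0.01 * rng.standard_normal(100)

model = GPDynamics().fit(X, U, X_next)
# One query inside the training region, one far outside it
mean, var = model.predict(np.array([0.5, 10.0]), np.array([0.3, 0.0]))
```

In PILCO the kernel hyperparameters are learned from data (by evidence maximization) rather than fixed as here.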