Task-Oriented Active Learning of Model Preconditions for Inaccurate Dynamics Models

Alex Lagrassa, Moonyoung Lee, Oliver Kroemer

Carnegie Mellon University, Robotics Institute

{alagrass,moonyoul,okroemer}@andrew.cmu.edu

A dynamics model that works well in one environment may deviate noticeably when applied to a different environment. Knowing where the model is accurate helps the planner compute plans that successfully reach the goal.


Here's an illustrative example showing a robot trying to pour water into a plant container. 

Due to the plant's presence, water may or may not enter the container. The robot can therefore actively learn where the model is accurate by using a planner and an acquisition function to iteratively select informative trajectories.
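For concreteness, below is a minimal sketch of this active learning loop. The interfaces (planner, acquisition, mde, model, env) are hypothetical placeholders, not the paper's API: the planner proposes candidate trajectories, the acquisition function scores how informative each one is, the most informative trajectory is executed, and the model deviation estimator (MDE) is refit on the observed deviations.

```python
# Sketch of one active learning iteration (assumed interfaces, not the paper's code).
import numpy as np

def active_learning_iteration(planner, acquisition, mde, model, env, dataset):
    candidates = planner.sample_trajectories(env.start_state, env.goal)
    scores = [acquisition(traj, mde) for traj in candidates]
    best = candidates[int(np.argmax(scores))]          # most informative trajectory
    for (s, a) in best:
        s_pred = model.predict(s, a)                    # pre-specified dynamics model
        s_real = env.execute(s, a)                      # observed outcome in the test environment
        dataset.append((s, a, np.linalg.norm(s_real - s_pred)))
    return mde.refit(dataset)                           # updated MDE for the next iteration
```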


The learned model precondition is then used at test time so that the planner only executes actions that lie within it.
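A minimal sketch of that test-time check, assuming a GP-style MDE that returns the mean and standard deviation of the predicted deviation (beta and d_max are hypothetical parameters, with larger beta being more conservative):

```python
# Allow an action only if the predicted model deviation is small enough (assumed interface).
def in_model_precondition(state, action, mde, beta=1.0, d_max=0.05):
    mean, std = mde.predict(state, action)
    return mean + beta * std <= d_max

# During planning, candidate actions outside the precondition are discarded, e.g.:
#   actions = [a for a in candidate_actions if in_model_precondition(s, a, mde)]
```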

Abstract

When planning with an inaccurate dynamics model is necessary, a promising strategy is to confine planning effort to regions of state-action space where the model is accurate, sometimes referred to as a model precondition. Many model forms, such as simulators and analytical models, lack inherent criteria for identifying regions where the model will be accurate in the test environment, which motivates defining model preconditions using small amounts of data collected in the test environment. This paper presents an algorithm that actively selects trajectories to learn a model precondition for planning with an inaccurate pre-specified dynamics model. The main contributions of this work are the proposed techniques for actively learning model deviation estimators (MDEs) and the experimental analysis of algorithmic properties in three planning domains: icy gridworld, simulated plant watering, and real-world plant watering environments.

Method


The three domains used in our experiments, showing the experimental setup and the corresponding dynamics model for each.

(a) Slippery grid world where movement may result in slipping backwards on ice (blue) or not moving (grey). The analytical dynamics model assumes unimpeded movement in four directions, bounded by the grid (see the sketch after this caption).

(b) Simulated plant watering using a learned dynamics model trained on a simple water transport domain without a plant.

(c) Real-world robot pouring water into a plant pot, where the analytical dynamics model is based on the container geometry.
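As a concrete illustration of domain (a), here is a hypothetical sketch of what the analytical gridworld model might look like; the grid size and action encoding are assumptions for illustration, not the exact model from the paper.

```python
# Analytical gridworld model: every move is assumed to succeed, and motion is
# only clipped at the grid bounds. The real environment may instead slip
# backwards on ice or fail to move, which the model does not capture.
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def analytical_grid_model(state, action, grid_size=(10, 10)):
    dx, dy = MOVES[action]
    x = min(max(state[0] + dx, 0), grid_size[0] - 1)
    y = min(max(state[1] + dy, 0), grid_size[1] - 1)
    return (x, y)
```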

Additional Experiments

Qualitative Analysis of MDE over Online Learning Runs

Additional qualitative analysis shows that the method successfully balances exploration and exploitation. The plots below show model preconditions and acquisition function values in the simulated watering environment over the course of a successful individual online learning run.


Acquisition function values (top) and model preconditions (bottom) over the course of the online learning run.

Qualitative Analysis for Active Learning 

Although our acquisition function can bias data collection toward low-deviation trajectories given enough data, the random generation process of RRT may not provide a sufficient set of candidate trajectories.


Constraining the candidate trajectories using the MDE during training can improve this selection.
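A minimal sketch of this constraint, under the same assumed MDE interface as above (mean and standard deviation of predicted deviation; the threshold d_max is a hypothetical parameter):

```python
# Keep only RRT-sampled trajectories that lie inside the current model
# precondition at risk tolerance beta; fall back to the unconstrained set if
# the filter rejects everything.
def within_precondition(traj, mde, beta, d_max):
    for (s, a) in traj:
        mean, std = mde.predict(s, a)
        if mean + beta * std > d_max:
            return False
    return True

def constrained_candidates(rrt_trajectories, mde, beta, d_max=0.05):
    kept = [t for t in rrt_trajectories if within_precondition(t, mde, beta, d_max)]
    return kept if kept else rrt_trajectories
```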


Experiment: We empirically evaluate the effect of four different β schedules: two are fixed, and two start permissive and become more conservative.


Summary of results: When evaluated in the simulated plant-watering domain, we observed no significant performance difference among these schedules.

Effect of risk tolerance schedules on test performance. Schedule A varies β from −2 to 2 using a sigmoid function. Schedule B is the same as Schedule A, but with a maximum at 1. Schedule C fixes β = −2 and Schedule D fixes β = 1.
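For reference, the four schedules could be implemented roughly as follows; the sigmoid's center and slope, and the interpretation of Schedule B as Schedule A capped at 1, are assumptions on our part, not values taken from the paper.

```python
import math

def beta_schedule(schedule, t, T):
    """Risk tolerance beta at training iteration t out of T (assumed shapes)."""
    progress = t / max(T - 1, 1)
    sigmoid = 1.0 / (1.0 + math.exp(-10.0 * (progress - 0.5)))  # rises from ~0 to ~1 over training
    if schedule == "A":            # varies from -2 (permissive) to 2 (conservative)
        return -2.0 + 4.0 * sigmoid
    if schedule == "B":            # same as A, capped at a maximum of 1
        return min(-2.0 + 4.0 * sigmoid, 1.0)
    if schedule == "C":            # fixed at -2
        return -2.0
    if schedule == "D":            # fixed at 1
        return 1.0
    raise ValueError(f"unknown schedule {schedule}")
```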