In more complex problems, the aleatoric noise in the model deviation varies across the state-action space S × A, which we account for with a heteroscedastic noise model.
We use a standard GP regression model with a Matérn kernel, augmented with a noise model that captures the heteroscedasticity of the data. We assume that the variance of the noise term ϵi at each data point xi is a function of the input features, i.e., Var(ϵi) = σ(xi)². The noise function σ(xi) is modeled using a separate GP that takes the input features xi as input. We approximate the noise observations ϵi by cross-validating subsets of the data with a homoscedastic model, and then use these estimates to fit the heteroscedastic noise model.
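For concreteness, a minimal sketch of this noise-estimation step under our reading of it: a homoscedastic GP is fit on each cross-validation fold, and the held-out squared residuals serve as per-point noise observations. The fold count, the function names, and the use of fit_gpytorch_mll (recent BoTorch releases) are assumptions, not the released implementation.

```python
import torch
from sklearn.model_selection import KFold
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from gpytorch.mlls import ExactMarginalLogLikelihood

def estimate_noise_targets(train_x, train_y, n_folds=5):
    """Estimate per-point noise variances from held-out residuals of
    homoscedastic GPs fit on each cross-validation fold."""
    noise = torch.zeros_like(train_y)
    for tr_idx, te_idx in KFold(n_folds, shuffle=True).split(train_x):
        gp = SingleTaskGP(train_x[tr_idx], train_y[tr_idx])
        fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))
        with torch.no_grad():
            pred = gp.posterior(train_x[te_idx]).mean
        # Squared held-out residuals approximate the local noise variance.
        noise[te_idx] = (train_y[te_idx] - pred) ** 2
    return noise
```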
We use the BoTorch implementation of a GP regression model with a GPyTorch backend, but optimize the hyperparameters by alternating optimization, running 5 BoTorch training iterations on the marginal log-likelihood at each alternation. We needed to set lower and upper bounds on the kernel lengthscales for both the heteroscedastic noise model and the homoscedastic model. Additionally, due to the memory limits of exact GP inference, we randomly sampled a subset of 300 points from the full set of training data.
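Continuing the sketch above, the heteroscedastic model can then be fit, here assuming BoTorch's HeteroskedasticSingleTaskGP (which internally models the noise levels with a second GP) and illustrative lengthscale bounds; the alternating 5-iteration scheme is approximated here by a single joint fit.

```python
from botorch.models import HeteroskedasticSingleTaskGP
from botorch.fit import fit_gpytorch_mll
from gpytorch.constraints import Interval
from gpytorch.mlls import ExactMarginalLogLikelihood

# noise: per-point variance estimates from the cross-validation sketch above.
noise = estimate_noise_targets(train_x, train_y)

# Clamp to keep the observed variances strictly positive.
model = HeteroskedasticSingleTaskGP(train_x, train_y,
                                    train_Yvar=noise.clamp_min(1e-4))
# Bound the kernel lengthscales; the interval values are assumptions.
model.covar_module.base_kernel.register_constraint(
    "raw_lengthscale", Interval(0.05, 2.0))
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))
```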
The state in the grid world is a two-dimensional (x, y) position. The robot can move up, down, left, and right. It cannot move through walls, and moving left or right across the "slippery" squares causes it to move backward by one step. This is the same environment used in "Planning and execution using inaccurate models with provable guarantees".
The dynamics model for the Grid World is: ↑ (up) increases y by 1, ↓ (down) decreases y by 1, ← (left) decreases x by 1, and → (right) increases x by 1. If the next location is outside the bounds of the grid, the dynamics model assumes that the robot stays in place. The dynamics model does not account for the "slippery" squares, however.
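For reference, a minimal sketch of this (intentionally inaccurate) dynamics model is below; the names are illustrative rather than taken from the released code.

```python
from enum import Enum

class Action(Enum):
    UP, DOWN, LEFT, RIGHT = range(4)

DELTAS = {
    Action.UP: (0, 1),
    Action.DOWN: (0, -1),
    Action.LEFT: (-1, 0),
    Action.RIGHT: (1, 0),
}

def model_step(x, y, action, width, height):
    """Predicted next state; ignores the slippery squares by design."""
    dx, dy = DELTAS[action]
    nx, ny = x + dx, y + dy
    # Out-of-bounds moves leave the robot in place.
    if not (0 <= nx < width and 0 <= ny < height):
        return x, y
    return nx, ny
```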
The state is represented as x, y, θ. In the target environment, a plant is constructed out of voxels to mimic a Monstera plant. Water can pass through holes in the top of the leaves, but not through solid sections.
This is the only task where we use a learned model. The learned model was trained on a variation (i.e., only one randomly selected geometry) of the PourWater SoftGym environment, which does not contain a plant. Seven actions are sampled per trajectory (some with θd = 0 and some randomly sampled with θd ∈ [1, 3]) and executed to form a dataset of 1200 trajectories, or 8400 transitions. 10% of these were held out as a validation set.
The model was trained using the Adam optimizer with a learning rate of 0.004 and a batch size of 64; the network consists of three fully connected layers of 128 units each, with Dropout and ReLU activations. Training was stopped after 300 epochs, at which point the validation loss corresponded to approximately 1 cm of position error and 10% volume error.
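A sketch of this architecture is below; the input/output dimensions and dropout probability are assumptions, as the paper does not state them.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Fully connected dynamics model: (state, action) -> next state."""
    def __init__(self, state_dim=4, action_dim=3, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(128, 128), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(128, 128), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(128, state_dim),  # predicted next state
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

model = DynamicsModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.004)  # batch size 64
```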
Because there is no plant in the dataset the model was trained on, the predictions overestimate the amount of water in the target container for pouring actions above the leaves and for actions that interact with the leaves (which can prevent the controlled container from reaching the desired pose).
We use a Franka Emika Panda robot. To protect the robot from water that splashes out of the container, we cover it in the Fluid Resistant Cover from Robo-Gear. Outside equipment is protected using Vention bars surrounded by polycarbonate with a sliding door. The base of the environment is covered in highly absorbent PIG Mat in 15 in × 20 in squares that absorb 14 oz of water each. Resets are performed by pouring out the water in the plant container, replacing a wet mat with a dry one, or waiting for the water to evaporate.
The design of this enclosure can be found at this link.
The state is represented as the poses of all containers containing water and their volumes. The target container pose is fixed at (xtarget, ytarget, ztarget). The pose of the source container is (xsource, ysource, zsource), along with an orientation parameter θsource representing the rotation about the z-axis. The volumes of the source and target containers are vsource and vtarget, respectively. Actions are defined as target locations for ysource, zsource, θsource; the desired location is denoted (yd, zd, θd). xd is fixed to a single value to keep the motion in one plane. Trajectories are defined for the controlled container, but the motions are generated kinematically using the grasp transform between the bottle and the end-effector. There are broad preconditions for the actions, but they do not specify where the model will be accurate. For example, (yd, zd) cannot be in collision with the target container. If θd > 0, then (yd, zd) must equal (ysource, zsource), and (yd, zd) must be within a fixed distance of (ytarget, ztarget).
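A sketch of these preconditions is below; the distance threshold and the collision check (in_collision_with_target) are illustrative assumptions, not the exact values used.

```python
import numpy as np

MAX_POUR_DIST = 0.15  # meters; assumed value of the "fixed distance"

def action_in_precondition(state, action):
    """Check the broad action preconditions described above."""
    y_s, z_s = state["y_source"], state["z_source"]
    y_t, z_t = state["y_target"], state["z_target"]
    y_d, z_d, theta_d = action
    if in_collision_with_target(y_d, z_d):  # hypothetical collision check
        return False
    if theta_d > 0:
        # Pouring actions must happen in place, near the target container.
        in_place = np.isclose(y_d, y_s) and np.isclose(z_d, z_s)
        near_target = np.hypot(y_d - y_t, z_d - z_t) < MAX_POUR_DIST
        return in_place and near_target
    return True
```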
The dynamics model for real-world watering is an idealized analytical model. We can specify it analytically because the robot pours with a measured pourer, which dispenses 17 mL per pour. It is "idealized" in that it assumes the following (a code sketch follows this list):
The robot always reaches the desired container position
17 mL always leaves the container when it is tilted past 130 degrees
17 mL enters the target container if the location of the controlled container is above the area of the target container
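A minimal sketch of this analytical model under the stated assumptions; above_target_opening is a hypothetical geometry test, and θd is assumed to be in radians.

```python
import math

POUR_VOLUME_ML = 17.0   # dispensed by the measured pourer
POUR_ANGLE_DEG = 130.0  # tilt beyond which the pour occurs

def predict_pour(state, action):
    """Idealized prediction: the desired pose is always reached, and a
    measured pour transfers exactly 17 mL when tilted past 130 degrees."""
    next_state = dict(state)
    y_d, z_d, theta_d = action
    next_state["y_source"], next_state["z_source"] = y_d, z_d
    if math.degrees(theta_d) > POUR_ANGLE_DEG:
        next_state["v_source"] = state["v_source"] - POUR_VOLUME_ML
        if above_target_opening(y_d, z_d, state):  # hypothetical test
            next_state["v_target"] = state["v_target"] + POUR_VOLUME_ML
    return next_state
```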
In this scenario, the state of the plant is represented only by the pose of the target container and is assumed to be fixed relative to it. The target container has a bright pink rim and is measured using an overhead Azure Kinect DK camera. The target container is detected in the image using the OpenCV circle detector and color thresholding. The pixels inside the circle (after filtering out pixels that belong to the plant) are deprojected into the camera frame using the camera intrinsics. The camera is extrinsically calibrated to the world frame, so the pose of the target container is known in the world frame. In many configurations the leaves obstruct the rim, in which case we use the last measured pose.
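A sketch of the rim-detection step is below; the HSV thresholds and Hough parameters are illustrative assumptions, and deprojection with the camera intrinsics is omitted.

```python
import cv2
import numpy as np

def detect_target_rim(bgr_image):
    """Detect the bright pink rim via color thresholding + circle detection."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    # Assumed HSV range for the bright pink rim.
    mask = cv2.inRange(hsv, (145, 80, 80), (175, 255, 255))
    masked = cv2.bitwise_and(bgr_image, bgr_image, mask=mask)
    gray = cv2.cvtColor(masked, cv2.COLOR_BGR2GRAY)
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.5,
                               minDist=100, param1=100, param2=30,
                               minRadius=20, maxRadius=120)
    if circles is None:
        return None  # caller falls back to the last measured pose
    x, y, r = circles[0, 0]
    return x, y, r  # pixel center and radius of the rim
```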
The state of the container is observed indirectly using the proprioception of the robot, assuming a fixed transform between the container and the robot, and two U.S. Solid Precision Laboratory 3000 g High Precision Analytical Balances are used for measuring the weight of the water. The scales are not reset between trajectories, so over the course of data collection the volume of the target container typically increases while the measured volume of the bottle decreases.
Many design decisions in this environment were made to reduce the need for human intervention in the data-collection procedure, which we describe here (a sketch of one trajectory follows below). At the beginning of data collection, the water bottle needs to be filled and the walls of the environment lowered to prevent splash damage to electronic components. The scales are tared programmatically through the serial interface and do not need to be touched. At the beginning of a trajectory, the robot measures the state of the scene, including the water volume of each container. It uses the same fixed side grasp to grasp the central part of the water bottle. Planning and trajectory execution are fully autonomous. Once the trajectory for data collection or evaluation has been executed, the robot returns the water bottle to the scale by placing it down, and then moves above the scale to enable an accurate measurement.
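A high-level sketch of one autonomous trajectory; every helper here is a hypothetical stand-in for the real software stack (scales are tared once at the start of data collection, not shown).

```python
def run_trajectory(robot, scales, planner):
    state = estimate_scene_state(robot, scales)  # poses and water volumes
    robot.grasp_bottle()                         # fixed side grasp, central part
    plan = planner.plan(state)
    robot.execute(plan)                          # autonomous execution
    robot.place_bottle_on_scale()
    robot.move_above_scale()                     # clear the scale for reading
    return estimate_scene_state(robot, scales)
```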
Despite our attempts to minimize human supervision, there were a few cases where a human needed to intervene (about one in fifteen trajectories). The most common was the bottle slipping or being placed in an unstable configuration, which sometimes happened when the bottle was close to empty. Such cases could be remedied by using a bottle with a more stable base. The second case requiring human intervention was the plant falling due to the robot getting stuck in the leaves, which occurred rarely.
All state estimation, including detecting whether the desired volume of water reached the target container and whether a significant portion was spilled, is done autonomously.
The paths from planning are represented as controlled-container poses, and the trajectories are generated to follow that path. The desired end-effector poses are obtained by applying the inverse grasp transform to the desired controlled-container poses. For motions where θd = 0, a joint-space trajectory is generated with 10 waypoints, using an IK solver to convert the poses to joint configurations. The same joint-space trajectory generation process is followed for actions where θd > 0, which produces the pouring motion as the bottle changes angle. In this environment, we found that a simple IK solver would frequently find solutions where the robot was in self-collision or in collision with obstacles. Rejection sampling was insufficient to avoid these solutions, but we were able to find valid solutions using the PyTorch Robot Kinematics library, which accounts for collisions in the solver.
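A sketch of this trajectory-generation step: interpolate the container path, apply the inverse grasp transform, and solve IK per waypoint. interpolate_pose and solve_collision_aware_ik are hypothetical stand-ins (the latter for the collision-aware solver built on the PyTorch Robot Kinematics library), and the pose composition order is convention-dependent.

```python
import numpy as np

N_WAYPOINTS = 10

def container_path_to_joint_trajectory(start_pose, goal_pose, grasp_tf):
    """Convert a controlled-container path to joint-space waypoints."""
    waypoints = []
    for t in np.linspace(0.0, 1.0, N_WAYPOINTS):
        container_pose = interpolate_pose(start_pose, goal_pose, t)  # hypothetical
        # End-effector pose from the inverse grasp transform (4x4 matrices).
        ee_pose = container_pose @ np.linalg.inv(grasp_tf)
        q = solve_collision_aware_ik(ee_pose)  # hypothetical IK call
        waypoints.append(q)
    return waypoints
```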
Problems are described in terms of their low-level state space. We use an RRT planner in a fairly standard manner: sample a random state, then add actions to the tree that move toward that state. The predicted results of the actions are computed using the forward dynamics model, s′ ← f̂(s, a). s′ is added to the tree if the edge toward it does not violate the constraints: (s, a) must lie within the model precondition of f̂, and s′ must be collision-free. Collision checking is implemented using PyBullet through the PyBullet Planning library for the real-world scenario, and using the Flexible Collision Library for the simulated scenario. There are no such constraints in the GridWorld problem.
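A sketch of this expansion loop with the model-precondition and collision constraints; the helper functions are hypothetical stand-ins, and states are assumed hashable for illustration.

```python
def rrt_plan(start, goal_check, f_hat, max_expansions=5000):
    """Kinodynamic RRT: expand only where the model precondition holds."""
    tree = {start: None}  # maps each state to its (parent, action) edge
    for _ in range(max_expansions):
        s_rand = sample_random_state()                    # hypothetical
        s_near = min(tree, key=lambda s: distance(s, s_rand))
        a = action_toward(s_near, s_rand)                 # hypothetical
        if not in_model_precondition(s_near, a):
            continue  # only expand where the model is trusted
        s_next = f_hat(s_near, a)  # forward dynamics prediction
        if in_collision(s_next):
            continue
        tree[s_next] = (s_near, a)
        if goal_check(s_next):
            return extract_path(tree, s_next)             # hypothetical
    return None  # timed out
```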
We use the Open Motion Planning Library implementation of RRT for its speed and popularity. The planner can perform up to 5000 expansions before it times out, except during the online learning phase, where the planner times out after 30 seconds.