Further Experimental Results

Dynamic Obstacle Avoidance

Effect of Number of Particles

We study the effect of the number of trajectories sampled per iteration of optimization (i.e., particles) on controller performance in simulation. For this experiment, we used 10 end-effector pose targets that require significant changes in orientation. Each episode is 700 timesteps long, after which the manipulator is reset to the base orientation. The horizon is kept constant at 30 timesteps and all cost function weights are also fixed.

Position Accuracy

The box plot below shows the median (solid line) and confidence interval (box) of position errors over the last 50 timesteps of every episode as a function of the number of particles. We chose the last 50 timesteps to show the convergence of the controller to the goal. As the number of particles increases, the controller achieves a lower median error with a tighter confidence interval.

Orientation Accuracy

The box plot below shows the median (solid line) and confidence interval (box) of quaternion errors over the last 50 timesteps of every episode as a function of the number of particles. Our framework keeps the confidence interval of the quaternion error within 5% with as few as 200 particles.
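As a point of reference, a normalized quaternion error of this kind can be expressed as one minus the absolute inner product of the goal and measured end-effector quaternions. The sketch below illustrates this; the function name and the exact metric form are assumptions, not necessarily the implementation behind the plots.

```python
import numpy as np

def quaternion_error(q_goal, q_ee):
    """Normalized orientation error between unit quaternions.
    0 means perfectly aligned; the absolute value handles the q / -q ambiguity.
    Assumed metric form for illustration only."""
    dot = np.abs(np.sum(q_goal * q_ee, axis=-1))
    return 1.0 - np.clip(dot, 0.0, 1.0)
```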

Jerk

The box plot below shows the median (solid line) and confidence interval (box) of the jerk in the robot's motion over all timesteps. Our sampling strategy generates smooth (low-jerk) motions even with 200 particles.
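For reference, a jerk statistic like this can be approximated from the executed joint trajectory with a third-order finite difference; the snippet below is a minimal sketch, and the array names and the per-timestep norm over joints are our assumptions.

```python
import numpy as np

def jerk_magnitude(q_traj, dt):
    """Approximate per-timestep jerk from a joint-position trajectory
    q_traj of shape [T, n_dof], sampled every dt seconds."""
    jerk = np.diff(q_traj, n=3, axis=0) / dt**3   # third finite difference
    return np.linalg.norm(jerk, axis=-1)          # jerk magnitude per timestep
```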

Maximum Joint Velocity

The line plot below demonstrates that, with an increasing number of particles, our B-spline sampling strategy is able to ramp up the robot's joint velocity while maintaining low jerk.

Effect of Cost Terms

Next, we study the effect of the different cost terms. We first test the constraint-based terms, i.e., self-collision and joint-limit avoidance, followed by the behavior-based manipulability and stop costs.

Self Collision Cost

For this experiment, we chose 5 end-effector target poses that would put the robot in self-collision and report the number of timesteps the robot spent in self-collision. We observe that when the self-collision cost is used, the robot never enters a self-colliding state, at the cost of not reaching the goal pose.
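A self-collision cost of this kind can be realized as a hinge penalty on the minimum link-link distance reported by a collision checker (learned or geometric). The sketch below shows one plausible form; the margin, weight, and tensor shapes are assumptions rather than the exact values used here.

```python
import torch

def self_collision_cost(min_link_dist, margin=0.0, weight=5000.0):
    """Hinge penalty that activates when the minimum distance between any
    pair of robot links drops below `margin` (negative values = penetration).
    min_link_dist: [n_particles, horizon] per-state minimum link distance."""
    return weight * torch.clamp(margin - min_link_dist, min=0.0)
```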

Joint Limit Avoidance Cost

For testing joint limit avoidance, we use 10 end-effector targets with 500 particles and vary the weight on the joint-limit cost. As the weight on the joint-limit avoidance cost is increased, the number of violations decreases, reaching zero for weights of 500 and above.
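One simple way to realize such a joint-limit avoidance term is to penalize the amount by which each joint leaves its bounds, scaled by the weight varied above. The sketch below assumes this penalty form and the tensor shapes.

```python
import torch

def joint_limit_cost(q, q_min, q_max, weight=500.0):
    """Penalize joint positions outside the interval [q_min, q_max].
    q: [n_particles, horizon, n_dof]; q_min, q_max: [n_dof]."""
    below = torch.clamp(q_min - q, min=0.0)        # lower-limit violation
    above = torch.clamp(q - q_max, min=0.0)        # upper-limit violation
    return weight * (below + above).sum(dim=-1)    # [n_particles, horizon]
```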

Manipulability Cost

The manipulability cost acts as a regularizer that keeps the manipulator away from singular configurations. The box plot below shows that as the weight on the manipulability cost is increased, the pose-reaching accuracy improves. However, beyond a certain threshold, the manipulability cost interferes with pose reaching and the position accuracy decreases. Maintaining high manipulability allows the robot to reach different end-effector orientations accurately.
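A common way to quantify closeness to singularity is the Yoshikawa manipulability score w = sqrt(det(J J^T)) of the end-effector Jacobian; a cost that grows as w shrinks keeps the arm away from singular configurations. The sketch below assumes this score and a reciprocal penalty, which may differ from the exact cost used here.

```python
import torch

def manipulability_cost(jac, weight=30.0, eps=1e-6):
    """Penalize low Yoshikawa manipulability w = sqrt(det(J J^T)).
    jac: [n_particles, 6, n_dof] end-effector Jacobian per sampled state."""
    w = torch.sqrt(torch.det(jac @ jac.transpose(-2, -1)) + eps)
    return weight / (w + eps)   # large cost near singular configurations
```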

Stop Cost

The stop cost penalizes joint velocities that are too high for the robot to safely stop within the horizon, given a maximum acceleration threshold. An important consequence of this term is that it allows the robot to stop smoothly at the goal even with a short horizon. We demonstrate this effect in the plot below, where a lower weight on the stop cost leads to undesirable oscillations near the goal.
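The sketch below illustrates one way such a braking constraint can be expressed: at each step, the joint speed is compared against the largest speed that can still be brought to zero within the remaining horizon under a maximum deceleration. The names, shapes, and exact penalty form are assumptions.

```python
import torch

def stop_cost(qd, dt, a_max, weight=100.0):
    """Penalize joint speeds too high to brake to zero within the remaining
    horizon at maximum deceleration a_max.
    qd: [n_particles, horizon, n_dof] joint velocities along each rollout."""
    horizon = qd.shape[1]
    steps_left = torch.arange(horizon - 1, -1, -1,
                              dtype=qd.dtype, device=qd.device)
    # Largest speed that can still be cancelled before the horizon ends.
    max_stoppable = a_max * steps_left[None, :, None] * dt
    excess = torch.clamp(qd.abs() - max_stoppable, min=0.0)
    return weight * excess.sum(dim=-1)   # [n_particles, horizon]
```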

Effect of Horizon Length

Here we show a fundamental limitation of the MPC paradigm due to its finite lookahead horizon. If the horizon is too short, as in the video on the left (H=15), the robot can get stuck in a local minimum induced by the large wall obstacle and is unable to reach the goal. In practice, the horizon needs to be tuned carefully to avoid such corner cases, which can be cumbersome. A possible way to overcome this is to use a learned Q-function as the terminal cost, which increases the effective horizon of MPC.

Horizon = 15 steps

Horizon = 30 steps

Effect of Sampling Strategy

Pseudo-random vs Halton Sampling

Halton sampling is a quasi-Monte Carlo method that provides better coverage of the action space than pseudo-random sampling, which exhibits undesired clustering of samples. We study the improvement gained by using Halton sampling in the low-particle regime. The plots below show the quaternion errors achieved by pseudo-random and Halton samples with 100 and 500 particles respectively. With fewer particles, pseudo-random sampling with a comb filter is unable to keep the confidence interval of the quaternion error within 5%, whereas Halton sampling achieves less than 5% quaternion error with both 100 and 500 particles.

100 Particles
500 Particles
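For illustration, quasi-random perturbations like these can be drawn from a scrambled Halton set and mapped to Gaussian noise via the inverse normal CDF. The sketch below uses SciPy's qmc module and is a minimal version, not the exact sampler used in the experiments.

```python
import numpy as np
from scipy.stats import norm, qmc

def halton_gaussian_noise(n_particles, horizon, n_dof, seed=0):
    """Draw [n_particles, horizon, n_dof] Gaussian perturbations from a
    scrambled Halton sequence, giving more even coverage of the action
    space than a pseudo-random generator."""
    sampler = qmc.Halton(d=horizon * n_dof, scramble=True, seed=seed)
    u = sampler.random(n_particles)                 # points in (0, 1)^d
    eps = norm.ppf(u)                               # map to standard normal
    return eps.reshape(n_particles, horizon, n_dof)
```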

Comb Filtering v/s B-Splines

Comb filtering smooths the sampled trajectories using user-defined filtering coefficients. To ensure smooth (low-jerk) motions, very strong filtering is required, which prevents the robot from ramping up its velocity. Fitting B-splines to the sampled actions ensures smooth acceleration profiles, which allows the robot to smoothly ramp up its velocity, but comes at the price of reduced pose accuracy. However, since our sampling strategy allows us to arbitrarily mix different kinds of trajectories, we create a hybrid sampling strategy (denoted "mixed") with a 0.6/0.4 ratio of B-spline to comb-filtered samples. This provides accuracy comparable to comb filtering while also achieving high joint velocities.
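As an illustration of the B-spline idea, a smooth action sequence can be obtained by fitting a cubic spline through a small number of sampled knots and evaluating it densely over the horizon. The sketch below, built on SciPy, reflects one plausible reading of the approach rather than the exact implementation.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

def bspline_actions(knots, horizon, degree=3):
    """Fit a cubic B-spline through coarse action knots and resample it at
    every timestep, giving a smooth (low-jerk) acceleration profile.
    knots: [n_knots, n_dof] sampled control points for one particle
    (needs at least degree + 1 knots)."""
    knot_times = np.linspace(0.0, 1.0, knots.shape[0])
    spline = make_interp_spline(knot_times, knots, k=degree, axis=0)
    return spline(np.linspace(0.0, 1.0, horizon))   # [horizon, n_dof]
```

A mixed batch along the lines of the hybrid strategy above would then simply stack, say, 60% B-spline-smoothed particles with 40% comb-filtered ones before rollout.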

Timing Benchmark

Learned v/s Baseline Self Collision Detection

Below we present a timing benchmark over an increasing batch size of query configurations. The learned function is over 40x faster on average than the baseline self-collision detection, which uses forward kinematics to compute link poses and then calculates the minimum distance between them. Further, the learned self-collision detector maintains a very low latency of 0.4-0.6 ms even for large batch sizes.
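For context, the learned detector can be thought of as a small network that regresses a collision measure directly from a joint configuration, so a large batch of queries reduces to a single forward pass. The architecture below is purely illustrative; the layer sizes and output are assumptions.

```python
import torch
import torch.nn as nn

class SelfCollisionNet(nn.Module):
    """Small MLP that maps a joint configuration to a self-collision measure,
    replacing forward kinematics plus pairwise link-distance checks at query
    time. One batched forward pass handles thousands of configurations."""
    def __init__(self, n_dof=7, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_dof, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, q):        # q: [batch, n_dof]
        return self.net(q)       # predicted collision measure per query
```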

Gain from Tensorized Forward Model on GPU

The timing benchmark below shows the computational gains from our tensorized, GPU-based implementation of the forward model over a CPU baseline for varying horizon lengths and numbers of particles used in MPC.
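The core of such a tensorized forward model is that all particles and all horizon steps are integrated as batched tensor operations on the GPU instead of looping over trajectories. The kinematic rollout below is a minimal sketch under an Euler-integration assumption; moving the input tensors to the GPU before the call is all that is needed to obtain the batched speedup.

```python
import torch

def rollout(q0, qd0, accels, dt):
    """Roll out all sampled acceleration sequences in parallel.
    q0, qd0: [n_dof] current joint state; accels: [n_particles, horizon, n_dof]
    sampled commands; returns joint positions and velocities for every
    particle and timestep."""
    qd = qd0 + torch.cumsum(accels * dt, dim=1)   # integrate accelerations
    q = q0 + torch.cumsum(qd * dt, dim=1)         # integrate velocities
    return q, qd
```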

Comparison to Riemannian Motion Policies (RMPs)

To show the gain from forward lookahead in MPC, we compare our framework with RMPs [Ratliff et al., 2018] in simulation. In the examples shown below, a Franka robot arm tries to reach different goal points inside a cabinet. This example demonstrates how RMPs can get stuck in local minima when the goal location is on the other side of the cabinet wall. In contrast, MPC, owing to its forward lookahead, is able to escape such local minima and successfully reach the goal.

RMP

STORM