We investigate the problem of teaching a robot manipulator to perform dynamic non-prehensile object transport, also known as the ‘robot waiter’ task, from a limited set of real-world demonstrations. We propose an approach that combines batch reinforcement learning (RL) with model-predictive control (MPC) by pretraining an ensemble of value functions from demonstration data, and utilizing them online within an uncertainty-aware MPC scheme to ensure robustness to limited data coverage. Our approach is straightforward to integrate with off-the-shelf MPC frameworks and enables learning solely from task space demonstrations with sparsely labeled transitions, while leveraging MPC to ensure smooth joint space motions and constraint satisfaction. We validate the proposed approach through extensive simulated and real-world experiments on a Franka Panda robot performing the robot waiter task and demonstrate robust deployment of value functions learned from 50-100 demonstrations. Furthermore, our approach enables generalization to novel objects not seen during training and can improve upon suboptimal demonstrations. We believe that such a framework can reduce the burden of providing extensive demonstrations and facilitate rapid training of robot manipulators to perform non-prehensile manipulation tasks.
Real World Videos
MPC Demonstrator
While our approach does not make any assumptions about the source of demonstrations, in this work we use an algorithmic demonstrator because it enables us to easily collect data for different ablation studies. We employ a Model Predictive Control (MPC)-based demonstrator that uses friction cone constraints to enforce object stability, building on the formulation described in [1].
We model the object as a rigid body that adheres to the Newton-Euler equation, which states that the sum of the gravitoinertial wrench and the contact wrench equals zero. The gravitoinertial wrench represents the combined effects of gravity and inertia on the object, while the contact wrench represents the forces and torques exerted on the object due to contact with the environment.
Our goal is to create an expert that implicitly satisfies the Newton-Euler equation and the friction cone constraints to prevent the object from sliding significantly while reaching the target goal. For this, we assume access only to the robot states and nominal values for the object's friction and inertial properties.
We utilize these dynamic equations to formulate a cost function that penalizes end-effector states violating the constraints, while assigning zero cost elsewhere. The gravitoinertial wrench is computed based on the object's mass, inertia, velocities, and accelerations, taking gravity into account. By assuming the object moves minimally on the tray, we approximate its orientation to be similar to that of the end effector. This allows us to rewrite the Newton-Euler equations in the end-effector frame.
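As a rough illustration of the gravitoinertial wrench described above, the following sketch evaluates it in the end-effector frame with NumPy. The function name, argument names, the z-up gravity convention, and the choice to express the inertia tensor about the centre of mass are assumptions made for this example, not details taken from the original implementation.

```python
import numpy as np

# World-frame gravity with z pointing up (assumed convention).
GRAVITY = np.array([0.0, 0.0, -9.81])


def gravito_inertial_wrench(R_we, lin_acc, ang_vel, ang_acc, mass, inertia):
    """Return the gravitoinertial wrench [force, torque] in the end-effector frame.

    R_we:     3x3 rotation of the end-effector frame expressed in the world frame.
    lin_acc:  world-frame linear acceleration of the object's centre of mass.
    ang_vel:  world-frame angular velocity of the object.
    ang_acc:  world-frame angular acceleration of the object.
    mass:     nominal object mass.
    inertia:  3x3 nominal inertia tensor about the centre of mass, expressed in
              the end-effector frame (the object is assumed not to move on the tray).
    """
    # Rotate world-frame quantities into the end-effector frame.
    acc_e = R_we.T @ (np.asarray(lin_acc, dtype=float) - GRAVITY)
    omega_e = R_we.T @ np.asarray(ang_vel, dtype=float)
    alpha_e = R_we.T @ np.asarray(ang_acc, dtype=float)

    # Newton-Euler: the contact wrench must cancel these terms, hence the minus signs.
    force = -mass * acc_e
    torque = -(inertia @ alpha_e + np.cross(omega_e, inertia @ omega_e))
    return np.concatenate([force, torque])


# Usage: a level tray at rest, so only the object's weight remains.
w_gi = gravito_inertial_wrench(
    R_we=np.eye(3),
    lin_acc=np.zeros(3),
    ang_vel=np.zeros(3),
    ang_acc=np.zeros(3),
    mass=0.5,
    inertia=np.diag([1e-3, 1e-3, 1e-3]),
)
```

For a stationary, level tray the wrench reduces to the object's weight along the negative z-axis, which the contact wrench then has to balance.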
To ensure the object does not slip on the tray surface, we require the gravitoinertial wrench to be balanced by the contact wrench in the object's body frame. To model the wrench resulting from the object's contact with the tray, we calculate forces at predefined contact points on the object's surface using a point contact with friction model as used in [2]. We consider a set of n contact points, and for each point, we define a force vector consisting of tangential and normal force components.
We relate the stacked contact forces to the gravitoinertial wrench using the inverse of the grasp matrix. The grasp matrix encapsulates the relationship between individual contact forces and the resultant wrench acting on the object. It is constructed using adjoint transformation matrices that relate each contact point to the object frame, along with basis matrices that project the transmissible components of the contact forces into a six-dimensional space. This relationship enables us to compute the contact forces resulting from end-effector motions.
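The mapping from a gravitoinertial wrench to contact forces can be sketched as follows. This is a minimal example under several assumptions that are not stated in the text: contact frames are aligned with the object frame (flat tray, normal along z), each contact force is ordered as two tangential components followed by the normal component, and the Moore-Penrose pseudo-inverse stands in for the inverse because the grasp matrix is 6 x 3n and generally not square. All function names are hypothetical.

```python
import numpy as np


def skew(p):
    """Skew-symmetric matrix such that skew(p) @ f equals np.cross(p, f)."""
    return np.array([
        [0.0, -p[2], p[1]],
        [p[2], 0.0, -p[0]],
        [-p[1], p[0], 0.0],
    ])


def grasp_matrix(contact_points):
    """Build the 6 x 3n grasp matrix for n point contacts with friction.

    Each contact contributes a 6x3 block [I; skew(p_i)], which maps its force to
    a wrench at the object frame. Contact frames are assumed to be aligned with
    the object frame.
    """
    blocks = [np.vstack([np.eye(3), skew(p)]) for p in contact_points]
    return np.hstack(blocks)


def contact_forces(contact_points, w_gi):
    """Minimum-norm contact forces whose resultant wrench balances w_gi."""
    G = grasp_matrix(contact_points)
    # The contact wrench must equal -w_gi; pinv gives the least-norm solution.
    f = np.linalg.pinv(G) @ (-np.asarray(w_gi, dtype=float))
    return f.reshape(-1, 3)  # one row [f_tx, f_ty, f_n] per contact


# Usage: four contacts at the corners of a 10 cm square, 3 cm below the centre
# of mass, supporting a 0.5 kg object on a level, stationary tray.
points = np.array([
    [0.05, 0.05, -0.03],
    [0.05, -0.05, -0.03],
    [-0.05, 0.05, -0.03],
    [-0.05, -0.05, -0.03],
])
w_gi = np.array([0.0, 0.0, -0.5 * 9.81, 0.0, 0.0, 0.0])
forces = contact_forces(points, w_gi)
```

In this symmetric, static case each contact carries a quarter of the object's weight as a pure normal force. Other contact models or contact-frame orientations would change the per-contact blocks of the grasp matrix.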
To prevent slipping, the contact forces at each contact point must satisfy the friction cone constraints. These constraints specify that the magnitude of the tangential forces must be less than or equal to the product of the friction coefficient and the normal force component. Additionally, the normal force must be non-negative to ensure contact is maintained. To optimize robot trajectories that satisfy these constraints, we formulate a cost function that penalizes any violations of the friction cone constraints. This cost function is zero when the constraints are satisfied and increases when they are not. In our implementation, we integrate this cost into the running cost within the STORM framework, allowing us to collect demonstrations for various experiments. Importantly, while the algorithmic demonstrator assumes access to the object's inertial and friction properties, the learned value function does not. In our experiments, we also investigate how learning can improve upon suboptimal demonstrations when the nominal object properties are incorrect.
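One possible form of the friction cone penalty is sketched below. The text only specifies that the cost is zero when the constraints hold and grows when they are violated; the quadratic hinge used here, the function name, and the force ordering (two tangential components, then the normal component) are assumptions for illustration and do not reflect the STORM API or the exact cost used in the experiments.

```python
import numpy as np


def friction_cone_cost(forces, mu):
    """Penalty for violating friction cone and unilateral contact constraints.

    forces: array-like of shape (n, 3), one row [f_tx, f_ty, f_n] per contact.
    mu:     nominal friction coefficient.
    Returns 0.0 when every contact satisfies ||f_t|| <= mu * f_n and f_n >= 0.
    """
    forces = np.asarray(forces, dtype=float).reshape(-1, 3)
    tangential = np.linalg.norm(forces[:, :2], axis=1)
    normal = forces[:, 2]

    # Amount by which the tangential force exceeds the cone boundary.
    slip = np.maximum(0.0, tangential - mu * normal)
    # Amount by which the normal force is negative (contact would be lost).
    separation = np.maximum(0.0, -normal)

    # Quadratic hinge (assumed): zero inside the cone, increasing outside it.
    return float(np.sum(slip ** 2 + separation ** 2))


# Usage: one contact inside the cone, one outside it.
inside = friction_cone_cost([[0.2, 0.0, 1.0]], mu=0.5)
outside = friction_cone_cost([[1.0, 0.0, 1.0]], mu=0.5)
```

A cost of this shape can be added to an MPC running cost so that sampled or optimized end-effector trajectories are penalized only where the predicted contact forces leave the friction cone.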
[1] A. Heins and A. P. Schoellig, “Keep it upright: Model predictive control for nonprehensile object transportation with obstacle avoidance on a mobile manipulator,” IEEE Robotics and Automation Letters, 2023.
[2] M. Selvaggio, J. Cacace, C. Pacchierotti, F. Ruggiero, and P. R. Giordano, “A shared-control teleoperation architecture for nonprehensile object transportation,” IEEE Transactions on Robotics, 2022.