FlowBot++: Learning Generalized Articulated Objects Manipulation via Articulation Projection
Abstract
Understanding and manipulating articulated objects, such as doors and drawers, is crucial for robots operating in human environments. We wish to develop a system that can learn to articulate novel objects with no prior interaction, after training on other articulated objects. Previous approaches to articulated object manipulation rely on either modular methods, which are brittle, or end-to-end methods, which lack generalizability. This paper presents FlowBot++, a deep 3D vision-based robotic system that predicts dense per-point motion and dense articulation parameters of articulated objects to assist in downstream manipulation tasks. FlowBot++ introduces a novel per-point representation of the articulated motion and articulation parameters, which are combined to produce a more accurate estimate than either produces on its own. Simulated experiments on the PartNet-Mobility dataset validate the performance of our system in articulating a wide range of objects, while real-world experiments on point clouds of real objects and on a Sawyer robot demonstrate the generalizability and feasibility of our system in real-world scenarios.
Method
Building on Articulation Flow, we formally define a dense 3D visual representation of articulation parameters. This representation, Articulation Projection, densely encodes the object's articulation parameters: for each point on the articulated part of the object, we define a vector that represents the displacement from the point itself to its projection onto the part's articulation axis. We use the two representations together to derive a full, smooth trajectory that opens the articulated part.
Articulation Flow
Articulation Projection
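As an illustration of the definition above, the following is a minimal sketch of how ground-truth Articulation Projection vectors could be computed when the articulation axis is known. It is not the FlowBot++ implementation: the function name `articulation_projection`, the representation of the axis as a point plus a direction vector, and the example values are all assumptions made for this example.

```python
import numpy as np


def articulation_projection(points, axis_origin, axis_direction):
    """Return, for each point, the displacement from the point to its
    orthogonal projection onto the articulation axis.

    The axis is assumed (for this sketch) to be the line through
    `axis_origin` with direction `axis_direction`.
    """
    points = np.asarray(points, dtype=float)            # (N, 3)
    origin = np.asarray(axis_origin, dtype=float)       # (3,)
    direction = np.asarray(axis_direction, dtype=float)
    direction = direction / np.linalg.norm(direction)   # unit axis direction

    rel = points - origin                                # axis origin -> point
    along = rel @ direction                              # (N,) signed length along the axis
    projected = origin + np.outer(along, direction)      # (N, 3) closest points on the axis
    return projected - points                            # (N, 3) point -> projection


# Hypothetical example: two points and a vertical axis through the world origin.
points = np.array([[1.0, 0.0, 0.5],
                   [0.0, 2.0, -1.0]])
proj = articulation_projection(points, axis_origin=[0.0, 0.0, 0.0],
                               axis_direction=[0.0, 0.0, 1.0])
```

Each resulting vector is perpendicular to the axis, and adding it to its point gives a location on the axis, so a dense field of such vectors implicitly encodes the axis itself.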
Real-World Experiments
FlowBot++ (Ours)
Using a model trained exclusively in simulation, we are able to open a real-world oven smoothly. Note that the gripper's trajectory is compliant with the oven door's kinematic constraint because the inferred articulation parameters (axis) can be used to analytically calculate the gripper's orientation in each step of the trajectory.
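To make the analytic step concrete, the sketch below shows one way a gripper path could be generated for a revolute joint once an axis estimate is available: the contact point is rotated about the axis with Rodrigues' rotation formula. This is an assumption-laden illustration rather than the system's planner; `rotate_about_axis`, `revolute_waypoints`, and the example values are hypothetical names and inputs.

```python
import numpy as np


def rotate_about_axis(point, axis_origin, axis_direction, angle):
    """Rotate `point` by `angle` radians about the line defined by
    `axis_origin` and `axis_direction` (Rodrigues' rotation formula)."""
    origin = np.asarray(axis_origin, dtype=float)
    k = np.asarray(axis_direction, dtype=float)
    k = k / np.linalg.norm(k)                      # unit rotation axis
    v = np.asarray(point, dtype=float) - origin    # express point relative to the axis
    v_rot = (v * np.cos(angle)
             + np.cross(k, v) * np.sin(angle)
             + k * np.dot(k, v) * (1.0 - np.cos(angle)))
    return v_rot + origin


def revolute_waypoints(contact_point, axis_origin, axis_direction,
                       total_angle, n_steps):
    """Evenly spaced positions of the contact point as the part rotates
    from 0 to `total_angle` radians (the start position is excluded)."""
    angles = np.linspace(0.0, total_angle, n_steps + 1)[1:]
    return np.stack([
        rotate_about_axis(contact_point, axis_origin, axis_direction, a)
        for a in angles
    ])


# Hypothetical example: a handle 1 m from a vertical hinge, opened by 90 degrees.
waypoints = revolute_waypoints(contact_point=[1.0, 0.0, 0.0],
                               axis_origin=[0.0, 0.0, 0.0],
                               axis_direction=[0.0, 0.0, 1.0],
                               total_angle=np.pi / 2, n_steps=3)
```

Every waypoint stays at the same distance from the axis, which is what keeps the path compliant with the hinge constraint. Applying the same per-step rotation to the gripper's initial orientation would give a matching orientation at each waypoint.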
FlowBot3D
FlowBot3D relies on per-step prediction. Because ovens were an unseen category during training, the robot running FlowBot3D was never able to open the real-world oven. The predicted Articulation Flow directions were far off, causing failures.
FlowBot++ (Ours)
Using a model trained exclusively in simulation, we are able to smoothly open a fridge without causing much unwanted movement of the fridge itself. The video is shown in real time, and we are able to open the door fully in 20 seconds. This is about 30% of the time spent using FlowBot3D.
FlowBot3D
FlowBot3D's per-step prediction is prone to error. Each error in the predicted flow direction results in unexpected motion of the gripper and thus of the object itself. Here, a wrong flow prediction causes the gripper to yank the fridge so hard that it almost tips over. Re-planning time is also a bottleneck, resulting in an execution time 3.5x that of FlowBot++.
FlowBot++ (Ours)
We open the other door of the fridge by inputting a different mask. Again, the motion is very smooth, and the execution time is much shorter than with FlowBot3D.
Results on Real-World Point Clouds
Simulated Experiments
To evaluate our method in simulation, we implement a suction gripper in PyBullet. We consider the same subset of PartNet-Mobility as in previous work. Each object starts in the "closed" state (one end of its range of motion), and the goal is to actuate the joint to its "open" state (the other end of its range of motion). In our experiments, we use the same Normalized Distance metric defined in prior work. We compare our proposed method with several baselines: UMP-Net, Normal Direction, Screw Parameters, DAgger End2End (Ross et al., 2011), FlowBot3D ("AF Only"), which uses only Articulation Flow, and our model without the Gram-Schmidt correction ("AP Only"), which uses only the inferred Articulation Parameters. Each of these methods consists of a single model trained across all PartNet-Mobility training categories.
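The text above does not spell out the Normalized Distance formula. As a rough sketch, assuming the metric is the remaining joint distance to the goal divided by the initial joint distance to the goal, it could be computed as follows. The function name, argument names, and example values are hypothetical.

```python
def normalized_distance(joint_init, joint_end, joint_goal):
    """Remaining distance to the goal joint value, normalized by the
    initial distance to the goal (assumed form of the metric).

    0.0 means the joint reached its goal; 1.0 means no net progress.
    """
    initial_gap = abs(joint_goal - joint_init)
    if initial_gap == 0:
        raise ValueError("joint_init and joint_goal must differ")
    return abs(joint_goal - joint_end) / initial_gap


# Hypothetical episode: a door that should open from 0.0 rad to 1.5 rad
# and ends the episode at 1.2 rad.
score = normalized_distance(joint_init=0.0, joint_end=1.2, joint_goal=1.5)
```

Under this assumed definition, lower is better, and values are comparable across joints with different ranges of motion because each episode is scaled by its own initial gap.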