Articulated Object Manipulation with Coarse-to-fine Affordance for Mitigating the Effect of Point Cloud Noise
Suhan Ling*, Yian Wang*, Shiguang Wu, Yuzheng Zhuang, Tianyi Xu, Yu Li, Chang Liu, Hao Dong†
*: equal contribution, †: corresponding author
Abstract
3D articulated objects are inherently challenging to manipulate due to their varied geometries and intricate functionalities. Point-level affordance, which predicts a per-point actionable score and thus proposes the best point to interact with, has demonstrated excellent performance and generalization capabilities in articulated object manipulation. However, a significant challenge remains: while previous works use perfect point clouds generated in simulation, their models cannot be directly applied to noisy point clouds in the real world. To tackle this challenge, we leverage a property of real-world scanned point clouds: the point cloud becomes less noisy as the camera moves closer to the object. We therefore propose a novel coarse-to-fine affordance learning pipeline that mitigates the effect of point cloud noise in two stages. In the first stage, we learn affordance on the noisy far point cloud covering the whole object to propose an approximate place to manipulate. We then move the camera in front of this approximate place, scan a less noisy point cloud containing precise local geometries, and learn affordance on this point cloud to propose fine-grained final actions. The proposed method is thoroughly evaluated both on large-scale simulated noisy point clouds mimicking real-world scans and in real-world scenarios, outperforming existing methods and demonstrating its effectiveness in tackling the noisy real-world point cloud problem.
Video
Overview
Figure 1. Our proposed coarse-to-fine affordance learning framework for articulated object manipulation with real-world noisy observations.
Pipeline
Figure 2. Framework overview. Given the noisy far-view point cloud as observation, our framework extracts per-point features using a PointNet++ with multi-scale grouping and predicts a per-point coarse affordance map. We move the camera in front of the point with the highest coarse affordance score and capture a less noisy point cloud. The framework then uses another PointNet++ to extract per-point features of this fine point cloud, integrating the features of the far point cloud, and the predicted fine affordance map proposes the fine-grained final actions.
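To make the two-stage inference concrete, below is a minimal Python sketch of the pipeline. The network modules and helper functions (scan_point_cloud, move_camera_in_front_of) are hypothetical placeholders passed in as arguments for illustration only, not the released implementation.

# A minimal sketch of the two-stage coarse-to-fine inference loop.
# All interfaces here are illustrative assumptions.
def coarse_to_fine_affordance(coarse_net, fine_net,
                              scan_point_cloud, move_camera_in_front_of,
                              far_camera_pose):
    # Stage 1: scan the whole object from the far view (noisy point cloud).
    far_pc = scan_point_cloud(far_camera_pose)     # (N, 3) points
    far_feats, coarse_aff = coarse_net(far_pc)     # per-point features, (N,) scores

    # The highest-scoring point is the approximate place to manipulate.
    target_point = far_pc[coarse_aff.argmax()]

    # Stage 2: move the camera in front of that point and rescan; the closer
    # view is less noisy and captures precise local geometry.
    near_pc = scan_point_cloud(move_camera_in_front_of(target_point))

    # The fine network integrates the far-view features with the near-view
    # geometry to predict the fine affordance map and a fine-grained action.
    fine_aff, action = fine_net(near_pc, far_feats)
    contact_point = near_pc[fine_aff.argmax()]
    return contact_point, action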
Visualization
Figure 3. Visualization of far-view coarse affordance and near-view fine affordance on noisy point clouds. The task for the microwave is "push close", while the task for the others is "pull open". The first three shapes come from simulation, while the last two come from real-world scans.
Experiments