Discussion

What insights can we draw from the A4T framework?

(1) Existing works divide the manipulation of transparent objects into two separate steps: depth reconstruction and manipulation planning. Our A4T method is the first to couple the two. It would be interesting to develop further methods that infer depth and plan the manipulation jointly.

(2) Our A4T method is the first to reconstruct each individual transparent object in a multi-step manner. Different reconstruction methods may suit different parts of an object, depending on their geometric features. It would also be interesting to employ other reconstruction methods (beyond the optimisation method) to achieve multi-step depth reconstruction.
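To make this idea concrete, below is a minimal Python sketch of what an affordance-guided multi-step reconstruction could look like. It is an illustration under assumptions, not our actual implementation: the label constants (BACKGROUND, CONTAIN, WRAP_GRASP) and the helpers fit_plane_to_region, global_optimisation and multistep_reconstruct are hypothetical names, and global_optimisation merely fills missing depth with a median value so the sketch runs, standing in for the normal- and boundary-based optimisation of [2].

    import numpy as np

    # Hypothetical affordance labels used only in this sketch; the actual
    # label set and values come from the A4T dataset and model.
    BACKGROUND, CONTAIN, WRAP_GRASP = 0, 1, 2


    def fit_plane_to_region(depth, mask):
        """Fit a plane z = a*u + b*v + c to the valid depth pixels inside
        `mask` and re-sample it over the whole mask (illustrative only)."""
        v, u = np.nonzero(mask & (depth > 0))
        if u.size < 3:                      # not enough support for a plane fit
            return depth
        A = np.stack([u, v, np.ones_like(u)], axis=1).astype(np.float64)
        coeffs, *_ = np.linalg.lstsq(A, depth[v, u], rcond=None)
        plane = depth.copy()
        vv, uu = np.nonzero(mask)
        plane[vv, uu] = np.stack([uu, vv, np.ones_like(uu)], axis=1) @ coeffs
        return plane


    def global_optimisation(rgb, depth, mask):
        """Stand-in for a ClearGrasp-style global optimisation [1, 2] that
        completes missing depth from predicted normals and boundaries.
        Here we simply fill missing pixels with the median of the valid
        depth, purely to keep the sketch runnable."""
        out = depth.copy()
        missing = mask & (depth <= 0)
        if missing.any() and (depth > 0).any():
            out[missing] = np.median(depth[depth > 0])
        return out


    def multistep_reconstruct(rgb, raw_depth, affordance):
        """Affordance-guided multi-step reconstruction (sketch): each
        affordance region is completed with a method suited to its geometry."""
        out = raw_depth.copy()
        contain_mask = affordance == CONTAIN
        if contain_mask.any():
            # Step 1: "contain" regions (cavity openings) are roughly planar,
            # so a plane fit gives a cheap, robust estimate.
            out = fit_plane_to_region(out, contain_mask)
        # Step 2: remaining transparent pixels fall back to global optimisation.
        rest_mask = (affordance != BACKGROUND) & ~contain_mask
        if rest_mask.any():
            out = global_optimisation(rgb, out, rest_mask)
        return out

The design choice being illustrated is that each affordance region is routed to the completion method best matched to its geometry, so a new region-specific method can be slotted in without changing the rest of the pipeline.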

Can A4T generalise to different backgrounds?

We reduce the dependency on the background colour by placing the transparent objects on several tables with different textures during data collection. As a result, our affordance detection model performs well on other tables, even those with complex textures, as shown in Fig. 1. However, generalisation to unseen tables does not always hold, owing to the limited number of tables used for training.

Figure 1. Generalisation tests on different unseen backgrounds. The first three rows use solid colour backgrounds and the last row uses a complex background.

Can A4T generalise to unseen objects?

Our method would generalise well not only to new graspable containers, but also to geometries beyond graspable containers. To validate the generalisation of our proposed A4T approach to new graspable containers, we use two unseen containers (the left two objects in Fig. 2) in experiments on both affordance detection and depth reconstruction. As shown in Fig. 3, the affordance maps of the unseen objects are predicted well by our hierarchical AffordanceNet, and their depth maps are reconstructed well by our multi-step depth reconstruction method. As for objects beyond graspable containers, two example objects from [1] are shown in Fig. 2 (the right two objects). Such objects have no “contain” affordance regions because they lack cavities to hold content. The plane-fitting step of our multi-step reconstruction is therefore unnecessary, and the reconstruction degenerates into the ClearGrasp method [1].
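To illustrate this degenerate case, the toy snippet below reuses the hypothetical names from the earlier sketch: when the predicted affordance map contains no “contain” pixels, the plane-fitting step is skipped and the call reduces to a single global-optimisation pass, i.e. a ClearGrasp-style completion [1]. The arrays are synthetic placeholders rather than real data.

    import numpy as np

    # Toy stand-ins for a real RGB-D frame of an object without a cavity.
    rgb = np.zeros((120, 160, 3), dtype=np.uint8)
    raw_depth = np.random.uniform(0.4, 0.8, (120, 160))
    raw_depth[40:80, 60:100] = 0.0          # simulated missing (transparent) depth
    affordance = np.full((120, 160), BACKGROUND)
    affordance[30:90, 50:110] = WRAP_GRASP  # object region, but no CONTAIN pixels

    # With no "contain" region, the plane-fitting step is skipped and the call
    # reduces to a single global-optimisation pass (ClearGrasp-style baseline [1]).
    completed = multistep_reconstruct(rgb, raw_depth, affordance)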

Figure 2. Generalisation tests on four novel objects. From left to right: the first two objects are new graspable containers with geometries similar to those in our dataset, used to validate the generalisation of our A4T approach to unseen graspable containers; the last two objects, which have no “contain” regions, were tested in ClearGrasp [1] and can be reconstructed well with the vanilla global optimisation method of [2].

Figure 3. Generalisation tests on unseen graspable containers. From the first to the fifth column: the two unseen graspable containers, input RGB images, predicted affordance maps, input depth maps, and output depth maps. Comparing the input and output depths shows that the depth of the two novel objects is reconstructed well by our proposed A4T approach.

References:

[1] Sajjan, S., Moore, M., Pan, M., Nagaraja, G., Lee, J., Zeng, A., & Song, S. (2020). Clear Grasp: 3D shape estimation of transparent objects for manipulation. In 2020 IEEE International Conference on Robotics and Automation (ICRA) (pp. 3634-3642). IEEE.

[2] Zhang, Y., & Funkhouser, T. (2018). Deep depth completion of a single RGB-D image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 175-185).