Yen-Ling Tai¹, Yu Chien Chiu¹, Yu-Wei Chao², Yi-Ting Chen¹
¹National Yang Ming Chiao Tung University, ²NVIDIA
Conference on Robot Learning (CoRL) 2023
The ability to successfully scoop up food items presents a significant challenge for existing robot systems: the complex states and physical properties of food, including deformability, fragility, fluidity, and granularity, pose significant challenges for existing representations. In this paper, we investigate the potential of active perception for implicitly learning meaningful food representations to improve closed-loop robot policy learning. We present SCONE, a food SCOoping robot learNing framEwork that leverages the representations gained from active perception to inform the food-scooping policy model.
Background
Interacting (stir)
Manipulation (scoop up)
Given the unique attributes of food items, active perception proves to be a highly effective approach. This holds for a wide range of foods, including those with granular, liquid, or solid textures, as the interactive process deepens the robot's understanding of the food.
SCONE consists of two stages. In the Interacting stage, the robot interacts with the food to receive dynamic sensing feedback and infer its physical properties. In the subsequent Manipulation stage, the policy model leverages this additional information to achieve higher efficacy.
Framework Overview
SCONE utilizes active perception to improve the target manipulation task in a coarse-to-fine manner, consisting of three main feature extraction modules:
(a) Global temporal encoder, which captures overall environment information from the current observations.
(b) Interactive encoder (coarse), which gathers information during interaction and integrates it into a holistic understanding of the food item's physical properties.
(c) State retrieval module (fine), which extracts task-related state information based on the current state to facilitate manipulation.
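The three modules above can be sketched roughly as follows. This is an illustrative PyTorch sketch only: the class names, layer choices (GRU, mean pooling, multi-head attention), and dimensions are our assumptions for exposition, not the authors' actual implementation.

```python
# Hypothetical sketch of SCONE's three feature-extraction modules.
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class GlobalTemporalEncoder(nn.Module):
    """(a) Encodes the current observation sequence into a global context."""
    def __init__(self, obs_dim=64, hidden=128):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)

    def forward(self, obs_seq):            # obs_seq: (B, T, obs_dim)
        _, h = self.gru(obs_seq)           # final hidden state
        return h[-1]                       # (B, hidden)

class InteractiveEncoder(nn.Module):
    """(b) Coarse: pools per-step interaction feedback into one holistic
    embedding of the food's physical properties."""
    def __init__(self, feedback_dim=32, hidden=128):
        super().__init__()
        self.proj = nn.Linear(feedback_dim, hidden)

    def forward(self, feedback_seq):       # feedback_seq: (B, T, feedback_dim)
        return self.proj(feedback_seq).mean(dim=1)   # (B, hidden)

class StateRetrievalModule(nn.Module):
    """(c) Fine: attends from the current state over per-step interaction
    features to retrieve task-relevant state information."""
    def __init__(self, hidden=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)

    def forward(self, cur_state, interact_feats):  # (B, 1, H), (B, T, H)
        out, weights = self.attn(cur_state, interact_feats, interact_feats)
        return out.squeeze(1), weights     # retrieved feature, attention scores
```

In such a design, the outputs of all three modules would be concatenated and fed to a policy head that predicts the scooping action; the attention scores from the state retrieval module are what the visualization below inspects.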
Visualization
2D t-SNE of Embeddings from Interactive Encoder (coarse). Food items belonging to the same category or possessing similar properties tend to cluster together. This result suggests that the interactive encoder effectively captures and represents the inherent similarities among food items.
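A visualization like this can be reproduced with off-the-shelf tools. The sketch below is not the authors' code: it uses synthetic stand-in embeddings for three hypothetical food categories and projects them to 2D with scikit-learn's t-SNE.

```python
# Illustrative t-SNE sketch (synthetic data, not the paper's embeddings).
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for interactive-encoder embeddings: three food categories
# (e.g. granular, liquid, solid), 20 samples each, 128-dimensional.
embeddings = np.concatenate(
    [rng.normal(loc=c, scale=0.3, size=(20, 128)) for c in (0.0, 1.0, 2.0)]
)
labels = np.repeat([0, 1, 2], 20)

# Project to 2D; perplexity must be smaller than the sample count.
xy = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(embeddings)
assert xy.shape == (60, 2)
# Scatter-plot `xy` colored by `labels` (e.g. with matplotlib) to
# inspect whether same-category items cluster together.
```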
Attention Scores from State-retrieval Module (fine). The visualization shows that the model, equipped with the state retrieval module, effectively identifies and captures the relative importance of food states within the interacting data at each time step.
Real World Experiment
Experiment 1: Single Scooping
The robot must scoop up food items in a single attempt and transfer them to the target container without spilling or damaging the food. Overall, our proposed method achieves a task success rate of 71% across three difficulty levels, outperforming the other baselines.
Sago (basic)
Red Bean (basic)
Sago (extend)
Red Bean (extend)
Penne (peculiar)
Orange (basic)
Macadamia (basic)
Orange (extend)
Macadamia (extend)
Fruit Candy (peculiar)
Experiment 2: Continuous Scooping
The robot must repeatedly transfer food from one bowl to another until it fails (scoops up nothing). The results show that our framework successfully scoops food across intricate variations in food states, with a higher task success rate and a lower proportion of food spillage and damage.
SCONE can scoop continuously up to 14 times in the red bean scooping task. When the bowl is full, scooping succeeds toward either the center or the edges; as the food decreases, the strategy shifts to scooping along the bowl's edge to push food onto the spoon.
Experiment 3: Out-of-Distribution Food Scooping
To validate the generalization ability of our method, we applied it to scoop black tea and pudding. This task is highly challenging, as our training data did not include any liquids or semi-solid foods. The results show that the proposed method generalizes to unseen foods, achieving a success rate of over 50%.
Black Tea
Pudding
Acknowledgments
The work is sponsored in part by the Higher Education Sprout Project of the National Yang Ming Chiao Tung University and Ministry of Education (MOE), the Yushan Fellow Program Administrative Support Grant, and the National Science and Technology Council (NSTC) under grants 110-2222-E-A49-001-MY3, 110-2634-F-002-051, 111-2634-F-002-022-, and Mobile Drive Technology Co., Ltd (MobileDrive).
Citation
@inproceedings{tai2023scone,
title={SCONE: A Food Scooping Robot Learning Framework with Active Perception},
author={Yen-Ling Tai and Yu Chien Chiu and Yu-Wei Chao and Yi-Ting Chen},
year={2023},
booktitle={Conference on Robot Learning},
}
If you have any questions, please feel free to contact Yen-Ling Tai