Learning Sequential Acquisition Policies for Robot-Assisted Feeding
Priya Sundaresan, Jiajun Wu, Dorsa Sadigh
Abstract
A robot providing mealtime assistance must perform specialized maneuvers with various utensils in order to pick up a range of food items for downstream feeding. Beyond performing these dexterous low-level skills, an assistive robot must also plan these strategies in sequence over a long horizon to clear a plate and complete a meal. Previous methods in robot-assisted feeding introduce highly specialized primitives for food handling without a means to compose them together. Meanwhile, existing approaches to long-horizon manipulation lack the flexibility to embed highly specialized primitives into their frameworks. We propose Visual Action Planning OveR Sequences (VAPORS), a framework for long-horizon food acquisition. VAPORS learns a policy for high-level action selection by leveraging learned latent plate dynamics in simulation. To carry out sequential plans in the real world, VAPORS delegates action execution to visually parameterized primitives. Experimentally, we validate our approach on complex real-world acquisition trials involving noodle acquisition and bimanual scooping of jelly beans. Across 38 plates, VAPORS demonstrates the ability to acquire much more efficiently than baselines, generalize across realistic plate variations such as toppings and sauces, and qualitatively appeal to user feeding preferences in a survey conducted across 49 individuals.
VAPORS: High-Level Actions
Given a library of pre-trained manipulation primitives, such as grouping, twirling, or scooping, a high-level policy learns to plan sequences of skills using model-based planning over a learned latent plate dynamics model. We train the policy:
Entirely in simulation, which need not capture intricate real food dynamics
From segmented image observations, which capture the rough distribution of food items and are transferable between simulation and reality
Training
During training, we model high-level plate dynamics with two learned components: an encoder that compresses segmented images into latent states, and a transition model that predicts action-conditioned latent state transitions.
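As a rough illustration of these two components, the sketch below replaces the learned neural encoder with a fixed coarse-grid pooling of the binary food mask and fits a linear, action-conditioned transition model by least squares. Both simplifications, and the names `encode_mask`, `fit_transition_model`, and `predict_next`, are hypothetical choices for exposition, not the VAPORS architecture.

```python
import numpy as np

def encode_mask(mask: np.ndarray, grid: int = 4) -> np.ndarray:
    """Hypothetical encoder: average-pool a binary food mask into a
    grid x grid occupancy map and flatten it into a latent vector."""
    h, w = mask.shape
    if h % grid or w % grid:
        raise ValueError("mask height and width must be divisible by grid")
    pooled = mask.reshape(grid, h // grid, grid, w // grid).mean(axis=(1, 3))
    return pooled.ravel()

def fit_transition_model(z, actions, z_next, num_actions):
    """Fit a linear model z_next ~= [z, one_hot(a)] @ W by least squares.
    The one-hot block acts as a learned per-action offset."""
    one_hot = np.eye(num_actions)[actions]
    features = np.hstack([z, one_hot])
    W, *_ = np.linalg.lstsq(features, z_next, rcond=None)
    return W

def predict_next(W, z, action, num_actions):
    """Predict the next latent state for a single (latent, action) pair."""
    features = np.concatenate([z, np.eye(num_actions)[action]])
    return features @ W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    masks = (rng.random((200, 16, 16)) > 0.7).astype(float)  # synthetic masks
    z = np.stack([encode_mask(m) for m in masks])
    actions = rng.integers(0, 3, size=200)
    # Synthetic "ground-truth" dynamics used only to generate training targets.
    true_W = np.vstack([0.9 * np.eye(16), rng.normal(0, 0.05, size=(3, 16))])
    z_next = np.hstack([z, np.eye(3)[actions]]) @ true_W
    W = fit_transition_model(z, actions, z_next, num_actions=3)
    print(np.abs(W - true_W).max())  # near zero on noise-free data
```

A learned nonlinear encoder and transition network would take the place of both functions in practice; the linear version only shows the data flow from segmented image to latent state to predicted next latent state.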
Planning
Once trained, we sample different high-level action sequences, select the one which maximizes predicted reward under a learned reward model, and execute the first action in an MPC-style loop.
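The planning step can be sketched as a minimal random-shooting planner. In this sketch, `plan_first_action` is a hypothetical name, and `toy_transition` / `toy_reward` are hand-written stand-ins for the learned transition and reward models: the latent state is assumed to be (food remaining, food spread), grouping halves the spread, and acquiring removes more food when the food is less spread out.

```python
import numpy as np

GROUP, ACQUIRE = 0, 1  # hypothetical discrete high-level actions

def plan_first_action(z0, transition, reward, num_actions,
                      horizon=3, num_samples=256, rng=None):
    """Random-shooting planner: sample action sequences, roll each one out
    with the transition model, score it with the reward model, and return
    the first action of the best-scoring sequence (plus the sequence)."""
    rng = np.random.default_rng() if rng is None else rng
    sequences = rng.integers(0, num_actions, size=(num_samples, horizon))
    best_return, best_seq = -np.inf, None
    for seq in sequences:
        z, total = z0, 0.0
        for a in seq:
            z = transition(z, int(a))
            total += reward(z)
        if total > best_return:
            best_return, best_seq = total, seq
    return int(best_seq[0]), best_seq

# Toy stand-ins for the learned models (hypothetical).
def toy_transition(z, action):
    remaining, spread = z
    if action == GROUP:
        return np.array([remaining, 0.5 * spread])
    return np.array([max(remaining - 0.3 * (1.0 - spread), 0.0), spread])

def toy_reward(z):
    return -z[0]  # less food left on the plate is better

if __name__ == "__main__":
    action, seq = plan_first_action(np.array([1.0, 0.8]), toy_transition,
                                    toy_reward, num_actions=2,
                                    rng=np.random.default_rng(0))
    print(action, seq)  # 0 [0 1 1]: group first, then acquire twice
```

In this toy setting a one-step greedy choice would acquire immediately, while the three-step plan groups first because the later acquisitions become more effective. In an MPC-style loop, only the returned first action is executed; the new observation is then encoded and the planner is called again.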
VAPORS: Low-Level Actions
In order to execute high-level action plans, VAPORS employs a low-level policy which infers the parameters of selected primitives like grouping, twirling, and scooping given visual input. We train:
A self-supervised segmentation model, which learns to infer binary segmentation masks of food from a global RGB image
A food orientation model, which estimates the local orientation of a food item from a local RGB crop
Finally, we use these pose estimates to instantiate actions parameterized by utensil roll, pitch, and position.
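To show how a mask and an orientation estimate map onto primitive parameters, the sketch below swaps the learned orientation model for principal component analysis (PCA) of the mask pixels: the mask centroid gives the target position and the principal axis gives the utensil roll. `PrimitiveParams`, `infer_params`, and the fixed pitch argument are hypothetical; angles are measured in image coordinates, where the y axis points down.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class PrimitiveParams:
    """Hypothetical parameterization of a utensil action."""
    x: float      # pixel column of the target food region
    y: float      # pixel row of the target food region
    roll: float   # utensil roll (radians), aligned with the food's main axis
    pitch: float  # utensil pitch (radians), fixed per primitive in this sketch

def infer_params(mask: np.ndarray, pitch: float = 0.0) -> PrimitiveParams:
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("mask contains no food pixels")
    cx, cy = xs.mean(), ys.mean()
    coords = np.stack([xs - cx, ys - cy])           # 2 x N centered pixels
    cov = coords @ coords.T / xs.size               # 2 x 2 pixel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, np.argmax(eigvals)]          # principal axis (dx, dy)
    roll = float(np.arctan2(major[1], major[0]))
    # An axis has no direction, so wrap the angle into [-pi/2, pi/2).
    roll = (roll + np.pi / 2) % np.pi - np.pi / 2
    return PrimitiveParams(float(cx), float(cy), roll, pitch)

if __name__ == "__main__":
    mask = np.zeros((20, 20), dtype=np.uint8)
    mask[10, 2:18] = 1                               # a horizontal noodle
    print(infer_params(mask))
```

PCA only recovers the dominant axis of the whole mask; a learned model operating on local RGB crops, as described above, can handle cluttered or overlapping items where a single global axis is not meaningful.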
Simulator
To train the high-level policy, we implement a custom food manipulation simulator in Blender 2.92 that exposes an OpenAI Gym-style API. The simulator currently supports RGB rendering, segmentation masks, deformable noodles, and granular piles of food items.
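To make the Gym-style interface concrete, here is a tiny stand-in environment, assuming the classic Gym convention where `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`. `ToyPlateEnv` is hypothetical: food items live on a 2D grid instead of being rendered in Blender, grouping pushes every item one cell toward the plate center, and acquisition removes items within a fixed bite radius. Observations are binary food masks, mirroring the segmentation masks the simulator provides.

```python
import numpy as np

GROUP, ACQUIRE = 0, 1  # hypothetical discrete high-level actions

class ToyPlateEnv:
    """Hypothetical Gym-style food environment with binary mask observations."""

    def __init__(self, size=16, num_items=30, bite_radius=3.0,
                 max_steps=8, seed=None):
        self.size, self.num_items = size, num_items
        self.bite_radius, self.max_steps = bite_radius, max_steps
        self.rng = np.random.default_rng(seed)
        self.center = np.array([(size - 1) / 2.0, (size - 1) / 2.0])

    def _mask(self):
        # Binary occupancy mask; several items may share one cell.
        mask = np.zeros((self.size, self.size), dtype=np.uint8)
        mask[self.items[:, 0], self.items[:, 1]] = 1
        return mask

    def reset(self):
        self.items = self.rng.integers(0, self.size, size=(self.num_items, 2))
        self.t = 0
        return self._mask()

    def step(self, action):
        before = len(self.items)
        if action == GROUP:
            # Push every item one cell toward the plate center.
            direction = np.sign(self.center - self.items).astype(int)
            self.items = self.items + direction
        elif action == ACQUIRE:
            # Remove every item within the bite radius of the center.
            dist = np.linalg.norm(self.items - self.center, axis=1)
            self.items = self.items[dist > self.bite_radius]
        else:
            raise ValueError(f"unknown action {action}")
        self.t += 1
        reward = (before - len(self.items)) / self.num_items  # fraction acquired
        done = len(self.items) == 0 or self.t >= self.max_steps
        return self._mask(), reward, done, {"items_left": len(self.items)}

if __name__ == "__main__":
    env = ToyPlateEnv(seed=0)
    obs = env.reset()
    obs, reward, done, info = env.step(ACQUIRE)
    print(reward, done, info)
```

A high-level policy written against `reset` and `step` only ever sees masks, so in principle the same policy code can run against a toy grid, a rendered simulator, or segmented real-camera images.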
Hardware
We implement two real-world feeding scenarios, noodle acquisition and bimanual scooping, using Franka Panda 7-DoF robots, custom 3D-printed mounts, wrist-mounted RealSense cameras, and an actuated fork.
Results
We evaluate VAPORS in the real world on the tasks of noodle acquisition and scooping of granular items.
Quantitative Plate Clearance
We compare VAPORS against 2 baseline approaches:
Heuristic adopts a naive strategy: group until plate coverage falls below a pre-defined threshold, then acquire. This greedy group-then-acquire strategy wastes actions on grouping and produces excessively large bites.
Acquire Only greedily takes an acquisition action at each timestep (twirl or scoop), which results in slow plate clearance.
In contrast, VAPORS leverages multiple strategies and composes them via learned model-based planning, yielding the most efficient plate clearance of the three methods.
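Both baselines reduce to fixed decision rules over the current observation. A minimal sketch, assuming that "plate coverage" means the fraction of plate pixels labeled as food in the segmentation mask and using an arbitrary threshold of 0.1; neither the definition nor the value comes from the source, and the function names are hypothetical.

```python
import numpy as np

GROUP, ACQUIRE = 0, 1  # hypothetical discrete high-level actions

def plate_coverage(mask) -> float:
    """Fraction of plate pixels covered by food (assumed definition)."""
    return float(np.asarray(mask, dtype=bool).mean())

def heuristic_policy(mask, threshold: float = 0.1) -> int:
    """Group while coverage is at or above the threshold, then acquire."""
    return GROUP if plate_coverage(mask) >= threshold else ACQUIRE

def acquire_only_policy(mask) -> int:
    """Always take an acquisition action (twirl or scoop)."""
    return ACQUIRE
```

Neither rule looks ahead: each picks an action from the current mask alone, whereas a planner scores whole action sequences under a dynamics model before committing to the first action.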
[Figure: plate-clearance rollouts for VAPORS (ours), Heuristic, and Acquire Only]
Qualitative Evaluation
In a qualitative user study with 49 participants, VAPORS is the preferred method with statistical significance across the evaluated criteria, both overall and in pairwise comparisons.
Generalization Testing
We evaluate VAPORS on 3 tiers of difficulty for noodle acquisition:
Tier 1: Differently shaped plain noodles (Udon, Dan Dan, Pappardelle, Spaghetti)
Tier 2: Same noodles as Tier 1, with added sauce + garnishes
Tier 3: Noodles ordered on DoorDash (Panda Express + Pasta)
Below, we show trials across all tiers and report the percentage of the plate cleared within 8 actions.
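One way to compute this metric, assuming clearance is measured by comparing food-pixel counts in segmentation masks before and after the action budget (the source does not state how clearance is measured, and `percent_cleared` and `percent_cleared_within` are hypothetical names):

```python
import numpy as np

def percent_cleared(initial_mask, final_mask) -> float:
    """Percentage of initial food pixels no longer present (assumed metric)."""
    initial = int(np.count_nonzero(initial_mask))
    if initial == 0:
        raise ValueError("initial plate has no food pixels")
    remaining = int(np.count_nonzero(final_mask))
    # Grouping can spread food across more pixels, so clamp at 0%.
    return 100.0 * max(initial - remaining, 0) / initial

def percent_cleared_within(masks, budget: int = 8) -> float:
    """masks[0] is the initial plate; masks[k] is the plate after k actions."""
    last = min(budget, len(masks) - 1)
    return percent_cleared(masks[0], masks[last])
```

A weight-based measurement would avoid pixel-count artifacts such as food piling up after grouping, at the cost of requiring a scale; the mask-based version is only a convenient proxy.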