Learning Sequential Acquisition Policies for Robot-Assisted Feeding

Priya Sundaresan, Jiajun Wu, Dorsa Sadigh

Abstract

A robot providing mealtime assistance must perform specialized maneuvers with various utensils in order to pick up a range of food items for downstream feeding. Beyond performing these dexterous low-level skills, an assistive robot must also plan these strategies in sequence over a long horizon to clear a plate and complete a meal. Previous methods in robot-assisted feeding introduce highly specialized primitives for food handling without a means to compose them together. Meanwhile, existing approaches to long-horizon manipulation lack the flexibility to embed highly specialized primitives into their frameworks. We propose Visual Action Planning OveR Sequences (VAPORS), a framework for long-horizon food acquisition. VAPORS learns a policy for high-level action selection by leveraging learned latent plate dynamics in simulation. To carry out sequential plans in the real world, VAPORS delegates action execution to visually parameterized primitives. We validate our approach on complex real-world acquisition trials involving noodle acquisition and bimanual scooping of jelly beans. Across 38 plates, VAPORS acquires food significantly more efficiently than baselines, generalizes across realistic plate variations such as toppings and sauces, and is qualitatively preferred in a feeding survey of 49 individuals.

Overview Video

rss_supp.mp4

VAPORS: High-Level Actions

Given a library of pre-trained manipulation primitives, such as grouping, twirling, or scooping, a high-level policy learns to plan sequences of skills via model-based planning over a learned latent plate dynamics model. We train and deploy this policy in two stages:

Training 

During training, we learn to model high-level plate dynamics via a learned encoder, which compresses segmented images to latent states, and a learned transition model, which predicts action-conditioned latent state transitions.
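The training objective above can be sketched in a few lines. This is a toy NumPy stand-in, not the actual implementation: the linear "encoder" `E` and per-action transition matrices `T` are hypothetical placeholders for the learned networks, and `prediction_loss` illustrates only the one-step latent prediction error being minimized.

```python
import numpy as np

rng = np.random.default_rng(0)
IMG_DIM, LATENT_DIM, N_ACTIONS = 16, 4, 3

# Toy stand-ins for the learned models (hypothetical, for illustration):
E = rng.normal(size=(LATENT_DIM, IMG_DIM)) * 0.1                 # "encoder"
T = rng.normal(size=(N_ACTIONS, LATENT_DIM, LATENT_DIM)) * 0.1   # per-action transitions

def prediction_loss(x_t, a_t, x_next):
    """One-step objective: compare the predicted next latent state
    against the encoding of the actually observed next image."""
    z_t, z_next = E @ x_t, E @ x_next   # encode consecutive segmented images
    z_pred = T[a_t] @ z_t               # action-conditioned latent transition
    return float(np.mean((z_pred - z_next) ** 2))
```

In the real system both models are trained jointly by gradient descent on rollouts collected in simulation; the loss here just names the quantity being driven down.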

Planning

Once trained, we sample candidate high-level action sequences, select the one that maximizes predicted reward under a learned reward model, and execute its first action in an MPC-style loop.
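A minimal sketch of this planning loop, using random-shooting MPC over a latent model. The transition and reward functions below are toy stand-ins (hypothetical names and dynamics), not the learned networks; the structure of the loop — sample sequences, roll out in latent space, score, execute the first action of the best sequence — is what matters.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 3      # e.g. 0 = group, 1 = twirl, 2 = scoop
LATENT_DIM = 4
HORIZON = 5        # assumed planning horizon
N_SAMPLES = 64     # assumed number of sampled sequences

# Toy latent dynamics: each discrete action shifts the latent state differently.
ACTION_EFFECTS = rng.normal(size=(N_ACTIONS, LATENT_DIM))

def transition(z, a):
    """Stand-in for the learned action-conditioned transition model."""
    return z + ACTION_EFFECTS[a]

def reward(z):
    """Stand-in for the learned reward model (e.g. predicted plate clearance)."""
    return -np.linalg.norm(z)

def plan_first_action(z0):
    """Sample action sequences, roll them out in latent space, and return
    the first action of the highest-return sequence (MPC style)."""
    best_ret, best_seq = -np.inf, None
    for _ in range(N_SAMPLES):
        seq = rng.integers(N_ACTIONS, size=HORIZON)
        z, ret = z0.copy(), 0.0
        for a in seq:
            z = transition(z, a)
            ret += reward(z)
        if ret > best_ret:
            best_ret, best_seq = ret, seq
    return int(best_seq[0])
```

After executing the chosen action on the real plate, the loop re-encodes the new observation and replans, which is what makes the scheme closed-loop.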

VAPORS: Low-Level Actions

In order to execute high-level action plans, VAPORS employs a low-level policy which infers the parameters of selected primitives, such as grouping, twirling, and scooping, from visual input. We train visual models to produce food pose estimates, and use these estimates to instantiate actions parameterized by utensil roll, pitch, and position.
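As an illustration of the last step, the sketch below maps a 2D food pose estimate to the (roll, pitch, position) parameters of a twirl primitive. The function name, the tine-alignment rule, and the fixed insertion pitch are all illustrative assumptions, not the system's actual parameterization.

```python
import numpy as np

def twirl_params(food_xy, food_theta, plate_height=0.0):
    """Map a food pose estimate to utensil (roll, pitch, position).

    food_xy:    estimated pick point on the plate, in meters
    food_theta: estimated noodle orientation, in radians
    """
    roll = food_theta + np.pi / 2   # assumed rule: align fork tines across the strand
    pitch = np.deg2rad(75.0)        # assumed near-vertical insertion angle
    position = np.array([food_xy[0], food_xy[1], plate_height])
    return roll, pitch, position
```

The returned parameters would then be handed to the robot's motion controller to execute the primitive.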

Simulator

We implement a custom food manipulation simulator in Blender 2.92 to train the high-level policy, and expose an OpenAI Gym-style API. The simulator currently supports RGB rendering, segmentation masks, deformable noodles, and granular piles of food items.
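A Gym-style API means the environment is driven through the familiar reset/step interface. The stub below mimics that interface with a toy in-memory environment; the class name, observation keys, and clearance dynamics are illustrative assumptions, standing in for the Blender-backed renderer.

```python
import numpy as np

class ToyPlateEnv:
    """Hypothetical stub mimicking the simulator's Gym-style interface."""

    def __init__(self, n_primitives=3, max_steps=8):
        self.n_primitives = n_primitives
        self.max_steps = max_steps

    def reset(self):
        self.t = 0
        self.remaining = 1.0  # fraction of food left on the plate
        return self._obs()

    def _obs(self):
        # The real simulator renders RGB and segmentation masks; placeholders here.
        return {"rgb": np.zeros((64, 64, 3), dtype=np.uint8),
                "seg": np.zeros((64, 64), dtype=np.uint8)}

    def step(self, action):
        assert 0 <= action < self.n_primitives
        self.t += 1
        self.remaining = max(0.0, self.remaining - 0.2)  # toy clearance model
        done = self.remaining < 1e-6 or self.t >= self.max_steps
        reward = 1.0 - self.remaining
        return self._obs(), reward, done, {}

# Typical rollout loop against the interface:
env = ToyPlateEnv()
obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(0)  # here always action 0; normally a policy
```

Because the interface matches Gym conventions, the high-level policy training code can treat the Blender simulator like any other RL environment.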

Hardware

We implement two real-world feeding scenarios, noodle acquisition and bimanual scooping, with Franka Panda 7-DoF robots, custom 3D-printed mounts, wrist-mounted RealSense cameras, and an actuated fork.

Results

We evaluate VAPORS in the real world on the tasks of noodle acquisition and scooping of granular items.

Quantitative Plate Clearance

We compare VAPORS against two baseline approaches: a heuristic planner and an acquire-only policy (twirl-only or scoop-only, depending on the task). In contrast, VAPORS effectively leverages multiple strategies and composes them via learned model-based planning, achieving the most efficient plate clearance.

OURS

ours.mp4

Heuristic

heuristic.mp4

Acquire Only

twirl only.mp4
scoop only.mp4

Qualitative Evaluation

In qualitative user studies, VAPORS is the preferred method with statistical significance across various criteria, both overall and in pairwise comparisons.

Generalization Testing

We evaluate VAPORS on three tiers of difficulty for noodle acquisition.

Below, we show trials across all tiers and report the percentage of the plate cleared within 8 actions.

spaghetti_tier1.mp4

Tier 1: 90 ± 6%

spaghetti_tier2.mp4

Tier 2: 68 ± 16%

spaghetti_tier3.mp4

Tier 3: 64 ± 13%