SMTO

Skills Made to Order: Efficient Acquisition of Robot Cooking Skills Guided by Multiple Forms of Internet Data

Mrinal Verghese and Christopher Atkeson

Abstract

This study uses various internet data sources to select template robot behaviors to perform skills. Learning contact-rich skills involving tool use from sources of internet data has typically been challenging due to the lack of physical information present in this data. We hypothesize that internet data and foundation models trained on this data may be better suited to selecting among a set of basic robot behaviors to perform these contact-rich skills. We explore three methods of template selection: large language models, comparison to retrieved human video using features from a pretrained video encoder, and comparison to human video using learned optical flow features. Our results show that LLMs are surprisingly capable template selectors despite their lack of visual information, optical flow encoding significantly outperforms video encoders trained with an order of magnitude more data, and important synergies exist between various forms of internet data for template selection. By exploiting these synergies, we are able to create a template selector using multiple forms of internet data that achieves a 79% success rate on a set of 16 different tool-use cooking skills.
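The idea of selecting a template by comparing features can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each template rollout and the retrieved human video have already been encoded as fixed-length feature vectors, that cosine similarity is the comparison metric, and that combining with an LLM means restricting the candidates to an LLM-approved shortlist. The function names, template names, and feature values are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_template(human_feature, template_features, allowed=None):
    """Return the name of the template whose feature is closest to the human video.

    template_features: dict mapping template name -> feature vector.
    allowed: optional set of template names, e.g. a shortlist proposed by an
             LLM (an assumed way of combining the two selectors).
    """
    candidates = {
        name: feat
        for name, feat in template_features.items()
        if allowed is None or name in allowed
    }
    if not candidates:
        raise ValueError("no candidate templates to choose from")
    return max(
        candidates,
        key=lambda name: cosine_similarity(human_feature, candidates[name]),
    )


# Toy features standing in for encoder outputs (hypothetical values).
templates = {
    "template_01": [1.0, 0.0, 0.0],
    "template_02": [0.0, 1.0, 0.0],
    "template_03": [0.6, 0.8, 0.0],
}
human = [0.1, 0.9, 0.0]

best_feature_only = select_template(human, templates)
best_with_shortlist = select_template(
    human, templates, allowed={"template_01", "template_03"}
)
```

With these toy values the feature-only selector picks `template_02`, while restricting the candidates to the shortlist changes the choice to `template_03`, which shows how a second source of information can override a purely visual match.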

Below are videos of the templates selected by each of the three template selection methods, as well as by a method combining LLM selection with optical flow encoding selection. Note that some videos are duplicates because multiple methods selected the same template.

Cut a Bell Pepper with a Knife

02.mp4

LLM Selection

01.mp4

Pretrained Video Encoder

03.mp4

Optical Flow Encoder

02.mp4

LLM + Optical Flow

Cut a Carrot with a Knife

01.mp4

LLM Selection

03.mp4

Pretrained Video Encoder

02.mp4

Optical Flow Encoder

02.mp4

LLM + Optical Flow

Cut a Cucumber with a Knife

01.mp4

LLM Selection

03.mp4

Pretrained Video Encoder

02.mp4

Optical Flow Encoder

01.mp4

LLM + Optical Flow

Cut a Mushroom with a Knife

01.mp4

LLM Selection

03.mp4

Pretrained Video Encoder

02.mp4

Optical Flow Encoder

02.mp4

LLM + Optical Flow

Peel a Carrot with a Peeler

02.mp4

LLM Selection

03.mp4

Pretrained Video Encoder

01.mp4

Optical Flow Encoder

01.mp4

LLM + Optical Flow

Peel a Cucumber with a Peeler

02.mp4

LLM Selection

03.mp4

Pretrained Video Encoder

01.mp4

Optical Flow Encoder

02.mp4

LLM + Optical Flow

Scrape a Cutting Board with a Bench Scraper

01.mp4

LLM Selection

01.mp4

Pretrained Video Encoder

02.mp4

Optical Flow Encoder

02.mp4

LLM + Optical Flow

Scrape a Cutting Board with a Knife

02.mp4

LLM Selection

01.mp4

Pretrained Video Encoder

01.mp4

Optical Flow Encoder

03.mp4

LLM + Optical Flow

Scrub a Cutting Board with a Sponge

01.mp4

LLM Selection

02.mp4

Pretrained Video Encoder

02.mp4

Optical Flow Encoder

02.mp4

LLM + Optical Flow

Scrub a Plate with a Sponge

01.mp4

LLM Selection

02.mp4

Pretrained Video Encoder

03.mp4

Optical Flow Encoder

03.mp4

LLM + Optical Flow

Slice a Pizza with a Pizza Cutter

02.mp4

LLM Selection

01.mp4

Pretrained Video Encoder

03.mp4

Optical Flow Encoder

03.mp4

LLM + Optical Flow

Spread Sauce with a Spoon

01.mp4

LLM Selection

01.mp4

Pretrained Video Encoder

02.mp4

Optical Flow Encoder

02.mp4

LLM + Optical Flow

Stir a Pan with a Spatula (Peppers)

02.mp4

LLM Selection

03.mp4

Pretrained Video Encoder

03.mp4

Optical Flow Encoder

02.mp4

LLM + Optical Flow

Stir a Pan with a Spatula (Sauce)

01.mp4

LLM Selection

03.mp4

Pretrained Video Encoder

02.mp4

Optical Flow Encoder

01.mp4

LLM + Optical Flow

Wipe a Cutting Board with a Towel

01.mp4

LLM Selection

02.mp4

Pretrained Video Encoder

02.mp4

Optical Flow Encoder

02.mp4

LLM + Optical Flow

Wipe a Plate with a Towel

02.mp4

LLM Selection

02.mp4

Pretrained Video Encoder

01.mp4

Optical Flow Encoder

03.mp4

LLM + Optical Flow
