Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models

Ron Vainshtein Zohar Rimon Shie Mannor Chen Tessler

Technion - Israel Institute of Technology

NVIDIA, Israel

https://arxiv.org/abs/2503.22886

Visual Results

Below, we show the behavior of MaskedMimic adapted to different downstream tasks using Task Tokens

Direction (with prior joint conditioning)

The agent walks in a randomly chosen direction

Steering (with prior joint conditioning)

The agent walks in a randomly chosen direction and faces a randomly chosen direction

Reach (with prior joint conditioning)

The agent reaches for a randomly placed goal with the right hand

Strike

The agent strikes a target at a random location

Long Jump

The agent maximizes the jump distance from a jump location

Pure RL

Below, we show the behavior obtained by optimizing a policy directly with PPO on the downstream tasks

OOD Perturbations Generalization

Gravity

From left to right, gravity is set to: -14, -16, -18, -20

Ground Friction

From left to right, ground friction is set to: -0.5, -0.4, -0.3, -0.2

MultiModal Prompting

Joint Conditioning - Direction Correction

Text Prompting & Joint Conditioning - Kick Strike
