Behavior-agnostic Task Inference for Robust Offline In-context Reinforcement Learning

ICML 2025

Long Ma, Fangwei Zhong^, Yizhou Wang

(^corresponding author)

Abstract

The ability to adapt to new environments with noisy dynamics and unseen objectives is crucial for AI agents. In-context reinforcement learning (ICRL) has emerged as a paradigm for building adaptive policies: it uses a context trajectory of test-time interactions to infer the true task, and hence the corresponding optimal policy, efficiently and without gradient updates. However, because ICRL policies rely heavily on context trajectories, they are vulnerable to distribution shifts between training and testing, which degrade performance, particularly in offline settings where the training data is static. In this paper, we highlight that most existing offline ICRL methods are trained to perform approximate Bayesian inference under the training distribution, which leaves them vulnerable to distribution shifts at test time and leads to poor generalization. To address this, we introduce Behavior-agnostic Task Inference (BATI) for ICRL, a model-based maximum-likelihood solution that infers the task representation robustly. In contrast to previous methods that rely on a learned encoder as the approximate posterior, BATI relies purely on the environment dynamics, which insulates it from the behavior of the policy that collected the context. Experiments on MuJoCo environments demonstrate that BATI effectively interprets out-of-distribution contexts and outperforms other methods, even in the presence of significant environmental noise.
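The abstract describes BATI as maximum-likelihood task inference that depends only on the dynamics. The sketch below illustrates that idea under loudly simplifying assumptions: a toy one-dimensional model `dynamics_mean` stands in for a learned dynamics model, the task latent `z` is a single scalar, transition noise is Gaussian with a known `sigma`, and a grid search replaces whatever optimizer BATI actually uses. All function and variable names here are hypothetical and are not taken from the BATI code.

```python
import numpy as np


def dynamics_mean(s, a, z):
    # Hypothetical stand-in for a learned dynamics model p(s' | s, a, z):
    # a toy linear model in which the task latent z scales the action's effect.
    return s + z * a


def log_likelihood(z, context, sigma=0.1):
    # Gaussian log-likelihood of the observed next states under latent z.
    # Only (s, a, s') transitions enter the score; the behavior policy does not.
    s, a, s_next = context
    resid = s_next - dynamics_mean(s, a, z)
    n = resid.size
    return -0.5 * np.sum(resid ** 2) / sigma ** 2 - n * np.log(sigma * np.sqrt(2 * np.pi))


def infer_task(context, candidates=np.linspace(-3.0, 3.0, 601)):
    # Maximum-likelihood estimate of z; a grid search is used here for simplicity.
    scores = [log_likelihood(z, context) for z in candidates]
    return float(candidates[int(np.argmax(scores))])


def collect_context(z_true, policy, rng, n=200, noise=0.1):
    # Simulate a context trajectory from the toy environment under a given behavior policy.
    s = rng.normal(size=n)
    a = policy(s, rng)
    s_next = s + z_true * a + rng.normal(scale=noise, size=n)
    return s, a, s_next


rng = np.random.default_rng(0)
z_true = 1.3
# Two different behavior policies: broad random actions vs. a narrow, shifted policy.
random_policy = lambda s, rng: rng.uniform(-1.0, 1.0, size=s.shape)
narrow_policy = lambda s, rng: 0.5 + 0.1 * rng.normal(size=s.shape)

z_random = infer_task(collect_context(z_true, random_policy, rng))
z_narrow = infer_task(collect_context(z_true, narrow_policy, rng))
print(z_random, z_narrow)
```

Because `log_likelihood` scores transitions only through the dynamics model, contexts from the broad random policy and from the narrow, shifted policy should both yield estimates near `z_true`. A learned encoder trained on one behavior distribution may not generalize to the other in the same way, which is the failure mode the abstract describes.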

Visualizations

We visualize policy rollouts in our evaluation environments.

AntDir

We show the behaviors for the task with direction angle 1.935 rad (roughly π/2), corresponding to moving approximately upwards. BATI and CSRO move in the right direction, while FOCAL and UNICORN do not.
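To see why 1.935 counts as "approximately upwards", the snippet below converts the angle into a unit direction in the xy-plane. It assumes the common AntDir convention in which the task parameter is a goal angle measured from the +x axis and the agent is rewarded for velocity along (cos θ, sin θ); that convention is an assumption, not something stated on this page.

```python
import math

theta = 1.935  # AntDir task parameter from the visualization, assumed to be in radians
direction = (math.cos(theta), math.sin(theta))  # assumed goal direction (x, y)
degrees = math.degrees(theta)

print(f"{degrees:.1f} deg, direction = ({direction[0]:.3f}, {direction[1]:.3f})")
```

The angle is about 111°, so under this convention the goal points mostly along +y with a small -x component, which matches the "approximately upwards" description.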

BATI

CSRO

FOCAL

UNICORN

HalfCheetahVel

The true target velocity is 1.277, corresponding to moving slowly. BATI correctly recognizes the task and executes the desired behavior, while baselines either run much faster than desired (FOCAL, UNICORN) or get stuck (CSRO).