Abstract
In this work, we present an approach to identifying sub-tasks within a demonstrated robot trajectory using supervision provided by language instructions. Learning long-horizon tasks is challenging with techniques such as reinforcement learning and behavior cloning. Previous approaches have split these long tasks into shorter tasks that are easier to learn by using statistical change-point detection methods. However, classical change-point detection methods operate only on low-dimensional robot trajectory data, not on high-dimensional inputs such as vision. Our goal in this work is to split long-horizon tasks, represented as trajectories, into shorter-horizon tasks that can be learned with conventional behavior cloning approaches, using guidance from language. In our approach we apply techniques from the video moment retrieval problem to robot trajectory data, yielding a high-dimensional, generalizable change-point detection approach. Our proposed moment retrieval-based approach shows a more than 30% improvement in mean average precision (mAP) for identifying trajectory sub-tasks with language guidance compared to without it. We perform ablations to understand the effects of domain randomization, sample complexity, camera views, and sim-to-real transfer of our method. In our data ablation we find that with just 100 labelled trajectories we achieve 61.41 mAP, demonstrating the sample efficiency of the approach. Further, behavior cloning models trained on our segmented trajectories outperform a single model trained on the whole trajectory by up to 20%.
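The mAP numbers above score predicted sub-task spans against ground-truth annotations, which rests on a temporal intersection-over-union between time spans. As a minimal sketch (the span representation and threshold conventions here are assumptions, not the paper's exact evaluation code):

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two [start, end] time spans."""
    start = max(pred[0], gt[0])
    end = min(pred[1], gt[1])
    inter = max(0.0, end - start)
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A predicted "pick the red block" span vs. its ground-truth annotation:
# overlap is 30 frames, union is 50 frames.
print(temporal_iou([10, 50], [20, 60]))  # 0.6
```

A prediction typically counts as correct when this IoU exceeds a threshold (e.g. 0.5 or 0.7), and mAP averages precision over such thresholds.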
Model Architecture
Dataset Analysis
This table presents the statistics for the 10,000 episodes generated for the experiment, encompassing both simulated and real-world robot scenarios.
How to read the table
Baseline
The baseline experiment trains and evaluates the trajectory segmentation model with the same methodology as the language-conditioned approach, but without providing language instructions as input. Instead of language instructions guiding the segmentation process, each segment of the trajectory is labelled sequentially from 1 through the total number of segments.
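The difference between the two conditions can be sketched as the text queries handed to the model; the function name and query template below are illustrative assumptions:

```python
def build_queries(segment_instructions, use_language):
    """Return the text queries fed to the segmentation model.

    With language guidance the model receives each sub-task's
    instruction; the baseline replaces them with positional labels.
    """
    if use_language:
        return list(segment_instructions)
    return [f"segment {i}" for i in range(1, len(segment_instructions) + 1)]

instructions = ["pick the red block", "place on the yellow block"]
print(build_queries(instructions, use_language=True))
print(build_queries(instructions, use_language=False))  # ['segment 1', 'segment 2']
```

The baseline thus keeps the number of segments as supervision but discards the semantic content of the instructions.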
Ablation Study 1
In ablation study 1, the model is evaluated on new episodes of tasks within the same scene, without domain randomization. That is, the trajectory segmentation model is tested on unseen episodes of tasks whose scene configuration is familiar from training but whose individual instances vary. The purpose of this study is to assess the segmentation model's generalization to new instances of tasks within familiar scenes.
Ablation Study 2
In ablation study 2, the evaluation again uses new episodes for tasks within the same scene, but with domain randomization included. Domain randomization introduces variability or perturbations in the simulation environment to improve the model's ability to generalize to unseen variations. This study assesses how domain randomization affects the segmentation model's performance relative to the scenario without it.
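A domain randomization setup of this kind can be sketched as sampling perturbed scene parameters per episode; the parameter names and ranges below are hypothetical illustrations, not the simulator's actual configuration:

```python
import random

# Hypothetical randomization ranges for illustration only.
TEXTURES = ["wood", "metal", "checker"]

def randomize_scene(rng):
    """Sample one perturbed scene configuration for an episode."""
    return {
        "table_texture": rng.choice(TEXTURES),
        "light_intensity": rng.uniform(0.5, 1.5),
        "camera_jitter_deg": rng.uniform(-2.0, 2.0),
    }

rng = random.Random(0)
scene = randomize_scene(rng)
```

Sampling a fresh configuration per training episode exposes the model to appearance variation it would otherwise never see.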
Ablation Study 3
In ablation study 3, the evaluation involves varying the granularity of sub-task segmentation within the trajectories. Specifically, instead of segmenting the trajectory into fine-grained sub-tasks like "pick a red block" and "place on yellow block," the trajectory segmentation model is evaluated on trajectories where these actions are combined into a single, higher-level sub-task, such as "place the red block on the yellow block." This study aims to demonstrate the model's ability to generalize across different levels of sub-task granularity.
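Coarsening the granularity amounts to merging consecutive fine-grained spans and their instructions; a minimal sketch, where the segment tuple layout and joining template are assumptions:

```python
def merge_segments(segments, group_size=2):
    """Merge consecutive fine-grained sub-task spans into coarser ones.

    Each segment is (start, end, instruction); merged instructions are
    joined with a hypothetical "then" template for illustration.
    """
    merged = []
    for i in range(0, len(segments), group_size):
        group = segments[i:i + group_size]
        merged.append((group[0][0], group[-1][1],
                       " then ".join(s[2] for s in group)))
    return merged

fine = [(0, 40, "pick the red block"), (40, 90, "place on the yellow block")]
print(merge_segments(fine))
# [(0, 90, 'pick the red block then place on the yellow block')]
```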
Evaluation 1
In evaluation 1, the trajectory segmentation models developed in ablation studies 1 and 2 are evaluated on real-world data collected from the Franka Emika Research 3 (FR3) robot. This evaluation assesses the models' ability to generalize from simulated environments to real-world scenarios, thereby testing their sim-to-real capabilities. We emphasize that the real-world FR3 data was not leaked into the training set in any manner.
Ablation Study 4
In ablation study 4, the evaluation focuses on assessing the trajectory segmentation models under conditions where specific tasks are withheld during training. This study aims to evaluate the models' generalization capabilities when encountering unseen tasks during the evaluation phase.
Ablation Study 5
In ablation study 5, the evaluation focuses on assessing the trajectory segmentation models under conditions where specific tasks are withheld during training by incorporating domain randomization into the evaluation process. Domain randomization introduces variations in the simulated environment, such as changes in object textures, and other environmental factors.
Evaluation 2
In evaluation 2, we continue the assessment of trajectory segmentation models by evaluating the models derived from ablation studies 4 and 5. The objective remains to evaluate the models' ability to generalize to real-world scenarios, using data collected from a physical Franka robot. Importantly, we strictly ensure there is no data leakage between the training and evaluation datasets, maintaining the integrity of the evaluation.
Ablation Study 6
In ablation study 6, we conduct a sample-efficiency evaluation of the best-performing trajectory segmentation model. This evaluation is carried out without domain randomization and with held-out episodes, using the wrist-camera and proprioceptive data configuration. The primary objective is to assess the model's performance across varying levels of data availability, ranging from 2% to 100% of the training set, while using the same evaluation data as in ablation study 1.
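The data-availability sweep can be sketched as seeded random subsampling of the labelled trajectories at each fraction; the function and its interface are illustrative assumptions:

```python
import random

def subsample(trajectories, fraction, seed=0):
    """Keep a random fraction of labelled trajectories (at least one)."""
    rng = random.Random(seed)
    k = max(1, int(len(trajectories) * fraction))
    return rng.sample(trajectories, k)

data = list(range(1000))  # stand-in for 1000 labelled trajectories
for frac in (0.02, 0.1, 1.0):
    print(frac, len(subsample(data, frac)))
```

Fixing the seed keeps the subsets nested-comparable across runs, so differences in mAP reflect data volume rather than sampling noise.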
Behavior Cloning Results
Tasks/Demonstrations: The tasks used are "put_item_in_drawer", "put_shoes_in_box", "ball_in_hoop", and "stack_blocks". The numbers (250, 500, 750) give the number of demonstrations each model was trained with. Single Policy shows results for a single policy trained on the entire trajectory. Short Horizon (GT) shows results for multiple short-horizon policies stitched together, trained on ground-truth segmented data. Short Horizon (Model) shows results for policies trained on data produced by the learnt language-guided trajectory segmentation model. All tasks under consideration were part of the training distribution for the trajectory segmentation model.
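The stitching of short-horizon policies can be sketched as handing control from one sub-task policy to the next once each signals completion; every interface here (the policy callable, the toy environment, the step budget) is a hypothetical stand-in, not the actual robot stack:

```python
def run_stitched(policies, env, max_steps_per_policy=100):
    """Execute short-horizon policies back-to-back on one environment.

    `policies` is an ordered list of callables obs -> (action, done);
    each is rolled out until it signals its sub-task is complete,
    then control passes to the next policy.
    """
    obs = env.reset()
    for policy in policies:
        for _ in range(max_steps_per_policy):
            action, done = policy(obs)
            obs = env.step(action)
            if done:
                break
    return obs

class ToyEnv:
    """1-D counter environment standing in for the robot."""
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):
        self.state += action
        return self.state

def make_policy(target):
    # Sub-task policy: increment until the toy state reaches `target`.
    return lambda obs: (1, obs + 1 >= target)

final = run_stitched([make_policy(3), make_policy(5)], ToyEnv())
print(final)  # 5
```

The single-policy baseline corresponds to one callable covering the full trajectory, while the short-horizon variants differ only in whether the segment boundaries used for training came from ground truth or from the segmentation model.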