Multi-task robot data for dual-arm fine manipulation

Heecheol Kim, Yoshiyuki Ohmura, Yasuo Kuniyoshi

Intelligent Systems and Informatics Laboratory

The University of Tokyo


Contact: multi-task-fine [at] isi.imi.i.u-tokyo.ac.jp

http://www.isi.imi.i.u-tokyo.ac.jp

Abstract

In the field of robotic manipulation, deep imitation learning is recognized as a promising approach for acquiring manipulation skills. Additionally, learning from diverse robot datasets is considered a viable way to achieve versatility and adaptability. In such research, robots have achieved generality across multiple objects by learning various tasks. However, existing multi-task robot datasets have mainly focused on single-arm tasks that are relatively imprecise, and they do not address the fine-grained object manipulation that robots are expected to perform in the real world. This paper introduces a dataset of diverse object manipulations that includes dual-arm tasks and/or tasks requiring fine manipulation. To this end, we have generated a dataset of 224k episodes (150 hours, 1,104 language instructions) that includes dual-arm fine tasks such as bowl moving, pencil-case opening, and banana peeling, and this data is publicly available. The dataset also includes visual attention signals, dual-action labels (a signal that separates actions into a robust reaching trajectory and a precise interaction with objects), and language instructions to achieve robust and precise object manipulation. We applied the dataset to our Dual-Action and Attention (DAA) model, which is designed for fine-grained dual-arm manipulation tasks and is robust against covariate shifts. The model was tested in over 7k trials on real robot manipulation tasks, demonstrating its capability for fine manipulation.

Paper Link

Gaze-based visual attention

Fine manipulation with dual-action

Dual-arm manipulation

Autonomous Dual-Action Task Examples

Pencil-case

Open a zipper of a pencil case

Needle-threading

Thread a needle

Banana-peeling

Peel a banana

Handkerchief

Fold a handkerchief

Move-bowl

Move a bowl to a tray

Pick

Pick up a thin coin

Grasp

Grasp small objects

Bottle

Upright a bottle

Place

Place an object in a bowl

Dual-Action and Attention Dataset

The Dual-Action and Attention (DAA) Dataset comprises 224,210 episodes of demonstrations featuring dual-arm and/or fine manipulation skills, accompanied by gaze attention signals and dual-action labels that aid in learning fine manipulation. You can download the data from the following links (divided by task group). The code for data conversion and the data descriptions are available on our GitHub page.
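As a rough illustration of what each episode provides, the following Python sketch wraps one demonstration in a simple container. The field names (images, gaze, action, action_label, instruction) are illustrative assumptions only; the actual file layout and keys are defined by the data-conversion code on our GitHub page.

# Hypothetical per-episode container; actual field names and file layout are
# defined by the data-conversion code on the project's GitHub page.
from dataclasses import dataclass
import numpy as np

@dataclass
class DAAEpisode:
    images: np.ndarray        # (T, H, W, 3) camera frames
    gaze: np.ndarray          # (T, 2) measured gaze point per frame, in pixel coordinates
    action: np.ndarray        # (T, D) dual-arm robot commands
    action_label: np.ndarray  # (T,) 0 = global-action (reaching), 1 = local-action (fine)
    instruction: str          # natural-language task description, e.g. "Open a zipper of a pencil case"

def split_by_dual_action(ep: DAAEpisode):
    """Separate a trajectory into its global (reaching) and local (fine) segments."""
    is_local = ep.action_label.astype(bool)
    return ep.action[~is_local], ep.action[is_local]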

Terms of use

By downloading the dataset from the Intelligent Systems and Informatics Laboratory (ISI) at the University of Tokyo, you agree to the following terms and conditions.

1. Permission is granted to use the dataset downloaded from this site, henceforth "the dataset", only for non-commercial, ethical, and peaceful purposes.

2. Other than the rights granted herein, ISI retains all rights, titles, and interests in the dataset subject to these Terms of Use.

3. ISI will not be responsible for any problems whatsoever resulting from using the dataset.

4. You must not distribute copies of the dataset, or copies of dataset(s) derived from the dataset, to others without prior written permission from ISI.

5. You must not remove or alter any copyright or other proprietary notices associated with the dataset.

6. The dataset is provided "as is" and ISI, all its superior organizations and the collaborators of ISI do not make any warranty, express or implied, including but not limited to warranties of merchantability and fitness for a particular purpose, nor do they assume any liability or responsibility for the use of this dataset.

7. In any publication (academic papers, books, and others) describing results produced using the dataset, the following paper must be cited.

Kim, H., Ohmura, Y., and Kuniyoshi, Y. Multi-task robot data for dual-arm fine manipulation. arXiv preprint arXiv:2401.07603, 2024.

8. The Tokyo District Court of Japan shall have jurisdiction over all disputes arising under this agreement.

Robot Configuration

You can build the robot used in this research by following this document.

Dual-Action and Attention

DAA leverages a gaze predictor to selectively focus on the high-resolution pixels pertinent to the task, ensuring robust manipulation in the presence of unrelated objects or background noise. The model distinguishes between two types of actions: global-action, a trajectory-based action that generates a robust reaching movement, and local-action, a reactive action for precise interaction with the object. During both demonstration and model inference, the robot's trajectory is divided into these two components. This dual-action mechanism enables efficient and precise manipulation in dynamic environments.
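A minimal sketch of one inference step under this scheme is given below. The interfaces (gaze_model, global_policy, local_policy, should_switch) and the crop size are placeholders of our own for illustration, not the released DAA implementation.

# Illustrative only: gaze_model / global_policy / local_policy are placeholder
# callables, not the released DAA implementation.
import numpy as np

def daa_step(frame: np.ndarray, gaze_model, global_policy, local_policy, crop: int = 64):
    """One control step: attend around the predicted gaze, then pick the action type."""
    gx, gy = gaze_model(frame)                      # predicted 2D gaze point in pixels
    h, w = frame.shape[:2]
    y0 = int(np.clip(gy - crop // 2, 0, h - crop))  # clamp the crop to the image
    x0 = int(np.clip(gx - crop // 2, 0, w - crop))
    foveated = frame[y0:y0 + crop, x0:x0 + crop]    # high-resolution pixels around the gaze

    if local_policy.should_switch(foveated):        # near the object: precise, reactive control
        return local_policy(foveated)
    return global_policy(frame, (gx, gy))           # far from the object: robust reaching trajectory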

Result

We evaluated our multi-task model's generalization ability across a variety of tasks and objects, attempting 103 tasks with objects from the training set and 39 tasks with new objects not included in the training set. For comparison, task-specific models were trained for each object in each task group. The multi-task model achieved a 69.6% success rate across all tasks, significantly outperforming the task-specific models, which achieved only a 12.2% success rate. The multi-task model also generalized effectively, with a 61.5% success rate on tasks involving new objects.

Citation

If you find the DAA dataset useful, please cite:

@article{kim2024multi,
  title={Multi-task robot data for dual-arm fine manipulation},
  author={Kim, Heecheol and Ohmura, Yoshiyuki and Kuniyoshi, Yasuo},
  journal={arXiv preprint arXiv:2401.07603},
  year={2024}
}