Egor Cherepanov1,2, Nikita Kachaev1, Alexey K. Kovalev1,2, and Aleksandr I. Panov1,2
1AIRI, Moscow, Russia, 2MIPT, Dolgoprudny, Russia
{cherepanov,kachaev,kovalev,panov}@airi.net
pip install mikasa-robo-suite
MIKASA-ROBO/ShellGameTouch-v0
MIKASA-ROBO/ChainOfColors7-v0
MIKASA-ROBO/TakeItBack-v0
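A minimal usage sketch, assuming the suite registers its tasks through the standard Gymnasium API; the module name, the exact environment ID spelling, and the call signature below mirror ManiSkill3 conventions and are assumptions, not the confirmed interface:

import gymnasium as gym
import mikasa_robo_suite  # assumed module name of the pip package above

# One of the task IDs listed above; the exact registered name
# (with or without the "MIKASA-ROBO/" prefix) is an assumption.
env = gym.make("ShellGameTouch-v0")
obs, info = env.reset(seed=0)
for _ in range(50):
    # Random actions stand in for a trained policy.
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()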
MIKASA (Memory-Intensive Skills Assessment Suite for Agents) is a novel benchmark for memory-intensive RL. It includes:
A classification of memory-intensive RL tasks into four types: object memory, spatial memory, sequential memory, and memory capacity
MIKASA-Base, a unified benchmark for evaluating memory-enhanced RL agents
MIKASA-Robo, a benchmark featuring 32 tasks designed to assess memory in tabletop robotic manipulation
In daily life, humans routinely perform tasks that rely on memory, such as recalling the location of the bread while making a sandwich or remembering which box the laundry detergent was stored in. However, robots designed to take over these simple tasks are often trained on scenarios that don't require memory, limiting their ability to handle such tasks effectively.
MIKASA-Robo includes 32 tasks spread across 12 groups, each with varying levels of difficulty. It retains all the advantages of ManiSkill3, such as efficient GPU parallelization. Training can be conducted using either state information (in oracle mode) or RGB+joint data (in memory-intensive mode).
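A hedged sketch of how the two training regimes and the GPU parallelization mentioned above might be selected at environment creation; the obs_mode values and the num_envs keyword follow the ManiSkill3 API and are assumptions rather than the confirmed MIKASA-Robo signature:

import gymnasium as gym
import mikasa_robo_suite  # assumed module name of the pip package

# Oracle mode: full state observations ("state" is the assumed ManiSkill3-style obs_mode).
oracle_env = gym.make("RememberColor9-v0", obs_mode="state", num_envs=256)

# Memory-intensive mode: RGB images plus joint readings ("rgb" is likewise an assumed
# obs_mode; joint data would sit in the proprioceptive part of the observation).
memory_env = gym.make("RememberColor9-v0", obs_mode="rgb", num_envs=256)

print(oracle_env.observation_space)
print(memory_env.observation_space)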
ShellGameTouch-v0
RememberColor9-v0
RotateStrictPos-v0
InterceptMedium-v0
TakeItBack-v0
RememberShape9-v0
InterceptGrabMedium-v0
ChainOfColors7-v0
RememberShapeAndColor5x3-v0
SeqOfColors7-v0
RotateLenientPos-v0
BunchOfColors7-v0
We develop a comprehensive yet practically simple classification of memory-intensive tasks, inspired by cognitive science. Our classification framework distills the complex landscape of memory challenges into four essential categories, enabling systematic evaluation while avoiding unnecessary complexity. The proposed classification consists of:
Object Memory. Tasks that evaluate an agent's ability to maintain object-related information over time, particularly when objects become temporarily unobservable. These tasks align with the cognitive concept of object permanence, requiring agents to track object properties under occlusion, maintain object state representations, and recognize previously encountered objects.
Spatial Memory. Tasks focused on environmental awareness and navigation, where agents must remember object locations, maintain mental maps of environment layouts, and navigate based on previously observed spatial information.
Sequential Memory. Tasks that test an agent's ability to process and utilize temporally ordered information, similar to human serial recall and working memory. These tasks require remembering action sequences, maintaining order-dependent information, and using past decisions to inform future actions.
Memory Capacity. Tasks that challenge an agent's ability to manage multiple pieces of information simultaneously, analogous to human memory span. These tasks evaluate information retention limits and multi-task information processing.
We built our MIKASA-Robo and MIKASA-Base benchmarks on this classification.
As the table on the right shows, the RL domain lacks standard diagnostic benchmarks for memory evaluation, and existing environments are scattered and narrowly focused. To address this, we introduce MIKASA-Base, a unified, practical benchmark that systematically tests memory across diverse tasks.
We have gathered the main memory-intensive RL benchmarks into a single framework, MIKASA-Base, with a unified interface that lets you run memory-intensive environments quickly and without additional configuration.
In addition, we identified a minimal set of environments sufficient for a thorough validation of an RL agent's memory.
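A sketch of what the unified interface could look like in practice; the import and the environment ID below are hypothetical placeholders, since the concrete registered names are not listed in this section:

import gymnasium as gym
import mikasa_base  # hypothetical import name; the real package name may differ

# "MemoryTask-v0" is a placeholder ID: every environment gathered in MIKASA-Base
# is assumed to be exposed through the same gym.make call and Gymnasium step API.
env = gym.make("MemoryTask-v0")
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()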
If you find our work useful, please cite our paper:
@misc{cherepanov2025mikasa,
title={Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning},
author={Egor Cherepanov and Nikita Kachaev and Alexey K. Kovalev and Aleksandr I. Panov},
year={2025},
eprint={2502.10550},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2502.10550},
}