Egor Cherepanov1,2, Nikita Kachaev1, Alexey K. Kovalev1,2, and Aleksandr I. Panov1,2
1AIRI, Moscow, Russia, 2MIPT, Dolgoprudny, Russia
{cherepanov,kachaev,kovalev,panov}@airi.net
pip install mikasa-robo-suite
MIKASA-ROBO/ShellGameTouch-v0
MIKASA-ROBO/ChainOfColors7-v0
MIKASA-ROBO/TakeItBack-v0
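A minimal usage sketch, assuming the suite registers its tasks through the standard Gymnasium API; the module name, the exact environment ID spelling, and the call signature below mirror ManiSkill3 conventions and are assumptions, not the confirmed interface:

import gymnasium as gym
import mikasa_robo_suite  # assumed module name of the pip package above

# One of the task IDs listed above; the exact registered name
# (with or without the "MIKASA-ROBO/" prefix) is an assumption.
env = gym.make("ShellGameTouch-v0")
obs, info = env.reset(seed=0)
for _ in range(50):
    # Random actions stand in for a trained policy.
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()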
MIKASA (Memory-Intensive Skills Assessment Suite for Agents) is a novel benchmark for memory-intensive RL. It includes:
A classification of memory-intensive RL tasks into four types: object memory, spatial memory, sequential memory, and memory capacity
MIKASA-Base, a unified benchmark for evaluating memory-enhanced RL agents
MIKASA-Robo, a benchmark featuring 32 tasks designed to assess memory in tabletop robotic manipulation
In daily life, humans routinely perform tasks that rely on memory, such as recalling the location of the bread while making a sandwich or remembering which box the laundry detergent was stored in. However, robots designed to take over these simple tasks are often trained on scenarios that don't require memory, limiting their ability to handle such tasks effectively.
MIKASA-Robo includes 32 tasks spread across 12 groups, each with varying levels of difficulty. It retains all the advantages of ManiSkill3, such as efficient GPU parallelization. Training can be conducted using either state information (in oracle mode) or RGB+joint data (in memory-intensive mode).
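A hedged sketch of how the two training regimes and the GPU parallelization mentioned above might be selected at environment creation; the obs_mode values and the num_envs keyword follow the ManiSkill3 API and are assumptions rather than the confirmed MIKASA-Robo signature:

import gymnasium as gym
import mikasa_robo_suite  # assumed module name of the pip package

# Oracle mode: full state observations ("state" is the assumed ManiSkill3-style obs_mode).
oracle_env = gym.make("RememberColor9-v0", obs_mode="state", num_envs=256)

# Memory-intensive mode: RGB images plus joint readings ("rgb" is likewise an assumed
# obs_mode; joint data would sit in the proprioceptive part of the observation).
memory_env = gym.make("RememberColor9-v0", obs_mode="rgb", num_envs=256)

print(oracle_env.observation_space)
print(memory_env.observation_space)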
ShellGameTouch-v0
RememberColor9-v0
RotateStrictPos-v0
InterceptMedium-v0
TakeItBack-v0
RememberShape9-v0
InterceptGrabMedium-v0
ChainOfColors7-v0
RememberShapeAndColor5x3-v0
SeqOfColors7-v0
RotateLenientPos-v0
BunchOfColors7-v0
We develop a comprehensive yet practically simple classification of memory-intensive tasks, inspired by cognitive science. Our classification framework distills the complex landscape of memory challenges into four essential categories, enabling systematic evaluation while avoiding unnecessary complexity. The proposed classification consists of:
Object Memory. Tasks that evaluate an agent's ability to maintain object-related information over time, particularly when objects become temporarily unobservable. These tasks align with the cognitive concept of object permanence, requiring agents to track object properties under occlusion, maintain object state representations, and recognize previously encountered objects.
Spatial Memory. Tasks focused on environmental awareness and navigation, where agents must remember object locations, maintain mental maps of environment layouts, and navigate based on previously observed spatial information.
Sequential Memory. Tasks that test an agent's ability to process and utilize temporally ordered information, similar to human serial recall and working memory. These tasks require remembering action sequences, maintaining order-dependent information, and using past decisions to inform future actions.
Memory Capacity. Tasks that challenge an agent's ability to manage multiple pieces of information simultaneously, analogous to human memory span. These tasks evaluate information retention limits and multi-task information processing.
We built our MIKASA-Robo and MIKASA-Base benchmarks on this classification.
As the table on the right shows, the RL domain lacks standard diagnostic benchmarks for memory evaluation, and existing environments are scattered and narrowly focused. To address this, we introduce MIKASA-Base, a unified, practical benchmark that systematically tests memory across diverse tasks.
We have gathered the main memory-intensive RL benchmarks into a single framework, MIKASA-Base, with a unified interface that lets you run memory-intensive environments quickly and without additional configuration.
In addition, we identified a minimal set of environments sufficient for a thorough validation of an RL agent's memory.
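A sketch of what the unified interface could look like in practice; the import and the environment ID below are hypothetical placeholders, since the concrete registered names are not listed in this section:

import gymnasium as gym
import mikasa_base  # hypothetical import name; the real package name may differ

# "MemoryTask-v0" is a placeholder ID: every environment gathered in MIKASA-Base
# is assumed to be exposed through the same gym.make call and Gymnasium step API.
env = gym.make("MemoryTask-v0")
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()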
If you find our work useful, please cite our paper:
@misc{cherepanov2025mikasa,
title={Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning},
author={Egor Cherepanov and Nikita Kachaev and Alexey K. Kovalev and Aleksandr I. Panov},
year={2025},
eprint={2502.10550},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2502.10550},
}