Contributors: Aleksandra Faust, Olivier Pietquin, Shiyu Huang, Wei-Wei Tu
Initial draft copied from Aleksandra's document, please elaborate.
RL benchmarks tend to be grouped by general application area. The major benchmarks in each category are listed below.
Gymnasium (the maintained successor to OpenAI Gym)
Real-World RL Suite (RWRL) [Dulac-Arnold et al., 2019]: a control benchmark with perturbations for evaluating robustness
Hanabi [Bard et al., 2020]
Hide-and-Seek Domain [Baker et al., 2019]
Multi-Agent Particle World [Lowe et al., 2017]
StarCraft Multi-Agent Challenge (SMAC), StarCraft II micromanagement [Samvelyan et al., 2019]
Datasets for Deep Data-Driven RL (D4RL) [Fu et al., 2021]
Air Learning for aerial robots [Krishnan et al., 2021]
GridAlive for power grid control [Marot et al., 2021]
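
To make the list more concrete, here is a minimal sketch of the reset/step interaction loop that Gymnasium defines. It is an illustration only: CoinFlipEnv, run_episode and oracle_policy are hypothetical names invented for this sketch, and the toy environment imitates Gymnasium's return signatures using only the standard library rather than importing Gymnasium itself. The other benchmarks listed above may expose different APIs.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment mirroring Gymnasium's reset/step signatures."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self._rng = random.Random()
        self._steps = 0
        self._coin = 0

    def reset(self, seed=None):
        # Gymnasium convention: reset returns (observation, info).
        if seed is not None:
            self._rng.seed(seed)
        self._steps = 0
        self._coin = self._rng.randint(0, 1)
        return self._coin, {}

    def step(self, action):
        # Gymnasium convention: step returns
        # (observation, reward, terminated, truncated, info).
        reward = 1.0 if action == self._coin else 0.0
        self._steps += 1
        self._coin = self._rng.randint(0, 1)
        terminated = False  # this toy task has no terminal state
        truncated = self._steps >= self.max_steps  # time limit reached
        return self._coin, reward, terminated, truncated, {}


def run_episode(env, policy, seed=None):
    """Roll out one episode and return the undiscounted return."""
    obs, info = env.reset(seed=seed)
    total = 0.0
    done = False
    while not done:
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
        done = terminated or truncated
    return total


def oracle_policy(obs):
    # The observation reveals the coin value that the action must match.
    return obs
```

With Gymnasium installed, the same run_episode loop should work on an environment created with gymnasium.make("CartPole-v1"), provided the policy returns actions valid for that environment (for example, by sampling from env.action_space).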