DM Control Experiments

We evaluated BeigeMaps as a drop-in modification of existing behavioral-distance-based RL algorithms and compared the performance of each baseline with its BeigeMaps counterpart on 7 environments from the DeepMind Control Suite. A selection of the results is shown here; see the paper for full details.

We perform experiments using the following baseline algorithms:

Deep Bisimulation for Control (DBC), Robust-DBC, Kernel Similarity Metric (KSME), and Reducing Approximation Gap (RAP).

Aggregate Metrics

Here are some aggregate performance metrics for all algorithms, averaged over 3 training seeds, 30 evaluation seeds and 7 environments.

Higher values are better for Median, Interquartile Mean (IQM), and Mean. Lower values are better for the Optimality Gap (OptGap). Error bars correspond to 95% confidence intervals.
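As a rough illustration of how these four aggregates relate to one another, here is a minimal sketch in plain Python. It assumes scores have already been normalized so that 1.0 is the target level (the `gamma` parameter), and the function names and sample scores are hypothetical, not taken from our evaluation code or results. The IQM here drops the lowest and highest quarter of runs by truncating the cut count, which is one common convention.

```python
from statistics import mean, median


def iqm(scores):
    """Interquartile mean: the mean of the middle 50% of the sorted scores."""
    ordered = sorted(scores)
    n = len(ordered)
    cut = n // 4  # number of runs dropped from each tail (truncated)
    return mean(ordered[cut:n - cut])


def optimality_gap(scores, gamma=1.0):
    """Average shortfall below the target gamma; scores above gamma are clipped."""
    return gamma - mean([min(s, gamma) for s in scores])


def aggregate(scores, gamma=1.0):
    """Return the four aggregate metrics for a flat list of normalized scores."""
    return {
        "Median": median(scores),
        "IQM": iqm(scores),
        "Mean": mean(scores),
        "OptGap": optimality_gap(scores, gamma),
    }


# Hypothetical normalized scores for one algorithm (illustrative only).
example_scores = [0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.2]
for name, value in aggregate(example_scores).items():
    print(f"{name}: {value:.4f}")
```

Because the IQM ignores the top and bottom quarter of runs, it is less sensitive to outliers than the Mean, while still using more of the data than the Median. The Optimality Gap clips scores at `gamma`, so a run that exceeds the target cannot compensate for runs that fall short.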

Select icons (Shift+Click) in the legend to focus on specific models.

Performance Profiles

Here are performance profiles for all algorithms showing the proportion of runs where an algorithm's average return was above a given threshold.

If the curve for one model lies strictly above the curve for another, the former is said to stochastically dominate the latter.
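The idea behind a performance profile can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names, thresholds, and scores are hypothetical. The dominance check here treats one curve as dominating another when it is never below it and is strictly above it for at least one threshold, which is an assumption made so that the endpoints (where every curve reaches 1 or 0) do not rule out dominance.

```python
def performance_profile(scores, thresholds):
    """Fraction of runs whose score exceeds each threshold."""
    n = len(scores)
    return [sum(s > tau for s in scores) / n for tau in thresholds]


def dominates(profile_a, profile_b):
    """True if curve A is never below curve B and strictly above it somewhere."""
    never_below = all(a >= b for a, b in zip(profile_a, profile_b))
    above_somewhere = any(a > b for a, b in zip(profile_a, profile_b))
    return never_below and above_somewhere


# Hypothetical normalized scores for two algorithms (illustrative only).
scores_a = [0.5, 0.7, 0.9, 1.0]
scores_b = [0.2, 0.4, 0.6, 0.8]
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]

profile_a = performance_profile(scores_a, thresholds)
profile_b = performance_profile(scores_b, thresholds)
print("A:", profile_a)
print("B:", profile_b)
print("A dominates B:", dominates(profile_a, profile_b))
```

Two curves that cross each other dominate in neither direction, in which case the comparison depends on which score thresholds matter for the application.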

Select icons (Shift+Click) in the legend to focus on individual curves.

Videos

Here are some videos of trained agents for baseline models. In each video, three different evaluation seeds have been stacked together.

DBC

Robust DBC

KSME

RAP

Here are videos for the BeigeMaps counterparts of the baseline models above.

DBC + BeigeMaps

Robust DBC + BeigeMaps

KSME + BeigeMaps

RAP + BeigeMaps
