Performance of our method on four benchmark environments. Higher reward ratios correspond to better performing policies. Lower JS divergence corresponds to more precise measure reconstruction.
For each task, the y-axis represents the percentage of the archive grid that has at least the corresponding amount of returns on rollout (shown on the x-axis).