Atari

The average results on 56 Atari games

The average results on Atari games. We compare different switching criteria across 56 Atari games with 3 million training steps. The left figure shows the human-normalized reward. The right figure shows the average switching cost, normalized by the switching cost of “none” and plotted on a log scale.
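For reference, a minimal sketch of how these two metrics can be computed, assuming per-game raw returns together with the standard random and human reference scores; all concrete numbers below are hypothetical, for illustration only:

```python
import numpy as np

def human_normalized_score(raw, random_score, human_score):
    # Standard Atari normalization: 0 = random play, 1 = human level.
    return (raw - random_score) / (human_score - random_score)

def normalized_switching_cost(cost, cost_none):
    # Switching cost relative to the "none" baseline, which may switch
    # the deployed policy at every training step.
    return cost / cost_none

# Hypothetical numbers for two games.
raw = np.array([3200.0, 15.0])
rand = np.array([250.0, 2.0])
human = np.array([7100.0, 30.0])
print(human_normalized_score(raw, rand, human).mean())
print(normalized_switching_cost(cost=1.2e4, cost_none=3.0e6))
```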



Results on individual Atari games

We visualize the training reward over training steps at the top and the switching cost, on a log scale, at the bottom.

Ablation study

We vary the switching interval of the non-adaptive switching criterion, where FIX_n means we switch the deployed policy every n steps. Note that “none” is equivalent to “FIX_1”. A larger n reduces the switching cost but may cause training to fail; an interval of 1000 appears appropriate for this criterion. A sketch of this rule follows below.
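A minimal sketch of the non-adaptive rule; the step counter is the only input, and the default interval reflects the ablation above:

```python
def should_switch_fixed(step: int, n: int = 1000) -> bool:
    # FIX_n: redeploy the online policy every n environment steps.
    # n = 1 recovers the "none" baseline (switch at every step).
    return step % n == 0
```

In a training loop, this check would gate copying the online network's weights into the deployed policy.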

We vary the similarity threshold of the feature-based criterion, where a smaller threshold reduces the switching cost but may hurt performance. Note that “none” is equivalent to “Feature_1”. See the sketch below.
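A sketch of how such a feature-based check might look, assuming PyTorch Q-networks that expose a features() method returning a feature embedding; the method name and the default threshold are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def should_switch_feature(deployed_net, online_net, states, threshold=0.95):
    # Feature-based criterion (sketch): redeploy when the average cosine
    # similarity between the deployed and online feature embeddings of a
    # batch of recent states drops below the threshold.
    # threshold = 1 recovers the "none" baseline (any drift triggers a switch).
    with torch.no_grad():
        f_dep = deployed_net.features(states)  # assumed feature hook
        f_onl = online_net.features(states)
    sim = F.cosine_similarity(f_dep, f_onl, dim=-1).mean()
    return sim.item() < threshold
```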

We vary the action-distribution threshold of the policy-based criterion, where a larger threshold reduces the switching cost but may hurt performance. Note that “none” is equivalent to “Policy_0”. See the sketch below.
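Analogously, a sketch of a policy-based check; the choice of total-variation distance between the two action distributions and the default threshold are assumptions for illustration:

```python
import torch

def should_switch_policy(deployed_net, online_net, states, threshold=0.1):
    # Policy-based criterion (sketch): redeploy when the mean total-variation
    # distance between the deployed and online action distributions over a
    # batch of recent states exceeds the threshold.
    # threshold = 0 recovers the "none" baseline (any difference triggers a switch).
    with torch.no_grad():
        p_dep = torch.softmax(deployed_net(states), dim=-1)
        p_onl = torch.softmax(online_net(states), dim=-1)
    tv = 0.5 * (p_dep - p_onl).abs().sum(dim=-1).mean()
    return tv.item() > threshold
```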