FR-SAC learned behaviors and reward learning curves.
Panels: FR-SAC in Section 4.2.1; FR-SAC in Section 4.2.2 (two panels).
Reward averaged over the whole episode during training. All curves increase, which suggests that the RL optimization is working well.