A Distributional View on Multi-Objective Policy Optimization
In this task, trading off multiple reward terms is difficult with scalarized V-MPO and frequently leads to jittery movements. In contrast, multi-objective V-MPO produces visibly smoother policies that closely follow the mocap reference motion (shown in gray). This is most apparent when the reference motion is still (at the end of the clip). This task is taken from [1].
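To make the contrast concrete, here is a minimal sketch of the difference between scalarizing reward terms and keeping them separate. The function names and numbers are illustrative assumptions, not the paper's actual API; the point is only that scalarization collapses all trade-offs into hand-tuned weights, while a multi-objective optimizer keeps the terms distinct.

```python
def scalarized_reward(rewards, weights):
    """Collapse several reward terms into one scalar via fixed weights.

    Tuning these weights is the hard part: terms on very different
    scales (e.g. a mocap-tracking bonus vs. an energy penalty) compete,
    and a poor trade-off can surface as jittery motion.
    """
    return sum(w * r for w, r in zip(weights, rewards))


def per_objective_rewards(rewards):
    """Keep each reward term separate.

    A multi-objective optimizer (such as multi-objective V-MPO) can
    then treat each objective on its own scale and express preferences
    as per-objective constraints rather than reward weights.
    """
    return list(rewards)


# Hypothetical example: a tracking term and an energy penalty.
r = [0.9, -12.0]
print(scalarized_reward(r, [1.0, 0.01]))  # the single scalar a scalarized agent sees
print(per_objective_rewards(r))           # the terms a multi-objective agent sees
```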
[1] Anonymous. CoMic: Co-Training and Mimicry for Reusable Skills. 2020. Under submission.