A Distributional View on Multi-Objective Policy Optimization
In this task, trading off multiple reward terms is difficult with scalarized V-MPO and frequently leads to jittery movements. In contrast, multi-objective V-MPO produces visibly smoother policies that closely follow the mocap reference motion (shown in gray). This is most apparent when the reference motion is still (at the end of the clip). This task is taken from [1].
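To make the contrast concrete, here is a minimal sketch of the difference between scalarizing reward terms and keeping them separate. The function names and numbers are illustrative assumptions, not the paper's actual API; the point is only that scalarization collapses all trade-offs into hand-tuned weights, while a multi-objective optimizer keeps the terms distinct.

```python
def scalarized_reward(rewards, weights):
    """Collapse several reward terms into one scalar via fixed weights.

    Tuning these weights is the hard part: terms on very different
    scales (e.g. a mocap-tracking bonus vs. an energy penalty) compete,
    and a poor trade-off can surface as jittery motion.
    """
    return sum(w * r for w, r in zip(weights, rewards))


def per_objective_rewards(rewards):
    """Keep each reward term separate.

    A multi-objective optimizer (such as multi-objective V-MPO) can
    then treat each objective on its own scale and express preferences
    as per-objective constraints rather than reward weights.
    """
    return list(rewards)


# Hypothetical example: a tracking term and an energy penalty.
r = [0.9, -12.0]
print(scalarized_reward(r, [1.0, 0.01]))  # the single scalar a scalarized agent sees
print(per_objective_rewards(r))           # the terms a multi-objective agent sees
```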
[1] Anonymous. CoMic: Co-Training and Mimicry for Reusable Skills. 2020. Under submission.