Hopper-v3: This environment considers three objective functions: the forward speed (f1), the jumping height (f2), and the energy consumption (f3). The state space and the action space of Hopper-v3 are defined as follows:
The non-dominated policies obtained by different algorithms under different preferences are shown below. We also provide video clips of the best policy found by PBMORL.
Non-dominated policies obtained by PBMORL versus PGMORL and RA with different preferences on each objective.
Non-dominated policies obtained by PBMORL versus MORL-Adaptation, META-MORL, MOMPO and MORAL (f1 is preferred).
Non-dominated policies obtained by PBMORL versus MORL-Adaptation, META-MORL, MOMPO and MORAL (f2 is preferred).
Non-dominated policies obtained by PBMORL versus MORL-Adaptation, META-MORL, MOMPO and MORAL (f3 is preferred).
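For readers unfamiliar with the term, a policy is non-dominated when no other policy in the set is at least as good on every objective and strictly better on at least one. The following is a minimal sketch of such a filter, not the procedure used by PBMORL or any of the compared algorithms. It assumes every objective is to be maximized, so energy consumption (f3) is entered with a negative sign; the function names and the sample values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (all objectives maximized)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def non_dominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point in the list dominates."""
    return [
        p
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical objective vectors (f1, f2, -f3); these are not results from the paper.
candidates = [
    (3.0, 1.0, -2.0),  # fast, low jump
    (1.0, 3.0, -2.0),  # slow, high jump
    (2.0, 2.0, -1.0),  # balanced, least energy
    (1.0, 1.0, -3.0),  # worse than the first candidate on every count
]
front = non_dominated(candidates)
```

With these sample values, the first three candidates trade off against one another and are kept, while the last is dominated by the first and is dropped.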