Hopper-v2: This environment has two objective functions: the forward speed (f1) and the jumping height (f2). The state space and the action space of Hopper-v2 are defined as follows:
The non-dominated policies obtained by different algorithms under different preferences are shown below. We also provide video clips of the best policy found by PBMORL.
Non-dominated policies obtained by PBMORL versus PGMORL and RA with different preferences on each objective.
Non-dominated policies obtained by PBMORL versus MORL-Adaptation, META-MORL, MOMPO and MORAL (f1 is preferred).
Non-dominated policies obtained by PBMORL versus MORL-Adaptation, META-MORL, MOMPO and MORAL (f2 is preferred).
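For readers unfamiliar with the term, the following is a minimal sketch of how a set of non-dominated policies can be extracted from evaluated returns. It assumes both objectives (f1 and f2) are maximized. The function names `dominates` and `non_dominated` and the sample return values are hypothetical, chosen for illustration only; they are not taken from PBMORL or any of the compared algorithms, and the numbers are not experimental results.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b`, assuming maximization.

    `a` dominates `b` when it is at least as good on every objective
    and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def non_dominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [
        p
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical (f1, f2) returns of four policies; illustrative values only.
returns = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0)]
front = non_dominated(returns)
print(front)
```

In this toy example the policy with returns (1.5, 3.0) is dropped because (2.0, 4.0) is better on both objectives; the remaining three policies trade off one objective against the other, so none of them dominates another.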