Ant-v2: It considers two objective functions: one (f1) is the speed at the x coordinate while the other (f2) is the speed at the y axis. The definitions of the state space and the action space of Ant-v2 are as follows:
The non-dominated policies obtained by different algorithms with different preferences are shown blow. We also demonstrate the video clips of the best policy found by PBMORL.
Non-dominated policies obtained by PBMORL versus PGMORL, RA with different preferences on each objective.
Non-dominated policies obtained by PBMORL versus MORL-Adaptation, META-MORL, MOMPO and MORAL (f1 is preferred).
Non-dominated policies obtained by PBMORL versus MORL-Adaptation, META-MORL, MOMPO and MORAL (f2 is preferred).