Ant-v2

Ant-v2 considers two objective functions: f1 is the agent's speed along the x-axis and f2 is its speed along the y-axis. The definitions of the state space and the action space of Ant-v2 are as follows:

The non-dominated policies obtained by different algorithms under different preferences are shown below. We also show video clips of the best policy found by PBMORL.
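As an illustration of the two objectives and of what a preference does, here is a minimal Python sketch. It assumes each objective is a finite-difference velocity of the torso's (x, y) position over one control step of length `dt`, and that a preference is a weight vector combined linearly with the objectives. The function names, `dt`, and the linear scalarization are assumptions for illustration, not necessarily how PBMORL or the compared methods define them.

```python
from typing import Sequence, Tuple


def velocity_objectives(xy_before: Tuple[float, float],
                        xy_after: Tuple[float, float],
                        dt: float) -> Tuple[float, float]:
    """Hypothetical helper: (f1, f2) = torso velocity along the x-axis and the y-axis.

    Assumes velocities are finite differences of the (x, y) torso position
    over a control step of length dt (an assumption, not taken from the source).
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    f1 = (xy_after[0] - xy_before[0]) / dt  # speed along x
    f2 = (xy_after[1] - xy_before[1]) / dt  # speed along y
    return f1, f2


def scalarize(objectives: Sequence[float], preference: Sequence[float]) -> float:
    """Hypothetical helper: linear scalarization (weighted sum) under a preference vector."""
    if len(objectives) != len(preference):
        raise ValueError("objectives and preference must have the same length")
    return sum(w * f for w, f in zip(preference, objectives))


if __name__ == "__main__":
    f = velocity_objectives((0.0, 0.0), (0.1, 0.05), dt=0.05)
    print(f)                          # -> (2.0, 1.0)
    print(scalarize(f, (0.5, 0.5)))   # -> 1.5 (equal preference on f1 and f2)
```

Changing the preference vector, for example to `(1.0, 0.0)`, shifts the scalarized score entirely toward f1, which mirrors the "f1 is preferred" setting in the comparisons that follow.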

Non-dominated policies obtained by PBMORL versus PGMORL and RA, with different preferences on each objective.

Non-dominated policies obtained by PBMORL versus MORL-Adaptation, META-MORL, MOMPO and MORAL (f1 is preferred).

Non-dominated policies obtained by PBMORL versus MORL-Adaptation, META-MORL, MOMPO and MORAL (f2 is preferred).
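The comparisons above plot non-dominated policies in the (f1, f2) objective space. A minimal sketch of how such a set can be extracted from a list of evaluated policies is given below, assuming both speeds are to be maximized and each policy is summarized by its (f1, f2) pair. The helper names are hypothetical, and the simple O(n²) pairwise check is chosen for clarity, not as the filtering procedure used by any of the compared algorithms.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b under maximization:
    a is at least as good on every objective and strictly better on one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def non_dominated(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (O(n^2) pairwise check).

    A point never dominates itself, so no self-exclusion is needed.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (f1, f2) returns of four policies.
    policies = [(3.0, 1.0), (2.0, 2.0), (1.0, 1.0), (1.0, 3.0)]
    print(non_dominated(policies))  # -> [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
```

Here (1.0, 1.0) is dropped because (2.0, 2.0) is faster along both axes; the remaining three policies each trade x-speed against y-speed, so none is better than another on both objectives.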
