A Distributional View on Multi-Objective Policy Optimization

Run

1: exaggerated joint movements

2: slower, with less penalty

3: slower, with less penalty

4: similar speed to 1, but less penalty

5: similar penalty to 2, but faster

6: human-like running!

1: walks well, but with large action norm penalty

2: walks with low action norm

3: doesn't learn to walk

4: as good at task as 2, but with less action norm

5: walks slower, with less action norm

6: walks slower, with less action norm

1: stands, but with high action norm used for balancing

2: stands with less action norm

3: as good at task as 1 and 2, but with less action norm

4: less action norm
