A Distributional View on Multi-Objective Policy Optimization
1: exaggerated joint movements
2: slower, with less penalty
3: slower, with less penalty
4: similar speed to 1, but with less penalty
5: similar penalty to 2, but faster
6: human-like running!
1: walks well, but with large action norm penalty
2: walks with low action norm
3: doesn't learn to walk
4: as good at the task as 2, but with less action norm
5: walks slower, with less action norm
6: walks slower, with less action norm
1: stands, but with high action norm used for balancing
2: stands with less action norm
3: as good at the task as 1 and 2, but with less action norm
4: less action norm