Research

Symmetry in Multi-goal Reinforcement Learning

Supervised by Prof. Paul Weng at Shanghai Jiao Tong University (SJTU).


Explored a new data augmentation method that leverages symmetry in the robot control domain. Designed an invariant state transformation that maps observed states into a smaller space to improve learning efficiency.

• Mainly focused on the DDPG algorithm.
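For context, here is a minimal sketch of the symmetry-based data augmentation that the project builds on, assuming 2-D positional states, actions, and goals, and a distance-based reward that is invariant under rotation; the function names and the flat transition layout are illustrative, not taken from the report's code.

```python
import numpy as np

def rotate_2d(v, theta):
    """Rotate a 2-D vector v by angle theta (counter-clockwise)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * v[0] - s * v[1],
                     s * v[0] + c * v[1]])

def augment_transition(obs, action, reward, next_obs, goal, theta):
    """Create a virtual transition by applying the same planar rotation to
    every position-like quantity; the reward is unchanged because it only
    depends on relative distances, which a rotation preserves."""
    return (rotate_2d(obs, theta),
            rotate_2d(action, theta),
            reward,
            rotate_2d(next_obs, theta),
            rotate_2d(goal, theta))
```

Each observed transition can be augmented with several rotated copies before being stored in the replay buffer, which is the standard way such symmetric augmentation plugs into an off-policy learner like DDPG.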

Invariant State Transformation

Every state on the plane can be transformed (e.g., by rotation) onto the middle line, so the agent only needs to learn a policy for states on that line and can then generalize it to the entire plane.
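A minimal sketch of this canonicalization, assuming the relevant part of the state is a 2-D position and the reference line is the positive x-axis (both assumptions are for illustration only):

```python
import numpy as np

def canonicalize(state_xy):
    """Rotate a 2-D state onto the reference half-line (here: the positive
    x-axis). Return the canonical state and the rotation angle so the
    policy's action can later be mapped back to the original frame."""
    theta = np.arctan2(state_xy[1], state_xy[0])   # angle of the state
    c, s = np.cos(-theta), np.sin(-theta)          # rotate by -theta
    canonical = np.array([c * state_xy[0] - s * state_xy[1],
                          s * state_xy[0] + c * state_xy[1]])
    return canonical, theta

def act_in_original_frame(policy, state_xy):
    """Query the policy only on canonical states, then rotate its action back."""
    canonical, theta = canonicalize(state_xy)
    a = policy(canonical)                          # action in the canonical frame
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * a[0] - s * a[1],
                     s * a[0] + c * a[1]])
```

The policy is therefore only ever trained on states lying on the reference line; the inverse rotation generalizes its behavior to the whole plane.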

Multi-dimensional State

If the state is composed of multiple vectors (e.g., several position vectors), a similar transformation can be applied: every state is mapped to the equivalent transformed state that has the minimum "distance" to a reference point (the green point in the figure).
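A rough sketch of this selection step, assuming the state is a list of 2-D position vectors, the candidate transformations are a finite set of rotations, and the first vector is the one compared against the reference point; all of these are illustrative assumptions rather than the report's exact setup.

```python
import numpy as np

def best_transform(vectors, reference, candidate_angles):
    """vectors: list of 2-D position vectors making up the state.
    Apply each candidate rotation to the whole state and keep the one that
    brings the designated vector (vectors[0]) closest to the reference point."""
    def rot(v, theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([c * v[0] - s * v[1], s * v[0] + c * v[1]])

    best_theta = min(
        candidate_angles,
        key=lambda t: np.linalg.norm(rot(vectors[0], t) - reference),
    )
    return [rot(v, best_theta) for v in vectors], best_theta
```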

Report

KangleMu_final_report.pdf

Abstract

Data augmentation via invariant transformations generates virtual trajectories from observed ones to improve sample efficiency. However, the existence of invariant transformations implies that policies contain a certain amount of redundancy if agents try to react to every possible situation. This report introduces a policy generalization method that enables agents to deal with all transformed states once they have learned to deal with any one of them. In other words, we exploit invariant transformations to reduce the state space, resulting in a faster convergence rate than data augmentation methods. In addition, leveraging symmetries in tasks, we propose an image-based data augmentation method that applies reflections directly to images to generate new trajectories. The experiments are still in progress.
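As an illustration of the reflection-based augmentation mentioned in the abstract, here is a hedged sketch assuming (H, W, C) image observations, a left-right symmetric task, and that the first action component is the lateral one; these assumptions are mine, not the report's.

```python
import numpy as np

def reflect_transition(image, action, reward, next_image, goal_image):
    """Generate a virtual transition by reflecting image observations about
    the vertical axis; the lateral action component is mirrored so the
    reflected transition stays consistent with the dynamics."""
    flip = lambda img: img[:, ::-1, :].copy()      # horizontal flip of an (H, W, C) image
    mirrored_action = np.array(action, copy=True)
    mirrored_action[0] = -mirrored_action[0]       # assume index 0 is the lateral axis
    return flip(image), mirrored_action, reward, flip(next_image), flip(goal_image)
```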

"Learn and then generalize."