Choosing between PAL and MAL
To illustrate the relative strengths of PAL and MAL, we study their learning performance in two non-stationary environments.
We consider a task of reaching spatial goals with a 7-DOF robot arm. Midway through learning, we introduce a dynamics perturbation by changing the length of the elbow from scenario 1 to scenario 2, as illustrated below. At the point of the perturbation, all algorithms suffer a performance degradation. Because PAL utilizes only recent data, it quickly adapts to the dynamics change and enables the policy to recover. In contrast, MAL adapts the model conservatively and does not forget the old, now-inconsistent data, which biases the model and slows down policy learning.
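The contrast comes down to how each algorithm treats its replay data after the perturbation. The sketch below is a minimal illustration of that difference, not the paper's implementation: the class names, capacity, and `transition` objects are hypothetical, and real PAL/MAL updates also differ in how aggressively the model and policy are optimized.

```python
from collections import deque


class RecentDataBuffer:
    """PAL-style data handling: keep only the most recent transitions.

    Old transitions collected under the previous dynamics fall out of the
    FIFO buffer, so the model is fit only to data consistent with the
    current (post-perturbation) environment.
    """

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest entries are evicted automatically

    def add(self, transition):
        self.buffer.append(transition)

    def dataset(self):
        return list(self.buffer)


class AggregateDataBuffer:
    """MAL-style data handling: retain all transitions ever collected.

    After the dynamics change, transitions from the old elbow length remain
    in the dataset, biasing the conservatively updated model and slowing
    the policy's recovery.
    """

    def __init__(self):
        self.buffer = []

    def add(self, transition):
        self.buffer.append(transition)

    def dataset(self):
        return list(self.buffer)
```

Under this view, PAL's fast recovery is a direct consequence of the bounded buffer: once the stale data has been evicted, the model reflects only the new dynamics, whereas the aggregate buffer continues to mix the two regimes.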