Conditional action distribution visualization from our experiments. Our method exhibits a clear bimodal distribution, showing that RNR-DP preserves the multi-modality property.
To demonstrate that multi-modality is preserved in practice, we visualize the action distribution of our RNR-DP at a single state in the ManiSkill2 StackCube task. We sample 1000 actions from our policy and then apply PCA for dimensionality reduction. We use a histogram to show the discrete relative density of these action samples and kernel density estimation (KDE) to show the estimated probability density function. The results are shown in the figure on the left.
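The visualization pipeline described above can be sketched as follows. This is a minimal illustration, not our experimental code: the two-mode synthetic samples stand in for actions drawn from the policy, the 7-dimensional action space is an assumed placeholder, and the helper names `pca_project_1d` and `gaussian_kde_1d` are hypothetical. The KDE bandwidth uses a normal-reference rule of thumb, which is one common choice among several.

```python
import numpy as np

def pca_project_1d(samples):
    """Project samples of shape (n, d) onto their first principal component."""
    centered = samples - samples.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

def gaussian_kde_1d(points, grid, bandwidth=None):
    """Evaluate a Gaussian KDE of `points` at every location in `grid`."""
    n = points.size
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not from the paper).
        bandwidth = 1.06 * points.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - points[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
# Stand-in for 1000 policy samples at a fixed state: a synthetic
# two-mode mixture in a hypothetical 7-D action space.
modes = rng.choice([-1.0, 1.0], size=(1000, 1))
actions = modes * np.ones((1, 7)) + 0.1 * rng.standard_normal((1000, 7))

projected = pca_project_1d(actions)

# Histogram normalized to a density, and the KDE on a regular grid.
density, edges = np.histogram(projected, bins=40, density=True)
grid = np.linspace(projected.min(), projected.max(), 200)
kde = gaussian_kde_1d(projected, grid)
```

Plotting `density` as bars over `edges` and `kde` as a curve over `grid` gives the kind of figure described above; a bimodal policy shows two separated peaks with low density between them.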
Figure 17: Detailed illustration of the process of Mixture Scheduling.
We provide a detailed discussion of the mechanism and motivation behind Mixture Noise Scheduling. As illustrated in Figure 17, the random schedule teaches the model to denoise actions independently by assigning each action a random noise level. The linear schedule, on the other hand, maintains an increasing noise level across actions, closely matching our inference process through the noise-relaying buffer. Our mixture schedule not only trains the model to denoise actions independently, as in the DP setting, but also ensures smooth transitions between consecutive actions. This better aligns with the noise-relaying buffer structure, resulting in more diverse and robust training.
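To make the three schedules concrete, the sketch below generates per-action noise levels for one training sequence. It rests on several assumptions that are not specified in this section: noise levels are integer timesteps in `[0, num_steps)`, the linear schedule is evenly spaced over that range, and the mixture picks one of the two schedules per training sample with a hypothetical probability `p_linear`. All function and parameter names are illustrative.

```python
import numpy as np

def random_schedule(horizon, num_steps, rng):
    """Independent noise level per action, uniform over {0, ..., num_steps - 1}."""
    return rng.integers(0, num_steps, size=horizon)

def linear_schedule(horizon, num_steps):
    """Noise level that increases linearly along the action sequence."""
    return np.linspace(0, num_steps - 1, horizon).round().astype(int)

def mixture_schedule(horizon, num_steps, p_linear, rng):
    """Use the linear schedule with probability p_linear, else the random one.

    The per-sample coin flip is an assumed mixing rule for illustration.
    """
    if rng.random() < p_linear:
        return linear_schedule(horizon, num_steps)
    return random_schedule(horizon, num_steps, rng)

rng = np.random.default_rng(0)
# One vector of noise levels for a sequence of 8 actions and 100 diffusion steps.
levels = mixture_schedule(horizon=8, num_steps=100, p_linear=0.5, rng=rng)
```

Under these assumptions, each training sample sees either independently noised actions or a monotone ladder of noise levels, so the model is exposed to both regimes over the course of training.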
Figure 18: Detailed illustration of the process of Laddering Initialization. Everything happens before our policy interacts with the environment.
We briefly discuss Laddering Initialization. As illustrated in Figure 18, to transition from pure random noise to the increasing noise levels suitable for inference through the noise-relaying buffer, we perform several denoising steps. This process results in a buffer with laddered noise, ensuring a more structured and effective initialization.
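The laddering procedure can be sketched as follows, under stated assumptions: the buffer holds `buffer_len` actions, the number of noise levels equals the buffer length, and `toy_denoise_step` is a hypothetical stand-in for one reverse-diffusion step of a trained model. The sketch only illustrates how repeated partial denoising turns a uniformly noisy buffer into a laddered one.

```python
import numpy as np

def toy_denoise_step(actions):
    """Hypothetical stand-in for one reverse-diffusion step of a trained model."""
    return 0.9 * actions

def laddering_init(buffer_len, action_dim, rng, denoise_step=toy_denoise_step):
    """Build a buffer whose noise level increases from the first slot to the last."""
    actions = rng.standard_normal((buffer_len, action_dim))
    # Every slot starts at the highest noise level (assumed equal to buffer_len).
    levels = np.full(buffer_len, buffer_len)
    for k in range(buffer_len - 1):
        # Slots [0, active) receive one more denoising step than later slots.
        active = buffer_len - 1 - k
        actions[:active] = denoise_step(actions[:active])
        levels[:active] -= 1
    return actions, levels

rng = np.random.default_rng(0)
buffer_actions, buffer_levels = laddering_init(buffer_len=4, action_dim=7, rng=rng)
```

With `buffer_len=4`, the first slot is denoised three times and the last slot not at all, so the slots end at noise levels 1, 2, 3, and 4. All of this happens before the first environment interaction.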