Anonymous Authors
Submitted to ICLR 2024
Below are short videos showing the deblending process in the instructive example experiment described in Sec. 4.2 of the paper.
Figure 1 (Source distribution affects deblending): The target distribution is a bimodal Gaussian mixture centered at (1,1) and (3,3), and 10 steps are taken at inference time.
Fig. 1a: Source distribution is N((-2,-2), 0.1I)
Fig. 1b: Source distribution is N((1,1), 0.1I)
Fig. 1c: Source distribution is N((2,2), 0.1I)
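For intuition, the deterministic α-(de)blending sampler behind Figure 1 can be sketched in pure Python. This is a minimal sketch under our own assumptions, not the paper's implementation. The trained network is replaced by the exact posterior mean E[x1 − x0 | x_α], which has a closed form when the source is Gaussian and the target is a Gaussian mixture. The equal mixture weights, the target-mode variance (`VAR1 = 0.05`), and the uniform α schedule are assumptions, and `ideal_direction`, `gaussian_points`, and `deblend_sample` are hypothetical helper names.

```python
import math
import random

MEANS = [(1.0, 1.0), (3.0, 3.0)]   # target modes of Figure 1
WEIGHTS = [0.5, 0.5]               # assumed: equal mixture weights
VAR0, VAR1 = 0.1, 0.05             # source covariance 0.1*I (Figure 1); VAR1 is assumed


def ideal_direction(x, alpha, mu0, var0, means, var1, weights):
    """Exact E[x1 - x0 | x_alpha = x] for one 2-D point x, with
    x0 ~ N(mu0, var0*I) and x1 ~ sum_k weights[k] * N(means[k], var1*I)."""
    # Given target component k, x_alpha = (1-alpha) x0 + alpha x1 is Gaussian
    # with mean comps[k] and variance var_a (shared by all components).
    var_a = (1 - alpha) ** 2 * var0 + alpha ** 2 * var1
    comps = [[(1 - alpha) * s + alpha * m for s, m in zip(mu0, mk)] for mk in means]
    # Posterior responsibilities of the components (normalizers cancel).
    logits = [math.log(w) - sum((xi - ci) ** 2 for xi, ci in zip(x, comp)) / (2 * var_a)
              for w, comp in zip(weights, comps)]
    top = max(logits)
    resp = [math.exp(z - top) for z in logits]
    total = sum(resp)
    d = [0.0] * len(x)
    for r, mk, comp in zip(resp, means, comps):
        for i in range(len(x)):
            # Within a component, E[x1 | x_alpha] and E[x0 | x_alpha] are
            # linear in the residual x_alpha - comps[k] (Gaussian conditioning).
            resid = x[i] - comp[i]
            e_x1 = mk[i] + alpha * var1 / var_a * resid
            e_x0 = mu0[i] + (1 - alpha) * var0 / var_a * resid
            d[i] += r / total * (e_x1 - e_x0)
    return d


def gaussian_points(mu, var, n, rng):
    # n independent draws from N(mu, var*I)
    return [[m + math.sqrt(var) * rng.gauss(0.0, 1.0) for m in mu] for _ in range(n)]


def deblend_sample(points, n_steps, direction_fn):
    # Deterministic alpha-(de)blending on a uniform alpha grid:
    # x <- x + (alpha_{t+1} - alpha_t) * D(x, alpha_t)
    out = []
    for x in points:
        x = list(x)
        for t in range(n_steps):
            a, a_next = t / n_steps, (t + 1) / n_steps
            step = direction_fn(x, a)
            x = [xi + (a_next - a) * di for xi, di in zip(x, step)]
        out.append(x)
    return out


rng = random.Random(0)
samples = {}
for mu0 in [(-2.0, -2.0), (1.0, 1.0), (2.0, 2.0)]:   # Figs. 1a, 1b, 1c
    fn = lambda x, a, mu0=mu0: ideal_direction(x, a, mu0, VAR0, MEANS, VAR1, WEIGHTS)
    samples[mu0] = deblend_sample(gaussian_points(mu0, VAR0, 1000, rng), 10, fn)
```

With a single step, every sample collapses to the target mean (2, 2), because at α = 0 the direction reduces to E[x1] − x. Comparing `samples[(-2.0, -2.0)]` with `samples[(2.0, 2.0)]` mirrors the Fig. 1a vs. Fig. 1c comparison.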
Figure 2 (two-stage deblending): The target distribution is a bimodal Gaussian mixture centered at (1,1) and (3.5,3.5). The initial source distribution (red) is N((-2,-2), I).
Fig. 2b: 10-step deblending
Fig. 2c: 3-step deblending
Fig. 2d: two-stage deblending (3 + 7 steps)
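A rough sketch of the two-stage procedure of Fig. 2d, as we read it: a first deblending model maps N((−2,−2), I) to the target in 3 steps, and a second model, whose source is the distribution of those 3-step outputs, finishes in 7 steps. To keep the sketch dependency-free, each trained model is replaced by a Nadaraya–Watson kernel-regression estimate of E[x1 − x0 | x_α] computed from sampled (x0, x1) pairs. The target-mode standard deviation (0.2), the kernel bandwidth (0.3), the sample sizes, and all helper names (`kernel_direction`, `sample_target`, `deblend_sample`) are hypothetical.

```python
import math
import random


def gaussian_points(mu, std, n, rng):
    # n independent draws from N(mu, std^2 * I)
    return [[m + std * rng.gauss(0.0, 1.0) for m in mu] for _ in range(n)]


def sample_target(n, rng, means=((1.0, 1.0), (3.5, 3.5)), std=0.2):
    # Equal-weight bimodal target of Figure 2; std=0.2 is an assumed value.
    return [gaussian_points(rng.choice(means), std, 1, rng)[0] for _ in range(n)]


def kernel_direction(x, alpha, pairs, bandwidth=0.3):
    """Nadaraya-Watson estimate of E[x1 - x0 | x_alpha = x] from (x0, x1) pairs,
    standing in for a trained deblending network (2-D points only)."""
    inv = 1.0 / (2 * bandwidth ** 2)
    logw = [-inv * (((1 - alpha) * u0 + alpha * u1 - x[0]) ** 2
                    + ((1 - alpha) * v0 + alpha * v1 - x[1]) ** 2)
            for (u0, v0), (u1, v1) in pairs]
    top = max(logw)
    w = [math.exp(z - top) for z in logw]
    total = sum(w)
    return [sum(wi * (p1[k] - p0[k]) for wi, (p0, p1) in zip(w, pairs)) / total
            for k in range(2)]


def deblend_sample(points, n_steps, direction_fn):
    # Deterministic alpha-(de)blending on a uniform alpha grid.
    out = []
    for x in points:
        x = list(x)
        for t in range(n_steps):
            a, a_next = t / n_steps, (t + 1) / n_steps
            step = direction_fn(x, a)
            x = [xi + (a_next - a) * di for xi, di in zip(x, step)]
        out.append(x)
    return out


rng = random.Random(0)
N_REF = 300
SRC_MU, SRC_STD = (-2.0, -2.0), 1.0   # initial source N((-2,-2), I) of Figure 2

# Stage 1: independent pairs from the initial source and the target.
pairs1 = list(zip(gaussian_points(SRC_MU, SRC_STD, N_REF, rng), sample_target(N_REF, rng)))
stage1 = lambda x, a: kernel_direction(x, a, pairs1)

# Stage 2: its source is the distribution of 3-step stage-1 outputs.
mid = deblend_sample(gaussian_points(SRC_MU, SRC_STD, N_REF, rng), 3, stage1)
pairs2 = list(zip(mid, sample_target(N_REF, rng)))
stage2 = lambda x, a: kernel_direction(x, a, pairs2)

x0 = gaussian_points(SRC_MU, SRC_STD, 200, rng)
ten_step = deblend_sample(x0, 10, stage1)            # cf. Fig. 2b
three_step = deblend_sample(x0, 3, stage1)           # cf. Fig. 2c
two_stage = deblend_sample(three_step, 7, stage2)    # cf. Fig. 2d (3 + 7)
```

Both `ten_step` and `two_stage` spend 10 deblending steps in total; the difference is only where the last 7 steps start from.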
Figure 3 (effects of conditioning): The target distribution is a bimodal Gaussian conditioned on a Bernoulli random variable C: with probability 0.5 the bimodal Gaussian is centered around (−6, −6) and (−4, −4), and otherwise around (4, 4) and (6, 6).
Fig. 3a: the source is unconditional and always centered at (0,0)
Fig. 3b: the source is conditioned on the same random variable as the target distribution and centered at (-5,-5) or (5,5)
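Figs. 3a and 3b differ only in the source. The sketch below assumes that the deblending direction is conditioned on C in both cases, and makes the source choice explicit through a `conditional_source` flag. As before, it uses the exact posterior mean E[x1 − x0 | x_α, C = c] in place of a trained conditional network. The variances (`VAR0 = 0.1`, `VAR1 = 0.05`) and the helper names `direction` and `sample` are assumptions, since the text does not state them.

```python
import math
import random

# Figure 3: a Bernoulli(0.5) condition C selects one of two bimodal targets.
TARGET_MEANS = {0: [(-6.0, -6.0), (-4.0, -4.0)], 1: [(4.0, 4.0), (6.0, 6.0)]}
COND_SOURCE_MEANS = {0: (-5.0, -5.0), 1: (5.0, 5.0)}
VAR0, VAR1 = 0.1, 0.05   # assumed source and target-mode variances


def direction(x, alpha, c, mu0):
    """Exact E[x1 - x0 | x_alpha = x, C = c] for x0 ~ N(mu0, VAR0*I) and an
    equal-weight mixture of N(m, VAR1*I) over the modes of condition c."""
    var_a = (1 - alpha) ** 2 * VAR0 + alpha ** 2 * VAR1
    means = TARGET_MEANS[c]
    comps = [[(1 - alpha) * s + alpha * m for s, m in zip(mu0, mk)] for mk in means]
    # Equal weights and a shared variance, so only the squared distance matters.
    logits = [-sum((xi - ci) ** 2 for xi, ci in zip(x, comp)) / (2 * var_a) for comp in comps]
    top = max(logits)
    resp = [math.exp(z - top) for z in logits]
    total = sum(resp)
    d = [0.0, 0.0]
    for r, mk, comp in zip(resp, means, comps):
        for i in range(2):
            resid = x[i] - comp[i]
            e_x1 = mk[i] + alpha * VAR1 / var_a * resid
            e_x0 = mu0[i] + (1 - alpha) * VAR0 / var_a * resid
            d[i] += r / total * (e_x1 - e_x0)
    return d


def sample(c, conditional_source, n=500, n_steps=10, seed=0):
    """Deblend n points for condition c, starting from either the unconditional
    source N((0,0), VAR0*I) (Fig. 3a) or the conditional source N(mu_c, VAR0*I) (Fig. 3b)."""
    rng = random.Random(seed)
    mu0 = COND_SOURCE_MEANS[c] if conditional_source else (0.0, 0.0)
    out = []
    for _ in range(n):
        x = [m + math.sqrt(VAR0) * rng.gauss(0.0, 1.0) for m in mu0]
        for t in range(n_steps):
            a, a_next = t / n_steps, (t + 1) / n_steps
            step = direction(x, a, c, mu0)
            x = [xi + (a_next - a) * di for xi, di in zip(x, step)]
        out.append(x)
    return out
```

For example, `sample(0, False)` and `sample(0, True)` correspond to the C = 0 halves of Figs. 3a and 3b.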
Below are short videos showing the rectified flow process in the same setting as the instructive example experiment described in Sec. 4.2 of the paper, but using the Rectified Flow algorithm instead of α-(de)blending. All results are obtained after three reflow operations. The results show that the benefit of our method does not depend on the algorithm used to map between distributions: choosing a suitable initial distribution improves results whether the underlying algorithm is Rectified Flow or Iterative α-(de)blending.
Figure 1 (Source distribution affects rectified flow): The target distribution is a bimodal Gaussian mixture centered at (1,1) and (3,3), and 10 steps are taken at inference time.
Fig. 1a: Source distribution is N((-2,-2), 0.1I)
Fig. 1b: Source distribution is N((1,1), 0.1I)
Fig. 1c: Source distribution is N((2,2), 0.1I)
Figure 2 (two-stage rectified flow): The target distribution is a bimodal Gaussian mixture centered at (1,1) and (3.5,3.5). The initial source distribution (red) is N((-2,-2), I).
Fig. 2b: 10-step sampling of rectified flow
Fig. 2c: 3-step sampling of rectified flow
Fig. 2d: two-stage sampling of rectified flow (3 + 7 steps)
Figure 3 (effects of conditioning): The target distribution is a bimodal Gaussian conditioned on a Bernoulli random variable C: with probability 0.5 the bimodal Gaussian is centered around (−6, −6) and (−4, −4), and otherwise around (4, 4) and (6, 6).
Fig. 3a: the source is unconditional and always centered at (0,0)
Fig. 3b: the source is conditioned on the same random variable as the target distribution and centered at (-5,-5) or (5,5)
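The reflow operation used in these rectified-flow experiments can be sketched as follows: start from independent (source, target) pairs, fit a velocity field v(x, t) ≈ E[x1 − x0 | x_t] with x_t = (1 − t) x0 + t x1, then repeatedly replace each pair by (x0, endpoint of the current flow started at x0) and refit. This is a minimal pure-Python sketch, not the authors' code. The velocity network is replaced by a kernel-regression estimate, each reflow reuses the same source points, and the target-mode standard deviation (0.2), the bandwidth (0.3), the step counts, and the helper names `velocity`, `euler`, and `reflow` are hypothetical.

```python
import math
import random


def velocity(x, t, pairs, bandwidth=0.3):
    """Kernel-regression estimate of E[x1 - x0 | x_t = x] with
    x_t = (1 - t) x0 + t x1; stands in for a trained velocity network (2-D)."""
    inv = 1.0 / (2 * bandwidth ** 2)
    logw = [-inv * (((1 - t) * u0 + t * u1 - x[0]) ** 2
                    + ((1 - t) * v0 + t * v1 - x[1]) ** 2)
            for (u0, v0), (u1, v1) in pairs]
    top = max(logw)
    w = [math.exp(z - top) for z in logw]
    total = sum(w)
    return [sum(wi * (p1[k] - p0[k]) for wi, (p0, p1) in zip(w, pairs)) / total
            for k in range(2)]


def euler(x, pairs, n_steps):
    # Integrate dx/dt = v(x, t) from t = 0 to t = 1 with uniform Euler steps.
    x = list(x)
    for i in range(n_steps):
        v = velocity(x, i / n_steps, pairs)
        x = [xi + vi / n_steps for xi, vi in zip(x, v)]
    return x


def reflow(pairs, n_steps=10):
    # Re-couple each source point with its endpoint under the current flow;
    # the next velocity field is then fit on these deterministic pairs.
    return [(x0, euler(x0, pairs, n_steps)) for x0, _ in pairs]


rng = random.Random(0)
M = 200
src = [[-2.0 + math.sqrt(0.1) * rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(M)]
tgt = []
for _ in range(M):
    m = rng.choice([(1.0, 1.0), (3.0, 3.0)])                   # Figure 1 target modes
    tgt.append([mi + 0.2 * rng.gauss(0.0, 1.0) for mi in m])   # std 0.2 is assumed

pairs = list(zip(src, tgt))   # independent coupling defines the first flow
for _ in range(3):            # three reflow operations
    pairs = reflow(pairs)
samples = [euler(x0, pairs, 10) for x0, _ in pairs]
```

Reflow is meant to straighten the flow, so comparing `euler(x0, pairs, 1)` with `euler(x0, pairs, 10)` before and after the loop is one way to inspect its effect in this toy setting.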
Below, we present 15-frame videos generated by our algorithms and different baselines, conditioned on the same initial frame (for a total of 16 frames). The conditioning frame is outlined in red. In the 25-step DDIM video, the motion proceeds at an uneven speed. ASPeeD achieves a better FVD than 25-step DDPM (see Sec. 5.1 of the paper).
Below, we present 50-frame videos generated by our algorithms and different baselines, conditioned on the same 15 frames (for a total of 65 frames). The conditioning frames are outlined in red. ASPeeD achieves a better FVD than the baselines (see Sec. 5.1 of the paper).
Video 1: In both the 10-step and 500-step DDPM baselines, the gray pot disappears, while ASPeeD keeps it intact.
Video 2: In both the 10-step and 50-step DDPM baselines, the stick bends; in the 10-step baseline, the ball also disappears.
Video 3: The ASPeeD-generated video is more natural than the 10-step DDPM baseline (note the blue ball).
The following videos show trajectories with the same initial conditions on the Tool-Hang task, generated by ASPeeD and by the 10-step DDPM baseline. In this task, a robot arm assembles a frame consisting of a base piece and a hook piece by inserting the hook into the base, and then hangs a wrench on the hook. See Sec. 5.2 of the paper for more details.
10-step ASPeeD: the output of 3 DDIM steps serves as the source distribution, followed by 7 deblending steps
10-step DDPM
10-step ASPeeD: the output of 3 DDIM steps serves as the source distribution, followed by 7 deblending steps
10-step DDPM: the wrench is placed incorrectly, and the reward for this trajectory is 0
These videos show different trajectories on the Push-T task generated by ASPeeD, demonstrating its multimodal predictions. The goal is to cover the green "T"-shaped mark with the gray "T"-shaped object. The blue dot is the robot's end-effector, and the red plus sign marks the robot's next action. See Sec. 5.2.1 of the paper.