Figure 1: Meta-World tasks: Success Rate under varying observation/action rates. Average episode success rate as a function of the evaluation time step size (in milliseconds). The dashed lines mark the default time step size (∆t = 2.5 ms). Our time-aware model outperforms the baseline (trained on the fixed default ∆t) on most evaluation time steps across all tasks while requiring fewer training steps (ours trained with 1.5M steps, in RED (RK4 integration) and MAROON (Euler integration), vs. the baseline trained with 2M steps, in BLUE) with the same hyperparameters. For a fair comparison, we also adjusted the baseline's evaluation by repeatedly applying its action ∆t_eval/∆t_train times at every time step, shown as the PURPLE curves. The mean and 95% confidence intervals are plotted over 3 seeds, each with 10 evaluation episodes.
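For concreteness, the action-repeat adjustment behind the PURPLE curves can be sketched as follows. This is a minimal illustration, not the paper's actual code: it assumes a Gym-style env.reset/env.step interface, a policy callable, and a success flag in info (all hypothetical names).

```python
def evaluate_with_action_repeat(env, policy, dt_train, dt_eval, n_episodes=10):
    """Evaluate a fixed-rate baseline at a coarser rate ∆t_eval by
    holding each action for ∆t_eval/∆t_train simulator steps."""
    repeats = max(1, round(dt_eval / dt_train))  # assumes dt_eval ≈ k · dt_train
    n_success = 0
    for _ in range(n_episodes):
        obs, done = env.reset(), False
        success = False
        while not done:
            action = policy(obs)          # baseline acts at its training rate
            for _ in range(repeats):      # repeat the action to span ∆t_eval
                obs, reward, done, info = env.step(action)
                success = success or bool(info.get("success", False))
                if done:
                    break
        n_success += int(success)
    return n_success / n_episodes
```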
Figures 2a and 2b: Meta-World tasks: Success Rate Curves under different evaluation time step sizes. At each training step, the models are evaluated on various inference ∆t's. Despite having to learn the dynamics under varying time step sizes, our time-aware models, in RED (RK4 integration) and MAROON (Euler integration), still converge faster when evaluated on ∆t_default = 2.5 ms on all tasks compared to the baseline trained only on ∆t_default = 2.5 ms (in BLUE). On large ∆t's, the time-aware model significantly outperforms the baseline with the same number of training steps, while the baseline fails to converge. The mean success rate and 95% confidence intervals are plotted over 3 seeds, each with 10 evaluation episodes.
Figure 3a: Visualization of the system dynamics of the uncontrolled Allen-Cahn PDE. The spatial domain has length L = 2, with diffusivity (viscosity) parameter ν = 10^−4 and potential constant V = 5.0. The initial state is u(x, t = 0) = (x − 1)^2 · cos(π(x − 1)). The field u(x, t) is the observation at each time step. The action a(x, t) is the distributed control force applied over the PDE field.
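For reference, one common form of the uncontrolled Allen-Cahn dynamics consistent with the stated diffusivity ν and potential constant V is the sketch below; the exact scaling of the double-well nonlinearity in the benchmark is an assumption on our part:

$$ \partial_t u = \nu\,\partial_{xx} u - V\left(u^3 - u\right), \qquad x \in [0, L]. $$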
The PDE problems are one-dimensional control problems featuring periodic boundary conditions and spatially distributed control inputs. The spatial domain is defined as Ω = [0, L] ⊂ R. The continuous field of the PDE is defined as u(x, t) : Ω × R+ → R, where x and t denote the spatial coordinate and time, respectively. The control force is composed of n_a scalar control inputs a_j(t), each influencing a specific subset of the domain Ω through its corresponding forcing support function Φ_j(x). Each action injects external force/energy into the PDE field to control its dynamics and steer it toward the target state s_target.
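Writing the resulting distributed force as f(x, t) (our notation), the description above amounts to

$$ f(x, t) = \sum_{j=1}^{n_a} a_j(t)\,\Phi_j(x), \qquad x \in \Omega, $$

so each scalar input a_j(t) acts on the field only through the region where its support function Φ_j(x) is nonzero.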
Figure 3b: Visualization of the system dynamics of the uncontrolled Burgers PDE. The spatial domain has length L = 1, with diffusivity (viscosity) parameter ν = 10^−3. The initial state is u(x, t = 0) = sech(10x − 5). The field u(x, t) is the observation at each time step.
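For reference, the uncontrolled viscous Burgers dynamics take the standard form below; under distributed control, the forcing f(x, t) defined above would enter as an additive term on the right-hand side:

$$ \partial_t u + u\,\partial_x u = \nu\,\partial_{xx} u. $$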
Figure 3c: Visualization of the system dynamics of the uncontrolled Wave PDE. The spatial domain has length L = 1, with c = 0.1. The initial state is u(x, t = 0) = sech(10x − 5) and ψ(x, t = 0) = 0. The field u(x, t) is the observation at each time step.
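The initial condition on ψ suggests the usual first-order rewriting of the wave equation ∂_tt u = c^2 ∂_xx u, with ψ playing the role of the velocity field and c the wave speed; a sketch, assuming this convention:

$$ \partial_t u = \psi, \qquad \partial_t \psi = c^2\,\partial_{xx} u. $$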
Figure 4: PDE control tasks: Total episode reward (negative LQ error) under varying observation/action rates. Average reward as a function of the evaluation time step size (in milliseconds). The dashed lines mark the default time step sizes (∆t = 50/10/100 ms, respectively). Our time-aware model outperforms the baseline (trained on the fixed default ∆t) on most evaluation time steps across all tasks while requiring fewer training steps (ours trained with 1M steps, in RED (RK4 integration) and MAROON (Euler integration), vs. the baselines trained with 750k/1M/1M steps, in BLUE) with the same hyperparameters. For a fair comparison, we also adjusted the baseline's evaluation by repeatedly applying its action ∆t_eval/∆t_train times at every time step, shown as the PURPLE curves. The mean and 95% confidence intervals are plotted over 3 seeds, each with 10 evaluation episodes.
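As a reading aid, an LQ-style per-step reward of the kind named here typically penalizes the quadratic deviation of the field from the target state plus a quadratic control cost; a minimal sketch, where the control weight α is our assumption rather than the paper's value:

$$ r_t = -\left( \left\| u(\cdot, t) - s_{\text{target}} \right\|_2^2 + \alpha \left\| a(\cdot, t) \right\|_2^2 \right). $$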
Figure 5: PDE control tasks: Average Negative LQ Error under different evaluation time step sizes. At each training step, the models are evaluated on various inference ∆t's. Despite having to learn the dynamics under varying time step sizes, our time-aware models, in RED (RK4 integration) and MAROON (Euler integration), still converge to a lower error when evaluated on ∆t_default = 50/10/100 ms for the Burgers / Allen-Cahn / Wave PDE control tasks, compared to the baseline trained only on ∆t_default (in BLUE). On large ∆t's, the time-aware model significantly outperforms the baseline with the same number of training steps. The mean reward and 95% confidence intervals are plotted over 3 seeds, each with 10 evaluation episodes.