Decision ConvFormer
(ICLR 2024 Rebuttal)
Section 1. Applying Sliding Window Attention to DT - reply to reviewer Eec1
To properly observe the effect of reducing the context length of DT, as suggested by Reviewer Eec1, without losing the benefits that come from predicting multiple actions jointly, one possible approach is to adopt a form of local masking.
'DT (full attention)' denotes DT with a context length of K=20, while 'DT (sliding window attention)' denotes DT using sliding window attention with a window size of 6 and a context length of K=20.
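For concreteness, below is a minimal sketch of how such a local (sliding window) causal mask could be built and applied to attention logits before the softmax. The token layout (three tokens per timestep) and the mapping of the window size of 6 to token positions are assumptions made for illustration, not the exact implementation used in the experiment.

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: position i may attend to positions j with i - window < j <= i.

    Minimal sketch of the local masking described above; the actual DT code
    (token layout, additive vs. boolean masks) may differ.
    """
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = j <= i                          # no attention to future tokens
    local = (i - j) < window                 # only the most recent `window` tokens
    return causal & local                    # True = allowed to attend


# Example (assumption): K = 20 timesteps with 3 tokens (R, s, a) per timestep,
# and a window of 6 timesteps -> 18 tokens, mimicking 'DT (sliding window attention)'.
mask = sliding_window_causal_mask(seq_len=20 * 3, window=6 * 3)
scores = torch.randn(1, 1, 60, 60)                   # (batch, head, query, key) logits
scores = scores.masked_fill(~mask, float("-inf"))    # mask out disallowed positions
attn = torch.softmax(scores, dim=-1)
```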
Our DC structure is based on a principle similar to such local masking but is designed to be simpler. In addition, DC uses static filters to facilitate more stable and effective learning of local associations.
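As an illustration of the static-filter idea, the following sketch uses a causal depthwise 1D convolution over the sequence axis as a local token mixer: each channel mixes nearby tokens with a small fixed-size learned kernel instead of input-dependent attention weights. The kernel size and module layout are assumptions and do not reproduce the exact DC block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvTokenMixer(nn.Module):
    """Causal depthwise 1D convolution as a static local token mixer (illustrative)."""

    def __init__(self, embed_dim: int, kernel_size: int = 6):
        super().__init__()
        # Left padding so that a token only mixes with itself and past tokens.
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(embed_dim, embed_dim, kernel_size,
                              groups=embed_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)            # (batch, embed_dim, seq_len)
        x = F.pad(x, (self.pad, 0))      # causal (left-only) padding
        x = self.conv(x)                 # per-channel local mixing with static filters
        return x.transpose(1, 2)         # back to (batch, seq_len, embed_dim)


# Example: a batch of 4 sequences, 60 tokens, 128-dim embeddings.
mixer = ConvTokenMixer(embed_dim=128, kernel_size=6)
out = mixer(torch.randn(4, 60, 128))
print(out.shape)  # torch.Size([4, 60, 128])
```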
Section 2. Return Distribution of Dataset - reply to reviewer Eec1
We present the distribution of trajectory returns in the MuJoCo datasets alongside the performance achieved by DT and DC. In the figure, frequency denotes the number of trajectories with the corresponding normalized return.
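For reference, a histogram of normalized trajectory returns of this kind could be produced along the following lines using the public D4RL API; the environment name, bin count, and plotting details are illustrative rather than the exact setup behind the figure.

```python
import gym
import d4rl  # registers the MuJoCo offline datasets and the normalized-score helper
import matplotlib.pyplot as plt

# Illustrative dataset choice; any of the MuJoCo offline datasets works the same way.
env = gym.make("hopper-medium-v2")
data = env.get_dataset()

# Split the flat reward stream into per-trajectory returns using episode boundaries.
returns, ret = [], 0.0
for r, done, timeout in zip(data["rewards"], data["terminals"], data["timeouts"]):
    ret += r
    if done or timeout:
        returns.append(ret)
        ret = 0.0

# Map raw returns to the 0-100 normalized scale and histogram them.
norm_returns = [env.get_normalized_score(g) * 100 for g in returns]
plt.hist(norm_returns, bins=50)
plt.xlabel("Normalized return")
plt.ylabel("Frequency (number of trajectories)")
plt.show()
```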
Section 3. Visualizing the Attention of the Hybrid DC - reply to reviewer dRz2
We plot the two types of attention maps of the hybrid DC for Atari Breakout and Assault. Since the attention module is used only in the last layer of the hybrid DC, the maps are drawn for this final layer's attention module.
Atari Breakout
Atari Assault
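As a rough sketch of how such a last-layer attention map can be extracted and plotted, the snippet below registers a forward hook on the attention module. The module path (`model.blocks[-1].attn`) and the assumption that the module returns its attention weights as a second output are hypothetical and depend on the actual hybrid DC implementation.

```python
import torch
import matplotlib.pyplot as plt

captured = {}

def save_attention(module, inputs, output):
    # Assumes the module returns (hidden_states, attention_weights);
    # adjust if the weights are exposed differently in the real model.
    captured["attn"] = output[1].detach()

def plot_last_layer_attention(model, tokens):
    """Plot the head-averaged attention map of the final layer (illustrative)."""
    handle = model.blocks[-1].attn.register_forward_hook(save_attention)
    with torch.no_grad():
        model(tokens)
    handle.remove()

    # Average over heads for the first batch element to get a (query, key) map.
    attn = captured["attn"][0].mean(dim=0).cpu().numpy()
    plt.imshow(attn, cmap="viridis")
    plt.xlabel("Key position")
    plt.ylabel("Query position")
    plt.colorbar()
    plt.show()
```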