MiniROAD: Minimal RNN Framework for Online Action Detection

Joungbin An1, Hyolim Kang1, Su Ho Han1, Ming-Hsuan Yang1,2,3, Seon Joo Kim1

1Yonsei University, 2UC Merced, 3Google Research

ICCV 2023


Abstract

Online Action Detection (OAD) is the task of identifying actions in streaming videos without access to future frames. Much effort has been devoted to effectively capturing long-range dependencies, with transformers receiving the spotlight for their ability to model long-range temporal structures. In contrast, RNNs have received less attention lately due to their lower performance compared to recent transformer-based methods. In this paper, we investigate the underlying reasons for this inferior performance. Our investigation reveals that the discrepancy between training and inference is the primary cause that impedes the effective training of RNNs. To address this, we propose applying non-uniform weights to the loss computed at each time step, which allows the RNN model to learn from predictions made in an environment that better resembles the inference stage. Extensive experiments on three benchmark datasets, THUMOS, TVSeries, and FineAction, demonstrate that a minimal RNN-based model trained with the proposed methodology matches or outperforms the existing best methods with a significant increase in efficiency.
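The core idea of non-uniform per-timestep loss weighting can be sketched as follows. This is a minimal illustration, not the paper's exact scheme: the `ramp` weighting below is a hypothetical choice that simply emphasizes later timesteps, whose hidden states have accumulated more context and thus better resemble the inference stage.

```python
import numpy as np

def weighted_oad_loss(per_step_losses, weights):
    """Combine per-timestep losses with non-uniform weights.

    per_step_losses: shape (T,), e.g. cross-entropy at each timestep
    weights: shape (T,), non-negative; normalized here to sum to 1
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(np.dot(w, np.asarray(per_step_losses, dtype=float)))

# A 4-step training window. Uniform weighting treats every timestep
# equally; the ramp weighting (an illustrative choice) shifts emphasis
# toward later timesteps.
losses = [1.0, 0.8, 0.6, 0.4]
uniform = weighted_oad_loss(losses, [1, 1, 1, 1])  # mean of the losses, 0.7
ramp = weighted_oad_loss(losses, [1, 2, 3, 4])     # later steps weighted more, 0.6
```

In practice the per-step losses would come from an RNN's frame-level predictions over a training window; the weighting determines which timesteps drive the gradient.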

Citation

@inproceedings{miniroad,
  title={MiniROAD: Minimal RNN Framework for Online Action Detection},
  author={An, Joungbin and Kang, Hyolim and Han, Su Ho and Yang, Ming-Hsuan and Kim, Seon Joo},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2023}
}