A Large-Scale Analysis on Robustness in Action Recognition
In this work, we perform a large-scale robustness analysis of existing CNN- and transformer-based video models for action recognition. We focus on simulating real-world perturbations, rather than adversarial attacks, using four benchmark datasets: HMDB-51, UCF-101, Kinetics-400, and Something-Something v2. We evaluate six state-of-the-art action recognition models against a total of 90 visual perturbations.
We hope this study will serve as a benchmark and guide future research in robust action-recognition learning.
Performance versus robustness of action recognition models on UCF-101P. The y-axis shows relative robustness γr (higher is better); the x-axis shows accuracy on clean videos. Model names appended with P indicate a pre-trained version of the model, and circle size indicates FLOPs.
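A minimal sketch of how a relative robustness score like γr can be computed, assuming it is defined as one minus the relative accuracy drop between clean and perturbed videos (the exact definition used in the paper may differ):

```python
def relative_robustness(clean_acc: float, perturbed_acc: float) -> float:
    """Relative robustness: 1 - (A_clean - A_perturbed) / A_clean.

    Equals 1.0 when perturbations cause no accuracy drop and
    decreases as the drop grows (assumed formulation).
    """
    return 1.0 - (clean_acc - perturbed_acc) / clean_acc


# Example: a model at 80% clean accuracy that falls to 40% under
# perturbation would score 0.5 on this measure.
score = relative_robustness(0.80, 0.40)
```

Under this formulation, a higher score means the model retains more of its clean-video accuracy when perturbed.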
We split visual perturbations into five categories: Noise, Camera Motion, Digital, Temporal, and Blur. Each perturbation is applied at severity levels ranging from 1 to 5.
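To illustrate how a severity-controlled perturbation from the Noise category might be applied to a video frame, here is a hedged sketch of Gaussian noise with an assumed, hypothetical sigma schedule; the benchmark's actual perturbation implementations and severity parameters may differ:

```python
import numpy as np

def gaussian_noise(frame: np.ndarray, severity: int = 1) -> np.ndarray:
    """Apply Gaussian noise to an HxWxC uint8 frame at severity 1-5.

    The sigma schedule below is illustrative only, not the paper's.
    """
    sigmas = [0.04, 0.08, 0.12, 0.18, 0.26]  # hypothetical severity schedule
    x = frame.astype(np.float32) / 255.0
    noisy = x + np.random.normal(0.0, sigmas[severity - 1], size=x.shape)
    # Clip back to valid pixel range and restore uint8 dtype.
    return (np.clip(noisy, 0.0, 1.0) * 255.0).astype(np.uint8)
```

For a video, the same perturbation would typically be applied frame by frame (or, for Temporal perturbations, across the frame ordering itself).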
Mean action recognition accuracy across all models for each perturbed dataset.
Madeline Chantry Schiappa¹*, Naman Biyani²*, Prudvi Kamtam¹, Shruti Vyas¹, Hamid Palangi³, Vibhav Vineet³, Yogesh S. Rawat¹
CRCV, UCF¹, IIT Kanpur², Microsoft Research³
@inproceedings{robustness2022large,
title={Large-scale Robustness Analysis of Video Action Recognition Models},
author={Schiappa, Madeline C and Biyani, Naman and Kamtam, Prudvi and Vyas, Shruti and Palangi, Hamid and Vineet, Vibhav and Rawat, Yogesh},
booktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2023}
}