Project funded by a grant of the Romanian Ministry of Education and Research, CNCS - UEFISCDI, project no. PN-III-P2-2.1-PED-2021-0195, contract no. 690/2022, within PNCDI III.
Host institution: University of Bucharest
Abnormal event detection models are based on state-of-the-art neural networks containing millions of artificial neurons organized into complex architectures that try to simulate the human brain. However, considering the progress of neural networks over recent years, e.g. the evolution from the AlexNet convolutional neural network (CNN) to vision transformers, we observe a prominent trend towards enlarging the models to increase accuracy, while ignoring the growing demand for computational resources. Unfortunately, this general trend in AI applies to abnormal event detection as well. Some recent studies showed that state-of-the-art neural networks require significant computational resources. These studies went as far as estimating the CO2 emissions of such models, showing that training a single model can generate as much CO2 as 5 cars over their entire life span. To our knowledge, there are no previous studies on green abnormal event detection. Our aim is to apply the principles of Green AI and to research and develop more efficient (and equally effective) deep neural models for abnormal event detection in video. We believe this goal can be achieved by exploring novel architectures in conjunction with newly developed learning paradigms and optimizers that will allow us to train lighter architectures effectively. We will start from our preliminary anomaly detection models published at CVPR 2021 and PAMI 2022 (with results shown in the above figure) and explore the possibility of transferring knowledge from the existing neural models to smaller and more efficient models, through knowledge distillation. Jointly with the novel learning paradigms, we will study various neural architectures, from CNNs to transformers, that are likely to be more efficient.
Neural architectures can become more efficient in multiple ways, for example by reducing the depth (number of layers) or width (number of units per layer), or by replacing existing layers or units with more efficient components.
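The two ideas in the abstract above, shrinking depth and width and distilling knowledge from a larger model, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a plain fully connected network stands in for the CNNs and transformers studied in the project, the layer sizes are arbitrary, and the loss is the generic temperature-softened KL divergence commonly used for knowledge distillation, not necessarily the formulation used in the project's papers.

```python
import math


def mlp_param_count(in_dim, width, depth, out_dim):
    """Count weights and biases of a plain MLP with `depth` hidden layers of `width` units."""
    dims = [in_dim] + [width] * depth + [out_dim]
    return sum(a * b + b for a, b in zip(dims, dims[1:]))


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical sizes: a deep, wide teacher and a shallower, narrower student.
teacher_params = mlp_param_count(in_dim=512, width=1024, depth=8, out_dim=10)
student_params = mlp_param_count(in_dim=512, width=256, depth=4, out_dim=10)
print(f"teacher: {teacher_params:,} parameters")
print(f"student: {student_params:,} parameters")
print(f"reduction: {teacher_params / student_params:.1f}x")

# Hypothetical logits for one input; the student is penalized for deviating
# from the teacher's softened output distribution.
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [1.5, 0.7, -0.8]
print(f"distillation loss: {distillation_loss(student_logits, teacher_logits):.4f}")
```

In this toy setting, halving the depth and quartering the width cuts the parameter count by more than an order of magnitude, and the distillation loss gives the smaller network a training signal that comes from the larger one.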
TEAM
Prof. Radu Tudor Ionescu
Principal Investigator
Assoc. Prof. Marius Popescu
Senior Researcher
Florinel-Alin Croitoru
PhD Student
Nicolae-Cătălin Ristea
PhD Student
PAPERS
Florinel-Alin Croitoru, Nicolae-Catalin Ristea, Dana Dăscălescu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah
Computer Vision and Image Understanding, 2024
Florinel-Alin Croitoru, Nicolae-Catalin Ristea, Radu Tudor Ionescu, Nicu Sebe
International Journal of Computer Vision, 2024
Nicolae-Catalin Ristea, Florinel-Alin Croitoru, Radu Tudor Ionescu, Marius Popescu, Fahad Shahbaz Khan, Mubarak Shah
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Neelu Madan, Nicolae-Cătălin Ristea, Radu Tudor Ionescu, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas Moeslund, Mubarak Shah
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023
Antonio Bărbălău, Radu Tudor Ionescu, Mariana-Iuliana Georgescu, Jacob Dueholm, Bharathkumar Ramachandra, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas Moeslund, Mubarak Shah
Computer Vision and Image Understanding, 2023
CODE
Repository for the "Lightning Fast Video Anomaly Detection via Adversarial Knowledge Distillation" paper is available here.
Repository for the "Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors" paper is available here.
Repository for the "Self-Supervised Masked Convolutional Transformer Block (SSMCTB)" paper is available here.
Repository for the "Diffusion Models in Vision" paper is available here.
SCIENTIFIC REPORTS
Technical report for the first stage is available here: Romanian version.
Technical report for the second stage is available here: Romanian version.
Technical report for the third stage is available here: Romanian version.
Final technical report is available here: Romanian version.
PROMOTIONAL MATERIAL
The main goal of our project was to research and develop a demonstration model, both efficient and effective, for the automatic detection of anomalous events in video using novel deep learning models. Our research led to the development of a demonstration model that can process over 60 video files in parallel at 25 FPS using a single Nvidia GeForce RTX 3090 GPU. As shown in the associated figure, the resulting model is at least 8 times more efficient than state-of-the-art models, while maintaining a level of performance comparable to the latest methods. We believe that the demonstration model will have a significant impact in the field of video surveillance, being the first model that is scalable to dozens of video streams.
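To put the figures above in perspective, the following short calculation converts "60 video files at 25 FPS" into an aggregate frame rate and an amortized time budget per frame. The function names are hypothetical, and since the text says "over 60" streams, the results are bounds (a lower bound on the frame rate, an upper bound on the budget) rather than measured values.

```python
def aggregate_fps(num_streams, fps_per_stream):
    """Total frames per second the model must sustain across all streams."""
    return num_streams * fps_per_stream


def per_frame_budget_ms(num_streams, fps_per_stream):
    """Amortized time available per frame, in milliseconds."""
    return 1000.0 / aggregate_fps(num_streams, fps_per_stream)


streams, fps = 60, 25  # figures quoted in the text above
print(f"aggregate throughput: {aggregate_fps(streams, fps)} frames/s")
print(f"amortized budget: {per_frame_budget_ms(streams, fps):.3f} ms per frame")
```

The budget is amortized: with batched GPU inference, an individual frame may take longer than this as long as enough frames are processed together.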
The resulting model includes several novel elements, among which we list the self-distillation technique and the weighting of the masked patches according to the magnitude of the motion gradients. The model (illustrated in the associated figure) was presented in the paper "Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors", which was published at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024), held on June 17-21, 2024, in Seattle, USA. CVPR is the most prestigious conference in the field of computer science worldwide, being ranked by Google Scholar among the top 4 publications in the world.
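The idea of weighting patches by motion gradient magnitude can be sketched as follows. This is an illustrative approximation, not the implementation from the paper: it assumes grayscale frames stored as nested lists, uses the absolute difference between two consecutive frames as the motion gradient, and normalizes per-patch sums into weights. All function names and the toy frames are hypothetical.

```python
def motion_gradient(prev_frame, next_frame):
    """Absolute temporal difference between two grayscale frames (lists of rows)."""
    return [[abs(b - a) for a, b in zip(row_prev, row_next)]
            for row_prev, row_next in zip(prev_frame, next_frame)]


def patch_weights(gradient, patch_size):
    """Sum the gradient magnitude in each non-overlapping patch and normalize to sum to 1."""
    height, width = len(gradient), len(gradient[0])
    sums = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            sums.append(sum(gradient[y][x]
                            for y in range(top, min(top + patch_size, height))
                            for x in range(left, min(left + patch_size, width))))
    norm = sum(sums)
    if norm == 0:
        # No motion at all: fall back to uniform weights.
        return [1.0 / len(sums)] * len(sums)
    return [s / norm for s in sums]


def weighted_reconstruction_error(patch_errors, weights):
    """Combine per-patch errors so that patches with more motion count more."""
    return sum(e * w for e, w in zip(patch_errors, weights))


# Toy 4x4 frames: motion occurs only in the bottom-right 2x2 patch.
prev_frame = [[0.0] * 4 for _ in range(4)]
next_frame = [[0.0] * 4 for _ in range(4)]
next_frame[2][2] = 1.0
next_frame[3][3] = 1.0

weights = patch_weights(motion_gradient(prev_frame, next_frame), patch_size=2)
patch_errors = [0.5, 0.5, 0.5, 0.1]  # hypothetical per-patch reconstruction errors
print(weights)
print(weighted_reconstruction_error(patch_errors, weights))
```

In this toy case, the static background patches receive zero weight, so only the patch containing motion contributes to the error, which reflects the intuition of shifting focus from the static scene to moving objects.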