Enhancing Crime Detection in Surveillance Videos 

(Anomaly Detection)

Overview:

Anomalies are patterns and events that deviate from the normal flow of events. In surveillance, they include abuse, fights, road accidents, and snatching. Finding such events in massive real-world video streams is difficult because they occur rarely and irregularly. Deep learning-based anomaly detection reduces human labor, and its decision-making can be comparable to that of humans, which helps ensure public safety. Most reported studies treat anomaly detection in surveillance videos as binary classification and do not distinguish between specific anomalous events such as abuse, fights, vehicle accidents, shootings, stealing, vandalism, and robberies. This paper proposes an intelligent anomaly detection framework based on deep features that can operate more efficiently in surveillance networks. In the proposed framework, spatio-temporal features are first extracted from a series of frames by passing them through a pre-trained CNN model; analyzing the frames as a sequence helps in detecting anomalous events. The extracted deep features are then passed to a Long Short-Term Memory (LSTM) model, which can accurately classify ongoing anomalous and normal events in complex surveillance scenes of smart cities. Extensive anomaly detection experiments are performed on the University of Central Florida (UCF) Crime video dataset. We report a 47.83% increase in accuracy over state-of-the-art methods on the UCF-Crime dataset.
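The frame-sequence and LSTM steps described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the framework's actual implementation: `sliding_clips`, `placeholder_cnn_features`, and `TinyLSTM` are hypothetical names, the clip length and stride are arbitrary, the per-frame features are a stand-in for the output of a pre-trained CNN, and the LSTM weights are random and untrained.

```python
import math
import random


def sliding_clips(num_frames, clip_len, stride):
    """Return one list of frame indices per fixed-length clip."""
    return [list(range(start, start + clip_len))
            for start in range(0, num_frames - clip_len + 1, stride)]


def placeholder_cnn_features(frame_index, dim):
    """Hypothetical stand-in for a pre-trained CNN: one feature vector per frame."""
    return [math.sin(frame_index + k) for k in range(dim)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Forward pass of a single-layer LSTM with random, untrained weights."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        n = input_dim + hidden_dim
        # One weight matrix and bias per gate: input, forget, candidate, output.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_dim)] for g in "ifgo"}
        self.b = {g: [0.0] * hidden_dim for g in "ifgo"}
        self.hidden_dim = hidden_dim

    def _gate(self, name, xh, activation):
        return [activation(sum(w * v for w, v in zip(row, xh)) + bias)
                for row, bias in zip(self.W[name], self.b[name])]

    def forward(self, sequence):
        """Consume a sequence of feature vectors and return the last hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in sequence:
            xh = list(x) + h
            i = self._gate("i", xh, sigmoid)
            f = self._gate("f", xh, sigmoid)
            g = self._gate("g", xh, math.tanh)
            o = self._gate("o", xh, sigmoid)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


# Group 30 frames into overlapping 16-frame clips, then summarise each clip.
clips = sliding_clips(num_frames=30, clip_len=16, stride=8)
lstm = TinyLSTM(input_dim=4, hidden_dim=3)
clip_states = [lstm.forward([placeholder_cnn_features(i, 4) for i in clip])
               for clip in clips]
```

In a real system the final hidden state of each clip would feed a trained classification layer; here it only shows how per-frame features flow through the recurrent step.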

Problem Statement

Enhancing crime detection and identification for monitoring real-time surveillance videos using spatial features from transfer learning based on two-stream networks.

Background

Motion is the state of a body that continuously changes its position with reference to some object. A gesture is a collection of such movements, for example ‘moving the head’, ‘stretching an arm’, ‘pulling a leg’, and ‘raising a hand’. An action is a collection of gestures performed at the same time by the same person for some definable purpose; ‘walking’, ‘waving’, ‘running’, ‘jogging’, and ‘punching’ are examples of human action categories.

Interactions: An interaction is a collection of human actions involving at most two actors. One actor must be a human being; the other may be a person or an object. Interactions are accordingly classified as human–human or human–object. In human–human interactions, both actors are human beings. In human–object interactions, one actor is a human being and the other is an object. ‘Two persons talking’, ‘two persons fighting’, ‘shaking hands’, and ‘welcoming each other’ are examples of human–human interaction, while ‘ATM theft’ and ‘working in front of a computer’ are examples of human–object interaction.

Group activities: A group activity is a combination of gestures, actions, or interactions in which there are more than two actors and possibly one or more interactive objects. ‘Two groups playing a game or engaged in some activity’, ‘a group of people marching’, ‘a group meeting’, and ‘fighting between two groups’ are examples of group activities.

Application

Office Monitoring

Public Safety & Security

Intelligence Departments 

Detection of criminal activities from surveillance videos 

Crime detection: detecting whether a criminal activity occurred in a certain record, video or a frame.

Crime detection and classification: extending crime detection to also classify the type of crime, e.g. burglary, shooting, or fighting.
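The difference between the two tasks can be illustrated with a small sketch. It assumes a hypothetical classifier that outputs one probability per class, with index 0 reserved for normal footage; the class names are illustrative labels based on the crime types mentioned on this page, and the 0.5 threshold is an arbitrary choice.

```python
# Illustrative label set; index 0 is reserved for normal footage (assumption).
CLASSES = ["Normal", "Abuse", "Fighting", "RoadAccident",
           "Shooting", "Stealing", "Vandalism", "Robbery"]


def detect(probs, threshold=0.5):
    """Crime detection: is the total probability of any crime above the threshold?"""
    return 1.0 - probs[0] > threshold


def detect_and_classify(probs, threshold=0.5):
    """Crime detection and classification: also name the most likely crime type."""
    if not detect(probs, threshold):
        return "Normal"
    best = max(range(1, len(probs)), key=lambda k: probs[k])
    return CLASSES[best]


# Hypothetical classifier outputs for two clips.
suspicious = [0.30, 0.05, 0.40, 0.05, 0.05, 0.05, 0.05, 0.05]
calm = [0.90, 0.02, 0.02, 0.02, 0.01, 0.01, 0.01, 0.01]
print(detect(suspicious), detect_and_classify(suspicious))
print(detect(calm), detect_and_classify(calm))
```

Detection alone answers a yes/no question, whereas the combined task needs a model whose outputs distinguish between crime types.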

Motivation

As we move towards a digital world, the concept of smart cities has pushed us to use technology to decrease human intervention and increase automation wherever possible. One of the technical advancements for smart cities is the extensive use of surveillance cameras, also known as CCTV cameras, which let users remotely monitor their properties and even workplaces. They also let security agencies monitor general areas, crime hot spots, and crowded and sensitive areas, allowing them to be proactive and vigilant about suspicious and criminal activities. Anomaly detection in these surveillance videos is very complicated: anomalies are short, sudden events, whereas normal events make up over 99% of the videos, and detection has to be done in real time for quick deterrence. A lot of work in this domain has enabled automated detection of such events to help law enforcement agencies, but these methods are still far from perfect. Our proposed work will aid stakeholders by decreasing human intervention in the detection of such events through real-time computer systems and algorithms.

We drew our inspiration from the classification of these anomalies, which remains an open question for the scientific community. The most recent work (Ma & Zhang, 2022; Montenegro & Chung, 2022) also focuses on detection only, so there is considerable room to improve on previous work in terms of classification accuracy and real-time detection. In this study, we propose a two-stream multistage architecture that identifies temporal patterns using an attention mechanism in the classification stages and can be trained end to end with a smaller number of parameters.
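As a rough illustration of how an attention mechanism can pick out temporal patterns, the sketch below pools per-frame features with softmax weights and then joins two streams. The dot-product scoring, the `query` vector, and fusion by concatenation are all assumptions made for this example; the proposed architecture's actual attention and fusion design is not specified here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Weight each time step by softmax(feature . query) and sum over time."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, features))
              for d in range(dim)]
    return pooled, weights


def fuse_two_streams(spatial_vec, temporal_vec):
    """Late fusion by concatenation (one simple option among several)."""
    return list(spatial_vec) + list(temporal_vec)


# Three time steps of 2-D features for each hypothetical stream.
spatial = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
temporal = [[0.5, 0.5], [0.1, 0.9], [0.2, 0.2]]
query = [1.0, 0.0]  # would be a learned parameter in a trained model

spatial_pooled, spatial_weights = attention_pool(spatial, query)
temporal_pooled, _ = attention_pool(temporal, query)
fused = fuse_two_streams(spatial_pooled, temporal_pooled)
```

The attention weights sum to one, so time steps that match the query dominate the pooled vector while the others are down-weighted rather than discarded.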

Data set

Although several benchmark data sets are available for testing an anomaly detector, the choice should depend on how appropriate the data is and whether it is recent enough to reflect the characteristics of today's network traffic. Useful data sets for activity recognition are:

A trimmed, clipped, and segmented data set, spatially labeled at the frame level, with extra layers of augmentation applied to better expose the patterns inside the dataset.
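A minimal sketch of two such preparation steps follows, assuming that frame-level labels are stored as one 0/1 anomaly flag per frame and that a frame is a list of pixel rows. Both function names are hypothetical, and the horizontal flip is only one example of an augmentation.

```python
def labels_to_segments(frame_labels):
    """Collapse per-frame 0/1 labels into (start, end) index pairs, end inclusive."""
    segments = []
    start = None
    for idx, label in enumerate(frame_labels):
        if label and start is None:
            start = idx                      # an anomalous run begins
        elif not label and start is not None:
            segments.append((start, idx - 1))  # the run ended on the previous frame
            start = None
    if start is not None:                    # run continues to the last frame
        segments.append((start, len(frame_labels) - 1))
    return segments


def flip_clip_horizontally(clip):
    """Mirror every frame of a clip left to right; frame labels stay unchanged."""
    return [[row[::-1] for row in frame] for frame in clip]


labels = [0, 0, 1, 1, 1, 0, 1, 1]
segments = labels_to_segments(labels)

# One tiny 2x3 "frame" used only to show the flip.
clip = [[[1, 2, 3], [4, 5, 6]]]
flipped = flip_clip_horizontally(clip)
```

Applying the same flip to every frame of a clip keeps the motion consistent across the sequence, which matters when the clip is later analyzed as a whole.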

Workflow

Future Work

Related Projects

Surveillance cameras have become widespread. Many videos are recorded for watch-over and monitoring purposes, increasing security.

The increase in the number of videos and their lengths has a troublesome aspect. The amount of information contained in videos has rapidly grown, making it increasingly difficult for people to find what they should pay attention to. Ricoh has developed a technology to extract unusual things and behaviors from videos. 

Source 

Here, weakly labeled anomaly videos are used for training.

Source 
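One well-known way to train on weakly labeled videos, where only a video-level label is available, is the multiple-instance ranking loss of Sultani, Chen, & Shah (2018), listed under Useful Reads. The sketch below follows that formulation; whether the project linked above uses exactly this objective is not stated on this page, and the function name and default weights are illustrative.

```python
def mil_ranking_loss(anomalous_scores, normal_scores,
                     lambda_smooth=8e-5, lambda_sparse=8e-5):
    """Ranking loss between one anomalous video and one normal video.

    Each argument is a list of per-segment anomaly scores in [0, 1].
    Only the video-level label is known, so the highest-scoring segment
    of each video stands in for the whole video.
    """
    # Hinge term: the top anomalous segment should outscore the top normal one.
    hinge = max(0.0, 1.0 - max(anomalous_scores) + max(normal_scores))
    # Smoothness term: neighbouring segments should have similar scores.
    smooth = sum((a - b) ** 2
                 for a, b in zip(anomalous_scores, anomalous_scores[1:]))
    # Sparsity term: anomalies should occupy only a few segments.
    sparse = sum(anomalous_scores)
    return hinge + lambda_smooth * smooth + lambda_sparse * sparse


# Hypothetical per-segment scores for one anomalous and one normal video.
anomalous = [0.1, 0.9, 0.2]
normal = [0.2, 0.3, 0.1]
loss = mil_ranking_loss(anomalous, normal)
```

The loss shrinks as the most suspicious segment of the anomalous video is scored higher than every segment of the normal video, without ever needing to know which segment actually contains the anomaly.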

Useful Reads:

Cherian, A. K., & Poovammal, E. (2021). Anomaly Detection in Real-Time Surveillance Videos Using Deep Learning. In Computational Vision and Bio-Inspired Computing (pp. 223–230). Springer.

CRCV | Center for Research in Computer Vision at the University of Central Florida. (n.d.). https://www.crcv.ucf.edu/projects/Abnormal_Crowd/

Cui, X., Goel, V., & Kingsbury, B. (2015). Data augmentation for deep neural network acoustic modeling. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(9), 1469–1477.

Farooq, M., Khan, N., & Ali, M. (2017). Unsupervised video surveillance for anomaly detection of street traffic. International Journal of Advanced Computer Science and Applications (IJACSA), 12(8), 270–275.

Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 580–587.

Joshi, A., & Namboodiri, V. P. (2019). Unsupervised synthesis of anomalies in videos: transforming the normal. 2019 International Joint Conference on Neural Networks (IJCNN), 1–8.

Khan, S. U., Haq, I. U., Rho, S., Baik, S. W., & Lee, M. Y. (2019). Cover the Violence: A Novel Deep-Learning-Based Approach Towards Violence-Detection in Movies. In Applied Sciences (Vol. 9, Issue 22). https://doi.org/10.3390/app9224963

Li, T., Chen, X., Zhu, F., Zhang, Z., & Yan, H. (2021). Two-stream deep spatial-temporal auto-encoder for surveillance video abnormal event detection. Neurocomputing, 439, 256–270.

Luo, W., Liu, W., & Gao, S. (2017). Remembering history with convolutional LSTM for anomaly detection. 2017 IEEE International Conference on Multimedia and Expo (ICME), 439–444.

Luo, W., Liu, W., Lian, D., & Gao, S. (2021). Future Frame Prediction Network for Video Anomaly Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Lv, H., Zhou, C., Cui, Z., Xu, C., Li, Y., & Yang, J. (2021). Localizing anomalies from weakly-labeled videos. IEEE Transactions on Image Processing, 30, 4505–4515.

Ma, H., & Zhang, L. (2022). Attention-based framework for weakly supervised video anomaly detection. The Journal of Supercomputing.

Majhi, S., Das, S., Bremond, F., Dash, R., & Sa, P. K. (2021). Weakly-supervised Joint Anomaly Detection and Classification. http://arxiv.org/abs/2108.08996

Maqsood, R., Bajwa, U. I., Saleem, G., Raza, R. H., & Anwar, M. W. (2021). Anomaly recognition from surveillance videos using 3D convolution neural network. Multimedia Tools and Applications, 80(12), 18693–18716.

Mehran, R., Oyama, A., & Shah, M. (2009). Abnormal crowd behavior detection using social force model. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 935–942.

Mehta, P., Kumar, A., & Bhattacharjee, S. (2020). Fire and gun violence based anomaly detection system using deep neural networks. 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), 199–204.

Montenegro, J., & Chung, Y. (2022). Semi-supervised generative adversarial networks for anomaly detection. Innovative Economic Symposium 2021 – New Trends in Business and Corporate Finance in COVID-19 Era, 132.

Parab, A., Nikam, A., Mogaveera, P., & Save, A. M. (2020). A New Approach to Detect Anomalous Behaviour in ATMs. 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), 774–777.

Piza, E. L., Welsh, B. C., Farrington, D. P., & Thomas, A. L. (2019). CCTV surveillance for crime prevention: A 40-year systematic review with meta-analysis. Criminology and Public Policy, 18(1), 135–159. https://doi.org/10.1111/1745-9133.12419

Rezaee, K., Rezakhani, S. M., Khosravi, M. R., & Moghimi, M. K. (2021). A survey on deep learning-based real-time crowd anomaly detection for secure distributed video surveillance. Personal and Ubiquitous Computing, 1–17.

Sabokrou, M., Fayyaz, M., Fathy, M., Moayed, Z., & Klette, R. (2018). Deep-anomaly: Fully convolutional neural network for fast anomaly detection in crowded scenes. Computer Vision and Image Understanding, 172, 88–97.

Sultani, W., Chen, C., & Shah, M. (2018). Real-world Anomaly Detection in Surveillance Videos. http://arxiv.org/abs/1801.04264

Sun, J., Wang, X., Xiong, N., & Shao, J. (2018). Learning Sparse Representation With Variational Auto-Encoder for Anomaly Detection. IEEE Access, 6, 33353–33361. https://doi.org/10.1109/ACCESS.2018.2848210

Tiwari, A., Chaudhury, S., Singh, S., & Saurav, S. (2021). Video Classification using SlowFast Network via Fuzzy rule. 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 1–6.

Ullah, F. U., Ullah, A., Muhammad, K., Haq, I. U., & Baik, S. W. (2019). Violence Detection Using Spatiotemporal Features with 3D Convolutional Neural Network. In Sensors (Vol. 19, Issue 11). https://doi.org/10.3390/s19112472

Ullah, W., Ullah, A., Haq, I. U., Muhammad, K., Sajjad, M., & Baik, S. W. (2020). CNN features with bi-directional LSTM for real-time anomaly detection in surveillance networks. Multimedia Tools and Applications, 1–17.

Ullah, W., Ullah, A., Haq, I. U., Muhammad, K., Sajjad, M., & Baik, S. W. (2021). CNN features with bi-directional LSTM for real-time anomaly detection in surveillance networks. Multimedia Tools and Applications, 80(11), 16979–16995.

Um, T. T., Pfister, F. M. J., Pichler, D., Endo, S., Lang, M., Hirche, S., Fietzek, U., & Kulić, D. (2017). Data augmentation of wearable sensor data for parkinson’s disease monitoring using convolutional neural networks. Proceedings of the 19th ACM International Conference on Multimodal Interaction, 216–220.

Vilamala, M. R., Hiley, L., Hicks, Y., Preece, A., & Cerutti, F. (2019). A pilot study on detecting violence in videos fusing proxy models. 2019 22th International Conference on Information Fusion (FUSION), 1–8.

Wu, S., Wong, H.-S., & Yu, Z. (2013). A Bayesian model for crowd escape behavior detection. IEEE Transactions on Circuits and Systems for Video Technology, 24(1), 85–98.

Zhu, Y., & Newsam, S. (2019). Motion-aware feature for improved video anomaly detection. ArXiv Preprint ArXiv:1907.10211.

Project Supervisor:

Dr. Usama Ijaz Bajwa

Co-PI, Video Analytics lab, National Centre in Big Data and Cloud Computing
Program Chair (FIT 2019)
HEC Approved PhD Supervisor
Assistant Professor & Associate Head of Department
Department of Computer Science, COMSATS University Islamabad, Lahore Campus, Pakistan
www.usamaijaz.com
www.fit.edu.pk
Job Profile
Google Scholar Profile

M Salman Ghauri
MSc Student (Computer Science, COMSATS Lahore)
Data Engineer, TrueData
fuqrayy@gmail.com
ms_ghauri@ymailc.om
Github
LinkedIn
@ms_ghauri