SPECTRE
2022-2023
Surveillance is an important application of computer science. CCTV systems are widely used for security, but because they are not intelligent they cannot detect or identify suspicious activity on their own; the Puyallup South Hill Mall burglary [1] and the Westfield Century City Mall robbery [2] are two of many examples. The need of the hour is to upgrade our security systems and make them intelligent. A system becomes intelligent by analyzing images with computer vision techniques [3]. Computer vision spares users the burden of watching 24/7 video recordings by notifying them when suspicious activity is identified, and it has already been applied successfully in many different fields. Our proposed project upgrades existing security systems by making them intelligent with computer vision: the system identifies suspicious activities using deep learning algorithms and recognizes faces against an attached database. Initially, we train and test our system on datasets such as UCF [4], Fire [5], and HMDB51 [6]. Many research papers have been published on human activity recognition and surveillance systems, and many solutions have been proposed for detecting suspicious human activity in videos. However, previous solutions require users to change their system architecture because they are computationally expensive. The issue we address is user comfort: our solution should take little effort to integrate. We therefore provide a desktop application with low computational cost that does not require users to upgrade their systems or buy new, expensive hardware.
We collected our dataset from various sources to train our deep learning object detection model. We used UCF [4], Fire [5], and HMDB51 [6]; for the cell phone category, we collected videos from YouTube and other internet sources. In total, 40 hours of video were converted into 377,000 frames. After trimming the frames that were not useful, 77,000 frames remained, and we manually labelled each of them in the bounding-box format that YOLO accepts. Our dataset contains 7 categories: Fire, Smoking, Abuse, Assault, Intrusion, Fighting, and Cell Phone usage.
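For illustration, YOLO's label format stores one object per line: a class index followed by the box centre and size, all normalized to the image dimensions. The class indices and values below are hypothetical examples, not entries from our actual label files:

    # classes (hypothetical index order):
    # 0 Fire, 1 Smoking, 2 Abuse, 3 Assault, 4 Intrusion, 5 Fighting, 6 Cell Phone
    #
    # frame_000123.txt -- one line per bounding box:
    # <class> <x_center> <y_center> <width> <height>, each normalized to [0, 1]
    1 0.512 0.430 0.180 0.260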
We performed data augmentation on the dataset, including rotation, flipping, and shearing. Every frame was augmented three times, and its bounding-box coordinates were transformed accordingly. Finally, we split the dataset into an 80:10:10 ratio for training, testing, and validation.
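The sketch below shows one common way to perform this kind of box-aware augmentation using the albumentations library; it illustrates the technique rather than reproducing our exact scripts, and the file name and class index are hypothetical.

    import albumentations as A
    import cv2

    # Rotation, flipping, and shearing, with bounding boxes kept in YOLO's
    # normalized (x_center, y_center, width, height) format so the transformed
    # coordinates remain valid label-file entries.
    transform = A.Compose(
        [
            A.Rotate(limit=15, p=0.5),
            A.HorizontalFlip(p=0.5),
            A.Affine(shear=(-10, 10), p=0.5),
        ],
        bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
    )

    image = cv2.imread("frame_000123.jpg")   # hypothetical frame
    boxes = [[0.512, 0.430, 0.180, 0.260]]   # one YOLO-format box
    labels = [1]                             # 1 = Smoking (hypothetical mapping)

    augmented = transform(image=image, bboxes=boxes, class_labels=labels)
    aug_image, aug_boxes = augmented["image"], augmented["bboxes"]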
The status of dataset frames can be viewed in the table below.
The datasets can be downloaded from these links:
1: UCF-Crime Dataset for burglary, fighting, assault, and abuse
Link: www.crcv.ucf.edu/data1/chenchen/UCF_Crimes.zip
2: HMDB51 Dataset for smoking
Link: http://serre-lab.clps.brown.edu/wp-content/uploads/2013/10/hmdb51_org.rar
3: We made our own Mobile Dataset by downloading videos from Pexels.com
Link: https://www.pexels.com/video/businessman-talking-on-the-phone-8052438/
4: KMU Fire and Smoke Database
Link: https://cvpr.kmu.ac.kr/Dataset/Dataset.htm
Nowadays our world faces many challenges, and security is a dominant one. Since security is a primary right and a priority for everyone, the need of the hour is to minimize the threat of insecurity. This calls for a system that can secure our institutions and our public and personal places, one that not only provides security but is also easy to operate and has a low computational cost. The proposed idea is a desktop application that integrates deep learning activity recognition algorithms with analysis of video that is either captured continuously by CCTV or selected from a directory. The system detects multiple activities at the same time, such as smoking, fighting, fire, phone usage, and other suspicious activities including theft or intrusion. After detection, face recognition is performed against a database, and a notification is generated with complete details of the detected person and activity. Finally, a clip containing the activity and the recognized faces is saved to memory automatically, so users can view previous records at any time. The goal is a low-computational-cost system that communicates in real time, so that it can be integrated on low-end devices as well. Computer vision and deep learning are used for model training, for detecting anomalies, and for notifying the user about suspicious activities.
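To make the workflow concrete, the sketch below outlines the main loop implied by this design; the function names are illustrative placeholders, not the project's actual code.

    import cv2

    def run_pipeline(source, detect_anomalies, recognize_faces, notify):
        # 'source' is a CCTV stream URL or a video file chosen from a directory.
        cap = cv2.VideoCapture(source)
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            detections = detect_anomalies(frame)   # YOLO-based activity detection
            if detections:
                faces = recognize_faces(frame)     # database-backed recognition
                notify(detections, faces)          # alert with person + activity
                # A clip around the detection would be buffered and saved here.
        cap.release()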
We make some assumptions for our project: it should be user friendly and computationally inexpensive, so that it can run on machines with lower specifications; and it should detect anomalies automatically in real time, so that security concerns are handled efficiently.
This product will benefit organizations/companies that want to keep a thorough check on the activities of their employees, and schools/colleges that want to keep a strict eye on students' activities. It does so:
By fetching details of suspects and real victims through face recognition
By reducing the human force needed for 24/7 monitoring
By detecting real-time anomalies and other inappropriate events
Despite its many pros, the proposed project has some cons. The accuracy of anomaly detection is not up to the mark because labelled datasets are scarce, so the model could not learn as much as we would like. Because detection runs on a live feed, there is some delay in pre-processing the feed and passing it to the model for inference, and a very small further delay between detecting an anomaly and notifying an authorized person. Although the system is designed for low-end computers, it still needs mediocre hardware because inference requires some resources. Face detection and recognition may fail entirely, or produce wrong matches, when the live feed has very low resolution. Privacy is also a big concern, as employees or students may become uncomfortable when monitored thoroughly. In addition, we do not offer a per-camera option to select which type of anomaly to detect, and live detection is less reliable when objects overlap or appear in a crowd.
We first trained our model on a single category, Smoking, on Google Colab for both YOLO version 7 and version 5, but we obtained good frames per second only with version 5, which is older and more stable. We did this to check whether this deep learning object detection model was reliable to use. After confirming that it was, we trained the deep neural network on our whole dataset on an NVIDIA RTX 2080 Ti. According to our research, YOLO beats every other object detection model in computational cost and detection speed, although some accuracy is compromised.
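As an illustration, training YOLOv5 from the ultralytics/yolov5 repository typically looks like the sketch below; the YAML contents and file paths are hypothetical, not our exact configuration.

    # spectre.yaml -- hypothetical dataset description for YOLOv5
    # train: dataset/images/train
    # val: dataset/images/val
    # nc: 7
    # names: ['Fire', 'Smoking', 'Abuse', 'Assault', 'Intrusion', 'Fighting', 'Cell Phone']

    # Fine-tune the COCO-pretrained small checkpoint on GPU 0:
    python train.py --img 640 --batch 16 --epochs 100 --data spectre.yaml --weights yolov5s.pt --device 0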
This functionality is on demand: the user chooses whether or not to recognize faces in the clip of the detected anomaly.
We tested two techniques for detecting faces in the anomaly clip that is generated when an anomaly is detected in the live CCTV feed.
As we can see, the ResNet Caffe model is more accurate, so we used it in our project. Since face recognition runs on demand, its slower detection is not a problem for the user's purposes; what matters is the accuracy of the detected faces.
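A minimal sketch of running this detector through OpenCV's DNN module is shown below; the model file paths are placeholders for the publicly available ResNet-10 SSD Caffe weights.

    import cv2
    import numpy as np

    # ResNet-10 SSD face detector shipped as a Caffe model (paths are placeholders).
    net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                                   "res10_300x300_ssd_iter_140000.caffemodel")

    def detect_faces(frame, conf_threshold=0.5):
        h, w = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0,
                                     (300, 300), (104.0, 177.0, 123.0))
        net.setInput(blob)
        detections = net.forward()
        boxes = []
        for i in range(detections.shape[2]):
            confidence = detections[0, 0, i, 2]
            if confidence >= conf_threshold:
                # Coordinates are returned normalized; scale back to pixels.
                box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
                boxes.append(box.astype(int))
        return boxes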
The detected faces are of very low quality and cannot be used directly for face recognition. We therefore used a transformer-based model to restore the faces to the point where they are recognizable.
We evaluated three models to finalize our face restoration module: one transformer-based model, CodeFormer (Robust Blind Face Restoration with Codebook Lookup Transformer) [7], and two GAN-based models, GFP-GAN (Real-World Blind Face Restoration with Generative Facial Prior) [8] and VQFR (Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder) [9]. The transformer-based model outperforms the other two, which are more computationally expensive and take longer to restore a face. We tested three low-resolution image samples on Google Colab with a CPU-only runtime and calculated three image quality metrics: the BRISQUE score, the entropy of the image, and the Tenengrad of the image.
The table below compares our transformer-based restoration with the GAN-based restorations.
A good image should have a low BRISQUE value, a low entropy value, and a high Tenengrad value. We can see that entropy and Tenengrad do not help us select the best restoration model, whereas the BRISQUE value clearly reflects image quality for every model.
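Entropy and Tenengrad are straightforward to compute directly, as in the sketch below; BRISQUE needs a trained model (available, for example, through opencv-contrib's quality module), so it is omitted here, and the file name is illustrative.

    import cv2
    import numpy as np

    def entropy(gray):
        # Shannon entropy of the grayscale histogram (lower is better here).
        hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
        p = hist / hist.sum()
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    def tenengrad(gray):
        # Mean squared Sobel gradient magnitude (higher means sharper).
        gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
        gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
        return float(np.mean(gx ** 2 + gy ** 2))

    gray = cv2.imread("restored_face.png", cv2.IMREAD_GRAYSCALE)
    print("entropy:", entropy(gray), "tenengrad:", tenengrad(gray))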
Similar to VQFR, CodeFormer is designed to restore degraded facial images to their original quality by enhancing the features and details of the input image. During the restoration process, CodeFormer does not generate new faces that do not exist in reality; instead, it restores the original features and details of the face that may have been lost or degraded due to factors such as low resolution, compression artifacts, or noise.
To conclude, the transformer-based model is better than the others: it has a lower restoration time on the sample images, and its BRISQUE value is much lower than that of the other two models, so we used it in our project.
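For reference, restoring the cropped faces with the CodeFormer repository [7] is a single script invocation, sketched below; the flag names follow the repository's README at the time of writing and may differ across versions, and the input path is a placeholder.

    # -w trades quality against fidelity (smaller = higher quality,
    # larger = closer to the degraded input).
    python inference_codeformer.py -w 0.7 --input_path cropped_faces/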
Finally, after face restoration, the image is compared with the images in the database of employees/students, which the user of the software populates whenever a new employee/student is registered. We used the Python module DeepFace [16] for this purpose because of its simplicity and fast face comparison. It is lightweight, computationally inexpensive, and supports several state-of-the-art face recognition models, e.g., VGG-Face, FaceNet512, and Dlib. We chose FaceNet512 as the recognition model for its higher accuracy, with Euclidean distance as the metric for finding the most similar faces. Note that this library does not match faces exactly; it returns near-similar faces according to the threshold we set.
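A minimal sketch of this lookup with DeepFace is shown below; the image and database paths are placeholders.

    from deepface import DeepFace

    # Search the registered employee/student image folder for the restored face,
    # using FaceNet512 embeddings and Euclidean distance as described above.
    results = DeepFace.find(
        img_path="restored_face.png",
        db_path="employee_db/",
        model_name="Facenet512",
        distance_metric="euclidean",
        enforce_detection=False,  # the input is already a cropped face
    )
    print(results)  # nearest matches under the library's distance threshold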
Final Workflow
Tools and Technologies
Google Colab
PyQt5
PyTorch
Firebase
DeepFace
MongoDB
OpenCV
VSCode
Future Work
We will:
Develop Spectre as a mobile application
Add more categories for anomaly detection such as Vandalism
Find the location of the incident within an organization or institute
Train YOLO-NAS or other newer models on our dataset for better results
Project Resources
The Team
Project Supervisor
Dr. Usama Ijaz Bajwa
Co-PI, Video Analytics Lab, National Centre in Big Data and Cloud Computing
HEC Approved PhD Supervisor
Associate Professor & Associate Head of Department, Department of Computer Science, COMSATS University Islamabad, Lahore Campus, Pakistan
www.usamaijaz.com
www.fit.edu.pk
Job Profile
Google Scholar Profile
LinkedIn Profile
FYP Competition
We are thrilled to announce that the Spectre Project has emerged as the undisputed champion at the prestigious FYP (Final Year Project) Competition, outshining over 130 competing projects. The event, which showcased the ingenuity and hard work of young minds, witnessed the Spectre Project being hailed as a beacon of innovation and excellence in its category. Not only did we secure the coveted first position in the project exhibition, but our team also bagged a remarkable second position in the fiercely contested poster competition. This double victory stands as a testament to the tireless efforts and dedication put forth by our team members.
The FYP Competition served as the ultimate platform for our team to showcase the fruits of their labor. Amidst intense competition, the Spectre Project's stellar presentation and live demonstration left an indelible impression on the judges and attendees alike. Our commitment to excellence, attention to detail, and passion for our work shone through, securing us the well-deserved first position.
But that's not all - the success of the Spectre Project extended beyond the project exhibition. The captivating poster we meticulously designed to represent our project caught the eye of the jury, earning us a remarkable second position in the poster competition. The poster not only encapsulated the essence of our project but also conveyed our dedication and creativity.
None of this would have been possible without the unwavering support of our supervisor Dr. Usama Ijaz Bajwa, who guided us at every step and provided invaluable insights. We extend our heartfelt gratitude to him for believing in our vision and encouraging us to push our boundaries.
A few glimpses of the Certificate Distribution Ceremony can be seen below.
[1] "KOMO News," [Online]. Available: https://komonews.com/news/local/gallery/security-video-shows-violent-south-hill-mall-burglary. [Accessed September 2022].
[2] "Fox LA," [Online]. Available: https://www.foxla.com/news/century-city-mall-robbery-14-suspects-wanted-for-stealing-purses-from-nordstrom. [Accessed September 2022].
[3] "SAS Insights: Computer Vision," [Online]. Available: https://www.sas.com/en_us/insights/analytics/computer-vision.html. [Accessed September 2022].
[4] "UCF-Crime Dataset," [Online]. Available: http://www.crcv.ucf.edu/data1/chenchen/UCF_Crimes.zip. [Accessed September 2022].
[5] "KMU CVPR Lab Fire and Smoke Database," [Online]. Available: https://cvpr.kmu.ac.kr/Dataset/Dataset.htm. [Accessed September 2022].
[6] "HMDB51 Dataset, Serre Lab," [Online]. Available: http://serre-lab.clps.brown.edu/wp-content/uploads/2013/10/hmdb51_org.rar. [Accessed September 2022].
[7] "CodeFormer," GitHub. [Online]. Available: https://github.com/sczhou/CodeFormer.
[8] "GFP-GAN," GitHub. [Online]. Available: https://github.com/TencentARC/GFPGAN.
[9] "VQFR," GitHub. [Online]. Available: https://github.com/TencentARC/VQFR.