RAVEN

WHAT IS IT?

An application that detects, tracks, and re-identifies people in video streams.

INTRODUCTION

Tracking itself is not a new idea, but tracking through video streams (Visual Tracking) is still a relatively young field in which a lot of research remains to be done. We use this emerging technique in the form of Multiple Object Tracking (MOT). MOT is the process of locating multiple objects over a sequence of frames (a video). The MOT problem can be viewed as a data association problem, where the goal is to associate detections across the frames of a video sequence. Two types of datasets are available for training: video streams and sequences of frames. If the data is a video stream, we first convert the stream into frames and then pass these frames to the model.
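To make the data association view concrete, here is a minimal sketch that matches bounding boxes between two consecutive frames by greedy IoU (intersection over union) matching. This is only an illustration, not the tracker used in RAVEN; the function names, the (x1, y1, x2, y2) box format, and the 0.3 threshold are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, threshold=0.3):
    """Greedily match previous-frame tracks to current-frame detections.

    tracks: dict mapping track ID -> box from the previous frame.
    detections: list of boxes found in the current frame.
    Returns a dict mapping detection index -> track ID. Detections that
    match no existing track receive a fresh ID.
    """
    # Score every (track, detection) pair and visit the best overlaps first.
    pairs = sorted(
        ((iou(box, det), tid, i)
         for tid, box in tracks.items()
         for i, det in enumerate(detections)),
        reverse=True,
    )
    assigned, used_tracks = {}, set()
    for score, tid, i in pairs:
        if score < threshold:
            break  # every remaining pair overlaps even less
        if tid in used_tracks or i in assigned:
            continue  # this track or detection is already matched
        assigned[i] = tid
        used_tracks.add(tid)
    # Unmatched detections start new tracks (hypothetical ID scheme).
    next_id = max(tracks, default=0) + 1
    for i in range(len(detections)):
        if i not in assigned:
            assigned[i] = next_id
            next_id += 1
    return assigned
```

Calling associate with the tracks from frame t and the detections from frame t+1 carries each ID forward from one frame to the next. Production trackers usually add motion models and appearance features on top of this basic idea.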

The problem we focus on is person re-identification. Re-identification is generally a difficult task because of low image resolution, illumination changes, occlusions, complex camera environments, background clutter, and similar factors. It is widely used to retrieve a specific object from the footage of multiple cameras. Re-identification is the process of associating images of the same object captured by different cameras on different occasions. When a person is first detected by a camera, that person is labelled and the relevant information is stored in the database. The next time a person appears in the frame, the system checks the database to see whether that person has appeared before. If so, the system assigns the same label that was assigned earlier and tracks the person's path. If the person is appearing for the first time, the system assigns a new ID and label and stores the person's information in the database. This cycle of labelling and tracking continues throughout the re-identification phase.
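The labelling cycle described above can be sketched as a lookup against a gallery of stored feature vectors. This is a simplified illustration rather than the project's actual code: the Gallery class, the cosine similarity measure, and the 0.8 threshold are assumptions, and an in-memory dictionary stands in for the database. In practice the feature vectors would come from the re-identification model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class Gallery:
    """Hypothetical in-memory store of known people (stand-in for the database)."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed minimum similarity for a match
        self.features = {}          # person ID -> stored feature vector
        self.next_id = 1

    def identify(self, feature):
        """Return (person_id, is_new) for a detected person's feature vector."""
        best_id, best_score = None, self.threshold
        for pid, stored in self.features.items():
            score = cosine_similarity(feature, stored)
            if score >= best_score:
                best_id, best_score = pid, score
        if best_id is not None:
            return best_id, False   # seen before: reuse the earlier label
        new_id = self.next_id       # first appearance: assign a new ID
        self.features[new_id] = list(feature)
        self.next_id += 1
        return new_id, True
```

For example, after a first call to identify with the vector [1.0, 0.0, 0.0] creates ID 1, a later call with the similar vector [0.9, 0.1, 0.0] returns the same ID, while the dissimilar vector [0.0, 1.0, 0.0] receives a new one.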

PROJECT SCOPE

We propose a person re-identification system that relieves security personnel of the tiring routine of sitting in front of surveillance cameras 24/7. If they fail to notice a person involved in wrongdoing, whether through negligence or for any other reason, our system helps them track that person and shows his or her previous appearances in the area.

Our main focus is a lightweight application that can easily be deployed on low-end devices. Access to high-end GPUs is expensive, and the end user is unlikely to spend extra money on hardware. We use re-identification rather than other techniques because it is simple and inexpensive: it works with standard cameras. A lightweight application keeps the computational cost low and avoids the additional expense of dedicated tracking devices.

OBJECTIVES

The objectives for our proposed project will be as follows:

To build a system that can re-identify and track multiple people in a crowded place as they move across the fields of view of the different cameras covered by our system.
To record the path a person has taken within a particular time frame, so that it can be retrieved later to check his or her previous appearances on site, including the time at which he or she passed through each camera view.
To build a system with a low computational cost, so that it can easily run on low-end devices as well.
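The second objective, recording where and when each person was seen, could be sketched with SQLite, which the project lists among its tools. The table layout and function names below are hypothetical and chosen only for illustration; the actual schema used by RAVEN may differ.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) a database with a hypothetical appearances table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS appearances ("
        " person_id INTEGER NOT NULL,"
        " camera TEXT NOT NULL,"
        " seen_at TEXT NOT NULL)"  # ISO 8601 timestamps sort chronologically
    )
    return conn


def record_appearance(conn, person_id, camera, seen_at):
    """Store one sighting of a person by a camera at a given time."""
    conn.execute(
        "INSERT INTO appearances (person_id, camera, seen_at) VALUES (?, ?, ?)",
        (person_id, camera, seen_at),
    )
    conn.commit()


def appearances_of(conn, person_id):
    """Return all (camera, seen_at) sightings of a person, oldest first."""
    rows = conn.execute(
        "SELECT camera, seen_at FROM appearances"
        " WHERE person_id = ? ORDER BY seen_at",
        (person_id,),
    )
    return rows.fetchall()
```

Querying appearances_of for a given person ID then returns the cameras that person passed and the corresponding times, in chronological order.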

HOW DOES THE APPLICATION WORK?

The application receives live streams from the CCTV cameras and passes them through the trained model. The model detects and tracks each person and then passes the resulting tracking video to the re-ID model. The re-ID model performs re-identification and displays the stream with the re-identified people labelled.

INTEGRATION OF MODELS

YOLO V3/V4

We use YOLO V3/V4 for detection and tracking. YOLOv4 is an object detection algorithm that evolved from the YOLOv3 model. According to its authors, YOLOv4 runs twice as fast as EfficientDet with comparable performance.

OSNet

We use the OSNet model for re-identification. We chose OSNet because we aimed to build a product with a low computational cost, and OSNet has fewer parameters than other re-ID models.

Integrating Models

First, YOLO V3/V4 performs detection and tracking on the video provided by the user, and the tracked video is saved in the directory.
The tracked video is then passed to the OSNet model for re-identification, after which the complete video is saved in the directory.

MAKING OUR PROJECT TIME EFFICIENT

PIPELINING

The main issue we faced during the project was processing time. We integrated two models, YOLO and OSNet, and each of them is slow on its own, which slows down the whole process. To overcome this issue, we came up with the idea of pipelining.

After receiving a video from the user, we divide it into chunks. The number of chunks depends on the number of cores in the user's CPU, and the total number of frames in the video is divided by the number of chunks.

The number of chunks is determined dynamically at run time. For example, if the video is processed on a quad-core processor, it is divided into four chunks, one per core.

No. of chunks = No. of CPU cores

All chunks are passed to YOLO at once, so detection and tracking run on every chunk simultaneously, which saves a lot of time. We call this pipelining by analogy with a processor that works on several instructions at the same time; here, each CPU core processes its own chunk of the video in parallel with the others.
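The chunking scheme above can be sketched as follows. The sketch assumes that frames are addressed by index and that each chunk is handed to a separate worker process; split_frames, process_video, and the worker argument are hypothetical names, and the project's actual implementation may differ.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split_frames(total_frames, n_chunks):
    """Split frame indices into contiguous (start, end) ranges, end exclusive."""
    if total_frames <= 0:
        return []
    # Never create more chunks than there are frames.
    n_chunks = max(1, min(n_chunks, total_frames))
    base, extra = divmod(total_frames, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional frame each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def process_video(total_frames, worker, n_workers=None):
    """Run `worker` on one chunk per CPU core, in parallel processes.

    `worker` is a placeholder for the detection and tracking step. It must
    be a module-level function that accepts a (start, end) tuple.
    """
    n_workers = n_workers or os.cpu_count() or 1  # No. of chunks = No. of CPU cores
    chunks = split_frames(total_frames, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker, chunks))
```

On a quad-core machine, a 10-frame video would be split into the ranges (0, 3), (3, 6), (6, 8), and (8, 10). Results come back in chunk order, so the processed chunks can be concatenated into the final video.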

TOOLS AND TECHNOLOGIES

SQLite

Qt Designer

Google Colab

OpenCV

PROJECT PRESENTATION

FYP Final Presentation.pptx

POSTER

RavenPoster (1).pdf

GitHub Repository

The complete code and documentation of the application are available on GitHub.

Visit: https://github.com/SaimAshfaq/FYP-RAVEN
