A Robust Multi-Camera Deep Person Re-Identification Framework using Spatiotemporal Context Modelling

Abstract

Person re-identification (ReID) has attracted considerable attention in recent years, driven by the growing number of surveillance cameras and an urgent need for public safety. It is the task of identifying a query person of interest across non-overlapping cameras. Person ReID has also advanced significantly in recent years thanks to deep learning algorithms, especially Convolutional Neural Networks (CNNs). Most existing work uses ResNet50, a deep CNN architecture pre-trained on ImageNet, as a backbone for extracting rich spatial features on widely used image-based and video-based person ReID benchmark datasets. The proposed research aims to design a robust, less complex multi-camera person ReID framework for accurate and efficient identification and retrieval of a query person of interest. The framework targets three image-based datasets (Market-1501, DukeMTMC-ReID and MSMT17) and two video-based datasets (MARS and DukeMTMC-VideoReID) under variable and complex scenarios, using transfer learning of different pre-trained CNN architectures so that it can also be deployed in the real world.

Introduction

With the rapidly growing number of smart surveillance cameras and CCTV systems, video analytics is enabling automatic surveillance that once required continuous manual monitoring by humans. In general, video analytics is the extraction of useful and meaningful information from input video feeds, with a focus on understanding in detail what is happening in a video. It builds on several research areas, including Computer Vision (CV), Pattern Recognition (PR) and Machine Learning (ML), and supports industrial applications in surveillance, transportation and retail. The goal of a video analytics application is to observe video content and draw inferences from it using ML and Deep Learning (DL) algorithms that learn, understand and interpret that content. Applications include smart automated surveillance for monitoring people or transportation and detecting anomalous activities, smoke or fire detection, object, vehicle or person counting, and object tracking. This work focuses on one such application: person re-identification.

Person re-identification (ReID) is the process of identifying a person of interest with a specific ID across non-overlapping cameras. Given a query image, the goal of the learner is to identify that person by matching the query against the person images in a ranked gallery set, as illustrated in the figure below. It tries to determine whether a person of interest has reappeared, at the same or a different location, in the view of the same or a different camera within a specific time frame. Note that the query person of interest can be represented by either an image or a video sequence. With the urgent demand for public safety, person ReID has gained increasing attention in recent years. Applications that involve person re-identification and tracking include multi-camera activity recognition, human behavior analysis, person retrieval, cross-camera tracking and public safety in crowded and sensitive places.

Simple illustration of query to gallery matching
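
The query-to-gallery matching described above can be sketched as a nearest-neighbour ranking over feature vectors. This is only an illustration under stated assumptions: the function names `euclidean` and `rank_gallery` are hypothetical, the features are toy 2-D vectors rather than CNN embeddings, and Euclidean distance is assumed as the similarity measure.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(query_feature: Sequence[float],
                 gallery_features: Sequence[Sequence[float]]) -> List[int]:
    """Return gallery indices sorted from the closest to the farthest match."""
    distances = [euclidean(query_feature, g) for g in gallery_features]
    return sorted(range(len(gallery_features)), key=lambda i: distances[i])


# Toy example (hypothetical features): distances are 5.0, 1.0 and 2.0.
ranking = rank_gallery([0.0, 0.0], [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
print(ranking)  # → [1, 2, 0]
```

In a real system the feature vectors would come from a trained backbone, and the ranked indices would be mapped back to gallery person IDs.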

Application

Motivation

To automate person re-identification, and with it many surveillance activities, in airports, metro stations, shopping malls, roads, parks, etc.
To reduce human workload by automating tasks that once required continuous manual monitoring.

Related Datasets

Dataset Download Links:

Image Based ReID

PRID11
CUHK03
Market-1501
CUHK-SYSU
DukeMTMC-reID (This dataset has been retracted)
MSMT17

Video Based ReID

ViPeR
ILiDS-ViD
MARS
DukeMTMC-VideoReID (This dataset has been retracted)

Detailed information regarding the configuration of each dataset can be found here

Evaluation Measures

1. Cumulative Matching Characteristics of Rank-k (CMC rank-k)

Cumulative Matching Characteristics (CMC) curves are the most popular evaluation metrics for person re-identification methods. Consider a simple single-gallery-shot setting, where each gallery identity has only one instance. For each query, an algorithm ranks all the gallery samples by their distance to the query, from smallest to largest, and the CMC top-k accuracy is

$$
\mathrm{Acc}_k =
\begin{cases}
1 & \text{if the top-}k\text{ ranked gallery samples contain the query identity} \\
0 & \text{otherwise}
\end{cases}
$$

which is a shifted step function. The final CMC curve is computed by averaging the shifted step functions over all the queries.
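
As a rough sketch of the averaging step, the snippet below computes a CMC curve from already-ranked gallery IDs. The function name `cmc_curve` and its arguments are hypothetical. It assumes the single-gallery-shot setting described above and omits the same-camera filtering that benchmark protocols usually apply.

```python
from typing import List, Sequence


def cmc_curve(ranked_gallery_ids: Sequence[Sequence[int]],
              query_ids: Sequence[int],
              max_rank: int) -> List[float]:
    """Average the per-query step functions into a CMC curve.

    ranked_gallery_ids[i] holds the gallery person IDs for query i,
    sorted from the smallest to the largest distance.
    Entry k-1 of the result is the rank-k accuracy.
    """
    hits = [0] * max_rank
    for ranking, qid in zip(ranked_gallery_ids, query_ids):
        # 0-based position of the first correct match, or None if absent.
        first_hit = next((i for i, gid in enumerate(ranking) if gid == qid), None)
        if first_hit is None:
            continue
        # The step function is 1 for every rank at or beyond the first hit.
        for k in range(first_hit, max_rank):
            hits[k] += 1
    return [h / len(query_ids) for h in hits]


# Three toy queries whose correct match sits at ranks 1, 2 and 3.
curve = cmc_curve([[1, 2, 3], [1, 2, 3], [3, 1, 2]], [1, 2, 2], max_rank=3)
```

Here `curve[0]` is the rank-1 accuracy, the figure most often reported alongside mAP.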

2. Mean Average Precision (mAP)

Mean average precision for a set of queries is the mean of the average precision scores for each query:

$$
\mathrm{mAP} = \frac{1}{Q} \sum_{q=1}^{Q} \mathrm{AveP}(q)
$$

where Q is the number of queries in the set and AveP(q) is the average precision (AP) for a given query q.
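
A minimal sketch of this metric follows, assuming AP is computed as the mean of the precision values at each rank where a correct match appears. The function names are hypothetical, and same-camera filtering is again left out.

```python
from typing import Sequence


def average_precision(ranked_gallery_ids: Sequence[int], query_id: int) -> float:
    """AP for one query: mean precision over the ranks of correct matches."""
    hits = 0
    precision_sum = 0.0
    for rank, gid in enumerate(ranked_gallery_ids, start=1):
        if gid == query_id:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # A query with no correct match in the gallery contributes 0 here.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_gallery_ids: Sequence[Sequence[int]],
                           query_ids: Sequence[int]) -> float:
    """mAP: the mean of AveP(q) over all Q queries."""
    aps = [average_precision(r, q) for r, q in zip(ranked_gallery_ids, query_ids)]
    return sum(aps) / len(aps)


# Toy example: the first query has matches at ranks 1 and 3, the second at rank 2.
score = mean_average_precision([[1, 2, 1, 3], [2, 1]], [1, 1])
```

Unlike rank-k accuracy, this score rewards retrieving every gallery instance of the query identity, which matters when an identity appears in several cameras.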

Open Challenges / Research Gap

Open Challenges

Different Viewpoints
Varying Low Image Resolution
Occlusion
Illumination Changes
Pose Variations
Heterogeneous Modalities
Complex Camera Environments
Background Clutter
Significant Domain Shift
Cross-Dataset Validation
Unseen Testing Scenarios
Changing Clothes

Research Gaps

Uncontrollable Data Collection
Human Annotation Minimization
Domain Specific / Generalizable Architecture Design
Dynamic Model Update
Efficient Model Deployment

Problem Statement

To design a robust multi-camera person re-identification framework for accurate and efficient identification and retrieval of a query person of interest across multiple non-overlapping cameras, under variable and dynamic scenarios, using transfer learning of pre-trained deep CNN architectures.

Architecture Diagram

Useful Reads / Related Work

Deep learning Strategies for Person Re-Identification

This demo video is presented by Alessandro Borgia of the University Defence Research Collaboration (UDRC) Edinburgh Consortium. It proposes an architecture intended for real-world deployment that addresses the problem of people disappearing for long periods of time and reappearing in another field of view.

Survey Papers

Ye, Mang, et al. "Deep learning for person re-identification: A survey and outlook." IEEE Transactions on Pattern Analysis and Machine Intelligence (2021).
Islam, Khawar. "Person search: New paradigm of person re-identification: A survey and outlook of recent works." Image and Vision Computing 101 (2020): 103970.

Papers Related to Image based Person ReID

L. Wei, S. Zhang, W. Gao, and Q. Tian, “Person Transfer GAN to Bridge Domain Gap for Person Re-Identification,” CVPR 2018, pp. 79–88, 2018.
X. Zhu, X. Zhu, M. Li, P. Morerio, V. Murino, and S. Gong, “Intra-Camera Supervised Person Re-Identification,” International Journal of Computer Vision, Feb. 2021, doi: 10.1007/s11263-021-01440-4.
R. Hou, B. Ma, H. Chang, X. Gu, S. Shan, and X. Chen, “IAUnet: Global context-aware feature learning for person re-identification,” arXiv, Sep. 02, 2020, doi: 10.1109/tnnls.2020.3017939.
M. Adil, S. Mamoon, A. Zakir, M. A. Manzoor, and Z. Lian, “Multi Scale-Adaptive Super-Resolution Person Re-Identification Using GAN,” IEEE Access, vol. 8, pp. 177351–177362, Sep. 2020, doi: 10.1109/access.2020.3023594.
C. Neff, M. Mendieta, S. Mohan, M. Baharani, S. Rogers, and H. Tabkhi, “REVAMP2T: Real-Time Edge Video Analytics for Multicamera Privacy-Aware Pedestrian Tracking,” IEEE Internet of Things Journal, vol. 7, no. 4, pp. 2591–2602, Apr. 2020, doi: 10.1109/JIOT.2019.2954804.
Y. Lin, L. Xie, Y. Wu, C. Yan, and Q. Tian, “Unsupervised Person Re-identification via Softened Similarity Learning.”
Z. Zhu et al., “Viewpoint-Aware Loss with Angular Regularization for Person Re-Identification.” [Online]. Available: www.aaai.org.
B. (Ning) Xia, Y. Gong, Y. Zhang, and C. Poellabauer, “Second-order Non-local Attention Networks for Person Re-identification.”
T. Chen et al., “ABD-Net: Attentive but Diverse Person Re-Identification.” [Online]. Available: https://github.com/TAMU-VITA/ABD-Net.
R. Quan, X. Dong, Y. Wu, L. Zhu, and Y. Yang, “Auto-ReID: Searching for a Part-Aware ConvNet for Person Re-Identification.”
H. Luo, Y. Gu, X. Liao, S. Lai, and W. Jiang, “Bag of Tricks and A Strong Baseline for Deep Person Re-identification.” [Online]. Available: https://github.com/michuanhaohao/reid-strong-baseline.

Papers Related to Video based Person ReID

A. A. Sekh, D. P. Dogra, H. Choi, S. Chae, and I. J. Kim, “Person Re-identification in Videos by Analyzing Spatio-temporal Tubes,” Multimedia Tools and Applications, vol. 79, no. 33–34, pp. 24537–24551, Sep. 2020, doi: 10.1007/s11042-020-09096-x.
L. Wu, Y. Wang, L. Shao, and M. Wang, “3-D PersonVLAD: Learning Deep Global Representations for Video-Based Person Reidentification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, pp. 3347–3359, Nov. 2019, doi: 10.1109/TNNLS.2019.2891244.
R. Hou, B. Ma, H. Chang, X. Gu, S. Shan, and X. Chen, “VRSTC: Occlusion-Free Video Person Re-Identification.”
G. Wang, J. Lai, P. Huang, and X. Xie, “Spatial-Temporal Person Re-Identification.” [Online]. Available: www.aaai.org.

Additional Link for Lists of Latest CVPR/ICCV Conferences on Person ReID

Awesome Person Re-identification (Person ReID) is a repository for organizing articles related to person re-identification. Most papers are linked to the PDF address provided by "arXiv" or "OpenAccess". However, some papers require an academic license to browse, for example those published in IEEE, Springer and Elsevier journals.

Future Directions