Invited Talk: Prof. Marc van Droogenbroeck (Institut Montefiore, Belgium)
Title: Metrics and Performance Evaluation of AI-based Moving Object Detection: A Revisitation
Abstract: The development of AI methods rests on three pillars: (1) the availability of data, at least partially annotated; (2) the development of algorithms; and (3) the evaluation of performance. From a scientific perspective, this development becomes more delicate when we must agree on the metrics used to assess inference performance and decide on a ranking score. For example, the CDnet site, designed to evaluate the performance of change detection methods for video, proposes no fewer than 9 metrics.
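As a concrete illustration, the sketch below (not tied to CDnet's own implementation) shows how several commonly reported pixel-level metrics all derive from the same four confusion-matrix counts, which is one reason some of them are redundant for ranking purposes.

```python
# Minimal sketch: common pixel-level change-detection metrics computed from
# the four confusion-matrix counts (true/false positives and negatives).
def pixel_metrics(tp, fp, fn, tn):
    recall      = tp / (tp + fn)              # a.k.a. sensitivity, TPR
    specificity = tn / (tn + fp)
    precision   = tp / (tp + fp)
    fpr         = fp / (fp + tn)              # false positive rate
    fnr         = fn / (fn + tp)              # false negative rate
    f1          = 2 * precision * recall / (precision + recall)
    pwc         = 100 * (fp + fn) / (tp + fp + fn + tn)  # percentage of wrong classifications
    return {"recall": recall, "specificity": specificity, "precision": precision,
            "fpr": fpr, "fnr": fnr, "f1": f1, "pwc": pwc}

print(pixel_metrics(tp=900, fp=100, fn=50, tn=8950))
```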
The aim of this presentation is to outline the elements of a methodology for evaluating the performance of tools that detect the motion of objects in a video. First, we present a probabilistic framework for motion detection from the perspective of segmentation. This framework makes it possible to define a series of indicators that are widely used in practice for performance evaluation; since we assume that the result is a segmentation, these indicators are computed at the pixel level. Next, after defining the notion of ranking, we show that there are infinitely many performance indicators that can be used for ranking, including the F1 score. Second, we study the practical case of an evaluation involving several videos, or even several categories of videos. If we consider all these videos as coming from the same source, we explain why a contextual average is more appropriate than an arithmetic average. We also discuss the case of performance evaluation for a set of sources. Both cases apply to video surveillance, including all-weather surveillance. We conclude by presenting rules of good practice that avoid misleading interpretations, such as those drawn from redundant indicators or from inadequate averaging. These rules are presented as a basis for choosing an algorithm to be implemented in practice.
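A minimal sketch of the averaging issue, assuming one common reading in which the "contextual" average pools the confusion-matrix counts of all videos from the same source before computing the score, whereas the arithmetic average scores each video separately and then averages:

```python
# Sketch of arithmetic vs. pooled ("contextual") averaging of F1 scores,
# under the pooling assumption stated above.
def f1(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)

# Per-video confusion counts (tp, fp, fn); the second video is much smaller.
videos = [(9000, 600, 400), (30, 40, 50)]

arithmetic = sum(f1(*v) for v in videos) / len(videos)   # score first, then average
pooled = f1(*(sum(c) for c in zip(*videos)))             # sum counts, then score

print(f"arithmetic mean of F1: {arithmetic:.3f}")  # ~0.674, pulled down by the tiny video
print(f"pooled (contextual) F1: {pooled:.3f}")     # ~0.943, weights pixels equally
```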
Invited Talk: Assoc. Prof. Subudhi Badri (Indian Institute of Technology, Jammu, India)
Title: Thermal Video Surveillance System using Deep Learning Architectures: A Review
Abstract: A thermal surveillance system involves the detection and tracking of moving objects in thermal videos. However, thermal surveillance is challenging due to the distinct characteristics of thermal videos: low resolution or missing information, low signal-to-noise ratio, lack of structural cues such as shape and texture, lack of color information, and low contrast. As a result, the visual content of a thermal image is poorer, making it difficult to detect the moving objects present in the thermal scene. There is therefore a need to design techniques that enhance the perceivable information and use it to detect moving objects. Hence, image fusion is an essential task in thermal surveillance, prior to object detection from thermal video.
First, we review a few algorithms based on visible and thermal image fusion to bring out the subtle details in the thermal scene. In this regard, an integration of bi-dimensional empirical mode decomposition with a two-stream VGG-16 network is developed for image fusion. We observe that, thanks to the VGG-16 architecture, the developed scheme retains in-depth features at different levels, and the proposed deep multi-level fusion strategy produces a fused image with complementary details. However, the model is computationally intensive. To reduce the computational burden, we propose a non-subsampled contourlet transform induced two-stream ResNet-50 network, in which the deep directional features of the low-frequency and high-frequency bands are exploited for the fusion process. During experimental validation, we observed that this technique preserves the most edge details with the fewest artifacts during image fusion, and hence we use the fused image from this stage for object detection.
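As a rough illustration of the two-stream idea, the hypothetical PyTorch sketch below extracts multi-level VGG-16 features from the visible and thermal inputs and fuses the two images with per-pixel weights derived from feature activity; the BEMD/NSCT decompositions and the actual fusion rules of the reviewed methods are omitted.

```python
# Hypothetical sketch of deep multi-level fusion (not the authors' exact
# pipeline): two VGG-16 streams extract features from the visible and thermal
# images, per-level activity maps (L1 norms over channels) yield soft fusion
# weights, and a weighted average of the inputs gives the fused image.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

def activity_map(feats, size):
    # L1 norm over channels -> single-channel activity map at image resolution.
    a = feats.abs().sum(dim=1, keepdim=True)
    return F.interpolate(a, size=size, mode="bilinear", align_corners=False)

def fuse(visible, thermal, levels=(3, 8, 15, 22)):
    """visible, thermal: (1, 3, H, W) tensors in [0, 1]."""
    backbone = vgg16(weights=None).features.eval()  # pretrained weights in practice
    weights_v = weights_t = 0.0
    x_v, x_t = visible, thermal
    with torch.no_grad():
        for i, layer in enumerate(backbone):
            x_v, x_t = layer(x_v), layer(x_t)
            if i in levels:  # relu1_2, relu2_2, relu3_3, relu4_3
                size = visible.shape[-2:]
                weights_v = weights_v + activity_map(x_v, size)
                weights_t = weights_t + activity_map(x_t, size)
            if i >= max(levels):
                break
    w = weights_v / (weights_v + weights_t + 1e-8)  # soft per-pixel weight
    return w * visible + (1.0 - w) * thermal

fused = fuse(torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224))
print(fused.shape)  # torch.Size([1, 3, 224, 224])
```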
Second, we focus on a multi-scale deep learning architecture for moving object detection, in which a modified ResNet-152 network is combined with hybrid pyramidal pooling. The modified ResNet-152 network is augmented with a multi-scale feature extraction (MFE) block to enhance its feature learning capability and preserve both sparse and dense deep features. Further, we develop a multi-scale contrast-preserving deep learning architecture for moving object detection. A new encoder is designed around a multi-scale contrast preservation (MSCP) convolutional neural network architecture, which retains the maximum contrast details in the in-depth features. The selected decoder network accurately projects the features extracted at different layers back to the pixel level. We observe that the selected multi-scale deep learning architectures produce results with accurate object shapes across various challenging thermal video scenes.
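The hypothetical sketch below illustrates the general encoder-decoder pattern described above: a ResNet encoder, a pyramid-pooling neck that aggregates context at several scales, and a light decoder that projects features back to a pixel-level foreground mask. The specific MFE and MSCP blocks are not reproduced, and a ResNet-50 stands in for the ResNet-152 of the talk.

```python
# Hypothetical encoder / pyramid-pooling / decoder sketch for moving object
# detection (illustrative only; not the authors' MFE/MSCP design).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class PyramidPooling(nn.Module):
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, in_ch // len(bins), 1, bias=False),
                          nn.ReLU(inplace=True))
            for b in bins)
        self.project = nn.Conv2d(in_ch * 2, in_ch // 2, 3, padding=1)

    def forward(self, x):
        size = x.shape[-2:]
        pooled = [F.interpolate(s(x), size=size, mode="bilinear",
                                align_corners=False) for s in self.stages]
        return self.project(torch.cat([x, *pooled], dim=1))

class MovingObjectNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=None)   # pretrained ResNet-152 in practice
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, H/32, W/32)
        self.neck = PyramidPooling(2048)
        self.decoder = nn.Sequential(
            nn.Conv2d(1024, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, 1))           # one channel: foreground logit

    def forward(self, x):
        logits = self.decoder(self.neck(self.encoder(x)))
        # Project back to the pixel level: upsample logits to input resolution.
        return F.interpolate(logits, size=x.shape[-2:], mode="bilinear",
                             align_corners=False)

mask_logits = MovingObjectNet()(torch.rand(1, 3, 256, 256))
print(mask_logits.shape)  # torch.Size([1, 1, 256, 256])
```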
The efficiency of the selected techniques is corroborated by testing them on different benchmark databases. The performance of the proposed methods is evaluated against competitive state-of-the-art techniques using relevant quantitative measures. The results show that the selected techniques provide better accuracy than the state-of-the-art.