Dr.Ing. Anouar BEN KHALIFA

PhD

(Th4). Wafa LEJMI

Thesis topic : Spatio-Temporal Violence Classification based on Material Derivative and Deep Learning Models
Thesis director : Pr. Mohamed Ali Mahjoub.
Co-supervisor : Dr. Anouar BEN KHALIFA.
Institution : Higher Institute of Computer Science and Communication Techniques of Hammam Sousse, University of Sousse.
Defense date : October 30, 2021.
Abstract : The growing need for information and the availability of high-quality video cameras have led to the proliferation of video-based systems that perform tasks such as traffic monitoring and surveillance. A basic component of these systems is the visual tracking of objects in a video sequence to estimate their paths. The main purpose of event detection systems is to characterize activities using unsupervised or supervised techniques. Our work focuses on violence classification in video surveillance sequences, since automated security video surveillance systems today depend heavily on human action recognition. However, human action recognition faces several interlinked difficulties that call for efficient and feasible methods able to distinguish normal human actions from abnormal ones.

This thesis presents an overview of the tools and techniques used for violence recognition in video sequences, reviews the literature on ongoing research in this field, and proposes two new models for predicting violent scenes. Its originality lies in three major contributions in terms of characterization and classification. The first contribution is a model based on a preliminary extraction of spatio-temporal features using the substantial (material) derivative, which describes the rate of change over time of a quantity carried by a moving particle. By analogy with this particle derivative from fluid mechanics, we were able to estimate local and convective accelerations from video. The local, or temporal, acceleration represents the rate of increase of a pixel’s speed over time at a specific point of the flow. The convective acceleration describes the rate of increase of speed due to the change in pixel position. The classification algorithm is then implemented using a recurrent neural network (Long Short-Term Memory, LSTM), which can process both isolated data and sequences. The LSTM helps avoid long-term dependency issues: its four interacting neural network layers and gates indicate which data is useful to keep and which is not. Thus, only relevant data is passed along the sequence chain to facilitate prediction.
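As a rough illustration of the two acceleration terms described above, the material derivative of a velocity field u can be written Du/Dt = ∂u/∂t + (u·∇)u, where the first term is the local acceleration and the second is the convective acceleration. The sketch below is a minimal, hedged example and not the thesis implementation: it assumes the optical flow is already available as a grid of (u, v) tuples per pixel, uses simple finite differences, and the function names `local_acceleration` and `convective_acceleration` are hypothetical.

```python
def local_acceleration(flow_prev, flow_next, dt=1.0):
    """Temporal term du/dt: per-pixel change of the flow vector between two frames."""
    return [
        [((u1 - u0) / dt, (v1 - v0) / dt)
         for (u0, v0), (u1, v1) in zip(row_prev, row_next)]
        for row_prev, row_next in zip(flow_prev, flow_next)
    ]


def convective_acceleration(flow):
    """Spatial term (u . grad) u, using central differences.

    Border pixels are left at (0.0, 0.0) because they lack two neighbours;
    this is a simplification made for the sketch.
    """
    height, width = len(flow), len(flow[0])
    out = [[(0.0, 0.0)] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            u, v = flow[y][x]
            # Spatial gradients of both flow components
            du_dx = (flow[y][x + 1][0] - flow[y][x - 1][0]) / 2.0
            du_dy = (flow[y + 1][x][0] - flow[y - 1][x][0]) / 2.0
            dv_dx = (flow[y][x + 1][1] - flow[y][x - 1][1]) / 2.0
            dv_dy = (flow[y + 1][x][1] - flow[y - 1][x][1]) / 2.0
            out[y][x] = (u * du_dx + v * du_dy, u * dv_dx + v * dv_dy)
    return out


# Toy 3x3 flow field: horizontal speed grows with x, no vertical motion.
toy_flow = [[(float(x), 0.0) for x in range(3)] for _ in range(3)]
local = local_acceleration(toy_flow, toy_flow)
convective = convective_acceleration(toy_flow)
```

In this toy field the flow does not change between frames, so the local term is zero everywhere, while the convective term at the centre pixel is nonzero because the speed increases along the direction of motion.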

The second contribution, namely the LSTM deep learning classification technique, allowed us to classify the generated features into eight violent and non-violent categories, with a prediction value calculated for each action class. This first model was trained on a public dataset, and its classification ability was evaluated using three confusion matrices that compare the system's predictions with the actual labels. The third contribution of this thesis is a model based on Deep Belief Networks (DBN), which are multilayer neural networks where each layer is a restricted Boltzmann machine (RBM) stacked with other RBMs to build the DBN. During training, the first step was to learn a feature layer from the input (visible) units using the contrastive divergence (CD) algorithm. Next, the activations of the previously learned features were treated as visible units in order to learn the features of a second hidden layer. The entire DBN is trained once the training of the final hidden layer is complete. We prepared the training and test data, configured the DBN layers for automatic feature learning, and specified the RBM parameters, i.e., three hidden layers and the number of hidden nodes in each layer, corresponding to the input data features stored in the system. We needed a discriminative RBM in the last layer and specified Softmax as the output classifier of the DBN.
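The greedy layer-by-layer training described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis code: it uses binary (Bernoulli) units, a single Gibbs step (CD-1), toy data, and arbitrary layer sizes; all function names are hypothetical, and the discriminative last layer and Softmax classifier mentioned in the abstract are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    """P(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step on a single visible vector; updates W, b, c in place."""
    ph0 = hidden_probs(v0, W, c)                          # positive phase
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]  # sample hidden states
    pv1 = visible_probs(h0, W, b)                         # reconstruction
    ph1 = hidden_probs(pv1, W, c)                         # negative phase
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        b[i] += lr * (v0[i] - pv1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])


def train_rbm(data, n_hidden, epochs=5, lr=0.1, seed=0):
    rng = random.Random(seed)
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
    b = [0.0] * n_visible
    c = [0.0] * n_hidden
    for _ in range(epochs):
        for v in data:
            cd1_update(v, W, b, c, lr, rng)
    return W, b, c


def train_dbn(data, layer_sizes, **kwargs):
    """Greedy stacking: each RBM is trained on the activations of the previous one."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(x, n_hidden, **kwargs)
        layers.append((W, b, c))
        x = [hidden_probs(v, W, c) for v in x]  # becomes the next "visible" layer
    return layers


# Toy binary data (6 visible units) and three hidden layers, as in the abstract;
# the sizes 4, 3, 2 are arbitrary choices for this sketch.
toy_data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 0]]
dbn = train_dbn(toy_data, [4, 3, 2])
```

Each RBM is trained independently before the next one sees any data, which is why the whole network is considered trained only once the final hidden layer has been fitted.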

Through the experiments we carried out on a public dataset (the SBU Kinect Interaction dataset), we assessed the effectiveness of several detectors and descriptors by performing feature extraction with various algorithms, namely HARRIS, SURF, HOG and STIP, followed by SVM classification. In addition, we evaluated the performance of the first proposed model, which combines a descriptor relying on the accelerations derived from the material derivative with the LSTM neural classification approach, as well as the performance of the second model, based on the Deep Belief Network (DBN). The classification performance of the proposed models was measured by generating confusion matrices that compare the distribution of predictions made for each action class, and the results are promising.
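For readers unfamiliar with the evaluation tool mentioned above, a confusion matrix can be built and summarized as in the following minimal sketch. The class names and predictions are invented for illustration only and are not results from the thesis; the helper names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_recall(matrix):
    """For each actual class, the fraction of its samples predicted correctly."""
    return [row[k] / sum(row) if sum(row) else 0.0
            for k, row in enumerate(matrix)]


# Illustrative labels and predictions (made up for this example).
labels = ["kicking", "punching", "hugging"]
y_true = ["kicking", "kicking", "punching", "hugging", "hugging", "punching"]
y_pred = ["kicking", "punching", "punching", "hugging", "hugging", "punching"]

matrix = confusion_matrix(y_true, y_pred, labels)
recalls = per_class_recall(matrix)
```

Reading the matrix row by row shows which actual classes are confused with which predicted ones, which is more informative than a single accuracy figure when some actions resemble each other.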

Key words : violence scene; point of interest; spatio-temporal descriptor; optical flow; material derivative; acceleration; local; convective; classification; recognition; deep learning; LSTM; DBN; RBM.
Publications : This thesis led to the publication of the following papers :

(C30). Wafa Lejmi, Anouar Ben Khalifa, Mohamed Ali Mahjoub, An Innovative Approach Towards Violence Recognition Based on Deep Belief Network, 8th International Conference on Control, Decision and Information Technologies (CoDIT), pp. 1297-1302, 2022, Istanbul, Turkey. DOI: https://doi.org/10.1109/CoDIT55151.2022.9803898 (Conf.Rank C)

(J14). Wafa Lejmi, Anouar Ben Khalifa, Mohamed Ali Mahjoub, A Novel Spatio-Temporal Violence Classification Framework Based on Material Derivative and LSTM Neural Network, Traitement du Signal, Vol. 37, No. 5, November 2020, pp. 687-701. DOI: https://doi.org/10.18280/ts.370501 (Quartile: Q3, IF = 2.589)

(C20). Wafa Lejmi, Anouar Ben Khalifa, Mohamed Ali Mahjoub, Challenges and Methods of Violence Detection in Surveillance Video: A Survey, In: Vento M., Percannella G. (eds) Computer Analysis of Images and Patterns. CAIP 2019. Lecture Notes in Computer Science, vol 11679. Springer, Cham, pp. 62-73. DOI: https://doi.org/10.1007/978-3-030-29891-3_6 (Conf.Rank B)

(C12). Wafa Lejmi, Anouar Ben Khalifa, Mohamed Ali Mahjoub, Fusion Strategies for Recognition of Violence Actions, IEEE/ACS 14th International Conference on Computer Systems and Applications (AICCSA), pp. 178 - 183, 2017. DOI: https://doi.org/10.1109/AICCSA.2017.193 (Conf.Rank C)

(C11). Wafa Lejmi, Mohamed Ali Mahjoub, Anouar Ben Khalifa, Event detection in video sequences: Challenges and perspectives, 13th IEEE International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), pp. 682 – 690, 2017. DOI: https://doi.org/10.1109/FSKD.2017.8393354 (Conf.Rank C)
