PhD Students

(Th5). Laila OUANNES

  • Thesis topic : Contribution To Face Recognition In a Degraded Environment

  • Thesis director : Pr. Najoua ESSOUKRI BEN AMARA

  • Co-supervisor : Dr. Anouar BEN KHALIFA.

  • Institution : ENISo, University of Sousse.

  • Defense date : June 10, 2022

  • Abstract : Nowadays, the exploitation of biometrics is becoming crucial in several fields and applications such as access control, border control, and the fight against terrorism. In particular, facial recognition, a relatively recent technology, has gained prominence around the world for identifying people. However, several challenges inhibit the task of facial recognition, such as facial expressions, head pose variations, illumination variations, and partial occlusion. The work developed in this Ph.D. thesis concerns facial recognition under degraded conditions. The first contribution consists in combining the k-NN classifier with the KD-Tree for the classification of the interest points provided by the SURF descriptor; the new classifier improves the performance of our identification system. The second contribution consists in comparing hand-crafted features with learned features provided by pre-trained deep learning models such as VGG-19 and Inception-v3; the learned features outperformed the hand-crafted features provided by usual descriptors such as the HOG. The development of a Siamese network based on the Inception-v3 pre-trained model and using the contrastive loss function is the subject of the third contribution. Indeed, Siamese networks have proven their efficiency in overcoming both the problems of degraded conditions and the need for the huge datasets that deep learning models require. Our fourth contribution is an investigation of the impact of occlusion on facial recognition. In this context, we resort to the de-occlusion and reconstruction of occluded faces. First, we detect the faces using our detector, which combines HSV skin color and the Viola & Jones detector; this combination reinforces the Viola & Jones detector against partial occlusion and head pose variations. Then, for the reconstruction, we use two methods: Laplacian pyramid blending and CycleGANs. The comparison between reconstructed, de-occluded, and occluded faces shows that partial occlusion inhibits the identification of people and that reconstruction improves facial recognition performance. All these approaches are evaluated on two public databases, EKFD and IST-EURECOM LFFD, which present the different challenges of facial recognition. The obtained experimental results show the robustness of the proposed approaches in terms of precision and efficiency.
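The contrastive objective mentioned in the third contribution can be sketched as follows. This is a minimal NumPy illustration of the standard contrastive loss over embedding pairs; the function name, batch layout, and margin value are illustrative assumptions, not the thesis code.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_identity, margin=1.0):
    """Contrastive loss over a batch of embedding pairs.

    emb_a, emb_b  : (n, d) arrays of face embeddings
    same_identity : (n,) array, 1 if the pair shows the same person, else 0
    Genuine pairs are pulled together; impostor pairs are pushed
    at least `margin` apart.
    """
    d = np.linalg.norm(emb_a - emb_b, axis=1)            # Euclidean distance
    pos = same_identity * d ** 2                          # pull same-person pairs
    neg = (1 - same_identity) * np.maximum(margin - d, 0.0) ** 2
    return float(np.mean(pos + neg))
```

With this objective, two embeddings of the same identity incur zero loss only when they coincide, while impostor pairs stop contributing once their distance exceeds the margin.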

  • Key words : Facial recognition, degraded environment, occlusion, hand-crafted features, learned features, Siamese networks, face detection, face reconstruction, deep learning, pretrained models, Laplacian pyramid blending, CycleGANs.

  • Publications : This thesis led to the publication of the following papers :

(C34). Laila Ouannes, Anouar Ben Khalifa, Najoua Essoukri Ben Amara, Siamese Network for Face Recognition in Degraded Conditions, 6th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP'2022), pp. 1-6, 2022, Hybrid Moncton (Canada)-Sfax (Tunisia). DOI: https://doi.org/10.1109/ATSIP55956.2022.9805878.

(J17). L. Ouannes, Anouar Ben Khalifa, Najoua Essoukri Ben Amara, Comparative Study Based on De-Occlusion and Reconstruction of Face Images in Degraded Conditions, Traitement du Signal, Vol. 38, No. 3, pp. 573-585, June 2021. DOI: https://doi.org/10.18280/ts.380305. Quartile: Q3, IF= 2.589.

(C23). L. Ouannes, Anouar Ben Khalifa, N. Essoukri Ben Amara, Facial Recognition in Degraded Conditions Using Local Interest Points, 17th IEEE International Multi-Conference on Systems, Signals and Devices (SSD'20), 20-23 July 2020, pp. 404-409, Sfax, Tunisia. DOI: https://doi.org/10.1109/SSD49366.2020.9364124.

(C22). Laila Ouannes Nasr, Anouar Ben Khalifa, Najoua Essoukri Ben Amara, Deep Learning vs Hand-Crafted Features for Face Recognition under Uncontrolled Conditions, IEEE International Conference on Signal, Control and Communication (SCC'19), pp. 1-6, 2019. DOI: https://doi.org/10.1109/SCC47175.2019.9116159.

(Th4). Wafa LEJMI

  • Thesis topic : Spatio-Temporal Violence Classification based on Material Derivative and Deep Learning Models

  • Thesis director : Pr. Mohamed Ali Mahjoub.

  • Co-supervisor : Dr. Anouar BEN KHALIFA.

  • Institution : Higher Institute of Computer Science and Communication Techniques of Hammam Sousse, University of Sousse.

  • Defense date : October 30, 2021.

  • Abstract : The growing need for information and high-quality video cameras has led to the proliferation of video-based systems that perform tasks such as traffic monitoring and surveillance. A basic component of these systems is the visual tracking of the objects contained in a video sequence in order to estimate their paths. Indeed, the main purpose of event detection systems is to characterize activities using unsupervised or supervised techniques. Our work focuses on violence classification in video surveillance sequences, especially since, in the current era, the implementation of automated security video surveillance systems is particularly demanding in terms of human action recognition. Nevertheless, the latter encounters various interlinked difficulties which require efficient solutions as well as feasible methods that provide a relevant distinction between normal human actions and abnormal ones.

This thesis presents an overview of the tools and techniques used for violence recognition in video sequences, as well as a literature review of the ongoing research efforts in this field, and finally proposes two new models for violent scene prediction. The originality of this thesis is highlighted through three major contributions in terms of characterization and classification. Indeed, the first contribution consists in proposing a model based on a preliminary extraction of spatio-temporal features using the material (substantial) derivative, which describes the rate of change with respect to time of a particle in motion. By analogy with this derivative stemming from fluid mechanics, we were able to estimate local and convective accelerations from video. In fact, the local or temporal acceleration represents the rate of increase of a pixel's speed over time at a specific point of the flow, while the convective acceleration describes the rate of increase of speed due to the change in pixel position. The classification algorithm is then implemented using a recurrent neural network (Long Short-Term Memory, LSTM), which can process both isolated data points and sequences. This helps avoid long-term dependency issues through four interacting neural network layers and gates indicating which data is useful to keep and which is not. Thus, only relevant data passes through the sequence chain to facilitate prediction.
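The decomposition of the material derivative into its local and convective terms can be sketched from two consecutive optical-flow fields. This NumPy sketch assumes dense (u, v) flow on a pixel grid and unit time step; it illustrates the idea, not the thesis implementation.

```python
import numpy as np

def accelerations(flow_t, flow_t1):
    """Local and convective acceleration from two consecutive
    optical-flow fields (material-derivative decomposition).

    flow_t, flow_t1 : (H, W, 2) arrays holding (u, v) per pixel.
    Returns (local, convective), each of shape (H, W, 2).
    """
    # Local (temporal) term: du/dt at a fixed pixel position.
    local = flow_t1 - flow_t

    u, v = flow_t[..., 0], flow_t[..., 1]
    # Spatial gradients of each flow component (rows = y, cols = x).
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    # Convective term: (u . grad) u, the acceleration due to the
    # change of pixel position within the flow.
    conv_u = u * du_dx + v * du_dy
    conv_v = u * dv_dx + v * dv_dy
    convective = np.stack([conv_u, conv_v], axis=-1)
    return local, convective
```

For a uniform, constant flow both terms vanish, as expected for a scene with no acceleration.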

The second contribution, namely the LSTM deep learning classification technique, allowed us to classify the generated features into eight violent and non-violent categories, with a prediction value calculated for each action class. This first model was trained on a public dataset, and its classification ability is evaluated on three confusion matrices which present the system predictions against their actual labels. The third contribution of this thesis is a model based on Deep Belief Networks (DBNs), which are multilayer neural networks where each layer is a restricted Boltzmann machine (RBM) stacked with other RBMs to build the DBN. During training, the first step is to learn a feature layer from the input (visible) units using the contrastive divergence (CD) algorithm; the activations of the previously learned features are then treated as visible units to learn the features of a second hidden layer. The entire DBN is trained when the training of the final hidden layer is achieved. We prepared the training and test data, configured the DBN layers for automatic feature learning, and specified the RBM parameters, i.e., three hidden layers and the number of hidden nodes in each layer, corresponding to the input data features stored in the system. We needed a discriminative RBM in the last layer and specified Softmax as the output classifier of the DBN.
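The layer-wise CD training step described above can be illustrated by a single CD-1 update for a binary RBM. This is a generic NumPy sketch under standard RBM assumptions (binary units, one Gibbs step); the function name, learning rate, and batch layout are illustrative, not the thesis code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b_vis, b_hid, v0, lr=0.1):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    W: (n_vis, n_hid) weights; b_vis, b_hid: biases; v0: (n, n_vis) batch.
    Returns the updated (W, b_vis, b_hid).
    """
    # Positive phase: hidden probabilities given the data.
    h0 = sigmoid(v0 @ W + b_hid)
    # One Gibbs step: sample hidden, reconstruct visible, re-infer hidden.
    h_sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(h_sample @ W.T + b_vis)
    h1 = sigmoid(v1 @ W + b_hid)
    # Gradient approximation: positive minus negative statistics.
    n = v0.shape[0]
    W = W + lr * (v0.T @ h0 - v1.T @ h1) / n
    b_vis = b_vis + lr * (v0 - v1).mean(axis=0)
    b_hid = b_hid + lr * (h0 - h1).mean(axis=0)
    return W, b_vis, b_hid
```

Stacking RBMs as in the abstract amounts to running such updates layer by layer, feeding each trained layer's hidden activations to the next RBM as its visible data.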

Through the experiments we carried out on a public dataset (the SBU Kinect Interaction dataset), we assessed the effectiveness of several detectors and descriptors by performing feature extraction with various algorithms, namely Harris, SURF, HOG and STIP, followed by an SVM classification. In addition, we evaluated the performance of the first proposed model, based on a descriptor relying on the accelerations resulting from the material derivative together with the LSTM neural classification approach, as well as the performance of the second model based on the Deep Belief Network (DBN). The classification performances of the proposed models, measured by generating confusion matrices that compare the distribution of the predictions made for each action class, are promising and very encouraging.

  • Key words : violent scene; point of interest; spatio-temporal descriptor; optical flow; material derivative; local and convective acceleration; classification; recognition; deep learning; LSTM; DBN; RBM.

  • Publications : This thesis led to the publication of the following papers :

(C30). Wafa Lejmi, Anouar Ben Khalifa, Mohamed Ali Mahjoub, An Innovative Approach Towards Violence Recognition Based on Deep Belief Network, 8th International Conference on Control, Decision and Information Technologies (CoDIT), pp. 1297-1302, 2022, Istanbul, Turkey. DOI: https://doi.org/10.1109/CoDIT55151.2022.9803898. (Conf.Rank C)

(J14). Wafa Lejmi, Anouar Ben Khalifa, Mohamed Ali Mahjoub, A Novel Spatio-Temporal Violence Classification Framework Based on Material Derivative and LSTM Neural Network, Traitement du Signal, Vol. 37, No. 5, November 2020, pp. 687-701. DOI: https://doi.org/10.18280/ts.370501. Quartile: Q3, IF= 2.589.

(C20) . Wafa Lejmi, Anouar Ben Khalifa, Mohamed Ali Mahjoub, Challenges and Methods of Violence Detection in Surveillance Video: A Survey, In: Vento M., Percannella G. (eds) Computer Analysis of Images and Patterns. CAIP 2019. Lecture Notes in Computer Science, vol 11679. Springer, Cham, pp. 62-73. DOI: https://doi.org/10.1007/978-3-030-29891-3_6 (Conf.Rank B)

(C12). Wafa Lejmi, Anouar Ben Khalifa, Mohamed Ali Mahjoub, Fusion Strategies for Recognition of Violence Actions, IEEE/ACS 14th International Conference on Computer Systems and Applications (AICCSA), pp. 178 - 183, 2017. DOI: https://doi.org/10.1109/AICCSA.2017.193 (Conf.Rank C)

(Th3). Amira MIMOUNA

  • Thesis topic : Exploring Data Fusion for Multi-Object Detection for Intelligent Transportation Systems using Deep Learning.

  • Thesis director : Pr. Najoua Essoukri BEN AMARA and Pr. Abdelmalik TALEB-AHMED

  • Co-supervisor : Dr. Anouar BEN KHALIFA and Dr. Ihsen ALOUANI.

  • Institution : ENISo, University of Sousse, Tunisia, and Polytechnic University of Hauts-de-France, France.

  • Defense date : May 25, 2021.

  • Abstract : Building reliable environment perception systems is a crucial task for autonomous driving, especially in dense traffic areas. Research in this field is evolving rapidly. However, we are at the beginning of a research pathway towards a future generation of intelligent transportation systems. In fact, challenging conditions in real-world driving circumstances, infrastructure monitoring, and accurate real-time system response are the predominant concerns when developing such systems. Recent improvements and breakthroughs in scene understanding for intelligent transportation systems have been mainly based on deep learning and the fusion of different modalities. In this context, we first introduce OLIMP: A heterOgeneous MuLtimodal Dataset for Advanced EnvIronMent Perception. This is the first public, multimodal and synchronized dataset that includes Ultra-Wide-Band (UWB) radar data, acoustic data, narrowband radar data and images. OLIMP comprises 407 scenes and 47,354 synchronized frames, covering four categories: pedestrians, cyclists, cars and trams. The dataset presents various challenges related to dense urban traffic, such as cluttered environments and different weather conditions. To demonstrate the usefulness of the introduced dataset, we then propose a fusion framework that combines the four modalities for multi-object detection. The obtained results are promising and spur future research.

In short-range settings, UWB radars represent a promising technology for building reliable obstacle detection systems, as they are robust to environmental conditions. However, UWB radars suffer from a segmentation challenge: localizing the relevant Regions Of Interest (ROIs) within their signals. Therefore, as a third contribution, we put forward a segmentation approach to detect ROIs in an environment-perception-dedicated UWB radar. Specifically, we implement a differential entropy analysis to detect ROIs. The obtained results show higher performance in terms of obstacle detection compared to state-of-the-art techniques, as well as stable robustness even with low-amplitude signals.
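The differential-entropy analysis can be sketched as a sliding window over the raw radar return. The sketch below assumes a Gaussian model within each window, for which the differential entropy is 0.5 * ln(2*pi*e*var); the window length and thresholding strategy are illustrative assumptions, not the thesis pipeline.

```python
import numpy as np

def diff_entropy_profile(signal, win=64):
    """Sliding-window differential entropy of a 1-D radar signal,
    under a Gaussian assumption: h = 0.5 * ln(2*pi*e*var).
    Windows whose entropy rises above the noise floor are candidate
    regions of interest (ROIs).
    """
    out = np.empty(len(signal) - win + 1)
    for i in range(len(out)):
        var = np.var(signal[i:i + win]) + 1e-12   # avoid log(0) on flat noise
        out[i] = 0.5 * np.log(2 * np.pi * np.e * var)
    return out
```

On a flat signal with an isolated echo, the entropy profile peaks over the echo, which is what makes it usable as an ROI detector even for low-amplitude returns.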

Subsequently, as a deep learning-based approach, we propose a novel framework that exploits Recurrent Neural Networks (RNNs) with UWB signals for multiple road obstacle detection. Features are extracted from the time-frequency domain using the discrete wavelet transform and are forwarded to a Long Short-Term Memory (LSTM) network. The obtained results show that the LSTM-based system outperforms the other implemented related techniques in terms of obstacle detection.
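The wavelet-based feature extraction step can be sketched with a hand-rolled Haar decomposition. This illustrates the general idea of summarizing a signal by per-level detail energies before feeding an LSTM; the wavelet family, number of levels, and feature choice in the thesis may differ.

```python
import numpy as np

def haar_dwt_features(signal, levels=3):
    """Haar wavelet decomposition of a 1-D signal; the energy of the
    detail coefficients at each level serves as a compact
    time-frequency feature vector.
    """
    s = np.asarray(signal, dtype=float)
    feats = []
    for _ in range(levels):
        if len(s) < 2:
            break
        if len(s) % 2:                       # pad to even length
            s = np.append(s, s[-1])
        approx = (s[0::2] + s[1::2]) / np.sqrt(2)
        detail = (s[0::2] - s[1::2]) / np.sqrt(2)
        feats.append(float(np.sum(detail ** 2)))   # detail energy per level
        s = approx                            # recurse on the approximation
    return np.array(feats)
```

A constant signal yields zero detail energy at every level, while transients in the radar return show up as energy at the corresponding scales.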

  • Key words : Intelligent transportation systems; Public dataset; Multi-modality; Fusion; Object detection; UWB radar; Entropy; Segmentation; Deep learning; LSTM.

  • Publications : This thesis led to the publication of the following papers :

(C26). Amira Mimouna, Anouar Ben Khalifa, Ihsen Alouani, Abdelmalik Taleb-Ahmed, Atika Rivenq, Najoua Essoukri Ben Amara, LSTM-based system for multiple obstacle detection using ultra-wide band radar, In Proceedings of the 13th International Conference on Agents and Artificial Intelligence, Volume 2, pp. 418-425, Vienna, Austria, 2021. DOI : 10.5220/0010386904180425 (Conf.Rank C) .

(J16). Amira Mimouna, Anouar Ben Khalifa, Ihsen Alouani, Najoua Essoukri Ben Amara, Atika Rivenq, Abdelmalik Taleb-Ahmed, Entropy-based Ultra-Wide Band radar signals Segmentation for Multi Obstacle Detection, IEEE Sensors Journal, Vol. 21, No. 6, pp. 8142-8149, March 2021. DOI : 10.1109/JSEN.2021.3050054. Quartile: Q1, IF=3.301.

(J5). Amira Mimouna, Ihsen Alouani, Anouar Ben Khalifa, Yassin El Hillali, Abdelmalik Taleb-Ahmed, Atika Menhaj, Abdeldjalil Ouahabi, Najoua Essoukri Ben Amara, OLIMP: A Heterogeneous Multimodal Dataset for Advanced Environment Perception, Electronics, Volume 9, March 2020, 560. DOI: https://doi.org/10.3390/electronics9040560. Quartile: Q2, IF= 2.397.

(Th2). Imen JEGHAM

  • Thesis topic : Human action recognition in uncontrolled environment: Application to driver monitoring.

  • Thesis director : Pr. Mohamed Ali Mahjoub.

  • Co-supervisor : Dr. Anouar BEN KHALIFA.

  • Institution : Higher Institute of Computer Science and Communication Techniques of Hammam Sousse, University of Sousse.

  • Defense date : April 6, 2021.

  • Abstract : Due to the growing demand for automatic interpretation of human actions, human action recognition has become one of the most trending and attractive research fields. Ambiguities in recognizing actions come not only from the difficulty of defining the motion of body parts but also from a variety of issues related to real-world problems, including illumination variation and dynamic, cluttered backgrounds, making this field a challenging topic. Analyzing and understanding a person's behavior is fundamentally required for a wide range of applications such as video indexing, biometrics, intelligent transportation systems, etc. Specifically, driver distraction and fatigue have become one of the leading causes of severe traffic accidents. With the growing development of advanced driver assistance systems and the introduction of third-level autonomous vehicles, recognizing the driver's actions becomes increasingly critical and complex because of challenges related to naturalistic driving settings. In fact, the limited in-vehicle space where the actions are executed and the execution of different actions in parallel with the driving task challenge the performance of human action recognition techniques.

Different contributions are proposed in this PhD thesis. First, we introduce a public and well-structured dataset named the Multiview, Multimodal and Multispectral Driver Action Dataset (3MDAD). The dataset is mainly composed of two sets: the first recorded in daytime and the second at nighttime. Each set consists of two synchronized data modalities, each from frontal and side views. More than 60 drivers were asked to execute 16 in-vehicle actions under a wide range of naturalistic driving settings. Such a dataset is of valuable benefit to researchers working in different fields like image processing, computer vision, sensor fusion, and human-centered intelligent driver assistance systems. Inspired by the human vision process, visual attention models extract relevant information by selectively concentrating on the parts of the visual space where and when it is needed. Attention models can be clustered into two main categories: hard and soft attention models. Thus, as a third contribution, we propose a novel soft spatial attention-based network named the Depth-based Spatial Attention (DSA) network, which unprecedentedly exploits the depth modality to add a cognitive process to a deep network by selectively focusing on the driver's silhouette and motion in the cluttered driving scene. Finally, we propose two hard spatial attention-based approaches. The first is based on traditional hand-crafted features: from SURF keypoints, we extract the region of interest that mainly contains the body parts involved in actions other than safe driving. The second is based on deep learning techniques: using convolutional detection, local discriminative salient regions of the scene, mainly the head and hands, are extracted and exploited for distraction detection and in-vehicle action recognition.
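The idea behind depth-driven soft spatial attention can be sketched as a softmax weighting of a feature map by depth, so that nearer pixels (the driver's silhouette) dominate and the cluttered background is suppressed. This NumPy sketch is a schematic of the mechanism only; the DSA network's actual architecture, learned parameters, and normalization are not reproduced here.

```python
import numpy as np

def depth_attention(feature_map, depth_map, temperature=1.0):
    """Soft spatial attention weighted by depth.

    feature_map : (H, W, C) feature tensor
    depth_map   : (H, W) depths, larger = farther from the camera
    Nearer pixels receive larger attention weights.
    """
    score = -depth_map / temperature          # near -> high score
    score = score - score.max()               # numerical stability
    attn = np.exp(score)
    attn = attn / attn.sum()                  # softmax over all pixels
    return feature_map * attn[..., None], attn
```

The attention map sums to one over the image, and a pixel closer to the camera always receives more weight than a farther one, which is the cognitive bias the abstract describes.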

  • Key words : Human action recognition; Driver action recognition; Naturalistic driving settings; Safe driving; Distracted driving; Deep learning; Dataset; Multimodal; Intelligent transportation system; Visual attention; Real world challenges.

  • Publications : This thesis led to the publication of the following papers :

(J15). Imen Jegham, Anouar Ben Khalifa, Ihsen Alouani, Mohamed Ali Mahjoub, Soft Spatial Attention-based Multimodal Driver Action Recognition Using Deep Learning, IEEE Sensors Journal, Vol. 21, No. 2, pp. 1918-1925. January, 2021. DOI: 10.1109/JSEN.2020.3019258. Quartile: Q1, IF=3.301.

(J13). Imen Jegham, Anouar Ben Khalifa, Ihsen Alouani, Mohamed Ali Mahjoub, A novel public dataset for multimodal multiview and multispectral driver distraction analysis: 3MDAD, Signal Processing: Image Communication, Volume 88, October 2020, 115960, DOI: https://doi.org/10.1016/j.image.2020.115960. Quartile: Q1, IF= 3.256.

(J6). Imen Jegham, Anouar Ben Khalifa, Ihsen Alouani, Mohamed Ali Mahjoub, Vision-based human action recognition: An overview and real world challenges, Forensic Science International: Digital Investigation, Volume 32, March 2020, 200901, DOI: https://doi.org/10.1016/j.fsidi.2019.200901. Quartile: Q2, IF= 1.66.

(C21). Imen Jegham, Anouar Ben Khalifa, Ihsen ALOUANI, Mohamed Ali MAHJOUB, MDAD: A Multimodal and Multiview in-Vehicle Driver Action Dataset, In: Vento M., Percannella G. (eds) Computer Analysis of Images and Patterns. CAIP 2019. Lecture Notes in Computer Science, vol 11679. Springer, Cham, pp. 518-529. DOI: https://doi.org/10.1007/978-3-030-29888-3_42 (Conf.Rank B)

(C14). Imen Jegham, Anouar Ben Khalifa, Ihsen ALOUANI, Mohamed Ali MAHJOUB, Safe Driving : Driver Action Recognition using SURF Keypoints, The 30th International Conference on Microelectronics (ICM2018), pp. 60 - 63, 2018. DOI: https://doi.org/10.1109/ICM.2018.8704009

(Th1). Safa AMEUR

  • Thesis topic : Gesture Recognition With The Leap Motion For Screens Manipulation In a Surgical Operating Room.

  • Thesis director : Pr. Med Salim BOUHLEL.

  • Co-supervisor : Dr. Anouar BEN KHALIFA.

  • Institution : ENISo, University of Sousse.

  • Defense date : March 25, 2021.

  • Abstract : Human action recognition has been an intense research area for more than a decade. In particular, Hand Gesture Recognition (HGR) has become one of the most interesting means of touchless human-computer interaction thanks to the advancement of sensing technology. The recent introduction of novel acquisition devices, like the Leap Motion Controller (LMC), allows obtaining a very informative description of the hand pose and motion that can be exploited for accurate gesture recognition. In this thesis, we are interested in HGR approaches applied to time series data gathered from the LMC, in a context related to medical image manipulation. We introduce the first public dataset gathered with the LMC in the medical field, called "LeapGestureDB"; it consists of 6,600 samples. As a second contribution, we suggest a novel feature extraction method named Chronological Pattern Indexing (CPI), which encodes the temporal order of the patterns that form the performed hand gesture. As a third contribution, we provide a dynamic hand gesture recognition approach using recurrent neural networks. First, we analyze the sequential time series data gathered from the LMC using different Long Short-Term Memory (LSTM) variants separately, in particular the unidirectional LSTM, the bidirectional LSTM and the deep LSTM networks. Then, we propound a novel architecture combining the aforementioned networks, named the Hybrid Bidirectional-Unidirectional LSTM (HBU-LSTM). The suggested network improves the model performance significantly by considering the spatial and temporal dependencies in the LMC sequential data.

Throughout this work, the recognition models are evaluated on two available benchmark datasets, namely the LeapGestureDB dataset and the RIT dataset. We provide both quantitative and qualitative results. On several aspects related to HGR, this work outperforms state-of-the-art gesture recognition methods in terms of efficiency and computational complexity.
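The hybrid bidirectional-unidirectional arrangement can be sketched as a bidirectional LSTM layer whose concatenated forward and backward states feed a unidirectional LSTM layer. This minimal NumPy sketch illustrates the wiring only; layer sizes, gate ordering, and initialization are illustrative assumptions, and the thesis network was built with a deep learning framework rather than raw NumPy.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, W, U, b):
    """Minimal LSTM forward pass. x: (T, d_in), W: (4*d_h, d_in),
    U: (4*d_h, d_h), b: (4*d_h,). Returns hidden states (T, d_h)."""
    d_h = U.shape[1]
    h, c = np.zeros(d_h), np.zeros(d_h)
    out = []
    for t in range(x.shape[0]):
        z = W @ x[t] + U @ h + b                 # all four gates at once
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        out.append(h)
    return np.stack(out)

def hbu_lstm(x, p_fwd, p_bwd, p_uni):
    """HBU-LSTM sketch: a bidirectional layer (forward and backward
    passes, concatenated per time step) feeds a unidirectional layer."""
    fwd = lstm_forward(x, *p_fwd)
    bwd = lstm_forward(x[::-1], *p_bwd)[::-1]    # backward pass, re-aligned
    merged = np.concatenate([fwd, bwd], axis=1)  # (T, 2*d_h)
    return lstm_forward(merged, *p_uni)
```

The bidirectional stage sees each LMC frame with both past and future context, while the unidirectional stage keeps the final representation causal, which is the combination the abstract credits for the performance gain.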

  • Key words : Hand gesture recognition, Leap Motion controller, Feature extraction, Time series data, Chronological indexing, Deep learning, LSTM.

  • Publications : This thesis led to the publication of the following papers :

(C24). Safa Ameur, Anouar Ben Khalifa, Mohamed Salim Bouhlel, Hand-gesture-based Touchless Exploration of Medical Images with Leap Motion Controller, 17th IEEE International Multi-Conference on Systems, Signals and Devices (SSD'20), 20-23 July 2020, pp. 6-11, Sfax, Tunisia. DOI: https://doi.org/10.1109/SSD49366.2020.9364244.

(J10). Safa Ameur, Anouar Ben Khalifa, Med Salim Bouhlel, A novel hybrid bidirectional unidirectional LSTM network for dynamic hand gesture recognition with Leap Motion, Entertainment Computing, Volume 35, August 2020, 100373, DOI: https://doi.org/10.1016/j.entcom.2020.100373. Quartile: Q2, IF= 1.455.

(J9). Safa Ameur, Anouar Ben Khalifa, Med Salim Bouhlel, Chronological pattern indexing: An efficient feature extraction method for hand gesture recognition with Leap Motion, Journal of Visual Communication and Image Representation, Volume 70, July 2020, 102842, DOI: https://doi.org/10.1016/j.jvcir.2020.102842. Quartile: Q1, IF= 2.678.

(C16). Safa Ameur, Anouar Ben Khalifa, Mohamed Salim Bouhlel, LeapGestureDB: A Public Leap Motion Database Applied for Dynamic Hand Gesture Recognition in Surgical Procedures, In: Balas V., Jain L., Balas M., Shahbazova S. (eds) Soft Computing Applications. SOFA 2018. Advances in Intelligent Systems and Computing, vol 1222. pp. 125-138, Springer, Cham. DOI: https://doi.org/10.1007/978-3-030-52190-5_9 (Conf.Rank C)

(C10). Safa Ameur, Anouar Ben Khalifa, Mohamed Salim Bouhlel, A comprehensive leap motion database for hand gesture recognition, 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications, pp. 514 - 519, 2016. DOI: https://doi.org/10.1109/SETIT.2016.7939924