Thesis topic : Human action recognition in uncontrolled environment: Application to driver monitoring.
Thesis director : Pr. Mohamed Ali Mahjoub.
Co-supervisors : Dr. Anouar BEN KHALIFA, Dr. Ihsen ALOUANI.
Institution : Higher Institute of Computer Science and Communication Techniques of Hammam Sousse, University of Sousse.
Defense date : April 6, 2021.
Abstract : Due to the growing demand for automatic interpretation of human action, human action recognition has become one of the most active and attractive research fields. Ambiguities in recognizing actions come not only from the difficulty of defining the motion of body parts but also from a variety of real-world issues, including illumination variation and dynamic, cluttered backgrounds, which makes this field a challenging topic. Analyzing and understanding a person’s behavior is fundamentally required for a wide range of applications such as video indexing, biometrics, and intelligent transportation systems. Specifically, driver distraction and fatigue have become leading causes of severe traffic accidents. With the growing development of advanced driver assistance systems and the introduction of third-level autonomous vehicles, recognizing a driver’s actions becomes increasingly critical and complex because of challenges related to naturalistic driving settings. In fact, the limited in-vehicle space where the actions are executed and the parallel execution of different actions alongside driving tasks challenge the performance of human action recognition techniques.
Several contributions are proposed in this PhD thesis. First, we introduce a public and well-structured dataset named the Multiview, Multimodal and Multispectral Driver Action Dataset (3MDAD). The dataset is mainly composed of two sets: the first recorded in daytime and the second at nighttime. Each set consists of two synchronized data modalities, each captured from frontal and side views. More than 60 drivers were asked to execute 16 in-vehicle actions under a wide range of naturalistic driving settings. Such a dataset is of valuable benefit to researchers working in different fields such as image processing, computer vision, sensor fusion, and human-centered intelligent driver assistance systems.
Inspired by the human vision process, visual attention models extract relevant information by selectively concentrating on parts of the visual space where and when it is needed. Attention models can be clustered into two main categories: hard and soft attention models. Accordingly, we then propose a novel soft spatial attention-based network named the Depth-based Spatial Attention network (DSA), which exploits the depth modality in an unprecedented way to add a cognitive process to the deep network by selectively focusing on the driver’s silhouette and motion in the cluttered driving scene. Finally, we propose two hard spatial attention-based approaches. The first is based on traditional handcrafted features: using SURF keypoints, we extract the region of interest that mainly contains the body parts involved in the action itself rather than in safe driving. The second is based on deep learning techniques: using convolution detection, local discriminative salient regions of the scene, mainly the head and hands, are extracted and exploited for distraction detection and in-vehicle action recognition.
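As a rough illustration of the distinction between soft and hard spatial attention drawn above, the following minimal Python sketch contrasts the two ideas on a toy grid. It is not the DSA network or the SURF-based pipeline of the thesis: the function names, the single-channel feature map, and the keypoint list are all hypothetical and chosen only for illustration. Soft attention is shown as a softmax re-weighting of every cell, and hard attention as a crop around a set of keypoints.

```python
import math


def soft_attention_weights(scores):
    """Softmax over all cells of a 2D score grid; the weights sum to 1."""
    peak = max(s for row in scores for s in row)  # subtract max for stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def apply_soft_attention(features, weights):
    """Soft attention: keep every cell, but scale it by its weight."""
    return [
        [f * w for f, w in zip(f_row, w_row)]
        for f_row, w_row in zip(features, weights)
    ]


def hard_attention_crop(image, keypoints, margin=0):
    """Hard attention: keep only the bounding box around (row, col) keypoints."""
    rows = [r for r, _ in keypoints]
    cols = [c for _, c in keypoints]
    top = max(min(rows) - margin, 0)
    bottom = min(max(rows) + margin, len(image) - 1)
    left = max(min(cols) - margin, 0)
    right = min(max(cols) + margin, len(image[0]) - 1)
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Toy 4x4 "image" whose cell values are simply their flattened indices.
image = [[r * 4 + c for c in range(4)] for r in range(4)]

# Hypothetical saliency scores (for example, derived from a depth map).
scores = [[0.0, 0.0, 0.0, 0.0],
          [0.0, 2.0, 2.0, 0.0],
          [0.0, 2.0, 2.0, 0.0],
          [0.0, 0.0, 0.0, 0.0]]

weights = soft_attention_weights(scores)
weighted = apply_soft_attention(image, weights)

# Hypothetical keypoints (row, col) detected on the region of interest.
roi = hard_attention_crop(image, [(1, 1), (2, 2)])
```

In this sketch the soft variant keeps the whole scene and only changes how much each location contributes, whereas the hard variant discards everything outside the selected region.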
Key words : Human action recognition; Driver action recognition; Naturalistic driving settings; Safe driving; Distracted driving; Deep learning; Dataset; Multimodal; Intelligent transportation system; Visual attention; Real world challenges.
Publications : This thesis led to the publication of the following papers :
(J15). Imen Jegham, Anouar Ben Khalifa, Ihsen Alouani, Mohamed Ali Mahjoub, Soft Spatial Attention-based Multimodal Driver Action Recognition Using Deep Learning, IEEE Sensors Journal, Vol. 21, No. 2, pp. 1918-1925. January, 2021. DOI: 10.1109/JSEN.2020.3019258. Quartile: Q1, IF=3.301.
(J13). Imen Jegham, Anouar Ben Khalifa, Ihsen Alouani, Mohamed Ali Mahjoub, A novel public dataset for multimodal multiview and multispectral driver distraction analysis: 3MDAD, Signal Processing: Image Communication, Volume 88, October 2020, 115960, DOI: https://doi.org/10.1016/j.image.2020.115960. Quartile: Q1, IF=3.256.
(J6). Imen Jegham, Anouar Ben Khalifa, Ihsen Alouani, Mohamed Ali Mahjoub, Vision-based human action recognition: An overview and real world challenges, Forensic Science International: Digital Investigation, Volume 32, March 2020, 200901, DOI: https://doi.org/10.1016/j.fsidi.2019.200901. Quartile: Q2, IF=1.66.
(C21). Imen Jegham, Anouar Ben Khalifa, Ihsen Alouani, Mohamed Ali Mahjoub, MDAD: A Multimodal and Multiview in-Vehicle Driver Action Dataset, In: Vento M., Percannella G. (eds) Computer Analysis of Images and Patterns. CAIP 2019. Lecture Notes in Computer Science, vol 11679. Springer, Cham, pp. 518-529. DOI: https://doi.org/10.1007/978-3-030-29888-3_42 (Conf. Rank B).
(C14). Imen Jegham, Anouar Ben Khalifa, Ihsen Alouani, Mohamed Ali Mahjoub, Safe Driving: Driver Action Recognition using SURF Keypoints, The 30th International Conference on Microelectronics (ICM2018), pp. 60-63, 2018. DOI: https://doi.org/10.1109/ICM.2018.8704009