This work addresses the challenge of explainability in Deep Learning within the medical domain, focusing specifically on chest X-rays. To enhance explainability, we propose a closed pipeline comprising separate models that integrate natural language processing techniques—leveraging insights from radiologist reports—with localization methods to produce visually interpretable outputs highlighting pathological regions. Our preliminary results demonstrate success in generating concise diagnostic reports and semantically grounding them in chest X-ray images. Additionally, we developed a specialized medical metric to assess the similarity of medical phrases, as existing natural language metrics do not adequately capture the required semantics. The current phase of this PhD focuses on refining these individual models and developing a final hybrid model capable of providing confidence scores for localized pathology regions.
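To illustrate why generic surface-overlap metrics fall short on radiology language, the short sketch below scores two kinds of phrase pairs with a plain unigram-F1 overlap (a stand-in for n-gram metrics such as BLEU/ROUGE). This is a generic illustration, not the metric developed in this work, and the example phrases are hypothetical.

```python
# Minimal illustration (not the proposed metric): unigram F1 overlap applied
# to chest X-ray phrases, showing where surface-overlap metrics mislead.

def unigram_f1(reference: str, candidate: str) -> float:
    """Token-level F1 between two phrases, ignoring word order and meaning."""
    ref, cand = set(reference.lower().split()), set(candidate.lower().split())
    if not ref or not cand:
        return 0.0
    overlap = len(ref & cand)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# Semantically equivalent phrasings score low...
print(unigram_f1("no evidence of pneumothorax", "pneumothorax is absent"))                    # ~0.29
print(unigram_f1("cardiac silhouette is enlarged", "findings consistent with cardiomegaly"))  # 0.0
# ...while a contradictory pair scores high, because only one word differs.
print(unigram_f1("small right pleural effusion", "large right pleural effusion"))             # 0.75
```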
To address the environmental impact of human activity, both individuals and industries must adopt sustainable practices. In manufacturing, this includes enhancing product reliability, reducing waste, and ensuring safety, with quality control playing a key role. While off-the-shelf solutions efficiently detect simple defects in standardized production lines, they fail for complex or subjective issues, leaving manual, error-prone processes. This thesis aims to develop a robust computer vision system for quality control, overcoming challenges like subtle defect discrimination, rare and costly annotations, and stringent hardware constraints. The project will explore semi-supervised learning, lightweight model design, and potentially extend to root cause analysis through multi-modal reasoning.
Knee injuries, particularly those involving the anterior cruciate ligament (ACL), are among the most common sports-related injuries and present significant challenges for return-to-sport (RTS) processes, with only two-thirds of athletes regaining their previous performance levels post-rehabilitation. Despite promising results from machine learning (ML) methods for injury prediction, issues such as limited generalization and inconsistent predictive factors have hindered their widespread application. Recent advances in deep learning (DL), particularly its ability to analyze large datasets and medical imaging with human-level accuracy, offer new avenues for improving knee injury prevention and management.
This project aims to develop a novel DL-based approach using biomechanical data from ballistic motor tasks to identify subtle force production differences as functional markers of knee impairment. By transforming time-series data into image representations, the project seeks to train neural networks capable of predicting knee injury risks or recurrence with greater accuracy, providing an integrative and interdisciplinary solution for injury prevention and post-surgical monitoring in athletes.
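As an illustration of turning a 1-D biomechanical signal into an image that a convolutional network can consume, the sketch below encodes a force trace as a Gramian Angular Summation Field. The encoding and the simulated signal are illustrative assumptions, not necessarily the representation chosen in the project.

```python
import numpy as np

def gramian_angular_field(signal: np.ndarray) -> np.ndarray:
    """Encode a 1-D time series as a Gramian Angular Summation Field image.

    Steps: rescale to [-1, 1], map each value to an angle phi = arccos(x),
    then build the matrix G[i, j] = cos(phi_i + phi_j).
    """
    s_min, s_max = signal.min(), signal.max()
    x = 2.0 * (signal - s_min) / (s_max - s_min) - 1.0   # rescale to [-1, 1]
    x = np.clip(x, -1.0, 1.0)                            # guard against rounding
    phi = np.arccos(x)                                   # polar-coordinate angles
    return np.cos(phi[:, None] + phi[None, :])           # N x N image

# Hypothetical ground-reaction-force trace from a ballistic task (e.g. a jump).
t = np.linspace(0, 1, 128)
force = np.exp(-((t - 0.4) ** 2) / 0.01) + 0.1 * np.random.randn(128)
image = gramian_angular_field(force)
print(image.shape)  # (128, 128) -> can be fed to a 2-D CNN
```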
I contributed to the Connect Talent MILCOM project "Multimodal Imaging and Learning for COmputational Medicine", which combines data science and health and focuses on the application of machine learning to multimodal medical image analysis. In the framework of this project, my research addressed the following problems:
Context: Current trends in medical image analysis have shown the effectiveness of Machine Learning (ML) and Deep Learning (DL) in devising computer-aided solutions for a plethora of medical applications and imaging modalities. However, ML/DL models remain data-hungry, requiring large, high-quality datasets annotated by medical experts. Yet one rarely has a perfectly sized and carefully labeled dataset with which to train an ML/DL model, particularly in medical imaging, where data and annotations are expensive to acquire. In addition, there is never full consensus among medical expert annotators. Consequently, there is a need for innovative methodologies that enable annotation-efficient deep learning for medical imaging. To address this challenge, our research focuses on: i) how to learn with a limited and incomplete quantity of annotations, and ii) how to leverage unannotated data.
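One standard way to leverage unannotated data, given here only as a minimal sketch of the general idea rather than the method developed in this research, is confidence-thresholded pseudo-labeling: the model's own high-confidence predictions on unlabelled images are treated as training targets. The toy model, threshold, and loss weighting below are assumptions.

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(model, labelled, unlabelled, threshold=0.9, weight=0.5):
    """One training step mixing a supervised loss with a pseudo-label loss.

    labelled:   (images, labels) batch with expert annotations
    unlabelled: images batch without annotations
    """
    x_l, y_l = labelled
    sup_loss = F.cross_entropy(model(x_l), y_l)

    with torch.no_grad():                         # pseudo-labels from the current model
        probs = F.softmax(model(unlabelled), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = conf >= threshold                  # keep only confident predictions

    if mask.any():
        unsup_loss = F.cross_entropy(model(unlabelled[mask]), pseudo[mask])
    else:
        unsup_loss = torch.tensor(0.0, device=x_l.device)

    return sup_loss + weight * unsup_loss

# Hypothetical usage with a toy classifier:
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32 * 32, 3))
loss = semi_supervised_step(model,
                            (torch.randn(8, 1, 32, 32), torch.randint(0, 3, (8,))),
                            torch.randn(16, 1, 32, 32))
loss.backward()
```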
Context: The generalization of deep learning approaches has been an issue in real clinical scenarios, where the characteristics of the collected data differ widely between medical centers and scanners. This generalization issue stems from the domain-shift problem, for example: appearance variations across modalities, inter-patient anatomical structure variations, and different clinical sites with different acquisition parameters. Consequently, there is a need for innovative methodologies that enable generalizable deep learning approaches. Typically, this problem is addressed via unsupervised domain adaptation (UDA) strategies, where one assumes no labels are available for the target domain. The core idea of UDA is to go through an adaptation phase using a non-linear mapping to find a common domain-invariant representation, or latent space Z. The domain shift in Z can be reduced by enforcing the two domains' distributions to be closer via a suitable loss (e.g. Maximum Mean Discrepancy). Since Z is common to all domains that share the same label space, projected labeled source-domain samples can be used to train a segmenter for all domains. To address this challenge, our research focuses on: i) developing an unsupervised adaptation method that takes advantage of a model learned on a source-domain dataset (e.g. MRI images) to address a related problem (e.g. a segmentation task) on target images (e.g. CT images collected from different clinical sites), for which no annotations are available; ii) investigating the possibility of building a general segmenter for any organ with minimal task-specific annotations, while still leveraging information from other tasks.
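As a concrete example of the alignment loss mentioned above, the sketch below computes a Gaussian-kernel Maximum Mean Discrepancy between source and target features in the latent space Z; the kernel bandwidths and feature dimensions are illustrative assumptions.

```python
import torch

def mmd_rbf(source: torch.Tensor, target: torch.Tensor, sigmas=(1.0, 5.0, 10.0)) -> torch.Tensor:
    """MMD^2 estimate with a sum of Gaussian (RBF) kernels.

    source, target: (n_s, d) and (n_t, d) feature batches from the shared latent space Z.
    """
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2                    # pairwise squared distances
        return sum(torch.exp(-d2 / (2 * s ** 2)) for s in sigmas)

    k_ss = kernel(source, source).mean()
    k_tt = kernel(target, target).mean()
    k_st = kernel(source, target).mean()
    return k_ss + k_tt - 2 * k_st                      # smaller => distributions closer in Z

# Hypothetical encoder outputs for an MRI (source) and CT (target) batch:
z_source, z_target = torch.randn(32, 128), torch.randn(32, 128) + 0.5
alignment_loss = mmd_rbf(z_source, z_target)           # added to the segmentation loss
print(alignment_loss.item())
```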
I was part of the AGPIG team at GIPSA-Lab (now the ACTIV team). During my Ph.D., my research addressed the problem of automatic analysis of macro and micro facial expressions, specifically micro-expression detection and macro-expression recognition using machine learning and deep learning frameworks. We analyzed facial expressions in images and video sequences. In the framework of this project, my research addressed the following problems:
Macro-Facial Expressions (MaEs) occur when the subject agrees to express a given emotion. As a consequence, MaEs last over a substantial period of time and cover several regions of the face. They are frequent and involve more conscious control. MaEs can occur spontaneously, as an involuntary manifestation of an emotional state, or they can be posed as the result of a deliberate effort to communicate an emotional signal. Posed MaEs induce exaggerated movements and changes in the location and appearance of facial features. By contrast, spontaneous MaEs are more subtle but still produce visible facial movements, and they typically evolve differently over time than posed MaEs. In my dissertation, we first dedicate our work to MaE analysis, where expressions are categorized into basic (e.g., joy or anger) and non-basic (e.g., worried or ashamed) emotions, and we focus on spontaneous facial expressions associated with less constrained environmental conditions. Facial expressions can appear different under extrinsic and intrinsic variations even when performed by the same identity. Such within-identity changes can overwhelm the variations due to identity differences and make Facial Expression Recognition (FER) challenging, especially in unconstrained conditions. The complexity of discerning whether two Facial Expressions (FEs) reveal similar or different emotions is therefore shifted to the detection and description of a rich set of discriminative features. Tackling these issues requires extracting visual features that reflect the essential visual content of the FE images while being robustly invariant to intrinsic and extrinsic factors. We focus on finding the best feature representation able to discriminate between FEs regardless of the acquisition conditions or the expressions themselves. Therefore, we design several levels of appearance-based facial feature representations: low-level, mid-level and hierarchical features.
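As an illustrative example of a low-level appearance-based descriptor of the kind referred to above (the actual descriptors and parameters in the dissertation may differ), the sketch below computes uniform Local Binary Pattern histograms over a grid of facial blocks and concatenates them into one feature vector.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_grid_descriptor(face: np.ndarray, grid=(4, 4), points=8, radius=1) -> np.ndarray:
    """Concatenate uniform-LBP histograms computed on a grid of face blocks."""
    lbp = local_binary_pattern(face, P=points, R=radius, method="uniform")
    n_bins = points + 2                                  # number of uniform patterns
    rows, cols = grid
    h, w = face.shape
    histograms = []
    for i in range(rows):
        for j in range(cols):
            block = lbp[i * h // rows:(i + 1) * h // rows,
                        j * w // cols:(j + 1) * w // cols]
            hist, _ = np.histogram(block, bins=n_bins, range=(0, n_bins), density=True)
            histograms.append(hist)
    return np.concatenate(histograms)                    # one vector per face image

# Hypothetical 64x64 grayscale face crop:
face = np.random.rand(64, 64)
descriptor = lbp_grid_descriptor(face)
print(descriptor.shape)  # (4 * 4 * 10,) = (160,)
```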
In real-world applications, FE databases are inconsistent with one another. For instance, face images of the same expression may have different appearances within the same database, while different expressions may have similar appearances across subjects from different databases. Such inconsistency is due to varying domains arising from extrinsic and intrinsic factors, such as different cameras, illuminations, populations, acquisition setups, and participants' cultural background or personality. As a consequence of this inconsistency between databases, performance degrades when a FER system is trained on one source domain and tested on another target domain. This distribution-mismatch problem is often referred to as domain shift. In this context, a robust FER model must take special care during the learning process to infer models that adapt well to the test data on which they are deployed. Several critical issues associated with the target domain induce the domain-shift problem, mainly three: 1) inter-subject expression variations, i.e., the way an expression is produced is inconsistent across different people; 2) large variance in face pose, illumination, occlusions, camera changes, and image resolution; and 3) spontaneous expressions with various intensities. Hence, we study how to adapt facial expression models trained on a particular visual domain (posed expression datasets) to a new domain (spontaneous expression datasets) by learning a non-linear transformation that minimizes the effect of domain-shift changes in the feature distribution. We also study the problem of Zero-Shot Learning (ZSL) for FER, where facial expression classes in the test set are unseen during training. We direct our research toward transfer learning, where we aim to adapt facial expression models to new domains and tasks. We studied domain adaptation and zero-shot learning to develop a method that solves the two tasks jointly. Our method is suitable for unlabelled target datasets coming from a different data distribution than the source domain while sharing the same label space and task, and for unlabelled target datasets with different feature and label distributions than the source domain. To permit knowledge transfer between domains and tasks, we use Euclidean learning and Convolutional Neural Networks to design a non-linear mapping function that maps the visual information coming from facial expressions into a semantic space derived from a Natural Language model that encodes visual attribute descriptions or label information descriptions. The consistency between the two subspaces is maximized by aligning them using the visual feature distribution. Chapter 4 of my dissertation describes the domain adaptation and zero-shot learning approaches in full.
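A minimal sketch of the visual-to-semantic mapping idea is given below; the network sizes, the cosine-alignment loss, and the use of precomputed class word embeddings are assumptions for illustration, not the exact design in Chapter 4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualToSemantic(nn.Module):
    """Non-linear mapping from CNN visual features to a semantic embedding space."""
    def __init__(self, visual_dim=512, semantic_dim=300, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(visual_dim, hidden),
                                 nn.ReLU(),
                                 nn.Linear(hidden, semantic_dim))

    def forward(self, v):
        return F.normalize(self.net(v), dim=1)           # unit-norm semantic prediction

# Hypothetical semantic prototypes (e.g. word embeddings of expression names),
# including classes never seen during training:
prototypes = F.normalize(torch.randn(7, 300), dim=1)     # 7 expression classes
model = VisualToSemantic()

# Training: align mapped visual features with the prototype of their (seen) class.
features = torch.randn(16, 512)                          # CNN features of seen classes
labels = torch.randint(0, 5, (16,))                      # only classes 0-4 are seen
pred = model(features)
loss = (1 - F.cosine_similarity(pred, prototypes[labels])).mean()
loss.backward()

# Zero-shot inference: assign the nearest prototype, which may be an unseen class.
test_pred = model(torch.randn(4, 512))
predicted_class = (test_pred @ prototypes.T).argmax(dim=1)
print(predicted_class)
```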
Micro-expressions (MiEs) result from either conscious suppression or unconscious repression of expressions, when a person experiences an emotion but attempts to mask the facial deformations. Understanding MiEs helps to identify deception and the true mental state of a person. Unlike macro-facial expressions, which typically last 0.5-4 seconds and can therefore be immediately recognized by humans, MiEs generally last less than 0.2 seconds and are very subtle, which makes them difficult to spot and recognize. To improve people's capacity to identify and recognize MiEs, researchers in psychology have developed Micro Expression Training Tools to train specialists. However, even with these training tools, the accuracy of visual reading of MiEs by experts is only around 45%. Clearly, spotting and recognizing MiEs with the human eye is an extremely difficult task, and more descriptive facial feature displacements and motion information are needed. In my dissertation, we propose a process for MiE spotting, that is, identifying their temporal and spatial locations in a video sequence while effectively dealing with parasitic movements. As the duration of a MiE is very short, capturing its speed and subtlety requires a high-speed camera during data acquisition. However, the use of a high-speed camera also tends to capture parasitic motions and deformations, such as those related to head movements, eye blinks, gaze direction, and mouth opening or closing. These parasitic movements, along with other facial muscle activations, are often amplified and end up being confused with MiEs. As a result, it is essential to eliminate interference from facial information unrelated to MiEs while emphasizing the important characteristics of MiEs. On top of that, detecting a MiE requires a method that effectively captures subtle facial motions and subtle local spatial deformations. In this sense, our objectives were: i) to spot MiE segments (onset-offset frames); ii) to pinpoint their subtle local spatial deformations over facial regions; and iii) to effectively deal with parasitic movements by distinguishing motions related to MiEs from other facial events. To achieve these objectives, we first needed to consider how MiEs are produced. MiEs tend to be naturally infrequent, since people try not to produce them and very specific conditions are required to evoke them. Therefore, only a small amount of MiE data, accompanied by parasitic movements and deformations, can be collected even when the acquisition process takes place in a strictly controlled environment. With few data available for single MiE segments, typically about 2 to 20 frames from onset to offset when recorded with a high-speed camera at 60 fps, it is difficult to use supervised learning methods to automate MiE segment detection. As a consequence, we proposed a weakly supervised method in which we reformulate the problem of MiE spotting as a problem of anomaly detection. All facial motions and deformations except those caused by MiEs are considered Natural Facial Behaviour (NFB) events. NFB motions and spatial deformations are learned so that MiE motions and deformations can be detected in a frame as deviations from what the model has learned.
By reformulating the problem as anomaly detection, we alleviate the main challenge of dealing with a small number of MiE segments, since we mainly work with NFB events, which are frequent. More importantly, NFB segments are easier to handle because their larger motions and deformations make it possible to extract discriminant spatio-temporal information from them. Our method is composed of a deep Recurrent Convolutional Auto-Encoder that captures spatial and motion feature changes of natural facial behaviours. Then, a statistical model based on a Gaussian Mixture Model is learned to estimate the probability density function of normal facial behaviours and to associate a discriminating score for spotting micro-expressions. Finally, an adaptive thresholding technique is proposed for separating micro-expressions from natural facial behaviours. Chapter 5 of my dissertation describes the work on micro facial expression analysis in full.
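A simplified sketch of the scoring and thresholding stages could look as follows; it assumes the auto-encoder already provides a feature vector per frame, and the number of mixture components and the threshold rule are illustrative assumptions rather than the exact settings in Chapter 5.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical per-frame features produced by the recurrent convolutional auto-encoder
# (e.g. latent codes or reconstruction-error statistics) for natural facial behaviours.
nfb_features = np.random.randn(5000, 64)

# Fit the density of normal facial behaviours with a Gaussian Mixture Model.
gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
gmm.fit(nfb_features)

# Adaptive threshold derived from the training log-likelihoods (illustrative rule):
train_scores = gmm.score_samples(nfb_features)
threshold = train_scores.mean() - 3.0 * train_scores.std()

# At test time, frames whose log-likelihood falls below the threshold are
# flagged as candidate micro-expression frames (anomalies w.r.t. NFB).
test_features = np.random.randn(200, 64)
test_scores = gmm.score_samples(test_features)
candidate_mie_frames = np.where(test_scores < threshold)[0]
print(candidate_mie_frames)
```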