Research Areas & Projects

Openings in our group:

Open PhD positions:

My Research Areas:

 Computer Vision

        - Civil Infrastructure Inspection, Analysis and Monitoring,

        - Human Behavior Analysis,

        - Biomedical Image Analysis,

        - Subspace Modeling,

        - Activity Analysis,

        - Feature Extraction,

        - Biometrics.

 Machine Learning 

       - Deep Learning (CNNs, RNNs, Attention Networks),

       - Generative and Adversarial Models,

       - Semi-supervised and Unsupervised Learning,

       - Invariance Models,

       - Manifold Learning.

Selected Research Projects (Active)

1. Civil Structural Component Recognition, Damage Detection and Analysis

         - Damage Analysis.

         - Structural Component Recognition.

         - Object Detection and Semantic Segmentation.

2. Non-intrusive Behavioral Analysis

         - Learning with partially labelled data.

         - Modelling spatial and temporal information.

         - Characterizing non-verbal communication skills of trainee doctors.

3. Face Analysis

    (a) Cognitive Face Recognition & Memorability.

    (b) Face Identification and Verification in Wearable devices.

    (c) Smile classification using micro and global features.

4. Characterization of Social Interactions

    (a) Multi-sensor Self-Quantification of Presentations.

    (b) Recovering Social Interaction Spatial Structure from Multiple First-Person Views.

5. Egocentric Activity Recognition

6. Non-intrusive Driver Fatigue Detection and Estimation

7. Person Re-identification Using Wearable Devices.


Past Research Projects (Here)

Below are the details of the active research projects.

1. Crowd Analysis: Project MONICA (Management of Networked IoT Wearables - Very Large Scale Demonstration of Cultural Societal Applications)

(a) Holistic and Object Based Approaches

Classification of crowd behaviour depends on the perspective from which we observe the crowd: it can be treated as a single entity or as a collection of independent individuals. We plan to develop both holistic and component-based methodologies to approach this problem.

Read more ...


2. Face Analysis

(a) Cognitive Face Recognition & Memorability

Faces provide important cues in social interactions. In this work, psychophysics experiments are conducted to investigate how and what humans learn from faces in order to recognize them in natural dynamic face videos. We aim to extract the visual information critical for invariant face recognition by studying how humans fixate (eye gaze scan paths) and where they fixate (frequently fixated facial features). Recognition performance may deteriorate when observers have not paid enough attention to encode the cues in certain regions of the face; our study reveals that these regions are needed for accurate recognition, whereas a consistent eye gaze pattern may be absent in the failure cases. A new natural dynamic face database will be contributed to advance the field of face recognition and perception, with potential translation into computational models that emulate the competence of human recognition performance.

Read more ...

(b) Face Identification and Verification in Wearable devices

Our research addresses discriminant analysis using within-class and within-subclass scatter matrices for face recognition. Each class is divided into subclasses using spatial partition trees, so that the underlying distribution is approximated with a mixture of Gaussians and whole-space subclass discriminant analysis can be performed among these subclasses. This work proposes a regularization methodology that enables discriminant analysis in the whole eigenspace of the within-subclass scatter matrix. Low-dimensional discriminative face features are extracted after performing discriminant evaluation in the entire eigenspace of the within-subclass scatter matrix. Experimental results on three popular face databases show the superiority of the proposed approach.
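A minimal sketch of this idea is given below, assuming k-means as a stand-in for the spatial partition trees and a simple eigenvalue floor as the regularizer (both are illustrative assumptions, not the published method):

```python
# Minimal sketch of whole-space subclass discriminant analysis (illustrative only).
# KMeans stands in for the spatial partition trees used in the actual work, and the
# eigenvalue-floor regularizer is an assumption, not the published scheme.
import numpy as np
from sklearn.cluster import KMeans

def subclass_discriminant_features(X, y, n_subclasses=2, out_dim=20, reg=1e-3):
    """X: (n_samples, n_features), y: class labels. Returns a projection matrix W."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-subclass scatter
    Sb = np.zeros((d, d))  # between-subclass scatter
    for c in np.unique(y):
        Xc = X[y == c]
        k = min(n_subclasses, len(Xc))
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(Xc)
        for s in range(k):
            Xs = Xc[labels == s]
            mu_s = Xs.mean(axis=0)
            diff = Xs - mu_s
            Sw += diff.T @ diff
            dm = (mu_s - mean_all)[:, None]
            Sb += len(Xs) * (dm @ dm.T)
    # Regularize so the *whole* eigenspace of Sw (including near-null directions)
    # is usable, instead of discarding small-eigenvalue directions.
    evals, evecs = np.linalg.eigh(Sw)
    evals = np.maximum(evals, reg * evals.max())
    whiten = evecs / np.sqrt(evals)          # whitening transform in the full eigenspace
    Sb_t = whiten.T @ Sb @ whiten
    _, V = np.linalg.eigh(Sb_t)
    W = whiten @ V[:, ::-1][:, :out_dim]     # top discriminant directions
    return W

# Usage: W = subclass_discriminant_features(X_train, y_train); features = X @ W
```

The key point the sketch tries to convey is that the small (near-null) eigenvalues of the within-subclass scatter are regularized rather than discarded, so discriminant evaluation uses the whole eigenspace.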

Read more ...

(c) Smile Classification

A smile is a telling expression that can reflect the state of mind in both genuine and deceptive ways. It generally indicates a happy state of mind; however, smiles can be deceptive: people smile when they feel happy, but they may also smile (in a different way) when they feel pity for others. This work aims to distinguish spontaneous (felt) smiles from posed (deliberate) smiles by extracting and analyzing both the global (macro) motion of the face and subtle (micro) changes in facial expression features, through tracking a series of facial fiducial markers as well as using dense optical flow. Specifically, features around the eyes and lips are captured and used for analysis. The goal is to automatically classify each smile as either 'spontaneous' or 'posed' using various classifiers. Experimental results on the large UvA-NEMO smile database are promising compared to other relevant methods.
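As a rough illustration of the micro-motion part of such a pipeline, the sketch below computes dense optical-flow statistics inside eye and lip regions and feeds them to an SVM; the ROI handling, the chosen statistics and the classifier are assumptions for clarity, not the exact published method:

```python
# Illustrative sketch only: dense optical-flow statistics from eye/lip regions as
# micro-motion features, with an SVM classifier for spontaneous vs. posed smiles.
import cv2
import numpy as np
from sklearn.svm import SVC

def flow_features(video_path, eye_roi, lip_roi):
    """Aggregate optical-flow magnitude statistics inside the eye and lip ROIs.
    ROIs are (x, y, w, h) tuples, assumed to come from a face/landmark detector."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    stats = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag = np.linalg.norm(flow, axis=2)
        per_frame = []
        for (x, y, w, h) in (eye_roi, lip_roi):
            roi = mag[y:y + h, x:x + w]
            per_frame += [roi.mean(), roi.std(), roi.max()]
        stats.append(per_frame)
        prev_gray = gray
    cap.release()
    stats = np.array(stats)
    # Summarize frame-level (micro) motion over the whole clip.
    return np.concatenate([stats.mean(axis=0), stats.std(axis=0)])

# clf = SVC(kernel='rbf').fit(train_features, train_labels)  # spontaneous vs. posed
```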

Read more ...


3. Characterization of Social Interactions

(a) Multi-sensor Self-Quantification of Presentations

Presentations have long been an effective means of delivering information to groups. Over the past few decades, technological advancements have revolutionized the way humans deliver presentations. Despite this, presentation quality varies and is affected by many factors. Conventional presentation evaluation usually requires painstaking manual analysis by experts. Although expert feedback can certainly help speakers improve their presentation skills, manual evaluation is costly and often not accessible to most people.

In this work, we propose a novel multi-sensor self-quantification framework for presentations. Utilizing conventional ambient sensors (i.e., static cameras and a Kinect sensor) and emerging wearable egocentric sensors (i.e., Google Glass), we first analyze the efficacy of each type of sensor against various nonverbal assessment rubrics, and then present our multi-sensor presentation analytics framework. We set up our experimental environment in a meeting room with a Kinect sensor (denoted AM-K) and two static RGB cameras with microphones (denoted AM-S1 and AM-S2). AM-S1 and AM-S2 capture the speaker and audience from two different locations, whereas AM-K is configured to capture the behavior of the speaker with both RGB and depth channels. For each presentation, three Google Glasses are deployed: one worn by the speaker (denoted WS-S) and the remaining two worn by two randomly chosen audience members (denoted WS-A1 and WS-A2). The sensor configuration and the approximate spatial locations of the speaker and the audience are shown in the figure above.
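A minimal sketch of feature-level fusion across these sensors is shown below; the per-sensor feature dimensionalities and the logistic-regression scorer are placeholders for illustration, not the actual analytics framework:

```python
# Illustrative sketch of feature-level fusion across the sensors described above
# (AM-K, AM-S1, AM-S2, WS-S, WS-A1, WS-A2). Feature extractors and scorer are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

SENSORS = ["AM-K", "AM-S1", "AM-S2", "WS-S", "WS-A1", "WS-A2"]
DIMS = {name: 16 for name in SENSORS}  # hypothetical per-sensor feature dimensionality

def fuse_features(per_sensor):
    """per_sensor: dict mapping sensor name -> 1-D feature vector (or None if missing).
    Missing sensors are zero-filled so the fused vector keeps a fixed layout."""
    parts = []
    for name in SENSORS:
        vec = per_sensor.get(name)
        parts.append(np.asarray(vec, dtype=float) if vec is not None
                     else np.zeros(DIMS[name]))
    return np.concatenate(parts)

# One binary classifier per assessment rubric (e.g. eye contact, body language):
# X = np.stack([fuse_features(p) for p in presentations])
# clf = LogisticRegression(max_iter=1000).fit(X, rubric_labels)
```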

The proposed framework is evaluated on a new presentation dataset, the NUS Multi-Sensor Presentation (NUSMSP) dataset, which consists of 51 presentations covering a diverse set of topics. The dataset was recorded with ambient static cameras, a Kinect sensor, and Google Glass. In addition to the multi-sensor analytics, we conducted a user study with the speakers to verify the effectiveness of our system-generated analytics, which received positive and promising feedback.

Read more ...


(b) Recovering Social Interaction Spatial Structure from Multiple First-Person Views

In a typical multi-person social interaction, spatial information plays an important role in analyzing the structure of the interaction. Previous studies, which analyze the spatial structure of social interactions using one or more third-person view cameras, suffer from occlusion. With the increasing popularity of wearable computing devices, we can now obtain natural first-person observations with limited occlusion. However, such observations have a limited field of view and can only capture a portion of the social interaction. To overcome this limitation, we propose a search-based structure recovery method for small-group conversational social interactions that reconstructs the social interaction structure from multiple first-person views, each of which contributes to a multifaceted understanding of the interaction, as shown in the figure above. We first map each first-person view to a local coordinate system; a set of constraints and spatial relationships is then extracted from these local coordinate systems. Finally, the human spatial configuration is searched under the constraints to best match the extracted relationships. The proposed method is much simpler than full 3D reconstruction and suffices for capturing the spatial structure of the social interaction. Experiments on both simulated and real-world data show the efficacy of the proposed method.
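The sketch below illustrates the search idea on a toy model in which people occupy candidate seats on a circle and each wearer reports relative bearings to the people it sees; the circular-seat model and brute-force search are simplifying assumptions, not the published algorithm:

```python
# Illustrative sketch of search-based recovery of a conversational group's spatial
# structure from first-person observations (toy circular-seat model, brute force).
import itertools
import numpy as np

def recover_structure(observed, n_people, n_seats=8):
    """observed: dict {(i, j): bearing} of relative bearings (radians) that wearer i
    measured towards person j in i's local view. Returns the seat assignment that
    best matches the observations, together with its cost."""
    angles = np.linspace(0, 2 * np.pi, n_seats, endpoint=False)
    seats = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # candidate positions
    best, best_cost = None, np.inf
    for assign in itertools.permutations(range(n_seats), n_people):
        pos = seats[list(assign)]
        # Each person is assumed to face the group centroid.
        to_centre = pos.mean(axis=0) - pos
        heading = np.arctan2(to_centre[:, 1], to_centre[:, 0])
        cost = 0.0
        for (i, j), bearing in observed.items():
            d = pos[j] - pos[i]
            pred = np.arctan2(d[1], d[0]) - heading[i]
            diff = np.arctan2(np.sin(pred - bearing), np.cos(pred - bearing))
            cost += diff ** 2  # wrapped angular error
        if cost < best_cost:
            best, best_cost = assign, cost
    return best, best_cost
```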

Read more ...


4. Egocentric Activity Recognition

With the increasing availability of wearable devices, research on egocentric activity recognition has received much attention recently. In this work, we build a Multimodal Egocentric Activity dataset (shown in the figure above (a)), which includes egocentric videos and sensor data for 20 fine-grained and diverse activity categories (listed in the figure above (b)). We present a novel strategy to extract temporal, trajectory-like features from sensor data and propose to apply the Fisher Kernel framework to fuse video and temporally enhanced sensor features. This work also proposes a new multimodal multi-stream deep learning framework to recognize egocentric activities, in which multi-stream ConvNets and multi-stream long short-term memory (LSTM) architectures learn discriminative spatial and temporal features from the egocentric video and the sensor data, respectively. Two different fusion techniques are evaluated at two different levels. Comparison with state-of-the-art results shows that our method achieves very encouraging performance, despite the very limited number of egocentric activity samples available for training. Future work will investigate different data augmentation approaches to improve the networks. Experimental results show that, with careful design of the feature extraction and fusion algorithms, sensor data can enhance the information-rich video data. We make the Multimodal Egocentric Activity dataset publicly available to facilitate future research.
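For illustration, a minimal two-stream model in this spirit might look as follows; the layer sizes and the feature-level late fusion are assumptions, not the exact published architecture:

```python
# Minimal sketch of a two-stream model fusing video frames and wearable-sensor
# sequences, in the spirit of the multi-stream ConvNet + LSTM framework above.
import torch
import torch.nn as nn

class TwoStreamEgoNet(nn.Module):
    def __init__(self, n_classes=20, sensor_dim=6, hidden=64):
        super().__init__()
        # Video stream: a tiny frame-level ConvNet, averaged over time.
        self.video_cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Sensor stream: LSTM over accelerometer/gyroscope-style sequences.
        self.sensor_lstm = nn.LSTM(sensor_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(32 + hidden, n_classes)  # feature-level fusion

    def forward(self, frames, sensors):
        # frames: (B, T, 3, H, W); sensors: (B, T_s, sensor_dim)
        b, t = frames.shape[:2]
        frame_feats = self.video_cnn(frames.flatten(0, 1)).view(b, t, -1).mean(dim=1)
        _, (h, _) = self.sensor_lstm(sensors)
        fused = torch.cat([frame_feats, h[-1]], dim=1)
        return self.classifier(fused)

# logits = TwoStreamEgoNet()(torch.randn(2, 8, 3, 64, 64), torch.randn(2, 100, 6))
```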

Read more ...


5. Non-intrusive Driver Fatigue Detection and Estimation

Fatigue, drowsiness and sleepiness are often used synonymously. It is often difficult for researchers to define these states, as they are affected by multidimensional human factors that shape the driver's muscle-motor activities. Fatigue is defined as a global reduction in physical or mental arousal that results in a performance deficit and a reduced capacity to perform a task. Sleepiness or drowsiness, on the other hand, is defined as the physiological drive to sleep; drowsiness may occur as the stage prior to sleepiness. The causes of fatigue are diverse, ranging from physical, psychological and perceptual factors to boredom and laziness, whereas the reasons for sleepiness include the time spent awake, the time of day and the hours slept in the last 24 hours. These states (fatigue, drowsiness and sleepiness) are important contributing factors in thousands of crashes, injuries and fatal accidents happening every day around the world.

In this work, we propose a vision-based method and system for bus driver monitoring. Our approach starts with detection of the head-shoulder figure in the image, followed by face and eye detection and eye feature recognition. Finally, a multi-model fusion scheme is designed to infer three states of driver attention: normal, drowsy and sleepy (as shown in the figure above (a)-(d)). Experimental results show that our proposed method can easily distinguish the drowsy and sleepy states from the normal state, and hence our scheme is useful for monitoring the bus driver's attention and raising alerts. Extensive experiments on simulated driving videos show the superiority of our approach compared to the baseline method.
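As a rough illustration of this kind of pipeline, the sketch below uses off-the-shelf Haar-cascade face/eye detection and a simple eye-closure-ratio rule to separate the three states; the thresholds and the rule-based decision are assumptions, not the published multi-model fusion scheme:

```python
# Illustrative sketch only: Haar-cascade face/eye detection with an eye-closure-over-time
# heuristic to separate normal / drowsy / sleepy states.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def eyes_closed(gray_frame):
    """Return True when a face is found but no open eyes are detected inside it."""
    faces = face_cascade.detectMultiScale(gray_frame, 1.3, 5)
    for (x, y, w, h) in faces:
        eyes = eye_cascade.detectMultiScale(gray_frame[y:y + h, x:x + w])
        return len(eyes) == 0
    return False  # no face found: make no claim

def driver_state(frames, drowsy_thresh=0.3, sleepy_thresh=0.6):
    """frames: iterable of BGR frames from a fixed in-cab camera."""
    closed = [eyes_closed(cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)) for f in frames]
    closed_ratio = float(np.mean(closed)) if closed else 0.0
    if closed_ratio >= sleepy_thresh:
        return "sleepy"
    if closed_ratio >= drowsy_thresh:
        return "drowsy"
    return "normal"
```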

Read more ...


6. Person Re-identification Using Wearable Devices

The rise of wearable devices has led to many new ways of re-identifying an individual. Unlike static cameras, where views are often restricted or zoomed out and occlusions are common, first-person views (FPVs), or egocentric views, see people up close and mostly capture unoccluded face images. In this work, we propose a face re-identification framework designed for a network of multiple wearable devices. The framework utilizes a global data association method termed Network Consistent Re-identification (NCR), which not only helps maintain consistency in association results across the network but also improves pair-wise face re-identification accuracy. To test the proposed pipeline, we collected a database of FPV videos of 72 persons using multiple wearable devices (such as Google Glass) in a multi-storey office environment. Experimental results indicate that NCR consistently achieves large performance gains compared to state-of-the-art methodologies.
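The sketch below illustrates the network-consistency constraint behind NCR in a simplified form: identities are matched pairwise between cameras with the Hungarian algorithm, and camera triplets whose matches are not transitive are flagged. The real NCR solves a joint optimization; this greedy check is only an illustration of the constraint:

```python
# Simplified sketch of the network-consistency idea: pairwise Hungarian matching
# between wearable cameras, followed by a transitivity check over camera triplets.
import itertools
import numpy as np
from scipy.optimize import linear_sum_assignment

def pairwise_matches(similarity):
    """similarity[(a, b)]: matrix of face-match scores between cameras a and b.
    Returns matches[(a, b)] mapping person index in a -> person index in b."""
    matches = {}
    for (a, b), S in similarity.items():
        rows, cols = linear_sum_assignment(-np.asarray(S))  # maximize total score
        matches[(a, b)] = dict(zip(rows, cols))
    return matches

def inconsistent_triplets(matches, cameras):
    """A match is network-consistent when a->b followed by b->c agrees with a->c."""
    bad = []
    for a, b, c in itertools.permutations(cameras, 3):
        if not all(k in matches for k in [(a, b), (b, c), (a, c)]):
            continue
        for i, j in matches[(a, b)].items():
            k_direct = matches[(a, c)].get(i)
            k_chained = matches[(b, c)].get(j)
            if k_direct is not None and k_chained is not None and k_direct != k_chained:
                bad.append((a, b, c, i))
    return bad
```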

Read more ...


Note to all visitors:

A large number of scholarships are available for local and international students from Keele University, UK; A*STAR, Singapore; and Kingston University London, UK. Please refer to the following websites for details on how to apply:

[1] Scholarship at Keele University, UK.

[2] PhD Scholarship at Kingston University, London, UK.

[3] Scholarships & Attachments (https://www.a-star.edu.sg/Scholarships/overview)

[4] A*STAR Research Attachment Programme (ARAP), (https://www.a-star.edu.sg/Scholarships/for-graduate-studies/a-star-research-attachment-programme-(arap))

[5] External Funding Bodies.

Only short-listed candidates will be notified.


If you are interested in these research areas, projects, or any other collaboration, please contact me at:

   Dr. Bappaditya Mandal

   Lecturer in Computer Science,

   School of Computing and Mathematics,

   Faculty of Natural Sciences,

   Keele University,

   Colin Reeves Building Room: CR36,

   Staffordshire ST5 5BG, United Kingdom,

   Tel: +44 (0)1782-7-33076, Fax: +44-1782-734268.

   Email1: b.mandal AT keele.ac.uk

   Email2: bappadityamandal AT gmail.com


Copyright Notice: Please note that all necessary copyrights are retained by the authors and/or other copyright owners. On this website, the ideas/papers are provided only as a means to disseminate research work in a timely manner. Any commercial use of the ideas/papers presented here without the written permission of the authors and/or other copyright owners is considered a violation of copyright.