PROJECTS

Multi-modal Monitoring of Driver Alertness and Distractions

Distracted and drowsy driving are leading causes of transportation accidents worldwide. Drowsiness and distraction detection have traditionally been addressed as computer vision problems; however, such behaviors are not always expressed in a visually observable way. In this project, we introduce a novel multi-modal dataset of distracted driver behaviors, consisting of data collected over twelve information channels spanning visual, acoustic, near-IR, thermal, linguistic, and physiological (Blood Volume Pulse (BVP), respiration, skin conductance, and skin temperature) modalities. The data were collected from 45 subjects while they were exposed to four different distractions, three cognitive and one physical. Data collection experiments took place at different times of the day to capture greater variability in participant drowsiness. Through the analysis of the data, we explore a variety of spatio-temporal machine learning techniques, ranging from traditional machine learning pipelines to state-of-the-art deep learning architectures. We identify modality-specific and multi-modal supervised and unsupervised features and evaluate their ability to characterize the two conditions within and across subjects. This project explores how drowsy and distracted states overlap in drivers and how they affect driver performance and safety. Our goal is to design personalized and adjustable AI models that learn through interaction and fit each individual's needs, in order to increase road safety and offer just-in-time interventions. This project is in collaboration with the Toyota Research Institute.
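
As a rough illustration of the multi-modal analysis described above, the sketch below trains one classifier per modality and averages their probabilities into a single distraction score (simple late fusion). The modality names, feature dimensions, and the randomly generated placeholder features and labels are assumptions for illustration only; this is not the project's actual pipeline.

    # Illustrative late-fusion sketch (not the project's actual pipeline):
    # one classifier per modality, predictions fused by averaging probabilities.
    # Feature arrays below are random placeholders standing in for real
    # per-modality features (e.g., thermal, physiological, acoustic).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_windows = 200                      # time windows per recording session
    modalities = {                       # hypothetical feature dimensionalities
        "thermal": 32, "physiological": 8, "acoustic": 64,
    }
    y = rng.integers(0, 2, n_windows)    # 0 = attentive, 1 = distracted (placeholder labels)

    models, probs = {}, []
    for name, dim in modalities.items():
        X = rng.normal(size=(n_windows, dim))          # placeholder features
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        models[name] = clf
        probs.append(clf.predict_proba(X)[:, 1])

    fused = np.mean(probs, axis=0)       # simple average late fusion
    y_hat = (fused > 0.5).astype(int)
    print("fused training accuracy:", (y_hat == y).mean())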

Related Publications: [1], [2], [3], [4], [5]

CogBeacon is a multi-modal dataset designed to target the effects of cognitive fatigue on human performance. The dataset consists of 76 sessions collected from 19 male and female users performing different versions of a cognitive task inspired by the principles of the Wisconsin Card Sorting Test (WCST), a popular test in experimental and clinical psychology designed to assess cognitive flexibility, reasoning, and specific aspects of cognitive functioning. During each session, we record and fully annotate the user's EEG activity, facial keypoints, and real-time self-reports of cognitive fatigue, as well as detailed performance metrics on the cognitive task (success rate, response time, number of errors, etc.). Along with the dataset, we provide free access to the CogBeacon data-collection software, giving the community a standardized mechanism for collecting and annotating physiological and behavioral data for cognitive fatigue analysis. Our goal is to provide other researchers with the tools to expand or modify the functionalities of the CogBeacon data-collection framework in a hardware-independent way. As a proof of concept, we present preliminary machine learning experiments on cognitive fatigue detection using the EEG information, with the subjective user reports as ground truth. These experiments highlight the value of the current dataset and encourage our efforts towards expanding the CogBeacon platform. To our knowledge, this is the first multi-modal dataset specifically designed to assess cognitive fatigue and the only free software available to allow experiment reproducibility for multi-modal cognitive fatigue analysis.
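
The kind of EEG-based fatigue detection mentioned above could, in its simplest form, look like the sketch below: band-power features computed per window and fed to a classifier, with self-reported fatigue as the label. The synthetic signals, channel count, sampling rate, and band choices are assumptions for illustration, not CogBeacon specifics.

    # Minimal sketch of EEG-based cognitive fatigue classification:
    # band-power features per window, self-reported fatigue as labels.
    # All signals and labels here are synthetic placeholders.
    import numpy as np
    from scipy.signal import welch
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    fs, n_channels, n_windows, win_len = 256, 8, 120, 4 * 256   # assumed setup
    bands = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

    X = []
    for _ in range(n_windows):
        eeg = rng.normal(size=(n_channels, win_len))             # placeholder EEG window
        freqs, psd = welch(eeg, fs=fs, nperseg=fs)               # per-channel spectrum
        feats = [psd[:, (freqs >= lo) & (freqs < hi)].mean(axis=1)
                 for lo, hi in bands.values()]                   # mean band power
        X.append(np.concatenate(feats))
    X = np.array(X)
    y = rng.integers(0, 2, n_windows)       # placeholder self-reported fatigue (0/1)

    scores = cross_val_score(RandomForestClassifier(n_estimators=100), X, y, cv=5)
    print("cross-validated accuracy:", scores.mean())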

Related Publications: [1]

Artificial Intelligence has probably been the most rapidly evolving field of science during the last decade. Its numerous real-life applications have radically altered the way we experience daily life, with great impact on some of the most basic aspects of human life, including but not limited to health and well-being, communication and interaction, education, driving, daily living, and entertainment. Human-Computer Interaction (HCI) is the field of Computer Science at the epicenter of this evolution, responsible for transforming fundamental research findings and theoretical principles into intuitive tools that enhance human performance, increase productivity, and ensure safety. Two of the core questions that HCI research tries to address are a) what does the user want? and b) what can the user do? Multi-modal user monitoring has shown great potential towards answering these questions. Modeling and tracking different parameters of a user's behavior has provided groundbreaking solutions in several fields, such as smart rehabilitation, smart driving, and workplace safety. Two of the dominant modalities that have been extensively deployed in such systems are speech- and vision-based approaches, with a special focus on activity and emotion recognition. Despite the great amount of research that has been done in these domains, there are numerous other implicit and explicit types of user feedback produced during an HCI scenario that are very informative but have attracted very limited research interest. This is usually due to the high levels of inherent noise such signals tend to carry, or to the highly invasive equipment required to capture this kind of information, factors that make most real-life applications nearly impossible to implement.

This research investigates the potential of multi-modal user monitoring for designing personalized scenarios and interactive interfaces, focusing on two research axes. First, we explore the advantages of reusing existing knowledge across different information domains, application areas, and individual users, in an effort to create predictive models that can extend their functionality between distinct HCI scenarios. Second, we try to enhance multi-modal interaction by accessing information that stems from more sophisticated and less explored sources, such as electroencephalogram (EEG) and electromyogram (EMG) analysis using minimally invasive sensors. We achieve this by designing a series of end-to-end experiments (from data collection to analysis and application) and by extensively evaluating various Machine Learning (ML) and Deep Learning (DL) approaches on their ability to model diverse signals of interaction. As an outcome of this in-depth investigation and experimentation, we propose CogBeacon, a multi-modal dataset and data-collection platform, to our knowledge the first of its kind, for predicting events of cognitive fatigue and understanding its impact on human performance.
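
A minimal sketch of the knowledge-reuse idea (pre-train on pooled source users or domains, then adapt with a small amount of target data) is given below, using incremental updates of a linear classifier. All data are random placeholders and the setup is purely illustrative, not the models evaluated in this research.

    # Sketch of cross-user/domain reuse: pre-train on pooled source data,
    # then adapt incrementally with a small target-user calibration set.
    # Feature layout, user split, and labels are illustrative assumptions.
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(2)
    X_source = rng.normal(size=(1000, 16))        # pooled data from source users
    y_source = rng.integers(0, 2, 1000)
    X_target = rng.normal(size=(30, 16))          # small calibration set, new user
    y_target = rng.integers(0, 2, 30)

    clf = SGDClassifier(random_state=0)
    clf.partial_fit(X_source, y_source, classes=np.array([0, 1]))  # source pre-training
    clf.partial_fit(X_target, y_target)                            # per-user adaptation
    print("adapted model score on target data:", clf.score(X_target, y_target))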

The identification of cognitive impairments in early childhood provides the best opportunity for successful remedial intervention, because brain plasticity diminishes with age. Attention deficit hyperactivity disorder (ADHD) is a psychiatric neurodevelopmental disorder that is very hard to diagnose or to distinguish from other disorders. Symptoms include inattention, hyperactivity, and impulsive behavior, all of which often result in poor performance in school and persist later in life. In this project, an interdisciplinary team of computer and neurocognitive scientists will develop and implement transformative computational approaches to evaluate the cognitive profiles of young children and to address these issues. The project will take advantage of both physical and computer-based exercises already in place in 300 schools in the United States, involving thousands of children, many of whom have been diagnosed with ADHD or other learning disabilities. Project outcomes will have important implications for a child's success in school, self-image, and future employment and community functioning. The main goal of this project is to discover new knowledge about the role of physical exercise in cognitive training, including correlations between individual metrics and degree of improvement over time. The team will identify important new metrics and correlations currently unknown to cognitive scientists, which will have broad impact on other application domains as well. The PIs will also develop an interdisciplinary course on computational cognitive science and one on user interfaces for neurocognitive experts.

Related Publications: [1], [2], [3]

Demographic and epidemiologic transitions have brought about a new health care paradigm, marked by both a growing elderly population and the prevalence of chronic diseases. Life expectancy is increasing, as is the need for long-term care. Institutional care for the aged population faces economic struggles, with low staffing ratios and consequent quality problems.

Although the aforementioned implications of ageing impose societal challenges, they also create new opportunities for European citizens, healthcare systems, industry, and the European market. Two of the most important aspects of assistive environments and independent living are user acceptance and unobtrusiveness. This has mostly been explored in smart home setups with unobtrusively installed audio-visual monitoring equipment, and the consensus is that users accept monitoring if they are not constantly aware of its presence. A more recent trend is home assistant robots. These two lines of development have for the most part run without heavily interacting with each other and, even more so, without producing integrated solutions that combine smart home automation with robotics. In RADIO, we will develop an integrated smart home/assistant robot system, with the objective of pursuing a novel approach to acceptance and unobtrusiveness: a system where sensing equipment is not discreet but an obvious and accepted part of the user's daily life. By using the integrated smart home/assistant robot system as the sensing equipment for health monitoring, we mask the functionality of the sensors rather than the sensors themselves. In this manner, sensors do not need to be discreet and distant, or masked and cumbersome to install; they do, however, need to be perceived as a natural component of the smart home/assistant robot functionalities.

Related Publications: [1]

Recent brand monitoring technologies rely mostly on the textual aspect of content to derive the underlying public sentiment with respect to a brand, essentially ignoring the constantly increasing wealth of visual content. SentIMAGi aims to create a powerful brand monitoring and reputation management framework that exploits multi-modal sentiment analysis and summarization methods. The goal of the framework is to provide an efficient yet complete view of the public sentiment towards different aspects of a brand. We will break the text-only barrier by fusing the information conveyed via textual and visual content under a unified analysis methodology. SentIMAGi will also apply summarization and text mining techniques to provide efficient yet complete reports through intuitive visualizations. The SentIMAGi framework allows the user to describe an information need; the system then gathers related multi-modal content, analyzes it, and visualizes the results in an actionable manner, forming a useful decision support toolkit. SentIMAGi is implemented in the context of the Greece-Israel Research Co-operation Program.
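
One simple way to picture the text/visual fusion is a confidence-weighted late fusion of per-modality sentiment scores, as sketched below. The data structure, scores, and weights are hypothetical and do not reproduce the actual SentIMAGi methodology.

    # Hypothetical sketch: fuse textual and visual sentiment for one post
    # via a confidence-weighted average. Not the SentIMAGi implementation.
    from dataclasses import dataclass

    @dataclass
    class PostSentiment:
        text_score: float        # sentiment of the post's text, in [-1, 1]
        image_score: float       # sentiment inferred from attached image(s)
        text_conf: float = 0.5   # confidence of each modality's estimate
        image_conf: float = 0.5

    def fuse(post: PostSentiment) -> float:
        """Confidence-weighted late fusion of textual and visual sentiment."""
        total = post.text_conf + post.image_conf
        if total == 0:
            return 0.0
        return (post.text_conf * post.text_score
                + post.image_conf * post.image_score) / total

    # Example: positive caption, mildly negative image content.
    post = PostSentiment(text_score=0.8, image_score=-0.2,
                         text_conf=0.9, image_conf=0.4)
    print("fused brand sentiment:", round(fuse(post), 3))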

Related Publications: [1]

Fun, Dynamic, Multimodal Robot Learning with I Spy and 20 Questions, IRSS, June-July 2014

Can we minimize the typical tedium of training robots by naturally integrating robot learning into conversational interactions with humans? Specifically, can robots engage humans in interactive games such as I Spy and 20 Questions, which can be naturally multimodal, in a way that assists the robot in speech recognition, language learning, and object recognition? The project focuses on creating a game combining I Spy and 20 Questions that, when played jointly by a human and a robot, enables the robot to improve both its language and its vision performance.
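
As a toy illustration of the 20 Questions side of such a game, the sketch below picks, at each turn, the attribute question that best splits the remaining candidate objects (maximum entropy of the yes/no partition). The object/attribute table is invented for illustration; in the actual project these would be grounded in the robot's speech and vision models.

    # Toy 20 Questions loop: greedily ask the attribute whose yes/no answer
    # splits the remaining candidates most evenly. Objects and attributes
    # are made-up placeholders, not the project's vocabulary or models.
    import math

    objects = {
        "apple": {"red": True,  "round": True,  "metal": False},
        "ball":  {"red": False, "round": True,  "metal": False},
        "can":   {"red": False, "round": True,  "metal": True},
        "brick": {"red": True,  "round": False, "metal": False},
    }
    attributes = ["red", "round", "metal"]
    secret = "apple"                      # the object the human has in mind

    def split_entropy(candidates, attr):
        """Entropy of the yes/no split induced by asking about `attr`."""
        p = sum(objects[o][attr] for o in candidates) / len(candidates)
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    candidates = set(objects)
    while len(candidates) > 1:
        attr = max(attributes, key=lambda a: split_entropy(candidates, a))
        answer = objects[secret][attr]
        print(f"Is it {attr}? -> {'yes' if answer else 'no'}")
        candidates = {o for o in candidates if objects[o][attr] == answer}

    print("Robot guesses:", candidates.pop())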

Related Publications: [1], [2], [3], [4]

Diploma Thesis: Continuous Language Model Adaptation, Technical University of Crete, January-September 2013

Statistical Language Models (LMs) are widely used in many applications, such as speech recognition and automatic translation systems. However, every statistical model is tied to the domain of its training data and therefore cannot perform well when tested on out-of-domain data. Moreover, collecting and processing data to train a new statistical model is always a time-consuming and expensive procedure, given the large amounts of data required. Hence, adaptation techniques have been developed to adapt an existing LM to a new domain using a significantly smaller amount of data. N-gram models, the dominant technology for language modeling, are very difficult to adapt due to their large number of parameters. Continuous-space LMs have therefore been developed to make language models more robust and easier to adapt. This study is an initial approach to continuous LM adaptation. We take advantage of some widely used algorithms from the field of speech recognition and try to adapt an initial LM, trained on a Wall Street Journal corpus, with data from the Air Travel Information System (ATIS). We examined different approaches and techniques and reached some useful conclusions that can inform future work.
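
To illustrate the general problem setting, the toy example below adapts a large out-of-domain unigram model with a small in-domain one via classical linear interpolation. This is only a baseline illustration of LM adaptation, not the continuous-space adaptation methods studied in the thesis, and the corpora are invented placeholders.

    # Toy LM adaptation via linear interpolation of unigram models:
    # P_adapted(w) = lam * P_in(w) + (1 - lam) * P_out(w).
    from collections import Counter

    def unigram_lm(tokens, vocab, alpha=1.0):
        """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
        counts = Counter(tokens)
        total = len(tokens) + alpha * len(vocab)
        return {w: (counts[w] + alpha) / total for w in vocab}

    # Toy "out-of-domain" (news-like) and "in-domain" (travel-like) corpora.
    out_domain = "stocks fell sharply as markets closed lower today".split()
    in_domain = "show me flights from boston to denver tomorrow".split()
    vocab = sorted(set(out_domain) | set(in_domain))

    p_out = unigram_lm(out_domain, vocab)
    p_in = unigram_lm(in_domain, vocab)

    lam = 0.7   # interpolation weight for the (small) in-domain model
    p_adapted = {w: lam * p_in[w] + (1 - lam) * p_out[w] for w in vocab}

    for w in ("flights", "markets"):
        print(f"P({w}): out={p_out[w]:.3f} in={p_in[w]:.3f} adapted={p_adapted[w]:.3f}")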