Projects

VCOACH: Virtual Coaching for Indoors and Outdoors Sporting

PI of project funded by ITIDA (Information Technology Industry Development Agency). The budget is 1,497,200 EGP running over one and a half years starting July 2021.

Abstract

Human-centered computing is an emerging research and application area that aims at understanding human behavior and integrating users and their social context with digital technology. In the context of athletics and sporting, inertial sensors combine an accelerometer to measure force and acceleration, a gyroscope to indicate rotation (angular velocity and angular displacement), and a magnetometer to measure body orientation. These sensors collect data across three axes each and can capture an athlete’s movement in minute detail. For example, they can be used to differentiate between a jump to the right and a jump to the left, a walk and a run, a walk on sand, solid ground, or grass, etc. One of the most popular uses of inertial sensors is in trying to quantify athlete readiness and fatigue in the field. Wearables, especially IMUs, can be used to detect changes in acceleration or in the direction of acceleration, which may change with injury and/or fatigue. Each person has an individual motion signature, so coaches, trainers, and sports scientists can compare an athlete to his/her normal self. IMUs can provide good insights into exercising and/or playing performance and can aid in assessing athletes recovering from injury. In addition, they are currently embedded in almost all commodity wearable and mobile devices such as smart watches, smart phones, wrist bands, shoes, clothes, etc.

With the COVID-19 situation, gyms, among other public places, were forced to close their doors in order to control the spread of the global pandemic. Moreover, people were advised to stay at home and, in some places, were fined if found outside. Hence, many people’s mental and physical state started to deteriorate. As a result, more and more workout mobile apps started to gain increasing popularity. Accordingly, there has been a growing interest in performing exercises and sports indoors with the assistance of smart digital technology for tracking and assessing such exercising.
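
To make these signal modalities concrete, the sketch below integrates a gyroscope's angular-velocity stream into an angular-displacement estimate and computes an orientation-free accelerometer feature. The sampling rate, array shapes, and placeholder random data are illustrative assumptions, not the project's actual pipeline.

```python
import numpy as np

# Hypothetical 3-axis gyroscope stream: one row per sample, columns are
# angular velocity (rad/s) around the x, y, z axes.
rng = np.random.default_rng(0)
gyro = rng.normal(0.0, 0.1, size=(500, 3))   # placeholder data
fs = 100.0                                   # assumed sampling rate (Hz)

# Angular displacement is approximated by cumulative integration of
# angular velocity over time (simple rectangle rule).
angles = np.cumsum(gyro / fs, axis=0)        # radians, per axis

# Acceleration magnitude is a common orientation-free feature.
acc = rng.normal(0.0, 1.0, size=(500, 3))    # placeholder accelerometer data
acc_mag = np.linalg.norm(acc, axis=1)
print(angles[-1], acc_mag.mean())
```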

The overall objective of this project is to develop a smart virtual assistant coach for indoor sporting, exercising, and workouts. In other words, we aim at quantifying human movement for the purpose of providing in-home automated coaching. The system is to first model the different modes of exercise/workout performance, and then monitor, track, and provide real-time continuous feedback to the user while he/she performs indoor exercises. We also plan to incorporate a preliminary prototype of smart virtual coaching for outdoor sporting. We choose tennis, as the PI of this project is an expert tennis player. Once we understand and generate a plausible virtual coach for a subset of tennis training actions, this can be extended to the whole of tennis training and performance, and consequently to any sporting activity such as soccer, fencing, basketball, etc.

We will base our analysis and feedback control on two data modalities: inertial motion streamed from wearable devices as well as visual data streamed from a video camera, though the former plays the major role, as it captures the motion characteristics more intrinsically and intimately. The visual streaming will be provided through two means: (1) the recent state-of-the-art Google ML Kit Pose Detection, which provides an accurate API for pose detection and for tracking the distances between joints; its inputs are assumed to be normal RGB images, which can be captured and streamed by smart phones, and this is the main advantage of this method; and (2) the Kinect V2 depth camera, which provides much more valuable information but requires a specialized device that not many people have. We will balance the pros and cons of each to reach a final decision. Such a system in general allows athletes to reduce muscle injury or imbalance and to increase the effectiveness of training exercises.
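
ML Kit Pose Detection is a mobile (Android/iOS) API, so as a desktop-side illustration the sketch below uses Google's closely related MediaPipe Pose, which likewise extracts body landmarks from plain RGB frames. The file name and the choice of MediaPipe are assumptions made for the sake of the example, not the project's committed stack.

```python
import cv2
import mediapipe as mp

# Load an ordinary RGB frame (file name is an illustrative assumption).
image = cv2.imread("tennis_serve_frame.jpg")
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# MediaPipe Pose returns 33 body landmarks, each with normalized (x, y),
# a relative depth z, and a visibility score.
with mp.solutions.pose.Pose(static_image_mode=True) as pose:
    result = pose.process(rgb)

if result.pose_landmarks:
    for lm in result.pose_landmarks.landmark:
        print(lm.x, lm.y, lm.z, lm.visibility)
```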

The above functionality will be implemented on a mobile device, such as a smartphone, as well as on a laptop/notebook computing environment. We plan to target both Android and iOS users on the mobile platform and Windows/Mac OS users on the laptop computing environment, though initially we will focus more on Android and Windows users, as these are more open platforms and have more users.

This prototype phase of the project is a sequel to a PRP phase: “PRP 2019.R26.1: A Robust Wearable Activity Recognition System based on IMU Signals”. During the PRP phase of the project we did extensive work on inertial time-series data streamed from different types of IMU sensors mounted on different parts of the human body. This work has resulted in the publication of more than 10 articles in reputable conference and journal venues, in addition to about 5 articles currently under review or in preparation for submission.

Finally, the deliverables of the project are as follows:

  1. Datasets of inertial motion signals as well as video stream signals of:

    • An indoor workout routine/program such as the 7-minute workout program.

    • Individual workout exercises such as squats and push-ups.

    • Certain particular actions/exercises (such as serving) of tennis playing.

  2. A software module for action recognition that is able to recognize the current exercise/action and the switching times between consecutive exercises/actions.

  3. A virtual assistant coach for indoor workout routines and individual workout exercises. The module should be able to give feedback in addition to early warnings of injury, especially for elderly people and people in rehabilitation (such as warning against deeper-than-necessary squats, as measured by the knee angle; see the sketch after this list).

  4. A mobile app that implements and packages all of the above functionalities in addition to an app running on a laptop/notebook.

  5. Scientific articles and patents submitted to reputable publication venues.
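
As an illustration of the knee-angle check mentioned in deliverable 3, the following sketch computes the angle at the knee from three pose landmarks (hip, knee, ankle). The landmark coordinates and the 90-degree threshold are illustrative assumptions, not calibrated values.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by points a-b-c."""
    a, b, c = np.asarray(a), np.asarray(b), np.asarray(c)
    v1, v2 = a - b, c - b
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

# Illustrative 2D landmark positions of a deep-squat pose
# (e.g., as produced by a pose detector).
hip, knee, ankle = (0.55, 0.62), (0.60, 0.60), (0.58, 0.80)
angle = joint_angle(hip, knee, ankle)

# Hypothetical safety threshold: warn on squats deeper than ~90 degrees.
if angle < 90.0:
    print(f"Warning: squat too deep (knee angle {angle:.1f} deg)")
```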

Automatic Video Surveillance System for Crowd Scenes

PI of project funded by STDF (Science and Technology Development Fund). The budget is 1,330,160 EGP running over three years starting June 2020.

Abstract

The overall objective of this project is to design and develop a smart camera-based surveillance system for crowded places, e.g., schools, universities, metro stations, train stations, airports, shopping malls, general public events, stadiums, etc. The system will work in two phases: an offline phase for discovering long-term typical patterns and collective behaviors, and an online phase for real-time automatic analysis of the video feed captured by fixed surveillance cameras. The offline analysis aims at developing a model for understanding the behavior of the crowd in the underlying scene in order to serve several main purposes: planning assistance, crowd management, prediction and/or detection of potentially hazardous situations, and anomaly and threat detection. To serve these purposes the system should be able to automatically discover semantic regions, both static and dynamic, in the scene layout. This includes discovering typical motion pathways; learning typical and safe motion patterns (i.e., directions, intensities, and speeds) exhibited by the crowd; discovering periodic motion patterns; discovering locations of interest in the scene (manifested by the crowd slowing down and/or stopping); discovering typical locations where stationary groups are formed; discovering the terminal points of the scene, such as typical entrances and exits; estimating the count of people (or moving objects in general) in different regions of the scene; etc. All of this is crucial information for the planning and management of crowds and crowd scenes.

To support threat detection, the system will continuously analyze, in real-time, the crowd behavior in order to detect any deviation from the typical (and safe/secure) patterns, which could indicate a potential threat. The typical behavior is formulated in terms of normalcy models that are constructed a priori by the offline analysis phase. For example, the system can detect objects (people or vehicles) moving in the wrong direction, which could indicate a potential hazard; objects suddenly slowing down or stopping, which could indicate a potentially dangerous congestion; or suspicious appearance or dynamics that could indicate a potential criminal and/or terrorist act.

We have already done some preliminary studies, analyzing in particular Grand Central Station in New York City, which is one of the largest transportation hubs in the world. Our techniques have varied between clustering based on non-parametric stochastic methods and the use of Long Short-Term Memory (LSTM) recurrent networks to model the scene’s local dynamics. In both techniques we use dynamic features in the form of short trajectories, called tracklets; we have not yet considered appearance-based features. This work has already resulted in the publication of several articles in reputable conferences and journals.

Furthermore, we will use the gait modality for surveillance and people identification. Compared with other biometric modalities, gait has many advantages: (1) it is one of the common daily activities that can be unobtrusively observed, (2) it can be perceived even at a long distance from a camera, (3) gait is an unconscious behavior, and (4) it does not require the subject’s cooperation. Gait-based analysis and modeling can be directly used in many applications such as surveillance, forensics, and criminal investigations. We have established a research collaboration with the topmost gait research lab in the world, Yagi’s lab at Osaka University in Japan, through several visits and joint publications involving the PI of this project and several of his students. Gait analysis will also be used for other purposes, including the anonymous determination of a person’s gender, age (or age group), ethnicity, etc. Such information can be of interest for product marketing research or for automatic age-based access control to a specific area. Our experimentation and validation will be performed on typical benchmark datasets available on the web, in addition to datasets collected on the premises of EJUST and Alexandria University.
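
A minimal sketch of the LSTM-based local-dynamics idea mentioned above, written in PyTorch (the framework choice, the hypothetical TrackletLSTM network, and the placeholder tensors are assumptions for illustration): the model learns to predict a tracklet's next displacement, and at run time an unusually large prediction error can flag motion that deviates from the learned dynamics.

```python
import torch
import torch.nn as nn

class TrackletLSTM(nn.Module):
    """Predicts the next (dx, dy) displacement of a tracklet."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, x):             # x: (batch, steps, 2) displacements
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # predicted next displacement

model = TrackletLSTM()
tracklets = torch.randn(32, 10, 2)    # placeholder batch of tracklets
targets = torch.randn(32, 2)          # placeholder next displacements
loss = nn.functional.mse_loss(model(tracklets), targets)
loss.backward()                       # one illustrative training step
# At run time, a prediction error far above the training error would
# mark the tracklet as deviating from the learned local dynamics.
```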

A Robust Wearable Activity Recognition System based on IMU Signals

PI of project funded by ITIDA (Information Technology Industry Development Agency). The budget is 240,000 EGP running from November 2019 for one year.

Abstract

Recognition of human activities has been a long-running research domain that has received increasing attention over the past few years. Human activity recognition (HAR) systems aim at determining the ongoing activities of a person, a group of persons, or even a crowd based on sensory observation data, as well as some knowledge about the context within which the observed activities take place. In many cases, an activity is required to be recognized regardless of the environment in which it is performed or the performing person. HAR systems can be classified based on the type of sensory information used, as the kind of sensory data greatly affects the kinds of features, algorithms, and intelligent architectures used for analysis. Generally, we can identify the following streams of research and development in HAR systems: (1) HAR systems based on visual data, (2) HAR systems based on motion sensors such as IMUs (Inertial Measurement Units), and (3) HAR systems based on the received signal strength (RSS) from commodity routers installed in the surrounding environment.

With the pros and cons of each in mind, we have chosen to work on activity recognition based on IMU sensing for the following reasons: (1) with the advancements of nanotechnology, IMU units have become increasingly cheap, reliable, and ubiquitous; it has become routine for such units to be embedded in every mobile and wearable device; (2) personal mobile and wearable devices are widely available; unlike visual and WiFi data, they are virtually everywhere the user is; (3) accordingly, they are natural and non-obtrusive; and (4) unlike cameras, they are usable in private places, hence very compliant with privacy issues and concerns. HAR is especially beneficial in a number of scenarios, such as health monitoring in medical and therapeutic settings, where the manner in which an action is performed, or the conformance to some treatment or living regimen, may be relevant to treatment or health outcomes. Along the same lines, it is an enabling technology for in-home safety and health monitoring, especially for the elderly, the disabled, and the sick. Another possible application is in the domain of intelligent environments or smart spaces. Activity recognition broadly enables the space itself to be contextually aware of the user’s activities, such that it can adapt itself accordingly for the maximum utility/comfort/safety of the occupant.

Current scientific and technological progress has caused dramatic societal changes. Intensive research and development in the field of Active and Assisted Living (AAL) focuses on mastering one of the consequences of this change: the increasing need for care and support for the elderly. The goal of AAL systems is to provide appropriate unobtrusive technical support enabling people to live as independently as possible, for as long as possible, in their homes. In order to achieve this objective an AAL system needs to know about the user’s behavior; that is, it depends on powerful HAR systems for obtaining, collecting, compiling, and analyzing such knowledge. Tele-immersion (TI) systems are designed and developed to enable users in geographically distributed sites to collaborate in real-time in a shared simulated environment. Such systems make good use of HAR systems to track and simulate human behaviors in a virtual environment in order to build attractive game interfaces or to enhance existing communication methods.

The overall objective of this project is to build a robust and accurate human activity recognition system based on the IMU sensors onboard wearable devices. We conjecture that the accelerometer and gyroscope (rotational velocity) signals are complete in the sense that these signals can uniquely (up to a certain small error margin) identify the vast majority of activities that are of interest. One goal of our project is to verify this claim, or at least identify the activities for which this claim holds. Wearable devices include smart watches, smart phones, wrist bands, etc. The proposed system must achieve the following characteristics: (1) it is accurate (enough) to be effective and usable; (2) it is computationally feasible, as an online version will run on a personal mobile device (smart phone); (3) it is robust in the sense that its performance (predictive accuracy, etc.) remains fixed (up to a certain error margin) and invariant to changes in the hardware and software characteristics of the sensing wearable device, as well as to the spatial location and other variations, meaning that the system should be calibration-free; (4) it is able to effectively and continuously recognize, in real-time, the current activity of the user in addition to the switching points between different activities; (5) it is able to perform an offline statistical analysis of the user’s activity pattern over some extended period of time; (6) it is able to detect, offline, abnormal changes in the user’s typical behavioral pattern, overall and in specific activities; (7) it is able to detect online anomalies such as falling; and (8) it is able to detect and recognize joint activities between the user and one or more companions.
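
A minimal sketch of characteristic (4), sliding-window recognition with switching-point detection: the window length, step, placeholder stream, and toy classifier are illustrative assumptions standing in for a trained model.

```python
import numpy as np

def sliding_windows(signal, size=128, step=64):
    """Split an (n_samples, n_channels) IMU stream into overlapping windows."""
    return np.stack([signal[s:s + size]
                     for s in range(0, len(signal) - size + 1, step)])

def classify(window):
    # Placeholder classifier: a trained model would go here.
    return int(np.abs(window).mean() > 1.0)   # e.g., 0 = walk, 1 = run

stream = np.random.randn(1024, 6)             # placeholder 6-axis IMU stream
labels = [classify(w) for w in sliding_windows(stream)]

# A switching point is wherever the predicted label changes between
# consecutive windows.
switches = [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
print(labels, switches)
```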

Generally, when deployed on a large scale, e.g., on multiple devices, the performance of HAR systems is often significantly degraded. This is mainly due to the severe variations that exist between the training phase during the manufacturing process and the testing phase during the actual operation of the system. We will tackle this problem using transfer learning, in particular deep methods that pre-train the system on a diverse set of data and activities in order to learn an abstract set of features that characterize the different activity signals in a way that is invariant to the heterogeneities inherent in the deployment environment (a minimal sketch of this pre-train/fine-tune idea follows the deliverables list below). Instead of targeting a specific application, in this project we provide a set of modules and enabling technologies that need only minimal adjustment and customization for any specific application. As mentioned above, there is currently a wide range of hot applications in the health sector, elderly care, smart homes, smart environments, etc. The deliverables of the project are as follows:

  1. Activity datasets based on wearable IMUs collected at our own lab. These include the following:

  • Datasets of a single person’s activities collected with different people, genders, ages, hardware, and software, and with devices mounted on different parts of the body.

  • Datasets of a single person’s stream of activities collected with a diversity of configurations as mentioned in the preceding point.

  • Datasets of multiple persons’ joint activities collected with a diversity of configurations as mentioned in the preceding point.

  2. An offline analytics tool which analyzes the behavioral pattern of the user over some extended period of time. The resolution of such analysis spans two dimensions: the temporal and the spatial. The temporal dimension controls the time segment over which the analysis is performed, such as daily activity, weekly activity, etc. The spatial dimension controls the behavioral granularity: whether we look at a specific action, a class of actions such as sports or feeding activities, or a general pattern of activity that is indifferent to the activity label.

  3. An online activity predictive system that runs on a smart device (particularly, a smart phone). The system should be able to recognize, in real-time, the current action (or joint action) of the user, and to detect the switching points between different actions.

  4. An offline and online anomaly detection system. The online mode should be able to detect anomalies such as falling, unknown risky actions, etc. The offline mode should be able to detect changes in the typical behavior pattern of the user and might compare them against preset parameters that indicate safe, healthy performance.

  5. Scientific articles and patents submitted to reputable publication venues.

The collected datasets will be published publicly in one of the online repositories. All software, at least at the prototype stage, will be written in Python and R.
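
As flagged above, here is a minimal sketch of the pre-train/fine-tune idea in PyTorch (the framework, layer sizes, and placeholder tensors are illustrative assumptions): a small 1D CNN feature extractor is pre-trained on diverse source data, then only its classification head is re-fitted to a new target device.

```python
import torch
import torch.nn as nn

# Small 1D CNN over 6-channel IMU windows of 128 samples.
features = nn.Sequential(
    nn.Conv1d(6, 32, kernel_size=5), nn.ReLU(),
    nn.Conv1d(32, 64, kernel_size=5), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
)
head = nn.Linear(64, 10)          # 10 activity classes (assumed)
model = nn.Sequential(features, head)

# ... pre-train `model` on a large, diverse source dataset here ...

# Transfer: freeze the learned features, re-fit only the head on a
# small amount of data from the new target device.
for p in features.parameters():
    p.requires_grad = False
optim = torch.optim.Adam(head.parameters(), lr=1e-3)
x = torch.randn(16, 6, 128)       # placeholder target-device windows
y = torch.randint(0, 10, (16,))   # placeholder labels
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optim.step()
```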

The activities and data collection (of the EJUST-ADL-1 dataset) using an Apple smart watch.

A data sample of 3D acceleration, 3D rotation, and 3D angular velocity signals of the activity ‘walk’.

Automatic Crowd Scene Analysis and Anomaly Detection From Video Surveillance Cameras

PI of project funded by ITIDA (Information Technology Industry Development Agency). The budget is 130,000 EGP running from April 2016 to April 2017.

Abstract

The long-term goal of this project is to design and develop a smart camera-based surveillance system for crowded places, e.g., airports, metro stations, and shopping malls. The target system will be capable of automatically analyzing the video data captured by fixed surveillance cameras, and autonomously modeling and understanding the behavior of the crowd in the scene in order to serve two main purposes: planning assistance and threat detection. To assist planning for crowded places, the system will automatically detect common pathways and learn typical motion patterns (i.e., directions and speeds) exhibited by the crowd, which is crucial information for future planning. To support threat detection, the system will continuously analyze the crowd behavior to detect any deviation from the typical behavior, which could indicate a potential threat. For example, the system can detect objects (people or vehicles) moving in the wrong direction, which could indicate a potential hazard, or objects suddenly slowing down or stopping, which could indicate a potentially dangerous congestion. Conventional video surveillance techniques are typically based on object detection and tracking. In crowded scenes, such techniques may fail on both the accuracy and the computational levels. In contrast, the proposed framework performs the desired analysis without assuming specific types of objects in the scene. It relies on modeling the local dynamics in different areas of the scene, and grouping similar dynamics in order to infer the higher-level semantics of crowd behavior.

The underlying crowd analysis framework in the proposed surveillance system can serve many application domains in addition to the security domain emphasized above. These include, for example, crowd management, especially for frequent and popular events such as football matches, concerts, public demonstrations, etc. The main objective in this domain is to avoid crowd disasters and ensure public safety. This involves assisting the crowd as a group, for example through posting guiding instructions on public screens, or assisting individuals in the crowd through their mobile phones. Another application domain is guiding the design of public spaces (train stations, airports, theaters, governmental service buildings, etc.) from the perspectives of safety, efficiency, effectiveness, comfort, etc., relying on the information provided by the proposed framework. In the entertainment domain, a deep understanding of crowd emergent behaviors can lead to more accurate simulation of the dynamics of crowds in different contexts. This can be used in video synthesis, computer games, special effects, architectural visualization, etc. In conclusion, the technologies to be developed under the proposed project are enablers for a variety of application domains in the ICT industry. In particular, the deliverable items of the proposed project constitute algorithms and prototype codes for the following:

  • Offline Phase: given a video recorded for a specific crowded area over a period of time, the system analyzes the scene and extracts the following information (this information directly supports the planning-assistance purpose of the system, and it is also the basis on which models are built for the threat-detection purpose):

    • The entrances and exits, which are areas of the scene at which new objects emerge or existing objects disappear, respectively.

    • The common pathways taken by moving objects.

    • A normalcy model for each common pathway, which captures the statistics of the typical directions and speeds of motion for objects moving through it (see the sketch after this list).

  • Online Phase: Given an online video stream for the analyzed scene from the same camera used in the offline phase, the system produces the following:

    • An alarm for a possible abnormal activity or a hazardous event, which is marked by an abrupt deviation from the learned normal motion pattern.

    • An alarm for a possible emerging congestion situation, which is marked by consistent global motion slow down with respect to the learned normal pattern.
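
A minimal sketch of such a per-cell normalcy model and its online deviation test follows; the grid size, evidence threshold, and alarm thresholds are illustrative assumptions, not tuned values.

```python
import numpy as np

class CellNormalcyModel:
    """Per-cell statistics of typical motion direction and speed."""
    def __init__(self, grid=(32, 32)):
        self.sum_dir = np.zeros(grid + (2,))   # summed unit direction vectors
        self.speeds = [[[] for _ in range(grid[1])] for _ in range(grid[0])]

    def update(self, cell, velocity):
        """Offline phase: accumulate one observed motion at a grid cell."""
        i, j = cell
        speed = np.linalg.norm(velocity)
        if speed > 0:
            self.sum_dir[i, j] += np.asarray(velocity) / speed
            self.speeds[i][j].append(speed)

    def is_abnormal(self, cell, velocity, cos_thresh=0.0, z_thresh=3.0):
        """Online phase: flag motion deviating from the learned pattern."""
        i, j = cell
        speeds = np.asarray(self.speeds[i][j])
        if len(speeds) < 10:
            return False                       # not enough evidence yet
        speed = np.linalg.norm(velocity)
        if speed == 0:
            return True                        # sudden stop inside a pathway
        mean_dir = self.sum_dir[i, j]
        cos = np.dot(np.asarray(velocity) / speed,
                     mean_dir / (np.linalg.norm(mean_dir) + 1e-6))
        z = abs(speed - speeds.mean()) / (speeds.std() + 1e-6)
        # Wrong-direction motion or a large speed deviation raises an alarm.
        return cos < cos_thresh or z > z_thresh

model = CellNormalcyModel()
for _ in range(50):                            # placeholder offline data
    model.update((3, 4), (1.0, 0.0))
print(model.is_abnormal((3, 4), (-1.0, 0.0)))  # wrong direction -> True
```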

Focus and Scope

The focus of the proposed project will be on crowd scenes from public places that are videotaped using conventional surveillance cameras. In such scenes, many types of moving objects are expected to appear, e.g., people, vehicles, pets, and street animals, and the cameras are expected to be static. In the proposed research, effective representations of crowd scene dynamics are sought. Such representations should enable multiple types of scene analysis. In particular, we will focus on two domains: planning assistance and threat detection. In planning assistance, the proposed system will analyze videos of a target scene to produce the entrances and exits, as well as the common pathways taken by moving objects in the scene. It will also learn models for the normal motion speeds and directions in each pathway. This information can be very valuable for decision makers considering major renovations or designing similar facilities in different places.

The learned models of normal activities will then be used in the online phase of the system to provide early threat detection. The system will detect abnormal motions or appearances and report them as possible threats. For example, people could be moving in an abnormally random way in situations such as street fights, or in emergencies, such as upon observing a sick person or an unusual abandoned object. All these threats are marked by abnormal motion patterns, which will be detected by the proposed system. The system will also detect abnormally slow motion patterns, which could also indicate a hazardous situation and may prompt authorities to re-route people to avoid more severe congestion. This is particularly relevant for traffic monitoring, or for sites with extremely large crowds, such as Hajj sites. Both of these situations, whether involving vehicles (traffic) or people (Hajj), can be handled seamlessly by the proposed system because it works at the level of motion patterns, regardless of the nature of the moving objects.

The technologies developed under the proposed project will have a wider scope that can benefit other application domains, which could be explored in the future. The system targets modeling, analyzing, and understanding crowd behavior in a videotaped scene. The term 'crowd' here is not restricted to crowds of people. In fact, any scene containing a large number of moving objects is considered a crowd scene. This applies to traffic scenes with a high density of vehicles, as well as to biological scenes with a high density of insects or living micro-organisms. To allow for such a broad range of crowd scenes, the developed technologies will not depend on detecting specific classes of objects. In fact, object detection and tracking in crowd scenes are subject to drastic failures and/or high computational costs. Hence, in this project, a unified representation that captures the two main aspects of scene dynamics, namely motion and appearance, will be adopted.

(A) A structured crowded scene and (B) an unstructured crowded scene.

Flow behavioral patterns obtained through Lagrangian particle dynamics.

Overview of the proposed system for crowd scene analysis. Blocks with dashed lines and dimmed text will be considered in the next phase of the project.

Computability and Computational Complexity over Non-Traditional Spaces
This is a joint research scheme between Egypt and France (IMHOTEP). The budget is 120,000 EGP + 13,000 EUR, running 2015-2016.

Abstract

Cyber-Physical Systems (CPSs) refer to the integration and coordination of computing/communication systems with the dynamics of physical and engineered systems. Examples of CPSs are very diverse, including aerospace, chemical processes, healthcare, transportation, manufacturing, traffic networks, entertainment, power grids, weather, etc. Such systems can be as complicated as tele-surgery, where a surgical operation is performed remotely using a surgical robot and a wide-area network (like the Internet), or as simple as a normal computer sending e-mails to a dishwasher telling it to start. The following figure shows a general schematic view of a cyber-physical system.

A schematic view of a cyber-physical system.

By its very definition and nature, research in CPSs is inherently multi-disciplinary and diversified. Research themes span the whole spectrum from the purely theoretical to the purely practical. However, the interaction between the physical and the cyber is much more complicated than it might look on the surface. This is due to the following simple fact: physical and engineered systems are generally analyzed by models that are continuous in temporal and/or state evolution, whereas digital computers (the execution and computation medium) are inherently discrete in both time and space. This discrepancy has deep implications in both the conceptual and the practical domains. Poor consideration of such issues has already caused several disasters, with human casualties and huge material losses. Understanding and developing a computational framework to address this realm of computation, and accordingly cyber-physical systems in general, has been the target of an active research area ever since the early days of computer science. Several frameworks have been developed; among them, computable analysis seems to be the most successful and the most amenable to practical implementation. Computable analysis is an approach to computation over generic metric and topological spaces, in particular computation over the real and complex numbers. It provides a theoretical framework to investigate the computability- and complexity-theoretic aspects of such spaces. It is based on Type II Turing machines, which are an extension of the normal Turing machine model; see the following figure. Currently, computable analysis represents the most accepted and practical framework for studying computation over continuous spaces.

Type II Turing machine: y_0 = f_M(y_1, ..., y_k).
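
For reference, the notion computed by such a machine (as in Weihrauch's standard formulation of computable analysis) can be written as follows; this is a textbook definition recalled here for illustration.

```latex
% A Type II Turing machine M computes a (partial) function on infinite
% strings: it reads its k input tapes and writes a one-way output tape
% so that every finite prefix of the output appears after finitely
% many steps.
\[
  f_M \colon \subseteq \left(\Sigma^{\omega}\right)^{k} \to \Sigma^{\omega},
  \qquad y_0 = f_M(y_1, \dots, y_k).
\]
% On this basis, a real number x is computable if some machine outputs
% a fast-converging rational sequence (q_n) with |x - q_n| <= 2^{-n}
% for all n.
```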

The purpose of this research proposal is to continue working with my French colleagues on computable analysis and its relationships with the classical theory of computability. More specifically, we plan to address the following points. First, we will investigate computability over quasi-Polish spaces; such spaces were recently introduced and studied by Matthew de Brecht. Second, we plan to investigate computability- and complexity-theoretic questions about piecewise affine dynamical systems: for example, whether one-dimensional systems can simulate universal Turing machines, and what happens, from the complexity perspective, when the time and precision of reachability problems are restricted. We have already started working on these points and have reached some preliminary results, and we wish to use this IMHOTEP project to continue this collaboration.
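
For concreteness, the objects in the second point can be stated as follows (standard definitions, recalled here for illustration): a piecewise affine map acts as a different affine function on each piece of a polyhedral partition, and reachability asks whether the orbit of a point ever enters a target region.

```latex
% Piecewise affine map over a partition P_1, ..., P_m of (a subset of) R^d:
\[
  f(x) = A_i x + b_i \quad \text{whenever } x \in P_i .
\]
% Reachability: given an initial point x_0 and a target region T,
% decide whether
\[
  \exists\, n \in \mathbb{N} \;:\; f^{\,n}(x_0) \in T .
\]
```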