Selected Research Contributions with ITU

The following are selected research contributions presented and published through the Kaleidoscope events of the ITU Telecommunication Standardization Sector (ITU-T), held in different countries.

Enhancing User Experience in Pedestrian Navigation Based On Augmented Reality and Landmark Recognition

Pedestrian navigation using traditional mapping systems is constrained by the inherent limitations of existing digital online mapping services. The major challenges include complete reliance on GPS for user localization and an inferior user experience caused by a lack of information about the surroundings, especially in unknown environments. We designed and developed a marker-less augmented reality-based pedestrian navigation system which can handle navigation even in the absence of GPS and improves the user experience through a novel landmark recognition feature that allows users to identify nearby buildings or streets during navigation. To mitigate the absence of a GPS signal, a user localization method based on a step count-based distance estimator is implemented.
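
A minimal sketch of the step count-based distance estimation idea is given below, assuming accelerometer-magnitude peak counting and a fixed average stride length; the threshold, stride length, and heading handling are illustrative assumptions, not the published parameters.

import numpy as np

def estimate_walked_distance(accel_magnitude, threshold=10.8, stride_m=0.7):
    # Count steps from rising edges of the acceleration magnitude above a
    # threshold (illustrative value) and convert to distance using an
    # assumed average stride length in metres.
    steps, above = 0, False
    for a in accel_magnitude:
        if a > threshold and not above:
            steps += 1
            above = True
        elif a < threshold:
            above = False
    return steps * stride_m

def dead_reckon(position_xy, heading_rad, distance_m):
    # Advance a 2D position along the compass heading by the walked distance,
    # which keeps localization running when no GPS fix is available.
    x, y = position_xy
    return (x + distance_m * np.cos(heading_rad),
            y + distance_m * np.sin(heading_rad))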

More details available at:

(i) https://www.itu.int/en/ITU-T/academia/kaleidoscope/2022/Documents/S2.2_1570820947.pdf

(ii) https://ieeexplore.ieee.org/document/10003059

Enhancing the System Model for Home Interior Design Using Augmented Reality

Home interior design is often a challenging and time-consuming task due to several mismatches between the selection of interiors in a shop and their composition in a target room. One such mismatch is the lack of knowledge about the target room and the other interiors when an item is viewed in a shop. We proposed a system model with which users can virtually try out various arrangements of home interiors using marker-less Augmented Reality (AR). To minimize the latency of the system, we applied a hybrid approach that combines an AR framework with a Simultaneous Localization And Mapping (SLAM) algorithm, in which the 3D feature points are updated dynamically, thereby avoiding occlusion artefacts when multiple interiors are placed in the real-world view.
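
A minimal sketch of the dynamic feature-point handling is shown below, assuming each virtual interior item is anchored to tracked 3D feature points and items are composited back-to-front by camera distance; the data structures and the depth ordering are illustrative assumptions, not the published implementation.

from dataclasses import dataclass, field

@dataclass
class VirtualItem:
    name: str
    anchor: tuple                                        # (x, y, z) in world coordinates
    feature_points: list = field(default_factory=list)   # tracked 3D points from SLAM

def update_feature_points(item, new_points):
    # Refresh the 3D feature points attached to an item as the SLAM map is
    # updated, and re-centre its anchor on the mean of the current point cloud.
    item.feature_points = new_points
    if new_points:
        xs, ys, zs = zip(*new_points)
        item.anchor = (sum(xs) / len(xs), sum(ys) / len(ys), sum(zs) / len(zs))

def render_order(items, camera_pos):
    # Sort items back-to-front by distance to the camera so that nearer items
    # correctly occlude farther ones when multiple interiors are composited.
    def dist(item):
        return sum((a - c) ** 2 for a, c in zip(item.anchor, camera_pos)) ** 0.5
    return sorted(items, key=dist, reverse=True)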

More details available at:

(i) https://www.itu.int/en/ITU-T/academia/kaleidoscope/2021/Documents/Presentations/S2.2_1570746846%20presentation.pdf

(ii) https://ieeexplore.ieee.org/document/9662114

Abnormal Activity Recognition Using Deep Learning in Streaming Video for Indoor Application

Human activity recognition has emerged as a challenging research domain for video analysis. The major issue for abnormal activity recognition in streaming video is the large volume of spatio-temporal data, along with the constraints of communication networks that affect the quality of the data received for analysis. We proposed a deep learning-based system to identify abnormal human activities using a combination of Skeleton Activity Forecasting (SAF) and a Bi-LSTM network. The generated skeleton joint points of a human subject are used for pose estimation. Skeleton tracking and region-of-interest points are estimated on streaming video from an IP networked camera. The extracted interest points and their corresponding features are optimized and used to classify actions as normal, abnormal, or suspicious.
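
A minimal sketch of a Bi-LSTM classifier over skeleton joint sequences is given below, assuming 25 joints with (x, y) coordinates per frame and the three action classes named above; the layer sizes, sequence length, and Keras usage are illustrative assumptions, not the published network.

import tensorflow as tf

NUM_JOINTS = 25            # assumed skeleton with 25 joints
SEQ_LEN = 30               # assumed number of frames per clip
FEATURES = NUM_JOINTS * 2  # (x, y) coordinates per joint
NUM_CLASSES = 3            # normal, abnormal, suspicious

def build_bilstm_classifier():
    # Stack two bidirectional LSTM layers over per-frame skeleton feature
    # vectors and classify the whole sequence into one of the three classes.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(SEQ_LEN, FEATURES)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model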

More details available at:

(i) https://www.itu.int/en/ITU-T/academia/kaleidoscope/2021/Documents/Presentations/S3.2_1570747571%20presentation.pdf

(ii) https://ieeexplore.ieee.org/document/9662095

Visual Action Recognition Using Deep Learning in Video Surveillance Systems

The skeleton tracking technique allows the skeleton information of human-like objects to be used for action recognition. The major challenge in action recognition in a video surveillance system is the large variability across and within subjects. We proposed a novel deep learning-based framework to recognize human actions using skeleton estimation. The main component of the framework is pose estimation using a stacked hourglass network (HGN). The pose estimation module provides the skeleton joint points of humans. Since the position of the skeleton varies with the point of view, we apply transformations to the skeleton points to make them invariant to rotation and position. The skeleton joint positions are identified using HGN-based deep neural networks (HGN-DNN), and feature extraction and classification are carried out to obtain the action class. The skeleton action sequence is encoded using a Fisher Vector before classification. The proposed system complies with Recommendation ITU-T H.626.5 “Architecture for intelligent visual surveillance systems” and has been evaluated on benchmark human action recognition data sets.
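
A minimal sketch of the invariance transform described above is given below, assuming 2D joint coordinates: joints are translated relative to a root joint and rotated so that a reference bone aligns with a fixed axis; the joint indices and the scale normalization are illustrative assumptions.

import numpy as np

def normalize_skeleton(joints, root_idx=0, ref_idx=1):
    # joints: (J, 2) array of 2D joint coordinates.
    # root_idx: assumed index of the root joint (e.g. hip centre).
    # ref_idx:  assumed index of a reference joint (e.g. neck).
    joints = np.asarray(joints, dtype=float)
    centred = joints - joints[root_idx]    # translation invariance

    ref = centred[ref_idx]
    angle = np.arctan2(ref[0], ref[1])     # angle of the root-to-ref bone to the y-axis
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    aligned = centred @ rot.T              # rotation invariance: ref bone now points along y

    scale = np.linalg.norm(aligned[ref_idx]) or 1.0
    return aligned / scale                 # optional scale normalization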

More details available at:

(i) https://www.itu.int/en/ITU-T/academia/kaleidoscope/2020/Documents/Presentations/S8.2_Visual_Action_Recognition_D.Kumar.pdf

(ii) https://ieeexplore.ieee.org/document/9303222

Elderly Health Monitoring System with Fall Detection Using Multi-Feature-Based Person Tracking

The need for personalized surveillance systems for elderly healthcare has risen drastically. However, recent methods that use wearable devices for activity monitoring offer only limited solutions. To address this issue, we have proposed a system that incorporates a vision-based deep learning solution for elderly surveillance. The system primarily consists of a novel multi-feature-based person tracker (MFPT), supported by an efficient vision-based person fall detector (VPFD). The MFPT combines appearance and motion similarity to perform effective target association for object tracking. The similarity computations are carried out using Siamese convolutional neural networks (CNNs) and long short-term memory (LSTM) networks.
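
A minimal sketch of the target-association idea in the tracker is given below, assuming an appearance score computed from Siamese-style embeddings and a motion score computed from a predicted position; the cosine and exponential forms and the fusion weights are illustrative assumptions, not the published model.

import numpy as np

def appearance_similarity(emb_a, emb_b):
    # Cosine similarity between two appearance embeddings, e.g. as produced
    # by a Siamese CNN for a tracked target and a new detection.
    a, b = np.asarray(emb_a, dtype=float), np.asarray(emb_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def motion_similarity(predicted_xy, detected_xy, scale=50.0):
    # Similarity that decays with the distance between the position predicted
    # for a track (e.g. by an LSTM) and a new detection; scale is illustrative.
    d = np.linalg.norm(np.asarray(predicted_xy, dtype=float) -
                       np.asarray(detected_xy, dtype=float))
    return float(np.exp(-d / scale))

def association_score(track_emb, track_pred_xy, det_emb, det_xy, w_app=0.6, w_mot=0.4):
    # Fuse appearance and motion similarity into one score used to match
    # existing tracks with new detections.
    return (w_app * appearance_similarity(track_emb, det_emb)
            + w_mot * motion_similarity(track_pred_xy, det_xy))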

More details available at:

(i) https://www.itu.int/en/ITU-T/academia/kaleidoscope/2019/Documents/Presentations/S5.1_Dhananjay%20Kumar.pdf

(ii) https://ieeexplore.ieee.org/document/8996141

Optical Flow Based Learning Approach for Abnormal Crowd Activity Detection with Motion Descriptor Map

Automated abnormal crowd activity detection with faster execution time has been a major research issue in recent years. In this work, a novel method for detecting abnormal crowd activities is proposed, based on processing optical flow as a motion parameter for machine learning. The proposed model makes use of a magnitude vector that represents the motion magnitude of a block in eight directions separated by a 45-degree pace angle. Further, the motion characteristics are processed using a Motion Descriptor Map (MDM), which takes two main parameters: the aggregate magnitude of motion flow in a block and the Euclidean distance between blocks.
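
A minimal sketch of the per-block, eight-direction motion magnitude computation is given below, using dense optical flow from OpenCV; the block size and flow parameters are illustrative assumptions, not the published settings.

import cv2
import numpy as np

def block_direction_magnitudes(prev_gray, curr_gray, block=16):
    # Dense optical flow between consecutive grayscale frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1], angleInDegrees=True)
    bins = (ang // 45).astype(int) % 8   # 8 directional sectors of 45 degrees each

    h, w = mag.shape
    rows, cols = h // block, w // block
    hist = np.zeros((rows, cols, 8), dtype=np.float32)
    for r in range(rows):
        for c in range(cols):
            m = mag[r * block:(r + 1) * block, c * block:(c + 1) * block]
            b = bins[r * block:(r + 1) * block, c * block:(c + 1) * block]
            for k in range(8):
                # Aggregate flow magnitude of the block in each direction.
                hist[r, c, k] = m[b == k].sum()
    return hist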

More details available at:

(i) https://www.itu.int/en/ITU-T/academia/kaleidoscope/2018/Documents/Presentations/S5.3_Optical%20Flow%20Based%20Learning_Kumar.pdf

(ii) https://ieeexplore.ieee.org/document/8597814

Double SARSA Based Machine Learning To Improve Quality of Video Streaming Over HTTP through Wireless Networks

Adaptive streaming over HTTP is widely advocated to enhance the Quality of Experience (QoE) in a bitrate-constrained IP network. However, most previous approaches, based on estimating the available link bandwidth or the fullness of the media buffer, tend to become ineffective due to the variability of IP traffic patterns. We proposed a Double State-Action-Reward-State-Action (SARSA) based machine learning method to improve user QoE in an IP network. The Pv video quality estimation model specified in Recommendation ITU-T P.1203.1 is embedded in the learning process for the estimation of QoE. We implemented the proposed Double SARSA based adaptation method on top of HTTP in a 4G wireless network and assessed the resulting quality improvement using full-reference video quality metrics.
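
A minimal sketch of the Double SARSA update rule is given below, assuming discrete throughput states and bitrate-level actions, with the QoE-based reward supplied externally; the hyperparameters and the epsilon-greedy action selection are illustrative assumptions.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # illustrative learning parameters

Q_A = defaultdict(float)   # first action-value table
Q_B = defaultdict(float)   # second action-value table

def choose_action(state, actions):
    # Epsilon-greedy selection over the sum of the two tables.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q_A[(state, a)] + Q_B[(state, a)])

def double_sarsa_update(s, a, reward, s_next, a_next):
    # Update one randomly chosen table using the other table's estimate of the
    # next state-action pair; the reward would come from the QoE model.
    if random.random() < 0.5:
        Q_A[(s, a)] += ALPHA * (reward + GAMMA * Q_B[(s_next, a_next)] - Q_A[(s, a)])
    else:
        Q_B[(s, a)] += ALPHA * (reward + GAMMA * Q_A[(s_next, a_next)] - Q_B[(s, a)])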

More details available at:

(i) https://www.itu.int/en/ITU-T/academia/kaleidoscope/2018/Documents/Presentations/S1.3_Double%20Sarsa%20Video%20Streaming_Kumar.pdf

(ii) https://ieeexplore.ieee.org/document/8597682

Machine Learning Approach for Quality Adaptation of Streaming Video through 4G Wireless Network over HTTP

Video streaming over HTTP through a 4G wireless network, as used for multimedia applications, faces many challenges due to fluctuations in network conditions. The existing HTTP Adaptive Streaming (HAS) techniques based on prediction of the buffer state or link bandwidth offer a solution to some extent, but if the link condition deteriorates, the adaptation process may reduce the streaming bit rate below an acceptable quality level. In this work, we proposed a machine learning based method, the State-Action-Reward-State-Action (SARSA) Based Quality Adaptation algorithm using a Softmax Policy (SBQA-SP), which identifies the current state (throughput), action (streaming quality), and reward (current video quality) at the client to determine the future state and action of the system. The parametric model specified in Recommendation ITU-T G.1070 is embedded in the SBQA-SP to implement the adaptation process. The proposed system was implemented on top of HTTP in a typical Internet environment using a 4G wireless network, and the streaming quality was analyzed using several full-reference video metrics.
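
A minimal sketch of SARSA learning with softmax action selection, as used in an SBQA-SP-style adaptation loop, is given below; the temperature, learning rate, and state/action encoding are illustrative assumptions, not the published SBQA-SP parameters.

import math
import random
from collections import defaultdict

Q = defaultdict(float)   # Q[(state, action)] -> estimated value
TAU = 0.5                # softmax temperature (illustrative)

def softmax_action(state, actions):
    # Pick a streaming quality level with probability proportional to exp(Q / TAU).
    prefs = [math.exp(Q[(state, a)] / TAU) for a in actions]
    total = sum(prefs)
    r, acc = random.random() * total, 0.0
    for a, p in zip(actions, prefs):
        acc += p
        if r <= acc:
            return a
    return actions[-1]

def sarsa_update(s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    # Standard on-policy SARSA update toward reward + gamma * Q(s', a');
    # the reward would come from the video quality model.
    Q[(s, a)] += alpha * (reward + gamma * Q[(s_next, a_next)] - Q[(s, a)])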

More details available at:

(i) https://www.itu.int/en/ITU-T/academia/kaleidoscope/2017/Documents/presentations/S5.1.pdf

(ii) https://ieeexplore.ieee.org/abstract/document/8246996


Further details are available at:

https://www.itu.int/en/ITU-T/academia/kaleidoscope/Pages/default.aspx