Selected Projects

[Figures: Multiple AUs running on a single video stream; Elixir Overview]

Elixir: A System to Enhance Data Quality for Multiple Analytics on a Video Stream

Sibendu Paul, Kunal Rao, Giuseppe Coviello, Murugan Sankaradas, Y. Charlie Hu, Srimat T. Chakradhar

IoT sensors, especially video cameras, are ubiquitously deployed around the world to perform a variety of computer vision tasks in several verticals including retail, healthcare, safety and security, transportation, and manufacturing. To amortize their high deployment effort and cost, it is desirable to perform multiple video analytics tasks, which we refer to as Analytical Units (AUs), off the video feed coming out of every camera. As AUs typically use deep learning-based AI/ML models, their performance depends on the quality of the input video, and recent work has shown that, in a single-AU setting, dynamically adjusting the camera settings exposed by popular network cameras can improve the quality of the video feed and hence the AU accuracy. In this paper, we first show that in a multi-AU setting, changing the camera settings has a disproportionate impact on the performance of different AUs. In particular, the optimal setting for one AU may severely degrade the performance of another AU, and, further, the impact on different AUs varies as environmental conditions change. We then present Elixir, a system that enhances video stream quality for multiple analytics on a video stream. Elixir leverages Multi-Objective Reinforcement Learning (MORL), where the RL agent caters to the objectives of the different AUs and adjusts the camera settings to simultaneously enhance the performance of all of them. To define the multiple objectives in MORL, we develop new AU-specific quality estimators for each individual AU. We evaluate Elixir through real-world experiments on a testbed with three cameras deployed next to each other (overlooking a large enterprise parking lot), running Elixir and two baseline approaches, respectively. Elixir correctly detects 7.1% (22,068) and 5.0% (15,731) more cars, 94% (551) and 72% (478) more faces, and 670.4% (4,975) and 158.6% (3,507) more persons than the default-setting and time-sharing approaches, respectively. It also detects 115 license plates, far more than the time-sharing approach (7) and the default setting (0).
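
Elixir's full MORL design is beyond the scope of the abstract, but the core loop, an RL agent nudging camera settings against per-AU quality objectives, can be sketched in a few lines. The sketch below is illustrative only: the two-parameter setting grid, the toy per-AU quality functions, the min() scalarization of the objective vector, and the use of plain Q-learning are all assumptions for the demo, not Elixir's actual estimators or MORL formulation.

```python
import random

# Hypothetical discretized camera settings; real network cameras expose
# many more knobs (brightness, contrast, sharpness, exposure, ...).
ACTIONS = [(db, dc) for db in (-1, 0, 1) for dc in (-1, 0, 1)]

def au_quality_estimates(b, c):
    """Stand-in for the per-AU quality estimators: each AU prefers a
    different operating point, so no single setting is optimal for all
    (toy analytic functions, not real estimators)."""
    q_car  = 1.0 - (abs(b - 1) + abs(c - 3)) / 8.0
    q_face = 1.0 - (abs(b - 3) + abs(c - 1)) / 8.0
    return [q_car, q_face]

def scalarize(qs):
    # One simple multi-objective scalarization: lift the worst AU first.
    return min(qs)

Q = {}
def q(s, a): return Q.get((s, a), 0.0)

def step(state, action):
    # Apply the nudge, clamp to the valid range, observe the reward vector.
    b = min(max(state[0] + action[0], 0), 4)
    c = min(max(state[1] + action[1], 0), 4)
    return (b, c), scalarize(au_quality_estimates(b, c))

alpha, gamma, eps = 0.3, 0.9, 0.2
state = (0, 0)
for _ in range(3000):
    if random.random() < eps:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q(state, a))
    nxt, reward = step(state, action)
    best_next = max(q(nxt, a) for a in ACTIONS)
    Q[(state, action)] = q(state, action) + alpha * (
        reward + gamma * best_next - q(state, action))
    state = nxt

print("settled at setting", state,
      "with per-AU quality", au_quality_estimates(*state))
```

With the min() scalarization, the agent gravitates toward a compromise setting between the two AUs' optima, which is exactly the tension the paper identifies: the best setting for one AU can be a poor one for another.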


[Figures: City Wide Video Analytics; CamTuner Overview]

CamTuner: Enhancing Video Analytics Accuracy via Real-time Automated Camera Parameter Tuning

Sibendu Paul, Kunal Rao, Giuseppe Coviello, Murugan Sankaradas, Oliver Po, Y. Charlie Hu, Srimat T. Chakradhar

Video analytics systems critically rely on video cameras capturing high-quality video frames to achieve high analytics accuracy. Although modern video cameras often expose tens of configurable parameters that end-users can set, surveillance camera deployments today typically use a fixed set of parameter settings because end-users lack the skill or understanding to reconfigure them. In this paper, we first show that in a typical surveillance camera deployment, changes in environmental conditions can significantly affect the accuracy of analytics units (AUs) such as person detection, face detection, and face recognition, and that such adverse impact can be mitigated by dynamically adjusting camera settings. We then propose CamTuner, a framework that can be easily applied to an existing video analytics pipeline (VAP) to enable automatic and dynamic adaptation of complex camera settings to changing environmental conditions and to autonomously optimize the accuracy of the AUs in the VAP. CamTuner is based on SARSA reinforcement learning (RL) and incorporates two novel components: a lightweight analytics quality estimator and a virtual camera. CamTuner is implemented in a system with AXIS surveillance cameras and several VAPs (with various AUs) that processed day-long customer videos captured at airport entrances. Our evaluations show that CamTuner can adapt quickly to changing environments. We compared CamTuner with two alternative approaches: one using static camera settings, and a strawman approach where camera settings were manually changed every hour (based on human perception of quality). For the face detection and person detection AUs, CamTuner achieves up to 13.8% and 9.2% higher accuracy, respectively, than the better of the two alternatives (an average improvement of ∼8% for both AUs). When two cameras are deployed side by side in a real-world setting (one managed by CamTuner, the other not) and video streams are sent over a 5G network, our dynamic tuning approach improves AU accuracy by 11.7%. CamTuner also incurs no additional delay in the VAP. Although we demonstrate CamTuner for remote and dynamic tuning of camera settings over a 5G network, it is lightweight and can be deployed and run on small form-factor IoT devices. We also show that our virtual camera abstraction speeds up the RL training phase by 14X.
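
The abstract names SARSA as the underlying RL method, so a minimal sketch of the tuning loop can be grounded in the standard on-policy update; everything else below, the toy virtual camera, the two discretized settings, and the coarse time-of-day state, is an assumption for illustration rather than CamTuner's actual state/action design.

```python
import random

# Toy "virtual camera": scores a (brightness, exposure) setting against a
# sweet spot that drifts with the hour of day. CamTuner's real virtual
# camera re-renders captured frames under candidate settings; this
# stand-in is purely an assumption for the demo.
def virtual_camera_quality(setting, hour):
    target = (1 + 2 * (hour >= 12), 3 - 2 * (hour >= 18))
    return 1.0 - (abs(setting[0] - target[0]) + abs(setting[1] - target[1])) / 8.0

ACTIONS = [(d0, d1) for d0 in (-1, 0, 1) for d1 in (-1, 0, 1)]
Q = {}
def q(s, a): return Q.get((s, a), 0.0)

def policy(state, eps=0.1):
    # Epsilon-greedy over the Q-table.
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(state, a))

def apply(setting, action):
    # Clamp each camera parameter to its valid range after the nudge.
    return tuple(min(max(v + d, 0), 4) for v, d in zip(setting, action))

alpha, gamma = 0.2, 0.9
setting, hour = (0, 0), 0
state = (setting, hour)
action = policy(state)
for t in range(20000):                 # trained against the virtual camera,
    setting = apply(setting, action)   # not the live one, as in the paper
    hour = (t // 100) % 24             # coarse time of day in the state
    reward = virtual_camera_quality(setting, hour)
    nxt_state = (setting, hour)
    nxt_action = policy(nxt_state)
    # SARSA: on-policy update with the action that will actually be taken.
    Q[(state, action)] = q(state, action) + alpha * (
        reward + gamma * q(nxt_state, nxt_action) - q(state, action))
    state, action = nxt_state, nxt_action

setting = (0, 0)
for hour in range(24):                 # greedy day-long rollout
    setting = apply(setting, policy((setting, hour), eps=0.0))
print("end-of-day setting:", setting,
      "quality:", round(virtual_camera_quality(setting, 23), 2))
```

Training against a simulator instead of the physical camera is what makes the reported 14X speedup of the RL training phase plausible: the virtual camera answers "what would this setting have looked like?" without waiting on the real device.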

[Figures: Impact of Distortions on Accuracy; Training and Running AQuA]

AQuA: Analytical QUality Assessment of Images for Optimizing Video Analytics Systems

Sibendu Paul, Utsav Drolia, Y. Charlie Hu, Srimat T. Chakradhar

Millions of cameras at the edge are being deployed to power a variety of deep learning applications. However, the frames captured by these cameras are not always pristine: they can be distorted by lighting issues, sensor noise, compression, etc. Such distortions not only deteriorate visual quality, they also impact the accuracy of the deep learning applications that process such video streams. In this work, we introduce AQuA to protect application accuracy against such distorted frames by scoring the level of distortion in each frame. It accounts for the analytical quality of frames, rather than their visual quality, by learning a novel metric, the classifier opinion score, and uses a lightweight, CNN-based, object-independent feature extractor. AQuA accurately scores the distortion levels of frames and generalizes to multiple different deep learning applications. When used to filter poor-quality frames at the edge, it reduces high-confidence errors for analytics applications by 17%. Through this filtering, and owing to its low overhead (14 ms), AQuA can also reduce computation time and average bandwidth usage by 25%.
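
A hedged sketch of the filtering use case follows: a small CNN feature extractor feeding a regressor that predicts a per-frame distortion score, with frames above a threshold dropped before analytics. The layer sizes, the threshold, and the PyTorch framing are illustrative assumptions; only the overall shape (object-independent CNN features, a learned opinion-score regressor, edge-side filtering) comes from the abstract.

```python
import torch
import torch.nn as nn

class QualityScorer(nn.Module):
    """Sketch of an AQuA-style analytical-quality scorer: a small,
    object-independent CNN feature extractor followed by a regressor
    predicting a distortion score (layer sizes are illustrative
    assumptions, not AQuA's published architecture)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.regressor = nn.Sequential(
            nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, frames):           # frames: (N, 3, H, W) in [0, 1]
        return self.regressor(self.features(frames)).squeeze(1)

def filter_frames(frames, scorer, threshold=0.5):
    """Drop frames whose predicted distortion exceeds the threshold,
    so downstream analytics never see them (the edge-filtering use case)."""
    with torch.no_grad():
        scores = scorer(frames)
    return frames[scores <= threshold], scores

if __name__ == "__main__":
    scorer = QualityScorer()             # in practice: trained on frames
    batch = torch.rand(8, 3, 224, 224)   # labeled with classifier opinion scores
    kept, scores = filter_frames(batch, scorer)
    print(f"kept {kept.shape[0]}/8 frames")
```

The key design point carried over from the abstract is that the regression target is the classifier's opinion of the frame (how much a distortion hurts the downstream model), not a human perceptual quality score.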

[Figures: QoE requirements of VR; Coterie Deployment Setup]

Coterie: Exploiting Frame Similarity to Enable High-Quality Multiplayer VR on Commodity Mobile Devices

Jiayi Meng, Sibendu Paul, Y. Charlie Hu

In this work, we study how to support high-quality immersive multiplayer VR on commodity mobile devices. First, we perform a scaling experiment showing that simply replicating the prior-art two-layer distributed VR rendering architecture to multiple players cannot support more than one player, due to the linear increase in network bandwidth requirements. Second, we propose to exploit the similarity of background environment (BE) frames to reduce the bandwidth needed for prefetching BE frames from the server, by caching and reusing similar frames. We find that there is often little similarity between the BE frames of even adjacent locations in the virtual world, due to a “near-object” effect. We propose a novel technique that splits the rendering of BE frames between the mobile device and the server, which drastically enhances the similarity of the BE frames and reduces the network load of frame caching. Evaluation of our implementation on top of Unity and Google Daydream shows that our new VR framework, Coterie, reduces the per-player network requirement by 10.6X-25.7X and easily supports 4 players for high-resolution VR apps on Pixel 2 over 802.11ac, at 60 FPS and with under 16 ms responsiveness.
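
The caching idea can be sketched independently of the rendering split: once near objects are rendered locally on the device, BE frames of nearby poses become similar enough to reuse, so the client can bucket poses and skip server fetches on a bucket hit. The pose quantization step, the cache structure, and the toy server below are assumptions for illustration, not Coterie's actual mechanism.

```python
import numpy as np

class BEFrameCache:
    """Sketch of background-environment (BE) frame reuse: serve a cached
    frame for any pose that falls in an already-fetched bucket."""
    def __init__(self, pose_step=0.5):
        self.cache = {}
        self.pose_step = pose_step   # assumed bucket coarseness

    def _key(self, pose):
        # Nearby poses share a bucket; after near objects are split out
        # and rendered locally, their BE frames are near-identical.
        return tuple(round(p / self.pose_step) for p in pose)

    def get_be_frame(self, pose, fetch_from_server):
        key = self._key(pose)
        if key in self.cache:
            return self.cache[key]        # cache hit: no network fetch
        frame = fetch_from_server(pose)   # cache miss: prefetch and store
        self.cache[key] = frame
        return frame

def fake_server_render(pose):
    # Stand-in for the server-side BE render (random pixels for the demo).
    return np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)

cache = BEFrameCache()
fetches = 0
def counting_fetch(pose):
    global fetches
    fetches += 1
    return fake_server_render(pose)

for x in np.linspace(0.0, 2.0, 20):       # walk through nearby poses
    cache.get_be_frame((x, 0.0, 0.0), counting_fetch)
print(f"20 frames served with only {fetches} server fetches")
```

The “near-object” finding explains why this only works after the split: with near objects baked into the BE frame, adjacent poses produce visibly different frames and the cache hit rate collapses.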

[Figures: Default Scheduling Policy (Low Resolution); Predictive Scheduling Policy (High Resolution)]

Predictive Scheduling for Virtual Reality

I-Hong Hou, Narges Zarnaghinaghsh, Sibendu Paul, Y. Charlie Hu, Atilla Eryilmaz

A significant challenge for future virtual reality (VR) applications is to deliver a high quality of experience, in terms of both video quality and responsiveness, over wireless networks with limited bandwidth. This paper proposes to address this challenge by leveraging the predictability of user movements in the virtual world. We consider a wireless system where an access point (AP) serves multiple VR users. We show that the VR application process consists of two distinct phases, whereby during the first (proactive scheduling) phase the controller has uncertain predictions of the demand that will arrive in the second (deadline scheduling) phase. We then develop a predictive scheduling policy for the AP that jointly optimizes the scheduling decisions in both phases. In addition to our theoretical study, we demonstrate the usefulness of our policy by building a prototype system. We show that our policy can be implemented on top of Furion, a Unity-based VR gaming framework, with minor modifications. Experimental results show a clearly visible difference between our policy and the default one. We also conduct extensive simulation studies, which show that our policy not only outperforms others but also maintains excellent performance even when the predictions of future user movements are inaccurate.
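
The two-phase structure can be illustrated with a toy AP loop: in the proactive phase the AP spends spare transmission slots on the frames most likely to be requested, and in the deadline phase it serves the realized, still-unserved demands earliest-deadline-first. This greedy split is a stand-in for the paper's jointly optimized policy; the frame set, request probabilities, and slot counts are made up for the demo.

```python
import random

def proactive_schedule(predictions, slots):
    """Phase 1: demands are not yet known, so spend the spare slots on
    the frames with the highest predicted request probability (a greedy
    stand-in for the paper's jointly optimized decision)."""
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    return {frame for frame, _prob in ranked[:slots]}

def deadline_schedule(demands, prefetched, slots):
    """Phase 2: demands (frame, deadline) are now known; serve the ones
    not already prefetched, earliest-deadline-first."""
    pending = sorted((d for d in demands if d[0] not in prefetched),
                     key=lambda d: d[1])
    served = {frame for frame, _deadline in pending[:slots]}
    return served | (prefetched & {d[0] for d in demands})

# Toy run: 6 candidate frames; the AP predicts request probabilities,
# then actual demands with deadlines materialize.
random.seed(0)
predictions = [(f, random.random()) for f in "ABCDEF"]
demands = [(f, random.randint(1, 5)) for f, p in predictions
           if random.random() < p]
prefetched = proactive_schedule(predictions, slots=2)
served = deadline_schedule(demands, prefetched, slots=2)
missed = [d for d in demands if d[0] not in served]
print("prefetched:", prefetched, "served:", served, "missed:", missed)
```

Even in this toy form, the benefit of the proactive phase is visible: every correctly predicted prefetch frees a deadline-phase slot, which is what lets the real policy stay robust when predictions are only partially accurate.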