Gideon:

Intelligent Video Analytics (IVA)

An NVIDIA Jetson Nano based AI-on-the-Edge embedded platform for abnormal activity recognition, with near real-time inference on CCTV surveillance streams.

Train your ANN/DNN Models using Google Cloud Platform Setup

Our video-series-based, step-by-step guide shows how to set up Compute Engine instances (virtual machines) on Google Cloud Platform (GCP) for Machine Learning (ML), Data Mining (DM) and Artificial Intelligence (AI) workloads. It focuses on training Artificial Neural Network (ANN) based models, which are massively compute-hungry and require top-of-the-line NVIDIA Tesla series GPUs, large amounts of RAM and a cluster of CPUs (both compute-optimized and memory-optimized).

Live Demo of Complete Processing Pipeline:

Here is the link to Live Demo Videos of our Video Analytics application's complete processing pipeline, in action:

https://github.com/fyp-gideon/GideonIVA_LiveDemo_JetsonNano

The repository contains three demonstration videos. The processed video used in this demo is available at the link below:

https://drive.google.com/open?id=1IHAxsbgG2AS2vUWRAiQ_orVTxewdnN34

Project Abstract

Conventional CCTV-based surveillance systems depend heavily on human attention to observe and report unusual, suspicious and abnormal activities. Until now, the concept behind them has been merely a live stream of camera visuals, constantly recorded by a DVR and at the same time (but quite possibly not all the time) monitored by a human supervisor. The reliability of such a system is therefore limited by the vigilance and attentiveness of its human supervisor. To eliminate this limitation, our project, Gideon, is designed to analyse the live video stream of a CCTV security system and rapidly raise an alarm on any suspicious or abnormal activity.

To achieve an accurate and prompt response on real-time video streams, we trained our 3D Deep Neural Network to extract features in both the spatial and temporal domains, using the UCF-Crime video data-set [1]. Such computations require intense parallel computing power for video-level feature extraction and processing. This power is usually available through costly cloud-based infrastructure, which also exposes the AI application to network connectivity, bandwidth and latency issues. We differentiate our work by deploying AI on the Edge instead of the Cloud. We achieved this by integrating NVIDIA's embedded parallel computing platform, the Jetson Nano, with our custom Intelligent Video Analytics (IVA) application, and we named the collective system Gideon. Once interfaced with CCTV cameras, Gideon performs inference with the 3D DNN to detect abnormal or suspicious activities in real time and promptly reports them through alerts/alarms.
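A 3D network consumes short clips of consecutive frames rather than single images, which is what lets it learn temporal as well as spatial features. The grouping step can be sketched as below. This is a minimal illustration and not necessarily how Gideon implements it: the function name `sliding_clips`, the 16-frame clip length and the 8-frame stride are assumptions chosen for the example (16-frame clips are a common choice for C3D-style networks [2]).

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, TypeVar

Frame = TypeVar("Frame")


def sliding_clips(
    frames: Iterable[Frame], clip_len: int = 16, stride: int = 8
) -> Iterator[List[Frame]]:
    """Yield clips of `clip_len` consecutive frames, advancing `stride` frames each time.

    Hypothetical helper: clip_len and stride are illustrative defaults.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")

    # The deque keeps only the most recent `clip_len` frames.
    window: Deque[Frame] = deque(maxlen=clip_len)
    for index, frame in enumerate(frames):
        window.append(frame)
        start = index + 1 - clip_len  # index of the first frame in the window
        if start >= 0 and start % stride == 0:
            yield list(window)


# Integers stand in for decoded video frames.
clips = list(sliding_clips(range(32)))
```

With 32 input frames this yields three overlapping clips covering frames 0 to 15, 8 to 23 and 16 to 31. In a deployment the frames would come from a camera capture loop instead of `range`, and each clip would be passed to the trained network for scoring.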

Introduction

Intelligent Video Analytics (IVA) has become an intensive research domain within Computer Vision, Machine Learning and Artificial Intelligence, made possible by the growing availability of AI-accelerated hardware. Conventional CCTV surveillance systems require continuous human visual attention to the events being recorded, and on their own they do not generate any automated alerts or alarms against suspicious/abnormal activity when human supervision is absent. Moreover, humans get tired, cannot remain alert all day long, require work breaks and may get distracted. In some cases, such as homes, it is not even feasible to have human supervision available round the clock. Such windows of time are effectively down-time, during which the system is unresponsive.

In light of the above, Gideon, our Final Year Project, is an AI-on-the-Edge system powered by Deep Learning frameworks and Computer Vision. It is capable of real-time Intelligent Video Analytics (IVA) and is trained to detect suspicious/abnormal activities in the context of security surveillance. It can be deployed in a plug-and-play manner alongside a currently installed CCTV system, where it analyses the stream, extracts the exact video frames of flagged segments, logs them in the system application and generates a relevant alert through its connected Mobile and Web interfaces. This automates the entire security and surveillance process, eliminating the dependency on constant human attention to monitor and report.
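Turning flagged clips into logged frame ranges and alerts could look roughly like the sketch below. Everything named here is hypothetical: the functions `flagged_segments` and `build_alert`, the 0.5 threshold, the 16-frame non-overlapping clips, the 30 fps frame rate and the alert fields are illustrative assumptions, not Gideon's actual interface.

```python
from typing import Dict, List, Optional, Tuple


def flagged_segments(
    scores: List[float], threshold: float = 0.5, clip_len: int = 16
) -> List[Tuple[int, int]]:
    """Merge runs of consecutive clips scoring at or above `threshold`.

    Returns inclusive (first_frame, last_frame) ranges and assumes
    non-overlapping clips of `clip_len` frames (an assumption of this sketch).
    """
    segments: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for idx, score in enumerate(scores):
        if score >= threshold:
            if run_start is None:
                run_start = idx  # a new flagged run begins here
        elif run_start is not None:
            segments.append((run_start * clip_len, idx * clip_len - 1))
            run_start = None
    if run_start is not None:  # the run reaches the end of the scores
        segments.append((run_start * clip_len, len(scores) * clip_len - 1))
    return segments


def build_alert(
    camera_id: str, segment: Tuple[int, int], fps: float = 30.0
) -> Dict[str, object]:
    """Build a simple alert record that a web or mobile client could display."""
    first, last = segment
    return {
        "camera_id": camera_id,
        "first_frame": first,
        "last_frame": last,
        "start_seconds": round(first / fps, 2),
        "duration_seconds": round((last - first + 1) / fps, 2),
    }


# Made-up per-clip anomaly scores, for illustration only.
example_scores = [0.1, 0.7, 0.9, 0.2, 0.8]
segments = flagged_segments(example_scores)
alerts = [build_alert("cam-1", seg) for seg in segments]
```

Merging adjacent flagged clips into a single segment keeps one continuous incident from producing a burst of separate alerts.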

Recent advancements in specialized fields of Artificial Intelligence, such as Machine Learning, Computer Vision, Artificial Neural Networks and inference on live video streams, have been phenomenal. Their computational workload is highly parallel in nature, consisting largely of vector and tensor processing. The architecture of a Graphics Processing Unit (GPU), in turn, is SIMD (Single Instruction Stream, Multiple Data Streams): it contains hundreds to thousands of Floating Point Units (FPUs) that execute the same instruction on multiple data streams in a single clock cycle. NVIDIA refers to these FPUs as CUDA cores, while AMD refers to them as Stream Processors (SP); they are all contained inside a single GPU chip. They are the brains behind the parallel computing platform.
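The SIMD idea can be illustrated with an idealised model in plain Python. This is only a sketch of the concept, not GPU code: the function name `simd_add` is hypothetical, the default of 128 lanes mirrors the Jetson Nano's 128 CUDA cores, and real hardware adds scheduling and memory effects that the model ignores.

```python
from typing import List, Tuple


def simd_add(
    a: List[float], b: List[float], lanes: int = 128
) -> Tuple[List[float], int]:
    """Add two vectors `lanes` elements at a time.

    Returns the result and the number of lockstep steps needed. Each step
    models one instruction applied to up to `lanes` data elements at once.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if lanes <= 0:
        raise ValueError("lanes must be positive")

    result: List[float] = []
    steps = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        # Same operation (addition) on every lane of the chunk.
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        steps += 1
    return result, steps


vector_a = [float(i) for i in range(1024)]
vector_b = [1.0] * 1024

parallel_result, parallel_steps = simd_add(vector_a, vector_b, lanes=128)
scalar_result, scalar_steps = simd_add(vector_a, vector_b, lanes=1)
```

In this model a 1024-element addition takes 8 steps with 128 lanes and 1024 steps with a single lane, while producing the same result. That ratio is the reason tensor-heavy workloads map so well onto GPUs.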

The increasing density of CUDA cores per GPU, combined with the shrinking die size of processor chips, has resulted in more computational cores per monetary unit and per unit of volume. This phenomenon has made it practical to develop and deploy Deep Neural Network based applications that perform inference on the Edge, in an embedded form factor, and to harness the power of parallel computing in endless ways. Surveillance and security monitoring is merely one of them.

Hence we developed Gideon: a theoretically and practically never-tiring, emotionless electronic computer system that runs a pre-trained Deep Neural Network model to observe, infer and log suspicious/abnormal activities. Gideon reports a potential security breach or a threatening situation automatically through its Mobile and Web interfaces, making the surveillance process highly reliable and efficient while eliminating constant human dependency.

Motivation and Scope

Our main idea is rooted in the fact that a conventional setup, based on a single CCTV camera or a network of them (deployed anywhere, including private property and public places), has to be constantly monitored by one or more human observers. This is intensely laborious. In most cases, such as small homes, it is not even feasible to have a human observer monitoring all the time. Or consider the case of the Punjab Safe Cities Authority (PSCA), where 8000+ cameras are already installed in the city of Lahore (Pakistan) alone. With many more to be installed gradually, and its ambit set to expand to other cities in the province of Punjab, such integrated monitoring will demand a huge human workforce, and even with constant monitoring it will not be feasible to cover all the cameras simultaneously, all the time.

So there exists a wide window of time in which abnormal and suspicious activities can go undetected. For instance, someone may attempt to trespass on a property, or a shooting may take place somewhere in the city, while no one is attentive at the CCTV feed at that moment to take rapid action, even though the activity is being recorded. Practically, there is not much use in merely recording such activities if rapid and proper action cannot be taken to avoid or control the collateral damage.

Our project aims to provide a cost-feasible, computationally capable AI-on-the-Edge solution: embedded hardware powered by Deep Learning and Computer Vision that can be easily interfaced with the CCTV cameras currently on the market to provide Intelligent Video Analytics (IVA) for detecting abnormal or suspicious activities.

The embedded solution in our case is the NVIDIA Jetson Nano, a $99 heterogeneous computing platform with 128 Maxwell-architecture CUDA cores for parallel computing and a quad-core ARM Cortex-A57 application processor @ 1.43 GHz. It is capable of real-time inference with a pre-trained Neural Network application, without depending on a cloud-based resource.

Therefore, infrastructure replacement is not required; only additional components need to be mounted. Secondly, AI on the Edge runs a pre-trained Neural Network without the need for a remote GPU-based compute server in the Cloud. Hence, video analytics and inference have no network dependency, which lets the surveillance system respond within real-time constraints.

Tools and Technologies

Tools and technologies that have been used for this project include:

Software Technologies and Frameworks:

For Intelligent, Real-time Video Analytics Application

NVIDIA Jetpack
NVIDIA Deepstream
NVIDIA CUDA
NVIDIA VisionWorks
TensorFlow
Google Colab
OpenCV
Anaconda (Python Distribution)
Ubuntu (OS)
2D/3D CNNs

For Web and Mobile Interfaces (UI)

Node.js
React.js
Express.js
MongoDB
Android Studio

Embedded Hardware Platform for AI on the Edge:

The embedded solution in our case is the NVIDIA Jetson Nano, a $99 heterogeneous computing platform with:

128 Maxwell Architecture based CUDA Cores
Quad core ARM Cortex A57 based application processor @ 1.43 GHz
4GB 64-bit LPDDR4 25.6 GB/s Shared RAM with GPU cores.
1 x MIPI CSI-2 DPHY lanes camera interface
10/100/1000BASE-T Ethernet
HDMI port with 1080p (Full HD) output
4 x USB 3.0, 1 x USB 2.0 Micro-B
40 x GPIO, I2C, I2S, SPI, UART, microSD
Video Encoding: 4K @ 30fps (H.264/H.265)
Video Decoding: 4K @ 60fps (H.264/H.265)
Mechanical Form factor: 100mm x 80mm x 29mm.
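A few of the figures in the list above can be related to each other with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, "4K" is assumed to mean 3840 x 2160 pixels, and raw pixel throughput is only an upper bound on how many camera streams a real pipeline can handle.

```python
def transfers_per_second(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Memory transfer rate implied by a peak bandwidth and a bus width."""
    bytes_per_transfer = bus_width_bits / 8
    return bandwidth_gb_s * 1e9 / bytes_per_transfer


def equivalent_streams(
    width: int,
    height: int,
    fps: int,
    stream_width: int = 1920,
    stream_height: int = 1080,
    stream_fps: int = 30,
) -> int:
    """How many smaller streams match the pixel rate of one larger stream."""
    pixels_per_second = width * height * fps
    stream_pixels_per_second = stream_width * stream_height * stream_fps
    return pixels_per_second // stream_pixels_per_second


# 25.6 GB/s over a 64-bit bus implies about 3.2 billion transfers per second.
memory_rate = transfers_per_second(25.6, 64)

# Pixel rate of 4K @ 60fps decoding, expressed as 1080p @ 30fps streams: 8
decode_streams = equivalent_streams(3840, 2160, 60)

# Pixel rate of 4K @ 30fps encoding, expressed as 1080p @ 30fps streams: 4
encode_streams = equivalent_streams(3840, 2160, 30)
```

By pixel rate alone, the decoder's 4K @ 60fps budget corresponds to eight 1080p CCTV feeds at 30fps. How many feeds can actually be analysed also depends on the codec, the bitrate and the cost of inference itself.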

Final Presentation

Gideon - Presentation (external) (1).pptx

Web Interface

Android Interface

Poster

DICE-IET 2020 Project Poster Template.pptx

References

[1] Waqas Sultani, C. C. (2018). Real-world Anomaly Detection in Surveillance Videos. 10.

[2] Du Tran, L. B. (2015). Learning Spatiotemporal Features with 3D Convolutional Networks. 16.

[3] Kun Liu, W. L. (2018). T-C3D: Temporal Convolutional 3D Network for Real-Time Action Recognition. The Thirty-Second AAAI Conference on Artificial Intelligence, (p. 8).

Project Supervisor

Dr. Usama Ijaz Bajwa
Co-PI, Video Analytics lab, National Centre in Big Data and Cloud Computing
Program Chair (FIT 2019)
HEC Approved PhD Supervisor
Assistant Professor & Associate Head of Department
Department of Computer Science, COMSATS University Islamabad, Lahore Campus, Pakistan
www.usamaijaz.com
www.fit.edu.pk

The Team

Maroof Ismail

Email: maroofismailkhanniazi@gmail.com
BS Student (Computer Science, COMSATS Lahore)
GitHub Profile

Hassan Shafiq

Email: ciao.hassanshafiq@gmail.com
BS Student (Computer Science, COMSATS Lahore)
GitHub Profile

Abdul Wahab

Email: abdulwahab0193@gmail.com
BS Student (Computer Science, COMSATS Lahore)
GitHub Profile
