REAL-TIME OBJECT RECOGNITION

INTRODUCTION

Real-time object recognition has become a growing trend in computer vision. In this project, it is implemented with APIs from TensorFlow and OpenCV. The purpose of the project, however, is to understand the principles behind the implementation and the related knowledge in computer vision and machine learning.

The expected performance of the finished project is shown below.

Many learning/classification algorithms exist today, such as PCA (principal component analysis) and neural networks. In a real-time setting, however, the computational cost of processing each frame becomes critical. A neural network is therefore the more viable option: although it takes a long time to train, running the trained network on a live data feed is computationally cheap.
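To illustrate why inference is cheap compared with training: once the weights are fixed, classifying a frame is just a fixed sequence of matrix multiplications and simple nonlinearities, and its cost depends only on the layer sizes, not on how long training took. The sketch below uses NumPy with random weights standing in for a trained model; the two-layer network, its sizes, and the 40x30 input are hypothetical and are not the project's TensorFlow detector.

```python
import time

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "trained" weights for a tiny two-layer network. Random values
# stand in for learned ones: inference cost depends only on the shapes.
W1 = rng.standard_normal((1200, 64)).astype(np.float32)
b1 = np.zeros(64, dtype=np.float32)
W2 = rng.standard_normal((64, 10)).astype(np.float32)
b2 = np.zeros(10, dtype=np.float32)


def forward(x: np.ndarray) -> np.ndarray:
    """One inference pass: two matrix multiplies, a ReLU and a softmax."""
    h = np.maximum(x @ W1 + b1, 0.0)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=-1, keepdims=True)


frame = rng.random((1, 1200), dtype=np.float32)  # one flattened 40x30 grayscale frame
start = time.perf_counter()
probs = forward(frame)
elapsed_ms = (time.perf_counter() - start) * 1000
print(probs.shape, f"{elapsed_ms:.3f} ms per frame")
```

Training would repeat passes like this (plus gradient computation) thousands of times over the whole dataset, which is where the long training time comes from.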

EXAMPLE

Key Python Packages:

TensorFlow (Model Training and Detection)
NumPy (numerical computation and matrix representation of images)
OpenCV (Image and Video Display)
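As an illustration of how the three packages meet: OpenCV returns each image or video frame as a NumPy array of shape (height, width, 3) with dtype uint8, in BGR channel order. The sketch below assumes, as many TensorFlow detection models do, that the model wants RGB input with a leading batch axis; that input format is an assumption here, not taken from the project. A synthetic frame replaces a camera capture so the example runs on its own.

```python
import numpy as np

# OpenCV frames are NumPy arrays: shape (height, width, 3), dtype uint8, BGR order.
# A synthetic 400x300 frame stands in for one read from cv2.VideoCapture.
frame_bgr = np.zeros((300, 400, 3), dtype=np.uint8)
frame_bgr[:, :, 2] = 255  # fill the red channel (index 2 in BGR order)

# Assumed model input: RGB with a batch axis, i.e. shape (1, height, width, 3).
frame_rgb = frame_bgr[:, :, ::-1]          # BGR -> RGB (same as cv2.cvtColor with COLOR_BGR2RGB)
batch = np.expand_dims(frame_rgb, axis=0)  # add the leading batch dimension

print(frame_bgr.shape, batch.shape, batch[0, 0, 0])
```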

Structure & Flow Chart

Module

Demo

Challenges

1. Vague documentation and manuals.

Solution: Got help from the TensorFlow GitHub repo and Stack Overflow, and worked through several simple examples with the source code.

2. TensorFlow compatibility with macOS High Sierra.

Solution: Used Virtualenv for package and project management. Virtualenv is a tool for creating isolated Python environments.
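The same kind of isolation is also available through Python 3's standard-library venv module. The sketch below uses venv, swapped in for Virtualenv only so the example needs nothing beyond the standard library; the environment name is hypothetical. It creates the environment without pip to stay fast and offline; a real setup would enable pip and then install tensorflow, numpy and opencv-python with the environment's own pip.

```python
import pathlib
import tempfile
import venv

# Create an isolated environment in a temporary directory (hypothetical name).
# with_pip=False keeps the sketch offline; pass with_pip=True for a usable setup.
env_dir = pathlib.Path(tempfile.mkdtemp()) / "object-recognition-env"
venv.create(env_dir, with_pip=False)

# Every venv records the base interpreter it was created from in pyvenv.cfg.
cfg = (env_dir / "pyvenv.cfg").read_text()
print(env_dir, "home =" in cfg)
```

Packages installed into such an environment do not touch the system Python, which is what makes it useful when a TensorFlow build conflicts with the operating system's default setup.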

3. Collecting training samples

Solution: Wrote a Python script to crawl related images from Google Images.
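The project's crawler is not shown, so the following is only a sketch of the download half of such a script, assuming a list of image URLs has already been collected (how the URLs are gathered from Google Images is not covered). The function name, the file-naming scheme and the .jpg extension are hypothetical choices.

```python
import pathlib
import urllib.request


def download_images(urls, out_dir, prefix="img"):
    """Download each URL into out_dir as prefix_0000.jpg, prefix_0001.jpg, ...

    Hypothetical helper. The .jpg extension is assumed rather than detected.
    Failed downloads are skipped so one bad link does not stop the batch.
    Returns the list of files that were saved.
    """
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls):
        target = out / f"{prefix}_{i:04d}.jpg"
        try:
            urllib.request.urlretrieve(url, target)
        except (OSError, ValueError) as exc:  # URLError subclasses OSError; bad URLs raise ValueError
            print(f"skipped {url}: {exc}")
            continue
        saved.append(target)
    return saved
```

For example, `download_images(url_list, "data/raw")` would save every reachable image in `url_list` to `data/raw`, leaving a gap in the numbering for each link that failed.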

4. Labeling a large number of images

Solution: Used LabelImg for labeling.
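By default LabelImg saves each image's labels as a PASCAL VOC XML file, with one `<object>` element per bounding box. A minimal sketch of reading those boxes back with the standard library follows; the sample annotation (file name, class "cup", coordinates) is hypothetical and trimmed to the fields used here.

```python
import xml.etree.ElementTree as ET

# A trimmed, hypothetical example of the PASCAL VOC XML that LabelImg writes.
SAMPLE = """<annotation>
  <filename>img_0000.jpg</filename>
  <size><width>400</width><height>300</height><depth>3</depth></size>
  <object>
    <name>cup</name>
    <bndbox><xmin>48</xmin><ymin>60</ymin><xmax>210</xmax><ymax>255</ymax></bndbox>
  </object>
</annotation>"""


def read_boxes(xml_text):
    """Return (label, xmin, ymin, xmax, ymax) tuples from a VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        # Coordinates are parsed via float first in case a tool wrote "48.0".
        coords = [int(float(bb.find(tag).text)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.find("name").text, *coords))
    return boxes


print(read_boxes(SAMPLE))
```

Reading the annotations back like this is a quick way to check labels before converting them into whatever input format the training pipeline expects.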

5. Training took a long time

Solution:

a. Scale down the number of training samples from 1500 to 200.

b. Resize the images from 1024 x 768 to 400 x 300.

c. Reduce the number of iterations from 20000 to 5000.

d. Reduce the number of classes to be recognized from 12 to 1.

e. Use a GPU instead of a CPU for training, or use cloud computing services such as AWS or Google Cloud ML.
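Step b has a side effect worth checking: if images are resized after they have been labeled, every bounding box must be rescaled by the same factors as the image. A minimal sketch, assuming boxes are stored as (xmin, ymin, xmax, ymax) pixel coordinates; the helper name and the sample box are hypothetical. Going from 1024 x 768 to 400 x 300 happens to use the same factor, 0.390625, on both axes.

```python
def scale_box(box, src_size, dst_size):
    """Rescale (xmin, ymin, xmax, ymax) from src_size to dst_size.

    Sizes are (width, height). The image itself would be resized with e.g.
    cv2.resize(img, (400, 300)); OpenCV takes dsize as (width, height).
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    xmin, ymin, xmax, ymax = box
    return (round(xmin * sx), round(ymin * sy), round(xmax * sx), round(ymax * sy))


# 1024x768 -> 400x300: 400/1024 == 300/768 == 0.390625, so aspect ratio is preserved.
print(scale_box((128, 256, 512, 640), (1024, 768), (400, 300)))

# Pixels per image drop from 786,432 to 120,000, roughly 6.6 times fewer.
print(1024 * 768 / (400 * 300))
```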

6. Display and detection lag in the video

Solution: When the fps drops below a threshold, the next few captured frames are not sent to the trained model for detection; OpenCV displays them unchanged. As a result, the detection boxes and labels may appear slightly discontinuous.
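The frame-skipping rule above can be sketched as a small per-frame decision helper. Everything here is hypothetical: the class name, the 10 fps threshold and the 3-frame skip are illustrative values, not the project's settings, and the fps is estimated from the time between consecutive frames. The clock is injectable so the logic can be checked without a camera.

```python
import time


class FrameSkipper:
    """Decide per frame whether to run detection or just display the frame.

    Hypothetical helper. When the fps measured between consecutive frames
    falls below fps_threshold, the current frame and the following
    skip_frames - 1 frames bypass the model.
    """

    def __init__(self, fps_threshold=10.0, skip_frames=3, clock=time.perf_counter):
        self.fps_threshold = fps_threshold
        self.skip_frames = skip_frames
        self.clock = clock
        self._last = None   # timestamp of the previous frame
        self._to_skip = 0   # frames still to pass through without detection

    def should_detect(self):
        now = self.clock()
        fps = None if self._last is None else 1.0 / max(now - self._last, 1e-9)
        self._last = now
        if self._to_skip > 0:
            self._to_skip -= 1
            return False
        if fps is not None and fps < self.fps_threshold:
            self._to_skip = max(self.skip_frames - 1, 0)  # this frame is the first skip
            return False
        return True


# Usage inside a capture loop (cv2, cap and a detect() function are assumed):
# skipper = FrameSkipper()
# while True:
#     ok, frame = cap.read()
#     if not ok:
#         break
#     if skipper.should_detect():
#         frame = detect(frame)  # hypothetical: draws boxes and labels
#     cv2.imshow("detections", frame)
#     if cv2.waitKey(1) & 0xFF == ord("q"):
#         break
```

Skipped frames keep the display moving at the camera's pace, which is why the boxes and labels can briefly disappear or jump.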
