Introduction
Today, many jobs that once required large amounts of human labor are performed by automated systems, from car manufacturing plants and food processing facilities to everyday tasks such as calculating budgets and communicating by email. One field of computer automation that has only recently become practical is computer vision: the use of a computer to analyze input from an image sensor, made increasingly popular by advances in computer hardware. Computer vision aims to perform tasks that would ordinarily require visual analysis by a human, and its applications are broad because so many processes require visual inspection or monitoring. A computer vision system involves many components, but they generally perform three functions: segmentation, classification, and tracking.
Materials
The only devices needed to run this computer vision program are a computer and a webcam.
This project is written in Java, a platform-independent language that compiles to bytecode and is typically executed with Just-In-Time (JIT) compilation on the Java Virtual Machine.
Several pieces of software are used for this project:
Windows 8
Oracle Java Development Kit (JDK) 7
NetBeans IDE (version 7.4)
Additionally, the Xuggler library is used to decode video.
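As a rough illustration, the sketch below shows how Xuggler's mediatool API can decode a video file and deliver each frame as a BufferedImage; the file name and the per-frame handling are placeholders rather than the project's actual code.

```java
import java.awt.image.BufferedImage;

import com.xuggle.mediatool.IMediaReader;
import com.xuggle.mediatool.MediaListenerAdapter;
import com.xuggle.mediatool.ToolFactory;
import com.xuggle.mediatool.event.IVideoPictureEvent;

// Minimal sketch: decode a video file with Xuggler and receive each frame
// as a standard BufferedImage. "input.mp4" is a placeholder file name.
public class FrameDecoder {
    public static void main(String[] args) {
        IMediaReader reader = ToolFactory.makeReader("input.mp4");
        // Ask Xuggler to hand decoded frames back as BufferedImages.
        reader.setBufferedImageTypeToGenerate(BufferedImage.TYPE_3BYTE_BGR);

        reader.addListener(new MediaListenerAdapter() {
            @Override
            public void onVideoPicture(IVideoPictureEvent event) {
                BufferedImage frame = event.getImage();
                // The detector would process each decoded frame here.
                System.out.println("Decoded frame " + frame.getWidth()
                        + "x" + frame.getHeight());
            }
        });

        // readPacket() decodes one packet at a time; null means no error yet.
        while (reader.readPacket() == null) {
            // keep decoding until the end of the stream
        }
    }
}
```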
Procedure
The test evaluates the accuracy of the object detector under a variety of conditions by calculating the distance between the detected position of the target feature and its "real" position. The real positions are generated for each video with a separate application I developed for manually locating a reference point attached to the target object: clicking on the reference point in each video frame records the object's real position for that frame.
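A minimal sketch of this click-to-label idea is shown below; the class and method names are illustrative placeholders, not the actual labeling application.

```java
import java.awt.Point;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import java.util.ArrayList;
import java.util.List;
import javax.swing.JLabel;

// Minimal sketch: record the "real" position of the reference point by
// storing the pixel clicked in each displayed frame. The JLabel stands in
// for whatever component draws the current video frame.
public class GroundTruthRecorder {
    private final List<Point> realPositions = new ArrayList<>();

    public void attachTo(JLabel frameView) {
        frameView.addMouseListener(new MouseAdapter() {
            @Override
            public void mouseClicked(MouseEvent e) {
                // One click per frame: the clicked pixel is taken as the
                // object's true position in that frame.
                realPositions.add(e.getPoint());
                // advanceToNextFrame();  // hypothetical: show the next frame
            }
        });
    }

    public List<Point> getRealPositions() {
        return realPositions;
    }
}
```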
Before actual experimentation begins, a preliminary test is run to verify that, given the same target image and video, the testing program always produces the same output; this removes the need to perform more than one trial for each testing scenario. The experiment itself uses a version of the object detection application modified to record its accuracy: it loads the sets of real object coordinates and records the distance between the object's detected position and its real position in each frame.
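The per-frame accuracy measure is the Euclidean distance, in pixels, between the detected and real coordinates. A minimal sketch of how that measurement might be recorded (the class name is illustrative):

```java
import java.awt.Point;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the accuracy measurement: for every frame, store the
// Euclidean distance (in pixels) between the detector's output and the
// manually labelled real position loaded for that frame.
public class AccuracyRecorder {
    private final List<Double> errors = new ArrayList<>();

    public void record(Point detected, Point real) {
        errors.add(detected.distance(real));
    }

    public List<Double> getErrors() {
        return errors;
    }
}
```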
Data
The data for the experiment is collected and analyzed in a spreadsheet. Data is grouped by the video used in the test and, where relevant, by the algorithms used. Each group is a list of the algorithm's accuracy at each frame of the video, from which the average accuracy for the group is calculated. These results can then be used to infer the strengths and limitations of the computer vision system.
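As an illustration, one way the per-frame errors for a group could be averaged and exported as comma-separated values for the spreadsheet is sketched below; the class name and file naming are assumptions, not the project's actual export code.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

// Minimal sketch: write one group's per-frame errors as CSV so they can be
// opened in a spreadsheet, and print the group's average error.
public class ResultsExporter {

    public static void export(String groupName, List<Double> errors)
            throws IOException {
        double total = 0;
        try (PrintWriter out = new PrintWriter(new FileWriter(groupName + ".csv"))) {
            out.println("frame,error");
            for (int frame = 0; frame < errors.size(); frame++) {
                total += errors.get(frame);
                out.println(frame + "," + errors.get(frame));
            }
        }
        System.out.println(groupName + " average error: " + total / errors.size());
    }
}
```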
Scientific Principles
This project synthesizes existing algorithms into a functioning computer vision system. It combines techniques such as histogram of oriented gradients (HOG) descriptors, simple machine learning algorithms, and object tracking algorithms.
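For illustration, the sketch below shows the central step of a generic HOG descriptor: accumulating gradient magnitudes into an orientation histogram for a single cell. This is a textbook version, not the project's exact implementation, and it omits the block grouping and normalization that a full HOG pipeline performs.

```java
// Minimal sketch of the core HOG idea: gradient magnitudes at each pixel of
// a cell are accumulated into a histogram of gradient orientations.
public class HogCell {

    // gray: grayscale pixel intensities; (x0, y0): top-left corner of a cell
    // lying strictly inside the image; cellSize: cell width/height in pixels;
    // bins: number of orientation bins spanning 0..180 degrees.
    static double[] orientationHistogram(double[][] gray, int x0, int y0,
                                         int cellSize, int bins) {
        double[] hist = new double[bins];
        for (int y = y0; y < y0 + cellSize; y++) {
            for (int x = x0; x < x0 + cellSize; x++) {
                // Central-difference gradients in x and y.
                double gx = gray[y][x + 1] - gray[y][x - 1];
                double gy = gray[y + 1][x] - gray[y - 1][x];
                double magnitude = Math.sqrt(gx * gx + gy * gy);
                // Unsigned orientation in [0, 180) degrees.
                double angle = Math.toDegrees(Math.atan2(gy, gx));
                if (angle < 0) angle += 180;
                if (angle >= 180) angle -= 180;
                int bin = (int) (angle / (180.0 / bins)) % bins;
                hist[bin] += magnitude;  // vote weighted by gradient strength
            }
        }
        return hist;
    }
}
```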
The project has evolved significantly through the scientific method. Its original aim was to compare the viability of a computer vision system written in JavaScript with one written in Java, but that aim quickly changed once it became apparent that computer vision itself offered plenty of variables to study. Tests were performed on a few different algorithms until the application was re-implemented using HOG descriptors and a simple sliding-window object tracker.
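A minimal sketch of the sliding-window tracking idea follows; the descriptor interfaces are placeholders for whatever features (such as HOG descriptors) are actually compared, and the code is a generic illustration rather than the project's implementation.

```java
import java.awt.Point;

// Minimal sketch of a sliding-window tracker: candidate windows around the
// previous position are scored against the target's descriptor, and the
// best-scoring window becomes the new position.
public class SlidingWindowTracker {

    interface Descriptor {
        double distanceTo(Descriptor other);
    }

    interface DescriptorExtractor {
        Descriptor extract(double[][] gray, int x, int y, int width, int height);
    }

    static Point track(double[][] gray, Point previous, int width, int height,
                       int searchRadius, Descriptor target,
                       DescriptorExtractor extractor) {
        Point best = previous;
        double bestDistance = Double.MAX_VALUE;
        // Slide the window over every offset within the search radius.
        for (int dy = -searchRadius; dy <= searchRadius; dy++) {
            for (int dx = -searchRadius; dx <= searchRadius; dx++) {
                int x = previous.x + dx;
                int y = previous.y + dy;
                Descriptor candidate = extractor.extract(gray, x, y, width, height);
                double distance = candidate.distanceTo(target);
                if (distance < bestDistance) {
                    bestDistance = distance;
                    best = new Point(x, y);
                }
            }
        }
        return best;
    }
}
```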
The next step is to redesign the experiment once more to produce a viable computer vision system that incorporates a machine learning component, an improved tracking algorithm, and better use of the hardware through techniques such as multithreading.
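As a hypothetical illustration of the planned multithreading, the rows of a search region could be scored in parallel on a thread pool; the scoreRow method below is a stand-in for whatever per-row window scoring the detector would perform.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: the rows of the search region are scored in parallel
// on a fixed thread pool, and the results are combined on the main thread.
public class ParallelSearch {

    static double scoreRow(int row) {
        // Placeholder for scoring every candidate window in this row.
        return row * 0.0;
    }

    public static void main(String[] args) throws Exception {
        int rows = 240;
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        List<Future<Double>> results = new ArrayList<>();
        for (int row = 0; row < rows; row++) {
            final int r = row;
            results.add(pool.submit(new Callable<Double>() {
                @Override
                public Double call() {
                    return scoreRow(r);
                }
            }));
        }

        double bestScore = Double.NEGATIVE_INFINITY;
        for (Future<Double> f : results) {
            bestScore = Math.max(bestScore, f.get());
        }
        pool.shutdown();
        System.out.println("Best row score: " + bestScore);
    }
}
```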