In computer vision, we generally need to provide the classifier with samples of various objects, and these samples are classified as either positive images or negative images. The difference between the two is that positive images are samples of just the object being classified, while negative images are samples of other things that do not contain that object.
To gather positive images quickly, we want to use a sample set collected from Image-Net, which is basically a database of images organized by search term (for example "computer") that can display all images relating to a given tag. Image-Net is free and open source to use. It basically takes images from the internet or Google, classifies them for us, and packages the images for each tag into download links so we can download them quickly.
For positive images, we want the dimensions to be no more than 50 pixels by 50 pixels, and we want the images to be square, so that the classifier is as accurate as possible while still training quickly. Any dimensions bigger than 50 pixels by 50 pixels just make the classifier (that is, your computer) work harder. We want to be able to retrain the classifier multiple times using the various object sets we will be generating.
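The size rule above can be sketched as a small calculation. This is only an illustration: the helper names square_crop_box and target_side are hypothetical, and cropping to the centered square is an assumption about how a non-square image might be made square, not something the downloader script is known to do.

```python
def square_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square.

    Assumption: non-square images are squared by center cropping.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def target_side(width, height, max_side=50):
    """Side length after squaring the image and capping it at max_side pixels."""
    return min(width, height, max_side)


if __name__ == "__main__":
    # A 200x100 download would be cropped to its middle 100x100 square,
    # then scaled down to 50x50 for use as a positive sample.
    print(square_crop_box(200, 100), target_side(200, 100))
```

The crop box and target side can then be handed to whatever image library does the actual cropping and resizing.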
Like positive images, negative images can be obtained from Image-Net, so not much more explanation is needed on how to get them.
A few things to note about negative images: generally we want at least twice as many negative samples as positive samples. For example, if we have 1,000 positive images, then we need at least 2,000 negative images of anything other than the positive object. The dimensions should also be larger than those of the positive images, so the maximum we can have negative images at is 100 pixels by 100 pixels.
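As a quick sanity check before training, the two rules of thumb for negative images can be written down as a short function. This is a minimal sketch: check_sample_plan and its parameter names are hypothetical, and the defaults simply mirror the 50 pixel and 100 pixel figures used on this page.

```python
def check_sample_plan(num_positive, num_negative,
                      positive_side=50, negative_side=100):
    """Return a list of problems with a planned sample set (empty if none)."""
    problems = []
    # Rule of thumb: at least twice as many negatives as positives.
    if num_negative < 2 * num_positive:
        problems.append(
            "need at least {} negative images, have {}".format(
                2 * num_positive, num_negative))
    # Negative images should be larger than positive images.
    if negative_side <= positive_side:
        problems.append(
            "negative images ({0}x{0}) should be larger than "
            "positive images ({1}x{1})".format(negative_side, positive_side))
    return problems


if __name__ == "__main__":
    for problem in check_sample_plan(1000, 1500):
        print(problem)
```

With 1,000 positive and 2,000 negative images at the default sizes, the function returns an empty list.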
Downloading all these images is daunting, but we have a Python script that can do it for us, so we do not have to manually download each image, resize it, and then apply a grayscale filter to it. The script is called image_link_downloader.py and can be found on the Python Files page. Please go to that page, download the file, and put it in the workspace for OpenCV.
Be aware that when a link is broken, or an image no longer exists on the host's website, the host will sometimes serve a common placeholder image instead, and that placeholder gets downloaded. We classify these as "dirty" images: they hurt our classifier's accuracy, so we want to eliminate them. To keep things simple we can call them trash images, because that is all they are to us. Removing them is easy but daunting to do manually, so there is a Python script for that too. The script is called find_trash.py, and it finds all trash images and removes them from the set. It does this for both the positive and negative image sets. You can also download this file from the Python Files page.
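The idea of removing trash images can be sketched as follows. This is not the code of find_trash.py, whose method is not described here. It is a minimal sketch that assumes a trash image is a byte-for-byte copy of a known placeholder file, so comparing SHA-256 digests is enough. The function names and the trash_example argument are hypothetical. Keep the reference copy of the placeholder outside the image directory, otherwise it will be deleted along with the matches.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def remove_trash(image_dir, trash_example):
    """Delete every file in image_dir identical to trash_example.

    Assumption: trash images are exact copies of the placeholder.
    Returns the sorted list of file names that were removed.
    """
    trash_digest = file_digest(trash_example)
    removed = []
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        if os.path.isfile(path) and file_digest(path) == trash_digest:
            os.remove(path)
            removed.append(name)
    return removed
```

Running remove_trash once on the positive directory and once on the negative directory covers both sets. If the downloaded copies have been resized or converted to grayscale, an exact byte comparison will only work when the reference placeholder has been processed in the same way.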
We also need to create descriptors for both the positive and negative samples. There is a Python script for this as well, called descriptors.py, which looks into both sample directories, builds the list of images in each one, and writes each list out to a text file for us. You can download this file from the Python Files page.
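To show what such a descriptor file might look like, here is a minimal sketch that lists the images in a directory and writes one line per image. It is not the code of descriptors.py. The function name write_descriptor, the extension list, and the box_side parameter are hypothetical. When box_side is given, each line is extended with "1 0 0 side side", which assumes the annotation format used by OpenCV's cascade training tools (object count, then x, y, width, height) and that the object fills the whole image.

```python
import os

# Assumption: these are the only image extensions present in the sample sets.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def write_descriptor(image_dir, output_path, box_side=None):
    """Write one line per image in image_dir to output_path.

    Without box_side each line is just the image path (negative samples).
    With box_side each line also marks one object covering the full image.
    Returns the number of lines written.
    """
    lines = []
    for name in sorted(os.listdir(image_dir)):
        if not name.lower().endswith(IMAGE_EXTENSIONS):
            continue  # skip anything that is not an image
        line = os.path.join(image_dir, name)
        if box_side is not None:
            line += " 1 0 0 {0} {0}".format(box_side)
        lines.append(line)
    with open(output_path, "w") as handle:
        for line in lines:
            handle.write(line + "\n")
    return len(lines)
```

For example, write_descriptor("neg", "bg.txt") would list the negative images, and write_descriptor("pos", "info.dat", box_side=50) would list 50 by 50 pixel positive images. Both the directory names and the output file names here are placeholders.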