Histogram of Oriented Gradients and Object Detection

If you remember, in the last blog post we discussed a little bit about implementing Histogram of Oriented Gradients for object detection.

I’ve been working on a Python framework/package to rapidly construct object detectors using Histogram of Oriented Gradients and Linear Support Vector Machines. So now let’s review how the Histogram of Oriented Gradients method is used in conjunction with a Linear SVM to train a robust object detector.

Drawbacks of Haar Cascade Classifiers

Honestly, I really can’t stand using the Haar cascade classifiers provided by OpenCV (i.e. the Viola-Jones detectors) — and hence why I’m working on my own suite of classifiers. While cascade methods are extremely fast, they leave much to be desired. If you’ve ever used OpenCV to detect faces you’ll know exactly what I’m talking about.


Falsely detected face caused by a Haar cascade

Figure : Example of falsely detecting a face in an image. This is a common problem when using Haar cascade classifiers (and sometimes they fail to detect the face at all).

In order to detect faces/humans/objects/whatever in OpenCV (and remove the false positives), you’ll spend a lot of time tuning the cv2.detectMultiScale parameters. And again, there is no guarantee that the exact same parameters will work from image-to-image. This makes batch-processing large datasets for face detection a tedious task since you’ll be very concerned with either (1) falsely detecting faces or (2) missing faces entirely, simply due to poor parameter choices on a per image basis.

Now we have Histogram of Oriented Gradients. We have deformable parts models and Exemplar models. And we are now utilizing deep learning with image pyramids to recognize objects at different scales!

What's the idea behind Histogram of Oriented Gradients?

The idea of HOG is that instead of using each individual gradient direction of each individual pixel of an image, we group the pixels into small cells. For each cell, we compute all the gradient directions and group them into a number of orientation bins. We sum up the gradient magnitude in each bin, so stronger gradients contribute more weight to their bins, and the effects of small random orientations due to noise are reduced. This histogram gives us a picture of the dominant orientation of that cell. Doing this for all cells gives us a representation of the structure of the image. The HOG features keep the representation of an object distinct but also allow for some variation in shape.
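To make the cell-and-bin idea above concrete, here is a minimal NumPy sketch. It is not a full HOG implementation (block normalization is omitted), and the cell size and bin count are illustrative choices:

```python
import numpy as np

def cell_orientation_histogram(cell, bins=9):
    """Histogram of gradient orientations for one cell, weighted by
    gradient magnitude (unsigned orientations, 0-180 degrees)."""
    gy, gx = np.gradient(cell.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=bins, range=(0, 180),
                           weights=magnitude)
    return hist

def hog_like_descriptor(image, cell_size=8, bins=9):
    """Concatenate per-cell histograms over the whole image.
    Real HOG also normalizes over blocks of cells, omitted here."""
    h, w = image.shape
    feats = []
    for y in range(0, h - cell_size + 1, cell_size):
        for x in range(0, w - cell_size + 1, cell_size):
            cell = image[y:y + cell_size, x:x + cell_size]
            feats.append(cell_orientation_histogram(cell, bins))
    return np.concatenate(feats)

image = np.random.rand(64, 64)       # stand-in for a grayscale image
desc = hog_like_descriptor(image)
print(desc.shape)                    # 8x8 cells x 9 bins = (576,)
```

Because each histogram is weighted by gradient magnitude, strong edges dominate a cell's bins while weak, noisy gradients contribute little, which is exactly the noise robustness described above.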

A brief introduction to the topic is explained in II.1. Theory of Histogram of Oriented Gradients (HOG).

All that said, even though the Histogram of Oriented Gradients descriptor for object recognition is nearly a decade old, it is still heavily used today, and with fantastic results. In their seminal 2005 paper, Histograms of Oriented Gradients for Human Detection, Dalal and Triggs demonstrated that the Histogram of Oriented Gradients (HOG) image descriptor and a Linear Support Vector Machine (SVM) could be used to train highly accurate object classifiers (in their particular study, human detectors).

Implementing HOG

Histogram of Oriented Gradients and Object Detection with a Linear Support Vector Machine (SVM)

I'm not going to review the entire detailed process of training an object detector using Histogram of Oriented Gradients (yet), simply because each step can be fairly detailed. But I want to take a minute and detail the general algorithm, which can be broken into a 7-step process:


Step 1:

Sample P positive samples from your training data of the object(s) you want to detect, and extract HOG descriptors from these samples by applying feature extraction to a labeled training set of images.

Step 2:

Sample N negative samples from a negative training set that does not contain any of the objects you want to detect, and extract HOG descriptors from these samples as well. In practice N >> P.

Step 3:

Train a Linear Support Vector Machine on your positive and negative samples.
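Steps 1-3 might be sketched with scikit-learn's LinearSVC as below. The feature vectors are random stand-ins for real HOG descriptors of positive and negative patches, and the C value is an illustrative choice:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(42)
P, N, D = 100, 500, 3780        # positives, negatives (N >> P), feature dim

# Fake "HOG descriptors": two shifted Gaussian clusters stand in for
# features extracted from object and non-object patches.
pos_feats = rng.normal(loc=1.0, size=(P, D))
neg_feats = rng.normal(loc=0.0, size=(N, D))

X = np.vstack([pos_feats, neg_feats])
y = np.hstack([np.ones(P), np.zeros(N)])

clf = LinearSVC(C=0.01)         # the Linear SVM of Step 3
clf.fit(X, y)
print(clf.score(X, y))
```

With real data you would replace the random vectors with the HOG descriptors extracted in Steps 1 and 2.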

Step 4: Sliding Window Technique

For each image and each possible scale of each image in your negative training set, apply the sliding window technique and slide your window across the image. At each window compute your HOG descriptors and apply your classifier. If your classifier (incorrectly) classifies a given window as an object (and it will; there will absolutely be false positives), record the feature vector associated with the false-positive patch along with the probability of the classification. This approach is called hard-negative mining.
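The pyramid and sliding-window loops might be sketched like this. The window size, step, and pyramid scale below are illustrative choices, and the crude nearest-neighbor downscale is only there to avoid extra dependencies (in practice you would use cv2.resize or skimage.transform.resize):

```python
import numpy as np

def pyramid(image, scale=1.5, min_size=(64, 64)):
    """Yield progressively downscaled copies of the image."""
    yield image
    while True:
        h = int(image.shape[0] / scale)
        w = int(image.shape[1] / scale)
        if h < min_size[0] or w < min_size[1]:
            break
        # Crude nearest-neighbor resize to keep the sketch dependency-free.
        rows = (np.arange(h) * image.shape[0] / h).astype(int)
        cols = (np.arange(w) * image.shape[1] / w).astype(int)
        image = image[np.ix_(rows, cols)]
        yield image

def sliding_window(image, step=32, window=(64, 64)):
    """Slide a window left-to-right, top-to-bottom; yield (x, y, patch)."""
    for y in range(0, image.shape[0] - window[0] + 1, step):
        for x in range(0, image.shape[1] - window[1] + 1, step):
            yield x, y, image[y:y + window[0], x:x + window[1]]

image = np.random.rand(256, 256)     # stand-in for a grayscale image
n_windows = sum(1 for layer in pyramid(image)
                for _ in sliding_window(layer))
print(n_windows)
```

At each yielded patch you would compute a HOG descriptor and score it with the trained classifier.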

sliding window technique

Figure 2: Example of the sliding a window approach, where we slide a window from left-to-right and top-to-bottom. Note: Only a single scale is shown. In practice this window would be applied to multiple scales of the image.


Figure 3: A second example of applying a sliding window to each layer of the image pyramid.

Step 5:

Re-train your Linear SVM using the hard-negative samples.

Take the false-positive samples found during the hard-negative mining stage, sort them by their confidence (i.e. probability), and re-train your classifier using these hard-negative samples. (Note: you can iteratively apply steps 4-5, but in practice one stage of hard-negative mining usually, though not always, tends to be enough. The gains in accuracy on subsequent runs of hard-negative mining tend to be minimal.)
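A minimal sketch of this mining-and-retraining loop is below, with synthetic data standing in for real HOG features; the SVM's decision_function scores play the role of detection confidences:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
D = 64                                                 # illustrative feature dim
X_train = np.vstack([rng.normal(1.0, size=(50, D)),    # positives
                     rng.normal(0.0, size=(200, D))])  # negatives
y_train = np.hstack([np.ones(50), np.zeros(200)])
clf = LinearSVC(C=0.01).fit(X_train, y_train)

# Step 4: scan windows drawn from negative images; anything the current
# classifier scores as positive is a false positive (a "hard negative").
windows = rng.normal(0.3, size=(500, D))
scores = clf.decision_function(windows)
hard_negatives = windows[scores > 0]
hard_scores = scores[scores > 0]

# Step 5: sort the hard negatives by confidence (most confident mistakes
# first) and re-train with them added to the negative set.
order = np.argsort(hard_scores)[::-1]
hard_negatives = hard_negatives[order]
X_new = np.vstack([X_train, hard_negatives])
y_new = np.hstack([y_train, np.zeros(len(hard_negatives))])
clf = LinearSVC(C=0.01).fit(X_new, y_new)
```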

Figure : Example of re-training with the hard-negative samples

Step 6:

Our classifier is now trained and can be applied to the test dataset.

Again, just like in Step 4, for each image in our test set, and for each scale of the image, apply the sliding window technique. At each window extract HOG descriptors and apply your classifier. If your classifier detects an object with sufficiently large probability, record the bounding box of the window.
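The detection pass might be sketched as below, where score_fn stands in for the trained SVM's decision function and the 0.5 threshold is an illustrative choice:

```python
import numpy as np

def detect(windows, score_fn, threshold=0.5):
    """windows: iterable of (x, y, w, h, features).
    Returns [((x1, y1, x2, y2), score), ...] for confident windows."""
    detections = []
    for (x, y, w, h, feats) in windows:
        score = score_fn(feats)
        if score >= threshold:                  # sufficiently confident
            detections.append(((x, y, x + w, y + h), score))
    return detections

# Toy example: the mean of each feature vector stands in for an SVM score.
windows = [(0, 0, 64, 64, np.full(10, 0.9)),
           (32, 0, 64, 64, np.full(10, 0.2))]
dets = detect(windows, score_fn=np.mean)
print(len(dets))   # only the first window clears the threshold
```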

These are the bare minimum steps required, but by using this 7-step process you can train and build object detection classifiers of your own! Extensions to this approach include a deformable parts model and Exemplar SVMs, where you train a classifier for each positive instance rather than a collection of them.

However, if you’ve ever worked with object detection in images, you’ve likely run into the problem of detecting multiple bounding boxes around the object you want to detect in the image.

Step 7:

After you have finished scanning the image, apply non-maximum suppression to remove redundant and overlapping bounding boxes.


Here’s an example of this overlapping bounding box problem:

Figure 3: (Left) Detecting multiple overlapping bounding boxes around the face we want to detect. (Right) Applying non-maximum suppression to remove the redundant bounding boxes.

Notice on the left we have 6 overlapping bounding boxes that have correctly detected the face. However, these 6 bounding boxes all refer to the same face — we need a method to suppress the 5 smallest bounding boxes in the region, keeping only the largest one, as seen on the right.

To fix this situation we’ll need to apply Non-Maximum Suppression (NMS), also called Non-Maxima Suppression.

This is a common problem, no matter if you are using the Viola-Jones based method or following the Dalal-Triggs paper.

There are multiple ways to remedy this problem. Triggs et al. suggest using the Mean-Shift algorithm to detect multiple modes in the bounding box space by utilizing the (x, y) coordinates of the bounding box as well as the logarithm of the current scale of the image.

I spent some time looking for a good non-maximum suppression (sometimes called non-maxima suppression) implementation in Python.

I found two methods to apply for this issue: the Felzenszwalb et al. method, and the much faster Malisiewicz et al. method.

It’s important to note that Tomasz’s method is over 100x faster than the Felzenszwalb et al. method. And when you’re executing your non-maximum suppression function millions of times, that 100x speedup really matters.

Non-Maximum Suppression for Object Detection in Python

Histogram of Oriented Gradients (HOG) combined with Support Vector Machines (SVM) has been pretty successful for detecting objects in images, but these detectors return multiple bounding boxes surrounding each object, which makes them poorly suited as-is to cases like detecting pedestrians on crowded roads. Here is where non-maximum suppression (NMS) comes to the rescue, refining the bounding boxes returned by the detector. In crowded scenes a greedy NMS pipeline faces a trade-off: setting a high overlap threshold preserves highly overlapped objects, but the detector must then predict very compact boxes for every instance to keep false positives down.

I) Implementing the Felzenszwalb et al. method for non-maximum suppression in Python

Open up a file, name it nms.py , and let’s get started implementing the Felzenszwalb et al. method for non-maximum suppression in Python:
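The original listing is not reproduced in this post, but a sketch in the spirit of the Felzenszwalb et al. method might look like the following. Note the inner for loop that compares the picked box against every remaining box; the coordinates at the bottom are illustrative hand-drawn boxes around a single object:

```python
import numpy as np

def non_max_suppression_slow(boxes, overlap_thresh=0.3):
    """boxes: (N, 4) array of (x1, y1, x2, y2). Returns the picked boxes."""
    if len(boxes) == 0:
        return boxes
    boxes = np.asarray(boxes, dtype=float)
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = list(np.argsort(y2))        # sort by bottom-right y-coordinate
    pick = []
    while idxs:
        last = idxs[-1]                # pick the box with the largest y2
        pick.append(last)
        suppress = [len(idxs) - 1]
        # Inner loop: compare the picked box against every remaining box.
        for pos in range(len(idxs) - 1):
            j = idxs[pos]
            xx1, yy1 = max(x1[last], x1[j]), max(y1[last], y1[j])
            xx2, yy2 = min(x2[last], x2[j]), min(y2[last], y2[j])
            w = max(0, xx2 - xx1 + 1)
            h = max(0, yy2 - yy1 + 1)
            if (w * h) / area[j] > overlap_thresh:
                suppress.append(pos)   # overlaps too much: suppress it
        for pos in sorted(suppress, reverse=True):
            del idxs[pos]
    return boxes[pick].astype(int)

# Illustrative overlapping boxes around one object:
boxes = np.array([[12, 84, 140, 212], [24, 84, 152, 212],
                  [36, 84, 164, 212], [12, 96, 140, 224],
                  [24, 96, 152, 224], [24, 108, 152, 236]])
print(non_max_suppression_slow(boxes, 0.3))   # a single surviving box
```

The loop repeats until no boxes remain: pick the last (largest y2) box, then suppress every remaining box whose overlap with it exceeds the threshold.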


Finally, we return the set of picked bounding boxes (the ones that were not suppressed).

Let’s go ahead and create a driver so we can execute this code and see it in action. Open up a new file, name it nms_slow.py, and add the following code:

Non-Maximum Suppression for Object Detection in action using Python

Non-Maximum Suppression in Action

To see the Felzenszwalb et al. non-maximum suppression method in action, download the source code and accompanying images for this post from the bottom of this page, navigate to the source code directory, and issue the following command:

Non-Maximum Suppression for Object Detection in Python
$ python nms_slow.py

First, you’ll see the Audrey Hepburn image:

Figure 2: Our classifier initially detected six bounding boxes, but by applying non-maximum suppression, we are left with only one (the correct) bounding box.






Notice how six bounding boxes were detected, but by applying non-maximum suppression, we are able to prune this number down to one.







(Faster) Non-Maximum Suppression in Python

Open up a new file in your favorite editor, name it Faster_nms.py, and let's get started on creating a faster non-maximum suppression implementation:
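A sketch in the spirit of the Malisiewicz et al. vectorized approach is below. The structure mirrors the slow version, but the inner for loop becomes NumPy array operations over all remaining boxes at once; the example boxes are illustrative:

```python
import numpy as np

def non_max_suppression_fast(boxes, overlap_thresh=0.3):
    """boxes: (N, 4) array of (x1, y1, x2, y2). Returns the picked boxes."""
    if len(boxes) == 0:
        return np.empty((0, 4), dtype=int)
    boxes = np.asarray(boxes, dtype=float)
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(y2)              # sort by bottom-right y-coordinate
    pick = []
    while len(idxs) > 0:
        last = idxs[-1]
        pick.append(last)
        rest = idxs[:-1]
        # Vectorized overlap of the picked box with ALL remaining boxes.
        xx1 = np.maximum(x1[last], x1[rest])
        yy1 = np.maximum(y1[last], y1[rest])
        xx2 = np.minimum(x2[last], x2[rest])
        yy2 = np.minimum(y2[last], y2[rest])
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)
        overlap = (w * h) / area[rest]
        # Keep only the remaining boxes that do not overlap too much.
        idxs = rest[overlap <= overlap_thresh]
    return boxes[pick].astype(int)

boxes = np.array([[12, 84, 140, 212], [24, 84, 152, 212],
                  [24, 96, 152, 224]])
print(non_max_suppression_fast(boxes, 0.3))   # a single surviving box
```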

This code looks almost identical to the previous one, right?

So you’re probably asking yourself “Where does this 100x speed-up come from?”

And the answer is the removal of an inner for loop.

The previous NMS implementation required an extra inner for loop to compute the size of the bounding regions and the ratio of overlapped area.

Instead, Dr. Malisiewicz replaced this inner for loop with vectorized code — and this is how we are able to achieve substantially faster speeds when applying non-maximum suppression.

The Malisiewicz et al. method is essentially identical to the Felzenszwalb et al. method, but by using vectorized code we are able to reach a reported 100x speed-up in non-maximum suppression!

Faster Non-Maximum Suppression in Action

Figure 3: Six bounding boxes are detected around the face, but by applying fast non-maximum suppression we are correctly able to reduce the number of bounding boxes to one.

In this last example we can again see that our non-maximum suppression algorithm is working correctly — even though six original bounding boxes were detected by the HOG + Linear SVM detector, applying non-maximum suppression has correctly suppressed five of the boxes, leaving us with the final detection.






Summary

In this blog post we had a little bit of a history lesson regarding object detectors. We also had a sneak peek into a Python framework that I am working on for object detection in images.

From there we had a quick review of how the Histogram of Oriented Gradients method is used in conjunction with a Linear SVM to train a robust object detector.

However, no matter what method of object detection you use, you will likely end up with multiple bounding boxes surrounding the object you want to detect. In order to remove these redundant boxes you'll need to apply Non-Maximum Suppression.

We described two methods for Non-Maximum Suppression.

First, we applied the Felzenszwalb et al. method for non-maximum suppression, in which, instead of returning all of the found bounding boxes, you first apply non-maximum suppression to ignore bounding boxes that significantly overlap each other.

However, there are improvements to be made to the Felzenszwalb et al. method for non-maximum suppression.

Then we implemented the method suggested by my friend Dr. Tomasz Malisiewicz, which is reported to be over 100x faster!

This method is nearly identical to the Felzenszwalb et al. method, but by removing an inner for loop and using vectorized code we are able to obtain substantial speed-ups.

If you want to get in touch (and, by the way, if you know a good joke), you can connect with me on Twitter or LinkedIn.

Thanks for reading!😄 🙌