Road Sign Identification

Road Sign Identification Data

We used feature extraction and matching to identify stop signs. Feature extraction is the process of applying mathematical operations, such as transforms and derivatives, to an input image in order to extract features: edges, corners, and blobs (regions of the image that differ in properties from their surroundings).

We tried out two major algorithms for identifying road signs: SIFT (Scale-Invariant Feature Transform) and SURF (Speeded-Up Robust Features), the algorithm we ultimately used for this project. Both are well suited to object identification because they extract visual features from images, videos, and similar sources that remain stable under varying conditions, such as different lighting or rescaled images.

Below is a brief overview of what each algorithm does and why we chose to use SURF instead of SIFT.

SIFT Algorithm:

SIFT is an algorithm consisting of the following steps:

  • Apply a Gaussian blur operator to resized copies of the image - this reduces noise by "smoothing" out the image
  • Calculate the Difference of Gaussians (DoG) - this approximates the second-order derivative of the image to find areas with sharp changes (potential features)
  • Find local minima and maxima in the DoG images by comparing each pixel's value with those of its neighbors, then use a Taylor expansion to approximate the true extrema as if the image were a continuous function (since pixels are a discrete representation of the image)

Below is the DoG equation, where L(x, y, kσ) is the convolution of a Gaussian filter at scale kσ with the original image. The DoG is computed by subtracting the convolved images at adjacent scales (a MATLAB sketch of this step follows the list):

  D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)

  • After finding all possible keypoints, reject low-contrast keypoints - minima and maxima whose magnitude is below a set value - to eliminate as many false positives as possible. Then calculate the gradients at the remaining keypoints to determine whether each is a corner, an edge, or a flat spot in the image; corners are the most desirable keypoints.
  • Use gradient calculations around each keypoint to determine its orientation. This allows features to be matched regardless of their orientation (rotated images).
  • Generate 128-dimensional feature vectors, which uniquely identify keypoints, by taking the gradients of the surrounding regions and binning them based on gradient magnitude and orientation
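To make the DoG step concrete, here is a minimal MATLAB sketch (assuming the Image Processing Toolbox; the file name, base scale, and scale multiplier are illustrative choices rather than values from our project):

  % Compute one Difference-of-Gaussians layer of the image
  I = im2double(rgb2gray(imread('stop_sign.jpg')));  % placeholder file name
  sigma = 1.6;                    % base blur scale (illustrative)
  k = sqrt(2);                    % multiplier between adjacent scales
  L1 = imgaussfilt(I, sigma);     % L(x, y, sigma)
  L2 = imgaussfilt(I, k*sigma);   % L(x, y, k*sigma)
  D = L2 - L1;                    % D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)
  imshow(rescale(abs(D)))         % bright pixels mark sharp intensity changes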

SURF Algorithm:

SURF consists of three main parts: detection, description, and matching. The keypoint calculations are essentially the same as in SIFT. The key differences are that SURF uses an integral image and box filters to compute keypoints, and that it produces a 64-dimensional descriptor rather than a 128-dimensional one; together these greatly decrease processing time. Ultimately, we decided to use this algorithm because it is considered much faster and performs better in blurry and otherwise less ideal situations/road conditions.

SURF uses integral images to make these computations significantly quicker. The integral image at a point (x, y) adds up all of the pixel values above and to the left of (x, y):

  I_Σ(x, y) = Σ_{i ≤ x} Σ_{j ≤ y} I(i, j)
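As a quick illustration, the integral image can be computed in MATLAB either with the built-in integralImage or with two cumulative sums (the file name is a placeholder):

  I = im2double(rgb2gray(imread('stop_sign.jpg')));  % placeholder file name
  J = integralImage(I);          % built-in; output has a leading row and column of zeros
  J2 = cumsum(cumsum(I, 1), 2);  % equivalent cumulative-sum formulation
  % The sum over any rectangular region now takes only four lookups into J,
  % which is what makes SURF's box filters so fast to evaluate.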

Once the SIFT or SURF algorithm has been run on two images, we can compare the extracted features by taking the Euclidean distances between their feature vectors and keeping pairs with small distances. The SURF process is illustrated below on our test photos:
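A minimal sketch of this pipeline in MATLAB (Computer Vision Toolbox) looks like the following; the file names are placeholders for our reference and test photos:

  ref  = rgb2gray(imread('reference_sign.jpg'));   % placeholder reference photo
  test = rgb2gray(imread('test_scene.jpg'));       % placeholder test photo
  ptsRef  = detectSURFFeatures(ref);               % detect keypoints
  ptsTest = detectSURFFeatures(test);
  [fRef,  vptsRef]  = extractFeatures(ref,  ptsRef);    % 64-dimensional descriptors
  [fTest, vptsTest] = extractFeatures(test, ptsTest);
  pairs = matchFeatures(fTest, fRef);              % pair up descriptors with small distances
  showMatchedFeatures(test, ref, vptsTest(pairs(:,1)), vptsRef(pairs(:,2)), 'montage')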

False Detections

As you can see in the image below, our feature matching was not always accurate. Out of the 11 points matched between our reference image (right) and the test image (left), 5 of them were matched to areas on the road, buildings, and cars.

We must find a way to decrease false-positive feature matches without losing true positives. When matching features between other photos we took and the reference image, results became more accurate as we decreased the matching and ratio thresholds. The matching threshold represents a percentage of the distance from a perfect match: features are not labelled as matches unless they fall below this threshold. For more detail, see the MATLAB documentation page for matchFeatures.

To alleviate the issue, we decreased the match threshold. By lowering the threshold value, we ensure that only features that are extremely similar will be matched. After lowering this value for our image, the 5 false positives were removed while we kept the 7 true positives on the stop sign.
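In MATLAB this amounts to lowering the corresponding name-value arguments of matchFeatures; the values below are illustrative rather than our final tuning:

  pairs = matchFeatures(fTest, fRef, ...
      'MatchThreshold', 1.0, ...   % percent of the distance from a perfect match
      'MaxRatio', 0.5);            % nearest/second-nearest ratio test; lower is stricter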

Thresholding & Filtering to Reduce False Detections

While reducing the matching threshold was effective for certain images, it did not prevent false matches consistently enough. In the video below, we extracted each frame of footage of a moving vehicle approaching a stop sign, ran SURF feature matching on each frame, and circled the matched features:


final_vid_no_thresholding.avi

Original Video Link: https://www.youtube.com/watch?v=f9Q9jYOSzOQ&t=842s

Clearly, this is not a satisfactory result. The SURF algorithm matches many objects outside of the stop sign, including the "STOP" marking on the road. We wanted to remove these unneeded parts of the image and extract just the area around the sign to aid the SURF algorithm and reduce false detections. To do this, we implemented preprocessing in the form of thresholding and filtering. The goal of our thresholding was to accurately isolate our region of interest - in this case, the stop sign. We decided to first isolate the red color of the stop sign to determine our threshold, using the following formula:


  g(x, y) = 1 if Xmin ≤ f(x, y) ≤ Xmax, and 0 otherwise

where f(x, y) is the 0-255 red color value of a pixel and Xmin and Xmax are the thresholds for that value. For green and blue, we instead thresholded the ratio of the blue/green color value to the red value, which has been noted to work better in poor lighting conditions.
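A sketch of this thresholding in MATLAB follows; the cutoff values are illustrative stand-ins for the ones we tuned by hand on our footage:

  rgb = im2double(imread('frame.jpg'));    % placeholder video frame; values scaled to [0, 1]
  R = rgb(:,:,1); G = rgb(:,:,2); B = rgb(:,:,3);
  mask = (R >= 0.4) & (R <= 1.0) ...       % Xmin <= f(x, y) <= Xmax on the red channel
       & (G ./ max(R, eps) < 0.5) ...      % green-to-red ratio cutoff
       & (B ./ max(R, eps) < 0.5);         % blue-to-red ratio cutoff
  imshow(mask)                             % white (1) pixels are candidate sign regions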

We then applied a filter in MATLAB using bwareafilt, which removed groups of fewer than 400 white (1) pixels and more than 400000 white (1) pixels. This cleared some of the small noise in the image and also filtered out large red objects such as red cars. After filtering, we dilated the image, which removed holes in the white area of the stop sign and allowed for further filtering. The last filter we applied found the group of pixels with the smallest eccentricity and removed everything else. Eccentricity measures how far an ellipse deviates from a circle: a value of 0 is a perfect circle, and values near 1 are highly elongated. By fitting an ellipse around each group of pixels, MATLAB assigns each group an eccentricity value. Since stop signs are nearly circular, they have low eccentricity and are thus easily kept by the filter. Finally, the binary image generated is used to mask the original image, revealing our region of interest.
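Continuing the sketch above, the filtering chain looks roughly like this (the structuring-element size is an assumption; the area bounds are the ones described):

  mask = bwareafilt(mask, [400 400000]);           % keep only 400-400000 pixel groups
  mask = imdilate(mask, strel('disk', 5));         % close holes in the sign region
  mask = bwpropfilt(mask, 'Eccentricity', 1, 'smallest');  % keep the most circular group
  roi  = rgb .* mask;                              % mask the original image
  imshow(roi)                                      % region of interest around the sign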


Final Video

We applied thresholding as described above to each video frame, then used SURF to match features to our reference image. The results were significantly improved, as shown in the video.
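In outline, the per-frame loop looked like the sketch below. The file names are placeholders, isolateSign is a hypothetical helper wrapping the thresholding and filtering steps above, and fRef holds the reference descriptors from the earlier sketch:

  vin  = VideoReader('approach.mp4');    % placeholder input clip
  vout = VideoWriter('final_vid.avi');
  open(vout)
  while hasFrame(vin)
      frame = readFrame(vin);
      gray  = rgb2gray(isolateSign(frame));        % threshold + filter, then grayscale
      [f, vpts] = extractFeatures(gray, detectSURFFeatures(gray));
      pairs = matchFeatures(f, fRef);
      if ~isempty(pairs)                           % circle matched features on the frame
          frame = insertMarker(frame, vpts(pairs(:,1)).Location, 'circle');
      end
      writeVideo(vout, frame)
  end
  close(vout)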

Drawbacks

Clearly, the detection is still not perfect. When the stop sign leaves the frame, SURF still matches features with the side of the road, and lowering the match threshold did not alleviate this issue without removing the good matches as well. Additionally, red traffic lights are not filtered out by our preprocessing and could potentially result in false matches.

final_vid (1).avi