Automotive Perception
Team-members: Ritish Shailly and Aaron Giuffré
Fall 2022 ECE 5554 Computer Vision: Course Project
Virginia Tech
Abstract: This work applies computer vision models to dash cam video of downtown Blacksburg in order to recognize artifacts useful to autonomous driving. Beyond the application itself, the goal of this work is to test the efficacy of various models. Video data are processed frame by frame to highlight relevant objects such as pedestrians, cars, and lanes. Pedestrian recognition is done with two different methods: the Histogram of Oriented Gradients (HOG) descriptor [1], using a trained model obtained from the OpenCV library, and a pre-trained YOLO object detection model [9]. Car detection was done with a pre-trained Haar cascade model [12][13] and the same pre-trained YOLO model. Car tracking is done with the Kanade-Lucas-Tomasi (KLT) feature tracker and a sparse optical flow algorithm [5][6][7]. Lane detection is done with a combination of image preprocessing, image masking, Canny edge detection, and the Hough transform [10]. As a result, each model is demonstrated qualitatively on the processed video. The pre-trained YOLO model performs better for car and people detection, the Hough transform approach detects the lane in real time, and the KLT approach works well for tracking an object in a video.
Teaser Figure
Introduction:
Autonomous driving involves complex decision-making based on information perceived by sensors such as cameras, LiDAR, radar, and ultrasonic sensors. These sensors observe the environment around the car, and a decision-making algorithm acts on their output. In our project, we use cameras to detect the objects and people around the car that either convey information or play an essential role in the decision-making process. Specifically, we detect cars, pedestrians, and lanes, and track objects on the road using dash camera input. The video recorded by the dash camera is processed frame by frame to find the aforementioned objects.
Approach:
For some of the applications, we implemented more than one method and qualitatively compared them to pick the best one for that application. Here are the applications we worked on:
People detection: We implemented two different methods for this application and later picked the better one.
Histogram of Oriented Gradients (HOG) feature descriptor [1]: OpenCV features an implementation of a human detection method called HOG (Histogram of Oriented Gradients) [11]. The main idea behind this algorithm is to bin gradient orientations over local cells of the image into histograms, then use the concatenated histograms as a descriptor to train a support vector machine. Fortunately, OpenCV already ships a classifier trained for detecting people, which we take advantage of in this project, as sketched below.
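A minimal sketch of how OpenCV's pre-trained HOG people detector can be invoked; the image path and the winStride/scale values are illustrative placeholders, not our exact settings.

```python
import cv2

# Sketch: OpenCV's built-in HOG + linear SVM people detector.
# "street.jpg" is a placeholder for a single dash cam frame.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("street.jpg")
# winStride and scale trade detection speed against accuracy
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
for (x, y, w, h) in boxes:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow("HOG pedestrians", frame)
cv2.waitKey(0)
```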
Pre-trained YOLO object detection model [9]: YOLO (You Only Look Once) is a single-pass object detection algorithm. The model we are using is trained to detect and localize various objects, including cars, traffic lights, and humans. We take advantage of this model to identify any pedestrians in the dashcam video; a minimal sketch follows.
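A sketch of loading a pre-trained YOLO network through OpenCV's DNN module, in the style of the tutorial in [9]; the weights, config, and class-list file names are assumptions about the local setup.

```python
import cv2
import numpy as np

# Sketch: pre-trained YOLO via OpenCV's DNN module (cf. [9]).
# "yolov4.weights", "yolov4.cfg", and "classes.txt" are assumed to be
# the standard Darknet release files placed alongside the script.
net = cv2.dnn.readNet("yolov4.weights", "yolov4.cfg")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)
classes = open("classes.txt").read().splitlines()

frame = cv2.imread("street.jpg")  # placeholder test image
class_ids, scores, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)
for cid, score, (x, y, w, h) in zip(np.ravel(class_ids), np.ravel(scores), boxes):
    if classes[cid] == "person":  # keep only pedestrian detections
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
cv2.imshow("YOLO pedestrians", frame)
cv2.waitKey(0)
```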
Car detection: We implemented two different methods for this application and later picked the best one.
Haar cascade approach [13]: A Haar cascade classifier scans the image with Haar-like rectangular features, computed efficiently via an "integral image"; these features act much like convolution kernels for extracting structure. Training uses the AdaBoost learning algorithm to select a small number of discriminative features from a large set of positive and negative images, and the trained model is stored in an XML file. Fortunately, for this project we can use a pre-trained model [12] to detect cars, as in the sketch below.
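A minimal sketch using the pre-trained cascade from [12]; the file name cars.xml matches that repository, while the input path and the scaleFactor/minNeighbors values are illustrative.

```python
import cv2

# Sketch: car detection with the pre-trained Haar cascade from [12].
car_cascade = cv2.CascadeClassifier("cars.xml")

frame = cv2.imread("road.jpg")  # placeholder input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# scaleFactor and minNeighbors tune the sensitivity/false-positive trade-off
cars = car_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
for (x, y, w, h) in cars:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
cv2.imshow("Haar cars", frame)
cv2.waitKey(0)
```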
Pre-trained YOLO object detection model [9]: We will use the same pre-trained model [9] to detect both pedestrians and cars.
Lane detection: Canny edge detection + image masking + Hough transform [3]
First, we apply a Gaussian blur to smooth the image, then run the Canny edge detector on the result. Next, the image is masked to keep just the road region containing the lanes, and thresholded so that all that is left are the white lines of interest. The Hough line transform then finds the lines that pass through the lane markings. These operations are performed on every frame to create a continuous output of lane detection; a simplified sketch of the pipeline follows.
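A simplified sketch of the per-frame pipeline (blur, Canny, region-of-interest mask, probabilistic Hough transform); the polygon vertices and Canny/Hough thresholds are illustrative values that depend on camera mounting, and the explicit white-line thresholding step is omitted for brevity.

```python
import cv2
import numpy as np

def find_lanes(frame):
    """Highlight lane lines in a single BGR dash cam frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blur, 50, 150)

    # Mask everything except a trapezoid covering the road ahead
    h, w = edges.shape
    roi = np.array([[(0, h), (w // 2 - 50, h // 2 + 50),
                     (w // 2 + 50, h // 2 + 50), (w, h)]], dtype=np.int32)
    mask = np.zeros_like(edges)
    cv2.fillPoly(mask, roi, 255)
    masked = cv2.bitwise_and(edges, mask)

    # Probabilistic Hough transform finds segments along the lane markings
    lines = cv2.HoughLinesP(masked, rho=1, theta=np.pi / 180, threshold=30,
                            minLineLength=40, maxLineGap=100)
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(frame, (x1, y1), (x2, y2), (0, 255, 0), 3)
    return frame
```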
Car tracking: Kanade-Lucas-Tomasi (KLT) feature selection and object tracking [5][6][7]
Our KLT approach was inspired by, but built upon, the GeeksforGeeks [6] implementation based on OpenCV's goodFeaturesToTrack() and calcOpticalFlowPyrLK() methods [5][7], which respectively implement the Shi-Tomasi feature detector and the pyramidal Lucas-Kanade sparse optical flow algorithm; we researched the underlying method in the original Tomasi-Kanade paper [8] and OpenCV's documentation [5][7]. The goal of our modifications was not to track every point the feature detector picked up, but to reduce the tracked points to only those associated with surrounding vehicles. We used Harris corner scoring instead of the default, along with image masking to limit the region of interest to the road, excluding irrelevant points on the background and skyline. Algorithm parameters were also tuned to this end: we thresholded the minimum strength of the eigenvalues associated with detected corners, increased the number of points of interest on objects of interest, and eliminated points on objects that were not of interest. A sketch of this pipeline appears below.
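A sketch of this tuned pipeline under illustrative parameters: Harris-scored corner selection restricted to a road mask, then pyramidal Lucas-Kanade sparse optical flow. The mask (lower half of the frame), the quality threshold, and the video path stand in for our actual tuning and data.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("dashcam.mp4")  # placeholder path
ok, old_frame = cap.read()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)

# Restrict feature selection to the lower half of the frame (the road),
# excluding background and skyline points
mask = np.zeros_like(old_gray)
mask[old_gray.shape[0] // 2:, :] = 255

# Harris scoring with a raised qualityLevel thresholds out weak corners
p0 = cv2.goodFeaturesToTrack(old_gray, maxCorners=200, qualityLevel=0.3,
                             minDistance=7, mask=mask, blockSize=7,
                             useHarrisDetector=True, k=0.04)

while True:
    ok, frame = cap.read()
    if not ok or p0 is None or len(p0) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade sparse optical flow from old_gray to gray
    p1, status, err = cv2.calcOpticalFlowPyrLK(
        old_gray, gray, p0, None, winSize=(15, 15), maxLevel=2,
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))
    good_new = p1[status.flatten() == 1]
    for x, y in good_new.reshape(-1, 2):
        cv2.circle(frame, (int(x), int(y)), 4, (0, 0, 255), -1)
    cv2.imshow("KLT tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
    old_gray, p0 = gray, good_new.reshape(-1, 1, 2)
cap.release()
cv2.destroyAllWindows()
```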
Experimental Setup:
All data were collected from a dash cam during a drive through downtown Blacksburg that covered a wide variety of environments and objects: from crowded garages to open highways, and from mostly cars to mostly pedestrian activity. Each model processed sections of the same video set with its own Python script, using OpenCV to loop over the video one frame at a time, following the common skeleton sketched below.
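For reference, a minimal sketch of that shared frame loop; the run/process names are illustrative scaffolding rather than our exact scripts.

```python
import cv2

def run(video_path, process):
    """Apply a per-frame detector function to every frame of a video."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video
            break
        annotated = process(frame)  # model-specific detection and drawing
        cv2.imshow("output", annotated)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()

# e.g. reuse the lane-finding sketch above: run("dashcam.mp4", find_lanes)
```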
Qualitative Results:
The following list summarizes the results we have achieved:
People detection using HOG descriptor:
We found that the HOG descriptor was extremely slow and produced a lot of false positives. Here are some of the cases where the HOG descriptor was successful:
In most cases, however, HOG produced false positives:
The following GIF captures the entire animation:
We used this method to identify humans in some random images and found the following results.
People detection using YOLO pre-trained model:
This model works extremely well at identifying all the objects it was trained to identify. We applied the YOLO pre-trained model to the same images above and found the following results:
[Figure: YOLO detections and per-image F-1 scores for pedestrian test images 1-4]
The above data are the F-1 scores of the YOLO object detector applied to the pedestrian input images above. Individual and overall F-1 performance scores were calculated with the formula F-1 = TP / (TP + 0.5 * (FP + FN)), where true positives (TP), false positives (FP), and false negatives (FN) were counted manually from the output images. The overall performance scored 0.889. The YOLO model is clearly far more successful at people detection than the HOG descriptor method. However, the model was trained to identify more than just people, so it flags every object class it knows, which made it difficult for us to separate pedestrians from the other detections. For example, in the following animation it detects all the objects it was trained for. A worked example of the F-1 computation is sketched below.
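To make the metric concrete, a small worked example; the TP/FP/FN counts here are hypothetical numbers chosen to reproduce the overall score, not our actual tallies.

```python
def f1_score(tp, fp, fn):
    """F-1 = TP / (TP + 0.5 * (FP + FN))."""
    return tp / (tp + 0.5 * (fp + fn))

# Hypothetical counts: 16 true positives, 2 false positives, and 2 false
# negatives give 16 / (16 + 0.5 * 4) = 0.889, matching the overall score.
print(round(f1_score(16, 2, 2), 3))  # 0.889
```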
Car detection using Haar cascade:
This method successfully identified some of the cars when they were very close to the camera; at other times, however, it produced false positives.
We also ran this method on some random pictures containing cars, and it performed very poorly, as shown in the pictures below.
Most of the objects detected in the above images were not even cars, so we can discard this method; it does not perform better than a random classifier.
Car detection using YOLO pre-trained model:
We applied the YOLO model to the same set of videos and pictures and obtained good results, as seen below.
Video output:
Photo output:
[Figure: YOLO detections and per-image F-1 scores for car test images 1-5]
The above data are the F-1 scores of the YOLO object detector for detecting cars and pedestrians in the car input images above. The overall performance scored an acceptable 0.773. The technique is quite robust, but compared with a human observer it struggles with lines of parked cars stretching into the distance. Even so, it performs more than well enough for our driving application.
Lane detection:
The following image briefly describes our approach for finding lanes:
The following video shows the application of the method described above. We can easily see that our algorithm successfully finds the lane.
Object Tracking using KLT approach:
The feature detector (using Harris corner scoring) selects the points with the strongest eigenvalue responses in the initial video frame, while the pyramidal Lucas-Kanade sparse optical flow algorithm tracks the motion of these points as long as they are not occluded. The following sample video shows the successful performance of our implementation: in the initial frame, only the right side of the car is visible and becomes associated with a number of tracked corners; these are tracked continuously as the car passes the driver and disappears into the distance, even as the surface on which the corners lie turns away from the viewer.
Conclusion:
Perceiving information from the environment with a high degree of reliability is crucial to autonomous driving. In our exploration of these techniques, we found that only YOLO, the Hough transform, and KLT performed consistently enough, that is, over considerable periods of unbroken functionality with only a few edge cases where errors arose. YOLO's F-1 scores were very good, and its missed classifications mostly fell on objects with low relevance to the decisions an autonomous vehicle would have to make. Haar cascades and HOG descriptors, on the other hand, are consistently noisy and therefore unsuitable for our application, even on short stretches of dash cam data.
References:
[1] M. Kachouane, S. Sahki, M. Lakrouf and N. Ouadah, "HOG based fast human detection," 2012 24th International Conference on Microelectronics (ICM), 2012, pp. 1-4, doi: 10.1109/ICM.2012.6471380.
[2] A. Arunmozhi, S. Gotadki, J. Park and U. Gosavi, "Stop Sign and Stop Line Detection and Distance Calculation for Autonomous Vehicle Control," 2018 IEEE International Conference on Electro/Information Technology (EIT), 2018, pp. 0356-0361, doi: 10.1109/EIT.2018.8500268.
[3] X. Yan, Z. Wang and J. Yang, "Lane Line Detection based on Machine Vision," 2022 IEEE 4th International Conference on Power, Intelligent Computing and Systems (ICPICS), 2022, pp. 94-97, doi: 10.1109/ICPICS55264.2022.9873783.
[4] X. Cao, J. Lan, P. Yan and X. Li, "KLT Feature Based Vehicle Detection and Tracking in Airborne Videos," 2011 Sixth International Conference on Image and Graphics, 2011, pp. 673-678, doi: 10.1109/ICIG.2011.92.
[5] Feature Detection Image Processing (no date) OpenCV. Available at: https://docs.opencv.org/4.x/dd/d1a/group__imgproc__feature.html#ga1d6bb77486c8f92d79c8793ad995d541.
[6] Mankumar, A. (2020) Python OpenCV: Optical Flow with Lucas-Kanade method, GeeksforGeeks. Available at: https://www.geeksforgeeks.org/python-opencv-optical-flow-with-lucas-kanade-method/.
[7] Object Tracking Video Analysis (no date) OpenCV. Available at: https://docs.opencv.org/4.x/dc/d6b/group__video__track.html#ga473e4b886d0bcc6b65831eb88ed93323.
[8] Carlo Tomasi and Takeo Kanade. Detection and Tracking of Point Features. Carnegie Mellon University Technical Report CMU-CS-91-132, April 1991.
[9] S. Canu, “Object tracking from scratch - OpenCV and python,” Pysource, Oct. 05, 2021. https://pysource.com/2021/10/05/object-tracking-from-scratch-opencv-and-python/ (accessed Nov. 29, 2022).
[10] P. Joshi, “Hands-On Tutorial on Real Time Lane Detection using OpenCV,” Analytics Vidhya, May 12, 2020. https://www.analyticsvidhya.com/blog/2020/05/tutorial-real-time-lane-detection-opencv/
[11] “OpenCV: HOGDescriptor Struct Reference,” docs.opencv.org. https://docs.opencv.org/3.4/d5/d33/structcv_1_1HOGDescriptor.html
[12] A. Sobral, “andrewssobral/vehicle_detection_haarcascades,” GitHub, Sep. 22, 2020. https://github.com/andrewssobral/vehicle_detection_haarcascades
[13] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001, pp. I-I, doi: 10.1109/CVPR.2001.990517.