Automotive Perception

Team-members: Ritish Shailly and Aaron Giuffré

Fall 2022 ECE 5554 Computer Vision: Course Project

Virginia Tech

Abstract: This work applies computer vision models to dash-cam video of downtown Blacksburg in order to recognize objects and features useful to autonomous driving. Beyond the application itself, the goal of this work is to test the efficacy of various models. Video data are processed frame by frame to highlight relevant objects such as pedestrians, cars, and lanes. Pedestrian recognition is done with two different methods: the Histogram of Oriented Gradients (HOG) descriptor and a pre-trained YOLO model [1]. The trained model used for HOG was obtained from the OpenCV library. Car detection was done with a pre-trained Haar cascade model and a pre-trained YOLO model. Car tracking is done with the Kanade-Lucas-Tomasi (KLT) feature tracker and a sparse optical flow algorithm [5][6][7]. Lane detection is done with a combination of image preprocessing, image masking, Canny edge detection, and the Hough transform [10]. For each model, we present a qualitative demonstration of the video processing. The pre-trained YOLO model performs better for car and pedestrian detection, the Hough transform approach detects the lane in real time, and the KLT approach works well for tracking an object through a video.

Teaser Figure

Introduction:

Autonomous driving involves complex decision-making based on information perceived by different sensors such as cameras, LiDAR, radar, and ultrasonic sensors. These sensors perceive the environment around the car, and the decision-making algorithm acts on what they report. In our project, we use cameras to detect the objects and people around the car that either convey information or play an essential role in the decision-making process. Specifically, we detect cars, pedestrians, and lanes, and we track objects on the road using dash-camera input. The video recorded by the dash camera is processed frame by frame to find the aforementioned objects.

Approach:

For some of the applications, we implemented more than one method and qualitatively compared them to pick the best one for that application. Here are the applications we worked on:

- Pedestrian detection, using the HOG descriptor and a pre-trained YOLO model
- Car detection, using a pre-trained Haar cascade model and a pre-trained YOLO model
- Car tracking, using the KLT feature tracker with sparse optical flow
- Lane detection, using image preprocessing, masking, Canny edge detection, and the Hough transform


Experimental Setup:

All data were collected from a dash cam on a drive through downtown Blacksburg that covered a wide variety of environments and objects, from crowded garages to open highways, and from mostly cars to mostly pedestrian activity. Each model processed sections of the same video with its own Python script, which used OpenCV to read and process the video one frame at a time.
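
As a point of reference, every script followed the same frame-by-frame pattern sketched below; the input file name and the per-frame function are placeholders, not our exact code.

    import cv2

    def process_frame(frame):
        # Placeholder: each model (HOG, YOLO, Haar cascade, Hough, KLT)
        # supplies its own per-frame detection and drawing logic here.
        return frame

    cap = cv2.VideoCapture("dashcam.mp4")  # hypothetical input file
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:  # end of video or read error
            break
        cv2.imshow("output", process_frame(frame))
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to stop early
            break
    cap.release()
    cv2.destroyAllWindows()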


Qualitative Results: 

The following summarizes the results we achieved:

We found that the HOG descriptor was extremely slow and produced many false positives. After the short sketch below, we show some of the cases where the HOG descriptor was successful:
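
The sketch is a minimal version of the OpenCV people-detection pipeline described here, using the library's built-in HOG person detector; the window-stride and scale parameters are illustrative assumptions rather than our tuned settings.

    import cv2

    # OpenCV ships a linear SVM trained on HOG features of pedestrians.
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    def detect_people(frame):
        # Multi-scale sliding-window detection; smaller strides are more
        # accurate but much slower, which is the drawback we observed.
        boxes, weights = hog.detectMultiScale(
            frame, winStride=(8, 8), padding=(8, 8), scale=1.05)
        for (x, y, w, h) in boxes:
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        return frame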

In most cases, however, HOG produced false positives:

The following GIF captures the entire animation:

We also used this method to identify people in a few arbitrary still images and found the following results:

The pre-trained YOLO model, by contrast, works extremely well at identifying all the objects it was trained to identify. We used the same images as above to identify people with the YOLO model; after the sketch below, we show the results:
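
The sketch shows one common way to run a pre-trained YOLO detector through OpenCV's DNN module; the YOLOv3 config/weights file names and the confidence threshold are assumptions for illustration, not necessarily the exact files we used.

    import cv2
    import numpy as np

    # Assumed Darknet YOLOv3 files trained on COCO (class 0 = "person").
    net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
    out_names = net.getUnconnectedOutLayersNames()

    def detect(frame, conf_threshold=0.5):
        h, w = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                     swapRB=True, crop=False)
        net.setInput(blob)
        boxes, scores, class_ids = [], [], []
        for output in net.forward(out_names):
            for det in output:  # det = [cx, cy, bw, bh, objectness, class scores...]
                class_id = int(np.argmax(det[5:]))
                conf = float(det[5 + class_id])
                if conf < conf_threshold:
                    continue
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                scores.append(conf)
                class_ids.append(class_id)  # keep only class_id == 0 to report people
        keep = cv2.dnn.NMSBoxes(boxes, scores, conf_threshold, 0.4)
        return [(boxes[i], class_ids[i]) for i in np.array(keep).flatten()]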

[Pedestrian test images 1-4 with YOLO detections and their individual F-1 scores]

The above data are the F-1 scores of the YOLO object detector applied to the pedestrian input images. Individual and overall F-1 performance scores were calculated with the formula F-1 = TP / (TP + 0.5 * (FP + FN)), where the true positives (TP), false positives (FP), and false negatives (FN) were counted manually from the output images. The overall performance scored 0.889. It is clear that the YOLO model is far more successful at people detection than the HOG descriptor method. However, the model was trained to identify more than just people, so it reports every object class it was trained on, and it was difficult for us to separate the detections by class. For example, in the following animation it detects all the objects it was trained for.

The Haar cascade method successfully identified some of the cars when they were very close to the camera. At other times, however, it identified false positives.

We also ran this method on a few arbitrary pictures containing cars, and it performed very poorly, as shown in the pictures below.

Most of the objects detected in the above images were not even cars, so we can discard this method: it does not even perform better than a random classifier.
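
For completeness, the sketch below shows how a Haar cascade car detector is typically run in OpenCV; the cascade file name assumes the pre-trained cars.xml from [12].

    import cv2

    # Pre-trained vehicle cascade (cars.xml from reference [12]).
    car_cascade = cv2.CascadeClassifier("cars.xml")

    def detect_cars(frame):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # scaleFactor and minNeighbors are illustrative; tuning them trades
        # missed detections against the false positives described above.
        cars = car_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
        for (x, y, w, h) in cars:
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
        return frame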

We applied the YOLO model to the same set of videos and pictures and got quite good results, as seen below.

Video output:

Photo output:

[Car test images 1-5 with YOLO detections and their individual F-1 scores]

The above data are the F-1 scores of the YOLO object detector for detecting cars and pedestrians in the above car images. The overall performance scored an acceptable 0.773. The technique is actually very robust, but we observed that, compared with a human observer, it has trouble with lines of parked cars stretching into the distance. Even so, it performs far better than needed for driving applications.
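
As a small worked example of the F-1 formula given in the pedestrian section, the helper below computes the score from manually tallied counts; the counts shown are hypothetical, chosen only to illustrate how a value near 0.889 arises.

    def f1_score(tp, fp, fn):
        # F-1 = TP / (TP + 0.5 * (FP + FN))
        return tp / (tp + 0.5 * (fp + fn))

    # Hypothetical tallies for illustration: 8 true positives,
    # 1 false positive, 1 false negative -> 8 / 9 ≈ 0.889.
    print(round(f1_score(8, 1, 1), 3))  # 0.889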

The following image briefly describes our approach for finding lanes:


The following video shows the application of the method described above. It can be seen that our algorithm successfully finds the lane.
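
A minimal sketch of this per-frame pipeline (grayscale, Gaussian blur, Canny edge detection, region-of-interest mask, probabilistic Hough transform) follows; the ROI polygon and the thresholds are illustrative assumptions, not our exact values.

    import cv2
    import numpy as np

    def find_lane_lines(frame):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        blurred = cv2.GaussianBlur(gray, (5, 5), 0)
        edges = cv2.Canny(blurred, 50, 150)  # illustrative thresholds

        # Keep only a trapezoidal region in front of the car
        # (an assumed ROI, not our exact polygon).
        h, w = edges.shape
        roi = np.array([[(0, h), (w // 2 - 50, h // 2 + 50),
                         (w // 2 + 50, h // 2 + 50), (w, h)]], dtype=np.int32)
        mask = np.zeros_like(edges)
        cv2.fillPoly(mask, roi, 255)
        masked = cv2.bitwise_and(edges, mask)

        # Probabilistic Hough transform returns candidate line segments.
        lines = cv2.HoughLinesP(masked, 1, np.pi / 180, threshold=50,
                                minLineLength=40, maxLineGap=100)
        if lines is not None:
            for x1, y1, x2, y2 in lines[:, 0]:
                cv2.line(frame, (x1, y1), (x2, y2), (0, 255, 0), 3)
        return frame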

The feature detector of Tomasi and Kanade (building on Harris corner detection) selects the points with the strongest eigenvalues in the initial video frame, while the Lucas-Kanade sparse optical flow algorithm tracks the motion of those points as long as they are not occluded [5][6][7][8]. The following sample video shows the successful performance of our implementation. In the initial frame, only the right side of the car is visible; it becomes associated with a number of tracked corners, and those corners are tracked continuously as the car passes the driver and disappears into the distance, even as the surface on which they lie turns away from the viewer.
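
The sketch below shows this KLT pipeline in OpenCV, with cv2.goodFeaturesToTrack performing the eigenvalue-based corner selection and cv2.calcOpticalFlowPyrLK performing pyramidal Lucas-Kanade tracking; the file name and parameter values are illustrative.

    import cv2

    cap = cv2.VideoCapture("dashcam.mp4")  # hypothetical input file
    ok, old_frame = cap.read()
    old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)

    # Select the corners with the strongest minimum eigenvalues (Shi-Tomasi).
    p0 = cv2.goodFeaturesToTrack(old_gray, maxCorners=100,
                                 qualityLevel=0.3, minDistance=7)

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Pyramidal Lucas-Kanade: estimate where each corner moved.
        p1, status, err = cv2.calcOpticalFlowPyrLK(old_gray, gray, p0, None,
                                                   winSize=(15, 15), maxLevel=2)
        good_new = p1[status.flatten() == 1]
        for x, y in good_new.reshape(-1, 2):
            cv2.circle(frame, (int(x), int(y)), 4, (0, 255, 0), -1)
        cv2.imshow("KLT tracking", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
        old_gray, p0 = gray, good_new.reshape(-1, 1, 2)
    cap.release()
    cv2.destroyAllWindows()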

Conclusion:

Perceiving information from the environment with a high degree of reliability is crucial in autonomous driving. In our exploration of these techniques, we found that only YOLO, the Hough transform, and KLT performed consistently enough, i.e., for considerable periods of unbroken functionality with only a few edge cases where errors arose. YOLO's F-1 scores were very good, and its missed classifications were mostly on objects with low relevancy to the decisions an autonomous vehicle would have to make. Haar cascades and HOG descriptors, on the other hand, are consistently noisy and therefore unsuitable for our application, even over short periods of dash-cam data.

References:

[1] M. Kachouane, S. Sahki, M. Lakrouf and N. Ouadah, "HOG based fast human detection," 2012 24th International Conference on Microelectronics (ICM), 2012, pp. 1-4, doi: 10.1109/ICM.2012.6471380.

[2] A. Arunmozhi, S. Gotadki, J. Park and U. Gosavi, "Stop Sign and Stop Line Detection and Distance Calculation for Autonomous Vehicle Control," 2018 IEEE International Conference on Electro/Information Technology (EIT), 2018, pp. 0356-0361, doi: 10.1109/EIT.2018.8500268.

[3] X. Yan, Z. Wang and J. Yang, "Lane Line Detection based on Machine Vision," 2022 IEEE 4th International Conference on Power, Intelligent Computing and Systems (ICPICS), 2022, pp. 94-97, doi: 10.1109/ICPICS55264.2022.9873783.

[4] X. Cao, J. Lan, P. Yan and X. Li, "KLT Feature Based Vehicle Detection and Tracking in Airborne Videos," 2011 Sixth International Conference on Image and Graphics, 2011, pp. 673-678, doi: 10.1109/ICIG.2011.92.

[5] "OpenCV: Feature Detection," docs.opencv.org. https://docs.opencv.org/4.x/dd/d1a/group__imgproc__feature.html#ga1d6bb77486c8f92d79c8793ad995d541

[6] A. Mankumar, "Python OpenCV: Optical Flow with Lucas-Kanade Method," GeeksforGeeks, 2020. https://www.geeksforgeeks.org/python-opencv-optical-flow-with-lucas-kanade-method/

[7] "OpenCV: Object Tracking," docs.opencv.org. https://docs.opencv.org/4.x/dc/d6b/group__video__track.html#ga473e4b886d0bcc6b65831eb88ed93323

[8] C. Tomasi and T. Kanade, "Detection and Tracking of Point Features," Carnegie Mellon University Technical Report CMU-CS-91-132, Apr. 1991.

[9] S. Canu, “Object tracking from scratch - OpenCV and python,” Pysource, Oct. 05, 2021. https://pysource.com/2021/10/05/object-tracking-from-scratch-opencv-and-python/ (accessed Nov. 29, 2022).

[10] P. Joshi, “Hands-On Tutorial on Real Time Lane Detection using OpenCV,” Analytics Vidhya, May 12, 2020. https://www.analyticsvidhya.com/blog/2020/05/tutorial-real-time-lane-detection-opencv/

[11] “OpenCV: HOGDescriptor Struct Reference,” docs.opencv.org. https://docs.opencv.org/3.4/d5/d33/structcv_1_1HOGDescriptor.html

[12] A. Sobral, "vehicle_detection_haarcascades," GitHub, Sep. 22, 2020. https://github.com/andrewssobral/vehicle_detection_haarcascades

[13] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001, pp. I-I, doi: 10.1109/CVPR.2001.990517.