Yolo v5 Obstacle Detection

This work proposes a people identification and tracking method based on Yolact, Deepsort and color feature extraction for an agricultural robot using a single depth camera. First, the Yolact identification method is used to detect all the people in the frame; their bounding boxes are then processed to extract the main color of their clothes. Thus, the target person can be identified and the remaining ones discarded. Finally, the Deepsort algorithm is used to track the target person around the image frame. Experimental results show that in an indoor environment, with controlled light intensity and no occlusion of the target, it is possible to track a person accurately. In future work, more features will be added to the algorithm so that the method performs better in complex environments.

Introduction

Robots are playing an important role in changing how farms operate, decreasing costs and making up for the manpower shortage. From simple robots that only classify apples to complex autonomous mobile systems that navigate and harvest fruit by themselves, most of them use computer vision and deep learning for classification and navigation. However, real-time object detection and classification is difficult and computationally expensive.

Our proposed method for people identification and tracking combines Yolact, Deepsort and color feature extraction.

The main application of our proposed method is agricultural mobile robots that must follow a farmer.

Method

From now on, the target person will be referred to as the master. The objective is to identify the master first when he is alone, and then when he is surrounded by other people or objects.

Our proposed method is divided into four stages, as shown in the picture. The first stage is image acquisition using a depth camera. In the second stage, the image is analyzed using the Yolact neural network. In the third stage, color feature extraction is performed in the HSV color space. Lastly, in the fourth stage, the master is identified and tracked using Deepsort.

An Intel RealSense D435i depth camera was used to capture the images because, besides extracting 2D features, we expect to obtain depth features, such as the height of each person, to help identify the master.

This camera is equipped with an IMU, which helps refine depth awareness when the camera moves. Furthermore, it allows better point-cloud alignment for SLAM and tracking applications.

Even though the Yolo algorithm and its variants are more commonly used, because they are fast, simple to construct and can be trained directly on full images, they do not produce per-object segmentation masks. Thus, Yolact was used to detect all the people in a frame together with their segmentation masks. Like Yolo, Yolact is a one-stage detector, so it also generates boxes in addition to masks and can process video sequences in real time.

For the second stage, since Yolact was trained on a general dataset, a custom image dataset was created from pictures taken in an agricultural field, and the Yolact network was then trained on it to improve its accuracy in people detection.

In the third stage, only the region inside each person's segmentation mask is analyzed; the main colors of the clothes are then extracted and saved for future tracking.

The HSV color space can be used to extract important features for image segmentation and color histogram generation. Thus, to extract the colors of the clothes, an HSV (Hue, Saturation, Value) color space analysis is performed. The identified colors are then marked with bounding boxes.
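The color extraction step can be illustrated with a minimal sketch. It is not the implementation used in this work: the hue ranges, the thresholds and the names `HUE_RANGES`, `classify_pixel` and `dominant_color` are assumptions chosen for illustration, and the image is a plain nested list of RGB tuples rather than a camera frame. Each masked pixel is converted to HSV, mapped to a coarse color name, and the most frequent name is returned.

```python
import colorsys
from collections import Counter

# Hypothetical hue ranges in degrees; illustrative values, not tuned ones.
HUE_RANGES = {
    "red": [(0, 15), (345, 360)],
    "yellow": [(45, 75)],
    "green": [(75, 165)],
    "blue": [(195, 265)],
}


def classify_pixel(rgb, min_saturation=0.25, min_value=0.2):
    """Map one RGB pixel (0-255 per channel) to a coarse color name."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    if v < min_value:          # too dark to have a reliable hue
        return "black"
    if s < min_saturation:     # too washed out to have a reliable hue
        return "white" if v > 0.8 else "gray"
    hue_deg = h * 360.0
    for name, ranges in HUE_RANGES.items():
        if any(lo <= hue_deg < hi for lo, hi in ranges):
            return name
    return "other"


def dominant_color(image, mask):
    """Return the most frequent color name among pixels where mask is truthy."""
    counts = Counter(
        classify_pixel(pixel)
        for image_row, mask_row in zip(image, mask)
        for pixel, keep in zip(image_row, mask_row)
        if keep
    )
    return counts.most_common(1)[0][0] if counts else None


# Tiny example: three bluish pixels inside the mask, red pixels outside it.
image = [
    [(0, 0, 255), (10, 20, 200), (255, 0, 0)],
    [(0, 0, 250), (255, 0, 0), (255, 0, 0)],
]
mask = [
    [1, 1, 0],
    [1, 0, 0],
]
print(dominant_color(image, mask))
```

Running the same function separately on the upper and lower parts of a person's mask would give one color for the T-shirt and one for the pants, which is how the tests in this work describe the master.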

Although this stage mostly involves 2D processing, we also estimate the height of each person using depth information and the segmentation mask.
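One possible way to obtain such a height estimate is sketched below; the source does not describe the exact procedure, so this is an assumption. The sketch uses the pinhole camera model, in which an object spanning `p` pixels vertically at depth `Z` has a height of roughly `p * Z / fy`, where `fy` is the vertical focal length in pixels. The function name `estimate_height_m` and the value of `fy` are hypothetical, and the person is assumed to be upright and fully visible.

```python
from statistics import median


def estimate_height_m(mask, depth_m, fy):
    """Estimate a person's height in meters from a mask and a depth map.

    mask    : 2D list of 0/1 values, 1 where the person was segmented
    depth_m : 2D list of depths in meters (0 means no valid reading)
    fy      : vertical focal length in pixels (from the camera intrinsics)
    """
    rows = [i for i, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    depths = [
        d
        for mask_row, depth_row in zip(mask, depth_m)
        for keep, d in zip(mask_row, depth_row)
        if keep and d > 0
    ]
    if not depths:
        return None
    pixel_height = rows[-1] - rows[0] + 1
    # The median depth is less sensitive to outliers at the mask border.
    return pixel_height * median(depths) / fy


# Example: a mask 510 rows tall at 2 m, with a hypothetical fy of 600 px.
mask = [[0, 1, 0] for _ in range(510)]
depth = [[0.0, 2.0, 0.0] for _ in range(510)]
print(estimate_height_m(mask, depth, fy=600.0))
```

In this example the estimate is 510 × 2.0 / 600 = 1.7 m. The approximation degrades when the person is partly occluded or when the camera is strongly tilted.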

At the end of this stage, after processing all the features and information, the master is identified and his features are fed to the next stage. Meanwhile, the features of the other people in the frame are deleted.
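The selection step can be sketched as follows, assuming each detected person has already been reduced to a T-shirt color and a pants color. The `Person` class and the `select_master` function are hypothetical names introduced for illustration. The sketch returns no master when zero or several people match, which is a design choice of this example rather than a behavior reported in the source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    track_id: int
    shirt_color: str
    pants_color: str


def select_master(
    people: List[Person], target_shirt: str, target_pants: str
) -> Optional[Person]:
    """Return the single person matching the target colors, else None."""
    matches = [
        p
        for p in people
        if p.shirt_color == target_shirt and p.pants_color == target_pants
    ]
    # Zero or several matches are ambiguous, so no master is reported.
    return matches[0] if len(matches) == 1 else None


people = [
    Person(track_id=1, shirt_color="red", pants_color="black"),
    Person(track_id=2, shirt_color="blue", pants_color="blue"),
]
master = select_master(people, target_shirt="blue", target_pants="blue")
print(master)
```

Only the returned person would be passed to the tracker; the other entries can then be dropped.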

After the master is identified and all the other people's IDs are eliminated, the Deepsort algorithm is used to track the master.

The Deepsort algorithm was chosen because it can track through long periods of occlusion with high accuracy, yet remains simple to implement and runs in real time. To keep track of the objects in a video sequence, Deepsort uses Kalman filters to track detections and the Hungarian algorithm to solve the association problem.

Results

Three tests were performed to check our proposed method:

In the first test, only the master is present; the color of his T-shirt is set to blue and the color of his pants is set to blue. As shown in the first video, our proposed method can identify the master and track him at around 30 FPS.

In the second test, the master and a random person are present; the color of the master's T-shirt is set to blue and the color of his pants is set to blue. As shown in the second video, our proposed method can identify the master and track him at around 28 FPS.

In the third test, the master and the random person were switched; thus, the color of the master's T-shirt is set to red and the color of his pants is set to black. As shown in the last video, our proposed method can identify the master and track him at around 28 FPS.

Test 1

Master: Blue clothes

Test 2

Master: Blue clothes

Test 3

Master: Red clothes

Conclusions

The proposed method was able to identify and track a master person. First, the Yolact algorithm is used to identify all the people in the frame; then the master is identified among them by extracting HSV features; finally, the master is tracked using the Deepsort algorithm. The results from our tests show that the algorithm is consistent and robust in controlled environments. Also, the proposed method makes efficient use of the available computational resources, as the frame rate remains between 28 and 30 FPS in all the experiments.

Despite the high success rate of the proposed method, it has some limitations. For example, person identification depends entirely on the Yolact algorithm, so if the weights are not correctly trained, performance suffers. In addition, master identification is based only on color feature extraction, which becomes unreliable when several people wear clothes of similar color.