In RoboCup scenarios and on edge devices, it is often hard to achieve a good trade-off between accuracy and execution time. Existing approaches often rely on manually adapting the neural network by changing layers and activation functions; this becomes harder when pruning affects a large number of network parameters.
To tackle this, we propose a pipeline that shrinks a PyTorch neural network in several steps, relying on an automated approach to structural pruning: train a neural network derived from tiny YOLOv7, prune it, perform a fine-tuning phase, then prune it again, iterating until the best trade-off between accuracy and inference time is achieved.
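The iterative loop above can be sketched as follows. This is a minimal illustration, not the actual pipeline: `train`, `prune_step`, and `evaluate` are placeholder functions standing in for PyTorch fine-tuning, structural pruning (e.g. DepGraph-based), and mAP evaluation; the numeric values are invented for the toy example.

```python
def train(model, epochs):
    # Placeholder fine-tuning step: recovers some accuracy after pruning.
    model["accuracy"] = min(1.0, model["accuracy"] + 0.05)
    return model

def prune_step(model, ratio):
    # Placeholder structural pruning: removes a fraction of the parameters,
    # which usually costs some accuracy before fine-tuning.
    model["params"] = int(model["params"] * (1 - ratio))
    model["accuracy"] -= 0.08
    return model

def evaluate(model):
    # Placeholder evaluation returning the accuracy (stand-in for mAP).
    return model["accuracy"]

def prune_until_tradeoff(model, min_accuracy, ratio=0.2, max_iters=10):
    """Prune, fine-tune, and repeat until accuracy would drop too far."""
    best = dict(model)
    for _ in range(max_iters):
        candidate = prune_step(dict(best), ratio)
        candidate = train(candidate, epochs=5)  # fine-tune to recover accuracy
        if evaluate(candidate) < min_accuracy:  # stop at the best trade-off
            break
        best = candidate
    return best

model = {"params": 6_000_000, "accuracy": 0.75}
slim = prune_until_tradeoff(model, min_accuracy=0.65)
print(slim)  # smaller model that still meets the accuracy floor
```

The key design point is that each pruning step is followed by fine-tuning before the accuracy check, so the loop measures the recoverable accuracy of the smaller network rather than the immediate post-pruning drop.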
Proposed Models
The architecture of the neural network is based on tiny YOLOv7, with a reduced number of layers and parameters. The software automatically downloads the dataset, prepares it, and trains the network. Then, it iteratively prunes the trained neural network and computes the accuracy. We propose three neural networks obtained by applying the described algorithm. Their performance and timing are shown in the table below, with the proposed networks in bold. YTv7 stands for "YOLOv7 tiny - derived", and the number after the underscore is the total number of parameters of the neural network.
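The naming convention can be made concrete with a small helper. This is a hypothetical illustration: `model_name` and the toy tensor shapes are not from the actual software; with a real PyTorch model the count would be `sum(p.numel() for p in model.parameters())`.

```python
def model_name(param_counts):
    # Each entry in param_counts stands in for one weight tensor's numel();
    # the YTv7_<N> suffix is the total number of parameters.
    total = sum(param_counts)
    return f"YTv7_{total}"

# Toy model: a 3x3 conv (16 -> 32 channels), its bias, and a linear layer.
name = model_name([3 * 3 * 16 * 32, 32, 32 * 10])
print(name)  # -> YTv7_4960
```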
Examples of detections on real images
The images are taken from the RoboEireann dataset, and the detections are shown with bounding boxes of different colors, one color per class. The confidence score is displayed next to each box.
Results
In the plot below, we show the relationship among latency, mAP, and input size. Each color corresponds to a specific model. For each model, we plot four points, one for each input size, annotated in the plot. Latencies were measured on the NAO V6 CPU.
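A per-input-size latency measurement can be sketched as below. This is a hedged stand-in, not the on-robot benchmark: `infer` is a dummy workload replacing the forward pass of the pruned network, and the warm-up/averaging scheme is a common convention, not necessarily the one used for the NAO measurements.

```python
import time

def infer(size):
    # Dummy workload whose cost grows with the input area; a real benchmark
    # would run the pruned network's forward pass here instead.
    s = 0
    for i in range(size * size // 100):
        s += i
    return s

def mean_latency_ms(size, warmup=3, runs=10):
    """Average wall-clock latency in milliseconds, excluding warm-up runs."""
    for _ in range(warmup):
        infer(size)  # warm-up: fills caches, excluded from timing
    t0 = time.perf_counter()
    for _ in range(runs):
        infer(size)
    return (time.perf_counter() - t0) / runs * 1000.0

for size in (160, 224, 320, 416):  # hypothetical input sizes
    print(f"{size}x{size}: {mean_latency_ms(size):.2f} ms")
```

Averaging over several runs after a warm-up phase reduces the noise from caching and CPU frequency scaling, which matters on a constrained CPU like the NAO's.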
Resources
Repository:
1. Albani, D., Youssef, A., Suriani, V., Nardi, D., Bloisi, D.D.: A deep learning approach for object recognition with NAO soccer robots. Lecture Notes in Computer Science (2017)
2. Bloisi, D., Duchetto, F.D., Manoni, T., Suriani, V.: Machine learning for realistic ball detection in RoboCup SPL (2017)
3. Fang, G., Ma, X., Song, M., Mi, M.B., Wang, X.: DepGraph: Towards any structural pruning (2023)
4. Leiva, F., Cruz, N., Bugueño, I., Ruiz-del Solar, J.: Playing soccer without colors in the spl: A convolutional neural network approach. In: Holz, D., Genter, K., Saad, M., von Stryk, O. (eds.) RoboCup 2018: Robot World Cup XXII. pp. 122–134. Springer International Publishing, Cham (2019)
5. Narayanaswami, S.K., Tec, M., Durugkar, I., Desai, S., Masetty, B., Narvekar, S., Stone, P.: Towards a real-time, low-resource, end-to-end object detection pipeline for robot soccer. In: RoboCup 2022: Robot World Cup XXV, pp. 62–74. Springer (2023)
6. Szemenyei, M., Estivill-Castro, V.: Robo: Robust, fully neural object detection for robot soccer. In: RoboCup 2019: Robot World Cup XXIII 23. pp. 309–322. Springer (2019)
7. Thielke, F., Hasselbring, A.: A JIT compiler for neural network inference. In: RoboCup 2019: Robot World Cup XXIII, pp. 448–456. Springer International Publishing (2019). https://doi.org/10.1007/978-3-030-35699-6_36
8. Yao, Z., Douglas, W., O'Keeffe, S., Villing, R.: Faster YOLO-LITE: Faster object detection on robot and edge devices. In: RoboCup 2021: Robot World Cup XXIV, pp. 226–237. Springer (2022)