Our results show that both YOLO and ResNet achieve moderate performance on cranberry growth stage classification, but several differences may make one model more attractive than the other for a given application.
ResNet is much easier to train on an original dataset, such as the one we used here, because it requires only image-level labels rather than manual annotations of each class instance within every image. The trade-off is that ResNet cannot perform object detection or image segmentation, which may or may not be acceptable depending on the specific needs of the end user. Additionally, both models fell short of peak performance due to the limited amount of training data: YOLO in particular is recommended to have 10,000 labeled instances per class, while we had only around 500.
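As a rough illustration of why no instance-level annotation is needed, a ResNet classifier can be fine-tuned directly from folders of image-level labels. The following is a minimal sketch assuming a PyTorch/torchvision setup; the directory layout, ResNet depth, and hyperparameters are placeholders for illustration, not our exact pipeline.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    # Image-level labels come from the folder structure alone, e.g.
    # data/train/<stage_name>/*.jpg -- no boxes or masks are required.
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
    train_set = datasets.ImageFolder("data/train", transform=transform)
    loader = DataLoader(train_set, batch_size=32, shuffle=True)

    # Start from pretrained weights and replace the final layer to match
    # the number of growth-stage classes found in the folder names.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for epoch in range(10):  # on the order of the 10 epochs used here
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()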
Furthermore, annotated images must contain labels for every visible object of every class, which is challenging for images of cranberry plants. Because multiple growth stages may be visible in a single photo, annotators must be able to identify the growth stage of every plant, which further limits who can annotate images effectively. Incomplete annotation likely contributed to decreased performance, and in particular to a high false negative rate, since every unlabeled instance teaches the model that such elements are part of the background.
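For context on this annotation burden: Darknet-style YOLO training expects one plain-text label file per image, with one line per visible instance in the form <class_id> <x_center> <y_center> <width> <height>, all normalized to [0, 1]. In the illustrative (hypothetical) file below, an annotator has labeled two plants of one stage and one of another in the same photo; any plant left without a line is effectively presented to the model as background:

    0 0.412 0.530 0.080 0.095
    0 0.700 0.210 0.065 0.090
    2 0.255 0.640 0.120 0.140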
The YOLO models were each trained for 1,000 epochs, which took around 3 hours per model on Google Colab; the ResNet model took only around 1 hour to train for 10 epochs on a personal computer. YOLO's image segmentation capability comes with increased training requirements to achieve results comparable to the ResNet model. The decision to use one model over the other must therefore account for the specific needs of the application, and whether those needs demand features of one model over the other. If a user has no need for image segmentation, the increased training cost (in addition to the time spent annotating training images) would be wasted, and we would not recommend it. On the other hand, finer-grained analysis of real-world images cannot be achieved with the ResNet model alone, because it only classifies whole images and provides no segmentation or other means of identifying multiple stages in one photo.
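To make this output difference concrete, the sketch below contrasts the shape of the two models' predictions on one photo, using dummy values in place of real model outputs; the Detection structure and the stage names are ours, for illustration only.

    from dataclasses import dataclass

    @dataclass
    class Detection:
        stage: str          # predicted growth stage for one plant
        box: tuple          # (x_center, y_center, width, height), normalized
        confidence: float

    # ResNet-style output: a single label for the whole photo.
    resnet_prediction = "bloom"

    # YOLO-style output: zero or more localized predictions per photo,
    # so mixed growth stages within one scene remain visible downstream.
    yolo_predictions = [
        Detection("bud",   (0.41, 0.53, 0.08, 0.10), 0.91),
        Detection("bloom", (0.70, 0.21, 0.07, 0.09), 0.84),
    ]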
Given further opportunity to study this problem, we would aim to increase the amount and quality of training data. Including not only more images, but also photos taken under different lighting, weather, and camera angles, would be expected to improve the accuracy and generalizability of our models. In addition, an audit and formalization of annotation practices, along with a review of existing annotated images, could reduce the number of false negatives produced by our YOLO model and yield a stronger dataset for training future models.
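Until such photos can be collected, standard data augmentation can approximate some of that variation at training time. This is a sketch using torchvision transforms; the parameter values are illustrative and untuned.

    from torchvision import transforms

    # Each transform loosely stands in for a real-world variation we could
    # not capture: crops and rotations for viewpoint, color jitter for
    # lighting and weather.
    augment = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(degrees=15),
        transforms.ColorJitter(brightness=0.4, contrast=0.3, saturation=0.3),
        transforms.ToTensor(),
    ])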
Due to the limited time available for this project, some growth stages were omitted from our training. Including those stages would provide a more complete system for identifying cranberry growth, increasing its utility for users. Finally, a new method to evaluate the variety of growth stages detected in a scene and interpret the predicted development stage of the crop as a whole could provide a powerful tool for growers to monitor the health of their livelihood.
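One simple form such a scene-level method could take is tallying per-detection predictions into a stage distribution and a modal "overall" stage for the photo. The sketch below assumes detections arrive as (stage, confidence) pairs; both the input format and the confidence threshold are hypothetical.

    from collections import Counter

    def summarize_scene(detections, min_confidence=0.5):
        """detections: iterable of (stage, confidence) pairs for one photo."""
        stages = [s for s, c in detections if c >= min_confidence]
        counts = Counter(stages)
        total = sum(counts.values())
        distribution = {s: n / total for s, n in counts.items()} if total else {}
        overall = counts.most_common(1)[0][0] if counts else None
        return overall, distribution

    # Dummy detections standing in for YOLO output on a single photo.
    print(summarize_scene([("bud", 0.91), ("bud", 0.77), ("bloom", 0.84)]))
    # -> ('bud', {'bud': 0.666..., 'bloom': 0.333...})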
The full image dataset is confidential; only a few sample images are shown on the website.