To train our model, we manually annotated 435 images of cranberry plants according to the growth stage of the plants visible in each image. On average, each image had 7.1 annotations, with some having as many as 40. We used this annotated dataset, divided into training, validation, and test sets, to train the YOLO model for 1000 epochs, which took about 3 hours in a Google Colab notebook.
Our initial dataset was composed of 9 classes, encompassing the length of the cranberry growth cycle: Dormancy, Bud Tight, Bud Break, Roughneck, Hook, Blossom, Fruit Set, Blush, and Mature Fruit. Due to the low number of images available to represent the Dormancy stage, our later experiments excluded that category from training. Additionally, because the Bud Tight and Bud Break classes are visually similar, we trained two models: one in which the Bud classes are kept separate, and one in which they are merged into a single Bud class.
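Merging the two Bud classes amounts to rewriting the class IDs in the annotation files. A minimal sketch of that remapping is shown below, assuming YOLO-style label lines (`class_id cx cy w h`); the specific class indices used here are hypothetical, not the project's actual mapping.

```python
# Sketch: merge two YOLO class indices into one.
# BUD_TIGHT/BUD_BREAK indices are assumed for illustration only.
BUD_TIGHT, BUD_BREAK = 1, 2   # hypothetical class indices
MERGED_BUD = 1                # both map to a single "Bud" class

def merge_bud_classes(label_text: str) -> str:
    """Rewrite label lines so Bud Tight and Bud Break share one class ID."""
    merged = []
    for line in label_text.strip().splitlines():
        cls, *coords = line.split()
        cls = int(cls)
        if cls in (BUD_TIGHT, BUD_BREAK):
            cls = MERGED_BUD
        elif cls > BUD_BREAK:
            cls -= 1          # shift later classes down so IDs stay contiguous
        merged.append(" ".join([str(cls), *coords]))
    return "\n".join(merged)
```

Keeping the remaining IDs contiguous matters because YOLO expects class indices to run from 0 to (number of classes - 1) without gaps.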
Example of image annotation
Confusion matrix of the 1st YOLO model - Separate Bud Classes
Using the first model, with separate Bud Tight and Bud Break stages, we saw mixed performance. For the fruit stages (Fruit Set, Blush, and Mature Fruit), the model performed well at identifying the plants within an image. As expected, there was some confusion between the Bud classes, as some examples may have been borderline cases that the model could not distinguish.
The Roughneck stage saw particularly poor performance, which could be attributable to the amount of training data available. Like the Dormancy stage, the Roughneck stage was under-represented in our training data due to a lack of available images to annotate. Combined with the stage's visual similarity to the (better-represented) Hook stage that follows it, this left our model less capable of discerning the Roughneck stage in our images.
Test results for Model 1: Separate Bud Classes
Confusion matrix of the 2nd YOLO model - Combined Bud Classes
The second model saw largely similar results. Consolidating the two Bud stages slightly reduced the overall number of missed predictions, but the problems from the first model remain. One stage in particular, Blossom, could benefit from a re-evaluation of the annotations on the training images. Image annotation for use with YOLO depends on accuracy and consistency: in every image, all objects of all classes should have properly sized labels, which means the annotator must be able to identify every class present in an image in order to label all of them. This is a challenge for the Blossom stage in particular (though it applies to all stages to some degree) because many images classified with that label also contain visible examples of Fruit Set. As a result, this annotation style may have caused valid examples to be treated as background/classless objects, negatively affecting the performance of our models.
Test results for Model 2: Combined Bud Classes
Additionally, while we have around 500 labeled class instances in our dataset, the recommended amount for optimal results is more than 10,000 instances per class. Given the limited time and number of input images, compounded by the time needed to manually annotate the densely packed cranberry images, we were unable to produce this level of training data. As a result, while the model works fairly well on our test set, it does not generalize as well to new images captured outside the conditions present in the training data. Future work on this example would include increasing not only the amount of training data, but also the variety of lighting, angles, sources, seasons, and weather represented in the dataset.
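Checking how far each class falls short of the per-class instance guideline is straightforward to automate. The sketch below tallies annotated instances per class across a directory of YOLO label files; the class-name ordering is an assumption for illustration.

```python
# Sketch: tally annotated instances per class from YOLO label files,
# to compare class balance against the ~10,000-instances-per-class guideline.
from collections import Counter
from pathlib import Path

# Assumed class-ID order; the project's data.yaml would be authoritative.
CLASS_NAMES = ["Bud Tight", "Bud Break", "Roughneck", "Hook",
               "Blossom", "Fruit Set", "Blush", "Mature Fruit"]

def count_instances(label_dir: str) -> Counter:
    """Count label lines per class across all *.txt files in label_dir."""
    counts = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            if line.strip():
                class_id = int(line.split()[0])
                counts[CLASS_NAMES[class_id]] += 1
    return counts
```

A quick pass like this before training makes under-represented stages such as Roughneck visible immediately, rather than after a 3-hour training run.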
Example of YOLO inference on a video
We decided to add a ResNet architecture to our image classification and identification methods due to its relative simplicity compared to YOLOv3. ResNet uses supervised learning to classify each image as a whole rather than detecting and identifying objects within it, which dramatically reduces the training time and computing power required. For our project, we hypothesized that, given the small size of cranberry features, ResNet would perform as well as or better than YOLO while requiring less training time.
In total, we had 2,054 images across 8 different stages of cranberry growth (Bud Tight, Bud Break, Roughneck, Hook, Blossom, Fruit Set, Blush, and Mature Fruit), with an unbalanced number of images per growth stage. Images were organized into folders by growth stage, then each folder was split randomly into 3 subsets: training (60%), validation (20%), and testing (20%). ResNet50 was trained on our dataset for 10 epochs with 20 layers each, which took 1 hour to run on a personal computer.
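The per-folder 60/20/20 split described above can be sketched as follows; the fixed seed and filename scheme are assumptions for illustration, not the project's actual code.

```python
# Sketch: split one class folder's files into 60% train / 20% val / 20% test.
import random

def split_dataset(filenames, seed=42):
    """Shuffle a class folder's files and cut them into three subsets."""
    files = sorted(filenames)            # sort first so the split is reproducible
    random.Random(seed).shuffle(files)   # hypothetical fixed seed
    n_train = int(0.6 * len(files))
    n_val = int(0.2 * len(files))
    return (files[:n_train],
            files[n_train:n_train + n_val],
            files[n_train + n_val:])

train, val, test = split_dataset([f"img_{i}.jpg" for i in range(100)])
```

Splitting within each class folder, rather than across the pooled dataset, keeps every growth stage represented in all three subsets despite the class imbalance.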
Test results for ResNet50
Confusion matrix for ResNet model containing 8 growth stage classes
Model Performance
The validation accuracy of the trained ResNet model was 94%, while its overall testing accuracy was 69%. In general, the stages with more images, such as Blush and Blossom, performed better. However, some stages with fewer images, such as Bud Tight, still had high precision; this could be due to small testing and validation sets with little variability, leading to biased testing metrics. As with YOLO, the Roughneck class saw particularly poor performance.
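Per-class precision and recall of the kind discussed here fall directly out of a confusion matrix like the one shown above. The sketch below computes both from column and row sums; the toy numbers are illustrative only, not our actual results.

```python
# Sketch: per-class precision and recall from a confusion matrix
# (rows = true class, columns = predicted class). Toy numbers, not real results.
def precision_recall(conf, class_idx):
    tp = conf[class_idx][class_idx]
    predicted = sum(row[class_idx] for row in conf)   # column sum: all predicted as this class
    actual = sum(conf[class_idx])                     # row sum: all true members of this class
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall

# Toy 3-class matrix: a class can have high precision but more modest recall.
conf = [[8, 2, 0],
        [1, 5, 4],
        [0, 3, 7]]
p, r = precision_recall(conf, 0)   # class 0: precision 8/9, recall 8/10
```

This distinction is why a small class such as Bud Tight can show high precision (few false positives) even when its small sample size makes the estimate unreliable.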
The ResNet architecture performed similarly to YOLO, with less time and manual effort involved. Many of the potential reasons this architecture did not perform better align with those hypothesized for YOLO: the small, unbalanced dataset is most likely the main contributor to poor performance. ResNet shows promise for classifying cranberry growth stages, and future work should include more images with varied lighting, weather conditions, angles, and seasons to increase model robustness.