My thesis students and I participated in the PlantCLEF 2022 challenge, which consisted of identifying plant species among 80,000 candidates. This is a highly complex problem: the dataset is highly imbalanced and covers a very large number of species, with the long-term aim of classifying any plant species on Earth. We did not have enough computing power to train heavy models such as Transformers for the competition, so we focused on reducing model sizes while maintaining strong results. We developed a 2-level hierarchical softmax, building on the hierarchical loss functions from my own doctoral studies. It consists of two softmax layers connected in cascade, rather than the binary tree of the original hierarchical softmax commonly used in NLP, because the plant taxonomy forces nodes to have more than two children. Each node of the top layer has paths to the bottom-layer nodes, and the final classification is routed by trained parameters. By replacing the vanilla logits layer, which grows linearly with the number of classes, the 2-level hierarchical softmax reduced model sizes by a factor of 5.67 without hurting accuracy. The paper has been accepted for publication, and we placed 4th in the competition with models 5.67x smaller than the commonly used fine-tuned models, despite limited computing resources. This will impact the development of plant identification systems and help researchers with limited access to large computing resources to study models at such large species scales.
(TEC, Costa Rica)
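A minimal sketch of how such a cascaded 2-level head might look in PyTorch (layer sizes and names here are hypothetical, not the ones from our paper):

```python
import torch
import torch.nn as nn

class TwoLevelSoftmax(nn.Module):
    """Sketch of a 2-level cascaded softmax head (hypothetical sizes).

    A vanilla logits layer needs feat_dim * n_species weights; routing
    through a small top layer needs only
    feat_dim * n_top + n_top * n_species.
    """
    def __init__(self, feat_dim=2048, n_top=256, n_species=80000):
        super().__init__()
        self.top = nn.Linear(feat_dim, n_top)      # top-level nodes (e.g. genera)
        self.bottom = nn.Linear(n_top, n_species)  # bottom-level (species) nodes

    def forward(self, feats):
        # the top softmax routes the features through the top-level nodes
        routed = torch.softmax(self.top(feats), dim=-1)
        # the bottom softmax produces the final species probabilities
        return torch.softmax(self.bottom(routed), dim=-1)
```

With these illustrative sizes, the classification head shrinks from roughly 2048 x 80000 ≈ 164M weights to 2048 x 256 + 256 x 80000 ≈ 21M; the exact 5.67x factor reported in the paper depends on the real backbone and layer sizes.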
In herbarium images the plants look old and dry, often with holes and missing parts. Some of those plants may exist only as herbarium samples, being extinct in the field. This work performs image-to-image translation to convert old-looking herbarium images into green plant images, and vice versa, effectively rejuvenating the plant, without requiring pairs of images of the same plant. We based our work on CycleGAN, a model that learns mappings between two domains (A to B and B to A). To our knowledge, this is the first work on the topic, with no previous work to build upon. We also provide a dataset for the task, which we expect to be used by researchers who pick up this line of research. We found that shape is important yet normally hard to preserve between domains, so we developed a loss function based on the well-known Otsu segmentation method: it learns a mask of the plant and forces the generated image to keep it. The first paper has been accepted with changes, and we are working on them for final acceptance.
(TEC, Costa Rica)
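A rough, non-differentiable sketch of the intuition behind the Otsu-based shape constraint (the actual loss in the paper is integrated into training and differs in detail; the function names here are illustrative):

```python
import numpy as np

def otsu_mask(gray):
    """Binary foreground mask via Otsu's method: pick the threshold
    that maximizes the between-class variance (gray values in [0, 1])."""
    hist, edges = np.histogram(gray, bins=256, range=(0.0, 1.0))
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)            # weight of the "background" class per threshold
    mu = np.cumsum(p * centers)  # cumulative mean per threshold
    mu_t = mu[-1]                # global mean
    w1 = 1.0 - w0
    valid = (w0 > 1e-12) & (w1 > 1e-12)
    sigma_b = np.zeros_like(w0)  # between-class variance per candidate threshold
    sigma_b[valid] = (mu_t * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    t = edges[np.argmax(sigma_b) + 1]  # upper edge of the best bin
    return gray > t

def shape_consistency_loss(real, fake):
    """Fraction of pixels whose Otsu masks disagree between the real
    image and the generated one: 0 when the plant shape is kept."""
    return float(np.mean(otsu_mask(real) != otsu_mask(fake)))
```

The idea is that the generator is penalized whenever the silhouette of the generated plant drifts away from the silhouette of the input, regardless of the color and texture changes the translation introduces.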
The previous topic is a deterministic approach to rejuvenating plant images. However, one problem in the plant domain is the lack of images for many species, causing high imbalance in the datasets. In this work we explore mapping noise to different styles to generate multiple versions of images conditioned on the species. The aim is to generate plausible images that preserve the phenotypic characteristics of the species. The work is still in progress, but preliminary results using only flowers already show its potential.
(TEC, Costa Rica)
Another way to tackle class imbalance is to use the large amount of unlabeled plant images available in systems such as Pl@ntNet, where users capture plant images that have not yet been confirmed by human experts. These images currently go to waste. However, such unlabeled images may contain useful visual features, and the intuition is that a self-supervised pre-training phase can learn them, so that species with fewer images get better performance out of the classifiers. This topic is still work in progress.
As part of my doctoral studies, together with my co-advisors from INRIA and CIRAD in Montpellier, France, I spent several months working on deep learning with herbarium images (pressed, dried plants conserved in museums), as well as exploring hierarchical loss functions that take advantage of the plant taxonomy. Such data had not been explored before, yet it is abundant, as taxonomists have digitized millions of specimens in herbarium institutions. It is, however, visually very different from plant images taken in the field.
Our main publication, highlighted by Nature and NVIDIA, shows the potential of herbarium images as an additional data source for automatic plant identification, opening a new domain adaptation (DA) research area. It extended the usage of herbarium images beyond work at herbaria and museums, raising their importance. Subsequent work on DA has followed, including DA challenges in the PlantCLEF challenge.
We explored the plant taxonomy in the machine learning domain for the first time, as it offers a unique setting with an intrinsic class hierarchy (species, genus, family, etc.). Exploiting such a hierarchy offers the opportunity to learn representations at several class levels, which could help with class imbalance, a common situation in the plant domain. I explored the construction of architectures covering several taxonomic levels, as well as a new loss function that exploited the class hierarchy by taking the joint probability of a species with its higher taxa. This work led to a chapter in a biodiversity informatics book.
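The joint-probability idea can be sketched for two taxonomic levels as follows (the head names and the species-to-genus mapping tensor are illustrative, and the published loss may combine more levels):

```python
import torch
import torch.nn.functional as F

def hierarchical_joint_loss(species_logits, genus_logits, species_y, genus_of_species):
    """Minimize -log[ P(species) * P(genus of that species) ]:
    the joint probability of a species with its higher taxon."""
    genus_y = genus_of_species[species_y]  # look up each species label's genus
    log_p_species = F.log_softmax(species_logits, dim=-1).gather(1, species_y[:, None]).squeeze(1)
    log_p_genus = F.log_softmax(genus_logits, dim=-1).gather(1, genus_y[:, None]).squeeze(1)
    return -(log_p_species + log_p_genus).mean()
```

Since -log(P_species * P_genus) decomposes into a sum of per-level log terms, this is equivalent to summing a cross-entropy at each taxonomic level, so the model is rewarded for getting the higher taxon right even when the exact species is wrong.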
We wanted to study whether specific prior knowledge injected into the layers of deep learning models would improve classification performance. Image filters distill decades of research on image enhancement, making them a useful prior for image pre-processing. We studied a traditional image filter, namely unsharp masking, as a differentiable layer within deep learning models. The filter's parameters are traditionally dataset-dependent and assigned manually by human intervention. Instead, our unsharp mask layer's parameters were trainable, and I coded the differentiable version of the filter as a trainable layer in PyTorch. The model was able to find useful values for the filter parameters on its own while classifying images in several datasets. This work shows how traditional image processing techniques (prior knowledge) can be injected into deep learning models and automatically calibrated using gradient descent. This work led to a paper, and the code of the new layer is publicly available here.
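A simplified sketch of such a layer is shown below; here only the sharpening `amount` is trainable and the Gaussian blur is fixed, whereas the published layer may parameterize the filter differently:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnsharpMask(nn.Module):
    """Sketch of a trainable unsharp-mask layer:
    out = x + amount * (x - gaussian_blur(x)),
    where `amount` is learned by gradient descent alongside the network."""
    def __init__(self, channels=3, kernel_size=5, sigma=1.0):
        super().__init__()
        self.amount = nn.Parameter(torch.tensor(1.0))  # trainable filter parameter
        # fixed 2D Gaussian kernel, applied per channel (depthwise conv)
        ax = torch.arange(kernel_size) - kernel_size // 2
        g = torch.exp(-ax.float() ** 2 / (2 * sigma ** 2))
        k2d = torch.outer(g, g)
        k2d = k2d / k2d.sum()
        self.register_buffer("kernel", k2d.repeat(channels, 1, 1, 1))
        self.channels = channels
        self.pad = kernel_size // 2

    def forward(self, x):
        blurred = F.conv2d(x, self.kernel, padding=self.pad, groups=self.channels)
        return x + self.amount * (x - blurred)
```

Because the whole operation is differentiable, the layer can be dropped in front of any backbone and `amount` receives gradients from the classification loss, so the sharpening strength is calibrated per dataset without human intervention.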