This software analyzes images from camera traps to determine whether there is an animal present or not and complementary to output SS05010008-V6 allows direct identification of present animal — usually on a species level, but for some organisms, the identification is on a higher taxon level, e.g., genus, family etc. Developed software takes an input image and uses a deep neural network to predict a posterior probability P(C|I) where C is the class, and I is the given image. Allowed classes and their appropriate number of samples available for training are visualized in Figure 1.
Figure1: Training dataset -- Category distribution
We established a baseline to measure an improvement when testing the developed machine learning method. We trained several deep neural networks (DNN) image-based classifiers, namely convolutional neural networks like VGG, ResNet, ResNeXt, EfficientNet, ConvNeXt, and Vision Transformers like ViT and SwinT.
We used ImageNet-1k pre-trained checkpoints and fine-tuned them using Seasaw Loss. We measure the performance of the classifiers using Top-1 accuracy and F1 score. Results are logged into the Weights & Biases experiment management system. The report showing important results is available below.
Several recent works (Willi et al., 2018, Norouzzadeh et al., 2018) focus on processing images from motion-sensor camera traps to extract information about the behaviour of animals in the wild. However, applied methods have two disadvantages for real-world applications: (1) they use a two-stage workflow, which first identifies empty images and then classifies animals, and (2) the methods cannot identify multiple animals in one image.
Our focus is to train the DNN classifiers for multi-label classification. Different from single-label classifier, which predicts exactly one class in each image, predictions of multi-label classifier are nonexclusive, and a single image can have zero or multiple classes. This allows to identify empty images by setting a threshold on class confidences and allows to identify multiple animals in one image.