iBioHash2024

Start Date: March 8
End Date: May 18
Competition URL: Kaggle

Context

Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, and it underpins a diverse set of real-world applications. FGIA targets the analysis of visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class and large intra-class variations inherent to fine-grained image analysis make it a challenging problem.

Fine-grained image retrieval, a crucial research area of FGIA, aims to retrieve images belonging to multiple subordinate categories of a super-category (aka a meta-category). Its key challenge therefore lies in understanding the fine-grained visual differences that distinguish objects that are highly similar in overall appearance but differ in fine-grained features. In addition, fine-grained retrieval still requires ranking all instances so that images depicting the concept of interest are ranked highest, based on the fine-grained details in the query.

In particular, with the explosive growth of fine-grained data in real applications, fine-grained hashing has emerged as a promising solution for large-scale fine-grained retrieval tasks. Because it learns compact binary hash code representations, it can greatly reduce storage cost and increase query speed. The large-scale fine-grained hashing search problem refers to the task of learning binary hash codes for large-scale fine-grained image retrieval. The goal is to generate compact binary hash codes for fine-grained images that exhibit both large intra-class variances and small inter-class variances.
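To make the storage and speed argument concrete, the following is a minimal sketch of retrieval over binary hash codes, not part of the challenge's official tooling. It assumes each code is packed into a Python integer and that gallery images are ranked by Hamming distance to the query; the function names and the toy 8-bit codes are hypothetical and chosen only for illustration.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into a single Python int."""
    code = 0
    for b in bits:
        code = (code << 1) | (1 if b else 0)
    return code


def hamming_distance(a, b):
    """Count the bit positions where two packed codes differ."""
    return bin(a ^ b).count("1")


def rank_gallery(query_code, gallery_codes):
    """Return gallery indices sorted by ascending Hamming distance to the query."""
    return sorted(
        range(len(gallery_codes)),
        key=lambda i: hamming_distance(query_code, gallery_codes[i]),
    )


# Toy 8-bit codes (hypothetical values, for illustration only).
gallery = [
    pack_bits(bits)
    for bits in (
        [1, 0, 1, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 1, 1, 0, 1],
        [1, 0, 1, 0, 0, 0, 1, 0],
    )
]
query = pack_bits([1, 0, 1, 1, 0, 0, 1, 1])

ranking = rank_gallery(query, gallery)
print(ranking)  # [0, 2, 1]: distances to the query are 1, 7 and 2
```

The distance computation is a single XOR followed by a bit count, which is why binary codes are cheap to compare. As an illustrative size comparison (the dimensions are assumptions, not challenge settings), a 64-bit code occupies 8 bytes per image, whereas a 512-dimensional float32 feature vector occupies 2,048 bytes.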

Task Description and Data

Task Description: Unlike last year, the task is now large-scale zero-shot fine-grained image hashing. Participants must provide low-dimensional binary hash codes for all images in the gallery and query sets (1,000 sub-categories in total) by leveraging the provided training data obtained from iNaturalist. The overall goal of this challenge is to evaluate the retrieval performance of state-of-the-art retrieval and hashing algorithms, which help greatly reduce storage costs and increase query speeds.
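The text above does not name the evaluation metric, so the following is only a sketch of one common way to score hashing-based retrieval: mean average precision (mAP) over Hamming-distance rankings. The choice of mAP, the function names, and the toy codes and labels are all assumptions made for illustration; the official metric is defined on the competition page.

```python
def hamming_distance(a, b):
    """Count the bit positions where two packed integer codes differ."""
    return bin(a ^ b).count("1")


def average_precision(ranked_labels, query_label):
    """Average precision of one ranked list with respect to the query's label."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(query_codes, query_labels, gallery_codes, gallery_labels):
    """Mean of per-query average precision over Hamming-distance rankings."""
    scores = []
    for q_code, q_label in zip(query_codes, query_labels):
        # Rank gallery items by ascending Hamming distance (ties keep gallery order).
        order = sorted(
            range(len(gallery_codes)),
            key=lambda i: hamming_distance(q_code, gallery_codes[i]),
        )
        ranked_labels = [gallery_labels[i] for i in order]
        scores.append(average_precision(ranked_labels, q_label))
    return sum(scores) / len(scores)


# Toy 4-bit codes and labels (hypothetical, for illustration only).
gallery_codes = [0b0000, 0b0001, 0b1111, 0b1110]
gallery_labels = ["a", "a", "b", "b"]
query_codes = [0b0000, 0b0111]
query_labels = ["a", "b"]

score = mean_average_precision(query_codes, query_labels, gallery_codes, gallery_labels)
```

In this toy case the first query retrieves both of its matches first (average precision 1), while the second query retrieves its matches at ranks 1 and 3 (average precision 5/6), giving a mean of 11/12.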

Data: The dataset contains 2,000 fine-grained natural biology categories. More concretely, 1,000 categories with 400,000 images are used for training (400 images per category), while the remaining 1,000 categories with 97,817 images are divided into the query set (9,754 images) and the gallery set (88,063 images) for evaluation. Unlike last year, the query and gallery data do not overlap exactly in "genus" with the data in the training set.

Acknowledgements

The images are provided by the online biodiversity platform iNaturalist.

Organizers