HerbariumChallenge2022

Overview

Herbarium 2022: Flora of North America is part of a New York Botanical Garden project, funded by the National Science Foundation, to build tools for identifying novel plant species around the world. The dataset strives to represent all known vascular plant taxa in North America, using images gathered from 60 different botanical institutions around the world.

In botany, a ‘flora’ is a complete account of the plants found in a geographic region. The dichotomous keys and detailed descriptions of diagnostic morphological features contained within a flora are used by botanists to determine which names to apply to plant specimens. This year's competition dataset aims to encapsulate the flora of North America so that we can test the capability of artificial intelligence to replicate this traditional tool, a crucial first step toward harnessing AI’s potential in botanical applications.

The Herbarium 2022: Flora of North America dataset comprises 1.05 M images of 15,500 vascular plant taxa, which constitute more than 90% of the taxa documented in North America. We used the comprehensive Checklist of the Vascular Plants of the Americas (VPA) produced by the Missouri Botanical Garden and aligned the taxonomic names to the World Checklist of Vascular Plants (WCVP) from the Royal Botanic Gardens, Kew. Our dataset is constrained to include only vascular land plants (lycophytes, ferns, gymnosperms, and flowering plants).

Our dataset has a long-tail distribution. The number of images per taxon ranges from as few as seven to as many as 100. Although more images are available, we capped the maximum number per taxon to keep the training data sufficient but manageable in size for competition participants.
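
As an illustration of the per-taxon cap described above, the following is a minimal Python sketch, not the organizers' actual pipeline. The function name cap_images_per_taxon, the (image_id, taxon_id) record layout, and the use of seeded random sampling to choose which images to keep are all assumptions made for this example; the source does not say how the retained images were selected.

```python
import random
from collections import Counter, defaultdict


def cap_images_per_taxon(records, max_per_taxon=100, seed=0):
    """Keep at most `max_per_taxon` images for each taxon.

    `records` is an iterable of (image_id, taxon_id) pairs. This layout is
    hypothetical and is not the competition's metadata format.
    """
    rng = random.Random(seed)  # seeded so the selection is reproducible

    # Group image ids by taxon.
    by_taxon = defaultdict(list)
    for image_id, taxon_id in records:
        by_taxon[taxon_id].append(image_id)

    capped = []
    for taxon_id in sorted(by_taxon):
        image_ids = by_taxon[taxon_id]
        if len(image_ids) > max_per_taxon:
            # Assumption: surplus images are dropped by uniform random sampling.
            image_ids = rng.sample(image_ids, max_per_taxon)
        capped.extend((image_id, taxon_id) for image_id in image_ids)
    return capped


# Toy data: one well-collected taxon and one rare taxon.
records = [(f"common_{i}", "taxon_a") for i in range(250)]
records += [(f"rare_{i}", "taxon_b") for i in range(7)]

capped = cap_images_per_taxon(records)
counts = Counter(taxon_id for _, taxon_id in capped)
print(counts)
```

In this toy example the well-collected taxon is reduced to 100 images while the rare taxon keeps all seven, so the capped set is still long-tailed but bounded in size.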

Competition

Organizers

Acknowledgements

The images are provided by the New York Botanical Garden and 59 other institutions around the world.