Taxonomic Categories

Introduction

What is taxonomy?

Biological taxonomy is concerned with naming and categorizing organisms. Taxonomic categories range from high-level classifications (whether an organism is an animal or a plant, for example) to low-level classifications such as the particular species an organism belongs to. Research-grade observations from iNaturalist come with a complete set of taxonomic categories. This workflow explores ways of analyzing and visualizing this aspect of the data.

What questions does this workflow explore?

What are the most popularly identified taxonomic categories among iNaturalist's scientific observations?
Do different geographic regions have different popular taxonomic categories?

Image by Lauren Glevanik, some rights reserved

This is a California Scrub-Jay. But we can also say that this is an animal in Kingdom Animalia and a bird in Class Aves. Taken all together, these are the categories that this single iNaturalist observation would have:
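As a rough illustration, the full set of ranks for this bird can be written as a small Python dictionary. The key names and the helper function are illustrative assumptions, not the column names of the iNaturalist export.

```python
# Taxonomic ranks for the California Scrub-Jay, ordered from broadest to narrowest.
# The dictionary layout is illustrative; it is not the iNaturalist export schema.
scrub_jay = {
    "kingdom": "Animalia",
    "phylum": "Chordata",
    "class": "Aves",
    "order": "Passeriformes",
    "family": "Corvidae",
    "genus": "Aphelocoma",
    "species": "Aphelocoma californica",
}


def lineage(taxon):
    """Join the ranks into a single broad-to-narrow path (hypothetical helper)."""
    return " > ".join(taxon.values())


print(lineage(scrub_jay))
```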

Tools Used

PostgreSQL as a local database for storing the iNaturalist dataset.
Python from an Anaconda installation, along with Jupyter Lab and Visual Studio Code for editing.
- Python was used to access the local database, to group, filter, and otherwise organize the data, and to make most visualizations.
- The Python libraries numpy, pandas, psycopg2, matplotlib, seaborn, wordcloud, and PIL were used.
Gephi to create an image of a graph from the data.
ArcGIS Insights to design embeddable maps from the data.
Adobe Illustrator to prepare images for the website.

Results

I. What are the most popularly identified taxonomic categories?

Pie charts were the initial attempt at answering this question. However, the charts quickly became overcrowded when applied to any rank below kingdom.

Bar charts were a better way to compare many different categories than pie charts. These visualizations were able to suggest quite a lot about the data:

iNaturalist has produced many more research-grade observations of animals and plants (Kingdoms Animalia and Plantae) than of fungi. Microscopic organisms and viruses do get observed, but their counts are negligible compared to the other kingdoms.
iNaturalist has produced many more observations of chordates (animals with a notochord, a group that includes all vertebrates) and arthropods (animals with exoskeletons) than of any other animal phylum. 99% of plant observations are in Phylum Tracheophyta, the vascular plants. This excludes mosses and other non-vascular plants, which are perhaps considered "boring" to observe and identify among iNaturalist's user base.
Class Aves (birds) and Class Insecta have many more observations in Kingdom Animalia than the runners-up: reptiles, mammals, amphibians, and Actinopterygii (ray-finned fish). Many animal classes have relatively few observations. One possible explanation is that deep-sea creatures are not easily observed and identified by most iNaturalist users. Other classes, like Diplopoda (millipedes), might have few observations because they are narrow compared to a broad class like Insecta.
Though there can be thousands of different species in a class of animals, top species can still represent over 1% of the whole class. This suggests a small handful of species are popular, with most species receiving relatively few observations.
The bird and insect species with the most observations are all well-known, common species in North America. Most are migratory species. Their familiarity and wide geographic range might contribute to their popularity.
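The counts behind bar charts like these can be sketched with pandas. This is a minimal example on made-up rows; the column names "class" and "species" and both helper functions are assumptions, not the workflow's actual code or schema.

```python
import pandas as pd

# Made-up observations; the column names "class" and "species" are assumptions.
observations = pd.DataFrame({
    "class": ["Aves", "Aves", "Aves", "Aves", "Insecta", "Insecta", "Reptilia"],
    "species": [
        "Anas platyrhynchos", "Anas platyrhynchos", "Turdus migratorius",
        "Ardea herodias", "Danaus plexippus", "Apis mellifera",
        "Sceloporus occidentalis",
    ],
})


def category_counts(df, rank):
    """Count observations per category at one rank, largest first."""
    return df[rank].value_counts()


def top_species_share(df, class_name, n=1):
    """Fraction of a class's observations held by its n most observed species."""
    in_class = df[df["class"] == class_name]
    return in_class["species"].value_counts(normalize=True).head(n)


class_counts = category_counts(observations, "class")
print(class_counts)
print(top_species_share(observations, "Aves"))
```

With matplotlib installed, calling class_counts.plot.bar() would draw the corresponding bar chart.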

Wordclouds were an interesting way to display taxonomic data. They allowed significantly more categories to be compared than pie or bar charts do, at the cost of no longer displaying an exact count of observations. They also allowed for more creative visualizations, such as masking all taxonomic categories in Class Aves with a silhouette of its most identified species.

Order Passeriformes was the most frequent category among birds. This is reasonable, since Passeriformes (the perching birds, which include the songbirds) contains over half of all bird species.

Graphs were another way of visualizing taxonomic information. They have the advantage of showing how each taxonomic category relates to other taxa. The graph below was generated using the ForceAtlas2 algorithm in Gephi, with the parameters "Dissuade Hubs" and "Prevent Overlap" selected. Node and label sizes were scaled by the natural log of observation counts. Space was a bit limited: this visualization might have been better suited to an interactive dashboard that only displays taxonomic information relative to a currently selected taxonomic category.
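The log scaling of node sizes and the parent-to-child edges of such a graph can be sketched in plain Python. The counts, the dictionary layout, and the function names below are made up for illustration; the layout itself was done in Gephi and is not reproduced here.

```python
import math

# Made-up observation counts per taxon, each with its parent category.
taxa = {
    "Animalia": {"parent": None, "count": 1000},
    "Chordata": {"parent": "Animalia", "count": 600},
    "Aves": {"parent": "Chordata", "count": 400},
    "Arthropoda": {"parent": "Animalia", "count": 350},
}


def node_size(count):
    """Scale a node by the natural log of its observation count (count >= 1)."""
    return math.log(count)


def edge_list(taxa):
    """Return (parent, child) pairs; the root taxon has no incoming edge."""
    return [
        (info["parent"], name)
        for name, info in taxa.items()
        if info["parent"] is not None
    ]


sizes = {name: node_size(info["count"]) for name, info in taxa.items()}
edges = edge_list(taxa)
print(sizes)
print(edges)
```

The log scale compresses the gap between very popular and rarely observed taxa, so small nodes stay legible next to large ones.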

II. Do different geographic regions have different popular taxonomic categories?

The different classes in Kingdom Animalia in the United States were analyzed to see if there were any differences.

The percentage of observations in each class of Kingdom Animalia was calculated for the U.S. as a whole, and then for each individual state and D.C. While insects and birds are the two most identified classes of animals, their shares vary considerably from state to state. Two other outliers:

Hawaii's animal identifications included over 20% fish, while fish represent no more than 5% in any other state.
Mammals make up significantly more observations in Wyoming than in any other state.
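The per-state percentages described above can be sketched with a pandas cross-tabulation. The rows and the column names "state" and "class" are made up for illustration and do not reproduce the real figures.

```python
import pandas as pd

# Made-up observations; the column names "state" and "class" are assumptions.
observations = pd.DataFrame({
    "state": ["HI", "HI", "HI", "HI", "WY", "WY", "WY", "WY"],
    "class": [
        "Actinopterygii", "Aves", "Aves", "Insecta",
        "Mammalia", "Mammalia", "Aves", "Insecta",
    ],
})


def class_percentages(df):
    """Percentage of each state's observations that fall in each class."""
    # normalize="index" makes every row (state) sum to 1 before scaling to 100.
    return pd.crosstab(df["state"], df["class"], normalize="index") * 100


state_pct = class_percentages(observations)
# Percentages for all rows together, the counterpart of the U.S.-wide figures.
national_pct = observations["class"].value_counts(normalize=True) * 100
print(state_pct.round(1))
print(national_pct.round(1))
```

Classes that a state never recorded appear as 0, so every state's row covers the same set of classes.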

In an attempt to measure how different each state's animal identifications are, the following method was used:

The percentages of each class in a state were treated as an n-dimensional vector.
The Euclidean distance between a state's vector and a baseline vector was calculated. In this analysis, the U.S. as a whole and the state of California were used as baselines.
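The two steps above can be sketched with numpy. The class order, the percentages, and the state labels "A" and "B" are made up for illustration; the only requirement is that every vector lists the same classes in the same order.

```python
import numpy as np

# Every vector uses the same class order; all values are made-up percentages.
classes = ["Aves", "Insecta", "Mammalia", "Actinopterygii"]
baseline = np.array([40.0, 40.0, 15.0, 5.0])  # e.g. the U.S. as a whole
states = {
    "A": np.array([40.0, 40.0, 15.0, 5.0]),   # identical to the baseline
    "B": np.array([30.0, 30.0, 15.0, 25.0]),  # unusually fish-heavy
}


def distance_from_baseline(vector, baseline):
    """Euclidean distance between two percentage vectors of equal length."""
    return float(np.linalg.norm(vector - baseline))


distances = {
    name: distance_from_baseline(vector, baseline)
    for name, vector in states.items()
}
# Smallest distance first: the states most similar to the baseline.
ranked = sorted(distances, key=distances.get)
print(distances)
print(ranked)
```

A distance of 0 means a state's class percentages match the baseline exactly, and larger values mean a more unusual mix of classes.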

Interestingly, California was not in the top half of states most similar to the U.S.'s percentages as a whole, even though California has more identifications than any other state.

These tables were exported as .csv and used to plot an ArcGIS Insights map of these distances, in order to observe any spatial relationships between states.

California appears to be most similar to states in the Pacific West and least similar to states in the South and the Midlands of the U.S.