iNaturalist Activity in the United States
Users and Observations from 2014 - 2020
Initial Analysis
This workflow initially sought to classify the iNaturalist community by linking observation frequency with county-level demographic data from the 2018 American Community Survey 5-year estimates. All 2018 iNaturalist observations were aggregated to the county level, and scatter plots were created to compare observation frequency with population density, median income, educational attainment, and computer/broadband internet access. All R² values between observation frequency and these variables were below 0.10, and a further exploratory regression analysis did not find any models suitable for use in a geographically weighted regression analysis. This result suggests a highly diverse iNaturalist community.
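The R² comparison described above can be illustrated with a small sketch. This is not the project's code: it uses only the Python standard library, the function name `r_squared` is hypothetical, and the county rows are invented values that exist only to make the example runnable.

```python
from statistics import mean


def r_squared(x, y):
    """R-squared of a simple linear fit of y on x (squared Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        # A constant variable explains no variance.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Invented county rows for illustration: (observation count, median income in USD).
counties = [(120, 52000), (15, 61000), (980, 48000), (40, 75000), (300, 55000)]
obs = [row[0] for row in counties]
income = [row[1] for row in counties]

r2 = r_squared(income, obs)
print(f"R² between median income and observation count: {r2:.3f}")
```

A value near 0 means the demographic variable explains almost none of the county-to-county variation in observation frequency, which is the pattern reported above for every variable tested.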
Early Visualization
A Kernel Density Estimation map was used to visualize the more than 1.4 million iNaturalist observations recorded in the United States during 2018. The map revealed a clear association between observations and major cities. This finding motivated further research into the spatial distribution of iNaturalist users over time.
Spatial Distribution of iNaturalist Users
To represent individual iNaturalist users, observations for every year from 2014 - 2020 were first aggregated by user. The coordinates associated with each user's records were then used to determine that user's observation centroid. Although the centroid locations do not reveal residences, the team felt that the measure was appropriate for summarizing collections of distributed observations.
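As a rough illustration of the per-user aggregation, the sketch below averages latitude and longitude for each user. It is an assumption-laden stand-in, not the project's PostGIS query: the function name `user_centroids`, the record layout, and the sample coordinates are all hypothetical, and a plain coordinate mean is a planar approximation.

```python
from collections import defaultdict


def user_centroids(records):
    """Return {user_id: (mean_lat, mean_lon)} from (user_id, lat, lon) records."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running lat sum, lon sum, count
    for user_id, lat, lon in records:
        acc = sums[user_id]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {u: (s[0] / s[2], s[1] / s[2]) for u, s in sums.items()}


# Invented observation records for illustration: (user_id, latitude, longitude).
records = [
    ("a", 32.0, -117.0),
    ("a", 34.0, -119.0),
    ("b", 40.0, -105.0),
]

centroids = user_centroids(records)
print(centroids)
```

Each user is reduced to a single point, which is what allows one heat map per year to show users rather than individual observations.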
Heat maps showcase the spread of iNaturalist across the United States as the community grew from 8,998 users in 2014 to 138,990 users in 2020 (2020 data through September 25th).
Early observations were centered around San Francisco and Los Angeles, then spread to the Eastern Seaboard and Texas. By 2016, hot spots in the Pacific Northwest, the Great Lakes region, and Denver were beginning to emerge. By September of 2020, additional hot spots had appeared in Florida, Arizona, and throughout the Eastern United States.
Additional Insights
The total number of users and the total number of observations have both grown rapidly, especially during 2018 - 2019. Although the 2020 data end on September 25th, total users and observations still exceeded the previous year's totals. However, observations per user and mean distance from centroid both decreased in 2020, possibly in connection with COVID-19 related shutdown orders. The mean distance between a user's observations and that user's centroid indicates whether individual observations tend to be spread out across large distances or tightly clustered.
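The mean distance from centroid can be sketched as follows. This is an illustration under stated assumptions rather than the project's method: it uses the haversine great-circle formula with a mean Earth radius in place of PostGIS distance functions, and the function names and sample coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def mean_distance_from_centroid(points):
    """Average distance (km) from each (lat, lon) point to the points' mean position."""
    n = len(points)
    clat = sum(p[0] for p in points) / n
    clon = sum(p[1] for p in points) / n
    return sum(haversine_km(lat, lon, clat, clon) for lat, lon in points) / n


# Invented users for illustration: one observes locally, one across the country.
clustered = [(32.70, -117.10), (32.72, -117.12), (32.71, -117.11)]
spread = [(32.70, -117.10), (40.70, -74.00), (47.60, -122.30)]

print(f"clustered user: {mean_distance_from_centroid(clustered):.1f} km")
print(f"spread-out user: {mean_distance_from_centroid(spread):.1f} km")
```

A small value indicates tightly clustered observations, while a large value indicates a user who records observations across long distances.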
User Type: Single Observation or Repeat Users?
Approximately 2/3 of all iNaturalist users make more than one observation per active year!
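One way to compute such a share is sketched below. The poster does not spell out how an "active year" was defined, so this sketch assumes it means any year in which a user recorded at least one observation, and it counts each user-year pair separately. The function name `repeat_user_share` and the sample records are hypothetical.

```python
from collections import Counter


def repeat_user_share(records):
    """Fraction of active user-years with more than one observation.

    records: iterable of (user_id, year) pairs, one pair per observation.
    """
    per_user_year = Counter(records)  # observations per (user, year)
    if not per_user_year:
        return 0.0
    repeat = sum(1 for n in per_user_year.values() if n > 1)
    return repeat / len(per_user_year)


# Invented observation records for illustration: (user_id, year).
records = [
    ("a", 2018), ("a", 2018),               # repeat user in 2018
    ("b", 2018),                            # single-observation user in 2018
    ("c", 2019), ("c", 2019), ("c", 2019),  # repeat user in 2019
]

share = repeat_user_share(records)
print(f"Share of repeat user-years: {share:.2f}")
```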
Methodology & Tools
The iNaturalist GBIF DarwinCore Archive, American Community Survey data, and TIGER/Line shapefiles were loaded into a PostgreSQL database, with cleaning and preprocessing conducted in SQL from the command line. The PostGIS extension was enabled to facilitate spatial operations, such as the calculation of observation centroids and observation-centroid distances. Summary statistics and graphs were created in Python with the Pandas and Matplotlib libraries, and ArcGIS Pro was used for map creation.
For more information about iNaturalist activity in the United States, contact Jessica Embury at jembury8568@sdsu.edu