Data

The Data Table

The eBird Basic Dataset contains the geographic location, checklist duration (effort) in minutes, sample and observer IDs, and an abundance vector for all species in North America. The abundance data were transformed to an incidence vector (presence/absence) and then summed to give a species richness value for each checklist. The resulting table was intersected with the pond geometries to attach the pond attributes to each observation, forming the "subsample data table".
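The abundance-to-richness step can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the study: the species, counts, and the helper names `to_incidence` and `richness` are hypothetical. It also assumes that an eBird "X" entry (species reported without a count) is treated as present.

```python
# Hypothetical checklist: species -> reported count.
# eBird allows "X" for "present but not counted"; treating it as
# present is an assumption of this sketch.
checklist = {
    "Mallard": 12,
    "Canada Goose": 0,
    "Red-winged Blackbird": "X",
    "American Coot": 3,
}


def to_incidence(abundance):
    """Convert an abundance vector to presence/absence (1/0)."""
    return {
        species: 1 if (count == "X" or count > 0) else 0
        for species, count in abundance.items()
    }


def richness(abundance):
    """Species richness = number of species present on the checklist."""
    return sum(to_incidence(abundance).values())


print(to_incidence(checklist))
print(richness(checklist))
```

Summing the incidence vector rather than the raw counts means a flock of 200 birds of one species contributes the same as a single individual, which is what a richness measure requires.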

The study data table (Table 2) is derived from the subsample table by grouping checklists by pond location, year, and month. Each row is a sampling unit with a unique sample ID, categorical pond attributes, and an effort bin class in minutes. Richness, the response variable, is continuous: the mean avian species richness across the checklists in a sampling unit. The categorical predictor variables Pond Type, Perimeter/Area Ratio Class, and Feature, and the categorical covariate Effort Strata, are shown.
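A minimal sketch of this grouping step is given below, under stated assumptions: the pond identifiers, the richness values, and the sample ID format are all hypothetical, and the pond attributes and effort class carried by each row are omitted for brevity.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical subsample rows: (pond_id, year, month, checklist richness).
subsample = [
    ("pond_A", 2019, 5, 10),
    ("pond_A", 2019, 5, 14),
    ("pond_A", 2019, 6, 8),
    ("pond_B", 2019, 5, 20),
]

# Collect the checklist richness values for each pond-year-month unit.
groups = defaultdict(list)
for pond, year, month, checklist_richness in subsample:
    groups[(pond, year, month)].append(checklist_richness)

# One row per sampling unit: an illustrative sample ID mapped to the
# mean richness of the checklists in that unit.
study_table = {
    f"{pond}-{year}-{month:02d}": mean(values)
    for (pond, year, month), values in groups.items()
}

for sample_id, mean_richness in study_table.items():
    print(sample_id, mean_richness)
```

Averaging within a sampling unit is what makes the response continuous even though each checklist's richness is an integer count.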

Table 2. Study data table showing five random rows of data. Unique sample IDs were generated for each pond-year-month sampling unit row. Categorical predictor variables Pond Type, Perimeter/Area Ratio Class, Feature, and categorical covariate Effort Strata are shown. The response variable is Avian Species Richness.

Data Exploration

The Effort Problem

Early exploration of the eBird Basic Dataset showed that the time spent conducting a bird count was positively related to species richness. Scatterplots of effort versus species richness, faceted by each categorical predictor, were plotted (Figure 7). Some artifacts of the casual sampling style are visible, with more counts falling on round numbers of minutes (30, 35, 45, 60, etc.). Because eBird users appear to be roughly estimating their time, recorded effort behaves as a coarse, discrete variable rather than a truly continuous one. This informed the choice to classify effort into ranges, or effort strata.
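The classification into effort strata might look like the sketch below. The Table 3 caption refers to 30-minute blocks, but the exact class boundaries are not stated in the text, so the boundaries used here (1-30, 31-60, 61-90, and more than 90 minutes) and the function name `effort_stratum` are assumptions for illustration only.

```python
import math


def effort_stratum(minutes, width=30, n_strata=4):
    """Assign a checklist duration (minutes) to an effort stratum label.

    Assumed boundaries: 1-30, 31-60, 61-90, and an open-ended top
    class for anything longer than 90 minutes.
    """
    if minutes <= 0:
        raise ValueError("effort must be a positive number of minutes")
    # Upper bounds are inclusive, so 30 falls in the first class.
    index = min(math.ceil(minutes / width) - 1, n_strata - 1)
    if index == n_strata - 1:
        return f">{index * width} min"
    return f"{index * width + 1}-{(index + 1) * width} min"


for duration in (12, 30, 45, 60, 95, 240):
    print(duration, effort_stratum(duration))
```

Binning in this way absorbs the rounding visible in the scatterplots: a checklist logged as 30 minutes and one logged as 27 minutes land in the same class.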

Sample Mean Distribution

Histograms of the richness response variable were plotted to check whether its distribution was approximately normal, both overall and within different factor combinations. The overall distribution appears approximately normal with some bimodal characteristics (Figure 8). Richness histograms were also plotted within each effort stratum, and all four showed approximately normal distributions with some bimodal qualities.

Study Balance

Study balance between the factors was explored, and errors and outliers were checked for, by faceting multiple factors in boxplots (Figure 9). This was done for all possible combinations of the predictor variables. In all of the facet plots, the positive relationship between search effort and species richness is again visible, further demonstrating the need to control for effort. These graphs also revealed a limitation: there were few high-effort observations at conventional stormponds with high perimeter-to-area ratios.

Single predictors were also plotted with effort against richness to look for gaps or inconsistencies (Figure 10). In general, single variables had sufficient data, with good spread and evenness across effort classes. Summary statistics were also computed for each single predictor with the effort covariate to assess numerically whether the different factor combinations had sufficient representation for further analysis (Table 3).
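A numerical check of representation of this kind could be sketched as follows. The rows, the stratum labels, and the minimum cell size `MIN_N` are hypothetical values chosen for illustration; they are not the study data or a threshold used in the analysis.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical rows: (pond_type, effort_stratum, richness).
rows = [
    ("conventional", "1-30 min", 6.0),
    ("conventional", "1-30 min", 8.0),
    ("conventional", "31-60 min", 11.0),
    ("constructed wetland", "1-30 min", 9.0),
    ("constructed wetland", "1-30 min", 13.0),
    ("constructed wetland", "31-60 min", 15.0),
    ("constructed wetland", "31-60 min", 17.0),
]

# Group richness values by predictor level and effort stratum.
cells = defaultdict(list)
for pond_type, stratum, richness in rows:
    cells[(pond_type, stratum)].append(richness)

summary = {
    cell: {
        "n": len(values),
        "mean": mean(values),
        # A sample standard deviation needs at least two observations.
        "sd": stdev(values) if len(values) > 1 else None,
    }
    for cell, values in cells.items()
}

# Flag cells with too few observations (threshold is illustrative).
MIN_N = 2
sparse = [cell for cell, stats in summary.items() if stats["n"] < MIN_N]

for cell, stats in summary.items():
    print(cell, stats)
print("Sparse cells:", sparse)
```

Reporting the count alongside the mean and standard deviation makes thinly populated factor combinations easy to spot before any model is fitted.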

Table 3. Summary statistics of species richness observations at conventional and constructed wetland stormponds, stratified by effort level in 30-minute blocks.

Figure 7. Scatterplot of avian species richness versus search effort at Edmonton stormwater ponds with or without internal features such as islands and peninsulas.

Figure 8. Histogram of avian species richness, coloured by minutes of checklist effort.

Figure 9. Interaction of Perimeter/Area Ratio and stormpond type, showing data spread and study balance. The data are stratified by birder effort to allow comparisons at different levels of search effort.

Figure 10. Effect of internal feature type on data spread and study balance. The data are stratified by birder effort to allow comparisons at different levels of search effort.