Data
Conventional survey data for these sites were collected as percent cover, with a separate section for each site. An operational taxonomic unit (OTU) is a cluster of similar sequences that is used as a proxy for a species. OTUs were clustered at 97% sequence similarity to form species identities after comparing them to a local reference library of common vascular plants of Alberta. To remove "noise" from the metabarcoding data, all singletons (species that were observed only once in the sequencing data) were removed (Clare et al. 2016). The two datasets were then combined, which included defining the "Observer" as either the field expert for the conventional survey or the primer set that was used to identify that barcode in the DNA metabarcoding methods. Due to limitations in laboratory techniques, absolute abundance data are not available from DNA metabarcoding. As a result, we converted the data to presence-absence data to facilitate comparisons of species composition across the different methodologies.
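The filtering and conversion steps described above can be sketched as follows. This is a minimal illustration and not the pipeline used in this study: the function name, the sample and species labels, and the read counts are all hypothetical, and it assumes that a singleton is a species whose total read count across all samples is 1.

```python
from collections import defaultdict

def to_presence_absence(read_counts, min_total_reads=2):
    """Drop singletons, then convert read counts to presence-absence (1/0).

    read_counts: {sample_id: {species: read_count}}
    Assumption: a singleton is a species with a total of one read
    across all samples, so min_total_reads=2 removes it.
    """
    totals = defaultdict(int)
    for counts in read_counts.values():
        for species, n in counts.items():
            totals[species] += n

    # Keep only species observed more than once in the sequencing data
    kept = sorted(s for s, total in totals.items() if total >= min_total_reads)

    # Any positive read count becomes a presence (1); everything else is 0
    return {
        sample: {s: int(counts.get(s, 0) > 0) for s in kept}
        for sample, counts in read_counts.items()
    }

# Hypothetical read counts for two quadrats
reads = {
    "Q1": {"sp_a": 12, "sp_b": 1, "sp_c": 0},
    "Q2": {"sp_a": 0, "sp_b": 0, "sp_c": 7},
}
pa = to_presence_absence(reads)
```

In this toy input, sp_b has a single read in total and is removed, while sp_a and sp_c are kept as 1/0 values per quadrat.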
Table 3: A simplified example of the presence-absence data collected for plants. A total of 263 plant species were identified for this analysis.
Table 3 shows a simplified data table that I will use for my analyses:
Observer is a categorical variable identifying the primer set or the person that observed the species. It is the predictor variable used to test whether species identification depends on who or what was "observing" the species.
Site is an observed categorical variable that is a predictor of which species will be present in the plot.
Site type is a categorical variable that indicates whether the site is in a grassland or forested ecoregion.
Method is a categorical variable (conventional survey or DNA metabarcoding) that is a predictor of the species communities that were identified.
Species are discrete variables with binary presence-absence data. These are the response variables, measured as a function of the method and site variables.
The sampling unit is each species observed in this study by each method at each location.
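To make the variable structure above concrete, the sketch below builds a small combined table from two hypothetical inputs. It is only an illustration under assumptions: the record fields (observer, primer, percent_cover, reads), the labels, and the values are invented for the example, and Site type is omitted for brevity.

```python
def combine_surveys(conventional, metabarcoding):
    """Build one row per (Observer, Method, Site, Quadrat) with 0/1 species columns.

    conventional: records with observer, site, quadrat, species, percent_cover
    metabarcoding: records with primer, site, quadrat, species, reads
    (all field names are hypothetical)
    """
    detections = {}  # (observer, method, site, quadrat) -> set of species found

    for rec in conventional:
        key = (rec["observer"], "conventional", rec["site"], rec["quadrat"])
        found = detections.setdefault(key, set())
        if rec["percent_cover"] > 0:
            found.add(rec["species"])

    for rec in metabarcoding:
        # For metabarcoding, the primer set plays the role of the Observer
        key = (rec["primer"], "metabarcoding", rec["site"], rec["quadrat"])
        found = detections.setdefault(key, set())
        if rec["reads"] > 0:
            found.add(rec["species"])

    species = sorted(set().union(*detections.values()))
    table = []
    for (observer, method, site, quadrat), found in sorted(detections.items()):
        row = {"Observer": observer, "Method": method,
               "Site": site, "Quadrat": quadrat}
        row.update({sp: int(sp in found) for sp in species})
        table.append(row)
    return table

# Hypothetical records for a single quadrat
conventional = [
    {"observer": "expert_1", "site": "S1", "quadrat": 1,
     "species": "sp_a", "percent_cover": 30.0},
    {"observer": "expert_1", "site": "S1", "quadrat": 1,
     "species": "sp_b", "percent_cover": 5.0},
]
metabarcoding = [
    {"primer": "primer_x", "site": "S1", "quadrat": 1,
     "species": "sp_a", "reads": 40},
    {"primer": "primer_x", "site": "S1", "quadrat": 1,
     "species": "sp_c", "reads": 3},
]
table = combine_surveys(conventional, metabarcoding)
```

Each row of the result carries the predictor variables (Observer, Method, Site) followed by one binary response column per species.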
Exploratory graphics
The example species in Figure 6 show that the species composition data are zero-inflated, with absences more frequent than presences across all the quadrats that were observed.
Figure 6: Four examples of species presence-absence data. Note that the histograms for these example species are zero-inflated and that all of the species collected were found only at certain sites using certain methodologies (starting at the top left and moving clockwise, common names are ball cactus, box elder, American vetch, and spiny phlox).
Because these community composition data are zero-inflated, we use non-metric multidimensional scaling (NMDS) with Bray-Curtis distances (which compare the similarity between different compositions) to visualize the different factors.
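One simple way to quantify the zero inflation visible in these histograms is the fraction of absences per species. The sketch below assumes a presence-absence matrix with one row per quadrat observation and one column per species; the function name and the values are hypothetical.

```python
def absence_fraction(matrix):
    """Return the fraction of zeros (absences) in each species column.

    matrix: list of rows, one per quadrat observation,
            each row a list of 0/1 values, one per species.
    """
    n_rows = len(matrix)
    n_species = len(matrix[0])
    return [
        sum(1 for row in matrix if row[j] == 0) / n_rows
        for j in range(n_species)
    ]

# Hypothetical matrix: 4 quadrat observations x 3 species
pa_matrix = [
    [1, 0, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
fractions = absence_fraction(pa_matrix)
```

Values close to 1 indicate species that are absent from most observations, which is the pattern described for the example species.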
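As a minimal sketch of the distance measure, Bray-Curtis dissimilarity between two quadrats is the sum of absolute differences divided by the sum of all values. For presence-absence data this reduces to 1 - 2 * shared / (n_a + n_b), where shared is the number of species present in both quadrats and n_a and n_b are the species counts in each. The function and the two quadrat vectors below are illustrative only, not the code used for the ordination.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length vectors.

    For 0/1 presence-absence vectors this equals
    1 - 2 * shared / (n_a + n_b).
    """
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    if denominator == 0:
        # Two empty quadrats: the dissimilarity is undefined
        raise ValueError("both samples are empty")
    return numerator / denominator

# Hypothetical quadrats over five species
q1 = [1, 1, 0, 0, 1]
q2 = [1, 0, 0, 1, 1]
d = bray_curtis(q1, q2)  # 2 shared species, 3 species in each quadrat
```

Here two of the three species in each quadrat are shared, so the dissimilarity is 1 - 4/6 = 1/3. Joint absences (species missing from both quadrats) do not affect the value, which is one reason this measure suits zero-inflated community data.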
Comparison between sites:
Results of the NMDS exploratory analyses show the similarities and dissimilarities among methodology, ecoregion and field site. The final stress value for these plots is 0.198, indicating a reasonable fit for the data. The ellipses for the different forested sites (shown in green in Fig. 7) overlap regardless of the method of identification, and we see the same similarities among the grassland ecosites. Clustering the species observations by quadrat gives presence-absence assemblages that are independent of methodology and identify general communities. This analysis shows a distinct difference in plant assemblages between the different ecosites. Within the same NMDS ordination, the sites within an ecoregion are similar and overlap no matter which methodology was used to identify the species (Fig. 7).
Figure 7: Differences and similarities among the sites where data were collected, coloured by ecosite (forest ecosites in green, grasslands in yellow).
Different assemblages dependent on methodology:
Further analysis comparing conventional survey data to metabarcoding results reveals differences in the species detected. Clustering the observations by quadrat shows that, in both ecoregions, the species composition recorded by conventional methods is not similar to the community composition recovered by genetic methods (Fig. 8). The conventional methods recover distinct plant assemblages: each site has a plant community that is visibly different when our field experts delineate the species. Genetic markers, however, resolve communities only at a broader ecosite level, and the resulting composition differs markedly from the community composition recorded by our field experts (Fig. 8). Although the genetic tools could not tease apart the distinctions between the sites that we visited, metabarcoding did detect a broader spectrum of species, potentially including cryptic or underrepresented species that the conventional survey missed. Changes to the bioinformatics pipeline could potentially address some of these challenges in the future.
Figure 8: Comparison between methodologies at the quadrat community level, coloured by Observer.
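The overlap between the two methods can also be summarized directly from species lists, alongside the ordination. The sketch below is a hypothetical illustration: the function name and the species labels are invented, and it simply partitions the detected species into those shared by both methods and those unique to each.

```python
def compare_methods(conventional_species, metabarcoding_species):
    """Partition detected species into shared and method-specific sets."""
    conv = set(conventional_species)
    meta = set(metabarcoding_species)
    return {
        "shared": sorted(conv & meta),
        "conventional_only": sorted(conv - meta),
        "metabarcoding_only": sorted(meta - conv),
    }

# Hypothetical species lists for one site
summary = compare_methods(
    ["sp_a", "sp_b", "sp_c"],
    ["sp_b", "sp_c", "sp_d", "sp_e"],
)
```

A long metabarcoding_only list would be consistent with metabarcoding detecting species that the conventional survey missed, although it could also reflect misassigned sequences.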