Data

Preparing Data for Analysis

The lichen community survey was converted into a pivot table in which presence in a plot is indicated with a 1, and absence is indicated with a 0. This approach will facilitate the calculation of species richness and enable more straightforward analysis of lichen community composition at each site. I began by looking at my lichen dataset to ensure that any unknown samples were removed and that every site had a paired wellpad and reference sample location. Sites 8, 4, 16, and 17 were removed from the lichen data for this reason, leaving me with 14 sites total. I also looked at the number of samples for reference and wellpads at each age class, and noticed that the 20-year age class has two extra sites compared to the other age classes. In the end, I decided to keep these extra sites because they all have matching pairs, and PCA, NMDS, and PerMANOVA will deal well with this variability, and it will not affect my end analysis.

All the soil and vascular data were aggregated to the site level for better comparison with the lichen dataset.

General Recovery Patterns and Data Structure

Next, I began by visualizing the general community recovery and structure of my lichen data in all my age classes and sample locations. I plotted the richness of lichen species across the 3 age classes (Figure 3) and the sample location (Figure 4). Initially, we are seeing some differences in distribution across the 3 age classes, but it is very minimal. The most notable difference is that the median lichen species richness is almost double 30 years after reclamation. Between the two sample locations, again, they look quite similar, but the median lichen species richness in the wellpad sample locations is noticeably higher than that of the reference community.

Figure 3. Boxplot of Lichen Species Richness 10, 20, and 30 years post reclamation.

Figure 4. Boxplot of Lichen Species Richness on both Wellpad and Reference Sites.

Next, I visualized the general community recovery trajectories for the vascular plant cover by plotting the total vascular plant cover across the 3 age classes (Figure 5) and Sample Location (Figure 6). Across the age classes, 30 years post reclamation seems to have a very narrow range of percent cover and a significantly higher median than 10 and 20 years post reclamation. The 20-year age class seems to have a bit more dispersion than the 10-year age class, but they have very similar medians. The 30-year age class has two outliers that I decided to keep in my dataset for now. The vascular plant cover is slightly higher on reference locations, but generally speaking, they have similar medians and distributions.

Figure 5. Boxplot of total vascular cover across sites of all three age classes (10, 20, and 30 years post-reclamation).

Figure 6. Boxplot of total vascular cover on wellpad and reference sites.

Cluster Analysis: Discovering Initial Patterns or Important Recovery Influences

I continued by conducting a cluster analysis using a combined dataset of all the soil, vascular plant percent cover, and lichen community data using Site ID and Sample Location to preserve the paired reference and wellpad sites. This analysis would help me to identify initial influences on the recovery of lichens and vascular plants on wellpad sites compared to reference sites. I removed the same sites that I initially eliminated from my lichen data in all my soil and vascular data sets so that they would join and could be analyzed properly. I began by scaling my data using a Hellinger transformation, as many of the measurements use different units. A distance matrix was created using vegdist, and I selected 2 clusters (k=2) for my analysis based on the results of my dendrogram. Clusters were created using agglomerative ward.D2. I ran a Principal Component Analysis (PCA) and created a biplot by sample location and age class, overlaid with the vectors created using envfit to show which factors were having the biggest impact on the clusters. PC1 explains 31% of the variance, while PC2 explains 18% of the variance. You can see this biplot in Figure 7 below. A table of the key species discussed in my analyses and their abbreviations is included in Table 5.

Table 5. Key Species IDs used in ordination, their species names, and common names.

PC1 seems positively correlated with the Electricity Conductivity of soil and Agrocri (Agropyron cristatum ssp. Pectiniforme), and negatively correlated with Boutgra (Bouteloua gracilis), depth of soil measurements, pH, and several other plant species. This suggested that PC1 represents disturbance or soil chemistry, where higher conductivity and the presence of invasives like Crested Wheatgrass are affecting clusters. PC2 is positively correlated with several plant species, and less so with Agrocri (Agropyron cristatum ssp. Pectiniforme), Liatpun (Liatris punctata), and depth of soil measurements/pH, suggesting differences in species composition between clusters. The strongest drivers of my clusters are Electrical Conductivity, Agrocri (Agropyron cristatum ssp. Pectiniforme), and Boutgra (Bouteloua gracilis). Cluster 2 (blue) is more aligned with PC1, and is likely to be wellpad sites with more homogenous community structure, where Cluster 1 (red) appears to be less disturbed and more variable in composition. When I looked at the differences between clusters using the Cluster package (cluster_diff), the variables that were separating them most were the presence of vascular species such as Agrocri (Agropyron cristatum ssp. Pectiniforme), Taraoff (Taraxacum officinale), Pascsmi (Pascopyrum smithii), Tragdub (Tragopogon dubius), and Boutgra (Bouteloua gracilis).

Figure 7. PCA biplot using Cluster Analysis (k=2, Ward's Method) to identify initial influences on recovery of lichen and vascular plants on wellpad sites compared to reference sites.

Using the same clusters and PCA scores from above, I reclassified my points based on age class in order to visualize initial general patterns of recovery at different times since reclamation, as shown in Figure 8. Sites in reference locations are identified with circles, and sites in the wellpad locations are identified with triangles. Confidence ellipses (95% confidence) have also been added around clusters. At 10 years post-reclamation, Cluster 1 (red) is mostly dominating and seems to have very high variability in species composition and soil chemistry. Reference and Wellpad sites are remaining distinct in structure. At 20 years, elements of both clusters are clearly present and overlapping. There is less variability at this stage, indicated by tighter confidence ellipses and groups of points, so we can infer that communities are gaining more structure. Wellpad and reference points are more mixed together, suggesting that they are becoming more similar in composition. At 30 years, Cluster 1 is dominating once again, and we are still seeing less variability, with tighter ellipses and points more grouped together. Reference and Wellpad sites seem more consistent at this stage, but are once again separate in composition.

Figure 8. PCA biplot using Cluster Analysis (k=2, Ward's method) to identify initial patterns between clusters in the different age classes, 10, 20, and 30 years post reclamation. Sites in reference locations are identified with circles, and sites in the wellpad locations are identified with triangles.

Preparing for further analysis

Due to the results of my cluster analysis and PCA biplots, I decided to focus on the graminoid, forb, and lichen recovery, comparing different sample locations and age classes. Additionally, I will use univariate statistics to support my results and test for relationships between key graminoid or forb species and the lichen community richness and composition.

I conducted a Principal Coordinate Analysis (PCoA) analysis using Jaccard Distances on the lichen species matrix, removing any metadata (e.g. sample location, species richness, and Site ID), and making K=2. Afterwards, I recombined the metadata with the ordination scores and calculated eigenvalues to get the variance explained by both PCoA axes. PCoA 1 explains 24% of the variance, and PCoA 2 represents 16% of the variance.

I used a Non-Metric Multidimensional Scaling (NMDS) ordination and Bray-Curtis distances for analyzing my vascular data. Once again, I separated the species matrix from the rest of the metadata so that I could properly run the analysis, and then recombined them later once I had the NMDS scores. I created species vectors with the envfit() function and permutations of 999 to show which vascular plant species were most strongly associated with the changes in community composition across sites. To make my biplot cleaner, I created vectors on my biplot only for the top 10 species that had the most significant r2 and p-values (less than or equal to 0.05). These species were Agrocri (Agropyron cristatum ssp. Pectiniforme), Hespcom (Hesperostipa comata), Taraoff (Taraxacum officinale), Boutgra (Bouteloua gracilis), Pascsmi (Pascopyrum smithii), Tragdub (Tragopogon dubius), Nassvir (Nassella viridula), Ceraarv (Cerastium arvense), Descsop (Descurainia sophia), and Liatpun (Liatris punctata). The vectors were also scaled for readability, and confidence ellipses (95% confidence levels) were added to help with visualizing differences between sample locations and age classes. The stress level for my NMDS is 0.20, which, though not ideal, makes sense given the small sample size and variability in my dataset, and should allow me to interpret any results.

Both my NMDS and PCoA biplots are included in the results section.

Page updated

Report abuse