Data

Data tables

The raw data for this project consisted of soil, cover, tree basal area, and age data for well pads and reference plots in the Lower Foothills and Central Mixedwood natural subregions (Table 1). The data was subset to include only well pads, as reference data was not required for this project. In addition, fern, clubmoss, and lichen data was excluded because most or all values for those variables were zero.

Data was split into two datasets, one with age and soil biophysical variables and a second with vegetation cover data. This aided in visualizing relationships in the data through the lens of soil characteristics and vegetation separately to better inform recommendations for reclamation. 
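The subsetting and splitting described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the workflow actually used for this project. All values are invented, and the column names site_id, site_type, Age, Shrub, Fern, Clubmoss, and Lichen are assumptions; only pH_0, TOC_0, and tph_total come from the abbreviation list.

```python
# Illustrative rows only: every value is invented, and "site_id", "site_type",
# "Age", "Shrub", "Fern", "Clubmoss", and "Lichen" are assumed column names.
rows = [
    {"site_id": "W1", "site_type": "well", "Age": 12, "pH_0": 6.1, "TOC_0": 2.4,
     "tph_total": 0, "Shrub": 5.0, "Fern": 0.0, "Clubmoss": 0.0, "Lichen": 0.0},
    {"site_id": "W2", "site_type": "well", "Age": 30, "pH_0": 5.8, "TOC_0": 3.1,
     "tph_total": 400, "Shrub": 12.5, "Fern": 0.0, "Clubmoss": 0.0, "Lichen": 0.0},
    {"site_id": "R1", "site_type": "reference", "Age": None, "pH_0": 5.5, "TOC_0": 4.0,
     "tph_total": 1200, "Shrub": 20.0, "Fern": 1.0, "Clubmoss": 0.0, "Lichen": 0.5},
]

SITE_COLUMNS = ["Age", "pH_0", "TOC_0"]   # age and soil variables
COVER_COLUMNS = ["tph_total", "Shrub"]    # tree and vegetation cover variables
# Fern, Clubmoss, and Lichen appear in neither list, so they are dropped.


def split_datasets(rows):
    """Keep well rows only, then return (site_data, cover_data) keyed by site_id."""
    wells = [r for r in rows if r["site_type"] == "well"]
    site_data = {r["site_id"]: {c: r[c] for c in SITE_COLUMNS} for r in wells}
    cover_data = {r["site_id"]: {c: r[c] for c in COVER_COLUMNS} for r in wells}
    return site_data, cover_data


site_data, cover_data = split_datasets(rows)
```

Keying both datasets by the same site identifier keeps them linkable, so a soil variable can later be related to a cover variable for the same well pad.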

Abbreviations used in Tables 1 and 2 are as follows: 

LFHmean_mm: LFH soil horizon depth (organic layer)

BD_0-15cmdepth_cm: Bulk density of the soil - 0-15 cm depth

pH_0: measured pH of the soil - 0-15 cm depth

TOC_0: Total organic carbon in the soil - 0-15 cm depth

TN_0: Total nitrogen in the soil - 0-15 cm depth

CNratio_0: Carbon to nitrogen ratio of the soil - 0-15 cm depth

tph_total: Number of live and dead trees/ha

LiveBA_m2/ha: Live basal area (BA; m2/ha) for all trees combined.

DeadBA_m2/ha: Dead basal area (BA; m2/ha) for all trees combined.

Table 1. Condensed site / environmental data for reference and reclaimed well sites including age, soil, and cover data. Columns not shown include chemical properties such as total Nitrogen, total organic Carbon, Carbon-Nitrogen ratio, and pH.

Units

All soil data and site age are continuous quantitative variables with units described in Table 2. Cover data contains continuous quantitative data in the form of trees/hectare and square meters per hectare as well as percent cover of life-form plant groups (visually determined).

Table 2. Units for environmental dataset including categorical vs. quantitative designation.

Age and Soil Data

Age and soil data (Table 3), also referred to as Site Data, includes the site age (years since reclamation) and soil physical (bulk density), chemical (pH, total organic carbon, total nitrogen, and carbon-nitrogen ratio), and biological (organic layer depth) properties.

The youngest site in this study was 7 years post-reclamation and the oldest was 48 years post-reclamation; however, most of the sites fall around 15-20 years post-reclamation (Fig 8). The median age was 17.5 years since reclamation certification (Fig 8). A limitation to assessing the effect of time since reclamation is that the sites are not evenly distributed across ages. Ideally, sites would span the 7 to 48 year post-reclamation range evenly rather than being concentrated around ~15 years post-reclamation. The scarcity of older sites is likely due to reclamation still being a relatively new practice for oil and gas disturbance in 1966 (when the oldest well site in this study was certified reclaimed). Although there are fewer data points for sites older than 30 years post-reclamation, they provide important insights into the successional trajectories of reclaimed well pads.
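An uneven age distribution of this kind can be checked with a simple binned count and a median. The sketch below is illustrative only: the ages are invented and are not the study data, and the 5-year bin width is an assumption rather than the binning used for Fig 8.

```python
from collections import Counter
from statistics import median

# Invented ages (years since reclamation) for illustration; not the study data.
ages = [8, 11, 14, 15, 16, 18, 19, 22, 31, 45]

BIN_WIDTH = 5  # assumed bin width in years


def age_histogram(ages, bin_width=BIN_WIDTH):
    """Count sites per bin; each key is the lower edge of a bin."""
    return Counter((age // bin_width) * bin_width for age in ages)


counts = age_histogram(ages)

# Text histogram: one '#' per site in each occupied bin.
for lower in sorted(counts):
    print(f"{lower:>2}-{lower + BIN_WIDTH - 1:<2} {'#' * counts[lower]}")

print("median age:", median(ages))
```

Bins with no sites are simply absent from the counts, which makes gaps in the age range easy to spot.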

It is important to acknowledge that large differences in reclamation practices are not captured in this dataset but likely contribute to much of the variance in the cover and soil data of these sites. While reclamation criteria set the standards for reclamation work, there are many different ways to meet those requirements, and different companies and consultants can approach a reclamation project in their own way. However, information on these practices is not easily obtained, as past reclamation records can be inconsistent or nonexistent.

Table 3. Age (years since reclamation) and soil biological, physical, and chemical data for reclaimed well pads. Site ID includes whether the sites were in the boreal or foothill natural region. 

Fig 8. Histogram showing the age distribution (years since reclamation) for the reclaimed well pads in the Fox Creek and Slave Lake study areas. 

Cover Data

Cover data (Table 4) includes percent cover of herbs, shrubs, graminoids, and non-native species, and tree cover expressed as density per hectare and basal area (both live and dead). The inclusion of dead woody debris is important as a source of structure and organic material. The tree, snag, and stump measurements are more precise than the vegetation cover values, which were estimated visually and not always by the same observers, although the observers were trained together and calibrated through group training.

Table 4. Tree density and cover data and vegetation cover data for reclaimed well pads. Basal area is given in m2/ha. Percent cover data was aggregated from 5 x 5 m ocular cover estimates.

Predictor and Response Variables 

Tree and cover data (response variables) are being assessed in relation to predictor variables, which include the natural region (Boreal vs. Foothill) of the well pads, site age (years since reclamation), and site biophysical data (including soil nutrients, bulk density, pH, and organic layer depth). Predictor variables are observed, not manipulated. Natural subregion is the only categorical variable; all others are continuous numeric.

Exploratory graphics 

DISTRIBUTIONS

Preliminary assessments (boxplots, histograms) of the distributions of both the soil/age and cover data showed skewed data. Scaled boxplots (Figs 9 and 10) were used to identify which variables were already relatively normally distributed and which could be transformed. Age, organic layer depth, total organic Carbon, and total Nitrogen were all identified as having strong outliers or skewed distributions that could benefit from transformation (Fig 9). Bulk density, pH, and Carbon-Nitrogen ratio were not transformed because their distributions were already satisfactory.

The scaled boxplots for cover data revealed skewed data for the tree and shrub variables, which reflects the lack of woody species on many reclaimed well pads in arrested or slowed succession but could also reflect younger sites in early successional stages (Fig 10). Live and dead trees per hectare, live basal area, dead basal area, and shrub % cover were therefore transformed, while herb, graminoid, and non-native % cover were left untransformed as they were already relatively normally distributed.
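The screening step above (scale each variable, then look for skew) can be sketched numerically. This is a hedged illustration, not the procedure used for Figs 9 and 10: it assumes z-score scaling (the scaling method is not stated here), uses the moment-based skewness coefficient as a stand-in for visual inspection of boxplots, and uses invented values.

```python
from statistics import mean, stdev


def zscore(values):
    """Scale to mean 0 and sample standard deviation 1 (assumed scaling method)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5; 0 for a symmetric sample."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Invented percent-cover values for illustration; not the study data.
shrub_cover = [0.0, 0.0, 0.5, 1.0, 2.0, 3.0, 40.0]   # many low values, one high
herb_cover = [20.0, 25.0, 30.0, 35.0, 40.0]          # symmetric around 30

scaled_shrub = zscore(shrub_cover)
print("shrub skewness:", round(skewness(shrub_cover), 2))
print("herb skewness:", round(skewness(herb_cover), 2))
```

Scaling puts variables with different units on a common axis so their boxplots can be compared side by side; it does not change the shape of a distribution, which is why skewed variables still need a separate transformation.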

Fig 9. Boxplots of scaled soil data and site age showing skewed distributions (namely Age, Organic layer depth, Total organic Carbon, and Total Nitrogen) that were selected for transformation toward more normal distributions.

Fig 10. Boxplots of scaled percent cover and basal area data showing skewed data for all tree measurements (Live basal area, Dead basal area, Live+dead trees/ha), specifically zero-inflated data for trees/ha, and skewed Shrub % cover, all of which were selected for transformation toward normality.

DATA TRANSFORMATIONS

The following transformations were made to normalize both datasets: 

Soil and age data (Fig 11): log transformations of total organic Carbon, site age, and organic layer depth, and 1-(1/(Total Nitrogen+0.1))+5 for total Nitrogen (Fig 12);

Cover data (Fig 13): log transformations of live and dead trees/hectare+0.1, dead basal area+0.1, and shrub cover+0.1, and square-root transformation of live basal area+0.1.

Even with log transformation, dead basal area remained zero-inflated and is still skewed in this dataset. This is ecologically important, however, as it shows that many well pads had no trees present, likely due to young age or the competitive advantage of planted agronomic species, which keep the well sites in arrested succession.
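The transformations listed above can be written as small functions, which also shows why zero-inflation survives a log transformation. This is a sketch under stated assumptions: natural logarithms are assumed (the base is not stated), the nitrogen function follows the formula exactly as written in the list above, and the basal area values are invented.

```python
import math

OFFSET = 0.1  # constant added before transforming, as in the list above


def log_offset(x, offset=OFFSET):
    """log(x + offset); the offset keeps zeros defined (natural log assumed)."""
    return math.log(x + offset)


def sqrt_offset(x, offset=OFFSET):
    """Square root of (x + offset)."""
    return math.sqrt(x + offset)


def nitrogen_transform(tn, offset=OFFSET):
    """1 - (1 / (TN + offset)) + 5, following the formula as written above."""
    return 1 - (1 / (tn + offset)) + 5


# Invented dead basal area values (m2/ha) for illustration; not the study data.
dead_ba = [0.0, 0.0, 0.0, 0.2, 1.5, 6.0]
transformed = [log_offset(x) for x in dead_ba]

# Every zero maps to the same value, log(0.1), so the spike at zero remains.
zero_value = math.log(OFFSET)
print("zeros before:", dead_ba.count(0.0))
print("values at log(0.1) after:", transformed.count(zero_value))
```

Because a log transformation is monotonic, it compresses the upper tail but cannot spread out identical values; sites with zero dead basal area stay stacked at a single point, near -2.3 with natural logs and an offset of 0.1.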

Fig 11. Boxplots of scaled, transformed soil biophysical and site age data. Transformation reduced the influence of outliers, so the whiskers are less affected by extreme values.

Fig 12. Example histogram of transformed data. Here, total Nitrogen was log and 1-reciprocal transformed, with constants added to avoid NaN or -Inf values.

Fig 13. Boxplots of scaled, transformed percent life-form plant cover and tree basal area data. Transformation reduced the influence of outliers, so the whiskers are less affected by extreme values. Note that even after transformation, Dead basal area is still zero-inflated and right skewed.