Water samples were collected from lakes across the NWT during the 2017, 2023, and 2024 field seasons in a collaboration between researchers from the University of Alberta, Aurora Research Institute, and Geological Survey of the Northwest Territories. Samples were submitted to Taiga Environmental Laboratory (commercial laboratory, NWT) for analysis of major ion, nutrient, and trace metal concentrations; the exact suite chosen for each lake varied due to both budget constraints and inconsistencies between sampling campaigns across years.
The present research uses a subset of this dataset, centered around the Mackenzie Valley and Beaufort Delta regions (Figure 1).
At the time of sampling, each lake was classified into a geomorphological category (reflecting its shape, depth, size, and the process that likely formed its basin) by trained Quaternary geologists, based on inspection of satellite data and site visits (see Table 1).
The lake chemistry data used in this analysis is, like many Northern datasets, limited in scope by the cost-prohibitive nature of Northern fieldwork. It therefore exemplifies the problem this research aims to address: increasing the accuracy of generalization under limited field data availability. The lakes sampled were clustered around road-accessible locations. As a result, samples may resemble each other not because they share a geological or geomorphic class, but because of geographic proximity, particularly where many lakes of a single class were sampled close together. The most severe example is the ice contact glaciofluvial (ICD) class of lakes (shown in purple, Figure 1); all lakes of this class occupy a very small geographic area and fall within a single mapped geological polygon. Generalizations about the chemical characteristics of ICD lakes as a whole may therefore not be supported.
The surficial geology class of each lake point in the 2017-2024 chemistry dataset (the type of sediment present at surface level, classified by the geological process that deposited it) was assigned within QGIS through a spatial join between the chemistry datapoint locations and a Geological Survey of Canada digital compilation (Côté et al., 2013) of the mapped surficial geological units of the Mackenzie Valley, Beaufort Delta, and surrounding regions. Mapping scale within the compilation ranges from 1:125,000 to 1:250,000, depending on the individual map (example mapsheet: Figure 2).
While the original lake chemistry dataset included samples outside the Mackenzie Valley and Beaufort Delta (namely lakes surrounding Yellowknife), equivalent digital surficial geology data at sufficient scale was not available for the Yellowknife region. During analysis, the lake chemistry dataset was therefore subset to include only points with an assigned surficial geology class.
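The join itself was performed in QGIS; the logic it implements can be sketched in plain Python as a point-in-polygon test followed by removal of unmapped points. This is only an illustration under stated assumptions: the unit codes, coordinates, field names, and the functions `point_in_polygon` and `spatial_join` are hypothetical, and real mapsheet polygons would also need holes, multipart geometries, and a shared coordinate reference system handled.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's y value can be crossed by a
        # horizontal ray cast to the right of the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(lakes, units):
    """Attach the code of the first polygon containing each lake (None if unmapped)."""
    joined = []
    for lake in lakes:
        code = None
        for unit in units:
            if point_in_polygon(lake["x"], lake["y"], unit["ring"]):
                code = unit["code"]
                break
        joined.append({**lake, "surficial_code": code})
    return joined


# Hypothetical toy data in arbitrary planar coordinates (not real map units).
units = [
    {"code": "Tb", "ring": [(0, 0), (4, 0), (4, 4), (0, 4)]},
    {"code": "GFh", "ring": [(4, 0), (8, 0), (8, 4), (4, 4)]},
]
lakes = [
    {"lake_id": "L01", "x": 1.0, "y": 2.0},
    {"lake_id": "L02", "x": 6.5, "y": 1.5},
    {"lake_id": "L03", "x": 9.0, "y": 9.0},  # outside mapped coverage
]

joined = spatial_join(lakes, units)
# Keep only lakes that received a surficial geology class.
mapped = [row for row in joined if row["surficial_code"] is not None]
```

In this toy case the third lake falls outside both polygons and is dropped, mirroring the exclusion of lakes without surficial geology coverage.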
Surficial geology is mapped using a series of letter codes which indicate the genetic origin of the sediment found at surface level (e.g. alluvial, glacial, lacustrine), its geomorphic characteristics (e.g. hummocks, thermokarst), and in some cases sediment texture. The mapsheets within the compilation differed both in the surficial geological code system used and in how the codes were stored in the shapefile attribute fields (i.e. some mapsheets separated origin, geomorphology, and texture into three columns, while others combined them into one). The codes therefore required standardization before use in this analysis.
To standardize the codes, surficial geology classes were first appended to the lake chemistry dataset. The unique codes and combinations of codes belonging to lake points from each mapsheet were then queried and cross-referenced with that mapsheet’s code system. From this, a set of replacement rules was developed to assign each lake a standardized genetic origin and morphological classification (Table 2).
Textural modifiers were excluded from this classification because they were used infrequently on some mapsheets. Three of the resulting surficial geology classes (colluvium, glaciomarine, and moraine veneer) were excluded from further analysis due to limited sample size.
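The standardization steps above can be illustrated with a small lookup-table sketch. Everything specific in it is an assumption made for illustration: the mapsheet names, raw codes, rule table, field names, and the `MIN_N` threshold are hypothetical and do not reproduce Table 2 or the sample-size cutoff actually used.

```python
from collections import Counter

# Hypothetical replacement rules: (mapsheet, raw code) -> (genetic origin, morphology).
RULES = {
    ("sheet_A", "Tb"): ("moraine", "blanket"),
    ("sheet_A", "Tv"): ("moraine", "veneer"),
    ("sheet_B", "GF.h"): ("glaciofluvial", "hummocky"),
}


def raw_code(record):
    """Build one lookup key whether a sheet stores a combined code or separate fields."""
    if "code" in record:
        return record["code"]
    # Texture is deliberately ignored: textural modifiers are not classified.
    return f'{record["origin"]}.{record["morphology"]}'


def standardize(record):
    """Apply the replacement rule for this record's mapsheet and raw code."""
    origin, morphology = RULES[(record["mapsheet"], raw_code(record))]
    return f"{origin} {morphology}"


# Hypothetical lake records from two mapsheets with different field layouts.
records = [
    {"lake_id": "L01", "mapsheet": "sheet_A", "code": "Tb"},
    {"lake_id": "L02", "mapsheet": "sheet_A", "code": "Tb"},
    {"lake_id": "L03", "mapsheet": "sheet_B", "origin": "GF", "morphology": "h", "texture": "s"},
    {"lake_id": "L04", "mapsheet": "sheet_B", "origin": "GF", "morphology": "h", "texture": "g"},
    {"lake_id": "L05", "mapsheet": "sheet_A", "code": "Tv"},
]

# Step 1: list the unique codes per mapsheet to cross-reference with each legend.
unique_codes = sorted({(r["mapsheet"], raw_code(r)) for r in records})

# Step 2: apply the replacement rules.
classes = [standardize(r) for r in records]

# Step 3: drop classes with too few lakes (MIN_N is an assumed threshold).
MIN_N = 2
counts = Counter(classes)
kept = [(r["lake_id"], c) for r, c in zip(records, classes) if counts[c] >= MIN_N]
```

Keying the rules on the mapsheet as well as the code keeps the same letter code from being misread where two sheets use different legends.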
Lake chemistry data was filtered to include only major ion and nutrient data, as trace metal concentrations were measured for only a limited number of lakes. In some cases, missing chemistry parameters for a given sample were estimated using RandomForest imputation. For each chemical parameter, imputation accuracy (observed vs predicted value) and the proportion of datapoints missing were used to decide whether to impute the missing values, remove the parameter entirely, or omit the samples with missing values from the analysis.
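A decision of this kind can be written as a simple rule on the two quantities named above. The sketch below is illustrative only: the use of R² as the accuracy measure, all thresholds, the order of the checks, and the function names are assumptions, not the criteria applied in this analysis.

```python
def proportion_missing(values):
    """Fraction of entries recorded as None."""
    return sum(v is None for v in values) / len(values)


def r_squared(observed, predicted):
    """Coefficient of determination between observed and imputed values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def decide(prop_missing, accuracy, max_missing=0.4, min_accuracy=0.7, small_gap=0.05):
    """Choose how to treat one chemical parameter (all thresholds are assumed)."""
    if prop_missing == 0:
        return "keep"
    if prop_missing > max_missing:
        return "remove parameter"  # too sparse to impute credibly
    if accuracy >= min_accuracy:
        return "impute"
    if prop_missing <= small_gap:
        return "omit incomplete samples"  # few samples lost by dropping them
    return "remove parameter"


# Hypothetical parameter with two of eight values missing.
values = [4.1, None, 3.8, 5.0, None, 4.4, 4.9, 3.6]
# Hypothetical check: known values that were masked, then predicted by the imputer.
observed = [4.1, 3.8, 5.0, 4.4]
predicted = [4.0, 3.9, 4.8, 4.5]

choice = decide(proportion_missing(values), r_squared(observed, predicted))
```

Writing the rule down explicitly makes the treatment of each parameter reproducible, whatever thresholds are eventually chosen.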
After imputation, the chemical data was normalized (details available within Data Exploration), scaled, and used within a preliminary Principal Component Analysis (PCA).
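To show what scaling followed by PCA involves, the sketch below z-scores each parameter and extracts the first principal component of the resulting correlation matrix by power iteration. It is a minimal stdlib illustration rather than the software used here: the concentrations are invented, the columns are assumed to be already normalized (the transformation is described under Data Exploration and is not reproduced), and the function names are hypothetical.

```python
from statistics import mean, stdev


def zscore(values):
    """Scale a column to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation_matrix(columns):
    """Covariance of z-scored columns, which equals the correlation matrix."""
    n = len(columns[0])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in columns]
        for ci in columns
    ]


def leading_component(matrix, iterations=500):
    """Power iteration for the dominant eigenvector and eigenvalue of a symmetric matrix."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        product = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = sum(p * p for p in product) ** 0.5
        vec = [p / norm for p in product]
    # Rayleigh quotient of the unit vector gives the eigenvalue.
    eigenvalue = sum(
        v * sum(m * w for m, w in zip(row, vec)) for v, row in zip(vec, matrix)
    )
    return vec, eigenvalue


# Hypothetical, already-normalized concentrations for five lakes.
table = {
    "Ca": [12.0, 30.0, 18.0, 40.0, 25.0],
    "Mg": [3.0, 9.0, 5.0, 11.0, 8.0],
    "Cl": [0.5, 2.5, 1.0, 4.0, 1.5],
}

scaled = [zscore(column) for column in table.values()]
corr = correlation_matrix(scaled)
pc1, eigenvalue = leading_component(corr)
# The trace of a correlation matrix equals the number of variables.
explained = eigenvalue / len(corr)
```

Scaling first means each parameter contributes equally to the components regardless of its concentration range; without it, the most abundant ions would dominate the first component.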
After inspection of PCA results, imputation accuracy, and proportion missing, two subsets of chemical parameters were chosen for canonical discriminant analysis. One subset maximized the number of datapoints included. The other included the majority of chemical parameters, but required the exclusion of one geomorphic class whose sample size became too small once samples with missing parameters were omitted. Canonical discriminant analysis was run on both variable subsets, first using geomorphic class as the grouping variable and then using surficial geology, and the results were compared.
Côté, M.M., Duchesne, C., Wright, J.F., and Ednie, M., 2013. Digital compilation of the surficial sediments of the Mackenzie Valley corridor, Yukon Coastal Plain, and the Tuktoyaktuk Peninsula; Geological Survey of Canada, Open File 7289. doi:10.4095/292494