The geomorphic classes assigned to each lake datapoint overlap notably with the mapped surficial geology units in which each datapoint sits. Most organic lakes occur within organic deposits, and most moraine lakes occur within moraine deposits of varying geomorphology (Mk, Mh, Mp, Mv), with the next two largest contributions coming from glaciofluvial sediments that contained ice and marine sediments reworked by glacial activity (Gx, Yk). Two geomorphic classes (alluvial plain and ice-contact deposit) each occur entirely within one surficial geology class (Ap, ICD).
Three surficial geology classes were excluded from analysis due to small sample size: Yk, Mv and C.
The major ions and nutrients analyzed in this sampling campaign are shown in Table 4. Due to inconsistencies between sampling campaigns and budget constraints, not all lake samples were analyzed for all chemical components (Figure 4). Notably, only one RLL sample had data for the majority of the chemical suite.
Two sets of variables were therefore used: a limited set that retained the RLL class, and a broader suite that excluded all RLL samples.
Fortunately, the missing data were more evenly distributed across geological units, and two of the units with the most missing data (Yk and C) were omitted entirely due to small sample size (Figure 5).
For the variables with missing values, imputation with missForest (a random-forest-based imputation package) was tested by artificially removing known datapoints and comparing the imputed values to the true values. These tests indicated that Alk, ColourAp, ColourTrue, Cond, DN, Hard, pH, SO and TDS could be imputed to fill in missing values. Imputations of DP, F, NH3, NO3 and Si were not satisfactory; for these variables, the choice between omitting the variable and removing only the NA-containing datapoints was made case by case.
Because the RLL class would contribute only one datapoint (sample size of 11, with ~90% of datapoints missing for the majority of chemical parameters in Figure 3), this class was omitted from the data subset undergoing imputation.
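The mask-and-compare check described above can be sketched in a few lines. This is only an illustration of the scoring step, not the procedure used in the study: the median imputer is a stand-in for missForest, the function names, the 20% masking fraction, the normalised RMSE score and the sample values are all assumptions made for the example.

```python
import math
import random
import statistics


def median_imputer(column):
    """Stand-in imputer (NOT missForest): fill None with the observed median."""
    observed = [v for v in column if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in column]


def masked_nrmse(column, imputer, frac=0.2, seed=0):
    """Hide a fraction of the known values, impute them, and score the result.

    Returns the RMSE between imputed and true values, divided by the
    standard deviation of the true values that were hidden.
    """
    rng = random.Random(seed)
    known = [i for i, v in enumerate(column) if v is not None]
    n_hide = max(2, int(len(known) * frac))
    hidden = rng.sample(known, n_hide)

    # Artificially remove the selected known values.
    masked = list(column)
    for i in hidden:
        masked[i] = None

    imputed = imputer(masked)
    truth = [column[i] for i in hidden]
    guess = [imputed[i] for i in hidden]

    rmse = math.sqrt(sum((g - t) ** 2 for g, t in zip(guess, truth)) / n_hide)
    spread = statistics.pstdev(truth)
    return rmse / spread if spread > 0 else float("inf")


# Hypothetical conductivity-like values with two genuine gaps (None).
cond = [112.0, 98.5, None, 140.2, 87.3, 101.9, None, 133.4, 95.0, 120.7, 108.8, 91.6]
score = masked_nrmse(cond, median_imputer)
```

Any imputer with the same signature (a list containing `None` in, a filled list out) can be passed in place of `median_imputer`, so the same score can be compared across variables when deciding which ones impute acceptably.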
The majority of the lake chemistry data were right-skewed (Figure 8) and were transformed to approximate a normal distribution (Figure 9) using square-root transformations with a constant value determined through manual trial and error; the minimum value of each variable was calculated to ensure that all datapoints remained above zero before transformation. Transformations were tested on each variable before imputation and applied after imputation of the dataset. The transformations are listed in Table 3, below.
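As a rough illustration of this kind of transformation, the sketch below assumes the constant is added before taking the square root, i.e. sqrt(x + c), and uses skewness as a simple check on the result. The additive form, the function names, the constant 0.1 and the sample values are assumptions for the example, not values from the study.

```python
import math


def skewness(values):
    """Population skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def shifted_sqrt(values, constant):
    """Apply sqrt(x + constant), refusing shifts that leave any value <= 0."""
    if min(values) + constant <= 0:
        raise ValueError("constant too small: shifted minimum must exceed zero")
    return [math.sqrt(v + constant) for v in values]


# Hypothetical right-skewed concentrations, including a zero reading.
raw = [0.0, 0.4, 0.5, 0.7, 0.9, 1.1, 1.6, 2.3, 4.8, 9.5]
transformed = shifted_sqrt(raw, constant=0.1)

# Skewness closer to zero after the transform indicates a more symmetric shape.
skew_before = skewness(raw)
skew_after = skewness(transformed)
```

Checking the per-variable minimum first, as the guard in `shifted_sqrt` does, mirrors the step of confirming that every datapoint stays above zero before the transformation is applied.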
For the subset of datapoints containing the RLL class, TN, TP, Ca, DOC, K, Mg, and Na were used; none of these variables required imputation.
For the subset of datapoints omitting the RLL class, all variables were used, subject to the exceptions listed in Table 4. F was omitted because of inconsistencies in the data itself, likely reflecting error in sample handling. Si was omitted because of the number of rows that would have been lost had it been retained.
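The trade-off behind a decision like the one for Si can be made concrete by counting how many complete rows survive with and without the variable. The following is a minimal sketch with hypothetical sample values and a hypothetical helper name; it is not the study's data or code.

```python
def complete_rows(rows, variables):
    """Count rows with no missing (None) value in any of the given variables."""
    return sum(all(row.get(v) is not None for v in variables) for row in rows)


# Hypothetical samples; None marks a missing measurement.
samples = [
    {"Ca": 12.1, "Mg": 3.4, "Si": 1.2},
    {"Ca": 9.8, "Mg": 2.9, "Si": None},
    {"Ca": 15.0, "Mg": 4.1, "Si": None},
    {"Ca": 11.3, "Mg": 3.0, "Si": 0.9},
    {"Ca": 8.7, "Mg": None, "Si": None},
]

with_si = complete_rows(samples, ["Ca", "Mg", "Si"])  # rows kept if Si stays
without_si = complete_rows(samples, ["Ca", "Mg"])     # rows kept if Si is dropped
rows_lost_to_si = without_si - with_si
```

When the number of rows lost is large relative to the dataset, dropping the variable preserves more samples than dropping the incomplete rows.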
This principal component analysis was not the objective of the research, but it does allow for better visualization and understanding of the lake chemistry data. When all chemical variables were included in the analysis (which required omitting the RLL-classified datapoints), overall variation between samples occurred in terms of cations (roughly parallel to PC1) and nutrients (roughly parallel to PC2). TDS, the mass of dissolved residue left after evaporating a given volume of water, sits between these two axes. One geomorphic class, polygonal patterned ponding, exhibited much more variation than the other classes.