Data

Data tables

Two data tables summarize fungal endophyte abundance, terpene (monoterpene and sesquiterpene) concentration, and their proportional values across all study sites.

Table-2 Simplified data showing total fungal endophyte abundance, total monoterpene/sesquiterpene concentration, and the abundance of each identified fungal endophyte genus in all 30 spruce trees across six study sites in five provinces. Twenty-nine fungal endophyte genera were identified. Fungal endophyte abundance is given as the number of amplicon sequence variants (ASVs), and monoterpene and sesquiterpene concentrations are given in ng/mg.

Table-3 Simplified data showing total fungal endophyte abundance, total monoterpene/sesquiterpene concentration, and the concentration of each detected terpene in all 30 spruce trees across six study sites in five provinces. Twenty-eight different monoterpenes/sesquiterpenes were detected. Monoterpene and sesquiterpene concentrations are given in ng/mg, and total fungal endophyte abundance is given as the number of amplicon sequence variants (ASVs).

Sampling units and variables

In this study, each location is a sampling unit. There are six locations in five provinces, so there are six sampling units, and each sampling unit contains five individual samples (30 trees in total).

The predictor variable in this study is location, which is a categorical variable. The response variables are fungal endophyte abundance (table-2) and terpene concentration (table-3). Both are treated as continuous variables.

Data exploration

Boxplots of endophytes and terpenes (monoterpene/sesquiterpene) in each location


Figure-4 Boxplot of the abundance of each fungal endophyte genus in six locations (AB-S: Alberta-Slave lake forest; NB-U: New Brunswick-Upper Green River; ON-T: Ontario-Twist lake; QU-C: Quebec-Cimon; QU-D: Quebec-Dasserat; SK-O: Saskatchewan-Old channel river).

Figure-5 Boxplot of the concentration of each monoterpene in six locations (AB-S: Alberta-Slave lake forest; NB-U: New Brunswick-Upper Green River; ON-T: Ontario-Twist lake; QU-C: Quebec-Cimon; QU-D: Quebec-Dasserat; SK-O: Saskatchewan-Old channel river).

Figure-6 Boxplot of the concentration of each sesquiterpene in six locations (AB-S: Alberta-Slave lake forest; NB-U: New Brunswick-Upper Green River; ON-T: Ontario-Twist lake; QU-C: Quebec-Cimon; QU-D: Quebec-Dasserat; SK-O: Saskatchewan-Old channel river).

The lower border of the box marks the first quartile (25% of the data fall below it), and the upper border marks the third quartile (75% of the data fall below it). The solid line inside each box is the median, and the whiskers extend to the lowest and greatest values that are not outliers. The dots above and below the boxes are outliers.
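
The quantities read off a boxplot can be reproduced with a few lines of Python. This is a minimal sketch, not the code used to draw the figures: the function name boxplot_stats is hypothetical, the example values are made up, and outliers are assumed to follow the common 1.5 x IQR rule, which the figures may or may not use.

```python
from statistics import quantiles


def boxplot_stats(values, whisker=1.5):
    """Return the quantities drawn in a boxplot for one group of values."""
    data = sorted(float(v) for v in values)
    # 'inclusive' interpolates linearly between the observed values.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: points beyond 1.5 * IQR from the box count as outliers.
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical values for five trees at one site (not taken from the data tables).
stats = boxplot_stats([2, 3, 3, 4, 20])
print(stats)
```

With only five values per site, the quartiles sit on or very close to observed data points, which is one reason a median can coincide with a box border.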

Figure-4 shows the boxplots of fungal endophyte abundance for each genus across six study sites. Many genera contain a large number of 0 values, which suggests using an ordination based on the Bray-Curtis distance, such as principal coordinate analysis (PCoA) and gradient analysis. The large fluctuations in fungal endophyte abundance are consistent with the small sample size. Some genera have outliers, and the distribution and abundance vary considerably among genera. So, a permutational multivariate analysis of variance (perMANOVA) followed by univariate pairwise comparisons is suggested to test the diversity and variation of fungal endophyte abundance among six study sites.

Figures-5 and 6 show the boxplots of the concentration of each terpene compound in six study sites. Nearly half of the monoterpene compounds have 0 concentration values across the six study sites, while sesquiterpene concentrations contain only a few 0 values. In addition, the dataset has upper and lower outliers (figures-5 and 6). These findings also suggest that a principal coordinate analysis (PCoA) or gradient analysis based on the Bray-Curtis distance is the better choice here. A perMANOVA followed by univariate pairwise comparisons is likewise suggested to test the diversity and variation of terpene (monoterpene/sesquiterpene) concentration among six study sites.
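
To illustrate why the Bray-Curtis distance suits data with many 0 values, the sketch below computes it from its standard definition, the sum of absolute differences divided by the sum of all values in both samples. The function names and the example counts are hypothetical, and values are assumed to be non-negative.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: 0 = identical samples, 1 = nothing shared."""
    if len(x) != len(y):
        raise ValueError("samples must have the same number of variables")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("undefined when both samples contain only zeros")
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def distance_matrix(samples):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    return [
        [0.0 if i == j else bray_curtis(samples[i], samples[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical ASV counts for three trees (rows) and four genera (columns).
trees = [
    [10, 0, 5, 0],
    [0, 0, 5, 0],
    [0, 8, 0, 2],
]
dist = distance_matrix(trees)
for row in dist:
    print(row)
```

A variable that is 0 in both samples adds nothing to the numerator or the denominator, so shared absences do not make two trees look more alike.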

Boxplot of total fungal endophyte abundance and total terpene concentration

Figure-7 Boxplot of total fungal abundance, monoterpene concentration, and sesquiterpene concentration in six study sites. a) Total fungal abundance (ASVs) in six locations. b) Total sesquiterpene concentration (ng/mg) in six locations. c) Total monoterpene concentration (ng/mg) in six locations. (AB-S: Alberta-Slave lake forest; NB-U: New Brunswick-Upper Green River; ON-T: Ontario-Twist lake; QU-C: Quebec-Cimon; QU-D: Quebec-Dasserat; SK-O: Saskatchewan-Old channel river)

As the sample size of each study site is small (n=5) and the data contain many zeros, boxplots of total abundance and total concentration were made to show the variation within each study site. As in figures-4 to 6, the box spans the first to the third quartile, the solid line inside each box is the median, the whiskers extend to the lowest and greatest values that are not outliers, and the dots above and below the boxes are outliers.

In figure-7 a), total fungal endophyte abundance has no outliers at any of the six study sites. Most medians coincide with a border of the box, and only Dasserat in Quebec shows the median inside the box. Upper Green River in New Brunswick has the lowest total fungal endophyte abundance, while Slave lake in Alberta shows markedly larger variation than the other sites. Figures-7 b) and 7 c) suggest that four sites show similar variation in total monoterpene and sesquiterpene concentrations, and that the Old channel river in Saskatchewan has the largest variation in both. Total sesquiterpene concentration has two upper outliers, at the Quebec-Dasserat and Saskatchewan-Old channel river sites, and one lower outlier at Ontario-Twist lake. Total monoterpene concentration has upper and lower outliers at New Brunswick-Upper Green River and lower outliers at Ontario-Twist lake and Quebec-Dasserat.

Principal coordinate analysis (PCoA)

Figure-8 Principal coordinate analysis (PCoA) of total monoterpene/sesquiterpene concentration with the abundance of different fungal endophyte genera in six study sites. (AB-S: Alberta-Slave lake forest; NB-U: New Brunswick-Upper Green River; ON-T: Ontario-Twist lake; QU-C: Quebec-Cimon; QU-D: Quebec-Dasserat; SK-O: Saskatchewan-Old channel river)

Figure-9 Principal coordinate analysis (PCoA) of total fungal endophyte abundance with different terpenes in six study sites. (AB-S: Alberta-Slave lake forest; NB-U: New Brunswick-Upper Green River; ON-T: Ontario-Twist lake; QU-C: Quebec-Cimon; QU-D: Quebec-Dasserat; SK-O: Saskatchewan-Old channel river)

Based on the histograms and boxplots, a principal coordinate analysis (PCoA) of fungal endophyte abundance and terpene concentration in six study sites was applied. The abundance of each fungal endophyte genus was only weakly related to location (figure-8). However, because the sample size of each study site is small (n=5) and the dataset contains many 0 values, a grouped bar chart and a perMANOVA followed by univariate pairwise comparisons are more reliable for assessing the variation in fungal endophyte abundance. Some fungal endophyte genera show strong positive correlations with terpene concentrations.

The PCoA of total fungal endophyte abundance and the concentration of each terpene compound, created from data table-3, also showed a weak ordination (figure-9). A data transformation may improve the ordination.
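
For readers unfamiliar with how PCoA turns a distance matrix into plot coordinates, the following is a minimal sketch of the classical procedure (double-centre the squared distances, then take the leading eigenvectors). It assumes NumPy is available; the function name pcoa and the example matrix are hypothetical, and it is not the routine used to produce the figures. Negative eigenvalues, which can occur with Bray-Curtis distances, are simply clipped to zero here, which is a simplification.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical PCoA: return sample coordinates and explained proportions."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distance matrix (Gower centring).
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    # eigh returns ascending eigenvalues; put the largest first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Simplification: negative eigenvalues on the kept axes are set to zero.
    kept = eigvals[:n_axes].clip(min=0)
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    explained = kept / eigvals[eigvals > 0].sum()
    return coords, explained


# Hypothetical distance matrix for three samples (a 3-4-5 triangle).
d = np.array([[0.0, 3.0, 4.0],
              [3.0, 0.0, 5.0],
              [4.0, 5.0, 0.0]])
coords, explained = pcoa(d)
print(coords)
print(explained)
```

The proportion of variation carried by the first two axes gives a rough measure of how much of the distance structure a two-dimensional plot actually shows.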

Log transformation and further PCoA

Figure-10 Log transformed principal coordinate analysis (PCoA) of total fungal endophyte abundance and terpene concentration in six study sites. (Left is fungal genera abundance with total monoterpene/sesquiterpene concentration, right is the concentration of each terpene with total fungal abundance; AB-S: Alberta-Slave lake forest; NB-U: New Brunswick-Upper Green River; ON-T: Ontario-Twist lake; QU-C: Quebec-Cimon; QU-D: Quebec-Dasserat; SK-O: Saskatchewan-Old channel river)

Log transformation gives a clear view of the relationship between fungal endophyte genera and total monoterpenes/sesquiterpenes, but it fails to give a clear view of the relationship between each terpene compound and total fungal abundance (figure-10). This indicates that a different transformation is required for better ordination.
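
Because the dataset contains many 0 values and log(0) is undefined, a log transformation of such data is usually applied as log(x + 1). The sketch below assumes that form and the natural logarithm; the text does not state which variant was used, and the function name and values are hypothetical.

```python
import math


def log_transform(table):
    """Apply log(x + 1) to every value; zeros stay zero."""
    return [[math.log1p(v) for v in row] for row in table]


# Hypothetical terpene concentrations (ng/mg) for two trees.
raw = [
    [0.0, 9.0, 99.0],
    [0.0, 0.0, 999.0],
]
logged = log_transform(raw)
for row in logged:
    print([round(v, 3) for v in row])
```

The transformation compresses large values strongly, so a compound at 99 ng/mg ends up only twice as large as one at 9 ng/mg.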

Square root transformation and further PCoA

Figure-11 Square root transformed principal coordinate analysis (PCoA) of total fungal endophyte abundance and terpene concentration in six study sites. (Left is fungal genera abundance with total monoterpene/sesquiterpene concentration, right is the concentration of each terpene with total fungal abundance; AB-S: Alberta-Slave lake forest; NB-U: New Brunswick-Upper Green River; ON-T: Ontario-Twist lake; QU-C: Quebec-Cimon; QU-D: Quebec-Dasserat; SK-O: Saskatchewan-Old channel river)

Square root transformation gives a worse ordination in both PCoA plots (figure-11). So, an inverse transformation was applied to try to visualize the ordinations better.

Inverse transformation and further PCoA

Figure-12 Inverse transformed principal coordinate analysis (PCoA) of total fungal endophyte abundance and terpene concentration in six study sites. (Left is fungal genera abundance with total monoterpene/sesquiterpene concentration, right is the concentration of each terpene with total fungal abundance; AB-S: Alberta-Slave lake forest; NB-U: New Brunswick-Upper Green River; ON-T: Ontario-Twist lake; QU-C: Quebec-Cimon; QU-D: Quebec-Dasserat; SK-O: Saskatchewan-Old channel river)

After inverse transformation, the visualization of both plots improved, but it was still hard to find the relationship between fungal endophytes and the total concentration of monoterpene/sesquiterpene (figure-12). So, Hellinger transformation was introduced to obtain a better ordination.
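
An inverse transformation needs care with 0 values, because 1/0 is undefined. The sketch below assumes a shifted form, 1 / (x + offset), purely for illustration; the text does not say how zeros were handled, and the function name and values are hypothetical.

```python
def inverse_transform(row, offset=1.0):
    """Return 1 / (x + offset) for each value; a positive offset keeps zeros finite."""
    if offset <= 0:
        raise ValueError("offset must be positive when the data contain zeros")
    return [1.0 / (v + offset) for v in row]


# Hypothetical concentrations, including an absent compound (0).
values = [0.0, 1.0, 3.0]
inverted = inverse_transform(values)
print(inverted)
```

Note that the inverse reverses the ordering: the absent compound receives the largest transformed value. This may be worth keeping in mind when reading an ordination of inverse-transformed data that contains many zeros.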

Hellinger transformation and further PCoA

Figure-13 Hellinger transformed principal coordinate analysis (PCoA) of total fungal endophyte abundance and terpene concentration in six study sites. (Left is fungal genera abundance with total monoterpene/sesquiterpene concentration, right is the concentration of each terpene with total fungal abundance; AB-S: Alberta-Slave lake forest; NB-U: New Brunswick-Upper Green River; ON-T: Ontario-Twist lake; QU-C: Quebec-Cimon; QU-D: Quebec-Dasserat; SK-O: Saskatchewan-Old channel river)

Hellinger transformation was applied so that the terpene data would satisfy the assumptions of principal coordinate analysis (PCoA), but the Hellinger-transformed PCoA plot still showed a poor ordination (figure-13). So, a gradient analysis in which vectors are added in two steps is needed for ordination and to find the relationship between fungal endophyte abundance and terpene concentration.
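
The Hellinger transformation replaces each value by the square root of its share of the sample total. A minimal sketch of that definition follows; the function name and the example table are hypothetical, and rows that contain only zeros are left as zeros, which is an assumption rather than a rule from the text.

```python
import math


def hellinger(table):
    """Square root of each value divided by its row (sample) total."""
    transformed = []
    for row in table:
        total = sum(row)
        if total == 0:
            # Assumption: an all-zero sample stays all zero.
            transformed.append([0.0 for _ in row])
        else:
            transformed.append([math.sqrt(v / total) for v in row])
    return transformed


# Hypothetical values for three trees (rows) and three compounds (columns).
table = [
    [1, 3, 0],
    [0, 0, 0],
    [4, 4, 8],
]
transformed = hellinger(table)
for row in transformed:
    print([round(v, 3) for v in row])
```

After the transformation the squared values of each non-empty row sum to 1, so samples are compared on composition rather than on total amount. The transformation is commonly paired with the Euclidean distance rather than with Bray-Curtis.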

Limitations

One major limitation of these datasets is that the sample size at each study site is small (n=5) and that there are many 0 values. As a result, the correlations between study sites and fungal abundance/terpene concentration are weak. A grouped bar chart and a perMANOVA followed by univariate pairwise comparisons are more reliable and straightforward ways of showing the variation in fungal endophyte abundance and terpene concentration.
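
The idea behind a one-way perMANOVA can be sketched in plain Python: compute a pseudo-F ratio from the distance matrix, then shuffle the site labels many times to see how often a ratio at least as large arises by chance. This is an illustrative sketch under stated assumptions, not the implementation of any statistics package; the function names, the seed, and the example distances are all hypothetical.

```python
import random
from itertools import combinations


def pseudo_f(dist, groups):
    """Pseudo-F ratio of among-group to within-group variation."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for label in labels:
        members = [i for i in range(n) if groups[i] == label]
        ss_within += sum(
            dist[i][j] ** 2 for i, j in combinations(members, 2)
        ) / len(members)
    ss_among = ss_total - ss_within
    a = len(labels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, n_perm=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed arrangement counts as one permutation.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: six samples on a line, two sites of three samples each.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in positions] for a in positions]
groups = ["A", "A", "A", "B", "B", "B"]
f_obs, p_value = permanova(dist, groups)
print(f_obs, p_value)
```

With very small groups the number of distinct label arrangements is limited, which caps how small the p-value can become; in this toy example only 20 distinct splits exist.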

Moreover, the dataset is not normally distributed, and none of the four transformations (log, square root, inverse, and Hellinger) produced a satisfactory ordination. This can limit the accuracy of the ordination.