Data

Data table

The data table (Table 4) includes 2095 plots of lodgepole pine site index together with annual climate data for 1961-1990. The company, plot, latitude, longitude, elevation and plot site index (plot_si) columns in this table come from the PGYI dataset, using the data grouping method. The climate variables I used come from the climate system ClimateAB v3.21. They are: mean annual temperature (MAT); mean warmest month temperature (MWMT); mean coldest month temperature (MCMT); the temperature difference between MWMT and MCMT, or continentality (TD); mean annual precipitation (MAP); mean summer (May to Sept.) precipitation (MSP); annual heat:moisture index (AHM); summer heat:moisture index (SHM); degree-days below 0°C, or chilling degree-days (DD.0); degree-days above 5°C, or growing degree-days (DD.5); the number of frost-free days (NFFD); frost-free period (FFP); and precipitation as snow (PAS). A sample of these data is shown in Table 4. The climate variables are the predictor variables and lodgepole pine site index is the response variable. The dataset is divided into two parts, managed stands and natural stands.
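
Several of the variables above are derived from the others, and a short sketch may make the relationships easier to follow. TD is defined in the text as MWMT minus MCMT. The AHM and SHM formulas used here, (MAT + 10) / (MAP / 1000) and MWMT / (MSP / 1000), are the definitions I understand the ClimateWNA/ClimateAB family of tools to use; they are an assumption and should be checked against the ClimateAB v3.21 documentation. The class name, function names and the example values are hypothetical and are not taken from the PGYI data.

```python
from dataclasses import dataclass


@dataclass
class ClimateNormals:
    """Hypothetical container for one plot's climate normals."""
    mat: float     # mean annual temperature (deg C)
    mwmt: float    # mean warmest month temperature (deg C)
    mcmt: float    # mean coldest month temperature (deg C)
    map_mm: float  # mean annual precipitation (mm)
    msp_mm: float  # mean summer (May to Sept.) precipitation (mm)


def continentality(c: ClimateNormals) -> float:
    """TD: temperature difference between MWMT and MCMT."""
    return c.mwmt - c.mcmt


def annual_heat_moisture(c: ClimateNormals) -> float:
    """AHM, assumed to be (MAT + 10) / (MAP / 1000)."""
    return (c.mat + 10.0) / (c.map_mm / 1000.0)


def summer_heat_moisture(c: ClimateNormals) -> float:
    """SHM, assumed to be MWMT / (MSP / 1000)."""
    return c.mwmt / (c.msp_mm / 1000.0)


# Made-up values for illustration only.
example_plot = ClimateNormals(mat=2.0, mwmt=15.0, mcmt=-12.0,
                              map_mm=500.0, msp_mm=300.0)

td = continentality(example_plot)
ahm = annual_heat_moisture(example_plot)
shm = summer_heat_moisture(example_plot)
print(f"TD={td:.1f}  AHM={ahm:.1f}  SHM={shm:.1f}")
```

Because TD, AHM and SHM are functions of the other variables, strong correlations among the predictors are to be expected.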

Table 5 shows the coordinates and climate data used for prediction (after fitting the random forest model). I used the Alberta shapefile, elevation data and the digital elevation model (DEM) file of Alberta to extract coordinates covering the whole of Alberta, which gave 53900 points in total. Then, using the same climate system, ClimateAB v3.21, I obtained the climate data at each point for 2025S, 2055S and 2085S to run the predictions for managed stands and natural stands.

Table 4: A sample of the data table used for the analyses, including data distribution, PCA, correlation matrix, heat map and random forest.

Table 5: A sample of the 2025S climate data used for prediction with the random forest model.

Data Exploration

1. Distribution of lodgepole pine plot site index.

Figure 4: Histogram of plot site index. The x axis represents lodgepole pine site index; the y axis represents frequency.

2. To make the climate data less skewed and the outliers less extreme, data transformation is necessary. Taking the logarithm of the skewed climate variables compresses large values more than small ones, which pulls in the long tail of a skewed distribution and brings the data closer to a normal distribution.
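
The effect of the log transformation can be illustrated with a minimal sketch. It uses synthetic, right-skewed values standing in for a precipitation-like variable, not the actual climate data, and measures skewness before and after the transformation. The helper `skewness` and the variable names are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not stated above.

```python
import math
import random
import statistics


def skewness(values):
    """Moment coefficient of skewness (third moment / second moment ** 1.5)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic, strictly positive, right-skewed sample (NOT real climate data).
rng = random.Random(42)
synthetic_values = [rng.lognormvariate(6.2, 0.5) for _ in range(2000)]

# Natural-log transform; this requires every value to be greater than zero.
log_values = [math.log(v) for v in synthetic_values]

skew_before = skewness(synthetic_values)
skew_after = skewness(log_values)
print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
```

A skewness near 0 indicates a roughly symmetric distribution. Variables that can be zero or negative, such as MCMT, cannot be log-transformed directly and would first need a shift or a different transformation.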

Figure 5: Sample comparison of the distributions of the original and the transformed climate variables. The first row shows TD and the second row shows MAP; in each row, the first graph is the original data and the second graph is the data after transformation. The x axis represents the climate variable; the y axis represents frequency.

3. Correlation matrices and heat maps for managed stands and natural stands.

Figure 6: Correlation matrix and heat map, with a histogram, for managed stands. The matrix shows the pairwise correlation coefficients between the climate variables and plot site index (plot_si). Values close to 1 or -1 indicate strong positive or negative correlations, respectively, while values close to 0 suggest no linear correlation. The colors range from blue (negative correlation) to red (positive correlation), with the intensity of the color indicating the strength of the relationship. The histogram along the top shows the distribution of the correlation coefficients.

Figure 7: Correlation matrix and heat map, with a histogram, for natural stands. The matrix shows the pairwise correlation coefficients between the climate variables and plot site index (plot_si). Values close to 1 or -1 indicate strong positive or negative correlations, respectively, while values close to 0 suggest no linear correlation. The colors range from blue (negative correlation) to red (positive correlation), with the intensity of the color indicating the strength of the relationship. The histogram along the top shows the distribution of the correlation coefficients.
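
The pairwise coefficients described in the two captions above can be computed as in the sketch below. It assumes the coefficients are Pearson correlations, which the captions do not state, and it uses a small made-up table rather than the real plot data. The function names `pearson` and `correlation_matrix` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Made-up values for five plots, for illustration only.
toy_table = {
    "MAT": [1.0, 2.0, 3.0, 4.0, 5.0],
    "MCMT": [-14.0, -12.5, -12.0, -10.0, -9.5],
    "PAS": [160.0, 150.0, 120.0, 110.0, 90.0],
    "plot_si": [12.0, 14.0, 13.5, 16.0, 17.0],
}

corr = correlation_matrix(toy_table)

# Print the row that relates each climate variable to site index.
for name, value in corr["plot_si"].items():
    print(f"plot_si vs {name}: {value:+.2f}")
```

The matrix is symmetric with ones on the diagonal, so a heat map only needs one triangle. Computing it separately for managed stands and natural stands gives the two matrices shown in the figures.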