Results & Discussion

1. The Julian date on which growing degree-days above 5°C reach 100 exhibits the strongest correlations with site index in the correlation matrix heat map and PCA

Figure 8: Principal component analysis of site index and climate variables for managed stands, showing the relationships between the climate variables and site index. Red, yellow, green and blue represent Class I, Class II, Class III and Class IV, respectively. Class IV is a site index of 5 to 10, Class III is 10 to 15, Class II is 15 to 20, and Class I is 20 to 25.


Figure 9: Principal component analysis of site index and climate variables for natural stands, showing the relationships between the climate variables and site index. Red, yellow, green and blue represent Class I, Class II, Class III and Class IV, respectively. Class IV is a site index of 5 to 10, Class III is 10 to 15, Class II is 15 to 20, and Class I is 20 to 25.


Figure 10: Comparison of the correlation heat maps for managed stands (first graph) and natural stands (second graph).

Table 6: Acronyms of climate variables.

The PCA scatter plots (Figures 8 and 9) show site index (plot_si) and the climatic variables plotted against two principal components (Comp.1 and Comp.2). The length and direction of each vector indicate the strength and direction of the relationship between that variable and the principal components.

For the managed stand site index (Figure 8), the PCA indicates that DD5 and DD5_100 (the Julian date on which growing degree-days above 5°C reach 100) are strongly associated with site index, especially for Class III and Class IV. These two vectors are the longest and point in the same direction as site index. This observation suggests that DD5_100 is a pivotal factor, potentially acting as a proxy for the length of the growing season or as an indicator of climatic influences on temperature regimes, and thus substantially influences site index within managed stands. PAS, MWMT and AHM also show a good correlation with site index. This indicates that most of the climate variables that affect site index and drive its variability in managed stands are related to heat.
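The definition of DD5_100 given above can be illustrated with a minimal sketch. It assumes daily mean temperatures are available and accumulates degree-days above 5°C from day 1 of the year; the function name `dd5_100` and the temperature series are hypothetical, and the climate dataset used in this study may derive the variable differently (for example, from monthly data).

```python
def dd5_100(daily_mean_temps_c, base=5.0, target=100.0):
    """Return the 1-based day of year on which cumulative degree-days
    above `base` first reach `target`, or None if they never do."""
    total = 0.0
    for day, temp in enumerate(daily_mean_temps_c, start=1):
        # A day contributes only the part of its mean temperature above the base.
        total += max(0.0, temp - base)
        if total >= target:
            return day
    return None


# Hypothetical year: 100 days at 0°C, then a constant 15°C
# (10 degree-days accumulated per day).
temps = [0.0] * 100 + [15.0] * 265
print(dd5_100(temps))  # 110
```

A smaller DD5_100 means heat accumulates earlier in the year, which is why the variable can act as a proxy for an earlier start to the growing season.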

The PCA for the natural stand site index (Figure 9) corroborates the significance of DD5_100, with a similarly long vector projecting toward the left quadrant of the plot, albeit with a marginally broader scatter. This indicates that DD5_100 is a critical determinant of site index variability in natural stands. The dispersion of the data points, however, indicates that the natural stands' site index may be subject to a more complex set of environmental interactions than that of managed stands, reflecting the inherent variability in unmanaged ecological systems. In this figure, NFFD, FFP and MWMT (as in managed stands) are also important climate factors for site index. Frost-free days play a more important role in natural stands than in managed stands.

The heat map (Figure 10) provides a visual representation of the correlation coefficients between different variables. Red squares indicate a positive correlation, while blue squares indicate a negative correlation. The intensity of the color corresponds to the strength of the correlation.

The heat map results are similar to the PCA results. In the managed stands, MWMT (r = 0.3), AHM (r = 0.28), DD.5 (r = 0.32), DD5_100 (r = -0.34) and PAS show stronger correlations with site index than the other climate variables. Although none of these correlations is strong (no coefficient exceeds 0.5 in absolute value), they still indicate that DD5_100 could be a factor affecting site index in managed stands. The negative sign for DD5_100 means that a later date of reaching 100 degree-days is associated with a lower site index. In the natural stands, DD.5 (r = 0.42), DD5_100 (r = -0.44), MWMT (r = 0.39), NFFD (r = 0.31) and FFP (r = 0.31) are also correlated with site index, and these correlations are stronger than in the managed stands. This reinforces the interpretation from the PCA plot that DD5_100 and frost-free days are among the variables most closely associated with site index in natural stands.
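Each cell of such a heat map is a pairwise correlation coefficient. As a minimal sketch, assuming the coefficients are Pearson correlations (the source does not state the type), one cell can be computed as follows. The function name `pearson_r` and the five data points are made up for illustration and are not values from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical plots: a later DD5_100 (day of year) paired with a lower site index.
dd5_100_days = [120, 125, 130, 135, 140]
site_index = [20, 18, 17, 13, 12]
r = pearson_r(dd5_100_days, site_index)
print(round(r, 2))  # negative: later heat accumulation, lower site index
```

The sign gives the direction of the association (the red or blue of a heat map cell) and the absolute value gives its strength (the color intensity).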

2. Geospatial Site Index Modelling and Prediction

Figure 11: The Variable Importance Plot of random forest for managed stands.

Figure 12: The Variable Importance Plot of random forest for natural stands.

In my analysis, I employed a random forest regression model to predict the target variable 'plot_si', using a set of predictor variables including MAT, MWMT, MCMT, MAP, MSP, TD, AHM, SHM, DD.0, DD.5, NFFD, FFP, PAS and DD5_100. Random forest is an ensemble of regression trees that achieves a high level of predictive accuracy by randomly sampling predictors at each split and using bootstrap aggregation. The model was built with 100 decision trees (ntree = 100) and was configured to evaluate the importance of the predictor variables (importance = TRUE). At each node split in the decision trees, four variables (mtry = 4) were considered. The model explained 47.23% of the variance in the dependent variable for managed stands and 57.3% for natural stands, which indicates a moderately strong relationship between the predictors and the response. The mean of squared residuals, that is, the average squared deviation of the predicted values from the actual values, was 6.92 for managed stands and 4.42 for natural stands. These results indicate that the random forest model provided a reasonable level of predictive accuracy for site index in the given dataset. However, these statistics also show that the model for managed stands is less accurate than the model for natural stands. One possible reason is the smaller sample size for managed stands, only 450 compared with 1645 for natural stands, which may result in a larger error.
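The two fit statistics reported above are related: the percentage of variance explained is one minus the ratio of the mean squared residual to the variance of the observed response. The sketch below shows that arithmetic under the assumption that the variance uses a divisor of n; the function names and the five data points are hypothetical. In a random forest, the predictions would normally be out-of-bag predictions, whereas here they are simply a made-up list.

```python
def mean_squared_residual(observed, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def percent_variance_explained(observed, predicted):
    """100 * (1 - MSE / variance of the observed values), variance with divisor n."""
    n = len(observed)
    mean_obs = sum(observed) / n
    variance = sum((o - mean_obs) ** 2 for o in observed) / n
    return 100.0 * (1.0 - mean_squared_residual(observed, predicted) / variance)


# Hypothetical site index values, not data from this study.
observed = [10, 12, 14, 16, 18]
predicted = [11, 12, 13, 16, 18]
print(mean_squared_residual(observed, predicted))                  # 0.4
print(round(percent_variance_explained(observed, predicted), 2))   # 95.0
```

Because the statistic depends on the variance of the response as well as on the residuals, a model with a larger mean squared residual explains less variance only if the response variances of the two datasets are comparable.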

Figures 11 and 12 show the variable importance scores obtained from the random forest regression analysis. For managed stands (Figure 11), the left plot shows the percentage increase in mean squared error (%IncMSE) when the values of each variable are permuted. This metric identifies the variables that, when shuffled, most degrade the model's performance, and it therefore indicates their predictive importance. AHM, SHM, and MAP have the three highest %IncMSE scores, implying that they are the most important predictors in the model. The right plot displays the increase in node purity (IncNodePurity), another measure of variable importance based on the reduction of impurity in the nodes of the trees in the forest. Impurity is usually measured by Gini impurity or entropy in classification tasks and by the reduction in variance (residual sum of squares) in regression. DD.5, PAS, and AHM are the three variables that lead to the largest increase in node purity, suggesting that they are crucial for creating more homogeneous nodes within the model.
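The permutation idea behind %IncMSE can be sketched in a few lines. This is a simplified illustration and not the procedure of the random forest software used here, which typically permutes out-of-bag samples tree by tree and scales the result. The sketch shuffles one column of a dataset, scores a fixed model on the shuffled data, and reports the average increase in mean squared error. The toy `model`, the `noise` column, and all values are hypothetical.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of `model` (a function of one row) on the data."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, column, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling the values of one column."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row with the shuffled value substituted for the original.
        permuted = [{**r, column: v} for r, v in zip(rows, shuffled)]
        increases.append(mse(model, permuted, targets) - baseline)
    return sum(increases) / n_repeats


def model(row):
    # Toy stand-in for a fitted model: uses DD.5 and ignores the noise column.
    return 10.0 + 0.01 * row["DD.5"]


rows = [{"DD.5": 1000 + 50 * i, "noise": i % 3} for i in range(20)]
targets = [model(r) for r in rows]

imp_dd5 = permutation_importance(model, rows, targets, "DD.5")
imp_noise = permutation_importance(model, rows, targets, "noise")
print(imp_dd5 > 0, imp_noise == 0)  # True True
```

Shuffling a variable the model relies on breaks its link with the response and raises the error, while shuffling an ignored variable leaves the error unchanged.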

The PCA and correlation matrix results show that DD5_100 is an important factor, but the random forest model does not rank DD5_100 as an important predictor. A likely reason is that PCA and the correlation matrix describe the correlation between pairs of factors and cannot separate out the covariance among them. Random forest, on the other hand, is an ensemble machine learning model that obtains more accurate and stable predictions by building multiple decision trees and combining their outputs. It can handle both classification and regression tasks and can account for covariance among factors. The importance scores obtained from the random forest therefore reflect each variable's contribution to prediction accuracy, so the ranking can differ from the one suggested by the correlations alone.

For the natural stands (Figure 12), PAS, MSP, and MAP have the three highest %IncMSE scores, implying that they are the most important predictors in the model. DD.5, PAS, and MWMT are the three variables that lead to the largest increase in node purity, suggesting that they are crucial for creating more homogeneous nodes within the model. These results also differ from the PCA and correlation matrix results, for the same reason explained above for the managed stands.

Figure 13: Predicted site index distributions for managed stands in 2025S, 2055S and 2085S (from left to right). The light green line is the road and the dots are the plot locations. Dot colors show the site index class: black, blue, yellow and red represent Class IV (5-10), Class III (10-15), Class II (15-20) and Class I (20-25), respectively. The color bands on the map use the same colors to show the predicted future site index.

Figure 14: Predicted site index distributions for natural stands in 2025S, 2055S and 2085S (from left to right). The light green line is the road and the dots are the plot locations. Dot colors show the site index class: black, blue, yellow and red represent Class IV (5-10), Class III (10-15), Class II (15-20) and Class I (20-25), respectively. The color bands on the map use the same colors to show the predicted future site index.

Using the random forest model, I predicted the lodgepole pine site index from the new climate data for 2025S, 2055S and 2085S across all of Alberta, which encompasses a diverse range of elevations and climatic conditions. The predictions were based on a comprehensive dataset that included both geographical and climate variables. The model's predictions were visualized in GIS, displaying the spatial distribution of the predicted site index (Figures 13 and 14).
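The maps group predicted site index into the four classes defined in the figure captions. A minimal sketch of that binning is given below. The captions do not say which class a boundary value such as 10 or 15 belongs to, so the sketch assumes each class includes its lower bound, with 25 included in Class I. The function name and the predicted values are hypothetical.

```python
from collections import Counter


def site_index_class(site_index):
    """Map a site index value to its class label, or None if outside 5 to 25.

    Assumption: each class includes its lower bound; 25 belongs to Class I.
    """
    if 20 <= site_index <= 25:
        return "Class I"
    if 15 <= site_index < 20:
        return "Class II"
    if 10 <= site_index < 15:
        return "Class III"
    if 5 <= site_index < 10:
        return "Class IV"
    return None


# Hypothetical predicted values, not output from the model in this study.
predicted = [7.4, 12.1, 15.0, 18.9, 21.3, 9.9]
counts = Counter(site_index_class(v) for v in predicted)
print(counts["Class I"], counts["Class II"], counts["Class III"], counts["Class IV"])  # 1 2 1 2
```

Tallying the classes for each period in this way would give a simple numerical summary of the shifts that the maps show visually.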

The site index for managed stands clearly shifts over time toward higher site index classes, indicating that management techniques may be strengthening resilience to climate change. Natural stands, on the other hand, show a wider distribution of site index classes, particularly in 2055S, with a sizable region shifting into lower site index classes. This spread indicates that different populations will react differently to climate stresses, which could result in lower productivity in some places. Different patterns emerge when natural stands (Figure 14) and managed stands (Figure 13) are compared. Active forest management strategies that lessen some of the negative effects of climate change may be the reason managed stands exhibit a reasonably stable distribution of site index over time. Natural stands exhibit more noticeable swings in site index values, which may be related to the absence of human intervention to mitigate the effects of climate change.


Conclusion

The main topic of this project is predicting lodgepole pine site index variability and modeling the influence of climate change on forest productivity dynamics in Alberta. The principal component analysis (PCA) and correlation heat map analysis provided compelling insights into the relationships between climatic variables and the lodgepole pine site index. In both the managed and the natural stands, the PCA and correlation matrix highlighted DD5_100 as among the most influential variables, with its vector pointing in the same direction as site index, especially for Class III and Class IV. For the managed stands, some heat-related predictors such as MWMT and AHM also correlate well with site index. For natural stands, frost-free days could be another important climate variable. The convergence of results from the PCA and the heat map underscores the critical role of DD5_100 in influencing the site index of lodgepole pine. These correlations not only support the PCA findings but also highlight the potential for these climatic variables to be used as predictors in models aimed at managing and conserving lodgepole pine forests. Moving forward, the identified climatic factors could be pivotal in refining models of forest growth and productivity, ultimately aiding the development of more accurate and efficient forest management strategies. The apparent collinearity among the climate variables and their combined predictive power offer a robust framework for further ecological studies and practical applications in forestry.

My random forest regression analysis revealed that AHM, SHM, and MAP are the most influential predictors of the managed stand site index, as shown by the highest percentage increases in mean squared error (%IncMSE) when their values were permuted. PAS, MSP, and MAP are the most influential predictors of the natural stand site index. This result differs from the PCA and correlation matrix analysis because those methods explore the relationships between variables, while the random forest model measures how much each variable contributes to prediction accuracy. Additionally, the site index predictions for 2025S, 2055S, and 2085S show that the managed stand site index will move toward higher site index classes, while the natural stand site index will become more dispersed and will decline to varying degrees in 2055S. This suggests that managed forests are able to maintain a high level of forest productivity in the face of climate change. Therefore, it is necessary to take climate factors into account in future forest management in order to make forests more adaptable and resilient to climate change.