Species maps show promising results for abundant species at high and low resolutions
Fig 12. Species frequency map of Douglas-fir at 1 km habitat-level resolution (left) and 250 m inventory resolution (right), with Level 4 ecozones delineated.
The predictive modelling outputs for western tree species closely align with their known distributions, providing confidence in the model’s performance. In particular, patterns of predicted species frequency correspond well with ecological variation across Level 4 ecozones in the western United States, reflecting changes in elevation and forest composition.
For example, Douglas-fir (Fig. 12), a species that dominates much of the Pacific Northwest, is prominently represented in the 1 km resolution prediction map. The model captures both the widespread distribution of coastal Douglas-fir in lowland areas and the more specific patterns of interior Douglas-fir, which is strongly associated with elevation, slope, and rain shadow effects in mountainous regions of the interior west.
Fig 13. 1 km species frequency map of trembling aspen.
Fig 14. 1 km species frequency map of black spruce.
Early model iterations struggled to predict species with widespread ranges, such as trembling aspen (Fig. 13) and black spruce (Fig. 14). These issues appear to have been largely resolved by adding pseudo-plots and by accounting for the zero-inflated dataset with a hurdle model.
Variable importance metrics show known range-limiting variables as highly important in presence/absence modelling, with a different suite of variables being most important for frequency modelling
Fig 15. Variable importance of predictor variables (y-axis), measured as root mean square error loss after removal of each variable, for the presence/absence model (DNN 2a) of Douglas-fir.
Fig 16. Variable importance of predictor variables (y-axis), measured as root mean square error loss after removal of each variable, for the frequency model (DNN 2b) of Douglas-fir.
The above figures show which environmental and land cover factors were most important for improving the model’s accuracy. A higher value means that removing that factor made the model perform worse. In the presence/absence model (Fig. 15), the most important factor was prob1, which represents the probability of an area being covered by needle-leaf forest. Other highly ranked factors included a dryness index (logAHM) and the extreme minimum temperature (EMT). These two climate measures are well-known limits for where Douglas-fir can grow. The fact that the model identified them as key predictors is a strong sign that it is picking up on the real environmental conditions that shape the species’ natural range.
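The importance measure described above can be illustrated with a minimal sketch. It assumes that "removing" a variable is approximated by shuffling its values (permutation importance), which may differ from how the importance values in Figs. 15 and 16 were actually computed. The function names, the toy predictor, and the toy data are hypothetical; only the variable labels EMT and logAHM come from the text.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def permutation_importance(predict, rows, y_true, names, n_repeats=10, seed=0):
    """Mean increase in RMSE when each variable is shuffled (a stand-in for removal)."""
    rng = random.Random(seed)
    baseline = rmse(y_true, [predict(r) for r in rows])
    importance = {}
    for j, name in enumerate(names):
        losses = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between this variable and the response
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            losses.append(rmse(y_true, [predict(r) for r in shuffled]))
        importance[name] = sum(losses) / n_repeats - baseline
    return importance

# Hypothetical stand-in for a trained model: depends strongly on EMT,
# weakly on logAHM, and not at all on the third variable.
def toy_predict(row):
    emt, log_ahm, _noise = row
    return 0.8 * emt + 0.2 * log_ahm

data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_predict(r) for r in rows]

scores = permutation_importance(toy_predict, rows, y, ["EMT", "logAHM", "noise"])
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy case the variable the model leans on most ranks first, and a variable the model ignores scores zero, which matches the reading of the figures: a higher value means the model performs worse without that variable.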
Figure 16 shows the most important factors for predicting how abundant Douglas-fir is in areas where it can already grow. Unlike the first model (Fig. 15), which focused on the basic climate limits for where the species can survive, this model highlights local site conditions that influence abundance. The top factors include elevation, whether a slope faces east or west, distance from the ocean, and how exposed the site is. Other contributors include snow levels, summer dryness, and the probability of shrubland cover. Together, these help explain why Douglas-fir may dominate in some suitable locations but be less common in others. Some of the most important variables in the presence/absence model were among the least important in the frequency model, indicating that the frequency model picks up not on range-limiting factors but on those that determine abundance.
Validation metrics are promising
Fig 17. Accuracy of DNN over time
Table 2. Mean Absolute Error of four key tree species after running DNN with added variables
For model validation, the DNN models predicted species frequency values onto a held-out portion of the training dataset: 70% of the data was used for model training, and the model was then used to predict species frequencies on the remaining plots. Table 2 shows Mean Absolute Error values after adding different variables, with percent improvement from the baseline (climate only) displayed in brackets. Because the model is predicting onto zero-inflated values (any given species is absent from most plots), these metrics make performance look better than it is. Even so, improvements are visible with the addition of topographic variables and land cover probabilities.
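The effect of zero inflation on Mean Absolute Error, and the percent improvement calculation, can be sketched as follows. All numbers here are hypothetical and chosen only for illustration; none are taken from Table 2, and the function names are our own.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted frequencies."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def percent_improvement(baseline_mae, new_mae):
    """Percent reduction in MAE relative to a baseline (e.g. climate only)."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae

# Hypothetical validation set: the species is absent from 90 of 100 plots
# and has a frequency of 0.5 in the remaining 10.
observed = [0.0] * 90 + [0.5] * 10

# A useless model that always predicts absence still looks good on all plots...
always_zero = [0.0] * 100
mae_all_plots = mean_absolute_error(observed, always_zero)

# ...but not when only the plots where the species occurs are scored.
present = [(t, p) for t, p in zip(observed, always_zero) if t > 0]
mae_present_only = mean_absolute_error([t for t, _ in present],
                                       [p for _, p in present])

print(f"MAE, all plots:     {mae_all_plots:.3f}")
print(f"MAE, presence only: {mae_present_only:.3f}")
print(f"Improvement, 0.10 -> 0.08: {percent_improvement(0.10, 0.08):.1f}%")
```

Reporting error on presence-only plots alongside the all-plot figure is one way to make the zero-inflation effect visible; whether it suits this dataset is a judgment call.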
Figure 17 shows how the accuracy of our deep neural network model improved over multiple training cycles, known as epochs. Both training and validation accuracy increased steadily, with the model achieving over 94% accuracy for both datasets. We chose to stop training after 15 epochs to prevent overfitting, ensuring the model performs well on new, unseen data rather than simply memorizing the training set.
To make our predictions as accurate and ecologically realistic as possible, we took several steps to refine our models. First, we reduced the chance of species misidentification by filtering observations using historic range maps, with a 200 km buffer to allow for natural variation. We also added “pseudo-plots” in areas without forest cover to help the model avoid overpredicting in landscapes where the species could not grow.
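The range-map filter can be sketched in simplified form. This sketch assumes the historic range is represented by a set of sample points rather than polygons, and uses great-circle distance to apply the 200 km buffer; a real workflow would more likely buffer range polygons in a GIS library. The coordinates and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_buffer(obs, range_points, buffer_km=200.0):
    """True if the observation lies within buffer_km of any range sample point."""
    return any(haversine_km(obs[0], obs[1], lat, lon) <= buffer_km
               for lat, lon in range_points)

# Hypothetical range sample point and two hypothetical observations (lat, lon).
range_points = [(45.0, -122.0)]
observations = [(46.0, -122.0), (50.0, -122.0)]

# Keep only observations that fall inside the buffered range.
kept = [obs for obs in observations if within_buffer(obs, range_points)]
```

Here the first observation is roughly 111 km from the range point and is kept, while the second is more than 500 km away and is dropped as a likely misidentification.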
We incorporated both land cover probabilities and detailed topographic variables to better capture habitat suitability. Land cover data was processed to remove heavily human-altered areas from model training, so predictions reflected the ecological potential of a site rather than just where the species happens to be found today. Elevation, slope direction, exposure, and distance from the ocean added important context for how climate and terrain shape where species can thrive.
Finally, we used a two-step “hurdle” model, which first predicted the probability of a species being present, and then used only plots where the species was observed to train a frequency model estimating how common it is likely to be. This approach helped reduce bias from data gaps, improved predictions for species that are underrepresented in inventory data, and added ecological realism by separating the factors that control a species’ range from those that determine its local abundance.
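The two-step structure can be sketched as below. The split of training data follows the description above; how the two model outputs are combined into a final prediction is not stated in the text, so the 0.5 threshold rule here is an assumption, as are the function names and toy plot records.

```python
def split_hurdle_training(plots):
    """Split (features, frequency) records into the two hurdle training sets.

    Step 1 (presence/absence) uses every plot with a 0/1 label.
    Step 2 (frequency) uses only plots where the species was observed.
    """
    presence_rows = [(x, 1 if f > 0 else 0) for x, f in plots]
    frequency_rows = [(x, f) for x, f in plots if f > 0]
    return presence_rows, frequency_rows

def hurdle_predict(p_present, freq_if_present, threshold=0.5):
    """Combine the two model outputs (assumed rule: zero below the threshold)."""
    return freq_if_present if p_present >= threshold else 0.0

# Hypothetical plots: (feature vector, observed species frequency).
plots = [
    ([0.2, 1.1], 0.0),
    ([0.9, 0.4], 0.6),
    ([0.5, 0.7], 0.0),
    ([0.8, 0.3], 0.3),
]

presence_rows, frequency_rows = split_hurdle_training(plots)

# Example predictions from hypothetical step 1 and step 2 outputs.
likely_present = hurdle_predict(p_present=0.9, freq_if_present=0.45)
likely_absent = hurdle_predict(p_present=0.1, freq_if_present=0.45)
```

An alternative combination rule is to multiply the presence probability by the conditional frequency, which gives an expected frequency instead of a hard zero; which rule is preferable depends on how the maps will be used.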
This modelling approach can support forest inventories when combined with remotely sensed land cover data. It can also inform seed selection recommendations and assisted migration prescriptions through climate matching, with our predicted species frequencies as supporting data.
Next Steps:
Expand inventory species from 5 to over 200 at inventory resolution (250 m).
Expand species habitat maps up to 400 species at 1 km resolution.
Add Mexico plot data for complete ranges of pine species.
Apply more rigorous validation, such as checkerboard validation.
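Checkerboard validation is commonly understood as a spatial partition in which plots are assigned to alternating grid cells, so that training and validation plots are separated in space rather than drawn at random. A minimal sketch follows, assuming a grid in decimal degrees with a hypothetical 1 degree cell size; the function name and coordinates are illustrative only.

```python
import math

def checkerboard_fold(lat, lon, cell_deg=1.0):
    """Return 0 or 1 depending on which checkerboard square the plot falls in."""
    row = math.floor(lat / cell_deg)
    col = math.floor(lon / cell_deg)
    return (row + col) % 2

# Hypothetical plot coordinates (lat, lon).
plot_coords = [(45.5, -122.5), (45.5, -121.5), (46.5, -122.5)]

folds = [checkerboard_fold(lat, lon) for lat, lon in plot_coords]

# Train on one colour of the board and validate on the other.
train = [p for p, f in zip(plot_coords, folds) if f == 0]
validate = [p for p, f in zip(plot_coords, folds) if f == 1]
```

The cell size controls how far apart training and validation plots are; larger cells give a stricter test of how well the model transfers to new areas.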