Methods

Ecosystem Data

We combined the highest-resolution ecosystem geospatial data publicly available for North America (Level IV ecoregions in the conterminous US, ecodistricts in Canada, Level III ecoregions in Alaska and Mexico, and the seed zones of Alberta and the ecozones of British Columbia) (BEC; Terrestrial Ecodistricts of Canada; Level III and IV Ecoregions of the Continental United States). This resulted in a GIS file of ecoregions for all of North America, represented as the spatial polygons shown in Figure 2. Alaska and Mexico only had Level III ecosystem data, and these ecosystems were too broad to provide the in-depth seed-sourcing information required for assisted migration. We therefore used elevation contours to divide the larger ecosystems. Specifically, we identified ecosystems with a mean annual temperature (MAT) range greater than 4 °C and a Pearson correlation between elevation and MAT of -0.7 or stronger. Using the linear relationship between MAT and elevation, we calculated the change in elevation expected to produce a 2 °C change in MAT, and then divided each such ecosystem into 2 °C elevation bands, provided the split produced ecosystems of at least 1,500 km². Figure 3 below shows that fewer ecosystems have very large MAT ranges after elevation contouring.
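A minimal sketch of this splitting rule in Python, assuming hypothetical arrays of elevation (m) and MAT (°C) sampled at grid points within one candidate ecosystem:

```python
import numpy as np
from scipy import stats

def elevation_band_width(elev_m, mat_c, mat_range_min=4.0, r_max=-0.7, delta_t=2.0):
    """Return the elevation band width (in m) expected to span a 2 degree C
    change in MAT, or None if the ecosystem does not meet the splitting
    criteria described above."""
    elev_m, mat_c = np.asarray(elev_m), np.asarray(mat_c)
    # Criterion 1: the MAT range within the ecosystem must exceed 4 degrees C.
    if mat_c.max() - mat_c.min() <= mat_range_min:
        return None
    # Criterion 2: elevation and MAT must be strongly negatively correlated.
    r, _ = stats.pearsonr(elev_m, mat_c)
    if r > r_max:
        return None
    # Fit MAT ~ elevation and invert the slope (the local lapse rate) to get
    # the elevation change corresponding to a delta_t change in MAT.
    slope, intercept, *_ = stats.linregress(elev_m, mat_c)
    return abs(delta_t / slope)
```

The 1,500 km² minimum applies to the bands produced by the split and would be checked afterwards on the resulting polygons.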

Figure 2. Map of ecosystem polygons. Level 1 is the broadest ecosystem classification available for each dataset; Level 4 is the finest. Notice how the ecosystems in Alaska and Mexico are still quite large.

Figure 3. Histograms of the average mean annual temperature of the ecosystems: (a) before elevation contouring; (b) after elevation contouring.

Climate Data

Envelope models need historical climate data to train on; however, raw weather station data is neither at the climatic temporal scale nor spatially continuous. Wang et al. (2016) took daily weather data from 1961–1990 and aggregated it to the 30-year climatic scale. They then used a combination of bilinear interpolation and local elevation adjustment to create continuous, scale-free historical climate data across North America. Wang et al. (2016) also focused on including bioclimatic variables for future ecological climate research. We generated the historical climate data (1961–1990) on which the models were trained using the ClimateNA v6.40b software package, available at http://tinyurl.com/ClimateNA, based on the methodology described by Wang et al. (2016).
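The downscaling step can be illustrated with a minimal sketch; the coarse grids and lapse rates below are hypothetical stand-ins for ClimateNA's internal data, not its actual interface:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def downscale_mat(lats, lons, grid_mat, grid_elev, grid_lapse, lat, lon, elev_m):
    """Sketch of scale-free downscaling in the style of Wang et al. (2016):
    bilinearly interpolate a gridded 1961-1990 MAT normal to a query point,
    then adjust it for local elevation using an empirical lapse rate."""
    def interp(grid):
        f = RegularGridInterpolator((lats, lons), grid, method="linear")
        return float(f([[lat, lon]])[0])
    mat = interp(grid_mat)      # bilinearly interpolated MAT normal (deg C)
    elev = interp(grid_elev)    # elevation implied by the coarse grid (m)
    lapse = interp(grid_lapse)  # local lapse rate (deg C per m, negative)
    # Correct for the difference between the point's true elevation and the
    # smoothed elevation underlying the coarse grid.
    return mat + lapse * (elev_m - elev)
```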

Climate change projections simulate Earth's climate system and project future climate conditions based on different assumptions about future human activities, particularly greenhouse gas emissions (Mitchell et al., 2004; Wilby et al., 1997). We used an ensemble of eight GCMs for our projections, as described in Mahony et al. (2022). The climate projection data were generated for the 2050s under the SSP3-7.0 shared socioeconomic pathway with the ClimateNA v6.40b software package, available at http://tinyurl.com/ClimateNA, based on the methodology described by Wang et al. (2016).
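As a simple illustration of the ensemble step (the model names and grids below are stand-ins, not the actual ensemble members), the projected value of each variable is averaged across the GCMs:

```python
import numpy as np

def ensemble_mean(projections):
    """Average one projected climate variable across GCMs; `projections`
    maps GCM name -> 2-D grid (e.g., 2050s MAT under SSP3-7.0)."""
    return np.mean(np.stack(list(projections.values())), axis=0)

# Two stand-in grids in place of the eight ensemble members:
proj = {"GCM-A": np.full((2, 2), 9.5), "GCM-B": np.full((2, 2), 10.5)}
print(ensemble_mean(proj))  # [[10. 10.] [10. 10.]]
```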

Feature Selection

Based on previous literature (Gray & Hamann 2011; O'Neill et al. 2017), we used ten climate variables that are important for tree growth but not overly correlated with one another (climate variables are often highly correlated). We also calculated which climate variables are projected to change the most, which led us to add the start and end of the frost period.
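A minimal sketch of the kind of correlation screen implied here, with placeholder variable names and a 0.8 cutoff assumed for illustration (the actual ten variables were chosen from the cited literature):

```python
import pandas as pd

def drop_correlated(climate_df, threshold=0.8):
    """Greedily keep each candidate variable only if its absolute Pearson
    correlation with every already-kept variable is below `threshold`."""
    corr = climate_df.corr().abs()
    kept = []
    for col in climate_df.columns:
        if all(corr.loc[col, k] < threshold for k in kept):
            kept.append(col)
    return climate_df[kept]
```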


Modeling

We extracted the ecoregion and climate data at 5 km intervals across all of North America. There were 2,270 unique ecosystems after elevation contouring; we removed any ecosystems smaller than 100 km², leaving 2,123 unique ecosystems. We then randomly selected a balanced training and test set for comparing the modeling methods. We first trained a linear discriminant analysis (LDA) model on the data. To prepare the data, we transformed four features to produce more normal distributions and then scaled the data: specifically, we took the log of mean annual precipitation and mean summer precipitation, and the square root of the dryness indices. We also trained a random forest with 100 estimators on the raw data. Lastly, we trained a neural network on the normalized data with three hidden layers of 256, 512, and 1024 nodes, respectively.
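A sketch of the three models in scikit-learn, assuming hypothetical column names (MAP and MSP for the two precipitation variables; AHM and SHM standing in for the dryness indices):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Log-transform the precipitation variables and square-root-transform the
# dryness indices; all other climate variables pass through unchanged.
transform = ColumnTransformer(
    [("log", FunctionTransformer(np.log), ["MAP", "MSP"]),
     ("sqrt", FunctionTransformer(np.sqrt), ["AHM", "SHM"])],
    remainder="passthrough")

# LDA and the neural network train on transformed, scaled data; the random
# forest trains on the raw features.
lda = make_pipeline(transform, StandardScaler(), LinearDiscriminantAnalysis())
rf = RandomForestClassifier(n_estimators=100)
nn = make_pipeline(transform, StandardScaler(),
                   MLPClassifier(hidden_layer_sizes=(256, 512, 1024)))
```

Each model would then be fit to the balanced training set (ecosystem labels as classes) and compared on the held-out test set.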