Methods

Data Collection

Interpolated weather station data was generated using ClimateNA software (Wang et al., 2016) for the 1951-1980 normal period (referred to here as the 1960s). Twenty-five annual climate variables were used in the study; they are listed in the "Climate Variable Glossary" section of this page.
Land cover data was obtained from MODIS (2005) as a 250 m resolution raster, which was resampled to 1 km for computational efficiency.

Study locations were generated as a grid of points at 5 km intervals covering North America, for a total sample size of 892209 locations (hereafter referred to as 'master locations').
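The grid construction can be sketched as follows. This is a minimal illustration only: it assumes a projected coordinate system in metres and a hypothetical rectangular extent, whereas the projection, extent, and any masking to the North American landmass used in the study are not stated here. The function name `make_grid` is likewise an assumption.

```python
def make_grid(x_min, y_min, x_max, y_max, spacing_m=5000):
    """Return (x, y) points on a regular grid with the given spacing in metres."""
    points = []
    y = y_min
    while y <= y_max:
        x = x_min
        while x <= x_max:
            points.append((x, y))
            x += spacing_m
        y += spacing_m
    return points


# Hypothetical 20 km x 10 km extent, for illustration only.
grid = make_grid(0, 0, 20000, 10000)
print(len(grid), grid[0], grid[-1])
```

In practice the grid points would then be clipped to the study area before climate variables are generated for them.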
Annual climate variables were generated for all locations using ClimateNA. Erroneous values (-9999) were removed.
These locations were systematically sampled down to 279793 locations (hereafter referred to as 'sample locations') to reduce computational demand while preserving representation.
Land cover class was spatially assigned to locations from the land cover raster. NA values were removed.
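The cleaning and sampling steps above might look like the sketch below. The record layout, the function names, the land cover label "Forest", and the sampling interval `step` are all assumptions made for illustration; the text does not state the interval actually used. Only the -9999 sentinel and the order of the steps come from the description above.

```python
NODATA = -9999  # sentinel for erroneous climate values, as described in the text


def drop_erroneous(records, nodata=NODATA):
    """Keep only records in which no climate variable equals the sentinel."""
    return [r for r in records if nodata not in r["climate"].values()]


def systematic_sample(records, step):
    """Take every `step`-th record, starting from the first (hypothetical interval)."""
    return records[::step]


def drop_missing_land_cover(records):
    """Remove records whose land cover class is missing (NA)."""
    return [r for r in records if r.get("land_cover") is not None]


# Toy master locations; values and labels are illustrative only.
master = [
    {"id": 1, "climate": {"MAT": 4.2, "MAP": 610}, "land_cover": "Cropland"},
    {"id": 2, "climate": {"MAT": -9999, "MAP": 500}, "land_cover": "Urban"},
    {"id": 3, "climate": {"MAT": 1.0, "MAP": 300}, "land_cover": "Urban"},
    {"id": 4, "climate": {"MAT": 2.0, "MAP": 400}, "land_cover": None},
    {"id": 5, "climate": {"MAT": 3.0, "MAP": 450}, "land_cover": "Forest"},
    {"id": 6, "climate": {"MAT": 5.0, "MAP": 700}, "land_cover": "Cropland"},
]

valid = drop_erroneous(master)             # removes id 2
sampled = systematic_sample(valid, step=2)  # keeps ids 1, 4, 6
cleaned = drop_missing_land_cover(sampled)  # removes id 4
print([r["id"] for r in cleaned])
```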

The models created for this study can be summarized as:

- Model 1 - Random Forest
- Model 2 - Random Forest, Human Classes Removed
- Model 3 - DNN
- Model 4 - DNN, Human Classes Removed

Further details on the models used can be found in the sections below.

Random forest training data was drawn as a stratified sample by land cover class, preserving the proportions of the classes, with a sample fraction of 10% (~28 000 locations).
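Stratified sampling of this kind can be sketched as below. This is an illustrative version, not the study's code: the function name, the fixed seed, the toy class counts, and the rule of keeping at least one record per class are assumptions.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Sample the same fraction from every class so class proportions are preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record in records:
        by_class[record[key]].append(record)

    sample = []
    for members in by_class.values():
        # Assumption: keep at least one record so rare classes are not lost.
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample


# Toy data: 100 Cropland, 50 Urban, 10 Forest (hypothetical counts and labels).
locations = (
    [{"land_cover": "Cropland"} for _ in range(100)]
    + [{"land_cover": "Urban"} for _ in range(50)]
    + [{"land_cover": "Forest"} for _ in range(10)]
)

training = stratified_sample(locations, key="land_cover", fraction=0.10)
counts = Counter(r["land_cover"] for r in training)
print(counts)
```

The same function with `fraction=0.60` would correspond to the larger sample fraction described for the DNN models.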
Random forest models predicted land cover class as a function of 25 annual climate variables and elevation, with 500 trees and 5 variables sampled at each split.
For model validation, predicted land cover classes were generated for the master locations and then compared to the observed land cover classes.
To reclassify the human-development land cover classes (Cropland and Urban), these classes were removed from the training dataset, and a new random forest model was fitted using the same predictor variables and parameters as above. The resulting model was then used to make predictions on the master locations.

As with the Random Forest models, DNN training data was drawn as a stratified sample by land cover class. However, because DNNs can efficiently process much higher volumes of data, a sample fraction of 60% was used (~160 000 locations).
DNN models were developed using the TensorFlow Python library and the Keras R package. Models were created with an input layer of 4096 neurons, 7 hidden layers using 'ReLU' activation functions, and an output layer with softmax activation for predicting class probability. Models employed a cross-entropy loss function for classification. The full model architecture can be found in Appendix A of the Data Exploration section.
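For readers unfamiliar with the output layer and loss described above, the following plain-Python sketch shows what softmax and cross-entropy compute for a single location. It is not the Keras model used in the study; the three raw scores and the class order are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw output scores into class probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Loss for one sample: negative log of the probability given to the true class."""
    return -math.log(probs[true_index])


# Hypothetical raw scores for three land cover classes at one location.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
loss = cross_entropy(probs, true_index=0)
print(probs, loss)
```

The loss shrinks towards zero as the probability assigned to the observed class approaches 1, which is what training minimises across all locations.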
DNN models predicted the probability of each land cover class from the same climate variables and elevation used in the Random Forest models, plus latitude and longitude.
For model validation, the models were tested on the remaining 40% of the sample data not included in the training data, comparing predicted land cover class to observed land cover class.
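Comparing predicted to observed classes can be sketched as below. The text does not say which validation metrics were reported, so overall accuracy and a confusion tally are shown here as common choices, not as the study's method; the labels are toy values and "Forest" is a hypothetical class name.

```python
from collections import Counter


def accuracy(observed, predicted):
    """Fraction of locations where the predicted class matches the observed class."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)


def confusion_counts(observed, predicted):
    """Count each (observed, predicted) pair; off-diagonal pairs are misclassifications."""
    return Counter(zip(observed, predicted))


observed = ["Cropland", "Urban", "Forest", "Forest", "Cropland"]
predicted = ["Cropland", "Forest", "Forest", "Forest", "Urban"]

acc = accuracy(observed, predicted)
confusion = confusion_counts(observed, predicted)
print(acc)
print(confusion)
```

For the DNN models, the predicted class would be the one with the highest predicted probability at each location.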

Below is the suite of annual climate variables generated by ClimateNA and used in this study:

Directly calculated annual variables:

MAT: mean annual temperature (°C)

MWMT: mean warmest month temperature (°C)

MCMT: mean coldest month temperature (°C)

TD: temperature difference between MWMT and MCMT, or continentality (°C)

MAP: mean annual precipitation (mm)

AHM: annual heat-moisture index ((MAT+10)/(MAP/1000))

SHM: summer heat-moisture index ((MWMT)/(MSP/1000))
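The formulas for TD, AHM, and SHM above can be checked with a short worked example. The function names and input values are hypothetical. MSP is not defined in this glossary; it is assumed here to be ClimateNA's mean summer precipitation in mm.

```python
def continentality(mwmt, mcmt):
    """TD: difference between warmest- and coldest-month mean temperature (°C)."""
    return mwmt - mcmt


def annual_heat_moisture(mat, map_mm):
    """AHM: (MAT + 10) / (MAP / 1000)."""
    return (mat + 10) / (map_mm / 1000)


def summer_heat_moisture(mwmt, msp_mm):
    """SHM: MWMT / (MSP / 1000); MSP assumed to be mean summer precipitation (mm)."""
    return mwmt / (msp_mm / 1000)


# Hypothetical site: MAT 5 °C, MWMT 20 °C, MCMT -10 °C, MAP 500 mm, MSP 250 mm.
td = continentality(20, -10)           # 20 - (-10) = 30
ahm = annual_heat_moisture(5, 500)     # (5 + 10) / 0.5 = 30
shm = summer_heat_moisture(20, 250)    # 20 / 0.25 = 80
print(td, ahm, shm)
```

Higher AHM and SHM values indicate warmer, drier conditions, since temperature is in the numerator and precipitation in the denominator.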

Derived annual variables:

DD<0 (or DD_0): degree-days below 0°C, chilling degree-days

DD>5 (or DD5): degree-days above 5°C, growing degree-days

DD<18 (or DD_18): degree-days below 18°C, heating degree-days

DD>18 (or DD18): degree-days above 18°C, cooling degree-days

NFFD: the number of frost-free days

FFP: frost-free period

bFFP: the day of the year on which FFP begins

eFFP: the day of the year on which FFP ends

PAS: precipitation as snow (mm). For individual years, it covers the period between August in the previous year and July in the current year

EMT: extreme minimum temperature over 30 years (°C)

EXT: extreme maximum temperature over 30 years (°C)

Eref: Hargreaves reference evaporation (mm)

CMD: Hargreaves climatic moisture deficit (mm)

MAR: mean annual solar radiation (MJ m⁻² d⁻¹)

RH: mean annual relative humidity (%)

CMI: Hogg’s climate moisture index (mm)

DD1040: degree-days above 10°C and below 40°C
