We created a random forest model (Model 1) that predicts land cover class as a function of annual climate variables, with an overall accuracy of 62.6%. The model was most accurate at predicting Urban and Tropical/sub-tropical broadleaf deciduous forest, with correct prediction rates of 88.16% and 76.65%, respectively. It was least accurate at predicting Temperate/sub-polar grassland, Tropical/sub-tropical grassland, and Snow and Ice, correctly predicting these classes in only 25% to 28% of cases.
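The overall accuracy and per-class correct prediction rates reported above can be derived from a confusion matrix. The sketch below assumes rows hold observed classes and columns hold predicted classes, so each per-class rate is the diagonal count divided by its row total (often called recall or producer's accuracy). The three-class matrix is hypothetical and is not the study's data.

```python
from typing import Dict, List

# Hypothetical confusion matrix: rows = observed class, columns = predicted class.
# Counts are illustrative only and are not taken from this study.
CLASSES = ["Urban", "Cropland", "Snow and Ice"]
CONFUSION = [
    [90, 5, 5],    # observed Urban
    [10, 70, 20],  # observed Cropland
    [40, 30, 30],  # observed Snow and Ice
]


def overall_accuracy(cm: List[List[int]]) -> float:
    """Fraction of all points whose predicted class equals the observed class."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_accuracy(cm: List[List[int]], classes: List[str]) -> Dict[str, float]:
    """Correct prediction rate for each observed class (diagonal / row total)."""
    return {name: cm[i][i] / sum(cm[i]) for i, name in enumerate(classes)}


if __name__ == "__main__":
    print(f"overall: {overall_accuracy(CONFUSION):.1%}")
    for name, acc in per_class_accuracy(CONFUSION, CLASSES).items():
        print(f"{name}: {acc:.1%}")
```

Because the overall figure pools all points, it is dominated by common classes; a model can post a moderate overall accuracy while still predicting a rare class such as Snow and Ice poorly.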
Figure 2 maps the land cover classes predicted by the random forest model (Model 1) from annual climate variables. A map of the original land cover classes is included for reference (Figure 3).
For Model 2, we withheld the Urban and Cropland classes from the random forest training dataset, so the resulting model cannot predict these classes. Applying this model to the full dataset therefore reassigns points with these original land cover types to the natural class that most closely matches each point's annual climate variables. Model 2 had an overall accuracy of 64.8%. Comparing the original land cover classes with these predictions, 35.45% of points originally classed as Cropland are predicted as Temperate/sub-polar broadleaf deciduous and 28.79% as Temperate/sub-polar grassland (Figure 4). Of points originally classed as Urban, 48.31% are predicted as Temperate/sub-polar broadleaf deciduous, 16.32% as Wetland, and 11.37% as Temperate/sub-polar needleleaf forest.
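The withholding procedure can be sketched as follows: drop the human development classes from training, predict every point, then cross-tabulate each withheld class against the natural classes it was assigned. This is a minimal illustration, not the study's code. A nearest-centroid classifier stands in for the random forest so the example runs with the Python standard library alone, and the two climate features, sample points, and function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple

# (climate feature vector, observed land cover class); features are hypothetical,
# e.g. (mean annual temperature, annual precipitation), left unscaled for brevity.
Point = Tuple[Sequence[float], str]

WITHHELD = {"Urban", "Cropland"}

SAMPLE_POINTS: List[Point] = [
    ((10.0, 900.0), "Temperate/sub-polar broadleaf deciduous"),
    ((11.0, 1000.0), "Temperate/sub-polar broadleaf deciduous"),
    ((8.0, 400.0), "Temperate/sub-polar grassland"),
    ((9.0, 350.0), "Temperate/sub-polar grassland"),
    ((10.5, 950.0), "Cropland"),
    ((8.5, 380.0), "Cropland"),
    ((10.0, 920.0), "Urban"),
]


def fit_centroids(train: List[Point]) -> Dict[str, List[float]]:
    """Nearest-centroid stand-in for the random forest: mean feature vector per class."""
    sums: Dict[str, List[float]] = {}
    counts: Counter = Counter()
    for x, label in train:
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


def reclassify_withheld(points: List[Point]) -> Dict[str, Dict[str, float]]:
    """Train without the withheld classes, predict every withheld point, and return
    the percentage of each withheld class assigned to each natural class."""
    train = [p for p in points if p[1] not in WITHHELD]
    centroids = fit_centroids(train)
    tallies: Dict[str, Counter] = defaultdict(Counter)
    for x, label in points:
        if label in WITHHELD:
            tallies[label][predict(centroids, x)] += 1
    return {
        label: {cls: 100 * n / sum(c.values()) for cls, n in c.items()}
        for label, c in tallies.items()
    }


if __name__ == "__main__":
    print(reclassify_withheld(SAMPLE_POINTS))
```

With a real random forest (for example, scikit-learn's `RandomForestClassifier`), only `fit_centroids` and `predict` would change; the filtering and cross-tabulation steps stay the same.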
We also trained a deep neural network (DNN) model (Model 3) to predict land cover class as a function of annual climate variables, with an overall accuracy of 64.2%. The model was most accurate at predicting Cropland and Tropical/sub-tropical shrubland, with correct prediction rates of 83.24% and 82.74%, respectively. It was extremely inaccurate at predicting Tropical/sub-tropical grassland, correctly predicting only 7.10% of cases.
Model 4 uses a DNN trained without the Cropland and Urban classes to reclassify them, with an overall accuracy of 67.9%. Comparing the original land cover classes with these predictions, 38.83% of points originally classed as Cropland are predicted as Temperate/sub-polar broadleaf deciduous and 32.68% as Temperate/sub-polar grassland (Figure 7). Of points originally classed as Urban, 52.48% are predicted as Temperate/sub-polar broadleaf deciduous, 11.86% as Temperate/sub-polar needleleaf forest, and 11.78% as Wetland. These are the same top predicted classes as in Model 2, in similar proportions. Removing Cropland also had the side effect of improving predictive accuracy for several classes relative to Model 3; for example, accuracy for Tropical/sub-tropical broadleaf deciduous increased by almost 10%.
The confusion matrix between observed land cover classes and the classes predicted by Model 1 (Figure 1) highlights the model's strengths and weaknesses. The model is most accurate at predicting Urban and Temperate needleleaf forest, but performs markedly poorly for certain classes. Most concerningly, for points originally classed as Snow and Ice, the model predicts a single erroneous class more often than the correct class (27.97% Snow and Ice versus 38.45% Urban), and a similar pattern occurs for Tropical/sub-tropical grassland. This is particularly troubling because we would expect Snow and Ice to be well predicted by climate, since it should correlate strongly with low temperatures; this warrants further investigation when refining the model.
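The failure mode described above, where a wrong class receives a larger share of a row than the correct class, can be flagged automatically. The sketch below assumes the confusion matrix has been row-normalized by observed class. It uses only the Model 1 shares quoted in this section, with the remaining shares of each row omitted, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Row-normalized confusion-matrix shares (%), keyed by observed class, then predicted class.
# Only shares quoted in the text are included; the rest of each row is omitted.
MODEL1_ROWS: Dict[str, Dict[str, float]] = {
    "Snow and Ice": {"Snow and Ice": 27.97, "Urban": 38.45},
    "Urban": {"Urban": 88.16},
}


def dominant_errors(rows: Dict[str, Dict[str, float]]) -> List[Tuple[str, str, float]]:
    """Return (observed, dominant predicted, share) for every observed class whose most
    frequently predicted class differs from the observed class."""
    flagged = []
    for observed, shares in rows.items():
        top = max(shares, key=shares.get)
        if top != observed:
            flagged.append((observed, top, shares[top]))
    return flagged


if __name__ == "__main__":
    for observed, top, share in dominant_errors(MODEL1_ROWS):
        print(f"{observed}: most often predicted as {top} ({share}%)")
```

Running this check over a full confusion matrix would list every class that, like Snow and Ice, is systematically absorbed by another class.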
Comparing the maps of land cover predicted by Model 1 (Figure 2) and true land cover (Figure 3) offers some insight into the model's high error rates. The true land cover map features many 'fuzzy' boundaries and dispersed patches of land cover classes, whereas the predicted map tends to smooth out boundaries and form larger contiguous areas of a single class. This may be because the model relies solely on climatic variables, which tend to vary less over small spatial extents than land cover does. Adding a vegetation metric such as the Normalized Difference Vegetation Index (NDVI) or Vegetation Continuous Fields (VCF) to the model could help address this. Vegetation metrics may also improve predictions of sparsely vegetated classes such as Barren or Snow and Ice, but could increase confusion when reclassifying Urban zones (e.g., reclassifying Urban as Barren due to low vegetation). Further investigation is required.
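NDVI is computed from near-infrared (NIR) and red reflectance as (NIR - Red) / (NIR + Red). The sketch below shows how it could be appended to a point's climate predictors. The reflectance values and the feature layout are hypothetical; VCF, by contrast, would be read from an existing product and appended the same way rather than computed from bands.

```python
from typing import List, Sequence


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red), in [-1, 1]."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + Red reflectance must be non-zero")
    return (nir - red) / total


def with_vegetation(climate: Sequence[float], nir: float, red: float) -> List[float]:
    """Append NDVI to a point's climate predictors (hypothetical feature layout)."""
    return [*climate, ndvi(nir, red)]


if __name__ == "__main__":
    # Hypothetical reflectances for a densely vegetated and a sparsely vegetated point.
    print(with_vegetation([12.0, 800.0], nir=0.50, red=0.10))  # NDVI about 0.67
    print(with_vegetation([-5.0, 300.0], nir=0.30, red=0.28))  # NDVI about 0.03
```

Sparsely vegetated surfaces, including many urban areas, tend to have low NDVI, which is why the same feature could help separate Barren and Snow and Ice while blurring the Urban/Barren distinction.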
Model 2 produced predictions of land cover class in the absence of human development based on climate variables (Figure 4). The predictions showed a propensity for Temperate/sub-polar broadleaf deciduous, assigning 35.45% of Cropland and 48.31% of Urban points to this class. Removing Cropland and Urban from the model also reduced confusion among other classes and increased their accuracy: for example, accuracy for Tropical/sub-tropical shrubland rose from 72.58% in Model 1 to 90.10%. A similar pattern was observed between Models 3 and 4, where excluding Cropland led to less class confusion. This suggests that the model's accuracy could be increased further by removing other land cover classes, such as Water, that have limited relevance to the relationship between forest cover and climate.
The map of land cover predicted by Model 2 (Figure 5) shows some spatial consistency in the reclassification of these classes. Areas originally designated as Cropland and Urban generally appear to be reclassified as land cover classes that occur nearby. For example, Cropland areas in the Eastern USA, which lie within a matrix of Temperate grassland on the original land cover map (Figure 3), are reclassified by the model as a large contiguous area of Temperate grassland. This is a promising result, as it suggests that the model reclassifies Cropland in a manner consistent with the spatial patterns of land cover.
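One hypothetical way to quantify this spatial consistency on a gridded map is to check, for each cell originally classed as Cropland or Urban, whether its predicted class occurs within a small window around it on the original map. The metric, the window radius, the function name, and the sample strip of cells are all assumptions for illustration, not a method used in this study.

```python
from typing import AbstractSet, List

Grid = List[List[str]]


def neighborhood_consistency(original: Grid, predicted: Grid,
                             withheld: AbstractSet[str], radius: int = 1) -> float:
    """Fraction of cells originally in a withheld class whose predicted class also
    occurs in the original map within a (2*radius + 1)-cell square window."""
    n_rows, n_cols = len(original), len(original[0])
    hits = total = 0
    for i in range(n_rows):
        for j in range(n_cols):
            if original[i][j] not in withheld:
                continue
            total += 1
            # Classes present near (i, j) on the original map, clipped at the edges.
            # The set may include the withheld class itself, which is harmless
            # because the model never predicts withheld classes.
            nearby = {
                original[a][b]
                for a in range(max(0, i - radius), min(n_rows, i + radius + 1))
                for b in range(max(0, j - radius), min(n_cols, j + radius + 1))
            }
            if predicted[i][j] in nearby:
                hits += 1
    return hits / total if total else float("nan")


# Hypothetical one-row strip of map cells.
ORIGINAL = [["Forest", "Cropland", "Grassland", "Grassland", "Cropland"]]
PREDICTED = [["Forest", "Grassland", "Grassland", "Grassland", "Forest"]]

if __name__ == "__main__":
    print(neighborhood_consistency(ORIGINAL, PREDICTED, {"Urban", "Cropland"}))
```

A larger radius makes the check more lenient, so any reported value would need to state the window size used.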
Comparing the base random forest and DNN models (Models 1 and 3) shows that each has different strengths and weaknesses. The DNN model had slightly higher overall accuracy than the random forest model (64.2% vs. 62.6%). The two models also differed in which classes they predicted well and poorly; for example, the DNN was markedly stronger than the random forest at predicting Snow and Ice, but weaker at predicting Tropical/sub-tropical grassland. This should be considered when choosing a model for further applications: users will likely want the model that is stronger at predicting the classes relevant to their study area.
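Choosing between the models by the classes relevant to a study area can be sketched as a comparison of mean per-class accuracy over those classes. The accuracies below are placeholders, not the study's results, and the unweighted mean is an assumption; weighting classes by their area in the study region would be an equally reasonable choice.

```python
from typing import Dict, Iterable

# Hypothetical per-class accuracies (%); placeholders, not the study's results.
PER_CLASS: Dict[str, Dict[str, float]] = {
    "Model 1 (RF)": {"Snow and Ice": 30.0, "Tropical/sub-tropical grassland": 25.0},
    "Model 3 (DNN)": {"Snow and Ice": 60.0, "Tropical/sub-tropical grassland": 5.0},
}


def pick_model(per_class: Dict[str, Dict[str, float]], relevant: Iterable[str]) -> str:
    """Return the model with the highest unweighted mean accuracy over the relevant classes."""
    classes = list(relevant)
    if not classes:
        raise ValueError("at least one relevant class is required")

    def mean_accuracy(model: str) -> float:
        return sum(per_class[model][c] for c in classes) / len(classes)

    return max(per_class, key=mean_accuracy)


if __name__ == "__main__":
    print(pick_model(PER_CLASS, ["Snow and Ice"]))
    print(pick_model(PER_CLASS, ["Tropical/sub-tropical grassland"]))
```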
No model created in this study achieved an overall accuracy above 90%. These results highlight the need for model refinement to minimize error when modeled data are used in further analysis. In addition to the avenues of refinement discussed above, future work may seek to improve accuracy by tuning model parameters, removing highly correlated predictor variables, or including new predictors such as topography. Additionally, to minimize error when applying predicted land cover, we may choose to use the predicted class only where the observed class is a human development class, and use the observed class everywhere else. Although the study did not produce highly accurate models for reclassifying human-developed land cover classes as natural classes, it did produce reclassified land cover delineations for these areas and, in the case of the DNN models, the probability of each land cover class occurring at a given point. Once refinement has raised accuracy to acceptable thresholds, the model's predictions can be used in a range of applications.
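The hybrid approach proposed above, keeping observed land cover everywhere except at human development classes, can be sketched as a simple per-point substitution. The function name and sample labels are hypothetical; the set of human development classes is assumed to be Urban and Cropland, matching the classes withheld in Models 2 and 4.

```python
from typing import AbstractSet, List, Sequence

HUMAN_CLASSES = frozenset({"Urban", "Cropland"})


def hybrid_land_cover(observed: Sequence[str], predicted: Sequence[str],
                      human: AbstractSet[str] = HUMAN_CLASSES) -> List[str]:
    """Keep each observed class, except substitute the model's prediction where the
    observed class is a human development class."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [p if o in human else o for o, p in zip(observed, predicted)]


if __name__ == "__main__":
    observed = ["Wetland", "Urban", "Cropland", "Barren"]
    predicted = ["Temperate/sub-polar grassland", "Wetland",
                 "Temperate/sub-polar broadleaf deciduous", "Snow and Ice"]
    print(hybrid_land_cover(observed, predicted))
```

Because the model's errors on natural classes never reach the output, overall error in the hybrid map is confined to the reclassified human development points.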