R has hundreds of packages that perform specific tasks and you will use often.Contributed by Agustin Lobo
Contents of this page is available under We need rgdal now. For installing it, use the menu or type: install.packages("rgdal")> require(rgdal)Loading required package: rgdalLoading required package: spGeospatial Data Abstraction Library extensions to R successfully loadedLoaded GDAL runtime: GDAL 1.6.2, released 2009/07/31Path to GDAL shared files: /usr/share/gdal16Loaded PROJ.4 runtime: Rel. 4.6.1, 21 August 2008Path to PROJ.4 shared files: (autodetected)and repeat require(rgdal). Once installed, the packages remain installed and you do not have to re-install them in subsequent sessions, but you must use require() to load them for each session in which you need them. This is done to save memory: you just load what you need. We start by importing the data, which is the set of shape files that you produced with QGIS: > ndvidataSDF <- readOGR(dsn="/media/Transcend/MASTER_ICTA2007_2008/ExercisesJEMES/Exercise3/samplingpointsvalues_2",layer="samplingpointsvalues_2")OGR data source with driver: ESRI Shapefile Source: "/media/Transcend/MASTER_ICTA2007_2008/ExercisesJEMES/Exercise3/samplingpointsvalues_2", layer: "samplingpointsvalues_2"with 500 features and 25 fieldsFeature type: wkbPoint with 2 dimensionsndvidataSDF is a complex spatial object, according to definitions in package sp (this is why rgdal requires sp): > class(ndvidataSDF)The "@" are slots that organize the structure of sub-objects within the spatial object. Note that all files of the shape set become one single spatial object in R, so it has to be a complex object. While we will not be using it, in order to make sure we have really imported what we intended to import, you can plot the spatial object just using plot() (remember R is object-oriented): > plot(ndvidataSDF)Here we will be using just the values which originally were in the *.dbf file, which are the ones in slot @data now, so we select this data for the rest of the exercise: > ndvidata <- ndvidataSDF@datahead(ndvidata) reveals that the table of the vector shape file we are importing has many more columns (variables) than we actually need: > head(ndvidata) ELEMENTID AREA AREA_HA SLOPE CLOUDS_SUB DEM10 NDVI2003F SHDEM101 0 41677529 4167.753 8.98384 1 631 64.7887 0.013642 0 41677529 4167.753 19.46910 1 538 62.2951 0.226893 0 41677529 4167.753 12.05580 1 580 70.0000 0.131454 0 41677529 4167.753 15.61440 1 590 83.8710 -0.131885 0 41677529 4167.753 17.15500 1 588 58.3333 -0.220126 0 41677529 4167.753 14.72260 1 622 120.8790 0.16212 ASPECT NDVI2005 NDVI2004 S N W E A BN BS BW1 288.43500 98.5764 76.6334 NEGATIVO NEGATIVO VERDADERO NEGATIVO 0 0 0 12 45.00000 135.2460 124.5400 NEGATIVO VERDADERO NEGATIVO NEGATIVO 0 1 0 03 110.55600 83.3185 70.0233 NEGATIVO NEGATIVO NEGATIVO VERDADERO 0 0 0 04 280.30500 104.3720 85.4120 NEGATIVO NEGATIVO VERDADERO NEGATIVO 0 0 0 15 238.24100 67.8495 66.3468 NEGATIVO NEGATIVO VERDADERO NEGATIVO 0 0 0 16 2.72631 119.2950 108.2880 NEGATIVO VERDADERO NEGATIVO NEGATIVO 0 1 0 0 BE CN CS CW CE NDVI_TOTAL1 0 0 0 0 0 342 0 0 0 0 0 733 1 0 0 0 0 134 0 0 0 0 0 215 0 0 0 0 0 106 0 0 0 0 0 -2We also observe that ELEMENTID is not a unique ID for each case, so we redefine this variable as: > ndvidataori <- ndvidata> ndvidata <- ndvidata[,1:11]> ndvidata[,1] <- 1:nrow(ndvidata)
> head(ndvidata) ELEMENTID AREA AREA_HA SLOPE CLOUDS_SUB DEM10 NDVI2003F SHDEM101 1 41677529 4167.753 8.98384 1 631 64.7887 0.013642 2 41677529 4167.753 19.46910 1 538 62.2951 0.226893 3 41677529 4167.753 12.05580 1 580 70.0000 0.131454 4 41677529 4167.753 15.61440 1 590 83.8710 -0.131885 5 41677529 4167.753 17.15500 1 588 58.3333 -0.220126 6 41677529 4167.753 14.72260 1 622 120.8790 0.16212 ASPECT NDVI2005 NDVI20041 288.43500 98.5764 76.63342 45.00000 135.2460 124.54003 110.55600 83.3185 70.02334 280.30500 104.3720 85.41205 238.24100 67.8495 66.34686 2.72631 119.2950 108.2880CLOUDS_SUB, the 5th column in this example, is the variable indicating the presence of clouds, which we can check with summary() and table() > summary(ndvidata$CLOUDS_SUB)> summary(ndvidata[,5]) 0 1 94 406 > table(ndvidata[,5]) 0 1 94 406 In order to discard those observations having clouds, we first make a second object ndvidata2 and then have two alternatives. The first one is the simplest: > table(ndvidata[,5]) 0 1 94 406 > ndvidata2 <- ndvidata> ndvidata2 <- ndvidata2[ndvidata2$CLOUDS_SUB==1,]> head(ndvidata2) ELEMENTID AREA AREA_HA SLOPE CLOUDS_SUB DEM10 NDVI2003F SHDEM10
1 1 41677529 4167.753 8.98384 1 631 64.7887 0.01364
2 2 41677529 4167.753 19.46910 1 538 62.2951 0.22689
3 3 41677529 4167.753 12.05580 1 580 70.0000 0.13145
4 4 41677529 4167.753 15.61440 1 590 83.8710 -0.13188
5 5 41677529 4167.753 17.15500 1 588 58.3333 -0.22012
6 6 41677529 4167.753 14.72260 1 622 120.8790 0.16212
ASPECT NDVI2005 NDVI2004
1 288.43500 98.5764 76.6334
2 45.00000 135.2460 124.5400
3 110.55600 83.3185 70.0233
4 280.30500 104.3720 85.4120
5 238.24100 67.8495 66.3468
6 2.72631 119.2950 108.2880> table(ndvidata2[,5]) 1 406 The second alternative recodes "0" as NA and then filters out cases with NA by making use of na.omit(): > table(ndvidata[,5]) 0 1 94 406 > ndvidata2 <- ndvidata> ndvidata2$CLOUDS_SUB[ndvidata2$CLOUDS_SUB==0] <- NA> ndvidata2 <- na.omit(ndvidata2)> head(ndvidata2) ELEMENTID AREA AREA_HA SLOPE CLOUDS_SUB DEM10 NDVI2003F SHDEM101 1 41677529 4167.753 8.98384 1 631 64.7887 0.013642 2 41677529 4167.753 19.46910 1 538 62.2951 0.226893 3 41677529 4167.753 12.05580 1 580 70.0000 0.131454 4 41677529 4167.753 15.61440 1 590 83.8710 -0.131885 5 41677529 4167.753 17.15500 1 588 58.3333 -0.220126 6 41677529 4167.753 14.72260 1 622 120.8790 0.16212 ASPECT NDVI2005 NDVI20041 288.43500 98.5764 76.63342 45.00000 135.2460 124.54003 110.55600 83.3185 70.02334 280.30500 104.3720 85.41205 238.24100 67.8495 66.34686 2.72631 119.2950 108.2880> table(ndvidata2[,5])
1
401 If you review the output of summary(ndvidata) you observe that there are 6 cases in which ASPECT is NA also (a result of the algorithm used by QGIS to calculate aspect from the DEM layer). This is obviously dependent upon the actual data you are using, but, in general, this second method is better for this purpose since you discard any observation with at least one NA in any variable: > summary(ndvidata)Now we are going to define the terrain classes according to (http://www.iiasa.ac.at/Research/LUC/GAEZ/landres/terslopes.htm) : Class a: level to undulating, dominant slopes ranging between 0% and 8% Class bN: rolling to hilly, dominant slopes ranging between 8% and 30%, facing N (315 - 45) Class bE: facing E (45 - 135) Class bS: facing S (135 - 225) Class bW: facing W (225 - 315) Class cN: steeply dissected to mountainous, dominant slopes over 30%, facing N (315 - 45) Class cE: facing E (45 - 135) Class cS: facing S (135 - 225) Class cW: facing W (225 - 315) As SLOPE comes from QGIS calculated in degrees, we create a new variable SLOPEPER as SLOPEPER = 100*tan(SLOPE) > SLOPEPER <- 100*tan(ndvidata2$SLOPE*pi/180)> hist(SLOPEPER)> ndvidata2 <- cbind(ndvidata2,SLOPEPER=SLOPEPER)> ndvidata2[1,] ELEMENTID AREA AREA_HA SLOPE CLOUDS_SUB DEM10 NDVI2003F SHDEM101 1 41677529 4167.753 8.98384 1 631 64.7887 0.01364 ASPECT NDVI2005 NDVI2004 SLOPEPER1 288.435 98.5764 76.6334 15.80953Note that we multiply the angle by pi/180 to convert into radians, which are the units expected by tan(x) (see help(tan)) In a similar way we also create ASPCLAS as a 4-classes vector of orientation: > ASPCLAS <- ndvidata2$ASPECT * 0> ASPCLAS[ndvidata2$ASPECT>=(360-45) | ndvidata2$ASPECT<45] <- "N"> ASPCLAS[ndvidata2$ASPECT>=45 & ndvidata2$ASPECT<(90+45)] <- "E"> ASPCLAS[ndvidata2$ASPECT>=( 90+45) & ndvidata2$ASPECT<(180+45)] <- "S"> ASPCLAS[ndvidata2$ASPECT>=(180+45) & ndvidata2$ASPECT<(270+45)] <- "W"> table(ASPCLAS)ASPCLAS E N S W 102 68 97 134 > and now we are ready to create the vector of combined terrain classes TERRCLAS: > TERRCLAS = SLOPEPER*0> TERRCLAS[ndvidata2$SLOPEPER>=0 & ndvidata2$SLOPEPER<8] <- "a"> TERRCLAS[ndvidata2$SLOPEPER>=8 & ndvidata2$SLOPEPER<30 & ASPCLAS=="N"] <- "bN"> TERRCLAS[ndvidata2$SLOPEPER>=8 & ndvidata2$SLOPEPER<30 & ASPCLAS=="E"] <- "bE"> TERRCLAS[ndvidata2$SLOPEPER>=8 & ndvidata2$SLOPEPER<30 & ASPCLAS=="S"] <- "bS"> TERRCLAS[ndvidata2$SLOPEPER>=8 & ndvidata2$SLOPEPER<30 & ASPCLAS=="W"] <- "bW"> TERRCLAS[ndvidata2$SLOPEPER>=30 & ASPCLAS=="N"] <- "cN"> TERRCLAS[ndvidata2$SLOPEPER>=30 & ASPCLAS=="E"] <- "cE"> TERRCLAS[ndvidata2$SLOPEPER>=30 & ASPCLAS=="S"] <- "cS"> TERRCLAS[ndvidata2$SLOPEPER>=30 & ASPCLAS=="W"] <- "cW"> table(TERRCLAS)!!!!> ndvidata2 <- cbind(ndvidata2,ASPCLAS=ASPCLAS,TERRCLAS=TERRCLAS)> ndvidata2[1,] ELEMENTID AREA AREA_HA SLOPE CLOUDS_SUB DEM10 NDVI2003F SHDEM101 1 41677529 4167.753 8.98384 1 631 64.7887 0.01364 ASPECT NDVI2005 NDVI2004 SLOPEPER ASPCLAS TERRCLAS1 288.435 98.5764 76.6334 15.80953 W bWA barplot is a good way of visualizing the output of table: > barplot(table(ndvidata2$TERRCLAS))You can actually have both the graphic and the actual values on it: > x <- barplot(table(ndvidata2$TERRCLAS))> y <- table(ndvidata2$TERRCLAS)> barplot(table(ndvidata2$TERRCLAS),ylim=c(0,150))> text(x,y+3,y)In both cases, we observe that, with the actual dataset that we imported from the vector shape layer in this case, the number of cases is sufficient for statistical analysis for all terrain classes. We plot NDVI values of 2003 and 2005 against the terrain classes: > par(mfrow=c(1,2))> boxplot(ndvidata2$NDVI2003F~as.factor(ndvidata2$TERRCLAS),ylim=c(0,200),main="2003",xlabel=ndvidata2$TERRCLAS)> boxplot(ndvidata2$NDVI2005~as.factor(ndvidata2$TERRCLAS),ylim=c(0,200),main="2005",xlabel=ndvidata2$TERRCLAS)The variable of interest is (NDVI2005-NDVI2003F), which we assume is tracking vegetation recover. In order to check the distributions for each terrain class, we take advantage of package lattice, which you must install if you don't have it yet: > require(lattice)Loading required package: lattice> histogram(~(NDVI2005-NDVI2003F)|TERRCLAS,data=ndvidata2)As all distributions approach normality, we can run a simple one-way ANOVA: > ndviterraov <- aov((NDVI2005-NDVI2003F)~as.factor(TERRCLAS), data=ndvidata2)
> summary(ndviterraov)
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(TERRCLAS) 8 13979 1747 3.8805 0.0002027 ***
Residuals 392 176522 450
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 The ANOVA states that there are significant differences on NDVI change between 2005 and 2003 among terrain classes (the probability of being wrong if we reject the null hypothesis (=no differences among classes) is 0.0002027) A boxplot with notches let us visualize which classes are actually different: if the notches of a given pair boxes' notches do not overlap there is strong evidence that their medians differ (see Chambers et al., 1983, Graphical Methods for Data Analysis. Wadsworth & Brooks/Cole. p. 62). The test "Tukey Honest Significant Differences" let us check the associated probabilities: > boxplot((ndvidata2$NDVI2005-ndvidata2$NDVI2003F)~ndvidata2$TERRCLAS,ylim=c(-50,100),main="2005 - 2003",notch=T)> TukeyHSD(ndviterraov) Tukey multiple comparisons of means 95% family-wise confidence levelFit: aov(formula = (NDVI2005 - NDVI2003F) ~ as.factor(TERRCLAS), data = ndvidata2)$`as.factor(TERRCLAS)` diff lwr upr p adjbE-a 5.96550930 -6.5474644 18.478483 0.8611788bN-a 15.35127941 0.6962344 30.006324 0.0320245bS-a -2.09981870 -14.9117626 10.712125 0.9998806bW-a 4.01048170 -7.8216122 15.842576 0.9797102cE-a 11.60517773 -3.9634984 27.173854 0.3295585cN-a 15.70877475 -0.6760637 32.093613 0.0719880cS-a 0.23079559 -15.3378805 15.799472 1.0000000cW-a 11.62887870 -2.6652480 25.923005 0.2173435bN-bE 9.38577011 -4.7396884 23.511229 0.4936919bS-bE -8.06532800 -20.2679564 4.137300 0.5012739bW-bE -1.95502760 -13.1244806 9.214425 0.9998039cE-bE 5.63966843 -9.4315594 20.710896 0.9628237cN-bE 9.74326546 -6.1696590 25.656190 0.6073252cS-bE -5.73471371 -20.8059416 9.336514 0.9589604cW-bE 5.66336940 -8.0872776 19.414016 0.9354086bS-bN -17.45109811 -31.8420667 -3.060130 0.0055680bW-bN -11.34079771 -24.8667918 2.185196 0.1835564cE-bN -3.74610168 -20.6380449 13.145841 0.9988743cN-bN 0.35749534 -17.2895157 18.004506 1.0000000cS-bN -15.12048382 -32.0124270 1.771459 0.1209813cW-bN -3.72240072 -19.4474167 12.002615 0.9982017bW-bS 6.11030040 -5.3930938 17.613695 0.7722948cE-bS 13.70499643 -1.6153595 29.025352 0.1215149cN-bS 17.80859345 1.6595204 33.957667 0.0184469cS-bS 2.33061429 -12.9897416 17.650970 0.9999322cW-bS 13.72869739 -0.2945579 27.751953 0.0603540cE-bW 7.59469603 -6.9161911 22.105583 0.7863649cN-bW 11.69829306 -3.6849804 27.081567 0.3024023cS-bW -3.77968611 -18.2905733 10.731201 0.9964734cW-bW 7.61839700 -5.5156895 20.752484 0.6760565cN-cE 4.10359702 -14.3091793 22.516373 0.9988334cS-cE -11.37438214 -29.0648054 6.316041 0.5403991cW-cE 0.02370097 -16.5560897 16.603492 1.0000000cS-cN -15.47797917 -33.8907555 2.934797 0.1808034cW-cN -4.07989606 -21.4283459 13.268554 0.9982838cW-cS 11.39808311 -5.1817075 27.977874 0.4446260According to this, moderate slopes facing N vs. flat terrain and, in particular, steeper slopes facing S vs. N are the ones with significant differences in terms of increase of NDVI. A complementary way of testing the effect of terrain on the recovery of vegetation after fire as recorded by NDVI, is to regress the increase of NDVI against the cosinus of the angle of incidence of solar radiation over the surface (COSI), which we can calculate as (http://www.isprs.org/congresses/beijing2008/proceedings/6b_pdf/40.pdf): > szen= (90-63.119507)*pi/180where szen and saz are the solar zenith and azimuth angles, respectively. A figure representing solar angles can be found in http://www.solsticeamateur.com/AzimuthElevation.jpg . For the actual values, dependent upon the day and time, we take the ones provided for the 2003 image. The idea behind this approach is that assuming a uniform solar radiation on the top of the atmosphere for this region, the differences on the actual incidence of solar radiation on the surface are essentially due to topographic effects. Note that this approach is not considering shading. COSI is highest for flat terrain and monotonously decreases as the angle of incidence increases: > plot(seq(0,90,1), cos(seq(0,90,1)*pi/180)> lines(sort(acos(cosi)*180/pi),cosi[order(acos(cosi))],col="red",lwd=2)The relationship of COSI and the previously calculated terrain classes can be explored through the boxplot > boxplot(cosi~ndvidata2$TERRCLAS)Finally, we calculate and plot the linear regression: > plot(cosi,(ndvidata2$NDVI2005-ndvidata2$NDVI2003F))> cosilm <- lm((ndvidata2$NDVI2005-ndvidata2$NDVI2003F)~cosi)> abline(coef(cosilm),lwd=2,col="red")> summary(cosilm)According to these results, there is a significant effect of COSI on NDVI increase, but this effect accounts for only a 5% of the total varaibility of NDVI increase. This result is consistent with the previous result of the ANOVA by terrain classes, where only moderate slopes facing N vs. flat terrain and steeper slopes facing S vs. N where showing significant differences on NDVI increase. |
