Introduction to R: Session 3


Contributed by Agustin Lobo
Contents of this page is available under

Data Analysis of Exercise Re-vegetation rate after fire. Are there topographic differences? with R

R has hundreds of packages that perform specific tasks and you will use often.
We need rgdal now. For installing it, use the menu or type:
install.packages("rgdal")
select a repository and proceed. Then type:
> require(rgdal)
Loading required package: rgdal
Loading required package: sp
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 1.6.2, released 2009/07/31
Path to GDAL shared files: /usr/share/gdal16
Loaded PROJ.4 runtime: Rel. 4.6.1, 21 August 2008
Path to PROJ.4 shared files: (autodetected)
to load the package. Note that another package (sp) gets loaded as a dependency. If  by any reason you do not have it installed, you have to install it as well
and repeat require(rgdal).
Once installed, the packages remain installed and you do not have to re-install them in subsequent sessions, but you must use require() to load them for each session
in which you need them. This is done to save memory: you just load what you need.

We start by importing the data, which is the set of shape files that you produced with QGIS:
> ndvidataSDF <- readOGR(dsn="/media/Transcend/MASTER_ICTA2007_2008/ExercisesJEMES/Exercise3/samplingpointsvalues_2",layer="samplingpointsvalues_2")
OGR data source with driver: ESRI Shapefile
Source: "/media/Transcend/MASTER_ICTA2007_2008/ExercisesJEMES/Exercise3/samplingpointsvalues_2", layer: "samplingpointsvalues_2"
with 500 features and 25 fields
Feature type: wkbPoint with 2 dimensions
where dsn is the directory and layer the name of your shape files

ndvidataSDF is a complex spatial object, according to definitions in package sp (this is why rgdal requires sp):
> class(ndvidataSDF)
[1] "SpatialPointsDataFrame"
attr(,"package")
[1] "sp"
> str(ndvidataSDF,max.level=2)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame':    500 obs. of  25 variables:
  ..@ coords.nrs : num(0)
  ..@ coords     : num [1:500, 1:2] 418144 418829 421918 420216 422408 ...
  .. ..- attr(*, "dimnames")=List of 2
  ..@ bbox       : num [1:2, 1:2] 417481 4613259 425773 4623230
  .. ..- attr(*, "dimnames")=List of 2
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots

str() reveals the structure of an R object. Note that max.level=2 in str() is important here to abbreviate the otherwise very long output produced by str() for spatial objects.
The "@" are slots that organize the structure of sub-objects within the spatial object. Note that all files of the shape set become one single spatial object in R, so it has to be a complex object.

While we will not be using it, in order to make sure we have really imported what we intended to import, you can plot the spatial object just using plot() (remember R is object-oriented):
> plot(ndvidataSDF)

Here we will be using just the values which originally were in the *.dbf file, which are the ones in slot @data now, so we select this data for the rest of the exercise:
> ndvidata <-  ndvidataSDF@data

head(ndvidata)
reveals that the table of the vector shape file we are importing has many more columns (variables) than we actually need:
> head(ndvidata)
  ELEMENTID     AREA  AREA_HA    SLOPE CLOUDS_SUB DEM10 NDVI2003F  SHDEM10
1         0 41677529 4167.753  8.98384          1   631   64.7887  0.01364
2         0 41677529 4167.753 19.46910          1   538   62.2951  0.22689
3         0 41677529 4167.753 12.05580          1   580   70.0000  0.13145
4         0 41677529 4167.753 15.61440          1   590   83.8710 -0.13188
5         0 41677529 4167.753 17.15500          1   588   58.3333 -0.22012
6         0 41677529 4167.753 14.72260          1   622  120.8790  0.16212
     ASPECT NDVI2005 NDVI2004        S         N         W         E A BN BS BW
1 288.43500  98.5764  76.6334 NEGATIVO  NEGATIVO VERDADERO  NEGATIVO 0  0  0  1
2  45.00000 135.2460 124.5400 NEGATIVO VERDADERO  NEGATIVO  NEGATIVO 0  1  0  0
3 110.55600  83.3185  70.0233 NEGATIVO  NEGATIVO  NEGATIVO VERDADERO 0  0  0  0
4 280.30500 104.3720  85.4120 NEGATIVO  NEGATIVO VERDADERO  NEGATIVO 0  0  0  1
5 238.24100  67.8495  66.3468 NEGATIVO  NEGATIVO VERDADERO  NEGATIVO 0  0  0  1
6   2.72631 119.2950 108.2880 NEGATIVO VERDADERO  NEGATIVO  NEGATIVO 0  1  0  0
  BE CN CS CW CE NDVI_TOTAL
1  0  0  0  0  0         34
2  0  0  0  0  0         73
3  1  0  0  0  0         13
4  0  0  0  0  0         21
5  0  0  0  0  0         10
6  0  0  0  0  0         -2
so we make a backup object (ndvidataori) just in case and select the first 10 columns for ndvidata.

We also observe that ELEMENTID is not a unique ID for each case, so we redefine this variable as:
> ndvidataori <- ndvidata
> ndvidata <- ndvidata[,1:11]
> ndvidata[,1] <- 1:nrow(ndvidata)
> head(ndvidata)
  ELEMENTID     AREA  AREA_HA    SLOPE CLOUDS_SUB DEM10 NDVI2003F  SHDEM10
1         1 41677529 4167.753  8.98384          1   631   64.7887  0.01364
2         2 41677529 4167.753 19.46910          1   538   62.2951  0.22689
3         3 41677529 4167.753 12.05580          1   580   70.0000  0.13145
4         4 41677529 4167.753 15.61440          1   590   83.8710 -0.13188
5         5 41677529 4167.753 17.15500          1   588   58.3333 -0.22012
6         6 41677529 4167.753 14.72260          1   622  120.8790  0.16212
     ASPECT NDVI2005 NDVI2004
1 288.43500  98.5764  76.6334
2  45.00000 135.2460 124.5400
3 110.55600  83.3185  70.0233
4 280.30500 104.3720  85.4120
5 238.24100  67.8495  66.3468
6   2.72631 119.2950 108.2880

CLOUDS_SUB, the 5th column in this example, is the variable indicating the presence of clouds, which we can check with summary() and table()
> summary(ndvidata$CLOUDS_SUB)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   1.000   1.000   0.812   1.000   1.000

> summary(ndvidata[,5])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   1.000   1.000   0.812   1.000   1.000
> table(ndvidata$CLOUDS_SUB)

  0   1
 94 406
> table(ndvidata[,5])
  0   1
 94 406
Note we select the variable using either its name or its position

In order to discard those observations having clouds, we first make a second object ndvidata2 and then have two alternatives.
The first one is the simplest:
> table(ndvidata[,5])
  0   1
 94 406
> ndvidata2 <- ndvidata
> ndvidata2 <- ndvidata2[ndvidata2$CLOUDS_SUB==1,]
> head(ndvidata2)
  ELEMENTID     AREA  AREA_HA    SLOPE CLOUDS_SUB DEM10 NDVI2003F  SHDEM10
1         1 41677529 4167.753  8.98384          1   631   64.7887  0.01364
2         2 41677529 4167.753 19.46910          1   538   62.2951  0.22689
3         3 41677529 4167.753 12.05580          1   580   70.0000  0.13145
4         4 41677529 4167.753 15.61440          1   590   83.8710 -0.13188
5         5 41677529 4167.753 17.15500          1   588   58.3333 -0.22012
6         6 41677529 4167.753 14.72260          1   622  120.8790  0.16212
     ASPECT NDVI2005 NDVI2004
1 288.43500  98.5764  76.6334
2  45.00000 135.2460 124.5400
3 110.55600  83.3185  70.0233
4 280.30500 104.3720  85.4120
5 238.24100  67.8495  66.3468
6   2.72631 119.2950 108.2880
> table(ndvidata2[,5])
  1
406


The second alternative recodes "0" as NA and then filters out cases with NA by making use of na.omit():
> table(ndvidata[,5])
  0   1
 94 406
> ndvidata2 <- ndvidata
> ndvidata2$CLOUDS_SUB[ndvidata2$CLOUDS_SUB==0] <- NA
> ndvidata2 <- na.omit(ndvidata2)
> head(ndvidata2)
  ELEMENTID     AREA  AREA_HA    SLOPE CLOUDS_SUB DEM10 NDVI2003F  SHDEM10
1         1 41677529 4167.753  8.98384          1   631   64.7887  0.01364
2         2 41677529 4167.753 19.46910          1   538   62.2951  0.22689
3         3 41677529 4167.753 12.05580          1   580   70.0000  0.13145
4         4 41677529 4167.753 15.61440          1   590   83.8710 -0.13188
5         5 41677529 4167.753 17.15500          1   588   58.3333 -0.22012
6         6 41677529 4167.753 14.72260          1   622  120.8790  0.16212
     ASPECT NDVI2005 NDVI2004
1 288.43500  98.5764  76.6334
2  45.00000 135.2460 124.5400
3 110.55600  83.3185  70.0233
4 280.30500 104.3720  85.4120
5 238.24100  67.8495  66.3468
6   2.72631 119.2950 108.2880
> table(ndvidata2[,5])
  1
401
Note that the result is not the same: 406 cases through the first method, 401 cases through the second.
If you review the output of summary(ndvidata) you observe that there are 6 cases in which ASPECT is NA also (a result of the algorithm used by QGIS to calculate aspect from the DEM layer).
This is obviously dependent upon the actual data you are using, but, in general, this second method is better for this purpose since you discard any observation with at least one NA in any variable:
> summary(ndvidata)
   ELEMENTID          AREA             AREA_HA         SLOPE      
 Min.   :  1.0   Min.   :41677529   Min.   :4168   Min.   : 0.000 
 1st Qu.:125.8   1st Qu.:41677529   1st Qu.:4168   1st Qu.: 6.762 
 Median :250.5   Median :41677529   Median :4168   Median :12.116 
 Mean   :250.5   Mean   :41677529   Mean   :4168   Mean   :13.138 
 3rd Qu.:375.2   3rd Qu.:41677529   3rd Qu.:4168   3rd Qu.:18.540 
 Max.   :500.0   Max.   :41677529   Max.   :4168   Max.   :33.703 
                                                                  
   CLOUDS_SUB        DEM10         NDVI2003F         SHDEM10       
 Min.   :0.000   Min.   :476.0   Min.   : 13.33   Min.   :-0.22689 
 1st Qu.:1.000   1st Qu.:568.8   1st Qu.: 50.00   1st Qu.:-0.09025 
 Median :1.000   Median :612.0   Median : 60.12   Median : 0.06023 
 Mean   :0.812   Mean   :626.6   Mean   : 63.19   Mean   : 0.03391 
 3rd Qu.:1.000   3rd Qu.:665.0   3rd Qu.: 72.00   3rd Qu.: 0.16044 
 Max.   :1.000   Max.   :932.0   Max.   :166.90   Max.   : 0.22689 
                                                                   
     ASPECT           NDVI2005         NDVI2004     
 Min.   :  1.736   Min.   : 28.12   Min.   : -2.513 
 1st Qu.:107.441   1st Qu.: 70.94   1st Qu.: 60.485 
 Median :192.494   Median : 87.37   Median : 75.659 
 Mean   :186.543   Mean   : 88.18   Mean   : 75.910 
 3rd Qu.:269.349   3rd Qu.:106.42   3rd Qu.: 93.319 
 Max.   :360.000   Max.   :159.78   Max.   :217.545 
 NA's   :  6.000                                    

Now we are going to define the terrain classes according to (http://www.iiasa.ac.at/Research/LUC/GAEZ/landres/terslopes.htm) :
Class a:   level to undulating, dominant slopes ranging between 0% and 8%
Class bN:  rolling to hilly, dominant slopes ranging between 8% and 30%, facing N (315 - 45)
Class bE:  facing E  (45 - 135)
Class bS:  facing S  (135 - 225)
Class bW: facing W (225 - 315)
Class cN:  steeply dissected to mountainous, dominant slopes over 30%, facing N (315 - 45)
Class cE:  facing E  (45 - 135)
Class cS:  facing S  (135 - 225)
Class cW: facing W (225 - 315)

As SLOPE comes from QGIS calculated in degrees, we create a new variable SLOPEPER as
SLOPEPER = 100*tan(SLOPE)
> SLOPEPER <- 100*tan(ndvidata2$SLOPE*pi/180)
> hist(SLOPEPER)
> ndvidata2 <- cbind(ndvidata2,SLOPEPER=SLOPEPER)
> ndvidata2[1,]
  ELEMENTID     AREA  AREA_HA   SLOPE CLOUDS_SUB DEM10 NDVI2003F SHDEM10
1         1 41677529 4167.753 8.98384          1   631   64.7887 0.01364
   ASPECT NDVI2005 NDVI2004 SLOPEPER
1 288.435  98.5764  76.6334 15.80953

Note that we multiply the angle by pi/180 to convert into radians, which are the units expected by tan(x) (see help(tan))
In a similar way we also create ASPCLAS as a 4-classes vector of orientation:
> ASPCLAS <- ndvidata2$ASPECT * 0
> ASPCLAS[ndvidata2$ASPECT>=(360-45) | ndvidata2$ASPECT<45] <- "N"
> ASPCLAS[ndvidata2$ASPECT>=45 & ndvidata2$ASPECT<(90+45)]  <- "E"
> ASPCLAS[ndvidata2$ASPECT>=( 90+45) & ndvidata2$ASPECT<(180+45)] <- "S"
> ASPCLAS[ndvidata2$ASPECT>=(180+45) & ndvidata2$ASPECT<(270+45)] <- "W"
> table(ASPCLAS)
ASPCLAS
  E   N   S   W
102  68  97 134
>

and now we are ready to create the vector of combined terrain classes TERRCLAS:
> TERRCLAS = SLOPEPER*0
> TERRCLAS[ndvidata2$SLOPEPER>=0 & ndvidata2$SLOPEPER<8] <- "a"
> TERRCLAS[ndvidata2$SLOPEPER>=8 & ndvidata2$SLOPEPER<30 & ASPCLAS=="N"] <- "bN"
> TERRCLAS[ndvidata2$SLOPEPER>=8 & ndvidata2$SLOPEPER<30 & ASPCLAS=="E"] <- "bE"
> TERRCLAS[ndvidata2$SLOPEPER>=8 & ndvidata2$SLOPEPER<30 & ASPCLAS=="S"] <- "bS"
> TERRCLAS[ndvidata2$SLOPEPER>=8 & ndvidata2$SLOPEPER<30 & ASPCLAS=="W"] <- "bW"
> TERRCLAS[ndvidata2$SLOPEPER>=30 & ASPCLAS=="N"] <- "cN"
> TERRCLAS[ndvidata2$SLOPEPER>=30 & ASPCLAS=="E"] <- "cE"
> TERRCLAS[ndvidata2$SLOPEPER>=30 & ASPCLAS=="S"] <- "cS"
> TERRCLAS[ndvidata2$SLOPEPER>=30 & ASPCLAS=="W"] <- "cW"
> table(TERRCLAS)
!!!!
> ndvidata2 <- cbind(ndvidata2,ASPCLAS=ASPCLAS,TERRCLAS=TERRCLAS)
> ndvidata2[1,]
  ELEMENTID     AREA  AREA_HA   SLOPE CLOUDS_SUB DEM10 NDVI2003F SHDEM10
1         1 41677529 4167.753 8.98384          1   631   64.7887 0.01364
   ASPECT NDVI2005 NDVI2004 SLOPEPER ASPCLAS TERRCLAS
1 288.435  98.5764  76.6334 15.80953       W       bW

A barplot is a good way of visualizing the output of table:
> barplot(table(ndvidata2$TERRCLAS))

You can actually have both the graphic and the actual values on it:
> x <- barplot(table(ndvidata2$TERRCLAS))
> y <- table(ndvidata2$TERRCLAS)
> barplot(table(ndvidata2$TERRCLAS),ylim=c(0,150))
> text(x,y+3,y)
While boxplot() is a graphic function, it can be used to save (into x, in this case) the x-axis positions of the bars, which we later use to overlay the number of cases for each terrain class.

In both cases, we observe that, with the actual dataset that we imported from the vector shape layer in this case,  the number of cases is sufficient for statistical analysis for all terrain classes.

We plot NDVI values of 2003 and 2005 against the terrain classes:
> par(mfrow=c(1,2))
> boxplot(ndvidata2$NDVI2003F~as.factor(ndvidata2$TERRCLAS),ylim=c(0,200),main="2003",xlabel=ndvidata2$TERRCLAS)
> boxplot(ndvidata2$NDVI2005~as.factor(ndvidata2$TERRCLAS),ylim=c(0,200),main="2005",xlabel=ndvidata2$TERRCLAS)
par() controls many aspects of the plotting style. Here we are using mfrow to split the graphic window into 1 row and 2 colums

The variable of interest is (NDVI2005-NDVI2003F), which we assume is tracking vegetation recover. In order to check the distributions for each terrain class, we take advantage of package lattice, which you must install if you don't have it yet:
> require(lattice)
Loading required package: lattice
> histogram(~(NDVI2005-NDVI2003F)|TERRCLAS,data=ndvidata2)

As all distributions approach normality, we can run a simple one-way ANOVA:
> ndviterraov <-  aov((NDVI2005-NDVI2003F)~as.factor(TERRCLAS), data=ndvidata2)
> summary(ndviterraov)
                     Df Sum Sq Mean Sq F value    Pr(>F)   
as.factor(TERRCLAS)   8  13979    1747  3.8805 0.0002027 ***
Residuals           392 176522     450                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The ANOVA states that there are significant differences on NDVI change between 2005 and 2003 among terrain classes
(the probability of being wrong if we reject the null hypothesis (=no differences among classes) is 0.0002027)

A boxplot with notches let us visualize which classes are actually different: if the notches of a given pair boxes' notches do not overlap there is strong evidence that their medians differ (see Chambers et al., 1983, Graphical Methods for Data Analysis. Wadsworth & Brooks/Cole. p. 62). The test "Tukey Honest Significant Differences" let us check the associated probabilities:
> boxplot((ndvidata2$NDVI2005-ndvidata2$NDVI2003F)~ndvidata2$TERRCLAS,ylim=c(-50,100),main="2005 - 2003",notch=T)
> TukeyHSD(ndviterraov)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = (NDVI2005 - NDVI2003F) ~ as.factor(TERRCLAS), data = ndvidata2)

$`as.factor(TERRCLAS)`
              diff         lwr       upr     p adj
bE-a    5.96550930  -6.5474644 18.478483 0.8611788
bN-a   15.35127941   0.6962344 30.006324 0.0320245
bS-a   -2.09981870 -14.9117626 10.712125 0.9998806
bW-a    4.01048170  -7.8216122 15.842576 0.9797102
cE-a   11.60517773  -3.9634984 27.173854 0.3295585
cN-a   15.70877475  -0.6760637 32.093613 0.0719880
cS-a    0.23079559 -15.3378805 15.799472 1.0000000
cW-a   11.62887870  -2.6652480 25.923005 0.2173435
bN-bE   9.38577011  -4.7396884 23.511229 0.4936919
bS-bE  -8.06532800 -20.2679564  4.137300 0.5012739
bW-bE  -1.95502760 -13.1244806  9.214425 0.9998039
cE-bE   5.63966843  -9.4315594 20.710896 0.9628237
cN-bE   9.74326546  -6.1696590 25.656190 0.6073252
cS-bE  -5.73471371 -20.8059416  9.336514 0.9589604
cW-bE   5.66336940  -8.0872776 19.414016 0.9354086
bS-bN -17.45109811 -31.8420667 -3.060130 0.0055680
bW-bN -11.34079771 -24.8667918  2.185196 0.1835564
cE-bN  -3.74610168 -20.6380449 13.145841 0.9988743
cN-bN   0.35749534 -17.2895157 18.004506 1.0000000
cS-bN -15.12048382 -32.0124270  1.771459 0.1209813
cW-bN  -3.72240072 -19.4474167 12.002615 0.9982017
bW-bS   6.11030040  -5.3930938 17.613695 0.7722948
cE-bS  13.70499643  -1.6153595 29.025352 0.1215149
cN-bS  17.80859345   1.6595204 33.957667 0.0184469
cS-bS   2.33061429 -12.9897416 17.650970 0.9999322
cW-bS  13.72869739  -0.2945579 27.751953 0.0603540
cE-bW   7.59469603  -6.9161911 22.105583 0.7863649
cN-bW  11.69829306  -3.6849804 27.081567 0.3024023
cS-bW  -3.77968611 -18.2905733 10.731201 0.9964734
cW-bW   7.61839700  -5.5156895 20.752484 0.6760565
cN-cE   4.10359702 -14.3091793 22.516373 0.9988334
cS-cE -11.37438214 -29.0648054  6.316041 0.5403991
cW-cE   0.02370097 -16.5560897 16.603492 1.0000000
cS-cN -15.47797917 -33.8907555  2.934797 0.1808034
cW-cN  -4.07989606 -21.4283459 13.268554 0.9982838
cW-cS  11.39808311  -5.1817075 27.977874 0.4446260

According to this, moderate slopes facing N vs. flat terrain and, in particular, steeper slopes facing S vs. N are the ones with significant differences in terms of increase of NDVI.

A complementary way of testing the effect of terrain on the recovery of vegetation after fire as recorded by NDVI, is to regress the increase of NDVI against the cosinus of the angle of incidence of solar radiation over the surface (COSI), which we can calculate as (http://www.isprs.org/congresses/beijing2008/proceedings/6b_pdf/40.pdf):
> szen= (90-63.119507)*pi/180
> saz = 145.294377*pi/180
> COSI= cos(szen)*cos(ndvidata2$SLOPE*pi/180)+sin(szen)*sin(ndvidata2$SLOPE*pi/180)*cos(saz-ndvidata2$ASPECT*pi/180)

where szen and saz are the solar zenith and azimuth angles, respectively. A figure representing solar angles can be found in http://www.solsticeamateur.com/AzimuthElevation.jpg . For the actual values, dependent upon the day and time, we take the ones provided for the 2003 image.
The idea behind this approach is that assuming a uniform solar radiation on the top of the atmosphere for this region, the differences on the actual incidence of solar radiation on the surface are essentially due to topographic effects. Note that this approach is not considering shading.

COSI is highest for flat terrain and monotonously decreases as the angle of incidence increases:
> plot(seq(0,90,1), cos(seq(0,90,1)*pi/180)
> lines(sort(acos(cosi)*180/pi),cosi[order(acos(cosi))],col="red",lwd=2)

The relationship of COSI and the previously calculated terrain classes can be explored through the boxplot
> boxplot(cosi~ndvidata2$TERRCLAS)
Note that N and W facing slopes have lower values of radiation incidence than flat, E and S facing slopes, and that this difference increases with slope.

Finally, we calculate and plot the linear regression:
> plot(cosi,(ndvidata2$NDVI2005-ndvidata2$NDVI2003F))
> cosilm <- lm((ndvidata2$NDVI2005-ndvidata2$NDVI2003F)~cosi)
> abline(coef(cosilm),lwd=2,col="red")
> summary(cosilm)

According to these results, there is a significant effect of COSI on NDVI increase, but this effect accounts for only a 5% of the total varaibility of NDVI increase. This result is consistent with the previous result of the ANOVA by terrain classes, where only moderate slopes facing N vs. flat terrain and steeper slopes facing S vs. N where showing significant differences on NDVI increase.

Č
ċ
ď
Agustin Lobo,
Mar 15, 2011 12:42 PM