In this section, we are going to extend R's capabilities with programming structures that will help us accomplish things we have not done before, and we will also show some of the power of trellis graphics with the lattice package.
if statement
Now we are going to introduce a series of ways to control the flow of our program. The first one allows us to evaluate a condition and ask R to run a particular statement based on the result of that evaluation.
x = 3
if(x > 1)
  x = x + 1
We can also nest if statements, evaluating one condition inside another:
x = 15
if(x > 12)
  if(x < 15)
    print("Your number is between 12 and 15")
We can also add an else statement, indicating the piece of code that should run when the condition is false:
x = 10
if(x > 12) {
  if(x < 15) {
    print("Your number is between 12 and 15")
  } else {
    print("Your number is 15 or larger")
  }
} else {
  print("the number is smaller than 12")
}
## [1] "the number is smaller than 12"
If we want the condition to be evaluated for each element of a vector, producing a new vector of results, we can use the ifelse function:
x <- c(21,33,12,35,24,54,11,34,12,44,11)
prod <- ifelse(x > 12, 1, 2)
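Since the comparison is vectorized, prod contains one value for every element of x; with the values above, inspecting it gives:
prod
## [1] 1 1 2 1 1 1 2 1 2 1 2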
Another way to cycle through a set of expressions is the repeat statement, which evaluates its body over and over. In this case, we have to include some kind of comparison to avoid getting trapped in an infinite loop; to exit the loop, we use the break statement.
x <- 0
repeat {
  x <- x + 1
  print(x)
  if(x == 10) break
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
Another option, where the exit condition is evaluated explicitly, is the while statement. This takes a condition and evaluates it before each pass through the expressions within the brackets:
x <- 0
while(x < 10) {
  x <- x + 1
  print(x)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
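Because the condition is checked before the body runs, a while loop whose condition is already false never executes its body at all. A minimal sketch to illustrate the point:
x <- 10
while(x < 10) {
  x <- x + 1   # never reached, since x < 10 is FALSE on the first check
  print(x)
}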
Another way to control the flow of the program is the for cycle. In this case, we are asking the computer to count from 1 to 10, assigning each value to i in turn and repeating the body until it reaches 10.
for(i in 1:10)
print(i)
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
We can also have a for cycle that goes over a sequence of non-consecutive values:
for(i in c(1,10,11,13,14)) print(i)
## [1] 1
## [1] 10
## [1] 11
## [1] 13
## [1] 14
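The loop variable does not have to be numeric either: a for cycle can walk over any vector, for example a character vector (the labels here are made up for illustration):
for(plot_name in c("P1", "P2", "P3")) print(plot_name)
## [1] "P1"
## [1] "P2"
## [1] "P3"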
The last of our examples, the switch statement, evaluates an expression and accordingly chooses one of its further arguments:
x <- runif(10)
type = "mean"
switch(type,
mean = mean(x),
max = max(x),
sd = sd(x))
## [1] 0.4285708
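Changing type selects a different branch; with the same x, the following returns max(x) instead (the printed value will differ from run to run, since x was drawn with runif without a seed):
type = "max"
switch(type,
       mean = mean(x),
       max = max(x),
       sd = sd(x))
If the selector matches none of the named arguments and there is no default, switch returns NULL invisibly.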
For this exercise we are going to use the nursery data set. This corresponds to a series of consecutive measurements of clonal radiata pine plants performed over the course of a year in several plots.
my_data <- read.csv("./DATA/nursery.csv")
names(my_data)
## [1] "X" "PLOT" "CUM_DAYS" "HEIGHT"
We are going to use the lattice package to help with the exploratory data analysis and to exemplify more advanced analyses we can do with R. Let's check our data with a simple graph:
library(lattice)
xyplot(HEIGHT ~ CUM_DAYS, data = my_data)
It is very hard to tell what is going on by just looking at this graph, so we are going to try something a bit different: let's group by plot and make a line graph.
xyplot(HEIGHT ~ CUM_DAYS,
data = my_data,
type = "l",
groups = PLOT)
Now we can see a major trend, but every line is obscuring the behavior of the others. We can solve this by making the lines partially transparent:
xyplot(HEIGHT ~ CUM_DAYS,
data = my_data,
groups = PLOT,
type = "l",
alpha = .2)
Now we can see the different plot series but, in our case, we would like to be able to check each individual series, one by one.
n_plots <- length(unique(my_data$PLOT))
If you look at the PLOT field and count the number of unique plots (n_plots above), you will find there are many of them. Doing one graph at a time would be very time-consuming, so we can automate the analysis of multiple series using a for cycle.
pdf("output.pdf")
for(i in 1:length(unique(my_data$PLOT)))
print(xyplot(HEIGHT ~ CUM_DAYS,
data = my_data,
subset = PLOT == unique(my_data$PLOT)[i],
type = "l",
xlab = "Cumulative Days",
ylab = "Seedling Height (cm)"))
dev.off()
## quartz_off_screen
## 2
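If separate image files are preferred over a single PDF, a very similar loop can write one file per plot. This is only a sketch, and the file-name pattern is an assumption:
for(i in seq_len(n_plots)) {
  # hypothetical file names built from the plot identifiers
  png(paste0("plot_", unique(my_data$PLOT)[i], ".png"))
  print(xyplot(HEIGHT ~ CUM_DAYS,
               data = my_data,
               subset = PLOT == unique(my_data$PLOT)[i],
               type = "l",
               xlab = "Cumulative Days",
               ylab = "Seedling Height (cm)"))
  dev.off()
}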
This type of graphic is very useful when we have many plots with multiple observations and we would like to quickly inspect every data series, searching for anomalous data. We might also want to fit a polynomial regression to these data and output the results in each graph, saving the parameters for later analysis. We can expand the previous analysis to include the regression; however, we first need to define an empty data frame to hold the results:
my_betas <- data.frame(PLOT = numeric(),
b0 = numeric(),
b1 = numeric(),
b2 = numeric(),
b3 = numeric(),
b4 = numeric(),
r.squared = numeric())
plot_vector <- unique(my_data$PLOT)
for (i in seq_along(plot_vector))
{
  reg <- lm(HEIGHT ~ 1 + CUM_DAYS +
              I(CUM_DAYS^2) +
              I(CUM_DAYS^3) +
              I(CUM_DAYS^4),
            data = subset(my_data, PLOT == plot_vector[i]))
  my_betas <- rbind(my_betas, c(plot_vector[i],
                                coefficients(reg),
                                summary(reg)$r.squared))
}
names(my_betas) <- c("PLOT", "b0", "b1", "b2", "b3", "b4", "r.squared")
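Since the whole point is to keep these parameters for later analysis, we can also write my_betas to disk. A minimal sketch, where the file name is an assumption:
write.csv(my_betas, "betas.csv", row.names = FALSE)  # hypothetical output file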
This way, we can inspect the results from all our regressions at the same time. Notice that here we use the function expression to draw a nicer version of the r-squared label:
xyplot(r.squared ~ PLOT,
data = my_betas,
type = "l",
xlab = "Plot Number",
ylab = expression("r"^2))
We might also want to see all the graphs together in one single figure. In this case, we will need to assign the result of each graph to a new variable that we create on the fly with the assign command:
for(i in 2:7) {
assign(paste("graph",(i-1), sep = ""),
xyplot(eval(parse(text = names(my_betas)[i])) ~ PLOT,
data = my_betas,
xlab = "Plot Number",
ylab = names(my_betas)[i],
type = "l")) }
In order to display all the grid images we need to load the gridExtra package.
library(gridExtra)
grid.arrange(graph1, graph2, graph3, graph4, graph5, graph6)
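As an aside, the same arrangement can be built without assign by collecting the plots in a list and handing them to grid.arrange with do.call. This is just an alternative sketch over the same my_betas data frame:
graph_list <- lapply(names(my_betas)[2:7], function(v)
  xyplot(my_betas[[v]] ~ my_betas$PLOT,
         xlab = "Plot Number",
         ylab = v,
         type = "l"))
do.call(grid.arrange, graph_list)
This avoids creating six separate variables in the workspace and scales to any number of columns.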
At this point, we have a data frame with all the parameters from our regression process, together with the r-squared for each plot, ready for further analysis.