Week 7. Creating and working with R functions
What's a function, you ask? Well, you've been using them all term: mean(), count(), write.csv(), sum(), seq(), and rep() are all examples of functions that someone else has already written for you. One of R's greatest strengths is the user's ability to easily (and elegantly) write and add functions. The benefit of writing your own function is that a whole series of computations can be run with a single line of code.
As you have noticed, there are literally thousands of R functions for performing a variety of tasks and you can generally find one or more to meet your particular needs. However, sometimes you have a unique task to perform, or maybe you have a repetitive task to perform (e.g., summarize data sets and create reports) and you don't want to rewrite your code every time. That's when you want to create your own function.
The basic format of functions:
function.name <- function(arguments) {
computation of the function i.e. calculation of the argument(s)
}
Let’s create our own function for calculating a mean and call it "my.mean". Remember that a mean is the sum of the elements in a data set divided by the number of elements in the data set. We use sum() to add up the elements and length() to count the number of elements, so the mean equals sum()/length().
To create a function, we declare it using function and assign the function a name. Similar to for loops and if else comparisons, function actions are delimited by curly brackets "{}". The function can take one or more arguments (these are the inputs to the function). For instance, my.mean (below) takes one argument: variable. If there is more than one input to a function, THE ORDER THEY ARE LISTED MATTERS (more later).
EXAMPLE #1
# create a function for calculating the mean
# first name it and identify arguments
my.mean <- function(variable){
# users of the function provide "variable" and this is what is done with it
sum(variable)/length(variable)
}
# create a dummy data set
data.catch <- c(3,5,0,12,4,8,4,1,1,6,7)
# use the new function
my.mean(data.catch)
#compare to built-in R mean function
mean(data.catch)
Our function for calculating the mean is spot on. Now, let's create something that requires more than 1 input. Remember, ORDER MATTERS. Examine the code below and interpret.
EXAMPLE #2
## create a function for dividing 2 numbers and squaring the result
my.add.square <- function(a,b){
(a/b)^2
}
# use the function
my.add.square(5,2)
# use the function but reverse the numbers
my.add.square(2,5)
# use the function reverse the numbers, but use the arguments to assign the values
my.add.square(b=2,a=5) ## order only matters when you don't name the arguments
For grins, look at the contents of your workspace using ls(). Can you find “a” and “b” objects? The answer should be no (unless you created them yourself earlier), because R functions create and use local variables. This means that anything created in a function is not saved to your workspace.
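A quick way to check this is with exists(). The sketch below is illustrative only: the function name scope.demo and the object name tmp.ratio are hypothetical, chosen so they are unlikely to clash with anything already in your workspace.

```r
# hypothetical function that creates a local object called tmp.ratio
scope.demo <- function(a, b) {
  tmp.ratio <- a / b   # exists only while the function is running
  tmp.ratio^2
}

# the result comes back to us...
scope.result <- scope.demo(5, 2)
scope.result

# ...but the local object does not, so this should be FALSE
exists("tmp.ratio")
```

Only the value of the last expression leaves the function; the intermediate object stays behind.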
RETURN, LIST and PRINT STATEMENTS and FUNCTIONS
Often, we have a function that performs multiple tasks (see calc.function below), and we need a way of returning all of its results. Using a return, list or print statement (with c() to combine values) inside the function allows us to see all the results we define. In the following example, calc.function doesn't print a result because the last line of the function is an assignment.
EXAMPLE #3
calc.function <- function(x, y) {
a <- x*5 + y
b <- (a/x)^2
d <- a + b*1.05
}
calc.function(3,4) ## nothing happens, why?
# How about we just print out d afterward? (this gives an error)
d
# NOTE: objects created within a function are temporary objects that do not
# exist outside of the function
## Now just calculate the value; don't assign it to object d
calc.function <- function(x, y) {
a <- x*5 + y
b <- (a/x)^2
a + b*1.05 ## note that is the last command executed
}
calc.function(3,4)
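The first version of calc.function does compute d. R returns the value of the last expression evaluated, and the value of an assignment is returned invisibly, so nothing is printed. A minimal sketch of this behavior (calc.quiet is a hypothetical name, used so the functions above are not overwritten):

```r
# same body as the first calc.function; the name calc.quiet is hypothetical
calc.quiet <- function(x, y) {
  a <- x*5 + y
  b <- (a/x)^2
  d <- a + b*1.05   # the last expression is an assignment
}

calc.quiet(3, 4)             # prints nothing
result <- calc.quiet(3, 4)   # but the value can still be captured
result
(calc.quiet(3, 4))           # wrapping the call in parentheses also prints it
```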
If we add a return, list or print statement to calc.function, it will report all values listed in the statement. The function calc.function.two has a return statement and therefore reports a, b and c from the function.
EXAMPLE #4
calc.function.two <- function(x, y) {
a <- x*5 + y
b <- (a/x)^2
c <- a + b*1.05
return(c(a,b,c))
}
calc.function.two(3,4)
calc.function.two(4,3)
calc.function.two(y=4,x=3)
Try using list and print instead of return! Any differences? Using list reports the variables in a list format (handy for further analysis). Using print reports the variables at any stage throughout the function's computation, whereas return reports them only once, when the function ends.
BEWARE: A return statement exits your function! If you use return, put it at the end of your function. Any code placed after a return is never run, so the results it would have computed are lost.
We commonly write and use functions that perform multiple computations. All variables computed within a function only exist there! A way to keep these values after the function has run is to have the function return them in a named list (see example #5). Saving the outputs as individually named elements allows you to access the results at a later stage. By using list() within the function and assigning the result to an object (called df below, although it is a list rather than a data frame), we can easily extract and save information from a function.
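To see return ending a function early, consider this small sketch. The function early.exit is hypothetical and exists only to show that lines after return are skipped.

```r
# hypothetical function: return() in the middle ends the function immediately
early.exit <- function(x) {
  doubled <- x * 2
  return(doubled)
  tripled <- x * 3   # never reached
  tripled
}

early.exit(4)   # the doubled value comes back; tripled is never computed
```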
EXAMPLE #5
calc.function.three <- function(x, y) {
a1 <- x*5 + y
b1 <- (a1/x)^2
c1 <- a1 + b1*1.05
list(temp=a1, temp.square=b1,add.temp=c1)
}
df <- calc.function.three(2, 5)
str(df)
df$temp.square
df$add.temp
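Note that the object returned by list() is a list, even though it is named df above. If you really want a data frame, one option is to convert the list afterward. The sketch below repeats calc.function.three so it can be run on its own; the object names out and out.df are hypothetical.

```r
calc.function.three <- function(x, y) {
  a1 <- x*5 + y
  b1 <- (a1/x)^2
  c1 <- a1 + b1*1.05
  list(temp=a1, temp.square=b1, add.temp=c1)
}

out <- calc.function.three(2, 5)
is.list(out)         # TRUE
is.data.frame(out)   # FALSE
out[["temp"]]        # double brackets extract an element, same as out$temp

# convert the named list to a data frame with one row and three columns
out.df <- as.data.frame(out)
out.df
```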
Example 6: Global objects vs. temporary objects
Objects created outside of functions are usually global objects. That is, they can be used inside and outside of a function. This can be dangerous if you don't know what you are doing. To illustrate, let's first create a global object BAD outside of the function and then use it within a function.
BAD<-100
# now create a function
add.one.mtply<-function(x){
y<-x+1
y*BAD
}
## now invoke the function
add.one.mtply(5)
# change value of BAD
BAD<-0
## now invoke the function
add.one.mtply(5)
I know you are now wondering, what happens if we try to change the value of a global object in a function? Let's see.
add.one.mtply<-function(x){
y<-x+1
BAD <- y*BAD
}
BAD<-50
add.one.mtply(5)
### Nothing happens to the global BAD!! The function only changed its own local copy
BAD
SUGGESTION: use global objects very sparingly within a function, and never when you are a beginner.
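One way to avoid depending on a global object is to pass the value in as an argument. This is only a sketch of that idea; the function name add.one.mtply.arg and the argument name multiplier are hypothetical.

```r
# hypothetical version: the multiplier is an argument, not a global object
add.one.mtply.arg <- function(x, multiplier) {
  y <- x + 1
  y * multiplier
}

add.one.mtply.arg(5, 100)              # (5 + 1) * 100
add.one.mtply.arg(5, multiplier = 0)   # (5 + 1) * 0
```

Now the result depends only on what you hand to the function, so changing an object elsewhere in your script cannot silently change the answer.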
For loops within FUNCTIONS
You have learned how to use 'for loops' and 'if else' statements. You can place these inside functions, and even call functions within functions, to reduce the amount of repetitive coding you need to do. FUNCTIONS within FUNCTIONS: tricky, but they make your code more efficient!
EXAMPLE #7
calc.function.four <- function(x) {
for(i in 1:x) {
y <- i*2
print(y)
}
return(y*2)
}
calc.function.four(12)
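calc.function.four prints each value as it goes but returns only one number at the end. If you want to keep every value, one approach is to store them in a vector inside the loop. A minimal sketch, with the hypothetical name calc.function.store:

```r
# hypothetical variant that stores each result instead of printing it
calc.function.store <- function(x) {
  y <- numeric(x)        # vector of zeros to hold the results
  for (i in 1:x) {
    y[i] <- i * 2        # save the result for step i
  }
  y                      # return the whole vector
}

doubles <- calc.function.store(12)
doubles
```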
Combining and pasting within a FUNCTION
Frequently in fisheries and wildlife research we use scientific names in our data. A common problem is combining this information into a single 'workable' column. For example, often we have genus and species names listed in separate columns of a data frame and for ease of reporting we want to combine them. We can't just use Genus + Species (try it below using the code!). Instead, we need to use the paste function and the syntax for creating a new binary operator in R. Using the user-defined binary operator %[name]% we can combine the two columns.
EXAMPLE #8
Genus <- 'Greenis'
Species <- 'backyardis'
Genus + Species ## error: + does not work on character strings
'%p%' <- function (x,y){paste(x,y, sep=' ')}
Genus %p% Species
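Because paste works element by element, the same operator can be applied to whole columns of a data frame. The sketch below uses a small made-up data frame; the object name spp, the column name Sci.name, and the second set of names are hypothetical.

```r
# same operator as above, written with backticks
`%p%` <- function(x, y) paste(x, y, sep = " ")

# made-up data frame with genus and species in separate columns
spp <- data.frame(Genus = c("Greenis", "Brownis"),
                  Species = c("backyardis", "frontyardis"))

# combine the two columns into a single new column
spp$Sci.name <- spp$Genus %p% spp$Species
spp
```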
Fun with functions
The above demonstrates the usefulness of functions and the idea that the order of the inputs matters. OK, all of that is fine and good, but why would you want to bother learning how to create a function? Let's make something that you might actually use, like finding and replacing NA with mean values.
# first create a dataframe with fake data, let's call it fau_dat
fau_dat<-data.frame(Abcs=letters[1:22],Num1=c(1:22),Num2= seq(-5,15, length = 22),
Num3=seq(10, -22,length=22))
# now assign missing values to a few places
fau_dat[3:4,2]<-fau_dat[10,3]<-fau_dat[20,4]<- NA
# take a look
fau_dat
# create a copy for later use
fau_datII<-fau_dat
Creating a function usually starts with writing the code that does the work. Then we make the code generic so we can use it with other objects, and the last thing we do is turn the code into a function. So let's start by creating code that replaces missing values; recall the ifelse lesson.
fau_dat$Num<-ifelse(is.na(fau_dat$Num1),mean(fau_dat$Num1, na.rm = T),fau_dat$Num1)
The code above is a bit clunky. If we want to apply it to another object, we will have to make it generic enough to apply to any dataframe. We know how to use for loops and bracket notation to accomplish this task, but first we need to find out how many columns are in the dataframe.
z<-ncol(fau_dat)
for(zz in 1:z){
fau_dat[,zz]<-ifelse(is.na(fau_dat[,zz]),mean(fau_dat[,zz], na.rm = T),fau_dat[,zz])
}
## let's take a look
fau_dat
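Whoa, what just happened? If the letters were stored as a factor (the default for data.frame in versions of R before 4.0.0), they all turned to numbers. In R 4.0.0 and later the letters are stored as character strings and come through unchanged, but either way we should restrict the filling-in to just numeric columns. How do we identify those?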
is.numeric(fau_dat[,2])
Note that the function returns 1 value, either TRUE or FALSE. Let's combine it with an if statement to prevent filling in of non-numeric columns, but first, let's restore the original dataframe.
fau_dat<-fau_datII
## now give it a try
z<-ncol(fau_dat)
for(zz in 1:z){
if(is.numeric(fau_dat[,zz])) fau_dat[,zz]<-ifelse(is.na(fau_dat[,zz]),
mean(fau_dat[,zz], na.rm = T),fau_dat[,zz])
}
# take a look, it worked!
fau_dat
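If you want to see which columns are numeric all at once rather than one at a time, sapply() can apply is.numeric to every column. This is just a side sketch, not part of the function we are building; it recreates fau_dat so it can be run on its own, and the object name num.cols is hypothetical.

```r
# recreate the fake data from above
fau_dat <- data.frame(Abcs = letters[1:22], Num1 = 1:22,
                      Num2 = seq(-5, 15, length.out = 22),
                      Num3 = seq(10, -22, length.out = 22))

# one TRUE/FALSE per column
num.cols <- sapply(fau_dat, is.numeric)
num.cols

# names of the numeric columns only
names(fau_dat)[num.cols]
```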
Now we need to fix the code so it is generic. Here we replace the actual dataframe name with a temporary object name that is used in the function; let's call it df.
df<-fau_datII
z<-ncol(df)
for(zz in 1:z){
if(is.numeric(df[,zz])) df[,zz]<-ifelse(is.na(df[,zz]),mean(df[,zz], na.rm = T),df[,zz])
}
## did it work? Yes!
df
# now we create the function with a single argument df
# let's call it "fills.nas"
fills.nas<-function(df){
z<-ncol(df)
for(zz in 1:z){
if(is.numeric(df[,zz])) df[,zz]<-ifelse(is.na(df[,zz]),mean(df[,zz], na.rm = T),df[,zz])
}
return(df)
}
# now let's try it out, it works!
fills.nas(fau_datII)
# don't forget to assign the results to an object
fau_datII.fix<-fills.nas(fau_datII)
Pretty neat. Now lets create another object and try to use it. Let's try a vector.
vect<-seq(5,55,by = 4)
vect[c(3,6,8)]<-NA
## try it out, whoops what happened?
fills.nas(vect)
Maybe we should put in an error catcher. What kind of objects do we want to be able to use? Let's say just dataframes for this example. How do we test to see if an object is a dataframe? Hmmmm......
is.data.frame(vect)
Let's add this to the beginning of the function. Here's a useful function in R, "stop", that stops the execution of a function and prints out the message of your choice. Remember that is.data.frame returns FALSE when the object is not a dataframe, so we need to turn the comparison into one that is TRUE in that case in order to execute the stop.
fills.nas<-function(df){
if(is.data.frame(df) == FALSE) stop("Object is not a dataframe")
z<-ncol(df)
for(zz in 1:z){
if(is.numeric(df[,zz])) df[,zz]<-ifelse(is.na(df[,zz]),mean(df[,zz], na.rm = T),df[,zz])
}
return(df)
}
# now let's try it with the vector, the function stops and prints our message
fills.nas(vect)
# now let's try it with the dataframe, it works!
fills.nas(fau_datII)
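If you would like to see the error message without halting a longer script, one option is to wrap the call in tryCatch(). This goes a little beyond what we need here, so treat it as an optional sketch; the object name msg is hypothetical and fills.nas is repeated so the code runs on its own.

```r
fills.nas <- function(df){
  if(is.data.frame(df) == FALSE) stop("Object is not a dataframe")
  z <- ncol(df)
  for(zz in 1:z){
    if(is.numeric(df[,zz])) df[,zz] <- ifelse(is.na(df[,zz]),
                                              mean(df[,zz], na.rm = TRUE), df[,zz])
  }
  return(df)
}

vect <- seq(5, 55, by = 4)

# catch the error and keep just its message instead of stopping the script
msg <- tryCatch(fills.nas(vect), error = function(e) conditionMessage(e))
msg
```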
Now let's modify the function so that it also writes the corrected dataframe to a csv file. This means that the function will require 2 arguments.
fills.nas<-function(df,fle.name){
if(is.data.frame(df) == FALSE) stop("Object is not a dataframe")
z<-ncol(df)
for(zz in 1:z){
if(is.numeric(df[,zz])) df[,zz]<-ifelse(is.na(df[,zz]),mean(df[,zz], na.rm = T),df[,zz])
}
write.csv(df,fle.name)
return(df)
}
# now let's try it out, it works!
fills.nas(fau_datII,"Fixed fau data.csv")
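To confirm the file was written, you can read it back in. The sketch below assumes the two-argument fills.nas shown above (repeated here so the code runs on its own) and uses a temporary file and a tiny made-up dataframe; the names test.dat, tmp.file, fixed and check are hypothetical.

```r
fills.nas <- function(df, fle.name){
  if(is.data.frame(df) == FALSE) stop("Object is not a dataframe")
  z <- ncol(df)
  for(zz in 1:z){
    if(is.numeric(df[,zz])) df[,zz] <- ifelse(is.na(df[,zz]),
                                              mean(df[,zz], na.rm = TRUE), df[,zz])
  }
  write.csv(df, fle.name)
  return(df)
}

# tiny made-up dataframe with one missing value
test.dat <- data.frame(id = letters[1:4], val = c(1, NA, 3, 5))

# write to a temporary file so the working directory stays clean
tmp.file <- tempfile(fileext = ".csv")
fixed <- fills.nas(test.dat, tmp.file)

file.exists(tmp.file)   # was the file created?
check <- read.csv(tmp.file)
check                   # note the extra first column of row names
```

write.csv includes row names by default, which is why the file read back in has an extra first column; adding row.names = FALSE to the write.csv call would leave it out.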
If time allows, let's make another function.
Finally... loading saved functions
The neat thing about functions is that they can be saved for later use. To load a function, use the source command. Be sure the script name and path are correct. Here, we use the bonus inverse logit link function we downloaded above.
# Load the function
source("Inverse logit link function.R")
## what's in the function?
inv.logit
## now we use it
inv.logit(0.75)
I wonder if it can use a vector of values?
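Here is one way to find out. The contents of the sourced file are not shown here, so this sketch assumes the function is defined like inv.logit.lnk in the bonus material at the end of this lesson; the object name eta.vals is hypothetical.

```r
# assumed definition, copied from the bonus material below
inv.logit.lnk <- function(eta){ 1/(1+exp(-eta)) }

# a vector of values on the logit scale
eta.vals <- c(-2, 0, 0.75, 2)

# exp() and the arithmetic operators work element by element,
# so we get one probability back for each input value
inv.logit.lnk(eta.vals)

# compare with R's built-in inverse logit
plogis(eta.vals)
```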
This Week's Assignment
Download the following 2 comma separated (csv) text files for this week's assignments:
coral_count.csv: contains coral counts from 50 quadrats
weather.csv: weather measurements made concurrently with coral counts
Due 1 week from tomorrow (Tuesday) by 5pm Pacific. Use the 2 data frames (above): coral_count and weather to complete the following:
1) Combine the two data sets into a single dataframe. (Hint: think merge. Also note that one dataframe contains “Date” and the other contains "year", so you need to extract year from the date before merging.
Hint 2: thoroughly examine BOTH dataframes; there is another variable that you should be using to merge.)
NOTE: 2-5 below should be done within a SINGLE function. The function should be sufficiently generic that you can use it on any dataframe.
2) Create a function that calculates the mean, standard deviation, minimum and maximum of all the numeric columns in the merged dataframe, except year and quadrat number.
3) The summary values should be in a single data frame with the following columns: variable name, mean, sd, minimum, and maximum.
4) The function also should write the summary dataframe to a csv file. (Hint: the function should take 2 arguments, the name of the dataframe to summarize and the name of a comma separated value (csv) file to write the summary.)
5) The function also should output the summary dataframe to R.
Bonus material: reading functions from files
The code below creates a folder "Funcs4fun" in your working directory. It then creates 2 cool (maybe) functions and writes them to the folder. This step is needed to create a folder and function files so I can demonstrate below how to automatically read folders (directories) and load functions from the folders. Note that all of the code below can be found in this script.
####################################################################
### assign the working directory path to an object
orig.wd<-getwd()
## now create a folder in the working directory
dir.create("Funcs4fun")
# set the working directory to the newly created function folder
setwd(paste(orig.wd, "/Funcs4fun", sep =""))
##### create logit link function with error checker and write to an R script file
sink(file="Logit link function.R")
cat("logit.lnk<-function(p){ if ( p>0 & p<1) log(p/(1-p)) else stop('Not a probability')}",fill = TRUE)
sink()
##### Create inverse logit link function too
sink(file="Inverse logit link function.R")
cat("inv.logit.lnk<-function(eta){ 1/(1+exp(-eta))}",fill = TRUE)
sink()
## reset the original working directory
setwd(orig.wd)
###################################### End of creating folder and functions
This is the code that you would use to load R functions from an external file. I assume that you know how to set your working directory to the location of the folder that contains the function folder. I usually have population or decision models in a single folder. Inside that folder, I have a folder containing functions and another containing inputs (e.g., weather, initial population size, etc.).
## first, for grins lets try to run the functions created above
logit.lnk(.5) ## you should get errors because they are not loaded
inv.logit.lnk(0)
## This just uses the current working directory and adds the name/location of the
## folder containing the functions
func.place<-paste(getwd(), "/Funcs4fun", sep ="")
## Read in functions from common location. "dir" will provide a list of file names
## ending in ".R" (the pattern is a regular expression, so the dot is escaped
## and $ marks the end of the file name)
functs<- dir(path = func.place, full.names = FALSE, pattern = "\\.R$", recursive = FALSE)
### lets see what that did for us
functs
## now we read them in using a for loop and the source command
for(ia in seq_along(functs)) source(paste(func.place,functs[ia], sep = "/"))
## Now lets try to run the functions
logit.lnk(.5) ## They work!
inv.logit.lnk(0)
# nested functions, why? Because we can!
inv.logit.lnk(logit.lnk(.2))
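One limitation of logit.lnk as written is that if() expects a single TRUE or FALSE, so passing a vector of probabilities fails in recent versions of R (older versions checked only the first element and gave a warning). A possible vectorized variant is sketched below; the name logit.lnk.vec is hypothetical and this is only one of several ways to write the check.

```r
# hypothetical vectorized version of the logit link with an error checker
logit.lnk.vec <- function(p) {
  # any() collapses the element-by-element comparison to a single TRUE/FALSE
  if (any(p <= 0 | p >= 1)) stop("Not a probability")
  log(p / (1 - p))
}

# works on a single value or a vector
logit.lnk.vec(0.5)
logit.lnk.vec(c(0.2, 0.5, 0.8))
```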
Enjoy!