A function is simply a small program, or collection of code, that performs a particular task. Many functions are built into R, either in the basic R installation or in add-on packages (libraries). In addition, we will see that it is quite possible (and useful) for you, the user, to write your own user-defined functions.
Built-in R functions
Many built-in mathematical functions are available in R, as illustrated by a few examples. Note that the function “log” refers to “base e or natural logarithm”; if you need base 10 logs then the appropriate function is log10.
> log(2)
[1] 0.6931472
> exp(.6931472)
[1] 2
> log10(100)
[1] 2
> 10**2
[1] 100
> sqrt(16)
[1] 4
> pi
[1] 3.141593
> sin(pi)
[1] 1.224606e-16
> cos(pi)
[1] -1
> tan(pi)
[1] -1.224606e-16
> abs(-5)
[1] 5
> factorial(10)
[1] 3628800
>
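The log function also accepts a base argument, which helps when neither log nor log10 fits. The short sketch below uses only base R; the input values are chosen purely for illustration.

```r
# Natural log is the default; other bases can be requested explicitly.
log(8, base = 2)      # base-2 logarithm of 8
log2(8)               # shortcut for base 2
log(100, base = 10)   # same result as log10(100)

# The ^ operator is the usual way to write powers; ** is accepted as a synonym.
10^2
```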
Note that in some cases the result of a mathematical operation might not seem quite accurate. For instance, we know from basic trigonometry that tan(pi) = 0, but R reports -1.224606e-16, which is not quite zero. The discrepancy arises because pi is stored as a finite-precision floating-point number, so the result is zero only to within the limits of that precision. In such cases we might simply want to round the result to an acceptable tolerance, e.g.,
> round(tan(pi),15)
[1] 0
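Rounding is one option; another is to compare against the expected value within a tolerance. The sketch below shows a few base R ways to do this. The tolerance 1e-12 is an arbitrary choice for illustration.

```r
# sin(pi) and tan(pi) are tiny but nonzero because pi is stored with finite precision.
sin(pi) == 0                    # exact comparison; typically FALSE
isTRUE(all.equal(sin(pi), 0))   # comparison within a small default tolerance
abs(tan(pi)) < 1e-12            # explicit tolerance chosen for illustration
round(tan(pi), 15)              # rounding, as in the example above
```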
Other built in functions, many of which we'll see in coming labs, include:
Function libraries
A library (or package) is a collection of functions that generally relate to a specific purpose. For example, the matrixcalc library contains a large number of functions, all of which relate to matrix operations (addition, multiplication, inverse, etc.). For the library functions to be available in your R session, the library must have been installed and then loaded into memory. Some libraries are installed as part of the base installation of R and load automatically every time R is started. Others must be installed by the user and then loaded into the session memory. To load a particular package that has been installed, use the require() or library() functions.
> require(matrixcalc)
Loading required package: matrixcalc
require() will give you a warning if the package has not been installed, whereas library() will produce an error. If not installed, the command
> install.packages("matrixcalc")
will download and install the package from a CRAN mirror site (obviously, you must be connected to the internet for this to work). It is often easier to use the Packages/Install tool in RStudio, which by default will look for the package on the CRAN mirror you used most recently (e.g., the mirror site from which you installed R).
Finally, R libraries are sometimes available from other websites (or can be made by you, the R user). These are commonly stored as .zip or .tar.gz files. Whether the library is installed from a CRAN site or from a .zip or .tar.gz file on your computer is controlled in the Packages/Install tool in the lower right window of RStudio by selecting "Repository (CRAN)" or "Package Archive".
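The install-then-load sequence described above can be sketched as follows. The helper name is_installed is hypothetical (it is not part of R); it relies on the base function requireNamespace(), which reports whether a package is available without attaching it. The install and load lines are commented out so that running the sketch does not change your library.

```r
# Hypothetical helper: TRUE if a package is installed, without attaching it.
is_installed <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

is_installed("stats")   # stats ships with R

# Install only when missing (needs an internet connection), then load.
# if (!is_installed("matrixcalc")) install.packages("matrixcalc")
# library(matrixcalc)

# A function from an installed package can also be called as pkg::name,
# without loading the whole package.
stats::median(c(3, 1, 2))
```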
How functions work
Functions generally take the form
> function.name(arguments)
where the function name refers to a built-in function that is either part of the base R installation or loaded from an optional package (we will see many examples of the latter as we proceed). Later we will also see how to construct user-defined functions. The arguments are one or more inputs that may be required to run the function. The function is executed by typing the function name followed by (), with the inputs inside the (). So, for example:
> mean(temp)
[1] 21.8
takes the previously defined vector temp and calculates the arithmetic mean. Note that leaving off the () and arguments means the function is not executed. For example,
> mean
function (x, ...)
UseMethod("mean")
<bytecode: 0x0000000011d2cae0>
<environment: namespace:base>
>
simply returns the R code that defines the function but does not actually evaluate (or "call") the function. Note also that not all function arguments need to be provided in every case. For example, the function runif has arguments n, min, and max. The argument n (the number of values to generate) must be provided, but the defaults for min and max are 0 and 1, respectively. So to obtain 10 uniform (0,1) random numbers we invoke:
> runif(10)
[1] 0.10783063 0.06191951 0.15210022 0.44208642 0.55448936 0.03416427 0.75675533 0.76505096 0.37950976 0.09797775
>
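Defaults can be overridden by naming the argument in the call. The sketch below illustrates this with runif; the seed and the ranges are arbitrary choices for illustration.

```r
set.seed(42)                         # arbitrary seed so the draws are reproducible

u1 <- runif(5)                       # min and max fall back to 0 and 1
u2 <- runif(5, min = 10, max = 20)   # override both defaults by name
u3 <- runif(5, max = 100)            # override only max; min stays 0

names(formals(runif))                # lists the argument names: n, min, max
```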
To obtain help for a function, simply type its name preceded by ?, e.g.,
> ?mean
This will open an HTML help file. If you are unsure of the spelling of the function name (or even whether it exists), try ?? for a more general help search, e.g.,
> ??Mean
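A few other base R tools can help when a function name or its arguments are uncertain. The sketch below is only an illustration of args(), apropos(), and exists(); the search pattern is an arbitrary example.

```r
# args() shows a function's argument list without opening the help page.
args(runif)

# apropos() lists objects whose names match a pattern; useful when the exact
# name is uncertain. Here: names that start with "mean".
apropos("^mean")

# exists() checks whether a name is defined at all.
exists("mean")
exists("Mean")   # R is case-sensitive, so this is a different name
```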
Finally, in the above examples we simply displayed but did not save a function evaluation. If we desire we can assign the results of a function operation to an object. For example,
x<-runif(10)
will place the values resulting from runif(10) in x. Note, though, that if you leave off the parentheses and arguments, the function is not evaluated and x simply becomes a copy of the function. That is,
x<-runif
simply states that x is now assigned to the function object runif, and can now serve as the function. E.g.,
> x<-runif
> x(1)
[1] 0.6861883
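The difference between assigning a function and assigning its result can be checked directly. A minimal sketch using base R type checks:

```r
x <- runif            # no parentheses: x now refers to the function itself
is.function(x)        # TRUE
identical(x, runif)   # TRUE: both names refer to the same function

y <- runif(3)         # with parentheses: y holds the three generated numbers
is.function(y)        # FALSE
is.numeric(y)         # TRUE
```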
List function
An alternative way of storing data or other information is with the list function, which creates a list of named sub-objects within a main object. For example,
> obs.list<-list(temperature=temp,humidity=hum)
> obs.list
$temperature
[1] 20 25 19 21 24
$humidity
[1] 100 70 60 NA 80
creates two named sub-objects, obs.list$temperature and obs.list$humidity, each of which contains a vector of data. Note that lists can be nested in other lists; for example, temp could itself have been created by the list function. As we will see, this is one means of creating the typical nested hierarchy in R objects.
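Nesting can be sketched by placing obs.list inside another list. The outer list site and its name element are hypothetical additions for illustration; the temperature and humidity values are those shown above.

```r
temp <- c(20, 25, 19, 21, 24)
hum  <- c(100, 70, 60, NA, 80)
obs.list <- list(temperature = temp, humidity = hum)

# Hypothetical outer list that nests obs.list alongside a site label.
site <- list(name = "Site A", obs = obs.list)

names(site)                             # "name" "obs"
site$obs$temperature                    # chain $ to reach the inner vector
site[["obs"]][["humidity"]]             # [[ ]] does the same using names as strings
mean(site$obs$humidity, na.rm = TRUE)   # skip the missing value: 77.5
```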
Finally, functions can be (and often are) used together, but one must be careful to understand how the combination works. For example,
> sum(sqrt(obs.list$temp))
produces
[1] 23.31259
whereas
> sqrt(sum(obs.list$temp))
produces
[1] 10.44031
This is because the inner function is evaluated first -- in the first case, square roots are taken of the temperature values, which are then summed. In the second, the sum of the temperature values is computed first, and then used as input for the square root function. In other cases the order doesn't matter, e.g.,
> log(40)+sqrt(2)
[1] 5.103093
> sqrt(2)+log(40)
[1] 5.103093
produce the same outcome because of the commutative property of addition.
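Storing the intermediate result of the inner function makes the order of evaluation explicit. In this sketch the names roots and total are introduced only for illustration; the temperature values are those from the list example.

```r
temp <- c(20, 25, 19, 21, 24)   # temperature values from the list example

# sum(sqrt(temp)): square roots first, then the sum.
roots <- sqrt(temp)
sum(roots)

# sqrt(sum(temp)): sum first, then one square root.
total <- sum(temp)              # 109
sqrt(total)
```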
User-defined functions
Writing functions in R
Although R has many built-in functions that will perform a wide variety of tasks, sooner or later most R users will want to write their own functions. Essentially a function is just a small program that produces a defined output given a specified input. To take a very simple example, suppose we want to write our own function to compute the sample mean. We do this by creating a new object called (for example) my_mean, which is a function.
my_mean<-function(x){sum(x)/length(x)}
The input to the function is a vector of numbers x, and the function computes the mean by first computing the sum and then dividing by the number of elements (both obtained from built-in functions). Notice that if we simply type the function name on the input line (or run it from a script) we just get the function definition:
> my_mean
function(x){sum(x)/length(x)}
To actually use the function we must provide the input x, that is, a vector of numbers. For example,
> data<-c(1,2,3,4,5,5,6)
> my_mean(data)
[1] 3.714286
>
passes the vector data to the function and returns the mean. More complex functions can be written that involve two or more inputs. For example, the function
> my_fun<-function(x,y){x*2+log(y)}
produces the function f(x, y) = 2x + ln(y).
You can confirm that the function produces correct results for sample data by performing the same calculation by hand or on a calculator, e.g.,
> my_fun(2,3)
[1] 5.098612
> 2*2+log(3)
[1] 5.098612
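User-defined functions can also have default argument values, just as runif does. The sketch below is a hypothetical variant of my_mean (the name my_mean2 and the na.rm argument are assumptions for illustration) that can optionally skip missing values.

```r
# Hypothetical variant of my_mean with an optional na.rm argument.
my_mean2 <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]   # drop missing values before averaging
  }
  sum(x) / length(x)
}

hum <- c(100, 70, 60, NA, 80)   # humidity values from the list example
my_mean2(hum)                   # NA, because one value is missing
my_mean2(hum, na.rm = TRUE)     # 77.5
```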
A slightly more complicated example here illustrates that we can compute several things in a function but choose to report only one. This function:
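As a hedged sketch of the idea, the hypothetical function my_cv below (the name and the choice of statistic are assumptions for illustration) computes the sample size, mean, and standard deviation, but returns only the coefficient of variation.

```r
# Hypothetical example: several intermediate values, one reported result.
my_cv <- function(x) {
  n <- length(x)                        # number of values
  m <- sum(x) / n                       # sample mean
  s <- sqrt(sum((x - m)^2) / (n - 1))   # sample standard deviation
  s / m                                 # only this last value is returned
}

data <- c(1, 2, 3, 4, 5, 5, 6)
my_cv(data)
```

The intermediate objects n, m, and s exist only inside the function; an R function returns the value of its last evaluated expression unless return() is called earlier.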
Although the above examples seem somewhat trivial (because they are), the idea is readily extended to more important and useful applications. Examples include: