04. R functions and Flow Control

Simple Functions and Control Structures

One of R's main strengths is its large number of predefined functions. Taken together, the built-in functions and those contained in the various R packages constitute an extensive toolbox with which you can perform mathematical calculations, conduct complex statistical analyses and create elegant and highly informative graphs.

A detailed summary of the complete "dictionary" of R functions is beyond the scope of these notes (and may well be beyond the scope of most R manuals). Nonetheless, you can find the index of R functions contained in the base "core" distribution here.

For the purposes of our classes we will discuss only a subset of all available functions, focusing on basic mathematical operations, functions that deal with basic (and a little more advanced) statistics and plotting functions.

An overview of operators in R

We have already seen how mathematical calculations may be performed in R in more or less the way one uses a calculator. Remember that we can simply type numbers or variables with operators in between and get the result by hitting return (enter):

a <- 5
b <- 6
a + b
[1] 11

The basic mathematical operators are recapped in the list below:

Arithmetic Operators

+ addition
- subtraction
* multiplication
/ division
^ or ** exponentiation
x %% y modulus (x mod y) 5%%2 is 1
x %/% y integer division 5%/%2 is 2

Of these, you may not be familiar with the last two. Modulus gives the remainder of a division, while integer division gives the quotient rounded down to a whole number, discarding the remainder.
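A short sketch of how the two operators fit together; the values 17 and 5 are arbitrary choices for illustration. The quotient and the remainder can always be recombined into the original number, and for negative numbers the quotient is rounded down rather than towards zero.

```r
x <- 17
y <- 5

q <- x %/% y        # integer division: 3
r <- x %% y         # modulus (remainder): 2
check <- q * y + r  # recombining gives back x: 17

# With a negative number the quotient is rounded down
neg_q <- -5 %/% 2   # -3
neg_r <- -5 %% 2    # 1
```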

Logical Operators

Logical operators have been discussed previously in the chapters related to Subsetting. Formally, logical operators (also called connectors or connectives) are symbols that may be used to connect two or more statements in a grammatically valid way. In the example discussed earlier for the subset() function these two statements would be: a) "The temperature is greater than or equal to 60 degrees" and b) "The wind speed is greater than 10 knots".

Each of the two statements itself contains an operator: temperature>=60 and wind>10 are examples of numerical comparisons. They can be combined by asking for:

a) Both of them being true (A AND B)

b) Only one of them being true (A AND NOT B, B AND NOT A)

c) At least one of them being true (A OR B)

All of the above can be coded with specific logical operators listed below.

< less than
<= less than or equal to
> greater than
>= greater than or equal to
== exactly equal to
!= not equal to
!x Not x
x | y x OR y
x & y x AND y
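The three combinations described above might be coded as follows. This is only a sketch: the temperature and wind vectors are made-up values, and xor() is the base R function used here for "only one of them being true".

```r
# Hypothetical measurements, for illustration only
temperature <- c(55, 60, 72, 80)
wind <- c(12, 8, 15, 5)

A <- temperature >= 60   # FALSE TRUE TRUE TRUE
B <- wind > 10           # TRUE FALSE TRUE FALSE

both <- A & B            # a) both true:         FALSE FALSE TRUE FALSE
only_one <- xor(A, B)    # b) exactly one true:  TRUE TRUE FALSE TRUE
at_least_one <- A | B    # c) at least one true: TRUE TRUE TRUE TRUE
not_A <- !A              # negation of A:        TRUE FALSE FALSE FALSE
```

Note that xor(A, B) gives the same result as (A & !B) | (B & !A).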

In the following chapters we will see how to use them, so that working with them becomes routine.

R Functions

Let's now get to the core of the basic R function toolbox. As discussed previously, R has a function for almost everything. A number of these functions are very useful for getting beginners accustomed to the way R works and for preparing them to write their own functions (when they stop being beginners).

Below we present some of the most important R functions for various objectives covering basic statistics, plotting and string manipulation.

Numeric Functions

Numeric functions include functions used to perform more advanced mathematical operations.

Some of the most important are:

abs(x) : absolute value

abs() is used to return the absolute value of a number (integer or not). This means that both

abs(-5)
abs(5)

return the value 5.

sqrt(x) : square root

sqrt() returns the square root of a number.

ceiling(x), floor(x), trunc(x), round(x, digits=n), signif(x, digits=n)

All of these functions deal with rounding real numbers (that is, the omission of decimal digits). ceiling(x) rounds x up to the smallest integer that is not less than x (>=x), while floor(x) does the opposite, rounding x down to the largest integer that is not greater than x (<=x). In this sense

ceiling(5.1)

returns 6, while

floor(5.98)

returns 5.

trunc(x) drops the decimal digits, rounding the number towards zero; for positive numbers this is the same as floor(), while for negative numbers it is the same as ceiling(). round() and signif() both take a number of digits as an additional argument but differ slightly: round(x, digits=n) keeps n decimal places, while signif(x, digits=n) rounds to n significant digits in total (the decimal point is not counted). Thus:

x <- 45.6789
round(x, digits=3)
[1] 45.679
signif(x, digits=3)
[1] 45.7
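The differences between these functions show up most clearly with negative numbers. The sketch below uses arbitrary example values; note in particular that trunc() and floor() agree for positive numbers but not for negative ones.

```r
x <- c(5.1, 5.98, -5.1, -5.98)

up <- ceiling(x)     #  6  6 -5 -5
down <- floor(x)     #  5  5 -6 -6
to_zero <- trunc(x)  #  5  5 -5 -5

r3 <- round(45.6789, digits = 3)        # 45.679 (3 decimal places)
s3 <- signif(45.6789, digits = 3)       # 45.7 (3 significant digits)
small <- signif(0.00045678, digits = 2) # 0.00046 (leading zeros do not count)
```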

cos(x), sin(x), tan(x), acos(x), cosh(x), acosh(x)

These are the usual trigonometric functions: cosine, sine, tangent, arc-cosine, hyperbolic cosine and its inverse, etc. Angles are expressed in radians.

log(x) : natural logarithm, log10(x) : common logarithm

log() is the natural logarithm while base-10 log is coded as log10(). Do you remember what you need to do to convert a natural log to, say, a base-2 logarithm?

*In case you don't remember how to change the base you can always use log(x, base=n).
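As a reminder, changing the base amounts to dividing by the natural logarithm of the new base. A minimal sketch with the arbitrary value 8 (log2() is a base R shortcut for base 2):

```r
x <- 8

by_hand <- log(x) / log(2)    # change of base done manually: 3 (up to floating point rounding)
with_base <- log(x, base = 2) # the same through the base argument
shortcut <- log2(x)           # dedicated base-2 function

common <- log10(1000)         # common (base-10) logarithm: 3
```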

exp(x) : exponential function (e raised to the power x)

Remember that general exponentiation is coded with either a^x or a**x.

Character Functions

Character handling is not one of R's major strong points, and you can always manipulate character strings outside R with shell scripting or other scripting languages. Still, R provides a number of functions we can use to handle strings when we need to stay within the environment.

substr(x, start=n, stop=m)

The substr() function works more or less like the function of the same name in Perl. Given a string x, it returns the substring that starts at position n and runs through position m. Remember that numbering in R starts from 1.
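A small sketch of substr() on an example string; both start and stop positions are included in the result, and nchar() gives the total number of characters.

```r
s <- "Hello There"

first_word <- substr(s, start = 1, stop = 5)   # "Hello"
second_word <- substr(s, start = 7, stop = 11) # "There"
n_chars <- nchar(s)                            # 11 (the space counts too)
```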

grep(pattern, x , ignore.case=FALSE, fixed=FALSE)

Pattern matching in R is performed with grep(), which searches for pattern in each element of the character vector x. With fixed=FALSE (the default) pattern is interpreted as a regular expression; with fixed=TRUE it is matched literally. Notice that the function returns the indices of the elements of x that match the pattern; to get the matching elements themselves, use value=TRUE or subset x with the returned indices.
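The following sketch shows the difference between getting indices and getting the matching elements. The vector genes is a made-up example.

```r
# Hypothetical character vector, for illustration only
genes <- c("BRCA1", "brca2", "TP53", "MYC")

idx <- grep("brca", genes)                           # 2 (case-sensitive by default)
idx_nocase <- grep("brca", genes, ignore.case = TRUE) # 1 2
hits <- grep("brca", genes, ignore.case = TRUE,
             value = TRUE)                           # "BRCA1" "brca2"
starts_T <- grep("^T", genes)                        # 3 (regular expression anchor)

# fixed = TRUE matches the dot literally instead of as "any character"
literal_dot <- grep(".", c("a.b", "ab"), fixed = TRUE) # 1
```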

The substitution function sub() is similar to grep() but takes an additional replacement string. It replaces only the first match in each element of x.

sub(pattern, replacement, x, ignore.case=FALSE, fixed=FALSE)

for instance

sub("\\s",".","Hello There")

returns "Hello.There"
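A short sketch of substitution on a string with more than one match. sub() changes only the first match, while the related base R function gsub() (not covered in the text above) changes all of them.

```r
s <- "Hello There Again"

first_only <- sub("\\s", ".", s)   # "Hello.There Again"
all_matches <- gsub("\\s", ".", s) # "Hello.There.Again"

# ignore.case works the same way as in grep()
swapped <- sub("there", "World", "Hello There", ignore.case = TRUE) # "Hello World"
```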

Splitting of a string is performed with strsplit()

strsplit(x, split)

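where split is the character(s) at which splitting takes place. strsplit() returns a list with one character vector for each element of x. By default split is interpreted as a regular expression; as in the previous functions, fixed=T makes it match literally, which is needed here because "." is a special character in regular expressions. For instance:

x <- "abc.d.eef.g"
strsplit(x, ".", fixed=T) -> d
d
[[1]]
[1] "abc" "d" "eef" "g"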
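The sketch below looks a little closer at what strsplit() gives back: the result is a list, so the pieces of the first string are extracted with [[1]]. Escaping the dot is an alternative to fixed=TRUE.

```r
x <- "abc.d.eef.g"

parts <- strsplit(x, ".", fixed = TRUE)
result_class <- class(parts)         # "list"
pieces <- parts[[1]]                 # "abc" "d" "eef" "g"

# The same split with an escaped dot in a regular expression
pieces_regex <- strsplit(x, "\\.")[[1]]

# Several strings at once; unlist() flattens the list into one vector
flat <- unlist(strsplit(c("a-b", "c-d-e"), "-"))  # "a" "b" "c" "d" "e"
```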

The paste() function joins strings in a concatenation using sep as a separator

paste("x",1:3,sep="")

returns c("x1","x2","x3") while

paste("x",1:3,sep="M")

returns c("xM1","xM2","xM3").

Transforming strings to upper or lower case is explicitly done with toupper(x) and tolower(x).
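A few more uses of paste() together with the case functions, as a sketch with arbitrary example strings. The collapse argument joins the elements of a single vector into one string.

```r
labels <- paste("x", 1:3, sep = "")                     # "x1" "x2" "x3"
samples <- paste("sample", c("A", "B"), sep = "_")      # "sample_A" "sample_B"
one_string <- paste(c("a", "b", "c"), collapse = "+")   # "a+b+c"

upper <- toupper("acgt")   # "ACGT"
lower <- tolower("Hello")  # "hello"
```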

Other Useful Functions

A number of very useful functions do not fall into a specific category but will prove very handy once you start coding your own R scripts and functions.

Suppose you need to generate a sequence of numbers with a fixed interval. Remember that the ":" operator can do this only for an interval of 1; if we want another step we may use

seq(from , to, by)

If you code

x <- seq(1,20,3)
x
[1] 1 4 7 10 13 16 19

x becomes the vector of values starting at from and increasing in steps of by, up to the largest value that does not exceed to.

Another type of sequence we may need to create is the repetition of elements. This is performed with the use of the rep() function

rep(x, ntimes)

y <- rep(1:3, 2)
y
[1] 1 2 3 1 2 3
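Some variations on seq() and rep(), as a sketch with arbitrary values. Naming the arguments (by, times, each) makes the intent easier to read.

```r
s1 <- seq(1, 20, by = 3)     # 1 4 7 10 13 16 19 (stops before exceeding 20)
s2 <- seq(0, 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
s3 <- seq(10, 1, by = -3)    # 10 7 4 1 (a negative step counts down)

r1 <- rep(1:3, times = 2)    # 1 2 3 1 2 3 (whole vector repeated)
r2 <- rep(1:3, each = 2)     # 1 1 2 2 3 3 (each element repeated)
```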

Obtaining a random set of values from a greater set is done with sample().

sample(x, size, replace=F/T)

sample() draws a random subset of size=size from the vector x and returns it as a vector. If replace=T then the same element can be drawn more than once.
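A minimal sketch of sampling with and without replacement. The seed value is an arbitrary choice that only serves to make the random draws repeatable; the actual values drawn are not shown because they depend on the seed and the R version.

```r
set.seed(42)   # arbitrary seed, so that the draws can be reproduced
x <- 1:10

s1 <- sample(x, size = 4)                   # 4 distinct elements of x
s2 <- sample(x, size = 15, replace = TRUE)  # more draws than elements: needs replace = TRUE
shuffled <- sample(x)                       # no size given: a random permutation of x
```

Without replace=TRUE, asking for more elements than x contains is an error.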

pretty(c(start,end), N)

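returns a vector of equally spaced "round" values that cover the range from start to end. N is the desired number of intervals, so the result has about N+1 values, and R may adjust this number to keep the values round. Invaluable for simulating distributions, producing values for standard reference plots etc.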
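A quick way to see what pretty() does is to give it an awkward range. In this sketch the range 0 to 97 is an arbitrary choice; the exact breakpoints are picked by R, so only their general properties are described in the comments.

```r
p <- pretty(c(0, 97), n = 5)  # "round" breakpoints covering 0 to 97

p_range <- range(p)           # the first and last values enclose the requested range
steps <- diff(p)              # the spacing between consecutive values is constant
```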

sort(x)

returns the vector x sorted in numerical order from the smallest to the largest element while

order(m[,i], m[,j])

returns the row indices that order a matrix m first according to the values in the i-th column and then, for ties, according to the values in the j-th column. Calling

m[order(m[,i], m[,j]),]

will produce a re-ordered matrix according to the above ordering.
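A small worked sketch of sorting and ordering, using a made-up 4 x 2 matrix. Here the sort columns are passed to order() as separate arguments, so that the second column is used only to break ties in the first.

```r
v <- sort(c(3, 1, 2))            # 1 2 3

# Hypothetical matrix with rows (2,9), (1,7), (2,3), (1,5)
m <- matrix(c(2, 1, 2, 1,
              9, 7, 3, 5), ncol = 2)

o <- order(m[, 1], m[, 2])       # row indices: 4 2 3 1
m_sorted <- m[o, ]               # rows (1,5), (1,7), (2,3), (2,9)
```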
