INTRODUCTION TO BASIC COMPUTING USING R
The R language is a freely available, object-oriented language for programming and statistical analysis. In lab I will give you a “tour” of the main features of R that are most relevant to the course. However, this is by no means comprehensive coverage.
Basic R language and syntax
When I say R is object oriented, what I mean is that everything that R does or contains is based on objects. Objects in R include numerical values; letters or other symbols used to stand for numerical values; entire data arrays or other structures; functions; and the results of statistical or other calculations. Object orientation lends itself to a hierarchical way of thinking about data and problems that, while extremely powerful in practice, can initially be confusing to those used to a more “traditional” format. Initially the structure of objects will be very simple and straightforward; as we get more familiar with R and get into more complex problems, the power of the object-oriented approach will hopefully become clearer.
Obtaining, installing, and running R
R can be obtained for free from the CRAN site http://cran.r-project.org/
where you will find files that can be downloaded and installed as appropriate for your system (e.g., Windows XP or 7, Mac). Installation files can also be downloaded by a third party, placed on a memory stick or CD, and installed from these devices, in case you do not have access to the internet.
Once installed, R is run simply by clicking on the R icon (or selecting R from the program task bar) to open the R-Gui console screen. This screen will have a series of menu items followed by a blank window with a
>
on the left margin. The “>” is a prompt for R commands, which can be entered directly into the console, or as we shall see below, via script files.
Basic R commands
There are a large number of commands in R, but to get started we will use a few essential ones.
First, if we enter a value (say the number 1) on the prompt line and hit return, we get a result: namely, we get back what we entered:
> 1
[1] 1
The line starting with [1] indicates that this is a result following the input (>).
Second, many familiar operations such as addition, subtraction, multiplication and division have obvious R counterparts. Furthermore, R follows the usual rules of operation priority. We can illustrate these with a few simple examples:
> 1+1
[1] 2
> 1
[1] 1
> 1+1+2
[1] 4
> 1+2*7
[1] 15
> 3/2+4+8
[1] 13.5
> 3/2+4+8*3
[1] 29.5
> 3/2+(4+8)*3
[1] 37.5
> 4-5/3+4.4*(7+6)
[1] 59.53333
>
Many mathematical functions are available in R, as illustrated by a few examples. Note that the function “log” refers to “base e or natural logarithm”; if you need base 10 logs then the appropriate function is log10.
> log(2)
[1] 0.6931472
> exp(.6931472)
[1] 2
> log10(100)
[1] 2
> 10**2
[1] 100
> sqrt(16)
[1] 4
> pi
[1] 3.141593
> sin(pi)
[1] 1.224606e-16
> cos(pi)
[1] -1
> tan(pi)
[1] -1.224606e-16
> abs(-5)
[1] 5
> factorial(10)
[1] 3628800
>
Note that in some cases the result of a mathematical operation might seem not quite accurate. For instance, we know from our basic trigonometry that tan(pi) = 0, but R says it's -1.224606e-16, which is not quite zero (the exact digits may differ slightly between systems). However, it is zero to within the precision of the computer's floating-point arithmetic, which cannot store pi exactly. In such cases we might simply want to round the result to an acceptable precision tolerance, e.g.,
> round(tan(pi),15)
[1] 0
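Rounding is one option. When the goal is to check whether a computed value equals an expected one, a common alternative is to compare within a tolerance. The sketch below assumes base R's all.equal(), whose default tolerance is about 1.5e-8; the object names are only illustrative.

```r
# Exact comparison can fail because of floating-point rounding
x <- tan(pi)
exact <- (x == 0)                        # FALSE: x is tiny but not exactly zero

# all.equal() compares within a small tolerance; wrap it in isTRUE()
# because it returns a description (not FALSE) when values differ
close_enough <- isTRUE(all.equal(x, 0))  # TRUE

# The same issue shows up in ordinary arithmetic
sqrt(2)^2 == 2                           # FALSE
isTRUE(all.equal(sqrt(2)^2, 2))          # TRUE

# Rounding, as in the text, is the other way to tidy up the result
rounded <- round(x, 15)
```

As a rule of thumb, avoid == for values that come out of non-integer calculations and use a tolerance-based comparison instead.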
Assignment and relational operators
I have already introduced several operators above, namely the arithmetic operators +, -, * , / and ** for addition, subtraction, multiplication, division, and exponentiation. Additional arithmetic operators include ^ , which is the same as ** for exponentiation.
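A short sketch of how exponentiation interacts with the other arithmetic operators may help here. It only uses the operators already introduced; the object names are made up for illustration.

```r
# ^ and ** are two spellings of the same exponentiation operator
same <- (2^3 == 2**3)   # TRUE

# Exponentiation is evaluated before multiplication and before unary minus
p1 <- 2 * 3^2           # 18, not 36
p2 <- -2^2              # -4, because this is read as -(2^2)
p3 <- (-2)^2            # 4: parentheses force the minus to apply first
```

When in doubt about the order of evaluation, adding parentheses costs nothing and makes the intent explicit.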
An assignment operator assigns a value or a result of an operation to an object. The standard R assignment operator is <- (less than followed by minus). So for example
> a<-2
assigns the value 2 to the object a. Notice that the operator works in either direction, as long as what is being assigned (in this case 2) is on the “-” end and the object that it is assigned to (a in this case) is at the arrowhead (“<” or “>”) end. So,
> 2->a
is equivalent to the above. Finally, notice that it makes no sense to assign values to 2, so that
> a->2
Error in 2 <- a : invalid (do_set) left-hand side to assignment
>
returns an error. The operator “=” performs like <- so that “a=2” makes sense but “2=a” does not. In general it is preferable to use the -> or <- operators instead of = because the direction of the arrow makes clear the origin and destination of assignment, and to avoid confusion with the ‘is equal to’ comparison operator below.
Note that assignment is not a permanent, immutable condition. For example above we assigned
> a<-2
If we now assign
> a<-100
then 100 replaces 2 as the value for a. This is important and can be a cause of errors: the object a now retains the value 100 until we assign it something else. If we now use the object a in another computation, the value of 100 will go with it. So if we invoke (say, later in the R session, or in a new session where we have not erased our workspace (see below) from the previous session)
> a*2
we should not be surprised to see “200” returned, since a’s value is currently 100.
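A related point, sketched below with illustrative object names: a result computed from a is not updated automatically when a is reassigned later. It keeps the value it was given until the computation is run again.

```r
a <- 2
b <- a * 2    # b is computed from the current value of a, so b is 4
a <- 100      # reassigning a replaces the old value of a...
b             # ...but b is still 4; it is not recomputed automatically
b <- a * 2    # rerunning the computation picks up the new value: 200
```

This is why it is worth checking which values your objects currently hold before reusing them.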
Relational operators work differently: rather than assigning a value to an object, they compare two objects. A relational operator returns TRUE if the comparison is true and FALSE if it is not (in arithmetic, TRUE counts as 1 and FALSE as 0). The standard relational operators are >, >=, <, <=, ==, and !=. All of these except the last 2 should be obvious (but are illustrated below). The last 2 are the equality comparison (read “is equal to”, as distinct from assignment) and the inequality comparison (read “not equal to”). Note that several of these comparisons can be true at the same time; for example, when a is less than b, then a<b, a<=b and a!=b are all TRUE:
> a<-2
> b<-3
> 2<5
[1] TRUE
> a>b
[1] FALSE
> a<=b
[1] TRUE
> a==b
[1] FALSE
> a<b
[1] TRUE
> a!=b
[1] TRUE
> b!=a
[1] TRUE
>
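Because TRUE and FALSE behave as 1 and 0 in arithmetic, the results of comparisons can be counted or summed. The following is a minimal sketch using the same values of a and b as above; cmp and n_true are illustrative names.

```r
a <- 2
b <- 3

# Collect several comparisons into one logical vector
cmp <- c(a < b, a <= b, a == b, a != b)   # TRUE TRUE FALSE TRUE

as.numeric(cmp)                            # 1 1 0 1
n_true <- sum(cmp)                         # 3 of the comparisons are TRUE
```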
Getting help in R
Before getting too far into the R weeds, I want to point out that, while R is relatively easy to use and powerful, everyone (and that includes me) gets stuck and needs help on occasion. Fortunately, there is a large and growing community of support for R, and many ways to get help quickly.
There are a number of references, both online and traditional print, that describe R in detail, and I recommend several below. Some “online” help is built directly into R, either in local html files provided with the R installation, or by reference to the R site. Finally, because R is used by a large community of practice worldwide, much information is available simply by entering a sufficiently descriptive question into an internet search engine.
If you have started the R-Gui and are in the R-console, help is available by selecting “help” from the main menu. Here, you will be routed to a number of sub-menus, including some manuals in Adobe (PDF) format, as well as html-format help and a “search for” help that allows you to insert a question and search for guidance. If you have a question about a particular R command or function, often help can be obtained by typing ? followed by the function name (no spaces) on the command line. For example, suppose you want to know how the built-in function mean works. The command
>?mean
will send you to an html help page that fully explains the function and provides examples.
If you don’t know how a command is spelled or whether it is capitalized or not (R is case-sensitive!) then you can try a more generic search. For example
>??Mean
would list a large number of functions and procedures related to “mean”, including the specific one called mean.
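The ? and ?? shortcuts have function equivalents, and a couple of other base R helpers are handy when you only half remember a name. The sketch below assumes a standard R installation; the help-page calls are wrapped in interactive() so that they only run when you are working at the console.

```r
# apropos() lists objects on the search path whose names match a pattern
hits <- apropos("mean")
head(hits)

# args() shows a function's arguments without opening the help page
args(mean)

# The calls below open help pages, so run them only in an interactive session
if (interactive()) {
  help("mean")          # same as ?mean
  help.search("mean")   # same as ??mean
  example(mean)         # runs the examples from the help page for mean
}
```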
Command line vs. script
In all of the above examples, we entered R commands after the > prompt, one command at a time. An equivalent (and often more efficient) approach is to write a series of commands and save them to a script file (usually this will be a file with a .R or .txt extension, which can be edited like a text file). Script files can be created using the R gui (“New Script File”) or any standard text editor (like Notepad). The file commands can then be submitted to R using the R-gui, either all at once or by selecting a range of lines to submit. Most of our later, more complicated R examples will involve script files.
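A script can also be run from the command line with the base R function source(). The sketch below is only an illustration: it writes a two-line script to a temporary file (standing in for a script you would normally create in an editor) and then runs it; script_file, x and y are made-up names.

```r
# Write a two-line script to a temporary file (stand-in for a file you edit)
script_file <- tempfile(fileext = ".R")
writeLines(c("x <- 1 + 2 * 7",
             "y <- sqrt(x + 1)"), script_file)

# Run every command in the file; echo = TRUE also prints each command
source(script_file, echo = TRUE)

x   # 15
y   # 4
```

With a real script you would pass its file name instead, for example a file saved in your working directory.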
R workspace
R saves commands and object values/attributes in a section of computer memory called the “workspace”. When you exit R you will be asked “Do you want to save workspace image?” Usually you should respond (type) “yes”. Then the next time you open R (assuming you are operating on the same computer you ran R on last time) you will get the statement
[Previously saved workspace restored]
Many things (commands, data objects) that you created in the last session will still be there. Often this is a desirable feature, but occasionally it can be confusing, particularly if you use different objects with similar names between sessions, since the “old” values will still be there. If you do not wish to save the workspace, you can type “no” when prompted. You can also be sure that the workspace is empty by typing the command
> rm(list=ls())
before resuming with new commands; this will remove (delete) any commands or objects that may still be in memory. On the other hand, if you have just loaded and manipulated a complex data set and need to temporarily quit the session, it may be most convenient to save the workspace, in which case you can essentially pick up where you left off.
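To see what is currently in the workspace, and to remove objects selectively or all at once, the base R functions ls() and rm() can be used. This is a minimal sketch with illustrative object names; note that the last rm() call deletes everything in the workspace and cannot be undone.

```r
a <- 2
b <- 3
ls()                   # lists the names of the objects in the workspace

rm(a)                  # remove a single object by name
a_gone <- !exists("a", inherits = FALSE)   # TRUE: a no longer exists

rm(list = ls())        # remove every object in the workspace
remaining <- ls()      # character(0): nothing was left at this point
```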
Tips for running R in Windows
Setting the working directory
One of the most aggravating things you will run into is having to tell R each time you run it where your data files are. You don’t want to hunt for these files every time you run R. By assigning a working directory for your session you tell R where to look for files (and where to write output) without having to assign the full file path each time. To illustrate, suppose I want to use the directory
C:/R_stuff
as my working directory (note: I am assuming that you have already created a directory that will be your working directory and perhaps already placed files there. If not, you need to). If I am running R from the R gui, I can easily set the working directory to this location by going to “File” then “Change dir” and then navigating to the desired location. The same result can be achieved by typing the setwd("directory name") command, for this example
setwd("C:/R_stuff")
This will now be the location where R expects to see input and will write output unless specifically directed otherwise.
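It is often useful to check where R is currently looking before and after changing the directory. The sketch below uses the base R functions getwd(), dir.exists() and list.files(); so that it runs on any machine, it uses R's temporary directory as a stand-in for a directory such as "C:/R_stuff", and old_dir and target are illustrative names.

```r
old_dir <- getwd()            # remember the current working directory
target <- tempdir()           # stand-in for a directory such as "C:/R_stuff"

if (dir.exists(target)) {     # setwd() gives an error if the directory is missing
  setwd(target)
}
getwd()                       # reports the working directory now in use
list.files()                  # files R can see without a full path

setwd(old_dir)                # return to the original directory
```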
Both of the above require commands at each R session to set the working directory. You instead may wish to change the default directory that R opens in to your working directory. This is done by going to the Windows task bar, right clicking on the R icon, selecting Properties, and changing the “Start In” directory as desired.
Additional References
Bolker, B.M. 2008. Ecological models and data in R. Princeton University Press.
Crawley, M.J. 2007. The R book. Wiley.