Running R
Once installed, R is run simply by clicking on the R icon (or selecting R from the program task bar) to open the R-Gui console screen, or by invoking RStudio, which has a console window. Either way, the console screen will show some introductory information (version of R, etc.) and a line with a
>
on the left margin. The “>” is a prompt for R commands, which can be entered directly into the console, or as we shall see below, via script files.
Basic R language and syntax
R is an object-oriented language, meaning that everything that R does or contains is based on objects. Objects in R include numerical values; letters or other symbols used to stand for numerical values; entire data arrays or other structures; functions; and the results of statistical or other calculations. Object orientation lends itself to a hierarchical way of thinking about data and problems that, while extremely powerful in practice, can initially be confusing to those used to a more “traditional” format. Initially the structure of objects will be very simple and straightforward; as we get more familiar with R and get into more complex problems, the power of the object-oriented approach will hopefully become clearer.
I highly recommend that R be run from within RStudio, which is another freely available software package.
Basic R commands
There are a large number of commands in R, but to get started we will use a few essential ones.
R as a “desk calculator”
A very simple use of R is to interactively perform calculations. For example, if we enter a value (say the number 1) on the prompt line and hit return, we get a result: namely, we get back what we entered:
> 1
[1] 1
Many familiar operations such as addition, subtraction, multiplication and division have obvious R counterparts. Furthermore, R follows the usual rules of operation priority: multiplication and division are carried out before addition and subtraction, and parentheses override this order. We can illustrate these with a few simple examples.
In each example, the line starting with [1] is the result returned for the input typed after the prompt (>).
> 1+1
[1] 2
> 1
[1] 1
> 1+1+2
[1] 4
> 1+2*7
[1] 15
> 3/2+4+8
[1] 13.5
> 3/2+4+8*3
[1] 29.5
> 3/2+(4+8)*3
[1] 37.5
> 4-5/3+4.4*(7+6)
[1] 59.53333
>
Sometimes we don't want a line that is typed or read from an R script file to actually be run; for example, we might want to add comments. R ignores everything from a # to the end of the line, so any line that begins with # is skipped entirely. So for example
> ###Here is how we add 2 numbers
> 2+2
the first line is simply ignored (but is still displayed in the console and in the session history, which is the upper right panel in RStudio). Adding comments to R code is highly recommended, both to remind yourself why you did what you did and to communicate with others who may be using or adapting your code.
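As a small sketch of the point about comments (the object name total is only an illustration), a # can also appear partway through a line, in which case the code before it still runs:

```r
# A full-line comment: R skips this line entirely
total <- 2 + 2   # a trailing comment: the code before the # still runs
total
```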
Assignment and relational operators
I have already introduced several operators above, namely the arithmetic operators +, -, *, and / for addition, subtraction, multiplication, and division. Exponentiation is performed with ^ ; the operator ** is accepted as a synonym and gives the same result.
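A quick sketch of exponentiation at the prompt may help here. The particular numbers are arbitrary; the point is that the two spellings agree, and that exponentiation is carried out before multiplication and before a leading minus sign:

```r
# Both spellings of exponentiation give the same result
2^3
2**3

# Exponentiation is done before multiplication and before a leading minus
2 * 3^2    # 3^2 is computed first, then multiplied by 2
-2^2       # read as -(2^2)
(-2)^2     # parentheses make the minus apply first
```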
An assignment operator assigns a value or a result of an operation to an object. The standard R assignment operator is <- (less than followed by minus). So for example
> a<-2
assigns the value 2 to the object a. Notice that the operator works in either direction, as long as what is being assigned (in this case 2) is on the “-” end and the object that it is assigned to (a in this case) is on the arrowhead (“<” or “>”) end. So,
> 2->a
is equivalent to the above. Finally, notice that it makes no sense to assign values to 2, so that
> a->2
Error in 2 <- a : invalid (do_set) left-hand side to assignment
>
returns an error. The operator “=” performs like <- so that “a=2” makes sense but “2=a” does not. In general it is preferable to use the -> or <- operators instead of = because the direction of the arrow makes clear the origin and destination of the assignment, and because it avoids confusion with the ‘is equal to’ comparison operator (==) described below.
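The three forms of assignment can be tried side by side. This is only a sketch, with the object names b and d chosen for illustration. It also shows a pitfall worth knowing: a space typed between the < and the - turns an assignment into a comparison with a negative number.

```r
a <- 2      # leftward arrow: value on the right, object on the left
2 -> b      # rightward arrow: value on the left, object on the right
d = 2       # = also assigns when used at the prompt or in a script

# Spacing matters: with a space between < and -, R compares instead of assigning
a <- 4      # assigns 4 to a
a < -4      # asks "is a less than -4?" and leaves a unchanged
a
```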
Note that assignment is not a permanent, immutable condition. For example above we assigned
> a<-2
If we now assign
> a<-100
then 100 replaces 2 as the value for a. This is important and can be a cause of errors, for the opposite reason: the object a now retains the value 100 until we assign it something else. If we now use the object a in another computation, the value of 100 will go with it. So if we invoke the following (say later in the R session, or in a new session where we have not erased our workspace from the previous session; see below)
>a*2
we should not be surprised to see “200” returned, since a’s value is currently 100.
Relational operators work differently, in that rather than assigning a value to an object, they compare two objects. Relational operators return the logical value TRUE if the comparison is true and FALSE if it is not (in arithmetic, TRUE is treated as 1 and FALSE as 0). The standard relational operators are >, >=, <, <=, ==, and !=. All of these except the last 2 should be obvious (but are illustrated below). The last 2 are the equality comparison (read ‘is equal to’, as distinct from assignment) and the inequality comparison (read ‘not equal to’). Note that more than one of these comparisons can be true at the same time; for example, when a is less than b, then a<b, a<=b, and a!=b are all TRUE.
> a<-2
> b<-3
> 2<5
[1] TRUE
> a>b
[1] FALSE
> a<=b
[1] TRUE
> a==b
[1] FALSE
> a<b
[1] TRUE
> a!=b
[1] TRUE
> b!=a
[1] TRUE
>
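Because a comparison returns a value, that value can itself be stored and used. The sketch below reuses a and b from above (the name result is just for illustration) and shows how TRUE and FALSE behave as 1 and 0 when used in arithmetic:

```r
a <- 2
b <- 3

result <- a < b      # store the outcome of a comparison
result
class(result)        # "logical", not a number

# In arithmetic, TRUE is treated as 1 and FALSE as 0
as.numeric(result)
(a < b) + (a == b)   # TRUE + FALSE

# Several comparisons hold at once when a is less than b
c(a < b, a <= b, a != b)
```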
Reading and manipulating data: collections (lists) and data frames
Data in R are simply the numerical values assigned to one or more data objects. There are a number of ways that data can be entered into R. As we saw above, values can simply be assigned to one object at a time, e.g.,
> a<-4
> b<-50
Usually, data will consist of a list of values for each object (or variable) in the data. For example, we might observe temperature values on 5 consecutive days of 20, 25, 19, 21, and 24. One method for assigning these values as data is to define a collection or list of values and assign the collection to the object (strictly speaking, R calls a collection built this way a vector, and reserves the name list for a more general structure). For the above example, we might create an object called temp and assign it values as
> temp<- c(20,25,19,21,24)
> temp
[1] 20 25 19 21 24
In general, the function c() can be used to construct the list, with the individual elements separated by commas. Input into a list can be numeric as above, character, or logical. For example
> names<-c("Fred","Wilma","Betty", "Barney")
> names
[1] "Fred"   "Wilma"  "Betty"  "Barney"
or
> truthiness<-c(T,T,F,T)
> truthiness
[1]  TRUE  TRUE FALSE  TRUE
As we will see in more detail later, individual elements of a list can be evaluated using either numeric or comparison operators. E.g.,
> temp-1
[1] 19 24 18 20 23
simply subtracts one from each value of temp. By contrast
> temp>20
[1] FALSE  TRUE FALSE  TRUE  TRUE
evaluates the truth of the condition "temp>20" for each element of temp.
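The same element-by-element behavior applies to other arithmetic, and the TRUE/FALSE results can be counted. In the sketch below the conversion formula assumes, purely for illustration, that temp was recorded in degrees Celsius; the name warm is likewise my own.

```r
temp <- c(20, 25, 19, 21, 24)

temp * 9 / 5 + 32     # convert every value at once (assuming Celsius input)
warm <- temp > 20     # one TRUE/FALSE per element
sum(warm)             # TRUE counts as 1, so this is the number of values above 20
mean(temp)            # functions can also summarize the whole collection
```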
A dataframe is a 2-dimensional R object in which the columns represent the different variables (measurements taken) and the rows are the individual observations. As an example, suppose that in addition to the 5 temperature values we observe (in the same order) 5 humidity values.
> hum<-c(100,70,60,90,80)
A dataframe can be constructed that lines up the temperature and humidity values.
> obs<-data.frame(temperature=temp,humidity=hum)
> obs
  temperature humidity
1          20      100
2          25       70
3          19       60
4          21       90
5          24       80
It's important to realize that the rows represent observations (samples), and the columns represent the measurements taken at each sample. Thus, in this example each row represents a pairing of temperature and humidity (20,100), (25,70), etc., so it is critical that the elements get entered into the dataframe in the correct order. In some cases, measurements may be lacking (missing) for some samples. To preserve the order and pairing, a placeholder of NA is entered for the missing value, e.g.,
> hum<-c(100,70,60,NA,80)
> hum
[1] 100  70  60  NA  80
> obs<-data.frame(temperature=temp,humidity=hum)
> obs
  temperature humidity
1          20      100
2          25       70
3          19       60
4          21       NA
5          24       80
Note that temperature and humidity are referenced within the dataframe by "$", e.g.,
> obs$temperature
[1] 20 25 19 21 24
> obs$humidity
[1] 100  70  60  NA  80
>
New variables can be created in this manner.
> obs$new<-obs$humidity/obs$temperature
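It is worth looking at what happens to the missing value when a new variable is computed. The sketch below rebuilds the example so that it runs on its own. The na.rm argument of mean() is a standard R option, but using it here is my addition rather than part of the example above.

```r
temp <- c(20, 25, 19, 21, 24)
hum  <- c(100, 70, 60, NA, 80)
obs  <- data.frame(temperature = temp, humidity = hum)

obs$new <- obs$humidity / obs$temperature   # computed row by row
obs                                         # row 4 of new is NA because humidity is NA

is.na(obs$humidity)                # TRUE marks the missing observation
mean(obs$humidity)                 # NA: one missing value makes the mean unknown
mean(obs$humidity, na.rm = TRUE)   # drop the NA, then average the remaining 4 values
```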
Finally, in this example temperature and humidity were originally defined as lists (the objects temp and hum) and then used to create the dataframe; their values are therefore available in memory, at least until removed or the R session is ended. In general, the elements of the dataframe will NOT be available globally (i.e., outside the dataframe) but must be accessed within the frame, either by $ as above or (as we will later see) using the with() function or by a data= argument in some analyses.
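The three ways of reaching into a dataframe can be sketched as follows. The free-standing copies are removed first, so that only the dataframe holds the values. The use of lm() (a simple linear regression) is just my choice of an analysis that accepts a data= argument, not something prescribed above.

```r
temp <- c(20, 25, 19, 21, 24)
hum  <- c(100, 70, 60, 90, 80)
obs  <- data.frame(temperature = temp, humidity = hum)
rm(temp, hum)   # remove the free-standing copies; the dataframe keeps its own

mean(obs$temperature)            # reach into the frame with $
with(obs, mean(temperature))     # with() lets column names be used directly

# data= tells lm() where to look for humidity and temperature
fit <- lm(humidity ~ temperature, data = obs)
coef(fit)
```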
Reading data into dataframes from files
Although it is sometimes efficient to enter data into dataframes from lists (as above), in many if not most cases we will be reading data from external files or databases and then converting the data to a dataframe. A common format for these external files is a comma-delimited text file. These files can be created from spreadsheets (like Excel) or other programs and usually have the file extension "csv". Often the first line of the file will contain headers that provide the names for the column variables. The function read.csv() takes as input a comma-delimited text file, which by default is assumed to have headers in the first row. The above simple data example is contained in a csv file here, and is read by this script.
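For readers without the course files at hand, here is a self-contained sketch of the same idea. It is not the course script: it writes the small example to a temporary file (csv_file is a hypothetical stand-in for a real data file) and then reads it back with read.csv().

```r
# Write the small example to a temporary csv file so the sketch runs on its own
csv_file <- tempfile(fileext = ".csv")   # hypothetical stand-in for a real file
writeLines(c("temperature,humidity",
             "20,100",
             "25,70",
             "19,60",
             "21,NA",
             "24,80"),
           csv_file)

obs <- read.csv(csv_file)   # header = TRUE is the default, so line 1 gives the names
obs
str(obs)                    # check the column names and types that R inferred
```

With a real file in the working directory, the call would simply use the file name in quotes in place of csv_file.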
Getting help in R
Before getting too far into the R weeds, I want to point out that, while R is relatively easy to use and powerful, everyone (and that includes me) gets stuck and needs help on occasion. Fortunately, there is a large and growing community of support for R, and many ways to get help quickly.
There are a number of references, both online and traditional print, that describe R in detail, and I recommend several below. Some “online” help is built directly into R, either in local html files provided with the R installation, or by reference to the R site. Finally, because R is used by a large community of practice worldwide, much information is available simply by entering a sufficiently descriptive question into an internet search engine.
Help is available from RStudio or from the R-Gui by selecting “help” from the main menu. Here, you will be routed to a number of sub-menus, including some manuals in Adobe (PDF) format, as well as html-format help and a “search for” help that allows you to insert a question and search for guidance.
If you have a question about a particular R command or function, often help can be obtained by typing ? followed by the function name (no spaces) on the command line. For example, suppose you want to know how the built-in function mean works. The command
>?mean
will send you to an html help page that fully explains the function and provides examples.
If you don’t know how a command is spelled or whether it is capitalized or not (R is Case-senSitive!) then you can try a more generic search. For example
>??Mean
would list a large number of functions and procedures related to “mean”, including the specific one called mean.
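The ? and ?? shortcuts have function equivalents, and a couple of related base R functions can be handy when a full help page is more than you need. This is a sketch of commands I find useful, not an exhaustive list:

```r
help("mean")       # the long form of ?mean
args(mean)         # quick reminder of a function's arguments
apropos("mean")    # names of available objects containing "mean", ignoring case

# help.search("mean") is the long form of ??mean
```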
Writing script vs using command lines
In the above examples, we entered R commands after the > prompt, one command at a time. An equivalent (and often more efficient) approach is to write a series of commands and save them to a script file (usually this will be a file with a .R or .txt extension, which can be edited like a text file). Script files can be created using the R-Gui (“New Script File”) or any standard text editor (like Notepad). The file commands can then be submitted to R using the R-Gui, either all at once or by selecting a range of lines to submit. Most of our later, more complicated R examples will involve script files.
Using script files has the added advantage of saving commands in a format that makes it easier to revisit an analysis project at a later date. Saved script files also provide templates that can be modified for future applications, something that is very handy, since as we will see a few standard R "themes" will tend to re-appear with variations over many applications. As I mentioned above, it is a very good idea always to add comments (lines starting with #) to the script to keep track of your logic. This is good even for small bits of code, and essential for longer, more complicated ones.
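A script can also be run from the console with the base R function source(), which is an alternative to submitting it through the menus. The sketch below creates a tiny script in a temporary file (script_file and its contents are hypothetical, standing in for a script you would write in an editor) and then runs it:

```r
# Create a tiny script file (a stand-in for one written in an editor)
script_file <- tempfile(fileext = ".R")
writeLines(c("# Convert the example temperatures, assumed here to be Celsius",
             "temp <- c(20, 25, 19, 21, 24)",
             "temp_f <- temp * 9 / 5 + 32"),
           script_file)

source(script_file)   # run every line of the script, in order
temp_f                # objects created by the script are now in the workspace
```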
Let me repeat and emphasize something from above: saved script files also provide templates that can be modified for future applications. I do this all the time: I or someone else has written code previously for a similar problem, so I grab the code, modify it for my specific needs, and maybe improve it. You do not have to write original code. Just know what the code you "borrowed" (and modified) does. This leads to Conroy's Principle (helpful hint number 1 for the course):
"When faced with an R problem (like a homework assignment), whenever possible find the closest already written code that solves a similar problem, lift that code, and modify it for your problem."
Tips for running R in Windows
Setting the working directory
One of the most aggravating things you will run into is having to tell R each time you run it where your data files are. You don’t want to hunt for these files every time you run R. By assigning a working directory for your session you tell R where to look for files (and where to write output) without having to give the full file path each time. To illustrate, suppose I want to use the directory
C:/R_stuff
as my working directory (note: I am assuming that you have already created a directory that will be your working directory and perhaps already placed files there). We saw on the previous page how to set the working directory in RStudio. The same result can be achieved by typing the setwd("directory name") command; for this example
>setwd("C:/R_stuff")
This will now be the location where R expects to find input and will write output unless specifically directed otherwise. Note that directory levels are separated by "/" in R, not "\" as in Windows, and that the name is read as a character string (in double quotes).
Both of the above require commands at each R session to set the working directory. You instead may wish to change the default directory that R opens in to your working directory. This is done by going to the Windows task bar, right clicking on the R icon, selecting Properties, and changing the “Start In” directory as desired.
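The companion function getwd() reports the current working directory, which is a useful check after any change. The sketch below uses a temporary folder (work_dir, a hypothetical stand-in for C:/R_stuff) so that it can be run on any machine, and puts things back the way they were at the end:

```r
old_dir <- getwd()                           # remember where R is currently looking

work_dir <- file.path(tempdir(), "R_stuff")  # hypothetical stand-in for C:/R_stuff
dir.create(work_dir, showWarnings = FALSE)   # the directory must exist before setwd()
setwd(work_dir)
getwd()                                      # confirm the change

writeLines("temperature,humidity", "check.csv")   # a bare file name now lands in work_dir
file.exists(file.path(work_dir, "check.csv"))

setwd(old_dir)                               # restore the original directory
```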
I find it much easier to set the working directory using RStudio.
Again, this is easy to do by
Next: More advanced concepts - Object elements, lists, and arrays