The goal of this module is to get you started with R and the RStudio integrated development environment (IDE). In particular, you need to quickly get to the point where you have R and RStudio installed and can start experimenting with the most fundamental data type in R, the vector. We will start by outlining some learning objectives for this class with respect to programming. My goal is that you learn not just how to use R, but also some more general principles of programming that apply to all computer languages (8:49 min).
We can explore the software stack that we are using when we run the RStudio integrated development environment, and how the different programming tools within RStudio interact with that software when we use them (18:21 min).
Here are some direct links for installing R and RStudio:
The Comprehensive R Archive Network (CRAN) for downloading and installing R (among many other R packages).
Open source RStudio Desktop for downloading and installing RStudio (should install R first).
RStudio does not come with R. You should have R installed first so RStudio can find it and configure its links to it.
While we do not necessarily need external packages in the activities for this class, the ability to install external R packages that provide useful tools is valuable. The next video provides an example of installing the R "swirl" package, which is a training tool for learning the basics of R from an R console. While past students have mentioned liking Swirl, it has not been updated in a long time. I am not necessarily recommending Swirl as a learning tool, especially considering that every student is different in terms of effective modes of learning programming skills (25:02 min).
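As a minimal sketch of what installing a package looks like from the R console, the following uses the base R functions install.packages() and requireNamespace(). The helper name install_if_missing is hypothetical and is not part of the course materials; it simply avoids downloading a package that is already installed.

```r
# Hypothetical helper (not from the course materials): install a package
# from CRAN only if it is not already available in the current library.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is available after the attempt
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Example calls, commented out so nothing is downloaded by accident:
# install_if_missing("swirl")
# library(swirl)   # attach the package to the current session
# swirl()          # start the interactive training in the console
```

Installing a package is a one-time step per R installation, while library() must be called in every new R session that uses the package.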
Unless you are working with very large data sets, you will likely do most data manipulation in the random access memory (or RAM) of your computer. The way we organize RAM is to assign objects we create to variables, which results in them being stored somewhere in RAM with the variable providing a way to reference that location in our program. Therefore, you can think of learning how to assign and do operations with variables as your strategy for organizing your use of RAM. One of the arts to computer programming is how to organize RAM in a way that makes your programs more intuitive and thus less likely to have bugs. Here is an introduction to the concepts and syntax associated with organizing RAM in R (10:43 min).
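A short sketch of assigning objects to variables may help make this concrete. The variable names and values below are illustrative assumptions, not taken from the video.

```r
# Assign values to variables with the assignment operator <-
initial_conc <- 10   # hypothetical initial concentration
rate_coef <- 0.5     # hypothetical rate coefficient

# Use the variables in an operation and store the result in a new variable
product <- initial_conc * rate_coef

# Reassigning a variable makes the name refer to a new object
initial_conc <- 20

# product was computed before the reassignment, so it does not change
print(product)

# List the variable names currently defined in the workspace
ls()
```

Note that product keeps its earlier value after initial_conc is reassigned, because the variable stores the result of the operation rather than a link to the variables used to compute it.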
A foundational skill in computer programming is to have a full understanding of the data structures you are using. R has many useful functions for exploring the details of data structures, and I cannot emphasize enough how much debugging time you will save by being able to check whether each line of code is having the effect on a data structure that you expect. The next video introduces the most primitive or irreducible data structure in R, the atomic vector (10:46 min).
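The following sketch shows a few base R functions that can be used to inspect an atomic vector. The vectors x, y, and z are made-up examples, not objects from the video.

```r
x <- c(2.5, 4, 8.1)   # a numeric (double) vector with three elements

typeof(x)      # storage type of the elements
length(x)      # number of elements
is.atomic(x)   # TRUE for atomic vectors
str(x)         # compact summary of the structure

# Even a single value is a vector of length 1
y <- 42
length(y)

# All elements of an atomic vector share one type, so mixed input is coerced
z <- c(1, "a", TRUE)
typeof(z)
```

Running checks like typeof() and str() after each new line of code is a quick way to confirm the data structure is what you expect.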
Many operators and functions are available to quickly create various types of R atomic vectors. The next video reviews a few of the more common methods for creating atomic vectors, including those that will be useful in activities for this class (20:02 min).
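Here is a sketch of some common base R ways to create atomic vectors. The selection and the variable names are assumptions for illustration and may not match the exact set covered in the video.

```r
# Combine individual values into a vector
a <- c(1, 5, 9)

# Colon operator: integer sequence from 1 to 5
b <- 1:5

# seq() with a fixed step size
s_by <- seq(from = 0, to = 1, by = 0.25)

# seq() with a fixed number of elements
s_len <- seq(from = 0, to = 10, length.out = 5)

# rep() repeats a value (or a vector) a given number of times
r <- rep(0, times = 4)

# numeric() creates a vector of zeros, useful for preallocating results
n <- numeric(length = 3)

print(s_by)
print(s_len)
```

The by argument of seq() controls the spacing between elements, while length.out controls how many elements are produced.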
Many of the iterative techniques we use in this class will need to access a single element of a vector at a time. In fact, the ability to access individual elements or subsets of elements in a vector is one of the most common operations in computer programs. R offers a spectacularly versatile set of ways to index a vector, which is useful. However, because there are so many ways to do the same thing with R indexing, interpreting how code works when you don't understand the method of indexing being used can be particularly mystifying. This video covers some of the more common and useful ways to index vectors in R (11:59 min).
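The sketch below illustrates several standard ways to index a vector in base R. The vector conc and its values are hypothetical.

```r
conc <- c(10, 8, 6, 4, 2)   # hypothetical concentrations

first <- conc[1]                 # a single element (R indices start at 1)
some <- conc[c(1, 3)]            # several elements by position
middle <- conc[2:4]              # a range of positions
no_first <- conc[-1]             # negative index drops an element
high <- conc[conc > 5]           # logical index keeps elements where TRUE
last <- conc[length(conc)]       # the last element

# Names can also be used as indices
named <- c(a = 1, b = 2)
b_val <- named["b"]

# Indexing on the left side of an assignment replaces elements
conc[3] <- 0
print(conc)
```

Positive, negative, logical, and name-based indexing all use the same square bracket syntax, which is why it pays to check what kind of vector sits inside the brackets when reading unfamiliar code.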
The fact that the smallest data type in R has the potential to be composed of more than one element introduces some additional rules to basic mathematical operations. What will happen when the operation x + y is performed if both x and y have more than one element or if they have differing numbers of elements? Here is a review of the basic rules of vectorized mathematics in R. These are very important to understand because R will happily recycle the shorter vector and produce results without an error, and often without a warning, even when you try to do some very non-intuitive basic operations with disparate vectors (6:00 min).
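A brief sketch of element-wise arithmetic and recycling in base R follows. The vectors are made-up examples. R issues a warning only when the longer length is not a multiple of the shorter length, so the silent case is the one to watch for.

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20, 30, 40)

# Same length: operations are applied element by element
sum_xy <- x + y

# Length 1: the single value is recycled across all elements
doubled <- x * 2

# Length 2 with length 4: the shorter vector is recycled silently
short <- c(100, 200)
recycled <- x + short

# Length 3 with length 4: still produces a result, but with a warning
odd <- c(100, 200, 300)
partial <- x + odd

print(recycled)
print(partial)
```

Checking length() on both operands before an operation is a simple guard against unintended recycling.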
The goal here was to develop a deeper level of understanding of the structure of vectors in R than you are likely to receive in training focused on statistics. That said, working at this level of abstraction for too long starts to reduce the ability to retain the material. Therefore, we will be moving promptly on to practicing the use of vectors in the next module, and we will return later to other basic R data structures like matrices, lists, and data frames.
Let's do a more practical exercise with vectors in R in the context of mathematics commonly applied in environmental science. Imagine a chemistry batch reactor experiment where we can measure the exponential decay of the concentration of a reactant due to a chemical reaction following first-order kinetics. Maybe we would like to get an idea of what that curve will look like depending on the initial concentration and the first-order rate coefficient.
First, let's review the mathematics of exponential decay as they relate to the solution of an ordinary differential equation for a first-order process (6:48 min).
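For reference, the standard derivation can be summarized as follows. The symbols are assumptions chosen here for illustration: C is the reactant concentration, t is time, k is the first-order rate coefficient, and C_0 is the initial concentration. The notation in the video may differ.

```latex
% First-order decay: the rate of change is proportional to the concentration
\frac{dC}{dt} = -k\,C, \qquad C(0) = C_0

% Separate variables and integrate from the initial condition
\int_{C_0}^{C(t)} \frac{dC}{C} = -k \int_{0}^{t} dt
\quad\Longrightarrow\quad
\ln\!\left(\frac{C(t)}{C_0}\right) = -k\,t

% Solve for the concentration as a function of time
C(t) = C_0\, e^{-k t}
```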
Now that we have an understanding of the theory underlying the exponential decay equation, we need to build the time and concentration vectors in R representing the measurements of concentration through a hypothetical reactor experiment. The easiest way to understand the nature of these vectors at a glance is to visualize the time-series results of the experiment with base R graphing functions (8:54 min).
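As a minimal sketch of this step, the code below builds a time vector and the corresponding concentration vector from the exponential decay solution, then plots them with base R. This is not the code from the video; the variable names, parameter values, and time range are assumptions for illustration.

```r
# Hypothetical parameter values
c0 <- 10    # initial concentration
k <- 0.2    # first-order rate coefficient (per unit time)

# Time vector: one measurement per time unit from 0 to 20
time <- seq(from = 0, to = 20, by = 1)

# Vectorized calculation: one concentration for each element of time
conc <- c0 * exp(-k * time)

# Basic scatter plot of concentration against time
plot(x = time, y = conc)
```

Because exp() and the arithmetic operators are vectorized, a single line produces the whole concentration vector without a loop.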
The next video starts with the following code that was developed in the previous video. You may want to copy and paste this into an R script in RStudio if you want to follow along with the next video but did not build this code yourself. Note that you need to change the graphics device you use depending on your operating system (windows() for Windows, and quartz() for Mac).
We have a good start, but a couple aspects of the graph from the video and code above do not look very professional. Let's start to look at a couple of ways you can customize graphs in base R to meet the minimum expectations of most scientific applications. We'll start with adjusting the margins around the plotting region and customizing the axis labels with text and mathematical notation (11:58 min).
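The sketch below shows one way to adjust margins with par() and to build an axis label containing mathematical notation with expression(). It is not the code from the video; the parameter values, margin sizes, and units in the labels are assumptions for illustration.

```r
# Hypothetical decay curve to plot
c0 <- 10
k <- 0.2
time <- seq(from = 0, to = 20, by = 1)
conc <- c0 * exp(-k * time)

# Margins in lines of text, in the order bottom, left, top, right
par(mar = c(4.5, 4.5, 1, 1))

# Axis label with a Greek letter and a superscript (assumed units)
y_label <- expression(paste("Concentration (", mu, "mol ", L^-1, ")"))

plot(
  x = time,
  y = conc,
  xlab = "Time (h)",
  ylab = y_label
)
```

The par() call must come before plot(), because margin settings apply to plots drawn after they are set.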
The next video starts with the following code that was refined in the previous video. You may want to cut and paste this text into an R script in RStudio if you want to follow along with the next video but did not build this code yourself. Note that you need to change the graphics device you use depending on your operating system (windows() for Windows, and quartz() for Mac).
We have covered nearly all the basic skills needed to build a tool that will help us get our heads around what this curve will look like depending on the values applied to the kinetic model parameters. For example, maybe we want to see what happens if we double the reaction rate coefficient relative to the original rate. This gives us an excuse to learn how to add additional plots to the same axes, change the symbols used for data points in a bivariate scatter plot, and add a legend describing what the different symbols represent (13:33 min).
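As a rough sketch of these three skills together, the code below plots a baseline curve, adds a second curve with a doubled rate coefficient using points(), and labels both with legend(). It is not the code from the video; the parameter values, symbols, and legend text are assumptions for illustration.

```r
# Hypothetical parameter values
c0 <- 10
k <- 0.2
time <- seq(from = 0, to = 20, by = 1)

# Baseline curve and a curve with twice the rate coefficient
conc_base <- c0 * exp(-k * time)
conc_fast <- c0 * exp(-(2 * k) * time)

# The first call to plot() sets up the axes; pch selects the symbol
plot(
  x = time,
  y = conc_base,
  pch = 1,
  ylim = c(0, c0),
  xlab = "Time",
  ylab = "Concentration"
)

# points() adds to the existing axes instead of starting a new plot
points(x = time, y = conc_fast, pch = 17)

# The legend symbols must match the pch values used above
legend(
  "topright",
  legend = c("Original rate", "Doubled rate"),
  pch = c(1, 17),
  bty = "n"
)
```

Setting ylim in the first plot() call matters because the axes are fixed by that call and later points() calls will not rescale them.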
Please note that these slides are intended to provide the logical framework in between active sessions with R. There are some useful visualizations among these slides, but many are just bullet points intended to bridge logic and introduce the real time sessions with R exercises. These exercises are available in the videos. (In other words, I recognize that many of these slides are terrible without the context of the R session, and these slides should definitely not be used as an example of how to develop effective visualizations for a presentation.)
Click this link to download the MS PowerPoint file
The embedded Google viewer below sometimes provides poor renderings of Microsoft files. Use the link above to download the original file with proper formatting.
Note that you need to change the graphics device you use depending on your operating system (windows() for Windows, and quartz() for Mac).
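One possible way to avoid editing the device call by hand is to pick it from the operating system name, as in the sketch below. The helper pick_device is hypothetical and not part of the course code. It relies on the base R functions Sys.info(), switch(), and do.call(), and falls back to dev.new(), which opens the default device for the platform.

```r
# Hypothetical helper: map the operating system name to the name of the
# device function mentioned in the note above. Any other system falls
# back to dev.new(), which opens the platform default device.
pick_device <- function(sysname = Sys.info()[["sysname"]]) {
  switch(
    sysname,
    Windows = "windows",
    Darwin = "quartz",
    "dev.new"
  )
}

# Open a device window only in an interactive session, so the script
# still runs when executed in batch mode.
if (interactive()) {
  do.call(pick_device(), list())
}
```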
We will be covering the details of R data structures like vectors, lists, matrices, and data frames as we need them. However, the relevant details on the R data structures we will use the most have been compiled into a single document. These notes attempt to cover the aspects of R data structures that I have seen cause the worst misconceptions or the hardest-to-find bugs in students' code.
Notes on the basic R data structures used in this class (link to the full page HTML version)
Rmarkdown source code for detailed notes on basic R data structures (download the Rmd file)
Notes on the basic R data structures used in this class (download the postscript PDF file)
Most exercises will not require sophisticated graphing skills, and the materials for this class provide examples using base R graphics. However, base R graphics provide an incredibly flexible graphing tool and understanding just a few fundamentals gives you the capacity to tweak graphs to look exactly as you would like. The following is a document I have started (generated by Rmarkdown) that at least initiates a deeper dive into graphing with base R.
A deeper dive into graphing in base R (link to the full page HTML version)
A deeper dive into graphing in base R (download the fully encapsulated HTML version)
Rmarkdown source code for a deeper dive into graphing in base R (download the Rmd file)
A deeper dive into graphing in base R (download the postscript PDF file)
Swirl R Package for R training using the R console
Recordings of R carpentries workshops from the MSU Library and SCRS