R Basics 1

The total length of the videos in this section is approximately 47 minutes, but you will also spend time running and exploring R code on your own while completing this section. Remember that you can speed up the videos by clicking on the settings icon at the bottom right of the video window, which might be especially useful if you have seen R before.


You can also view all the videos in this section at the YouTube playlist linked here.

R code files end with the extension ".R". Download the file used in this set of videos, RBasics1.R, from the Google Drive folder embedded below. Don't try to open it with a regular text editor. Instead, open the code file in R. With R open, you can either double-click on the file to open it or use File>Open Document.

Before you watch the videos below, explore the code file a bit. To run code, highlight lines in the code file and press Command-Return (on a Mac) or Control-Enter (on a PC) to send those lines to the console.

After you run a line of code that assigns a name to an object, try typing the name of the object to see its contents. See if you can figure out what the lines of code do by looking at what is produced.
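As a small illustration of this workflow, the sketch below assigns a name to an object and then types the name to see what it holds. The names heights and n are made up for this example and are not taken from RBasics1.R.

```r
# Assign the name "heights" to a vector of three numbers
heights <- c(61, 65, 70)

# Typing the name of the object prints its contents in the console
heights

# The result of a function can be given a name too
n <- length(heights)
n   # 3
```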

You are not trying to memorize the commands in RBasics1.R (or any of the R code files I'll show you), and at this point you're not expected to figure out what every line means. The reason you are exploring the code before I explain the commands is that you will remember your own discoveries much better than the explanations from me.

Vectors

RBasics1.1.Vectors.mp4

Question 1: What will you obtain if you type length(object1)?

Show answer

1

R is case sensitive!

Question 2: Which of the following R expressions are equivalent?

Show answer

The first four are equivalent, but the last will give an error. You only need "c(...)" if you are putting together more than one R expression, so 5:10 is exactly the same as c(5:10). In the last option, you didn't tell R that you wanted to concatenate 5:7 with 8:10, so it finds that the comma is "unexpected."
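If you want to check this kind of equivalence yourself, here is a minimal sketch. The expressions are my own illustration of the idea and may not match the answer choices exactly.

```r
# Several ways to build the integers 5 through 10
a <- 5:10
b <- c(5:10)                 # c() around a single expression changes nothing
d <- c(5:7, 8:10)            # c() concatenates two expressions
e <- c(5, 6, 7, 8, 9, 10)    # typing out every value

identical(a, b)   # TRUE
identical(a, d)   # TRUE
all(a == e)       # TRUE (e is stored as decimal numbers, but the values match)

# Without c(), R does not know what to do with the comma:
# 5:7, 8:10      # Error: unexpected ','
```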


Subsets and Vector Operations

RBasics1.2.SubsetsAndVectors.mp4

Run the code discussed in the video above, and make sure you understand each line.

Matrices

RBasics1.3.Matrices.mp4

Suppose that mymatrix is a matrix with 100 rows and 100 columns.

Question 3: How many rows are in the subset mymatrix[11:20, 3:4]?

Show answer

10

Question 4: How many columns are in the subset mymatrix[11:20,3:4]?

Show answer

2

Question 5: How many rows are in the subset mymatrix[,3:4]?

Show answer

100
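You can confirm the answers to Questions 3 through 5 with dim(). This sketch assumes a 100 by 100 matrix filled with the numbers 1 to 10000; the contents are arbitrary, since only the shape matters here.

```r
# A 100 x 100 matrix (the values themselves are just placeholders)
mymatrix <- matrix(1:10000, nrow = 100, ncol = 100)

# Rows 11 through 20, columns 3 and 4
dim(mymatrix[11:20, 3:4])   # 10  2

# Leaving the row position empty keeps every row
dim(mymatrix[, 3:4])        # 100   2
```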

Run the code discussed in the video above, and make sure you understand each line.

More matrices

RBasics1.4.Matrices.mp4

Question 6: Suppose that I create a matrix by running this code:

matrix(1:100, ncol=50)

How many rows will this matrix have?

Show answer

2. If you specify only the number of rows or only the number of columns, R works out the other dimension so that the matrix holds all of the values in the first argument of the function. Here, the first argument consists of the numbers from 1 to 100. To include all of these numbers in 50 columns, we need 2 rows. If you specify neither the number of rows nor the number of columns, R will make a matrix with 1 column.
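The sketch below shows the three cases side by side. The names m1, m2, and m3 are just labels for this example.

```r
# Only ncol given: R works out that 2 rows are needed
m1 <- matrix(1:100, ncol = 50)
dim(m1)    # 2 50

# Only nrow given: R works out that 4 columns are needed
m2 <- matrix(1:100, nrow = 25)
dim(m2)    # 25  4

# Neither given: R makes a single column
m3 <- matrix(1:100)
dim(m3)    # 100   1

# Values are filled in column by column, so the first column of m1 is 1, 2
m1[, 1]    # 1 2
```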

Help pages

RBasics1.5.HelpPages.mp4

Question 7: Why should you look at the help page for a function?

Show answer

The help page shows you what arguments a function takes and what their default values are, along with example code, related functions that might be closer to what you need, and other information. 

Run the code discussed in the video above, and make sure you understand each line.

In particular, make sure you look at the help file that appears when you type ?matrix. Look up the help pages for some of the other functions you've used so far. Help files are a key part of programming. I look up a help file to remind myself of the syntax almost every time I write a few lines of code.  You should make a habit of opening the help pages for functions that are new to you. We will talk more about what's in the help pages, but see how much of it makes sense to you now.
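Besides opening the help page, you can ask R directly for a function's arguments and their defaults. The sketch below uses args() and formals(), which are not part of RBasics1.R, so treat it as an optional extra rather than something covered in the video.

```r
# In an interactive session, either of these opens the help page for matrix():
# ?matrix
# help("matrix")

# Print the argument list of matrix() along with the default values
args(matrix)

# Pull out individual defaults
formals(matrix)$nrow    # 1
formals(matrix)$byrow   # FALSE
```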

Exploring datasets

RBasics1.6.DataSets.mp4

Question 8: Which of the following are equivalent?

Show answer

All of the options are equivalent. If you get an error when running something like swiss$Fertility, check to make sure that your object is a data frame rather than a matrix. Use is.data.frame to check and as.data.frame if you need to convert a matrix into a data frame.
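Here is a minimal sketch of several ways to pull the Fertility column out of swiss, followed by the data frame check described above. These are my own illustrations and may not be the same expressions as the answer choices; the names swiss_matrix and swiss_df are made up for this example.

```r
# Four ways to extract the same column from the built-in swiss data frame
f1 <- swiss$Fertility
f2 <- swiss[, "Fertility"]
f3 <- swiss[["Fertility"]]
f4 <- swiss[, 1]             # Fertility is the first column

identical(f1, f2)   # TRUE
identical(f1, f3)   # TRUE
identical(f1, f4)   # TRUE

# The $ notation only works on data frames, not matrices
is.data.frame(swiss)          # TRUE
swiss_matrix <- as.matrix(swiss)
is.data.frame(swiss_matrix)   # FALSE
# swiss_matrix$Fertility      # Error: $ operator is invalid for atomic vectors

# Converting back to a data frame makes $ work again
swiss_df <- as.data.frame(swiss_matrix)
all(swiss_df$Fertility == swiss$Fertility)   # TRUE
```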

Question 9: Look up the help page for swiss. What do the rows in the data set represent?

Show answer

Provinces

Note, common question: What is the key difference between a matrix and a data frame? Why might having both forms be helpful?

There are two types of R objects that contain information in a rectangle: matrix and data frame. Often, it doesn't matter whether your object is a matrix or a data frame. For example, you can extract rows or columns using the bracket notation on either a matrix or a data frame. However, there are some differences in what is possible with each type. If you want to do matrix multiplication (as in linear algebra), you need matrices rather than data frames. If you want your data to include more than one type of variable (such as numeric and categorical - imagine a data set that contains both age and race), you need a data frame. The reason it’s useful for R to offer both types as options is that some operations can only be done on a grid of all numbers (for example), rather than a data set with multiple types of variables, and storing the numbers as a matrix object makes it clear that those operations should be allowed.
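To make the difference concrete, here is a small sketch with made-up data. The objects m and people, and the variables age and group, are hypothetical and exist only for this illustration.

```r
# A numeric matrix supports matrix multiplication with %*%
m <- matrix(1:4, nrow = 2)
m_squared <- m %*% m
m_squared
#      [,1] [,2]
# [1,]    7   15
# [2,]   10   22

# A data frame can hold a numeric column next to a categorical one
people <- data.frame(age = c(34, 51, 27), group = c("a", "b", "a"))
str(people)

# Forcing mixed columns into a matrix turns every value into text
mixed <- as.matrix(people)
typeof(mixed)   # "character"

# Matrix multiplication is not defined for a data frame:
# people %*% people   # Error
```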




Summaries and tables

RBasics1.7.Summarizing.mp4

Question 10: Suppose that a variable contains students' GPAs. Which command should you use to summarize this vector?

Show answer

summary. The table command will tell you how often each GPA occurs in the data set, but GPA is a continuous variable, so it would be more helpful to know the mean, median, min, max, etc. instead of how many people have a GPA of exactly 3.21.
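To see the contrast, try both commands on a small vector of GPAs. The values below are invented for this example.

```r
# Eight made-up GPAs
gpa <- c(3.21, 3.85, 2.90, 3.40, 3.67, 2.75, 3.98, 3.10)

# summary() gives the minimum, quartiles, median, mean, and maximum
summary(gpa)

# table() counts how often each distinct value occurs;
# with a continuous variable, nearly every count is 1
table(gpa)
```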

Intro to subsetting

RBasics1.8.Subsetting.mp4

Question 11: What is the code equivalent of the following statement? I want to summarize the variable Gradelevel for children in a data set called school, where the variable Attendance is greater than 0.95 and the variable Readingscore is less than 0.8.

Show answer

table(school$Gradelevel[school$Attendance > 0.95 & school$Readingscore < 0.8])


Some students have asked why you can't use summary here. You might be able to. If the variable Gradelevel is categorical (R calls this a "factor"), then table and summary both report how many times each level occurs. For a numeric variable, though, table and summary produce very different output. To see this, try making a table and also a summary of the vector c(3,3,4,5,7,7,7). Think about which output you'd want if you were trying to learn about children's grade levels at a school.
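If you would like to try the subsetting code from Question 11, you need a data set to run it on. The sketch below builds a tiny, entirely made-up school data frame with the three variables named in the question, and it splits the command into steps (keep and selected are names I chose for this example).

```r
# A made-up data set with seven children
school <- data.frame(
  Gradelevel   = c(3, 3, 4, 5, 7, 7, 7),
  Attendance   = c(0.99, 0.97, 0.90, 0.96, 0.98, 0.99, 0.94),
  Readingscore = c(0.70, 0.85, 0.60, 0.75, 0.79, 0.72, 0.50)
)

# TRUE for children who meet both conditions, FALSE otherwise
keep <- school$Attendance > 0.95 & school$Readingscore < 0.8

# Grade levels of only those children
selected <- school$Gradelevel[keep]
selected            # 3 5 7 7

# table() counts children at each grade level
table(selected)

# summary() treats grade level as a number and reports mean, median, etc.
summary(selected)
```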

Plotting

Run the code for creating a histogram and boxplot. If you need a reminder of what a boxplot shows, a good overview is linked here.

Question 12: How can you find out what parameters are available to change for a plot?

Show answer

Some of the parameters and their defaults will appear at the bottom of the code file when you type the open parenthesis after the command name, but you will be able to get more comprehensive information from the help file.
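As a rough sketch of what changing plot parameters looks like, the code below lists the arguments of the default histogram method and then sets a few of them. I am using the built-in swiss data here as an assumption; the plots in the code file may use different data and settings.

```r
# In an interactive session, this opens the help page for hist():
# ?hist

# List the arguments (and defaults) of the default histogram method
args(getS3method("hist", "default"))

# Change a few parameters: number of bins, title, axis label, and color
h <- hist(swiss$Fertility,
          breaks = 10,
          main = "Fertility in the swiss data",
          xlab = "Fertility",
          col = "gray")

# Boxplots accept many of the same parameters, plus some of their own
boxplot(swiss$Fertility,
        horizontal = TRUE,
        main = "Fertility in the swiss data")
```

Saving the histogram as h is optional; it lets you look at the bin boundaries and counts that R chose by typing h$breaks and h$counts.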

Now you have finished this section. Hurrah!

During this tutorial you learned:


Operators in review:

:, [], [:], +, -, *, /, ^, [,], $


Functions in review:

c(), length(), rep(), exp(), log(), matrix(), dim(), nrow(), rownames(), colnames(), is.matrix(), as.matrix(), head(), tail(), summary(), is.data.frame(), attach(), mean(), var(), table(), hist(), boxplot(), plot()