R Part 2

The total length of the videos in this section is approximately 33 minutes, but you will also spend time running code while completing this section.

You can also view all the videos in this section at the YouTube playlist linked here.

Please download the R code file below so that you can run the code along with the videos.

The sample command

RPart2.1.Sample Simulations.mp4

Question 1: What is the length of the vector generated from this command: sample(1:10,5,replace=TRUE)

Show answer

5
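You can confirm this by running the command and checking the result with length(). The short sketch below assumes an arbitrary seed (not taken from the video) so that the draws are repeatable; the object names are made up for illustration.

```r
# Draw 5 values from 1:10 with replacement and check the vector length.
set.seed(1)  # arbitrary seed, assumed here only for repeatability
draws <- sample(1:10, 5, replace = TRUE)
length(draws)  # 5: the second argument (size) sets the length

# With replace = TRUE the same value may appear more than once,
# so size is allowed to exceed the number of elements being sampled.
more_draws <- sample(1:10, 15, replace = TRUE)
length(more_draws)  # 15
```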

Random numbers

RPart2.2.Random Numbers.mp4

Question 2: What is the length of the vector generated from this command?

runif(5,2,3)

Show answer

5
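As a rough check, the sketch below runs the same command and inspects the result. The seed value and object names are assumptions chosen for illustration. It also shows how set.seed() makes the draws repeatable.

```r
# runif(n, min, max): here n = 5 draws between min = 2 and max = 3.
set.seed(42)  # arbitrary seed, assumed for illustration
u <- runif(5, 2, 3)
length(u)  # 5: the first argument sets the length
range(u)   # smallest and largest draw, both between 2 and 3

# Resetting the same seed reproduces exactly the same numbers.
set.seed(42)
u_again <- runif(5, 2, 3)
identical(u, u_again)  # TRUE

# The other random-number functions follow the same pattern:
# the first argument is the number of draws.
z <- rnorm(5, mean = 0, sd = 1)      # normal
w <- rexp(5, rate = 1)               # exponential
b <- rbinom(5, size = 10, prob = 0.5)  # binomial
```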

For loops

RPart2.3.For Loop.mp4

Question 3: Why do I use "ii" as the object indexing the loop?

Show answer

Personal preference. It's more common just to use "i". I like "ii" because it's easily searchable if I want to find all instances of this object in a code file.

More for loops

RPart2.4.More on For Loop.mp4

Question 4: On the line of code that's numbered 83 in the video, why is it important to begin with

storage[ii]<-...

rather than just

storage<-...

?

Show answer

Each time we go through the loop, a different mean is generated. The purpose of [ii] is to indicate which space in the vector storage should be filled with the newest mean. If we leave out [ii], then storage will be set equal to the first mean the first time we go through the loop; the second time through, that value will be discarded and replaced by the second mean; and so on. At the end, storage will be equal to the mean from the last time we went through the loop, rather than a vector containing all 50 means.

However, if we use storage[ii], the first mean is stored in the first spot in the vector, the second mean is stored in the second spot, etc., as desired.
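The difference is easy to see side by side. This is a minimal sketch, not the exact code from the video: the seed, the sample size of 20, the use of rnorm(), and the name overwritten are all assumptions made for illustration.

```r
set.seed(123)  # arbitrary seed, assumed for illustration
n_reps <- 50   # 50 means, as in the answer above

# With [ii]: each mean goes into its own slot of the vector.
storage <- numeric(n_reps)
for (ii in 1:n_reps) {
  storage[ii] <- mean(rnorm(20))  # sample size 20 is an assumption
}
length(storage)  # 50

# Without [ii]: each pass replaces the previous value.
overwritten <- NULL
for (ii in 1:n_reps) {
  overwritten <- mean(rnorm(20))
}
length(overwritten)  # 1: only the last mean survives
```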

Graphics

RPart2.5.Graphics.mov

Question 5: How might you represent multiple variables in one scatterplot?

Show answer

In addition to the x and y axes, you can represent variables using color, point size, and point symbol.
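As one possible illustration, the sketch below puts five variables on a single scatterplot. It uses the built-in mtcars data set, which is a choice made here and may not match the data used in the video; the object names are also made up.

```r
# Five variables on one scatterplot:
#   x = weight, y = miles per gallon,
#   color = number of cylinders, symbol = transmission, size = horsepower.
point_col  <- as.integer(factor(mtcars$cyl))  # 1, 2, 3 for 4, 6, 8 cylinders
point_pch  <- c(16, 17)[mtcars$am + 1]        # circle = automatic, triangle = manual
point_size <- mtcars$hp / 100                 # larger points = more horsepower

plot(mtcars$wt, mtcars$mpg,
     col = point_col, pch = point_pch, cex = point_size,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# A legend for the color variable.
legend("topright", legend = levels(factor(mtcars$cyl)),
       col = 1:3, pch = 16, title = "Cylinders")
```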

So, why do many people love R for graphics? Flexibility is a big reason. You can make pretty much any graphic that you can draw or imagine. Another important reason is that there are so many packages available: no matter what application or method you are working on, there is likely an R package full of beautiful graphical options. If you are comparing R to Excel, Stata, or SPSS, then the fact that we use reproducible code in R is a plus.

t-tests

RPart2.6.T-test.mp4

Question 6: Your data set contains a column containing student test scores and a column containing student genders. You want to conduct a t-test to compare test scores between genders. Select the appropriate code.

  • t.test(scores, gender)

  • t.test(scores~gender)

Show answer

t.test(scores~gender)
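The formula scores~gender reads as "scores split by gender". By contrast, t.test(scores, gender) would treat gender as a second sample of numeric measurements, which it is not. Here is a small sketch with simulated data; the seed, group sizes, means, and standard deviations are all invented for illustration.

```r
set.seed(2024)  # arbitrary seed, assumed for illustration

# Simulated data: 30 scores per group (all values are made up).
scores <- c(rnorm(30, mean = 70, sd = 10),
            rnorm(30, mean = 75, sd = 10))
gender <- factor(rep(c("F", "M"), each = 30))

# Compare mean scores between the two gender groups.
result <- t.test(scores ~ gender)
result$estimate  # the two group means
result$p.value   # two-sided p-value

# The same test when the columns live in a data frame.
dat <- data.frame(scores = scores, gender = gender)
result_df <- t.test(scores ~ gender, data = dat)
```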

And that's it.

During this tutorial you learned:

  • How to use the sample() function to randomly sample elements from a vector

  • About the functions that sample from known, famous distributions, including from a normal distribution with rnorm(), from a uniform distribution with runif(), from an exponential distribution with rexp(), and from a binomial distribution with rbinom()

  • To use the function set.seed() to write code that generates the same random numbers each time it is run

  • To create for loops in R and save the output from each iteration in a vector

  • To visualize more than two variables on one scatterplot with plot(), by changing the point size with the cex= argument, the point symbol with the pch= argument, or the point color by setting the col= argument to either a categorical variable or a vector of color names

  • How to conduct a t-test in R with the t.test() function

  • How to find the left-sided p-value with the pnorm() and pt() functions, and translate that output to a right-sided p-value or a two-sided p-value
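The p-value translation in the last item can be sketched as follows. The test statistic and degrees of freedom are hypothetical values picked for illustration, as are the object names.

```r
t_stat   <- 1.8  # hypothetical t statistic
deg_free <- 24   # hypothetical degrees of freedom

# pt() returns the area to the LEFT of the statistic.
left_p  <- pt(t_stat, df = deg_free)
right_p <- 1 - left_p                    # area to the right
two_sided_p <- 2 * min(left_p, right_p)  # double the smaller tail

# The same steps with pnorm() for a z statistic.
z_stat    <- 1.8  # hypothetical z statistic
left_p_z  <- pnorm(z_stat)
right_p_z <- 1 - left_p_z
two_sided_p_z <- 2 * min(left_p_z, right_p_z)
```

Doubling the smaller tail works because both distributions are symmetric around zero.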


Functions in review:

sample(..., size=, replace=), rnorm(), runif(), rexp(), rbinom(), set.seed(), for (ii in …) {}, plot(cex=, pch=, col=), t.test(), wilcox.test(), pnorm(), pt(..., df=)