The total length of the videos in this section is approximately 25 minutes, but you will also spend time running code while completing this section.
You can also view most of the videos in this section at the YouTube playlist linked here. (The barplot video is not on YouTube at this point.)
Please download the code file, which is used in the first videos:
Question 1: How do you know whether you should use boxplot(vec1, vec2, vec3) or boxplot(combinedvec~groups) to plot 3 boxplots on one set of axes?
Examine the data and determine whether you have a separate vector for each subset, or two vectors: one containing all the data and one containing codes that assign each value to a group. You can also convert the data between these two formats if you have a strong preference!
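As a rough illustration of the two formats described above, the sketch below builds three made-up vectors, plots them both ways, and converts between the formats. The values and the group labels "A", "B", "C" are invented for this example; vec1, vec2, vec3, combinedvec and groups simply reuse the names from the question.

```r
# Hypothetical data: one vector per group
set.seed(1)
vec1 <- rnorm(20, mean = 10)
vec2 <- rnorm(20, mean = 12)
vec3 <- rnorm(20, mean = 11)

# Format 1: a separate vector for each subset
boxplot(vec1, vec2, vec3, names = c("A", "B", "C"))

# Convert to format 2: one vector of values plus one vector of group codes
combinedvec <- c(vec1, vec2, vec3)
groups <- rep(c("A", "B", "C"),
              times = c(length(vec1), length(vec2), length(vec3)))
boxplot(combinedvec ~ groups)

# Convert back: split() returns a list with one vector per group
bygroup <- split(combinedvec, groups)
boxplot(bygroup)
```

All three boxplot() calls should draw the same three boxes, so the choice between them depends only on how your data happen to be stored.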
Question 2: How can you determine what can be accessed in a saved output using the "$" syntax?
Type words that you think should be reasonable, and see if you get an error or not.
Use the names() command, with your object name in the parentheses
Use the names() command, with your object name in the parentheses. While you can type words that you think should be accessible portions of the output, this is unlikely to be the most efficient way to find out what you can access. In addition, items may not be labeled as you expect (e.g., p.value instead of p-value or p value).
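Here is a small sketch of that workflow, using the saved output of t.test() as one possible example. The sample x is made up, and result is just an illustrative object name.

```r
# Hypothetical sample, used only to produce an output object to inspect
set.seed(2)
x <- rnorm(30, mean = 0.5)

# Save the output of a test instead of only printing it
result <- t.test(x)

# List everything that can be accessed with the $ syntax
names(result)

# Access one piece; note the label is p.value, not p-value
result$p.value
```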
Question 3: Which of the following outputs would be obtained by the code:
ifelse(c(1,2,3,4) < 3, "Y", "N")
"Y" "Y" "N" "N"
"N" "N" "Y" "Y"
The first option. 1 and 2 are less than 3, so the console will print "Y" twice, and 3 and 4 are not less than 3, so the console will print "N" twice.
Question 4: For which of the following commands should the file name specified end in a '.R' extension? Check all that apply.
save()
write()
load()
read()
save and load. Saving and loading are used with R objects that you want to use later, while writing and reading are used with different file types (csv, pdf, jpeg, etc.) so that you can access your work outside of the program.
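The difference can be sketched as follows. This assumes a small made-up data frame and writes to R's temporary directory so that nothing in your working directory is touched; the file names are hypothetical, and the .R extension follows the naming used in this section.

```r
# Hypothetical data set
dataset <- data.frame(id = 1:3, income = c(30000, 45000, 60000))

# Hypothetical file names, placed in the temporary directory
rfile   <- file.path(tempdir(), "dataset_example.R")
csvfile <- file.path(tempdir(), "dataset_example.csv")

# save() stores the R object itself; write.csv() exports a file
# that other programs can open
save(dataset, file = rfile)
write.csv(dataset, file = csvfile, row.names = FALSE)

# load() restores the object under its original name
rm(dataset)
load(rfile)

# read.csv() returns a data frame that you must assign to a name
fromcsv <- read.csv(csvfile)
```

Note that load() brings dataset back without any assignment, while the result of read.csv() is lost unless you store it.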
Here is one more idea that will be helpful as you move on. When you change a variable or make a new version of it, the best practice is to create it as a new column in your data set:
dataset$LogIncome<-log(dataset$income)
is much better than
LogIncome<-log(dataset$income)
The latter will not become a new column in your data set, and so it won't be included if you take a subset of your entire data set, reorder the rows, etc.
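To see the difference, consider this minimal sketch with a made-up data set of four rows. The income values and the name highearners are invented for illustration.

```r
# Hypothetical data set
dataset <- data.frame(id = 1:4, income = c(30000, 45000, 60000, 120000))

# Recommended: the new variable becomes a column of the data set
dataset$LogIncome <- log(dataset$income)

# Not recommended: a free-standing vector, separate from the data set
LogIncome <- log(dataset$income)

# Take a subset of the rows
highearners <- dataset[dataset$income > 40000, ]

nrow(highearners)              # 3 rows remain
length(highearners$LogIncome)  # 3: the column was subset along with the rows
length(LogIncome)              # 4: the separate vector no longer lines up
```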
Next, please download this code file, which will be used in the last video:
Question 5: What does the barplot command want as input, instead of the data itself?
A table summarizing the data, either one vector or a matrix.
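For a single categorical variable, this might look like the sketch below. The vote responses are made up, and counts is just an illustrative name.

```r
# Hypothetical raw data: one response per person
vote <- c("Yes", "No", "Yes", "Yes", "No", "Yes")

# Summarize the raw data into a table of counts first
counts <- table(vote)
counts

# Then give the table, not the raw data, to barplot()
barplot(counts)
```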
Here is a frequently asked question: How do I reorder the bars in a barplot?
First, note that you can make a barplot by first creating a table, and then using the table as an argument for barplot. For example, if you have a categorical variable about voting and a categorical variable about political view, you might investigate the relationship between these two variables via
mytable<-table(vote, view)
barplot(mytable, legend=TRUE)
By default, R will sort the categories alphabetically. This usually isn't what you want, though. The simplest solution is to reorder the rows and/or columns in the table. For example, suppose the categories for political view are currently ordered alphabetically: "Extremely Conservative", "Extremely Liberal", "Moderately Conservative", "Moderately Liberal", "Neutral".
You can reorder them as follows:
mytable<-table(vote, view)
mytable<-mytable[ , c(2,4,5,3,1)] # because we want the second column from the original table, and then the fourth, and then the fifth, etc.
barplot(mytable, legend=TRUE)
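If counting column positions feels error-prone, one alternative is to index the table by column name instead. The sketch below uses made-up vote and view data and checks that both approaches give the same order; neworder, bypos and byname are hypothetical names introduced here.

```r
# Hypothetical data: six respondents
view <- c("Neutral", "Extremely Liberal", "Moderately Conservative",
          "Moderately Liberal", "Extremely Conservative", "Neutral")
vote <- c("Yes", "No", "Yes", "No", "Yes", "No")

mytable <- table(vote, view)
colnames(mytable)  # alphabetical by default

# Reorder by position, as in the text
bypos <- mytable[, c(2, 4, 5, 3, 1)]

# Reorder by name, which avoids counting positions
neworder <- c("Extremely Liberal", "Moderately Liberal", "Neutral",
              "Moderately Conservative", "Extremely Conservative")
byname <- mytable[, neworder]

# Both approaches should produce the same column order
identical(colnames(bypos), colnames(byname))

barplot(byname, legend = TRUE)
```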
You can also specify an order for the categories as part of the variable itself. This might be useful because then the categories will appear in the correct order for the rest of your analyses. The two arguments to the function factor are the name of the variable and a vector of the levels in whatever order you prefer.
view<-factor(view, levels=c("Extremely Liberal","Moderately Liberal","Neutral","Moderately Conservative","Extremely Conservative"))
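The effect of setting the levels can be checked with a short sketch like this one, again using made-up vote and view data. Once the order is stored in the variable, table() and barplot() should pick it up without any further reordering.

```r
# Hypothetical data: six respondents
view <- c("Neutral", "Extremely Liberal", "Moderately Conservative",
          "Moderately Liberal", "Extremely Conservative", "Neutral")
vote <- c("Yes", "No", "Yes", "No", "Yes", "No")

# Without specified levels, the categories are sorted alphabetically
levels(factor(view))

# Store the preferred order in the variable itself
view <- factor(view, levels = c("Extremely Liberal", "Moderately Liberal",
                                "Neutral", "Moderately Conservative",
                                "Extremely Conservative"))

# The table columns now follow the factor levels
mytable <- table(vote, view)
colnames(mytable)

barplot(mytable, legend = TRUE)
```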
This link expands on how to deal with categories and their order.
During this tutorial you learned:
How to create a set of boxplots visualizing several vectors and change the labels with the names= argument
How to use the $ symbol to access parts of an object
How to use several helpful functions for data manipulation
How to import and export csv files
How to save and load data and R objects as .R files
How to save graphics as a PDF or jpeg file
How to make a barplot
Functions in review:
boxplot(), groups(), by(), apply(), which(), if() { }, ifelse(), save(), load(), write.csv(), pdf(), dev.off(), barplot()