First, make sure you have R or RStudio installed on your computer. If you don't, go talk to your teacher/Bill and get that figured out.
Once you've got R open, you should also have a set of data you want to work with. If you don't, R already has a few sets installed you can play with (google "R Built-in Data Sets" for more info).
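If you want to try a built-in data set right away, here is a minimal sketch. It assumes you pick mtcars, which is one of the data sets that ships with R; any other built-in set would work the same way.

```r
# Typing data() by itself in the console lists every built-in data set.
# data()

# mtcars is one of the built-in data sets; head() shows its first six rows.
head(mtcars)

# nrow() and names() give a quick sense of its size and its columns.
nrow(mtcars)
names(mtcars)
```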
1) Understand how to create variables. In this example, "x" is the variable, and "x" stands for 10. Variable names can be single letters or whole words, and they can stand for numbers, text, or functions. Whenever you type the variable's name, R gives you back the value you assigned to it.
2) Understand how to use functions. In the same example, "print(x)" is a simple function call telling R to show us what "x" equals. Functions get a lot more complicated than that; this is just an example.
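In case the example is not in front of you, here is a rough sketch of what these two steps describe. It is not a copy of the original example; the second variable, greeting, is just an illustration I made up.

```r
# Create a variable named x that stands for the number 10.
x <- 10

# Variables can also stand for words (text goes inside quotes).
greeting <- "hello"

# print() is a function: it shows the value stored in a variable.
print(x)
print(greeting)
```

Typing just x on its own line in the console shows the value too.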
If you are anything like me, you have your original data in Excel spreadsheets. If not, that's ok, there are other ways to import data, but this tutorial will only cover how to get there from Excel.
1) Type this into the R console, replacing "NewData" with the name of your Excel spreadsheet and "Downloads/" with the folder where the file is saved.
library(readxl)
NewData <- read_excel("Downloads/NewData.xlsx")
1a) If you are using RStudio, use this command to see the data in a pane above your console.
View(NewData)
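If you would rather check the data right in the console, a few base R functions give a quick summary. This is only a sketch: the small NewData below is a made-up stand-in so the code runs without an Excel file, and its column names are hypothetical.

```r
# Made-up stand-in for an imported spreadsheet (hypothetical columns and values).
NewData <- data.frame(id = 1:3, score = c(90, 85, 77))

head(NewData)  # the first few rows
str(NewData)   # each column's name and type
dim(NewData)   # number of rows, then number of columns
```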
Data frames are a way to organize and transform data within R without disturbing your Excel sheet. You can make your data frame look however you want; the example below is just one possibility. Start with the data.frame() function, then add as much information as you would like inside it. Each comma-separated piece (a name, an equals sign, and its values) becomes a new column in the data frame. Also, make sure to use the c() function when listing the values you want in a column. The colon between 1 and 4 means that R will use all the whole numbers 1 through 4.
> new_frame=data.frame( dogs = c(1:4), breeds = c("German Shepherd", "Black Lab", "Pug", "Golden Retriever"))
> new_frame
dogs breeds
1 1 German Shepherd
2 2 Black Lab
3 3 Pug
4 4 Golden Retriever
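Once the data frame exists, you can check the colon shorthand and work with single columns. Here is a small sketch that rebuilds the same data frame; the age column and its values are made up for illustration.

```r
# Rebuild the data frame from the example above.
new_frame <- data.frame(
  dogs = 1:4,
  breeds = c("German Shepherd", "Black Lab", "Pug", "Golden Retriever")
)

# 1:4 is shorthand for the whole numbers 1, 2, 3, 4.
identical(1:4, c(1L, 2L, 3L, 4L))

# Pull out one column by name with $.
new_frame$breeds

# Add a new column; it needs one value per row. These ages are made up.
new_frame$age <- c(3, 5, 2, 7)
ncol(new_frame)
```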
Use nrow() when you need the number of rows in a data set or data frame, for example to work through every row, because nrow() will find that number for you.
> nrow(new_frame)
[1] 4
> nrow(NewData)
[1] 3253
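One common place nrow() shows up is a loop that visits every row. The sketch below is one way to do it, assuming the same new_frame as before; seq_len() is my choice here for turning the row count into the numbers 1, 2, 3, 4.

```r
new_frame <- data.frame(
  dogs = 1:4,
  breeds = c("German Shepherd", "Black Lab", "Pug", "Golden Retriever")
)

# seq_len(nrow(new_frame)) gives one index per row.
for (i in seq_len(nrow(new_frame))) {
  # as.character() keeps this working whether breeds is stored as text or as a factor.
  cat("Row", i, "is a", as.character(new_frame$breeds[i]), "\n")
}
```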
If you just want to see one particular part of your data frame, or use just that part, you can create a new variable and define it as whatever part of your data frame you want. The first number within the brackets is the row you pick, and the second is the column.
> nf=new_frame[1,1]
> nf
[1] 1
> nf=new_frame[1,2]
> nf
[1] German Shepherd
Levels: Black Lab German Shepherd Golden Retriever Pug
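The same bracket notation can grab a whole row or a whole column if you leave one side of the comma blank. Here is a short sketch of that. I set stringsAsFactors = FALSE so breeds stays plain text; that is the default from R 4.0.0 on, and with it R prints "German Shepherd" in quotes with no Levels line.

```r
new_frame <- data.frame(
  dogs = 1:4,
  breeds = c("German Shepherd", "Black Lab", "Pug", "Golden Retriever"),
  stringsAsFactors = FALSE  # keep breeds as plain text
)

first_row <- new_frame[1, ]          # column left blank: the whole first row
breed_col <- new_frame[, 2]          # row left blank: the whole second column
one_cell  <- new_frame[3, "breeds"]  # columns can be picked by name too
one_cell
```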
Here is how to put your data into a bar plot. The column names and labels in the code are placeholders that you will replace with ones that match your data. In this example, I am plotting how many boys have taken each number of semesters of Engineering. See the picture below if that is confusing.
1. Define what your counts are. Use the format of the code below. Replace student_summary$semesters with the column you want along your x-axis, and replace student_summary$gender with the column you want to filter by. In place of "M", put the value you want to count. The y-axis will then show how many rows match for each x-axis value.
> counts <- table(student_summary$semesters[student_summary$gender=="M"])
2. Now use the barplot() function. Set your title, x-axis label, and y-axis label (the main, xlab, and ylab arguments). Your bar plot should now look like mine above.
> barplot(counts, main="Number of Guys per Semester of Engineering",
+ xlab="Semesters", ylab="Number of Guys")
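To see what counts actually holds before it gets plotted, here is a self-contained sketch. The student_summary below is a tiny made-up stand-in, not the real data, so the numbers mean nothing beyond showing how the pieces fit.

```r
# Hypothetical stand-in for student_summary; the values are made up.
student_summary <- data.frame(
  gender    = c("M", "F", "M", "M", "F", "M"),
  semesters = c(1, 1, 2, 2, 2, 3)
)

# Keep only the rows where gender is "M", then count each semester value.
counts <- table(student_summary$semesters[student_summary$gender == "M"])
counts

# Each count becomes the height of one bar.
barplot(counts, main = "Number of Guys per Semester of Engineering",
        xlab = "Semesters", ylab = "Number of Guys")
```

Printing counts first is a handy check: the names along the top are the x-axis values, and the numbers underneath are the bar heights.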
Sometimes you will want to compare two sets of data, like this.
Here's how to do that:
1. Define your counts again, this time giving table() both columns.
> counts <- table(student_summary$gender, student_summary$semesters)
2. Now put in your title, y-axis, and x-axis, and choose which colors you want to appear for the two things you are comparing. For example, for girls I picked red and for boys I picked blue.
> barplot(counts, main="Number of Boys and Girls by Semesters",
+ xlab="Number of Semesters", ylab = "Number of Students", col=c("red","blue"),
+ legend = rownames(counts), beside=TRUE)
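If you are wondering which color goes with which group, it helps to look at the two-way table itself. This is a sketch using a made-up stand-in for student_summary (the values are hypothetical). table() sorts the row names alphabetically, so the first color goes to "F" and the second to "M".

```r
# Hypothetical stand-in for student_summary; the values are made up.
student_summary <- data.frame(
  gender    = c("M", "F", "M", "M", "F", "M"),
  semesters = c(1, 1, 2, 2, 2, 3)
)

# One row per gender, one column per number of semesters.
counts <- table(student_summary$gender, student_summary$semesters)
counts

# The row order decides the color order in col = c("red", "blue").
rownames(counts)  # "F" "M"

# beside = TRUE draws the two groups next to each other instead of stacked.
barplot(counts, main = "Number of Boys and Girls by Semesters",
        xlab = "Number of Semesters", ylab = "Number of Students",
        col = c("red", "blue"), legend = rownames(counts), beside = TRUE)
```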