If you are an absolute beginner in R, you should first download R and RStudio. There are many tutorials online, but I will link a couple below.
Note from Nini: I was confused about what R and RStudio are when I first got started. An analogy I found helpful is to think of R and RStudio as a person and their house. R is the actual person, and RStudio is a nice house someone built for R to live in and function nicely.
The programming language = R
Integrated development environment = RStudio
Install R and RStudio tutorial: https://www.youtube.com/watch?v=d-u_7vdag-0
YouTube playlists:
(YT playlists are great for when you don't have much time and want to learn in small increments)
https://www.youtube.com/watch?v=riONFzJdXcs&list=PLqzoL9-eJTNBDdKgJgJzaQcY6OXmsXAHU
Really good beginner playlist by MarinStatsLectures-R Programming & Statistics. Very beginner friendly.
https://www.youtube.com/watch?v=SWxoJqTqo08&list=PLjgj6kdf_snYBkIsWQYcYtUZiDpam7ygg
DataCamp has short videos. Good for beginners after you install R and RStudio.
https://www.youtube.com/watch?v=KlsYCECWEWE&list=PLEiEAq2VkUUKAw0aAJ1W4jpZ1q9LpX4yG
Coursera:
(choose the audit for free option; you don't have to pay to see the courses!)
https://www.coursera.org/specializations/data-science-foundations-r
(For absolute beginners: week 2 shows you how to download, install and set up. Start here!)
https://www.coursera.org/learn/r-programming
(will teach you practical skills; might be more lengthy)
R resources from Chi Hua:
Additional materials if you're interested in learning more about R. Here are some learning examples; there are many other good ones.
You can look through these nicely explained R tutorials for getting started and making plots.
My postdoc put together the resources below. Here are some online links and e-books that she found useful in learning R. She recommended starting with the first two links.
https://www.littlemissdata.com/blog/prettytables
(making tables that almost look like heatmaps!)
Laura Ellis (littlemissdata) also gives very nice, simple R beginner introductions: https://www.littlemissdata.com/blog/watsonstudio
https://rfortherestofus.com/2019/11/how-to-make-beautiful-tables-in-r/
https://r-graph-gallery.com/ggplot2-package.html
https://informationisbeautiful.net/
https://blogs.baylor.edu/rlatentvariable/sample-page/r-syntax/
Latent Variable Modeling using R: A Step-By-Step Guide
(well more or less “beginner” in my mind)
Mini history
Pipes come from the magrittr package by Stefan Milton Bache
Pipes are loaded automatically when you load the tidyverse
After you install R and RStudio:
Sample workspace setup:
## clear the workspace
rm(list=ls())
setwd('/Users/niniliu/Desktop/2020lab/')
## Set working directory and data directory
WDir = '/Users/niniliu/Desktop/2020lab/' # UPDATE
setwd(WDir)
DDir = '/Users/niniliu/Desktop/2020lab/geneID add/' # UPDATE
Clear workspace
rm(list=ls())
Installing packages
install.packages("name of package")
Language setting
Sys.setenv(LANG = "en") #setting the error message to English
Selecting rows and columns in dataframes
dataframe[row,column]
c is combine. You can also use c with a minus sign to drop columns:
dataframe[,-c(1:5)]
this removes columns 1 to 5 from the dataframe
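A minimal sketch of the [row,column] rules above, using a small made-up dataframe (df and its column names are hypothetical, not from the original notes):

```r
# Hypothetical example data: 3 rows, 6 columns
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12, e = 13:15, f = 16:18)

one_value <- df[2, 3]        # single value: row 2, column 3
first_row <- df[1, ]         # whole first row (leave the column slot empty)
two_cols  <- df[, c(1, 3)]   # columns 1 and 3, all rows

# Drop columns 1 to 5. drop = FALSE keeps the result a dataframe
# even though only one column is left.
kept <- df[, -c(1:5), drop = FALSE]
names(kept)
```

Without drop = FALSE, R simplifies a one-column result to a plain vector, which can be surprising if later code expects a dataframe.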
#matching practice
setwd("~/Desktop/rpractice")
V1 <-c("james", "jamie", "john")#first data set
id1 <- c(3, 4, 7)
V2 <- c(200, 300, 9000, 500, 222)
id2 <- c(7, 3, 4, 6, 2)#second dataset
df1 <- data.frame(V1, id1)#df1
df2 <- data.frame(id2, V2)#df2
match(df1$id1, df2$id2) #position of each df1$id1 value within df2$id2 (here 2 3 1)
#subset
df2$V2[match(df1$id1, df2$id2)] #V2 values lined up with the rows of df1 (here 300 9000 200)
#df1$amountdue = df2$V2[match(df1$id1, df2$id2)]
New order
#add a new column in the df for the new order you want
#DF1 is the order you want, making DF2 be the same as DF1
DF2$order <- match(DF2$names,DF1$names)
#save the reordered df
new_order_df <- DF2[order(DF2$order),]
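A small worked version of the reordering idea, as a sketch. The two dataframes and the score column are made up for illustration:

```r
# Hypothetical data: DF1 holds the order we want, DF2 is in a different order
DF1 <- data.frame(names = c("john", "james", "jamie"))
DF2 <- data.frame(names = c("james", "jamie", "john"), score = c(10, 20, 30))

# Position of each DF2 name within DF1
DF2$order <- match(DF2$names, DF1$names)

# Sort DF2 by that position so its rows follow DF1's order
new_order_df <- DF2[order(DF2$order), ]
as.character(new_order_df$names)
```

If a name in DF2 is missing from DF1, match() returns NA for it, and order() places those rows last.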
When plotting categorical data, the default in ggplot is to plot the categories in alphabetical order. To change this, convert the variable to a factor first, and then set the order of the factor levels.
Example: manually ordering the variable region
a$region <- as.factor(a$region)
levels(a$region)
a$region <- factor(a$region, levels = c( "superiortemporal_area", "superiorparietal_area", "precuneus_area", "posterolateraltemporal_area", "parsopercularis_area"))
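The same steps on a tiny made-up dataframe, as a sketch (the values in a are hypothetical; only the region names come from the example above):

```r
# Hypothetical data: three regions that R sorts alphabetically by default
a <- data.frame(
  region = c("precuneus_area", "superiortemporal_area", "parsopercularis_area"),
  value  = c(1.2, 3.4, 2.1)
)

a$region <- as.factor(a$region)
default_levels <- levels(a$region)   # alphabetical order

# Set the order by hand; plots then follow the level order
a$region <- factor(a$region,
                   levels = c("superiortemporal_area", "precuneus_area", "parsopercularis_area"))
levels(a$region)
```

Any value that is not listed in levels becomes NA, so it is worth checking the spelling of every level name.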
warnings
Warning message:
In `[<-.factor`(`*tmp*`, a$color == "anteromedialtemporal_area", :
invalid factor level, NA generated
This means the assignment did not work because the variable is a "factor" and the value you tried to assign is not one of its levels. Change the variable to "character" (as.character) and it should work.
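A sketch of two ways around this warning, using a made-up color vector (the variable names here are hypothetical):

```r
# Hypothetical column stored as a factor with two levels
color <- factor(c("red", "blue", "red"))

# color[1] <- "green" would warn "invalid factor level, NA generated"
# because "green" is not one of the existing levels.

# Fix 1: convert to character, edit, then convert back if a factor is needed
color_chr <- as.character(color)
color_chr[1] <- "green"
color_fixed <- factor(color_chr)

# Fix 2: add the new level first, then assign
color2 <- color
levels(color2) <- c(levels(color2), "green")
color2[1] <- "green"
```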
setdiff(list1, list2) #returns the items in list1 that are not in list2
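A quick sketch with made-up vectors, showing that the order of the arguments matters:

```r
list1 <- c("a", "b", "c", "d")
list2 <- c("b", "d", "e")

only_in_1 <- setdiff(list1, list2)   # items in list1 but not in list2
only_in_2 <- setdiff(list2, list1)   # items in list2 but not in list1
```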
grep
grep("word you are looking for", list_you_think_it_is_in)
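A short sketch of what grep returns, using a made-up vector of names. By default it gives positions; value = TRUE gives the matching items themselves, and grepl gives TRUE/FALSE for each item:

```r
# Hypothetical vector to search in
regions <- c("precuneus_area", "superiortemporal_area", "precuneus_thickness")

idx   <- grep("precuneus", regions)                 # positions of the matches
hits  <- grep("precuneus", regions, value = TRUE)   # the matching items
found <- grepl("area", regions)                     # TRUE/FALSE per item
```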
sub
> x <- 'aabb.ccdd'
> sub('.*', '', x)
[1] ""
> sub('bb.*', '', x)
[1] "aa"
> sub('.*bb', '', x)
[1] ".ccdd"
> sub('\\..*', '', x)
[1] "aabb"
> sub('.*\\.', '', x)
[1] "ccdd"
prefix <- "pre"
suffix <- data$variable
data$chr <- paste(prefix, suffix, sep = "")
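A sketch of the paste lines above on a made-up dataframe (the "chr" prefix and the values are hypothetical). paste0() is a shortcut for paste(..., sep = ""):

```r
# Hypothetical data
data <- data.frame(variable = c("1", "2", "X"))

prefix <- "chr"
suffix <- data$variable
data$chr <- paste(prefix, suffix, sep = "")   # no space between prefix and suffix

same_result <- paste0(prefix, suffix)         # identical output, shorter to type
```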
Show the default colors used in ggplot
https://stackoverflow.com/questions/25211078/what-are-the-default-plotting-colors-in-r-or-ggplot2
After you have your plot, you can use ggplot_build()$data to see the colors used.
Brewer color example from a heatmap
library(RColorBrewer) #provides brewer.pal
heatmap(dfMatrix, Colv = NA, Rowv = NA,
col= colorRampPalette(brewer.pal(8, "Blues"))(25) #this line is the color line
)
Example code:
write.table(AMH_derived_data, file = 'AMH_derived_datas.txt', sep = '\t', quote = F, row.names = F)
Explanation: write.table(the data you want to save, file = 'file name you want to save.format', sep = '')
More options
sep = how you want the columns separated
sep = "\t"   separate by tab
sep = ","    separate by comma
quote = whether text in the output table gets double quotes
quote = T    with quotes
quote = F    no quotes
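A runnable sketch of these options. The dataframe is made up, and it writes to a temporary file so nothing on your computer is overwritten:

```r
# Hypothetical data
my_data <- data.frame(id = c("s1", "s2"), value = c(1.5, 2.5))

# Temporary file, used here only so the example is safe to run
out_file <- tempfile(fileext = ".txt")

# Tab-separated, no quotes around text, no row numbers
write.table(my_data, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)

file_lines <- readLines(out_file)                           # raw text of the file
check <- read.table(out_file, header = TRUE, sep = "\t")    # read it back in
```

With quote = TRUE the text entries, including the header names, would be wrapped in double quotes in the file.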
Exporting to Excel
library(xlsx)
write.xlsx(mydata, "c:/mydata.xlsx")
When the dataset has multiple variables, you can check for NAs (missing values) in all variables at once and keep only the rows with no missing values.
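One way to do this in base R, as a sketch on a made-up dataframe (complete.cases and na.omit are base R functions; the data are hypothetical):

```r
# Hypothetical data with missing values in two variables
df <- data.frame(id = 1:4,
                 age = c(30, NA, 25, 41),
                 score = c(NA, 5, 7, 9))

na_per_column <- colSums(is.na(df))    # number of NAs in each variable

# Keep only the rows that have no NA in any variable
complete <- df[complete.cases(df), ]

# na.omit(df) keeps the same rows
same_rows <- na.omit(df)
```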