Adding rows (combining certain rows from different data frames together)
We begin with an Excel file that has multiple sheets and many columns of information.
Our desired end goal:
A .tsv file with only 3 columns. In the first column, we want to add the prefix 'chr' to the chromosome number.
We will take something that looks like this:
And turn it into something that looks like this (we want it in .tsv form; this is the .tsv file viewed in Numbers):
library(readxl)
humandata <- read_excel("1250368tableS2 (1).xlsx", sheet = "Present-day human-specific DMRs", skip = 1)
View(humandata)
The Excel file we are given (1250368tableS2 (1).xlsx) has 5 sheets. Of those, we only want the sheet named "Present-day human-specific DMRs".
skip = 1 skips the first row, because it is a note from the author of this data.
This is what the Excel data looks like:
We only want the columns 'chromosome', 'start coordinate', and 'end coordinate'
For reference:
The class of chromosome is 'character'
The class of start coordinate is 'numeric'
The class of end coordinate is 'numeric'
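The classes listed above can be checked with sapply() and class(). This is a minimal sketch on a small made-up data frame; the `toy` object and its values are illustrative assumptions, not rows from the spreadsheet.

```r
# Toy stand-in for humandata; the values are invented for illustration.
toy <- data.frame(
  chromosome = c("1", "2", "X"),
  `start coordinate` = c(100, 200, 300),
  `end coordinate` = c(150, 250, 350),
  check.names = FALSE,       # keep the spaces in the column names
  stringsAsFactors = FALSE   # keep chromosome as character, not factor
)

# Report the class of every column at once.
col_classes <- sapply(toy, class)
print(col_classes)
```

Running the same sapply() call on humandata should report the classes of all of its columns in one go.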
We want to add the prefix 'chr' to the characters in the chromosome column
prefix <- "chr"
suffix <- humandata$chromosome
humandata$chr <- paste(prefix, suffix, sep = "")
We created a column in the data frame called 'chr' that holds the prefix 'chr' followed by whatever character is in the column 'chromosome'.
We use the paste() function to join the prefix and the suffix, with sep = "" so that there is no space between them.
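To see what sep = "" does in isolation, here is a small sketch on a made-up vector of chromosome labels (the `chromosome` vector is an assumption for illustration). It also shows paste0(), which is base R shorthand for paste() with sep = "".

```r
# Made-up chromosome labels standing in for humandata$chromosome.
chromosome <- c("1", "2", "X")

# paste() with sep = "" joins the pieces with nothing in between.
with_paste <- paste("chr", chromosome, sep = "")

# paste0() does the same thing without needing the sep argument.
with_paste0 <- paste0("chr", chromosome)

print(with_paste)
print(identical(with_paste, with_paste0))
```

Either form could be used for the 'chr' column; which one to use is a matter of taste.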
Now we want to make a new data frame that contains only the 3 columns we want (we will call it 'Presentday_Human_data').
Presentday_Human_data <- data.frame(humandata$chr, humandata$`start coordinate`, humandata$`end coordinate`)
names(Presentday_Human_data)
names(Presentday_Human_data) <- c("chromosome", "start", "end")
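The end goal is a .tsv file, so the last step is to write the three-column data frame to disk. One possible way, sketched here with base R's write.table() on a made-up data frame: `toy_regions`, its values, and the temporary output path are all assumptions for illustration, so substitute Presentday_Human_data and a real file name.

```r
# Toy stand-in for Presentday_Human_data; the values are invented.
toy_regions <- data.frame(
  chromosome = c("chr1", "chr2", "chrX"),
  start = c(100, 200, 300),
  end = c(150, 250, 350),
  stringsAsFactors = FALSE
)

# Hypothetical output path; a temporary file keeps the sketch self-contained.
out_file <- tempfile(fileext = ".tsv")

# sep = "\t" makes the file tab-separated; quote = FALSE and
# row.names = FALSE keep quotation marks and row numbers out of the file.
write.table(toy_regions, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back to confirm it has the three expected columns.
check <- read.delim(out_file, stringsAsFactors = FALSE)
print(check)
```

Coordinates stored as numeric can be written in scientific notation when they are large round numbers (for example 1e+05), so converting the start and end columns with as.integer() before writing may be worth considering.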