Merge SNPmarker
Slack CH April 7th 2021
Location on server: /space/chen-syn01/1/data/cinliu/data/extract_clean_SNPID
Hi Nini, could you help merge all the files in this folder and give me a file with the second column only (SNP list)?
"/space/chen-syn01/1/data/haowang/InvHap/SNPmarker"
18 .txt files
Each file looks similar to this:
SNPmarker.R
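rm(list = ls())
# Extract only column 2 (the SNP names) from every file and stack them into one list.
# Local copy, scp'd from "/space/chen-syn01/1/data/haowang/InvHap/SNPmarker"
dir <- "/Users/nini/Desktop/2021lab/CH/SNPmarker/"
setwd(dir)
# The input .txt files sit in the "SNPmarker/" subfolder of dir
files <- list.files(path = paste0(dir, "SNPmarker/"))
files[1]
# Start with the first file; stringsAsFactors = FALSE keeps the names as character
df <- read.delim(paste0(dir, "SNPmarker/", files[1]), stringsAsFactors = FALSE)
df <- df[, 2]
# Append column 2 of each remaining file; seq_along(files)[-1] is empty if there is only one file
for (i in seq_along(files)[-1]) {
  df2 <- read.delim(paste0(dir, "SNPmarker/", files[i]), stringsAsFactors = FALSE)
  print(nrow(df2))  # rows per file, as a sanity check
  df2 <- df2[, 2]
  df <- c(df, df2)
}
class(df)
df <- as.data.frame(df)
colnames(df) <- "SNP"
write.table(df, "SNPmarker_2ndColumnOnly.txt", quote = FALSE, row.names = FALSE, col.names = TRUE)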
File attached: SNPmarker_2ndColumnOnly.txt
Note: for 2 of the files, the names are given in dbSNP instead of rsSNP format. Let me know if you’d like me to do something about it.
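A quick way to see which names are not in rs format is a pattern check. This is only a sketch: the example IDs below are made up, `is_rs_id` is a hypothetical helper, and the actual non-rs format used in the two files is not shown in these notes.

```r
# Hypothetical example IDs; the real values come from column 2 of each file.
snp_ids <- c("rs123456", "rs7890", "17:44000000_A_G", "6:31000000_C_T")

# Hypothetical helper: TRUE when an ID is "rs" followed only by digits
is_rs_id <- function(ids) grepl("^rs[0-9]+$", ids)

# IDs that would still need conversion to rs numbers
needs_conversion <- snp_ids[!is_rs_id(snp_ids)]
print(needs_conversion)

# Count of rs vs. non-rs IDs
print(table(is_rs_id(snp_ids)))
```

Running the same check per file would show which files contain the non-rs names.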
Changes that need to be made:
In files “UKB_17q21.31.txt” and “UKB_6p21.33.txt”, the names are given in dbSNP instead of rsSNP format. We need to do ID conversion.
The reference genome version is GRCh37 (also known as hg19).
Possible approach CH suggested:
Make ID the common identifier, then use the merge function and specify ID as the key column to join on.
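A minimal sketch of that merge step, assuming a lookup table that maps each ID to an rs number is available. Both tables and all values below are made up for illustration, and the column name `rsID` is an assumption.

```r
# Hypothetical SNP list: ID is the non-rs name found in the marker files
snp_list <- data.frame(ID = c("id_a", "id_b", "id_c"), stringsAsFactors = FALSE)

# Hypothetical lookup table mapping each ID to an rs number
lookup <- data.frame(
  ID = c("id_a", "id_b", "id_d"),
  rsID = c("rs111", "rs222", "rs444"),
  stringsAsFactors = FALSE
)

# by = "ID" names the common column; all.x = TRUE keeps every row of snp_list,
# so IDs without a match get NA in rsID instead of being dropped
merged <- merge(snp_list, lookup, by = "ID", all.x = TRUE)
print(merged)
```

Using all.x = TRUE makes unmatched IDs visible as NA, which helps when checking how many names failed to convert.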
2.SNPmarker_SNPname_only.R
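rm(list = ls())
# Extract only column 2 (the SNP names) from every file and stack them into one list.
# Local copy, scp'd from "/space/chen-syn01/1/data/haowang/InvHap/SNPmarker"
dir <- "/Users/nini/Desktop/2021lab/CH/SNPmarker/"
setwd(dir)
# The input .txt files sit in the "SNPmarker/" subfolder of dir
files <- list.files(path = paste0(dir, "SNPmarker/"))
files[1]
# Start with the first file; stringsAsFactors = FALSE keeps the names as character
df <- read.delim(paste0(dir, "SNPmarker/", files[1]), stringsAsFactors = FALSE)
df <- df[, 2]
# Append column 2 of each remaining file; seq_along(files)[-1] is empty if there is only one file
for (i in seq_along(files)[-1]) {
  df2 <- read.delim(paste0(dir, "SNPmarker/", files[i]), stringsAsFactors = FALSE)
  print(nrow(df2))  # rows per file, as a sanity check
  df2 <- df2[, 2]
  df <- c(df, df2)
}
class(df)
df <- as.data.frame(df)
colnames(df) <- "SNP"
write.table(df, "SNPmarker_2ndColumnOnly_update.txt", quote = FALSE, row.names = FALSE, col.names = TRUE)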
rsSNP_to_dbSNP.R (this is effectively step 3)
Full code:
1.snp_name_correction.R
For a cleaner version, see: /space/chen-syn01/1/data/cinliu/data/extract_clean_SNPID/SNPmarker
It does essentially the same thing, just with simplified steps.
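The cleaned-up script itself is not reproduced in these notes. As a rough idea of what a simplified version of the column-2 extraction could look like, here is a sketch using lapply instead of a loop. `collect_second_column` is a hypothetical helper, and the demo files and their contents are made up; it assumes tab-delimited files with a header row, as read.delim expects.

```r
# Stack column 2 of every file in a folder into a one-column data frame.
# collect_second_column() is a hypothetical helper, not the script on the server.
collect_second_column <- function(folder) {
  files <- list.files(folder, full.names = TRUE)
  cols <- lapply(files, function(f) {
    read.delim(f, stringsAsFactors = FALSE)[, 2]
  })
  data.frame(SNP = unlist(cols), stringsAsFactors = FALSE)
}

# Small demo on two temporary tab-delimited files with made-up contents
demo_dir <- file.path(tempdir(), "SNPmarker_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(CHR = c(17, 17), SNP = c("rs1", "rs2")),
            file.path(demo_dir, "a.txt"), sep = "\t", quote = FALSE, row.names = FALSE)
write.table(data.frame(CHR = 6, SNP = "rs3"),
            file.path(demo_dir, "b.txt"), sep = "\t", quote = FALSE, row.names = FALSE)

snps <- collect_second_column(demo_dir)
print(snps)
```

The result could then be written out with write.table exactly as in the scripts above.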