Post date: Nov 11, 2019 6:16:5 PM
I realized last Friday that only the odd numbered individuals had been sequenced.
I use the following command to keep them nin a new file called oddPandoSamples.txt
grep '^[0126][0-9][13579]' PandoSamples.csv > oddPandoSamples.csv
I modified the script to map the samples under mappingPandoOdd.R and I use source("mappingPandoOdd.R") to run it in R but I did not have any output so I now try under the bash terminal using R CMD BATCH mappingPandoOdd.R.
To visualize the output I openend ssh_config file situated in /etc/ssh/ and uncommented the lines related to X11Forward.
Running the R script in the terminal opened the map in a new window. Yayy!
Now I would like to export the colour column to use it for the PCA.
I use the colour column from the mappingPandoOdd.R file which correcponds to column number 4. I use
write.table(samps[,4], "type.txt") in R
Now I have a "type.txt" column that is sorted according to the individuals sample numbers (maybe I should export sample ID here as well) and that has a different color number for each sample.
However, the number of lines is 283 while th enumber of sequenced samples is 280.
I go back to the vcf file and extract the individual names from there.
These are the two line sthat worked:
grep 'potr' filtered2xHiCov_pando_variants.vcf > ./rozenn/type.txt
cut -f 10- type.txt > type_clean.txt
I was able to get the first PCA using the following code:
library(jpeg)
library(Cairo)
library(FactoMineR)
library(factoextra)
# Work with FactoMineR
#http://factominer.free.fr/factomethods/principal-components-analysis.html
######## Prepare the data ##############
# First, look at genotypes point estimates
pntest <- as.matrix(read.table("pntest_filtered2xHiCov_pando_variants.txt"))
names <- as.matrix(read.table("sampleID.txt"))
type <- as.matrix(read.table("type_clean.txt"))
# Specify which individuals are Pando
track_type = rep("aroundPando",each=length(type))
for (i in c(which(type == "potr-017-B"):(which(type == "potr-277-B"))))
{
track_type[i] <- "Pando"
}
for (i in c(which(type == "potr-603-B"):(which(type == "potr-659-B"))))
{
track_type[i] <- "nearbyClone"
}
data <- rbind(names,track_type,pntest)
data <- as.data.frame(t(data))
#data <- rbind(type,pntest)
# Transpose matrix to have it with individuals as rows and variables as columns
#data[,c(1,2)] <- as.character(data[,c(1,2)])
for (i in c(3:dim(data)[2]))
{
data[,i] <- as.numeric(as.character(unlist(data[[i]])))
}
######## Plot the data ##############
# Use habillage to specify groups for coloring
fviz_pca_ind(res.pca,
label = "none", # hide individual labels
habillage = data[,2], # color by groups
palette = c("#af8dc3", "#E7B800", "#5ab4ac"),
#addEllipses = TRUE # Concentration ellipses
)
which gives the following:
There is no filter being applied now, this PCA represents all individuals.
This coloring is rather arbitrary and based on microsatellite and mapping data.
Pando = from individual 017 to individual 277
nearbyClone = individuals from 603 to 659
nearby Pando = everything else
Applying read depth filter:
dp <- as.matrix(read.table("rawDepth.txt"))
dp <- sort(dp)
mean(dp)
[1] 1122.135
median(dp)
[1] 34
> sd(dp)
[1] 2620.133
Read depths per SNPs:
> quantile(dp)
0% 25% 50% 75% 100%
1 1 34 1311 76707
Find out the read depth per individual.