Post date: Jun 27, 2020 3:21:48 PM
- Extract the probability of being heterozygous from the VCF file (friends + Pando, 226 samples).
- Label heterozygotes, homozygotes and NA based on a threshold of 0.94.
- I obtain exactly the same variants in both clones when I filter for >0.94... This is weird.
I can try a less stringent filter, or look at the spatial pattern without filtering for low hets.
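The labelling step above can be sketched as follows. This is only an illustration: the notes give the 0.94 threshold but not the exact rule, so the symmetric cutoff for homozygotes (probability below 1 - 0.94) and the function name `label_genotype` are assumptions, and the probabilities are made up.

```python
import math

THRESHOLD = 0.94  # threshold quoted in the notes


def label_genotype(p_het, threshold=THRESHOLD):
    """Label one heterozygosity probability.

    Assumed rule (not stated in the notes):
      p > threshold      -> "het"
      p < 1 - threshold  -> "hom"
      anything else      -> None (treated as NA)
    """
    if p_het is None or math.isnan(p_het):
        return None
    if p_het > threshold:
        return "het"
    if p_het < 1 - threshold:
        return "hom"
    return None


# Made-up probabilities for one variant across four samples.
probs = [0.99, 0.02, 0.50, float("nan")]
labels = [label_genotype(p) for p in probs]
print(labels)
```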
OK! I have found my mistake: once TRUE/FALSE is converted to 1/0, the values can no longer be used as a logical index... Each 1 was read as a position, so I got the first column of the file, repeated as many times as 1 appeared.
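A minimal sketch of this pitfall. The symptom matches 1-based positional indexing in which 0 selects nothing (as in R), so the Python below emulates that behaviour with a hypothetical helper, `select_by_position_1based`; the column names are made up.

```python
columns = ["v1", "v2", "v3", "v4"]   # made-up column names
keep = [False, True, True, False]    # boolean vector of columns to keep


def select_by_mask(items, mask):
    """Correct: keep the items whose flag is True."""
    return [item for item, flag in zip(items, mask) if flag]


def select_by_position_1based(items, positions):
    """Emulate 1-based positional indexing where 0 selects nothing."""
    return [items[p - 1] for p in positions if p != 0]


correct = select_by_mask(columns, keep)

# The mistake: the mask is converted to 0/1 and then used as positions.
as_int = [int(flag) for flag in keep]             # [0, 1, 1, 0]
buggy = select_by_position_1based(columns, as_int)

print(correct)  # the two columns that were flagged
print(buggy)    # the first column, once per 1 in the mask
```

Keeping the vector boolean all the way to the indexing step avoids the problem.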
The thing is that, if I do not consider the homozygous variants (which give no idea of the spatial pattern), I am left with very few variants.
- I create the boolean vector to keep the interesting variants.
- I filter the VCF file based on this boolean vector (remove the header of the VCF file, then filter the variants).
- Extract the probability of being het from these VCF files and keep only the individuals belonging to the clones.
- Okay, the final files with the probability of being het for less than 0.96 and individual clones are in:
/Volumes/Data/Documents/Education/GaTech/Year2/Summer2020/Research/data/Pando/11-FriendsSelectedClones/proba_hets_clone_45.txt
/Volumes/Data/Documents/Education/GaTech/Year2/Summer2020/Research/data/Pando/11-FriendsSelectedClones/proba_hets_clone_25.txt
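
The filtering steps above (set the header aside, keep the variants flagged in the boolean vector, keep only the individuals of one clone) could look roughly like the sketch below. It is not the script actually used: the function name `filter_vcf_lines`, the sample names and the toy records are all hypothetical. It relies only on the VCF layout, where meta lines start with `##`, the column line starts with `#CHROM`, and sample columns come after the nine fixed columns.

```python
def filter_vcf_lines(lines, keep, samples):
    """Return (meta_lines, column_line, kept_records).

    lines   : VCF lines without trailing newlines
    keep    : one boolean per variant record, in file order
    samples : names of the individuals to retain
    """
    meta = [ln for ln in lines if ln.startswith("##")]
    column_line = next(ln for ln in lines if ln.startswith("#CHROM"))
    records = [ln for ln in lines if not ln.startswith("#")]
    if len(records) != len(keep):
        raise ValueError("boolean vector and variant count differ")

    # The first 9 columns are fixed; sample columns start at index 9.
    names = column_line.split("\t")
    cols = list(range(9)) + [
        i for i, name in enumerate(names) if i >= 9 and name in samples
    ]

    def subset(line):
        fields = line.split("\t")
        return "\t".join(fields[i] for i in cols)

    kept = [subset(rec) for rec, flag in zip(records, keep) if flag]
    return meta, subset(column_line), kept


# Toy input: three variants, three individuals (all made up).
lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\tind3",
    "chr1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/1\t0/0\t0/1",
    "chr1\t200\t.\tC\tT\t.\t.\t.\tGT\t0/0\t0/0\t0/1",
    "chr1\t300\t.\tG\tA\t.\t.\t.\tGT\t0/1\t0/1\t0/0",
]
keep = [True, False, True]
meta, column_line, kept = filter_vcf_lines(lines, keep, {"ind1", "ind3"})
for rec in kept:
    print(rec)
```

The length check is there because a boolean vector that is out of step with the variant records would otherwise filter the wrong rows without any error.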