What I did today

Post date: Jun 27, 2020 3:21:48 PM

- Extract the probability of being heterozygous from the VCF file (friends + Pando, 226 samples).

- Label each genotype as heterozygous, homozygous or NA using a threshold of 0.94.
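The extraction and labeling steps above can be sketched roughly as follows. This is a minimal sketch, not the pipeline actually used: it assumes each sample's FORMAT column contains a `GP` field holding plain genotype probabilities (0 to 1) in the order 0/0, 0/1, 1/1 for a biallelic site. The labeling rule (het if P(het) > 0.94, hom if P(het) < 1 - 0.94, NA otherwise) and the helper names `het_probabilities` and `label` are hypothetical.

```python
# Hypothetical sketch: assumes a GP field with plain probabilities (0/0, 0/1, 1/1).
THRESHOLD = 0.94  # threshold used in the notes


def het_probabilities(vcf_line):
    """Return P(heterozygous) for every sample on one VCF data line (None if missing)."""
    fields = vcf_line.rstrip("\n").split("\t")
    fmt_keys = fields[8].split(":")   # FORMAT column, e.g. "GT:GP"
    gp_idx = fmt_keys.index("GP")     # assumption: GP is present
    probs = []
    for sample in fields[9:]:         # sample columns start at index 9
        values = sample.split(":")
        if gp_idx >= len(values) or values[gp_idx] == ".":
            probs.append(None)        # missing genotype probabilities
            continue
        gp = [float(x) for x in values[gp_idx].split(",")]
        probs.append(gp[1])           # second value is P(0/1)
    return probs


def label(p_het, threshold=THRESHOLD):
    """Hypothetical rule: het if P(het) > threshold, hom if P(het) < 1 - threshold, else NA."""
    if p_het is None:
        return "NA"
    if p_het > threshold:
        return "het"
    if p_het < 1 - threshold:
        return "hom"
    return "NA"


line = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GP\t0/1:0.01,0.97,0.02\t0/0:0.98,0.02,0.00\t./.:."
print([label(p) for p in het_probabilities(line)])
```

If the caller wrote phred-scaled or log-scaled likelihoods instead of plain probabilities, the values would need converting before applying the threshold.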

- When I filter for P(het) > 0.94, I get exactly the same variants in both clones... This is weird.

I could try a less stringent filter, or look at the spatial pattern without filtering out variants with a low probability of being heterozygous.

OK! I found my mistake: a TRUE/FALSE vector converted to 1/0 cannot be used as indices. Each 1 was read as the index of the first column (and each 0 selected nothing), so I got the first column of the file repeated as many times as 1 appeared.
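The indexing bug described above comes from R's indexing rules: a logical vector acts as a mask, but once converted to 1/0 it becomes a vector of 1-based positions where 0 is dropped. The sketch below mimics both behaviors in plain Python to show the difference; the helper names and sample names are hypothetical, and this is an illustration of R's semantics, not R itself.

```python
from itertools import compress

columns = ["sample_A", "sample_B", "sample_C", "sample_D"]  # hypothetical column names
keep = [True, False, True, True]                             # logical mask


def r_style_numeric_index(values, idx):
    """Mimic R's x[idx] for non-negative integers: 1-based positions, 0 is dropped."""
    return [values[i - 1] for i in idx if i != 0]


def mask_index(values, mask):
    """Mimic R's x[mask] for a logical vector of the same length: keep where True."""
    return list(compress(values, mask))


as_numbers = [int(b) for b in keep]                  # TRUE/FALSE -> 1/0
wrong = r_style_numeric_index(columns, as_numbers)   # first column, repeated
right = mask_index(columns, keep)                    # intended selection
print(wrong)
print(right)
```

With three TRUE values, the numeric version returns `sample_A` three times, which matches the symptom in the notes; the mask version returns the three intended columns.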

The problem is that if I discard the homozygous variants (which tell me nothing about the spatial pattern), I am left with very few variants.
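One way to build the boolean vector of variants to keep is sketched below. The rule is an assumption, not necessarily the one used here: a variant is kept if at least `min_hets` samples are labeled `het`, so variants that are homozygous (or NA) everywhere are dropped. Raising the hypothetical `min_hets` parameter makes the filter stricter and leaves even fewer variants.

```python
def interesting_variants(label_matrix, min_hets=1):
    """Hypothetical rule: keep a variant if at least `min_hets` samples are labeled 'het'.

    label_matrix has one row per variant and one label ('het', 'hom', 'NA') per sample.
    """
    return [row.count("het") >= min_hets for row in label_matrix]


labels = [
    ["hom", "hom", "NA"],   # no heterozygous call: uninformative
    ["het", "hom", "hom"],  # one heterozygous call
    ["het", "het", "NA"],   # two heterozygous calls
]
keep = interesting_variants(labels)
print(keep, sum(keep), "variants kept")
```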

- Create the boolean vector marking the variants worth keeping.

- Filter the VCF file with this boolean vector (remove the VCF header, then filter the variant lines).

- From these filtered VCF files, extract the probability of being heterozygous, keeping only the individuals belonging to each clone.

- The final files, with the probability of being heterozygous (filtered at less than 0.96) for the individuals of each clone, are in:

/Volumes/Data/Documents/Education/GaTech/Year2/Summer2020/Research/data/Pando/11-FriendsSelectedClones/proba_hets_clone_45.txt

/Volumes/Data/Documents/Education/GaTech/Year2/Summer2020/Research/data/Pando/11-FriendsSelectedClones/proba_hets_clone_25.txt
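The filtering and per-clone extraction steps above can be sketched as follows. This is a minimal sketch under assumptions: the VCF is read as a list of lines, `keep` has one boolean per variant line in file order, genotype probabilities sit in a `GP` field (0/0, 0/1, 1/1), and clone membership is given as a set of sample names. All function names are hypothetical.

```python
def split_vcf(lines):
    """Separate the header from the data; return (sample_names, data_lines)."""
    samples, data = [], []
    for line in lines:
        if line.startswith("##"):
            continue                                   # meta-information lines
        if line.startswith("#CHROM"):
            samples = line.rstrip("\n").split("\t")[9:]  # sample names
            continue
        data.append(line)
    return samples, data


def filter_variants(data_lines, keep):
    """Keep the data lines where the boolean mask is True (lengths must match)."""
    if len(data_lines) != len(keep):
        raise ValueError("mask length does not match the number of variants")
    return [line for line, k in zip(data_lines, keep) if k]


def het_prob(sample_field, gp_idx):
    """P(0/1) from one sample column, or None if GP is missing."""
    values = sample_field.split(":")
    if gp_idx >= len(values) or values[gp_idx] == ".":
        return None
    return float(values[gp_idx].split(",")[1])


def clone_het_table(samples, data_lines, clone_members):
    """Table of P(het) per variant, restricted to the individuals of one clone."""
    cols = [i for i, s in enumerate(samples) if s in clone_members]
    rows = [["variant"] + [samples[i] for i in cols]]
    for line in data_lines:
        f = line.rstrip("\n").split("\t")
        gp_idx = f[8].split(":").index("GP")
        probs = [het_prob(f[9 + i], gp_idx) for i in cols]
        rows.append([f"{f[0]}:{f[1]}"] + ["NA" if p is None else str(p) for p in probs])
    return rows


def write_table(rows, path):
    """Write the table as tab-separated text."""
    with open(path, "w") as out:
        for row in rows:
            out.write("\t".join(row) + "\n")


vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GP\t0/1:0.01,0.97,0.02\t0/0:0.98,0.02,0.00\t0/1:0.1,0.9,0.0\n",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:GP\t0/0:0.99,0.01,0.00\t0/0:0.99,0.01,0.00\t./.:.\n",
]
samples, data = split_vcf(vcf)
kept = filter_variants(data, [True, False])
print(clone_het_table(samples, kept, {"S1", "S3"}))
```

Calling `write_table` once per clone would produce one file per clone, in the spirit of the `proba_hets_clone_*.txt` files listed above.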
