Post date: Sep 04, 2017 9:37:58 PM
Here are my notes on preparing data for F4 tests:
1. Create genotype (geno) files. I used the genotype point estimates for this (computed from genotype likelihood and population allele frequencies). Sex chromosomes pair individuals. Genotypes were rounded to the nearest integer, unless they were > 0.1 from an integer, in which case they were set to 9 (missing data). These are used to calculate genotype frequencies.
perl mkGeno.pl pntest_lychybAutos_*
makes lycHybAnc.geno
perl mkGeno.pl pntest_lychybSex_*
makes lycHybAnc.geno
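The rounding rule from step 1 can be sketched in Python as below. This is only an illustration of the rule as described, not the contents of mkGeno.pl; the function names and the example estimates are made up, and I am assuming one character per individual on each geno line.

```python
def call_genotype(estimate, tol=0.1):
    """Round a genotype point estimate to the nearest integer.

    Returns 9 (missing data) when the estimate is more than `tol`
    away from the nearest integer.
    """
    nearest = round(estimate)
    if abs(estimate - nearest) > tol:
        return 9
    return nearest


def geno_line(estimates, tol=0.1):
    """Build one geno row (one SNP) from per-individual point estimates."""
    return "".join(str(call_genotype(e, tol)) for e in estimates)


# Hypothetical point estimates for four individuals at one SNP.
print(geno_line([0.02, 0.95, 1.4, 2.0]))  # prints 0192
```

Here 1.4 is 0.4 away from the nearest integer, so it is written as 9.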
2. Next I made geno files for each autosome.
get LG info
tail -n +2 pntest_lychybAutos_BHP.txt | cut -f 1 -d ":" > autoLgs.txt
perl splitLg.pl
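A rough Python sketch of the per-LG split, for reference. It assumes (my assumption, not checked against splitLg.pl) that autoLgs.txt has one LG label per SNP in the same order as the rows of the geno file; the function name and toy rows are hypothetical.

```python
from collections import defaultdict


def split_by_lg(lgs, geno_rows):
    """Group geno rows by linkage group, keeping SNP order within each LG."""
    if len(lgs) != len(geno_rows):
        raise ValueError("need exactly one LG label per geno row")
    by_lg = defaultdict(list)
    for lg, row in zip(lgs, geno_rows):
        by_lg[lg].append(row)
    return dict(by_lg)


# Toy example: three SNPs, the first two on LG 1 and the last on LG 2.
groups = split_by_lg(["1", "1", "2"], ["012", "910", "221"])
for lg, rows in groups.items():
    print(lg, rows)
```

Each value in the returned dictionary could then be written to its own geno file.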
3. Prepare SNP files
get SNP info
tail -n +2 pntest_lychybAutos_KHL.txt | cut -f 1 -d " " > autoSnpIds.txt
format is: snpID LG CMpos bppos allele1 allele2
As far as I can tell, only LG matters. LG and CMpos are real; the rest is made up.
perl makeSnpFile.pl
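A minimal Python sketch of one line of the SNP file in the six-column format above. This is not makeSnpFile.pl; the function name, the space delimiter, the example SNP ID, and the placeholder values for bppos and the two alleles are all assumptions chosen for illustration.

```python
def snp_line(snp_id, lg, cm_pos, bp_pos=0, allele1="A", allele2="C"):
    """Format one SNP record: snpID LG CMpos bppos allele1 allele2.

    bp_pos, allele1 and allele2 default to placeholder values, since
    only snpID, LG and CMpos carry real information here.
    """
    return f"{snp_id} {lg} {cm_pos} {bp_pos} {allele1} {allele2}"


# Hypothetical SNP ID and map position.
print(snp_line("1:100", "1", 0.5))  # prints 1:100 1 0.5 0 A C
```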
4. Make individual id and group files
First extract the number of individuals per population for the autosomes and the Z chromosome and print these to files:
perl mkIndList.pl pntest_lychybSex_*txt
makes indListSex.txt
perl mkIndList.pl pntest_lychybAutos_*txt
makes indListAuto.txt
Then I generated the codes1.txt file that groups populations. Most were given their own unique ID, but anna and melissa populations (two flavors) were grouped to serve as parents.
This was used to generate the actual ind files
perl mkIndFile.pl codes1.txt indListAuto.txt
makes lycHybAncVer1.ind
perl mkIndFile.pl codes1.txt indListSex.txt
makes lycHybAncSexVer1.ind
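The ind-file step can be sketched in Python roughly as follows. I am assuming a three-column layout (individual ID, sex, group label) as in EIGENSTRAT-style ind files, that the ind list gives a count per population, and that the codes file maps population names to group labels; the real formats of codes1.txt and indListAuto.txt may differ. Population names, the ID scheme and the function name are hypothetical.

```python
def ind_lines(pop_counts, pop_to_group, sex="U"):
    """Build ind-file lines: one 'ID sex group' line per individual.

    pop_counts: list of (population, number of individuals) pairs.
    pop_to_group: maps a population to its group label; populations
    not listed keep their own name as a unique group ID.
    """
    lines = []
    for pop, n in pop_counts:
        group = pop_to_group.get(pop, pop)
        for i in range(1, n + 1):
            lines.append(f"{pop}_{i} {sex} {group}")
    return lines


# Hypothetical populations: popA is pooled into a parental group.
for line in ind_lines([("popA", 2), ("popB", 1)], {"popA": "parent1"}):
    print(line)
```

Sex is written as U (unknown) throughout in this sketch.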