Post date: May 03, 2019 11:1:27 PM
I wanted to re-estimate allele frequencies for Z-linked SNPs (scaffold 1631) accounting for the fact that females are heterogametic and thus have only one copy of the Z chromosome. Here is what I did.
1. First, I added the sex information to the gl files. This was done from within the Variants sub-directory (/uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_diversity/Variants/). Sex (1 = female = ZW; 2 = male = ZZ) was added as the third row for each individual gl file.
perl addSex.pl *gl
This generates the Z*gl files, which include sex and only include the Z-linked SNPs: 2914 (full) or 1248 (sub) SNPs total.
2. I then ran a modified version of estpEM that incorporates this sex information. In particular, for females, the data are modeled based only on the homozygous genotype likelihoods (this is mathematically equivalent to a haploid model) and the individuals are assumed to contribute a single allele. I ran this in the AlleleFreqs subdirectory).
RunEstpEMsexFork.pl
#!/usr/bin/perl
#
# run estpEM as an interactive job
#
use Parallel::ForkManager;
my $max = 20;
my $pm = Parallel::ForkManager->new($max);
FILES:
foreach $file (@ARGV){
$pm->start and next FILES; ## fork
$file =~ m/Z_([A-Z]+\-\d+)\S+_([A-Z]+_[a-z]+)\.gl/ or die "failed to match $file\n";
$out = "p_Z_$1"."_$2".".txt";
system "~/bin/estpEMsex -i $file -o $out -m 50 -e 0.001 -h 2\n";
$pm->finish;
}
$pm->wait_all_children;
This generates the p_Z* files. I generated the p_auto* files by simply sub-setting the original p_* files (Z and auto).
3. I then inferred Ne from only the autosomes and from only the Z chromosome.
Autosomes
Generate infiles for varne for each data subset
perl mkNeAutoInfiles.pl SAM_sub
perl mkNeAutoInfiles.pl GATK_sub
perl mkNeAutoInfiles.pl GATK_full
Run varne
perl runNeEstAuto.pl in_auto_*13* 2> NeAutoLog_13-17.txt
which runs
#!/usr/bin/perl
#
# then subset of files from a given set
foreach $in13 (@ARGV){
if ($in13 =~ m/full/){
$L = 27119;
}
else{
$L = 11638;
}
$in13 =~ m/in_auto_([A-Z]+)\-\d+_([A-Z]+_[a-z]+)/;
$pop = $1;
$set = $2;
$in17 = $in13;
$in17 =~ s/13/17/ or die "failed sub $in13\n";
system "varne -a $in13 -b $in17 -l $L -t 4 -n 3000 -x 1000 > Ne_auto_"."$set"."_"."$pop"."_13-17.txt\n";
}
Extract the point estimates and ETPIS
perl parseNeLogAuto.pl | perl -p -i -e 's/auto\s+//' | perl -p -i -e 's/_/ /' > SummaryNeAuto.txt
Z chromosome
Generate infiles for varne for each data subset
perl mkNeZInfiles.pl SAM_sub
perl mkNeZInfiles.pl GATK_sub
perl mkNeZInfiles.pl GATK_full
Run varne
perl runNeEstZ.pl in_Z*13* 2> NeZLog_13-17.txt
which runs
foreach $in13 (@ARGV){
if ($in13 =~ m/full/){
$L = 2914;
}
else{
$L = 1248;
}
$in13 =~ m/in_Z_([A-Z]+)\-\d+_([A-Z]+_[a-z]+)/;
$pop = $1;
$set = $2;
$in17 = $in13;
$in17 =~ s/13/17/ or die "failed sub $in13\n";
system "varne -a $in13 -b $in17 -l $L -t 4 -n 3000 -x 1000 > Ne_Z_"."$set"."_"."$pop"."_13-17.txt\n";
}
Extract the point estimates and ETPIS
perl parseNeLogZ.pl | perl -p -i -e 's/Z\s+//' | perl -p -i -e 's/_/ /' > SummaryNeZ.txt
4. Here are the 3 estimates of Ne for the SAM subset (on average Z is lower, but this isn't always true):
grep SAM SummaryNeZ.txt
BCR SAM_sub 168.623 111.873 279.388
BNP SAM_sub 181.974 125.615 291.715
BTB SAM_sub 306.884 176.515 699.778
GNP SAM_sub 118.957 86.2417 176.109
HNV SAM_sub 116.373 89.1201 157.58
MRF SAM_sub 221.379 118.389 645.336
PSP SAM_sub 129.183 82.1922 196.977
RNV SAM_sub 108.805 76.1253 170.705
SKI SAM_sub 171.978 103.42 329.063
USL SAM_sub 183.232 112.702 307.375
grep SAM SummaryNeAuto.txt
BCR SAM_sub 190.548 164.069 221.699
BNP SAM_sub 179.745 150.175 209.345
BTB SAM_sub 287.764 244.615 347.128
GNP SAM_sub 142.603 115.775 168.352
HNV SAM_sub 186.455 158.321 217.029
MRF SAM_sub 118.599 100.17 139.655
PSP SAM_sub 130.726 114.563 147.998
RNV SAM_sub 106.277 95.4288 118.915
SKI SAM_sub 279.767 232.714 334.865
USL SAM_sub 243.709 208.198 289.62
grep SAM SummaryNe.txt
BCR SAM_sub 178.035 154.757 203.162
BNP SAM_sub 175.807 151.495 200.524
BTB SAM_sub 271.039 235.333 321.271
GNP SAM_sub 134.984 112.01 155.352
HNV SAM_sub 169.78 146.804 194.355
MRF SAM_sub 115.897 99.8245 134.279
PSP SAM_sub 127.769 114.131 142.939
RNV SAM_sub 100.46 90.7566 110.832
SKI SAM_sub 241.858 204.398 287.181
USL SAM_sub 227.146 196.436 266.815