Post date: May 30, 2016 4:19:16 AM
Given my recent success (see previous post), I am trying Lep-MAP2 again with multiple families. I am including all of the families with two known parents, and two of the families (272 and 200) with only the mom known for sure. I will assess sensitivity to including theses (200 in particular).
Here is what I did:
1. Generate the linkage format input file (not offspring sex is not used for this):
perl ../scripts/makeMultiFamLinkage.pl ../variants/sub_offspringGenotypes_filtered2x_fam123.txt ../variants/sub_offspringGenotypes_filtered2x_fam143.txt ../variants/sub_offspringGenotypes_filtered2x_fam297.txt ../variants/sub_offspringGenotypes_filtered2x_fam309.txt ../variants/sub_offspringGenotypes_filtered2x_fam272.txt ../variants/sub_offspringGenotypes_filtered2x_fam200.txt
This generates lm_lyc.txt, which includes six families,
132 individuals (120 offspring, 12 parents) and 24,643 SNPs.
2. Filter the data (note that the 10+ missing will only affect some families)
java -cp ~/source/lepmap2/bin/ Filtering data=lm_lyc.txt epsilon=0.01 dataTolerance=0.01 missingLimit=10 keepAlleles=1 > data_lyclm.txt
***
Mendel error statistics:
0->0 : 0
0->1 : 0
0->2 : 0
Error rate = 0.0
1->0 : 0
1->1 : 0
1->2 : 0
Error rate = 0.0
2->0 : 0
2->1 : 0
2->2 : 0
Error rate = 0.0
0,2->0,2 : 881252
0,2->1 : 0
1,2->1,2 : 0
1,2->0 : 0
Error rate = 0.0
***
Filtering markers based on segregation distortion
Removed 24267 markers from family 123
( from which 24196 due to missing parent genotypes, run ParentCall to fix the parents )
Removed 13133 markers from family 143
( from which 12553 due to missing parent genotypes, run ParentCall to fix the parents )
Removed 11378 markers from family 200
( from which 10417 due to missing parent genotypes, run ParentCall to fix the parents )
Removed 23100 markers from family 272
( from which 22726 due to missing parent genotypes, run ParentCall to fix the parents )
Removed 24532 markers from family 297
( from which 24525 due to missing parent genotypes, run ParentCall to fix the parents )
Removed 24498 markers from family 309
( from which 24488 due to missing parent genotypes, run ParentCall to fix the parents )
Maternally informative markers = 12674 (of 24643)
Paternally informative markers = 13099
Maternally or paternally informative markers = 23088
***
Removed 71878 markers
3. Constructing linkage groups. Here I only used female informative markers. I first evaluated alternative LOD support values (see below) and then went with LOD 7.
java -cp ~/source/lepmap2/bin/ SeparateChromosomes data=data_lyclm.txt lodLimit=7 femalePrior=0.05 informativeMask=2 sizeLimit=100 > femLgs_lyclm.txt
Number of different markers = 4136
Using LOD score 7.0
Computing pair-wise LOD scores 1.2.3.4.5.6.7.8.9.10 all done
Number of LGs = 22, markers in LGs = 5889, singles = 18754
4. Add unassigned SNPs
java -cp ~/source/lepmap2/bin/ JoinSingles femLgs_lyclm.txt data=data_lyclm.txt lodLimit=6 lodDifference=2 > femLgsJs_lyclm.txt
joined 1288 single markers, non-conflicting LOD score > 6.977937445473155
Number of LGs = 22, markers in LGs = 10639, singles = 14004
5. Assign (or unassign) entire scaffolds (including male informative markers) to LGs.
Combine map and scaffold data
map<-read.table("femLgsJs_lyclm.txt",header=FALSE)
snps<-read.table("combinedSnpList.txt",header=FALSE)
o<-cbind(map,snps)
colnames(o)<-c("flg","scaf","pos")
write.table(o,"femaleLgs.txt",row.names=FALSE,col.names=TRUE,quote=FALSE)
Run perl script
## key params ##
## decide on LG for scaf
## rules:
## $nsnp SNPs supporting LG with $prct of SNPs in agreement
## $min SNPs total
## $alt is maximum % for an alternative LG
$prct = 0.2;
$nsnp = 3;
$min = $nsnp;
$second = 2;
#############
perl addMalesFilter.pl femaleLgs.txt
assigned LGs for 1490 scaffolds, failed to assign LGs for 7708 scaffolds
55.5% of SNPs on LGs, not including sex chromosomes
6. Order markers, uses male recombination only
tail -n +2 mod_femaleLgs.txt | cut -f 1 -d " " > female_map.txt
java -cp ~/source/lepmap2/bin/ OrderMarkers map=female_map.txt data=data_lyclm.txt minError=0.01 numThreads=12 informativeMask=1 initRecombination=0.05 0.000000001 learnRecombinationParameters=1 0
Next I obtained median positions (in cM) for each scaffold, only counting SNPs with error estimates < 0.1.
perl calcScafOrder.pl combinedSnpList.txt orderNonSex_lyclm.txt
These were sorted by LG (but not cM or scaffold #)
sort -g lyc_scafmap.txt > ordered_lyc_scafmap.txt
This includes 1221 (non-sex linked) scaffolds.
7. Adding the sex chromosome (Z)
7a. first I identified scaffolds where male/female depth > 1.5
x<-read.table("../melmap/sc_fmdepth.txt",header=F)
a<-which(x[,5]/x[,4] > 1.5)
plot(x[,4],x[,5])
points(x[a,4],x[a,5],col="orange",pch=19)
sex<-x[a,3]
write.table(sex,"putSexScaf.txt",row.names=F,col.names=F,quote=F)
7b. Made a new input file; I am using only families 123 and 143 for sex chromosomes
perl ../scripts/makeSexMultiFamLinkage.pl ../variants/sub_offspringGenotypes_filtered2x_fam1[24]3*txt
530 SNPs in combinedSnpListSex.txt
7c. Filter the data (note that the 4+ missing)
java -cp ~/source/lepmap2/bin/ Filtering data=lm_lycSex.txt epsilon=0.01 dataTolerance=0.01 missingLimit=4 keepAlleles=1 > data_lyclmSex.txt
Filtering markers based on segregation distortion
Removed 520 markers from family 123
( from which 520 due to missing parent genotypes, run ParentCall to fix the parents )
Removed 19 markers from family 143
( from which 6 due to missing parent genotypes, run ParentCall to fix the parents )
Maternally informative markers = 30 (of 530)
Paternally informative markers = 487
Maternally or paternally informative markers = 517
***
Removed 539 markers
7d. Create LGs, only paternally informative markers are meaningful (real)
java -cp ~/source/lepmap2/bin/ SeparateChromosomes data=data_lyclmSex.txt lodLimit=1.5 malePrior=0.05 informativeMask=1 sizeLimit=5 > sexLgs_lyclm.txt
***
Mendel error statistics:
0->0 : 0
0->1 : 0
0->2 : 0
Error rate = 0.0
1->0 : 0
1->1 : 0
1->2 : 0
Error rate = 0.0
2->0 : 0
2->1 : 0
2->2 : 0
Error rate = 0.0
0,2->0,2 : 6222
0,2->1 : 0
1,2->1,2 : 0
1,2->0 : 0
Error rate = 0.0
***
Filtering markers based on segregation distortion
Removed 0 markers from family 123
Removed 0 markers from family 143
Maternally informative markers = 30 (of 530)
Paternally informative markers = 487
Maternally or paternally informative markers = 517
***
LOD(12, 0) = 3.1856403883267372
LOD(11, 0) = 2.8945664394040045
LOD(11, 1) = 1.5509305371562556
LOD(10, 2) = 0.06817772743076445
LOD(12, 0) = 3.1856403883267372
LOD(11, 0) = 2.8945664394040045
LOD(11, 1) = 1.5509305371562556
LOD(10, 2) = 0.06817772743076445
Family 123
Family 143
Number of different markers = 64
Using LOD score 1.5
Computing pair-wise LOD scores 1.2.3.4.5.6.7.8.9.10 all done
number of LGs = 8 singles = 460
Removing LG 2 (as too small) with size 4
Removing LG 3 (as too small) with size 2
Removing LG 4 (as too small) with size 2
Removing LG 5 (as too small) with size 2
Removing LG 6 (as too small) with size 2
Removing LG 7 (as too small) with size 2
Removing LG 8 (as too small) with size 2
Number of LGs = 1, markers in LGs = 458, singles = 72
Join singles did nothing
7e. Assign or unassign scaffolds to LGs
map<-read.table("sexLgs_lyclm.txt",header=FALSE)
snps<-read.table("combinedSnpListSex.txt",header=FALSE)
o<-cbind(map,snps)
colnames(o)<-c("flg","scaf","pos")
write.table(o,"sexLgs.txt",row.names=FALSE,col.names=TRUE,quote=FALSE)
perl assignScafFilter.pl sexLgs.txt
Assigned LGs for 37 scaffolds, failed to assign LGs for 34 scaffolds
this includes 90.5% of snps (480 assigned, 50 no)
7f. Order markers, uses male recombination only
tail -n +2 mod_sexLgs.txt | cut -f 1 -d " " | perl -p -i -e 's/1/23/' > sex_map.txt
java -cp ~/source/lepmap2/bin/ OrderMarkers map=sex_map.txt data=data_lyclmSex.txt minError=0.01 numThreads=4 informativeMask=1 initRecombination=0.05 0.000000001 learnRecombinationParameters=1 0 > orderSex_lyclm.txt
Next I obtained median positions (in cM) for each scaffold, only counting SNPs with error estimates < 0.1.
perl calcSexOrder.pl combinedSnpListSex.txt orderSex_lyclm.txt
These were sorted by LG (but not cM or scaffold #)
sort -g lyc_sexmap.txt > ordered_lyc_sexmap.txt
This includes 37 (sex linked) scaffolds.
8. Combine the maps
cat ordered_lyc_scafmap.txt ordered_lyc_sexmap.txt > orderedLinkageMap.txt
orderedLinkageMap.txt is the final map with 1259 scaffolds on 22 autosomes and one sex chromsome.