Post date: Jun 10, 2014 2:55:34 PM
Here are the genome-level annotation summaries:
N (missing data) = 56.06%
Non-N = 43.94%
genic = 16.12% (13,158 genes)
coding = 3.53% (62,064 CDS)
non-coding = 12.58%
2,094 UTRs
62,157 exons
intergenic = 27.82%
Note that these percentages do not sum to 100% because the categories are nested: genic = coding + non-coding, and non-N = genic + intergenic.
Median scaffold size: 9,651 bp (2,474 excluding Ns).
718 scaffolds > 100 kb, 2 > 1 Mb.
Total genome length = 360,254,725 bp (158,297,813 without Ns).
This information comes from /labs/evolution/data/lycaeides/melissa_genome/Annotation/genomeAnnotation.txt.
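As a quick sanity check on how the categories in the summary nest, the reported figures can be compared against each other. This is only a sketch: the dictionary keys, the `close` helper, and the 0.02 percentage-point tolerance (to absorb rounding in the reported values) are my own choices, not part of the annotation file.

```python
# Percentages and lengths exactly as reported in the summary.
pct = {
    "N": 56.06,
    "non_N": 43.94,
    "genic": 16.12,
    "coding": 3.53,
    "non_coding": 12.58,
    "intergenic": 27.82,
}
total_bp = 360_254_725
non_n_bp = 158_297_813


def close(a, b, tol=0.02):
    """True if a and b agree within tol percentage points (assumed tolerance)."""
    return abs(a - b) <= tol


checks = {
    "N + non-N = 100%": close(pct["N"] + pct["non_N"], 100.0),
    "genic + intergenic = non-N": close(pct["genic"] + pct["intergenic"], pct["non_N"]),
    "coding + non-coding = genic": close(pct["coding"] + pct["non_coding"], pct["genic"]),
    "non-N bp / total bp = non-N %": close(100 * non_n_bp / total_bp, pct["non_N"]),
}

for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```

The coding + non-coding total is 16.11% against a reported genic value of 16.12%, which is why the comparison uses a small tolerance rather than exact equality.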
Enrichment tests:
We have four key structural annotations: genic, coding sequence, UTR, and repeat region. I asked whether the SNPs with the largest absolute model-averaged effects were over-represented in any of these categories. I considered two top quantiles, the 99.9th (about 80 loci) and the 99.95th (about 40 loci), as these give roughly the number of SNPs that could have non-zero effects on the traits (based, at least roughly, on the estimates of the number of SNPs). I used binomial probability distributions, with p = the proportion of SNPs in a category, to test for enrichment.
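A minimal sketch of this kind of binomial enrichment test is given here. It is not the script used for the results reported in this post: the function names are mine, the inputs in the usage lines are hypothetical, and I am assuming the inclusive upper tail P(X >= obs). The post does not say whether the tail includes the observed count, so values from this sketch should not be expected to reproduce the reported p-values.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_count(n_top, p_cat):
    """Expected number of top SNPs in a category under the null."""
    return n_top * p_cat


def enrichment_p(obs, n_top, p_cat):
    """Upper-tail probability P(X >= obs) for X ~ Binomial(n_top, p_cat).

    n_top: number of SNPs in the top quantile (e.g. about 80 or about 40)
    p_cat: proportion of all SNPs that fall in the annotation category
    obs:   number of top SNPs observed in that category
    Assumption: the tail is inclusive of obs.
    """
    return sum(binom_pmf(k, n_top, p_cat) for k in range(obs, n_top + 1))


# Hypothetical inputs, for illustration only.
n_top = 80      # roughly the size of the 99.9th-quantile set
p_cat = 0.02    # assumed proportion of SNPs in the category
obs = 4         # assumed observed count among the top SNPs

print("expected:", expected_count(n_top, p_cat))
print("p-value:", enrichment_p(obs, n_top, p_cat))
```

The exact sum is used instead of a normal approximation because the expected counts here are small (around 1 to 2), where the approximation is poor.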
Here are the instances with significant enrichment:
Survival, plant x population treatments: None
Wgt, plant x population treatments:
q 99.9, repeat, GLA x Ac, Obs = 4, Expected = 1.67, p = 0.0265
q 99.95, repeat, GLA x Ac, Obs = 4, Expected = 0.85, p = 0.0015
q 99.95, repeat, SLA x Ms, Obs = 2, Expected = 0.80, p = 0.0452
Survival, combined:
q 99.9, repeat, Ac, Obs = 5, Expected = 1.72, p = 0.008
q 99.9, repeat, all, Obs = 4, Expected = 1.71, p = 0.029
q 99.9, repeat, SLA, Obs = 5, Expected = 1.65, p = 0.006
q 99.95, repeat, Ac, Obs = 4, Expected = 0.86, p = 0.002
Wgt, combined:
q 99.9, repeat, Ms, Obs = 4, Expected = 1.69, p = 0.027
q 99.95, repeat, Ms, Obs = 3, Expected = 0.844, p = 0.0099
q 99.95, repeat, SLA, Obs = 2, Expected = 0.829, p = 0.048
q 99.95, repeat, all, Obs = 2, Expected = 0.82, p = 0.048
Thus, all of the significant enrichments are for repeat elements!