Post date: Jun 13, 2017 9:13:10 PM
12-13.vi.17. ZG made plotPips.pl to make a plot of all SNP pips per trait per group. I went through all plots on my computer and wrote down which SNPs stood out as having high pips across at least two traits within each group. We found the following 9 scaffolds of interest (sequences are in wing_scaffolds.fasta):
203
1566
48
157
92
1636
443
5076
165
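The screening above was done by eye from the plots. A rough sketch of the same filter in awk, in case we want to repeat it from the numbers: it assumes a hypothetical whitespace-separated table with one row per SNP per trait (columns: group, trait, scaffold, position, pip), and the function name and cutoff are made up for illustration, not what plotPips.pl actually reads or writes.

```shell
# Sketch only: the input layout and the cutoff are assumptions.
# Input columns (no header): group trait scaffold position pip
high_pip_snps() {
    # $1 = input table, $2 = pip cutoff (hypothetical, e.g. 0.1)
    awk -v cut="$2" '
        $5 >= cut {
            key = $1 "\t" $3 "\t" $4          # group, scaffold, position
            if (!seen[key, $2]++) n[key]++    # count each trait once per SNP
        }
        END {
            # keep SNPs that pass the cutoff in at least two traits
            for (k in n) if (n[k] >= 2) print k "\t" n[k]
        }
    ' "$1" | LC_ALL=C sort
}
```

Usage would be something like `high_pip_snps pips_long.txt 0.1`, where pips_long.txt is a placeholder name. Output is group, scaffold, position and the number of traits passing the cutoff.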
Uploaded wing_scaffolds.fasta to http://www.butterflygenome.org/?q=node/5.
Blasting against:
B. anynana (squinting bush brown)
D. plexippus (monarch)
H. erato demophoon (red postman)
H. erato lativitta (red postman)
H. melpomene (postman)
J. coenia (common buckeye)
V. cardui (painted lady)
**What H. melpomene scaffold has cortex? H. melpomene fzy was identified as part of the annotated gene HMEL017486 on scaffold HE671623 (Hmel v.1.1) based on alignment of D. plexippus fzy;
1. blasted 203 against H. melpomene only. Best match:
469 Hmel215031 length=129763 [H._melpomene_v2.0] 1280.62 1.07 × 10^-78 129763
2. blasted 48 against H. melpomene only. Best match:
601 Hmel221009 length=690739 [H._melpomene_v2.0] 24008.97 0.00 690739
Hmm...
There's also FlyBase (has lots of insects, but monarch and silkworm are the only leps).
And lepbase.org. Which is maybe the best. It's fast. With lepbase:
scaf48 maps to (evalue = 0):
1. cce3034.2.mRNA (Calycopis cecrops; red-banded hairstreak)
2. Nuclear hormone receptor FTZ-F1 OS=Bombyx mori GN=FTZ-F1 PE=1 SV=2 [Source:UniProtKB/TrEMBL;Acc:P49867]
--------
14vi17. I ended up blasting all nine scaffolds with lepbase (butterfly and moth CDS databases only). See "anotated-genes-CDSdatabases.xlsx" for the results. Here's a summary:
scaffolds 203, 443 and 5076 either didn't hit anything or hit other lep scaffolds that lacked info.
scaffold 1566 (chromosome 20) is responsible for size of some central distal spots in AN. It sorta maps to Choline O-acetyltransferase (Drosophila melanogaster; synthesizes the neurotransmitter ACh from choline and acetyl-CoA).
>SNP at position 17518 is 12518 bp away from Choline O-acetyltransferase.
This is the coolest one. scaffold 48 (chromosome Z) is responsible for aurorae size (all) in ME. It maps to Nuclear hormone receptor FTZ-F1 (Bombyx mori), which plays a role in development (pupa to adult).
scaffold 48 also maps to USP6 N-terminal-like protein (Mus musculus; receptor trafficking, Golgi apparatus).
>SNP at position 166171 is 16171 bp away from FTZ-F1.
>SNP at position 166171 is 108829 bp away from USP6 N-terminal-like protein.
scaffold 157 (chromosome 2) is responsible for size of some central distal spots in ME. It maps to probable muscarinic acetylcholine receptor gar-2 (C. elegans; acetylcholine receptor).
>SNP at position 20350 is 156348 bp away from probable muscarinic acetylcholine receptor gar-2
scaffold 92 (chromosome Z) is responsible for size of some central distal spots in ME. It sorta maps to: 1) Sterol regulatory element-binding protein cleavage-activating protein (human; escort protein required for cholesterol homeostasis), 2) Triple function domain protein (human; actin remodeling required for cell movement), 3) kalirin (human; exchange of GDP by GTP, affects actin cytoskeleton).
>SNP at position 104206 is about 155794 bp away from Sterol regulatory element-binding protein cleavage-activating protein
>SNP at position 104206 is about 25794 bp away from Triple function domain protein
**>SNP at position 104206 is within kalirin
scaffold 1636 (chromosome ?) is responsible for some central spot sizes in MW. It maps to Mediator of RNA polymerase II transcription subunit 24 (Anopheles gambiae; involved in regulated transcription of nearly all RNA polymerase II-dependent genes).
>SNP at position 44024 is 6524 bp away from Mediator of RNA polymerase II transcription subunit 24
scaffold 165 (chromosome Z) is responsible for a couple of x-coordinate positions in WA. It maps to Otoferlin (Human and Mus; calcium ion sensor, neurotransmitter release, cochlear inner hair cells specifically).
**>SNP at position 182769 is within Otoferlin
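For redoing the "bp away" numbers above, here is a small sketch of the arithmetic. It assumes distance means the gap between the SNP and the nearest end of the hit, with 0 meaning the SNP falls inside the hit. The function name is made up, and the hit coordinates in the example are placeholders, not values from the spreadsheet.

```shell
# Sketch only: distance from a SNP to the nearest end of a hit on the same scaffold.
snp_gene_distance() {
    # $1 = SNP position, $2 = hit start, $3 = hit end (scaffold coordinates, start <= end)
    local pos=$1 start=$2 end=$3
    if [ "$pos" -lt "$start" ]; then
        echo $((start - pos))   # SNP lies upstream of the hit
    elif [ "$pos" -gt "$end" ]; then
        echo $((pos - end))     # SNP lies downstream of the hit
    else
        echo 0                  # SNP is within the hit
    fi
}

# Placeholder coordinates: a hit spanning 1000..5000 and a SNP at 17518
snp_gene_distance 17518 1000 5000
```

With those placeholder coordinates the SNP sits 17518 - 5000 = 12518 bp past the end of the hit. A SNP inside the interval returns 0, which corresponds to the "within" cases flagged with ** above.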
---
Next steps:
0. How far are my SNP positions from the part that blasted? (LL: for any I'm going to talk about)
1. Check the Lycaeides annotated genome. (cortex, optix not in our genome, we found one "cell division" gene but perhaps not the one that comes up in lepbase when you search for cortex)
2. See where these scaffolds show up at high pips across all traits and groups. (ZG going to match up SNPs across group pip files so can compare across, reformat spreadsheet)
3. Check wing pattern papers for these specific genes?
4. Get sequence of cortex and optix and see if it aligns with our genome (LL: get FASTA to ZG)
5. We are going to make a plot of number of trait-associated SNPs by linkage group size to see the patterns across groups, i.e. whether traits are mapping disproportionately to different parts of the genome across groups (ZG will make these plots)
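For step 5, a possible way to tabulate the counts before plotting. This is only a sketch: it assumes two hypothetical files, one with linkage group sizes (columns: linkage group, size in bp) and one with a row per trait-associated SNP (columns: group, linkage group). Neither file exists yet in that form, and the function name is invented.

```shell
# Sketch only: both input layouts are assumptions.
snps_per_lg() {
    # $1 = LG sizes (lg size_bp), $2 = trait-associated SNPs (group lg)
    awk '
        NR == FNR { size[$1] = $2; next }   # first file: remember LG sizes
        { n[$1 "\t" $2]++ }                 # second file: count SNPs per group and LG
        END {
            for (k in n) {
                split(k, f, "\t")
                # skip LGs with no known size to avoid dividing by zero
                if (f[2] in size && size[f[2]] > 0)
                    printf "%s\t%d\t%.2f\n", k, n[k], n[k] / (size[f[2]] / 1e6)
            }
        }
    ' "$1" "$2" | LC_ALL=C sort
}
```

Output columns are group, linkage group, SNP count and SNPs per Mb. Plotting count against size (or the per-Mb rate by linkage group) for each group should show whether some linkage groups carry more than their share.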
----
ZG: There are various options you can play with in blast. I kept mostly defaults for now, but we could dig in more later (probably not as important here as you are blasting butterfly vs. butterfly).
Here is what I did:
## load blast module
module load blast
## make database from L. melissa genome (in ~/data/lycaeides/melissa_genome/)
makeblastdb -dbtype nucl -in Lmelissa1FinalAssembly.fasta
## run blastn search (based on your file, re-saved as fasta)
blastn -query wingpattern_genes.fasta -db Lmelissa1FinalAssembly.fasta -task blastn > wing_matches.txt
It looks like optix is on scaffold 625, chromosome 7.
Cortex has more but shorter hits. Could be similar domains but not the actual gene. The best match is scaffold 275 (chromosome 6).
See last page (pg15) of genome-stabilization.pdf for which scaffolds are on which linkage groups.
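If we rerun this, tabular output might make it easier to pull out the best match per gene than reading the default report. A sketch, not what was run above: `-outfmt 6` is the standard BLAST+ tabular format (12 columns, with query in column 1, subject in column 2, e-value in column 11 and bit score in column 12); the output name wing_matches.tsv and both function names are made up.

```shell
# Sketch only: same search as above, but with tabular output (hypothetical output name).
run_tabular_blast() {
    blastn -query wingpattern_genes.fasta -db Lmelissa1FinalAssembly.fasta \
        -task blastn -outfmt 6 > wing_matches.tsv
}

# Keep the highest bit score hit for each query sequence.
best_hits() {
    # $1 = tabular BLAST file; prints query, subject, e-value, bit score
    LC_ALL=C sort -k1,1 -k12,12gr "$1" |
        awk '!seen[$1]++ { print $1 "\t" $2 "\t" $11 "\t" $12 }'
}
```

After `run_tabular_blast`, calling `best_hits wing_matches.tsv` would give one line per query. For something like cortex, with many short hits, it is probably still worth looking at the full table rather than only the top line.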