Post date: Apr 10, 2014 4:12:32 PM
SNAP fits an HMM to identify different gene annotation states, e.g. exons, splice sites, etc. This is one of the programs recommended for generating the initial gene predictions that serve as a starting point for maker. This is the program that Victor used for Timema. It requires two input files: genome.ann and genome.dna.
genome.ann is in a non-standard format (zff) that is similar to gff. To generate this file I converted the gff file with putative exons from cegma (output.cegma.local.gff) to genome.ann with a perl script I wrote, gff2zff.pl. The second file, genome.dna, is a fasta file with the gene sequences and flanking regions. It also came from cegma and was simply renamed.
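For readers without gff2zff.pl, the conversion can be sketched roughly as follows. This is a Python illustration, not the perl script itself, and it rests on several assumptions: that the cegma gff uses the geneid-style feature names First, Internal, Terminal and Single; that column 9 holds a gene/group identifier; that the short zff form (label, begin, end, group) is wanted, with begin and end swapped for minus-strand features; and that the gff sequence names match the fasta headers in genome.dna. The function name gff_to_zff is made up for the example.

```python
from collections import OrderedDict

# Assumed mapping from geneid-style gff feature names (as cegma is
# believed to emit) to the exon labels used in zff.
FEATURE_TO_ZFF = {
    "First": "Einit",
    "Internal": "Exon",
    "Terminal": "Eterm",
    "Single": "Esngl",
}


def gff_to_zff(gff_lines):
    """Convert gff exon lines to zff text, grouped by sequence name."""
    by_seq = OrderedDict()  # sequence name -> list of zff records
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete gff record
        seqid, feature, strand, group = fields[0], fields[2], fields[6], fields[8]
        label = FEATURE_TO_ZFF.get(feature)
        if label is None:
            continue  # feature type we do not know how to map
        begin, end = int(fields[3]), int(fields[4])
        if strand == "-":
            # zff encodes strand by coordinate order: begin > end on minus
            begin, end = end, begin
        record = "\t".join([label, str(begin), str(end), group])
        by_seq.setdefault(seqid, []).append(record)

    out = []
    for seqid, records in by_seq.items():
        out.append(">" + seqid)  # one fasta-style header per sequence
        out.extend(records)
    return "".join(row + "\n" for row in out)
```

Calling gff_to_zff on the lines of output.cegma.local.gff and writing the returned string to genome.ann would give a file in the shape the commands here expect, provided the assumptions above hold for your cegma output.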
Here are the commands I used to fit the HMM (all called from data/lycaeides/melissa_genome/Annotation/snap/):
../maker/exe/snap/fathom -categorize 1000 genome.ann genome.dna
../maker/exe/snap/fathom -export 1000 -plus uni.ann uni.dna
../maker/exe/snap/forge export.ann export.dna
perl ../maker/exe/snap/hmm-assembler.pl melissa . > melissa.hmm
The main output file is melissa.hmm.