HOWTO HMMER

Using HMMER to identify ncRNA homologs

Install HMMER

Go to http://hmmer.janelia.org/
Download the latest release of HMMER (the commands below assume the file is called "hmmer-3.0.tar.gz")
Open a terminal window
Unpack the source code:
- tar zxvf hmmer-3.0.tar.gz
Change into the HMMER directory:
- cd hmmer-3.0
Configure the installation:
- ./configure --bindir=$HOME/bin
Build the programs:
- make
Install the programs:
- make install
- The programs we need are now installed in your "~/bin/" directory.
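
Before moving on, it can help to confirm that the shell can actually find the new programs. The sketch below assumes that "~/bin" is on your PATH; missing_tools is a made-up helper name for this HOWTO, not something that ships with HMMER. It checks esl-sfetch as well, since that program is used further down.

```shell
# missing_tools: print the name of each given program that is NOT on PATH.
# (Hypothetical helper for this HOWTO; not part of HMMER.)
missing_tools() {
    for tool in "$@"; do
        # "command -v" succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Paste the function into your terminal, then run:
- missing_tools hmmbuild hmmsearch hmmalign esl-sfetch
- (No output means all four programs were found.)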

Get an alignment that we can use

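Download a seed alignment from Rfam, for example the PrfA thermoregulator UTR (RF00038), or choose your own by browsing Rfam. The commands below assume the alignment is in Stockholm format and saved as RF00038_seed.stk.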

Now we are ready to build a model!

Build a profile HMM

If you have RALEE and emacs installed, you can view your sequence alignment with pretty colour markup.
Otherwise, you might want to have a quick look at the alignment in your favourite text editor.
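
If you only want a quick sanity check rather than a full look, counting the sequences in the alignment is often enough. This is a minimal sketch; stk_seq_count is a made-up helper name, and it assumes a plain Stockholm file in which markup lines start with "#" and the alignment ends with "//".

```shell
# stk_seq_count: count distinct sequence names in a Stockholm alignment.
# Skips markup lines (starting with "#"), the "//" terminator and blank lines.
# Names are counted once even if the alignment is split into several blocks.
stk_seq_count() {
    awk '!/^#/ && !/^\/\// && NF >= 2 && !seen[$1]++ { n++ } END { print n + 0 }' "$1"
}
```

For example:
- stk_seq_count RF00038_seed.stk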
Build a profile HMM (hmmbuild produces a profile HMM, not a covariance model):
- hmmbuild -h
- hmmbuild RF00038_seed.hmm RF00038_seed.stk
- Take a look at the contents of the model file if you want to:
- less RF00038_seed.hmm
- (Hit "q" to quit from less)
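
The model file is long, and most of it is numbers. If you just want the basics, the header is the interesting part. The sketch below pulls out a few header fields; hmm_summary is a made-up helper name, and it assumes the HMMER3 text format, where the header uses tags such as NAME, LENG, ALPH and NSEQ and the main model section starts at a line beginning with "HMM".

```shell
# hmm_summary: print selected header fields from a HMMER3 text-format model file.
hmm_summary() {
    # Stop at the "HMM" line, which marks the start of the main model section.
    awk '$1 == "HMM" { exit }
         $1 == "NAME" || $1 == "LENG" || $1 == "ALPH" || $1 == "NSEQ" { print $1 "=" $2 }' "$1"
}
```

For example:
- hmm_summary RF00038_seed.hmm
- (NSEQ should match the number of sequences in your seed alignment.)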

Search a database of sequences

Download a potentially interesting FASTA sequence file from here. Save the file as database.fa, the name used in the commands below.
Search the database using our model file:
- hmmsearch -h
- hmmsearch --domtblout RF00038_seed.tabfile -o RF00038_seed.hmmsearch RF00038_seed.hmm database.fa

While you're waiting, visit http://rfam.sanger.ac.uk/. Search or browse for the PrfA family. Have a look around at what is available.

hmmsearch results

Take a look at the results of the hmmsearch:
- less RF00038_seed.hmmsearch
- The raw data is perhaps not so informative if there are multiple hits to several sequences. You can view a tabular format:
- less RF00038_seed.tabfile
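
To help pick a score threshold for the next step, it can be handy to list the hits with the best score first. This is a sketch only; domtbl_by_score is a made-up helper name, and the column positions (target name in column 1, per-domain score in column 14, alignment start and end in columns 18 and 19) are assumed from the layout of the HMMER3 --domtblout table, so check them against the header line of your own file.

```shell
# domtbl_by_score: list per-domain hits from a --domtblout file, best score first.
# Output columns: target name, domain score, alignment start, alignment end.
# Column positions 1, 14, 18 and 19 are an assumption about the table layout.
domtbl_by_score() {
    awk '!/^#/ && NF >= 22 { print $1, $14, $18, $19 }' "$1" |
        LC_ALL=C sort -k2,2nr
}
```

For example:
- domtbl_by_score RF00038_seed.tabfile | less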

Align the sequences with good scores

Collect the sequences with scores greater than your threshold:
- esl-sfetch --index database.fa
- esl-sfetch --tabfile --Tmin 121 -Cf database.fa RF00038_seed.tabfile > new.fa
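
If your copy of esl-sfetch does not understand the tabfile options above, one possible workaround is to build the coordinate list yourself. The sketch below is an assumption-laden example, not the official route: domtbl_to_coords and hits.coords are made-up names, the column positions are assumed from the HMMER3 --domtblout layout, and the output format (new name, start, end, source sequence name) is what the -C -f mode of esl-sfetch is assumed to read.

```shell
# domtbl_to_coords: write one "newname from to source" line for each domain hit
# whose score (column 14, assumed) is at least the given minimum.
# Usage: domtbl_to_coords MIN_SCORE DOMTBL_FILE
domtbl_to_coords() {
    awk -v min="$1" '!/^#/ && NF >= 22 && $14 + 0 >= min + 0 {
        # new name is built as source/start-end so each hit gets a unique name
        printf "%s/%s-%s %s %s %s\n", $1, $18, $19, $18, $19, $1
    }' "$2"
}
```

With the same threshold as above, this could be used as:
- domtbl_to_coords 121 RF00038_seed.tabfile > hits.coords
- esl-sfetch -Cf database.fa hits.coords > new.fa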
Align the sequences to the model:
- hmmalign -h
- hmmalign -o new.stk RF00038_seed.hmm new.fa
Take a look at the alignment:
- emacs new.stk
- (First choose "Unblock alignment" in the "edit" menu to remove the alignment blocks)

Et voila!

We have built a profile HMM, used it to identify putative PrfA elements (and therefore a putative virulence gene), and aligned the PrfA element sequences back to the model.
