Using HMMER to identify ncRNA homologs
- Go to http://hmmer.janelia.org/
- Download the latest release of HMMER (for example "hmmer-3.0.tar.gz")
- Open a terminal window
- Unpack the source code:
- tar zxvf hmmer-3.0.tar.gz
- Change into the HMMER directory:
- cd hmmer-3.0
- Configure the installation:
- ./configure --bindir=$HOME/bin
- Build the programs:
- make
- Install the programs:
- make install
- The programs we need are now installed in your "~/bin/" directory.
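Before going further, it can help to confirm that the shell actually finds the newly installed programs. The following is a minimal sketch, assuming that "~/bin" is not yet on your PATH; the helper name check_tools is made up for this example and is not part of HMMER.

```shell
# Put ~/bin at the front of the search path for this terminal session.
export PATH="$HOME/bin:$PATH"

# check_tools is a hypothetical helper: it reports, for each program name
# given, whether the shell can find it on the PATH.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
        fi
    done
}

# The three HMMER programs used in the rest of this tutorial.
check_tools hmmbuild hmmsearch hmmalign
```

If any program is reported as missing, check that "make install" finished without errors and that the --bindir given to configure matches the directory added to the PATH.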
Get an alignment that we can use
Now we are ready to build a model!
- If you have RALEE and emacs installed, you can view your sequence alignment with pretty colour markup.
- First rename your alignment to something like:
- mv RF00038_seed.txt RF00038_seed.stk
- Then open it in emacs:
- emacs RF00038_seed.stk
- Use the menus to colour the alignment by sequence conservation and then by structure.
- Can you improve the alignment?
- Quit from emacs.
- Otherwise, you might want to have a quick look at the alignment in your favourite text editor.
- Build a profile HMM:
- hmmbuild -h
- hmmbuild RF00038_seed.hmm RF00038_seed.stk
- Take a look at the contents of the profile HMM file if you want to:
- less RF00038_seed.hmm
- (Hit "q" to quit from less)
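The model file is long, so a quick summary of its header can be more convenient than paging through it. This is a small sketch, assuming the HMMER3 text format, in which header lines begin with tags such as NAME, LENG, ALPH and NSEQ; the helper name hmm_summary is made up for this example.

```shell
# hmm_summary is a hypothetical helper: it prints the header lines that
# give the model name, its length in match states, its alphabet and the
# number of sequences in the alignment it was built from.
hmm_summary() {
    grep -E '^(NAME|LENG|ALPH|NSEQ)[[:space:]]' "$1"
}

# Only run the summary if the model from the previous step exists.
if [ -f RF00038_seed.hmm ]; then
    hmm_summary RF00038_seed.hmm
fi
```

For an RNA alignment, the ALPH line is a useful sanity check that hmmbuild guessed the alphabet you expected.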
Search a database of sequences
- Download a potentially interesting fasta sequence file from here. Save the file with the name database.fa or similar.
- Search the database using our model file:
- hmmsearch -h
- hmmsearch --domtblout RF00038_seed.tabfile -o RF00038_seed.hmmsearch RF00038_seed.hmm database.fa
While you're waiting, visit http://rfam.sanger.ac.uk/. Search or browse for the PrfA family. Have a look around at what is available.
- Take a look at the results of the hmmsearch:
- less RF00038_seed.hmmsearch
- The raw output can be hard to read when there are many hits across several sequences. The tabular output is easier to scan:
- less RF00038_seed.tabfile
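The tabular file can also be filtered by score on the command line. The sketch below is one possible approach, not the only one. It assumes the HMMER3 --domtblout column layout (target name in column 1, per-domain bit score in column 14, alignment start and end in columns 18 and 19), that the threshold is applied to the per-domain score, and that the output should be four-column lines of new name, start, end and source sequence, the layout the Easel documentation describes for "esl-sfetch -Cf". The helper name hits_to_gdf and the file name hits.gdf are made up for this example.

```shell
# hits_to_gdf is a hypothetical helper: for every domain hit scoring at
# least the given threshold, it prints one line of
#   <new name> <start> <end> <source sequence name>
hits_to_gdf() {
    threshold=$1
    tabfile=$2
    awk -v min="$threshold" '
        /^#/ { next }                      # skip comment and header lines
        NF >= 22 && $14 + 0 >= min + 0 {   # column 14: per-domain bit score
            printf "%s/%s-%s %s %s %s\n", $1, $18, $19, $18, $19, $1
        }
    ' "$tabfile"
}

# Only run the filter if the hmmsearch table from the previous step exists.
if [ -f RF00038_seed.tabfile ]; then
    hits_to_gdf 121 RF00038_seed.tabfile > hits.gdf
    # The coordinate file could then be passed to esl-sfetch, for example:
    #   esl-sfetch -Cf database.fa hits.gdf > new.fa
fi
```

The threshold of 121 is only the example value used in this tutorial; choose your own by looking at where the scores in the table drop off.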
Align the sequences with good scores
- Collect the sequences with scores greater than your threshold:
- esl-sfetch --index database.fa
- esl-sfetch --tabfile --Tmin 121 -Cf database.fa RF00038_seed.tabfile > new.fa
- Align the sequences to the model:
- hmmalign -h
- hmmalign -o new.stk RF00038_seed.hmm new.fa
- Take a look at the alignment:
- emacs new.stk
- (First choose "Unblock alignment" in the "edit" menu to remove the alignment blocks)
We have built a profile HMM, used it to identify putative PrfA elements and therefore a putative virulence gene, and aligned the PrfA element sequences back to the profile HMM.