Computational Biology Tools and Databases

DATABASES

Microbial genome databases  http://www.ncbi.nlm.nih.gov:80/PMGifs/Genomes/micr.html
Protein Information Resource  http://www-nbrf.georgetown.edu/pir/genome.html
Comparative genome analysis in P. Bork laboratory http://www.bork.embl-heidelberg.de/Genome/
TIGR: The Comprehensive Microbial Resource Home Page—the omniome  http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl
Genome databases other than NCBI http://www.unl.edu/stc-95/ResTools/biotools/biotools10.html
Genome list at NIH  http://molbio.info.nih.gov/molbio/db.html
Mitochondrial DNA Database MitBASE  http://www3.ebi.ac.uk/Research/Mitbase/mitbase.pl
E. coli genome project  http://www.genome.wisc.edu/
E. coli genome and proteome database GenProtEC  http://genprotec.mbl.edu/
E. coli index  http://web.bham.ac.uk/bcm4ght6/res.html
Organelle genome sequences  http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/organelles.html
Parasite genome databases and genome research resources  http://www.ebi.ac.uk/parasites/parasite-genome.html
Retroviral genotyping and analysis site  http://www.ncbi.nlm.nih.gov/retroviruses/
GenBank at the National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, accessible from:  http://www.ncbi.nlm.nih.gov/Entrez
European Molecular Biology Laboratory (EMBL) Outstation at Hinxton, England  http://www.ebi.ac.uk/embl/index.html
DNA DataBank of Japan (DDBJ) at Mishima, Japan  http://www.ddbj.nig.ac.jp/
Protein Information Resource (PIR) database at the National Biomedical Research Foundation in Washington, DC  http://www-nbrf.georgetown.edu/pirwww/
The SwissProt protein sequence database at ISREC, Swiss Institute for Experimental Cancer Research in Epalinges/Lausanne  http://www.expasy.ch/cgi-bin/sprot-search-de
The Sequence Retrieval System (SRS) at the European Bioinformatics Institute allows both simple and complex concurrent searches of one or more sequence databases. The SRS system may also be used on a local machine to assist in the preparation of local sequence databases. http://srs6.ebi.ac.uk
Protein Data Bank (PDB) at Rutgers, the State University of New Jersey: atomic coordinates of structures as PDB files, models, viewers, and links to many other Web sites for structural analysis and classification  http://www.rcsb.org/pdb
COG (Clusters of Orthologous Groups)  http://www.ncbi.nlm.nih.gov/COG/
DOGS: Database of genome sizes  http://www.cbs.dtu.dk/databases/DOGS/index.html
allgenes.org: A comprehensive gene index (catalog) derived from ESTs and predicted genes  http://www.allgenes.org/
GeneCensus Genome Comparisons by encoded protein structures  http://bioinfo.mbb.yale.edu/genome/
GeneQuiz: An integrated system for large-scale biological sequence analysis and data management (Andrade et al. 1999; Hoersch et al. 2000)  http://jura.ebi.ac.uk:8765/ext-genequiz/
Genes and disease: Map location on human chromosomes  http://www.ncbi.nlm.nih.gov/disease/
Genome channel at Oak Ridge National Laboratories  http://compbio.ornl.gov/channel/
GOLD™: Genomes OnLine Database (Kyrpides 1999)  http://wit.integratedgenomics.com/GOLD/
IMGT ImMunoGeneTics Database specializing in Immunoglobulins, T-cell receptors, and Major Histocompatibility Complex (MHC) of all vertebrate species  http://www.ebi.ac.uk/imgt/index.html
KEGG: Kyoto Encyclopedia of Genes and Genomes (Kanehisa and Goto 2000)  http://www.genome.ad.jp/kegg/
MIA Molecular Information Agent: A Web server that searches biological databases for information on a macromolecule  http://mia.sdsc.edu/
Orthologous gene alignments at TIGR  http://www.tigr.org/tdb/toga/toga.shtml
PEDANT: A protein extraction, description, and analysis tool  http://pedant.mips.biochem.mpg.de/
STRING Search Tool for Recurring Instances of Neighboring Genes http://www.Bork.EMBL-Heidelberg.DE/STRING/
Taxonomy browser at the NCBI arranges genomes taxonomically for sequence retrieval  http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/
UniGene System gene-oriented clusters of GenBank sequences useful for gene identification  http://www.ncbi.nlm.nih.gov/UniGene/
2D gel analysis of proteins: List of organisms  http://www.expasy.ch/ch2d/2d-index.html
AlignACE: Gibbs-sampling program for promoter analysis of coordinately regulated genes, e.g., genes identified in microarray experiments (Roth et al. 1998; Hughes et al. 2000; McGuire et al. 2000)  http://atlas.med.harvard.edu/download/
ArrayExpress database at European Bioinformatics Institute for microarray analysis  http://www.ebi.ac.uk/arrayexpress/
BRITE: Database of protein-protein interactions and cross-reference links http://www.genome.ad.jp/brite/brite.html
EcoCyc electronic encyclopedia of genes and metabolism of E. coli (Karp et al. 2000)  http://ecocyc.PangeaSystems.com/ecocyc/
Expression Profiler tools for analysis and clustering of gene expression and sequence data  http://ep.ebi.ac.uk/
Functional genomics sites  http://www.ornl.gov/hgmis/publicat/hgn/hgnarch.html#fg
GENECLUSTER (Tamayo et al. 1999)  http://www.genome.wi.mit.edu/MPR/software.html
GeneX: A Collaborative Internet Database and Toolset for Gene Expression Data  http://www.ncgr.org/genex/
MetaCyc metabolic encyclopedia (see EcoCyc)  http://ecocyc.PangeaSystems.com/ecocyc/
Microarray guide: P. Brown lab  http://cmgm.stanford.edu/pbrown/
Microarray project at NIH  http://www.nhgri.nih.gov/DIR/LCG/15K/HTML/
Microarray software  http://rana.lbl.gov/
microarrays.org http://www.microarrays.org/
SMART: For the study of genetically mobile protein domains (Schultz et al. 2000)  http://smart.embl-heidelberg.de/
SWISS-2DPAGE: Two-dimensional polyacrylamide gel electrophoresis database (Hoogland et al. 2000)  http://www.expasy.ch/ch2d/
TIGR: Annotation and gene indexing resources, including analysis of the transcribed sequences represented in the public EST databases.  http://www.tigr.org/tdb/tgi.shtml
WIT (What is there?): Interactive metabolic reconstruction on the Web (Overbeek et al. 2000)  http://wit.mcs.anl.gov/WIT2/
GFF (Gene-Finding Features): Specification for describing genes and other features of genomic sequences  http://www.sanger.ac.uk/Software/GFF/
GO (gene ontology) controlled vocabulary  http://genome-www.stanford.edu/GO/
MAGPIE: Multipurpose Automated Genome Project Investigation Environment  http://www.rockefeller.edu/labheads/gaasterland/gaasterland.html, http://genomes.rockefeller.edu/research.shtml#magpie, http://magpie.genome.wisc.edu/tools.html
TAMBIS: A conceptual model of molecular biology and bioinformatics and methods for querying the model (Baker et al. 1999)  http://img.cs.man.ac.uk/tambis/
RDP: The Ribosomal Database Project (RDP) provides ribosome-related data services to the scientific community, including online data analysis, rRNA-derived phylogenetic trees, and aligned and annotated rRNA sequences  http://rdp.cme.msu.edu/html/
GO: A dynamic controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing  http://www.geneontology.org/index.shtml

 

Miscellaneous Tools for Bioinformatics Analysis on the WWW

Pairwise Sequence Alignment  
Global alignment programs (GAP, NAP)  http://genome.cs.mtu.edu/align/align.html  Huang (1994)
BLAST 2 sequence alignment (BLASTN, BLASTP) http://www.ncbi.nlm.nih.gov/gorf/bl2.html  Altschul et al. (1990)
Bayes block aligner  http://www.wadsworth.org/res&res/bioinfo 
BCM Search Launcher: Pairwise sequence alignment  http://searchlauncher.bcm.tmc.edu/seq-search/alignment.html
SIM—Local similarity program for finding alternative alignments  http://www.expasy.ch/tools/sim.html 
FASTA program suite http://fasta.bioch.virginia.edu/fasta/fasta_list.html  Pearson and Miller (1992); Pearson (1996)
Likelihood-weighted sequence alignment (lwa)  http://stateslab.bioinformatics.med.umich.edu/service/lwa.html

Multiple Sequence Alignment  
CLUSTALW or CLUSTALX (the latter has a graphical interface)  FTP to ftp.ebi.ac.uk/pub/software  ftp://ftp.ebi.ac.uk/pub/software  Thompson et al. (1994a, 1997); Higgins et al. (1996)
MSA  http://www.psc.edu/, http://www.ibc.wustl.edu/ibc/msa.html, FTP to fastlink.nih.gov/pub/msa  ftp://fastlink.nih.gov/pub/msa  Lipman et al. (1989); Gupta et al. (1995)
PRALINE  http://mathbio.nimr.mrc.ac.uk/~jhering/praline/  Heringa (1999)

DIALIGN segment alignment  http://www.gsf.de/biodv/dialign.html  Morgenstern et al. (1996)
MultAlin  http://protein.toulouse.inra.fr/multalin.html  Corpet (1988)
Parallel PRRN progressive global alignment  http://prrn.ims.u-tokyo.ac.jp/  Gotoh (1996)
SAGA genetic algorithm  http://igs-server.cnrs-mrs.fr/~cnotred/Projects_home_page/saga_home_page.html  Notredame and Higgins (1996)

Protein Profile Generation Tools Based on MSA
Aligned Segment Statistical Evaluation Tool (Asset) FTP to ncbi.nlm.nih.gov/pub/neuwald/asset  ftp://ncbi.nlm.nih.gov/pub/neuwald/asset  Neuwald and Green (1994)
BLOCKS Web site  http://blocks.fhcrc.org/blocks/   Henikoff and Henikoff (1991, 1992)
eMOTIF Web server  http://dna.Stanford.EDU/emotif/  Nevill-Manning et al. (1998)
GIBBS, the Gibbs sampler statistical method FTP to ncbi.nlm.nih.gov/pub/neuwald/gibbs9_95/  ftp://ncbi.nlm.nih.gov/pub/neuwald/gibbs9_95/  Lawrence et al. (1993); Liu et al. (1995); Neuwald et al. (1995)
HMMER hidden Markov model software  http://hmmer.wustl.edu/  Eddy (1998)
MACAW, a workbench for multiple alignment construction and analysis FTP to ncbi.nlm.nih.gov/pub/macaw/  ftp://ncbi.nlm.nih.gov/pub/macaw/  Schuler et al. (1991)
MEME Web site, expectation maximization method  http://meme.sdsc.edu/meme/website/  Bailey and Elkan (1995); Grundy et al. (1996, 1997); Bailey and Gribskov (1998)
Profile analysis at UCSD  http://www.sdsc.edu/projects/profile/  Gribskov and Veretnik (1996)
SAM hidden Markov model Web site  http://www.cse.ucsc.edu/research/compbio/sam.html  Krogh et al. (1994); Hughey and Krogh (1996)

RNA Tools  
MFOLD minimum energy RNA configuration  http://bioinfo.math.rpi.edu/~zukerm/rna/  Zuker et al. (1991)
RNA editing Web site, UCLA  http://www.lifesci.ucla.edu/RNA/index.html  Simpson et al. (1998)
RNA editing, uridine insertion/deletion  http://www.lifesci.ucla.edu/RNA/trypanosome/  Simpson et al. (1998)
RNA modification database  http://medlib.med.utah.edu/RNAmods/  Limbach et al. (1994); Rozenski et al. (1999)
RNA secondary structures, Group I introns, 16S rRNA, 23S rRNA  http://www.rna.icmb.utexas.edu  Gutell (1994); Schnare et al. (1996 and references therein)
tRNAscan-SE search server  http://www.genetics.wustl.edu/eddy/tRNAscan-SE/  Lowe and Eddy (1997)
Vienna RNA package for RNA secondary structure prediction and comparison  http://www.tbi.univie.ac.at/~ivo/RNA/  Hofacker et al. (1998); Wuchty et al. (1999)

DATABASE SEARCHES
Sequence similarity search with a query sequence: searches a protein sequence database (or genomic sequences) for database sequences that can be aligned with the query sequence. Query type: single sequence, e.g., DAHQSNGA.

BLAST SUITE http://www.ncbi.nlm.nih.gov/BLAST/ 
FASTA SUITE http://fasta.bioch.virginia.edu/fasta/ 
WU-BLAST http://blast.wustl.edu/ 

PROFILESEARCH  ftp://ftp.sdsc.edu/pub/sdsc/biology  Alignment search with a profile (a scoring matrix with gap penalties) against a protein sequence database: prepare a profile from a multiple sequence alignment (Profilemake), then align the profile with each database sequence. Query type: profile representing a gapped multiple sequence alignment, e.g., D-HQSNGA, ESHQ-YTM, EAHQSN-L, EGVQSYSL.

MAST  http://meme.sdsc.edu/meme/website/mast.html  Search with a position-specific scoring matrix (PSSM) representing an ungapped sequence alignment (BLOCK) against a protein sequence database: prepare the PSSM from an ungapped region of a multiple sequence alignment, or from patterns of the same length found in unaligned sequences, then use it for the database search. Query type: PSSM representing an ungapped alignment, e.g., DAHQSN, ESHQSY, EAHQSN, EGVQSY.
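As a rough illustration of the PSSM idea behind MAST, the sketch below builds a log-odds matrix from the four-sequence ungapped block in the MAST entry (DAHQSN, ESHQSY, EAHQSN, EGVQSY) and slides it along a target sequence. It assumes a uniform background frequency of 1/20 per amino acid and a pseudocount of 1, both illustrative choices; MAST uses its own background model and match statistics, so this is not its implementation.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(block, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per column of an ungapped block."""
    width = len(block[0])
    if any(len(seq) != width for seq in block):
        raise ValueError("block must be ungapped and of equal length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for col in range(width):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in block:
            if seq[col] in counts:  # non-standard residues are ignored
                counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm

def score_window(pssm, window):
    """Sum column scores; residues absent from the matrix contribute 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, window))

def best_hit(pssm, sequence):
    """Slide the PSSM along a sequence; return (best score, start position)."""
    width = len(pssm)
    return max((score_window(pssm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))

block = ["DAHQSN", "ESHQSY", "EAHQSN", "EGVQSY"]
pssm = build_pssm(block)
print(best_hit(pssm, "MKTEAHQSNLL"))
```

Because the block is ungapped, each column is scored independently and no gap penalties are involved, which is the main difference from the gapped profiles used by PROFILESEARCH.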

PSI-BLAST  http://www.ncbi.nlm.nih.gov/BLAST/  Iterative alignment search for similar sequences that starts with a query sequence, builds a gapped multiple alignment, and then uses the alignment to augment the search: the initial matches to the query sequence are used to build a type of scoring matrix, which is then used to search for additional matches, and the process repeats. Query type: single sequence, e.g., DAHQSNGA; iteration 1 builds matches to the query such as H-SNGA and EAHQSN-L, and further iterations extend the set. PSI-BLAST finds a set of sequences related to each other by the presence of common patterns (not every sequence may have the same patterns).

PROSITE  http://www.expasy.ch/prosite  Search a query sequence for patterns representative of protein families: the query is compared with a database of patterns found in protein families, where patterns may be represented by a scoring matrix or a hidden Markov model (profile HMM). Query type: single sequence, e.g., DAHQSNGA.
INTERPRO  http://www.ebi.ac.uk/interpro
PFAM  http://www.sanger.ac.uk/Pfam
CDD/IMPALA  http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml 
BCM Search Launcher (with programming links to several servers)  http://searchlauncher.bcm.tmc.edu/seq-search/protein-search.html 
bic-sw  Bic server, European Bioinformatics Institute  http://www.ebi.ac.uk/bic_sw/
MPsearch  National Institute of Agrobiological Resources, Tsukuba, Japan  http://www.dna.affrc.go.jp/htbin/mp_PP.pl
Scanps  G. Barton, European Bioinformatics Institute  http://barton.ebi.ac.uk
SSEARCH  E-mail server, DNA Databank of Japan  http://www.ddbj.nig.ac.jp/E-mail/homology.html
Swat  Phil Green, University of Washington  http://www.genome.washington.edu/UWGC/analysistools/swat.cfm

Programs and Web sites for database similarity searches with a regular expression, motif, block, or profile (program, databases searched, URL)
Regular Expression and Motifs
EMOTIF Scan  SwissProt and Genpept  http://motif.stanford.edu/emotif/emotif-scan.html
Prosite patterns  SwissProt and TrEMBL  http://www.expasy.ch/tools/scnpsit2.html
ISREC pattern-finding service  SwissProt and non-redundant EMBL database  http://www.isrec.isb-sib.ch/software/PATFND_form.html
fpat  PDB, SwissProt, Genpept  http://stateslab.bioinformatics.med.umich.edu/service/fpat/
PHI-BLAST  BLAST databases  http://www.ncbi.nlm.nih.gov/
MOTIF  SwissProt, PDB, PIR, PRF, Genes  http://www.motif.genome.ad.jp/MOTIF2.html
BLOCKS
BLOCKS  most databases  http://www.blocks.fhcrc.org/blockmkr/make_blocks.html
MAST  most databases  http://meme.sdsc.edu/meme/website/
BLIMPS  locally available databases  anonymous FTP ncbi.nlm.nih.gov/repository/blocks/unix/blimps  ftp://ncbi.nlm.nih.gov/repository/blocks/unix/blimps
Probe  BLAST databases  anonymous FTP ncbi.nlm.nih.gov/pub/neuwald/probe1.0  ftp://ncbi.nlm.nih.gov/pub/neuwald/probe1.0
Genefind  PIR  http://pir.georgetown.edu/gfserver
PROFILE Programs
Profilesearch  locally available databases  anonymous FTP ftp.sdsc.edu/pub/sdsc/biology/profile_programs  ftp://ftp.sdsc.edu/pub/sdsc/biology/profile_programs
Profile-SS  most databases  http://www.psc.edu/general/software/packages/profiless/profiless.html


Search Genes and Coding Regions  

FGENES and related programs that use linear discriminant analysis or hidden Markov models  http://genomic.sanger.ac.uk/gf/gf.shtml  Solovyev et al. (1995)
GeneFinder access site at the Sanger Center  http://genomic.sanger.ac.uk/gf/gf.html collection of methods
GeneHacker for microbial genomes based on HMMs  http://www-btls.jst.go.jp/GeneHacker/  Hirosawa et al. (1997)
GeneID-3 Web server using rule-based models, and GeneID+  http://www1.imim.es/geneid.html  Mail server at geneid@darwin.bu.edu
GeneMark (Markov chain models) and GeneMark.hmm (hidden Markov models)  http://opal.biology.gatech.edu/GeneMark/
GeneParser Web page, uses a combination of neural network and dynamic programming methods  http://beagle.colorado.edu/~eesnyder/GeneParser.html  Snyder and Stormo (1993, 1995)
Genescan using Fourier transform of DNA sequences to find characteristic patterns  http://202.41.10.146/~sn055/DOC/gs.htm  Tiwari et al. (1997)
Genetic code variations  http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c 
GenLang using linguistic methods  http://www.cbil.upenn.edu/  Dong and Searls (1994)
GenScan based on probabilistic model of gene structure for vertebrate, Drosophila, and plant genes  http://genes.mit.edu/GENSCAN.html  Burge and Karlin (1998)
GeneSeqer for aligning genomic and EST sequences  http://bioinformatics.iastate.edu/cgi-bin/gs.cgi  Close to SplicePredictor
Glimmer uses interpolated Markov models for prokaryotic gene finding  http://www.tigr.org/softlab/glimmer/  Salzberg et al. (1998)
GrailII prediction by neural networks based on scores of characteristic sequence patterns and composition  http://compbio.ornl.gov/  Uberbacher and Mural (1991); Uberbacher et al. (1996)
Initiation codon analysis  http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c 
Microbial genome coding region identification based on Markov chains of order 5  http://igs-server.cnrs-mrs.fr/~audic/selfid.html  Audic and Claverie (1998)
Procrustes based on comparison of related genomic sequences  http://www-hto.usc.edu/software/procrustes/  Gelfand et al. (1996)
Push-button Gene Finder for gene identification using Markov and hidden Markov models  http://www.cse.ucsc.edu/research/compbio/pgf/ 
Translate tool at ExPASy  http://www.expasy.ch/tools/dna.html 
Translation machine on the Web at EBI  http://www2.ebi.ac.uk/translate/ 
Translation of large genome sequences on the Web  http://alces.med.umn.edu/rawtrans.html 
Veil (Viterbi exon-intron locator) uses hidden Markov models for vertebrate DNA  http://www.cs.jhu.edu/labs/compbio/veil.html  Henderson et al. (1997)
Webgene, a set of gene prediction tools and concurrent database similarity searches  http://www.itba.mi.cnr.it/webgene/ 
Webgenemark and Webgenemark.hmm  http://opal.biology.gatech.edu/GeneMark/  see GeneMark; Lukashin and Borodovsky (1998)

Promoter Prediction Programs

ConsInspector (see Transfac database)  http://www.gsf.de/biodv/consinspector.html
FastM for transcription factor binding sites  http://transfac.gbf.de/cgi-bin/fastm/fastm.pl  Klingenhoff et al. (1999)
GeneExpress analysis of transcriptional regulations with TRRD database  http://wwwmgs.bionet.nsc.ru/systems/GeneExpress/ Kolchanov et al. (1999a, b) 
Genome inspector for combined analysis of multiple signals in genomes  http://www.gsf.de/biodv/genomeinspector.html  Quandt et al. (1997)
GrailII prediction of TSS (transcription start sites) by neural networks based on scores of characteristic sequence patterns and composition
MAR-FINDER for finding matrix attachment regions  http://www.futuresoft.org/MAR-Wiz/  Kramer et al. (1997); Singh et al. (1997)
MatInspector (see Transfac database)  http://www.gsf.de/biodv/matinspector.html  (for downloading)
Mirage (Molecular Informatics Resource for the Analysis of Gene Expression)  http://www.ifti.org/
NNPP Promoter Prediction by Neural Network for prokaryotes or eukaryotes  http://www.fruitfly.org/seq_tools/promoter.html  Reese et al. (1996)
NSITE: search for TF binding sites or other consensus regulatory sequences  http://genomic.sanger.ac.uk/gf/gf.shtml
OOTFD Object-Oriented Transcription Factor Database  http://www.ifti.org/cgi-bin/ifti/ootfd.pl  Ghosh (1998)
Pol3scan for RNAP III/tRNA promoter sequences using pattern scoring matrices  http://irisbioc.bio.unipr.it/genomics.html  Pavesi et al. (1994)
Promoter element weight matrices and HMMs  http://www.epd.isb-sib.ch/promoter_elements/  Bucher (1990)
Promoter II for recognition of Pol II promoter sequences by neural networks  http://www.cbs.dtu.dk/services/promoter/  Knudsen (1999)
PromoterScan  http://bimas.dcrt.nih.gov/molbio/proscan/  Prestridge (1995) and see Web site
RegScan for promoter classification  http://wwwmgs.bionet.nsc.ru/mgs/programs/classprom/  Babenko et al. (1999)
Sequence walkers for graphical viewing of the interaction of regulatory proteins with DNA binding sites  http://www-lecb.ncifcrf.gov/~toms/walker/narcoverlogowalker.html  Schneider (1997)
Signal scan for transcriptional elements  http://bimas.dcrt.nih.gov:80/molbio/signal/ Prestridge (1991, 1996)  
TargetFinder for promoter searching in selected annotated sequences  http://www.tigem.it/  Lavorgna et al. (1999)
TESS for searching for transcription factor binding sites  http://www.cbil.upenn.edu/tess/ Schug and Overton (1997a, b)  
Tfbind for transcription factor binding sites  http://tfbind.ims.u-tokyo.ac.jp  Tsunoda and Takagi (1999)
Transfac programs providing search for TF binding sites: MatInd for making scoring matrices and MatInspector for searching for matches to matrices  http://www.gsf.de/cgi-bin/matsearch.pl, http://www.gsf.de/biodv/staff_pub.html  Knüppel et al. (1994); Quandt et al. (1995); Heinemeyer et al. (1999); Klingenhoff et al. (1999)
Wentian Li's Web site with links to multiple analysis programs  http://linkage.rockefeller.edu/wli/gene/programs.html

Protein Structure Analysis  

The PredictProtein server at the European Molecular Biology Laboratory, Heidelberg, Germany: an important site for secondary structure prediction by PHD, PREDATOR, TOPITS, and threader  http://cubic.bioc.columbia.edu/predictprotein
Swiss Institute of Bioinformatics, Geneva: basic types of protein analysis, databases, the Swiss-Model resource for prediction of protein models, and Swiss-PdbViewer  http://www.expasy.ch/

Protein Structure Viewer  
  
Chime  http://www.umass.edu/microbio/chime/  A Web browser plug-in that can be used to display and manipulate structures inside a Web page. There are many mouse-driven controls. Excellent for lecture presentations.
Cn3D  http://www.ncbi.nlm.nih.gov/Structure/  (Hogue 1997) Provides viewing of three-dimensional structures from Entrez and MMDB. Cn3D runs on Windows, MacOS, and Unix; simultaneously displays structural and sequence alignments; and can show multiple superimposed images from NMR studies.
Mage  http://kinemage.biochem.duke.edu  (see Richardson and Richardson 1994) Standard molecular viewing features with animation and kaleidoscope effects.
RasMol  http://www.umass.edu/microbio/rasmol/  (Sayle and Milner-White 1995) Most commonly used viewer for Windows, MacOS, UNIX, and VMS operating systems. Performs many functions.
Swiss 3D viewer, Spdbv  http://www.expasy.ch/spdbv/mainpage.html  (Guex and Peitsch 1997) Protein models can be built by structural alignments; calculates angles and distances between atoms; supports threading and energy minimization; and interacts with the Swiss-Model server.

Protein Structure Modeling and Secondary Structure Prediction

Modeller  http://guitar.rockefeller.edu/modeller/modeller.html  dynamic programming alignment of sequences and structures and molecular dynamics methods Sali et al. (1995)
Swiss-model  http://www.expasy.ch/swissmod/SWISS-MODEL.html  sequence alignment of query with sequences of known structure Peitsch (1996)
Whatif  http://www.cmbi.kun.nl/whatif/  flexible molecular graphics rendering of models Rodriguez et al. (1998)
Baylor College of Medicine (BCM)  http://searchlauncher.bcm.tmc.edu/seq-search/struc-predict.html  collection of methods and linked to other servers
DSC  http://www.bmm.icnet.uk/dsc/  linear discrimination King et al. (1997)
J-Pred structure prediction server  http://jura.ebi.ac.uk:8888/  NNSSP, DSC, Predator, Mulpred, Zpred, Jnet, and PHD  Cuff et al. (1998)
NNPRED  http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html  neural networks enhanced to detect sequence periodicity  Kneller et al. (1990)
NPS@ server, MLR combination for secondary structure prediction  http://pbil.ibcp.fr/NPSA/  combination of prediction methods using multivariate linear regression to optimize the predictions  Guermeur et al. (1999)
Protein Sequence Analysis (PSA) System  http://bmerc-www.bu.edu/psa/index.html  discrete state-space models (hidden Markov models) for patterns of α-helices, β-strands, tight turns, and loops in specific structural classes  Stultz et al. (1993, 1997); White et al. (1994)
PREDATOR  http://www.embl-heidelberg.de/argos/predator/predator_info.html  based on analysis of long- and short-range amino acid interactions and alignments of sequence pairs Frishman and Argos (1995, 1996, 1997)
Predict Protein server  http://www.embl-heidelberg.de/predictprotein/predictprotein.html (see also mirror sites)  neural networks applied to multiple sequence alignments  Rost and Sander (1994); Rost (1996)
PSSP  http://searchlauncher.bcm.tmc.edu/seq-search/struc-predict.html nearest neighbor enhanced by non-intersecting local and multiple sequence alignments Salamov and Solovyev (1995, 1997)  
Simpa96  http://pbil.ibcp.fr/NPSA/  nearest-neighbor method Levin (1997)
SOPM, SOPMA  http://pbil.ibcp.fr/NPSA/  nearest-neighbor method based on sequence alignments Geourjon and Deleage (1994, 1995)
SSP  http://searchlauncher.bcm.tmc.edu/seq-search/struc-predict.html  linear discriminant analysis based on amino acid composition of local and adjacent regions see H option for this program on Web page
UCLA-DOE structure prediction server  http://www.doe-mbi.ucla.edu/people/frsvr/frsvr.html  collection of methods and linked to other servers Fischer and Eisenberg (1996)

Threading Servers and Programs
123D  http://www-lmmb.ncifcrf.gov/~nicka/123D.html  contact potentials between amino acid side groups Alexandrov et al. (1996)
3D-PSSM  http://www.bmm.icnet.uk/~3dpssm  sequence-structure using position-specific scoring matrices Russell et al. (1997)
Honig lab  http://honiglab.cpmc.columbia.edu/  threading methods using biophysical properties
Libra I  http://www.ddbj.nig.ac.jp/htmls/E-mail/libra/LIBRA_I.html  target sequence and 3D profile are aligned by dynamic programming Ota and Nishikawa (1997)
NCBI structure site  http://www.ncbi.nlm.nih.gov/Structure/RESEARCH/threading.html  Gibbs sampling algorithm used to align sequence and structure  Bryant (1996)
Profit  http://lore.came.sbg.ac.at/home.html  fold recognition by the contact potential method M. Sippl
Threader 2  http://insulin.brunel.ac.uk/threader/threader.html  prediction by recognition of the correct fold from a library of alternatives Jones et al. (1995)
TOPITS  http://www.embl-heidelberg.de/predictprotein/doc/help_05.html detects similar motifs of secondary structure and accessibility between a sequence of unknown structure and a known fold Rost (1995a,b)
UCLA-DOE structure prediction server  http://www.doe-mbi.ucla.edu/people/frsvr/frsvr.html  fold-recognition using 3D profiles and secondary structure prediction methods Fischer and Eisenberg (1996)
CASP  http://predictioncenter.llnl.gov/  overall assessment of the methods

 

EMBOSS (European Molecular Biology Open Software Suite)  http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps/index.html#embassy  Downloadable source code.

PROGRAM FUNCTION AUTHOR
alignment consensus
cons Creates a consensus from multiple alignments HGMP
megamerger Merge two large overlapping nucleic acid sequences HGMP
merger Merge two overlapping sequences HGMP
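The idea behind a simple alignment consensus such as cons can be sketched as a per-column majority vote. This is a hypothetical minimal version, not EMBOSS code: it assumes equal-length aligned sequences with '-' as the gap character, drops all-gap columns, and writes 'N' (an arbitrary choice; 'X' would suit proteins) when no residue exceeds the threshold fraction of sequences. cons itself weighs residues with a scoring matrix.

```python
from collections import Counter

def majority_consensus(alignment, threshold=0.5, ambiguous="N", gap="-"):
    """Build a consensus from equal-length aligned sequences by majority vote.

    A column's consensus is its most common non-gap character if that character
    occurs in more than `threshold` of all sequences; otherwise `ambiguous`.
    Assumption: columns containing only gaps are left out of the consensus.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for col in range(length):
        column = [seq[col] for seq in alignment if seq[col] != gap]
        if not column:
            continue
        residue, count = Counter(column).most_common(1)[0]
        consensus.append(residue if count / len(alignment) > threshold else ambiguous)
    return "".join(consensus)

print(majority_consensus(["ACGT-A", "ACGTTA", "ACCT-A"]))
```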

alignment differences  
diffseq Find differences between nearly identical sequences HGMP

alignment dot plots  
dotmatcher Produces a dotplot of two sequences. Sanger
dotpath Displays a non-overlapping wordmatch dotplot of two sequences HGMP
dottup DNA sequence dot plot Sanger
polydot Multiple dotplot Sanger
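Word-match dot plots of the kind drawn by dottup can be sketched as follows: a cell (i, j) is marked whenever the word of length k starting at position i of one sequence is identical to the word starting at position j of the other. The word size of 3 and the text rendering are assumptions for illustration; the EMBOSS programs produce graphics and offer options not shown here.

```python
def dot_matrix(seq_a, seq_b, word_size=3):
    """Return the set of (i, j) start positions where words of length word_size match exactly."""
    # Index every word of seq_b by its start positions, then look up each word of seq_a.
    words_b = {}
    for j in range(len(seq_b) - word_size + 1):
        words_b.setdefault(seq_b[j:j + word_size], []).append(j)
    hits = set()
    for i in range(len(seq_a) - word_size + 1):
        for j in words_b.get(seq_a[i:i + word_size], []):
            hits.add((i, j))
    return hits

def render(seq_a, seq_b, hits):
    """Text rendering: rows follow seq_a, columns follow seq_b, '*' marks a word match."""
    return "\n".join(
        "".join("*" if (i, j) in hits else "." for j in range(len(seq_b)))
        for i in range(len(seq_a)))

a, b = "GATTACAGATTACA", "TTACAGA"
print(render(a, b, dot_matrix(a, b)))
```

Matching regions show up as diagonal runs of '*'; a longer word size suppresses random single hits at the cost of missing short similarities.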

alignment global  
est2genome Align EST and genomic DNA sequences Sanger
needle Needleman-Wunsch global alignment. HGMP
stretcher Global alignment of two sequences. Sanger
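Global alignment as performed by needle is based on the Needleman-Wunsch dynamic-programming recurrence. The sketch below uses a simple match/mismatch score and a linear gap penalty (match = +1, mismatch = -1, gap = -2 are illustrative assumptions); needle itself uses substitution matrices and affine gap penalties with separate gap-open and gap-extend costs.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

print(needleman_wunsch("GATTACA", "GCATGCT"))
```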

alignment local  
matcher Local alignment of two sequences Sanger
seqmatchall Does an all-against-all comparison of a set of sequences Sanger
supermatcher Finds a match of a large sequence against one or more sequences Sanger
water Smith-Waterman local alignment. HGMP
wordmatch Finds all exact matches of a given size between 2 sequences Sanger
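Local alignment as performed by water follows the Smith-Waterman recurrence. It differs from the global case in two ways: cell scores are never allowed to fall below zero, and the traceback starts from the highest-scoring cell and stops at the first zero. The linear gap penalty and the scores below are illustrative assumptions; water uses a substitution matrix and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty; returns (best_score, local_a, local_b)."""
    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Clamping at 0 lets a new local alignment start anywhere.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        s = score[i][j]
        if s == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif s == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

print(smith_waterman("TTTACGTTT", "GGACGTGG"))
```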

alignment multiple  
emma Multiple alignment program HGMP
infoalign Displays basic information about the sequences in a multiple alignment HGMP
plotcon Plots the quality of conservation of a sequence alignment HGMP
prettyplot Displays aligned sequences, with colouring and boxing. Sanger
showalign Display a multiple sequence alignment HGMP
tranalign Align nucleic coding regions given the aligned proteins HGMP

display  
cirdna Draws circular maps of DNA constructs Norway
lindna Draws linear maps of DNA constructs Norway
pepnet Protein helical net plot HGMP
pepwheel Shows protein sequences as helices HGMP
prettyseq Output sequence with translated ranges HGMP
remap Display a sequence with restriction cut sites, translation etc. HGMP
seealso Finds programs sharing group names HGMP
showdb Displays information on the currently available databases HGMP
showfeat Show features of a sequence. HGMP
showseq Display a sequence with features, translation etc HGMP
sixpack Display a DNA sequence with 6-frame translation and ORFs LION
textsearch Search sequence documentation text. SRS and Entrez are faster! HGMP

edit  
biosed Replace or delete sequence sections HGMP
cutseq Removes a specified section from a sequence. HGMP
degapseq Removes gap characters from sequences HGMP
descseq Alter the name or description of a sequence. HGMP
entret Reads and writes (returns) flatfile entries HGMP
extractfeat Extract features from a sequence HGMP
extractseq Extract regions from a sequence. HGMP
listor Writes a list file of the logical OR of two sets of sequences HGMP
maskfeat Mask off features of a sequence HGMP
maskseq Mask off regions of a sequence. HGMP
newseq Type in a short new sequence. HGMP
noreturn Removes carriage return from ASCII files HGMP
notseq Excludes a set of sequences and writes out the remaining ones HGMP
nthseq Writes one sequence from a multiple set of sequences HGMP
pasteseq Insert one sequence into another. HGMP
revseq Reverse and complement a sequence. HGMP
seqret Reads and writes (returns) a sequence. Sanger
seqretsplit Reads and writes (returns) sequences in individual files HGMP
skipseq Reads and writes (returns) sequences, skipping the first few HGMP
splitter Split a sequence into (overlapping) smaller sequences. HGMP
trimest Trim poly-A tails off EST sequences HGMP
trimseq Trim ambiguous bits off the ends of sequences HGMP
union Reads sequence fragments and builds one sequence LION
vectorstrip Strips out DNA between a pair of vector sequences HGMP
yank Reads a range from a sequence, appends the full USA to a list file LION
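The core operation of revseq, reversing a nucleotide sequence and complementing each base, takes only a few lines. This sketch is not EMBOSS code: it handles DNA only, maps the standard IUPAC ambiguity codes to their complements through a translation table written out here, and preserves upper and lower case.

```python
# Standard IUPAC nucleotide complements for DNA (both cases); e.g. R (A/G) <-> Y (C/T).
_COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVNacgtrykmswbdhvn",
                            "TGCAYRMKSWVHDBNtgcayrmkswvhdbn")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]

print(reverse_complement("ATGCCGTAn"))
```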

enzyme kinetics  
findkm Calculates Km and Vmax for an enzyme reaction HGMP

feature tables  
coderet Extract CDS, mRNA and translations from feature tables HGMP
twofeat Finds neighbouring pairs of features in sequences HGMP

information  
infoseq Displays some simple information about sequences HGMP
tfm Displays a program's help documentation manual HGMP
whichdb Search all databases for an entry HGMP
wossname Finds programs by keywords in their one-line documentation. HGMP

nucleic codon usage  
cai CAI codon usage statistic HGMP
chips Codon usage statistics HGMP
codcmp Codon usage table comparison HGMP
cusp Create a codon usage table HGMP
syco Synonymous codon usage Gribskov statistic plot HGMP
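A codon usage table of the kind cusp creates starts from counting codons in frame. The sketch assumes a single coding sequence read from its first base, skips codons containing anything other than A, C, G or T, ignores a trailing partial codon, and scales counts to occurrences per 1000 codons; cusp's own output format and options are not reproduced.

```python
from collections import Counter

def codon_counts(cds):
    """Count in-frame codons of a coding sequence read from its first base.

    Assumption: codons with non-ACGT characters are skipped and a trailing
    partial codon is ignored.
    """
    cds = cds.upper()
    counts = Counter()
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if set(codon) <= set("ACGT"):
            counts[codon] += 1
    return counts

def per_thousand(counts):
    """Scale raw codon counts to occurrences per 1000 codons."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000.0 * n / total for codon, n in counts.items()}

usage = codon_counts("ATGGCTGCTAAATAA")
print(sorted(usage.items()))
print(per_thousand(usage)["GCT"])
```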

nucleic composition  
banana Bending and Curvature Plot in B-DNA Sanger
btwisted Calculates the twisting in a B-DNA sequence HGMP
chaos Create a chaos plot for a sequence. Sanger
compseq Counts the composition of dimer/trimer/etc words in a sequence HGMP
dan Plot melting temperatures for DNA. HGMP
freak Residue/base frequency table or plot HGMP
isochore Plots isochores in large DNA sequences Sanger
sirna Finds siRNA duplexes in mRNA HGMP
wordcount Counts words of a specified size in a DNA sequence. Sanger
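Counting overlapping words of a fixed size, the basic operation behind compseq and wordcount, can be sketched with a sliding window. The expected count shown for comparison assumes independent, equally frequent bases, the simplest possible model and an assumption of this sketch.

```python
from collections import Counter

def count_words(seq, k):
    """Count all overlapping words (k-mers) of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def expected_count_uniform(seq_length, k, alphabet_size=4):
    """Expected count of any one k-mer if bases are independent and equally frequent."""
    windows = max(seq_length - k + 1, 0)
    return windows / alphabet_size ** k

seq = "ACGACGTTACG"
counts = count_words(seq, 3)
print(counts["ACG"], expected_count_uniform(len(seq), 3))
```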

nucleic cpg islands  
cpgplot Plot CpG rich areas HGMP
cpgreport Reports CpG rich regions HGMP
geecee Calculates the fractional GC content of nucleic acid sequences Sanger
newcpgreport Report CpG rich areas EBI
newcpgseek Reports CpG rich regions EBI
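geecee reports the GC fraction of a sequence, and CpG island detection commonly relies on the observed/expected CpG ratio, (CpG count × length) / (C count × G count), computed in windows. The sketch shows both calculations. The 200 bp window and the thresholds (GC fraction > 0.5, obs/exp > 0.6) follow the widely used Gardiner-Garden and Frommer criteria and are assumptions here, not the EMBOSS defaults; the EMBOSS programs have their own window, shift and threshold parameters.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous A/C/G/T bases (other characters ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    cpg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return cpg * len(seq) / (c * g) if c and g else 0.0

def cpg_island_windows(seq, window=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Yield start positions of windows passing both thresholds (assumed criteria)."""
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        if gc_fraction(w) > min_gc and cpg_obs_exp(w) > min_ratio:
            yield start

print(gc_fraction("ATGCGC"), cpg_obs_exp("ATGCGC"))
```

Overlapping windows that pass the test would then be merged into candidate islands; that merging step is omitted here.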

nucleic gene finding  
getorf Finds and extracts open reading frames (ORFs) HGMP
marscan Finds MAR/SAR sites in nucleic sequences HGMP
plotorf Plot potential open reading frames HGMP
showorf Pretty output of DNA translations HGMP
wobble Wobble base plot HGMP
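getorf offers several definitions of an ORF. One of the simplest is "ATG to the next in-frame stop codon", sketched below for the three forward frames only. The reverse strand could be handled by running the same function on the reverse complement. The function name and the min_codons parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Find ATG-to-stop ORFs in the three forward frames.

    Returns (start, end, frame) tuples with a 0-based start and an exclusive end
    (the end includes the stop codon). min_codons counts codons from the ATG
    up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)
```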

nucleic motifs  
dreg Regular expression search of a nucleotide sequence Sanger
fuzznuc Nucleic acid pattern search HGMP
fuzztran Protein pattern search after translation HGMP
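A nucleotide pattern search in the spirit of dreg and fuzznuc can be sketched by expanding IUPAC ambiguity codes into regular-expression character classes. A zero-width lookahead is used so that overlapping matches are also reported. This is a simplified illustration: fuzznuc's own pattern language and mismatch options are not reproduced, and the function names are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(pattern):
    return "".join(IUPAC[base] for base in pattern.upper())

def iupac_search(seq, pattern):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=" + iupac_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(seq.upper())]
```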

nucleic mutation  
msbar Mutate sequence beyond all recognition HGMP
shuffleseq Shuffles a set of sequences maintaining composition HGMP

nucleic primers  
eprimer3 Picks PCR primers and hybridization oligos HGMP
primersearch Searches DNA sequences for matches with primer pairs HGMP
stssearch Searches a DNA database for matches with a set of STS primers Sanger

nucleic profiles  
profit Scan a sequence or database with a matrix or profile HGMP
prophecy Creates matrices/profiles from multiple alignments HGMP
prophet Gapped alignment for profiles HGMP

nucleic repeats  
einverted Finds DNA inverted repeats Sanger
equicktandem Finds tandem repeats Sanger
etandem Looks for tandem repeats in a nucleotide sequence. Sanger
palindrome Looks for inverted repeats in a nucleotide sequence. HGMP
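An inverted repeat is a stem followed, after an optional loop, by the reverse complement of that stem. The exact-match sketch below illustrates the idea behind palindrome and einverted. Those programs also score mismatches and gaps, which this version does not. The function names and parameters are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def inverted_repeats(seq, stem, max_gap):
    """Find exact inverted repeats: a stem of length `stem`, a loop of 0..max_gap
    bases, then the reverse complement of the stem.

    Returns (left_start, right_start, loop_length) tuples.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(max_gap + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, j, loop))
    return hits
```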

nucleic restriction  
recoder Find and remove restriction sites but maintain the same translation HGMP
redata Isoschizomers, references and Suppliers for Restriction Enzymes HGMP
restover Finds restriction enzymes that produce a specific overhang Sloan-Kettering Cancer Center
restrict Finds Restriction Enzyme Cleavage Sites HGMP
silent Silent mutation restriction enzyme scan HGMP
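Finding cleavage sites, as restrict does, boils down to locating each recognition sequence and applying the enzyme's cut offset. The sketch below uses a tiny hypothetical enzyme table containing only EcoRI (G^AATTC) and BamHI (G^GATCC). Both sites are palindromic, so scanning the top strand is enough here. The real program reads enzyme data from REBASE and also handles non-palindromic sites and circular molecules.

```python
import re

# Tiny illustrative enzyme table: name -> (recognition site, top-strand cut offset).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "BamHI": ("GGATCC", 1),  # G^GATCC
}

def cut_positions(seq, enzymes=ENZYMES):
    """Sorted top-strand cut positions (0-based; the cut falls before this index)."""
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        for m in re.finditer("(?=" + site + ")", seq):
            cuts.add(m.start() + offset)
    return sorted(cuts)

def fragment_lengths(seq, enzymes=ENZYMES):
    """Fragment lengths of a linear molecule digested with every enzyme in the table."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```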

nucleic transcription  
tfscan Scans DNA sequences for transcription factors. HGMP

nucleic translation  
backtranseq Back translate a protein sequence HGMP
transeq Translates nucleic acid sequences. HGMP
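Translation, as done by transeq, is a codon-by-codon lookup in a genetic code table. The sketch below builds the standard code (NCBI table 1) from its usual 64-letter encoding, with codons ordered by the bases T, C, A, G. It maps unknown codons to X. Alternative genetic codes and transeq's six-frame options are not covered, and the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq, frame=0):
    """Translate seq in frame 0, 1 or 2; '*' marks stops, unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )
```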

phylogeny  
distmat Creates a distance matrix from multiple alignments HGMP

protein 2d structure  
garnier Predicts protein secondary structure EBI
helixturnhelix Finds nucleic acid binding domains. HGMP
hmoment Hydrophobic moment calculation HGMP
pepcoil Predicts coiled coil regions HGMP
tmap Predict transmembrane proteins Sanger

protein composition  
charge Protein charge plot HGMP
checktrans ORF property statistics EBI
emowse Protein identification by mass spectrometry HGMP
iep Calculates the isoelectric point of a protein HGMP
mwfilter Filter noisy molwts from mass spec output HGMP
octanol Displays protein hydropathy Sanger
pepinfo Plots simple amino acid properties in parallel HGMP
pepstats Protein statistics HGMP
pepwindow Displays protein hydropathy Sanger
pepwindowall Displays protein hydropathy of a set of sequences Sanger
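Hydropathy plots such as those from pepwindow average a per-residue hydrophobicity value over a sliding window. The sketch below uses the Kyte-Doolittle scale as an example; which scale and window each EMBOSS program uses by default is not assumed. A window of about 19 residues is a common choice for spotting candidate transmembrane segments. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein, window=19):
    """Mean hydropathy of each full-length window along the protein sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]
```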

protein motifs  
antigenic Finds antigenic sites in proteins HGMP
digest Protein proteolytic enzyme or reagent cleavage digest HGMP
fuzzpro Protein pattern search HGMP
oddcomp Finds protein sequence regions with a biased composition. Norway
patmatdb Matching a Prosite motif against a Protein Sequence Database. HGMP
patmatmotifs Compares a protein sequence to the PROSITE motif database. HGMP
pestfind Finds PEST motifs as potential proteolytic cleavage sites Austria
preg Regular expression search of a protein sequence Sanger
pscan Locates fingerprints (multiple motif features) in a protein sequence. HGMP
sigcleave Predicts signal peptide cleavage sites HGMP
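PROSITE patterns, used by fuzzpro, patmatdb and patmatmotifs, translate fairly directly into regular expressions:
- x becomes any residue;
- [..] is an allowed set;
- {..} is an excluded set;
- (n) and (n,m) are repeats;
- < and > anchor the pattern to the N- and C-terminus.

The converter below covers only those common elements. Rarer syntax, such as '>' inside brackets, is not handled, and the function names are hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern such as 'N-{P}-[ST]-{P}.' to a Python regex."""
    regex = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element  # a single residue or a [..] class: same syntax in regex
        regex.append(prefix + core + repeat + suffix)
    return "".join(regex)

def prosite_search(protein, pattern):
    """0-based start positions of (possibly overlapping) matches in a protein."""
    rx = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() for m in rx.finditer(protein.upper())]
```

For example, the N-glycosylation pattern N-{P}-[ST]-{P} becomes N[^P][ST][^P].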

utils database creation  
aaindexextract Extract data from AAINDEX HGMP
cutgextract CUTG: Codon Usage Tabulated from GenBank by organism HGMP
printsextract Preprocesses the PRINTS database for use with the program PSCAN HGMP
prosextract Extracts ID, AC, and PA lines from the PROSITE motif database. HGMP
rebaseextract Extract data from REBASE HGMP
tfextract Extract data from TRANSFAC HGMP

utils database indexing  
dbiblast Database indexing for BLAST 1 and 2 indexed databases Sanger
dbifasta Index a fasta database HGMP
dbiflat Database indexing for flat file databases Sanger
dbigcg Database indexing for GCG formatted databases Sanger
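The dbi* programs build index files so that an entry can be fetched by its ID without scanning the whole database. The underlying idea can be sketched as an in-memory map from record ID to the byte offset of its FASTA header. This is not the EMBOSS index file format, and the function names are hypothetical.

```python
import io

def build_fasta_index(handle):
    """Map each record ID (first word after '>') to the byte offset of its header line.

    handle must be a binary file object (e.g. open(path, "rb")) so offsets are bytes.
    """
    index = {}
    offset = 0
    for line in handle:
        if line.startswith(b">"):
            index[line[1:].split()[0].decode()] = offset
        offset += len(line)
    return index

def fetch_record(handle, index, record_id):
    """Seek straight to one record and return (header line, sequence)."""
    handle.seek(index[record_id])
    header = handle.readline().rstrip().decode()
    chunks = []
    for line in handle:
        if line.startswith(b">"):
            break
        chunks.append(line.strip().decode())
    return header, "".join(chunks)

if __name__ == "__main__":
    data = io.BytesIO(b">seq1 first\nACGT\nAC\n>seq2\nGGGG\n")
    idx = build_fasta_index(data)
    print(idx)
    print(fetch_record(data, idx, "seq2"))
```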

utils misc  
embossdata Finds or fetches the data files read in by the EMBOSS programs HGMP
embossversion Writes the current EMBOSS version number HGMP

PHYLIP TOOLS (http://evolution.genetics.washington.edu/phylip/programs.html): downloadable source code.

Heuristic search for best tree 

PROTPARS Estimates phylogenies from protein sequences (input using the standard one-letter code for amino acids) using the parsimony method, in a variant which counts only those nucleotide changes that change the amino acid, on the assumption that silent changes are more easily accomplished.

DNAPARS. Estimates phylogenies by the parsimony method using nucleic acid sequences. Allows use of the full IUB ambiguity codes and estimates ancestral nucleotide states. Gaps are treated as a fifth nucleotide state.
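Parsimony programs score a candidate tree by the minimum number of changes it requires. For a single site on a fixed rooted binary tree, the classic way to count them is Fitch's algorithm, sketched below. This does not describe DNAPARS internals: tree search, ambiguity codes and gap handling are not shown. The tree encoding (nested tuples) and the function names are hypothetical.

```python
def fitch(tree):
    """Fitch small parsimony for one site.

    tree is either a leaf state (e.g. "A") or a (left, right) tuple.
    Returns (candidate state set at this node, minimum changes in this subtree).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_set, left_cost = fitch(tree[0])
    right_set, right_cost = fitch(tree[1])
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def site_tree(shape, sequences, site):
    """Replace leaf names in shape by each taxon's state at one alignment column."""
    if isinstance(shape, str):
        return sequences[shape][site]
    return (site_tree(shape[0], sequences, site), site_tree(shape[1], sequences, site))

def parsimony_score(shape, sequences):
    """Total number of changes over all sites for a fixed tree shape."""
    n_sites = len(next(iter(sequences.values())))
    return sum(fitch(site_tree(shape, sequences, s))[1] for s in range(n_sites))
```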

DNACOMP. Estimates phylogenies from nucleic acid sequence data using the compatibility criterion, which searches for the largest number of sites which could have all states (nucleotides) uniquely evolved on the same tree. Compatibility is particularly appropriate when sites vary greatly in their rates of evolution, but we do not know in advance which are the less reliable ones.

DNAML.  Estimates phylogenies from nucleotide sequences by maximum likelihood. The model employed allows for unequal expected frequencies of the four nucleotides, for unequal rates of transitions and transversions, and for different (prespecified) rates of change in different categories of sites, with the program inferring which sites have which rates.

DNAMLK. Same as DNAML but assumes a molecular clock. The use of the two programs together permits a likelihood ratio test of the molecular clock hypothesis to be made.

RESTML. Estimation of phylogenies by maximum likelihood using restriction sites data (not restriction fragments but presence/absence of individual sites). It employs the Jukes-Cantor symmetrical model of nucleotide change, which does not allow for differences of rate between transitions and transversions. This program is VERY slow.

FITCH. Estimates phylogenies from distance matrix data under the "additive tree model" according to which the distances are expected to equal the sums of branch lengths between the species. Uses the Fitch-Margoliash criterion and some related least squares criteria. Does not assume an evolutionary clock. This program will be useful with distances computed from DNA sequences, with DNA hybridization measurements, and with genetic distances computed from gene frequencies.

KITSCH. Estimates phylogenies from distance matrix data under the "ultrametric" model, which is the same as the additive tree model except that an evolutionary clock is assumed. The Fitch-Margoliash criterion and other least squares criteria are used. This program will be useful with distances computed from DNA sequences, with DNA hybridization measurements, and with genetic distances computed from gene frequencies.

NEIGHBOR An implementation by Mary Kuhner and Jon Yamato of Saitou and Nei's "Neighbor Joining Method" and of the UPGMA (Average Linkage clustering) method. Neighbor Joining is a distance matrix method producing an unrooted tree without the assumption of a clock. UPGMA does assume a clock. The branch lengths are not optimized by the least squares criterion, but the methods are very fast and thus can handle much larger data sets.
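UPGMA repeatedly merges the two closest clusters and places the new node at half their distance, which is where the clock assumption enters. It then updates distances as size-weighted averages. The sketch below returns a Newick string. It does not cover Neighbor Joining, and the function name and input layout are hypothetical.

```python
def upgma(names, dist):
    """Build a UPGMA (average-linkage) tree and return it as a Newick string.

    names: list of taxon labels; dist: symmetric matrix (list of lists) of distances.
    Each merge sits at height d/2, so branch lengths assume a molecular clock.
    """
    # cluster id -> (Newick text, number of taxa, height of the cluster's root)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # distances between clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        text_i, size_i, height_i = clusters.pop(i)
        text_j, size_j, height_j = clusters.pop(j)
        height = dij / 2.0
        text = f"({text_i}:{height - height_i:g},{text_j}:{height - height_j:g})"
        # Average linkage: the distance to the merged cluster is the size-weighted mean.
        updated = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            updated[k] = (size_i * dik + size_j * djk) / (size_i + size_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        for k, v in updated.items():
            d[(k, next_id)] = v  # new ids are always the largest, so the key stays ordered
        clusters[next_id] = (text, size_i + size_j, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```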

CONTML.  Estimates phylogenies from gene frequency data by maximum likelihood under a model in which all divergence is due to genetic drift in the absence of new mutations. Does not assume a molecular clock. An alternative method of analyzing this data is to compute Nei's genetic distance and use one of the distance matrix programs.

MIX.  Estimates phylogenies by some parsimony methods for discrete character data with two states (0 and 1). Allows use of the Wagner parsimony method, the Camin-Sokal parsimony method, or arbitrary mixtures of these. Also reconstructs ancestral states and allows weighting of characters.

DOLLOP Estimates phylogenies by the Dollo or polymorphism parsimony criteria for discrete character data with two states (0 and 1). Also reconstructs ancestral states and allows weighting of characters. Dollo parsimony is particularly appropriate for restriction sites data; with ancestor states specified as unknown it may be appropriate for restriction fragments data.

Branch-and-bound exact search for best tree 
 
DNAPENNY.  Finds all most parsimonious phylogenies for nucleic acid sequences by branch-and-bound search. This may not be practical (depending on the data) for more than 10 or 11 species.
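The 10-11 species limit follows from how fast the number of possible trees grows. For n taxa there are (2n - 5)!! = 3 x 5 x 7 x ... x (2n - 5) distinct unrooted, fully bifurcating labelled trees. Branch-and-bound avoids scoring most of them, but this count sets the scale of the search. A small helper (hypothetical name) computes it:

```python
def unrooted_tree_count(n):
    """Number of unrooted, fully bifurcating labelled trees for n >= 3 taxa: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count

if __name__ == "__main__":
    for n in (5, 10, 11, 20):
        print(n, unrooted_tree_count(n))
```

Going from 10 to 11 taxa multiplies the number of trees by 17, and 20 taxa already give more than 10^20 trees.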
 
PENNY.  Finds all most parsimonious phylogenies for discrete-character data with two states, for the Wagner, Camin-Sokal, and mixed parsimony criteria using the branch-and-bound method of exact search. May be impractical (depending on the data) for more than 10-11 species.
 
DOLPENNY.  Finds all most parsimonious phylogenies for discrete-character data with two states, for the Dollo or polymorphism parsimony criteria using the branch-and-bound method of exact search. May be impractical (depending on the data) for more than 10-11 species.
 
CLIQUE.  Finds the largest clique of mutually compatible characters, and the phylogeny which they recommend, for discrete character data with two states. The largest clique (or all cliques within a given size range of the largest one) are found by a very fast branch and bound search method. The method does not allow for missing data; for such cases the T (Threshold) option of MIX may be a useful alternative. Compatibility methods are particularly useful when some characters are of poor quality and the rest of good quality, but it is not known in advance which ones are which.
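Clique methods start from pairwise compatibility. When ancestral states are not specified, two binary characters are compatible unless all four combinations 00, 01, 10 and 11 occur across the species (the "four-gamete" test). The sketch below builds that pairwise matrix. Finding the largest clique in the resulting graph, which is CLIQUE's actual job, is not shown. The function names and the string encoding (one '0'/'1' per species) are hypothetical.

```python
def compatible(char_a, char_b):
    """True if two binary characters (strings of '0'/'1', one per species) are compatible.

    With unspecified ancestral states, both characters can evolve without homoplasy
    on some tree unless all four state combinations occur.
    """
    return len(set(zip(char_a, char_b))) < 4

def compatibility_matrix(characters):
    """Pairwise compatibility for a list of binary characters."""
    n = len(characters)
    return [[compatible(characters[i], characters[j]) for j in range(n)] for i in range(n)]
```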

Distances or bootstrap samples 

DNADIST Computes four different distances between species from nucleic acid sequences. The distances can then be used in the distance matrix programs. The distances are the Jukes-Cantor formula, one based on Kimura's 2-parameter method, Jin and Nei's distance which allows for rate variation from site to site, and a maximum likelihood method using the model employed in DNAML. The last of these can be very slow to compute.
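The first two DNADIST distances have closed forms. With p as the proportion of differing sites, the Jukes-Cantor distance is d = -3/4 ln(1 - 4p/3). With P and Q as the proportions of transitions and transversions, Kimura's 2-parameter distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The sketch below computes both for one pair of aligned sequences, ignoring columns that contain anything other than A, C, G or T. The Jin-Nei and maximum likelihood variants are not shown, and the function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def _pairs(seq1, seq2):
    """Aligned base pairs, skipping columns where either sequence is not A/C/G/T."""
    return [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a in "ACGT" and b in "ACGT"]

def jukes_cantor(seq1, seq2):
    pairs = _pairs(seq1, seq2)
    p = sum(a != b for a, b in pairs) / len(pairs)
    # Undefined (math domain error) once p >= 0.75, i.e. saturation.
    return -0.75 * math.log(1 - 4.0 * p / 3.0)

def kimura_2p(seq1, seq2):
    pairs = _pairs(seq1, seq2)
    n = len(pairs)
    transitions = sum(
        a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES) for a, b in pairs
    )
    transversions = sum(a != b for a, b in pairs) - transitions
    P, Q = transitions / n, transversions / n
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```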

PROTDIST Computes a distance measure for protein sequences, using maximum likelihood estimates based on the Dayhoff PAM matrix, Kimura's 1983 approximation to it, or a model based on the genetic code plus a constraint on changing to a different category of amino acid. The distances can then be used in the distance matrix programs.

SEQBOOT Reads in a data set, and produces multiple data sets from it by bootstrap resampling. Since most programs in the current version of the package allow processing of multiple data sets, this can be used together with the consensus tree program CONSENSE to do bootstrap (or delete-half-jackknife) analyses with most of the methods in this package. This program also allows the Archie/Faith technique of permutation of species within characters.
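Bootstrap resampling, the core of what SEQBOOT does for sequence data, draws alignment columns with replacement to make pseudo-data sets of the original length. Each replicate is then analysed and the resulting trees summarized with CONSENSE. The sketch below uses a dictionary-of-sequences layout and a hypothetical function name. It does not cover the jackknife or permutation options.

```python
import random

def bootstrap_alignment(alignment, replicates, seed=None):
    """Yield bootstrap pseudo-alignments made by resampling columns with replacement.

    alignment: dict of taxon name -> aligned sequence (all the same length).
    """
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    for _ in range(replicates):
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(alignment[name][c] for c in cols) for name in names}
```

The same columns are drawn for every taxon within a replicate, so each resampled column stays a real site pattern from the original alignment.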

GENDIST Computes one of three different genetic distance formulas from gene frequency data. The formulas are Nei's genetic distance, the Cavalli-Sforza chord measure, and the genetic distance of Reynolds et al. The first is appropriate for data in which new mutations occur in an infinite isoalleles neutral mutation model, the latter two for a model without mutation and with pure genetic drift. The distances are written to a file in a format appropriate for input to the distance matrix programs.
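Nei's standard genetic distance is D = -ln(Jxy / sqrt(Jx * Jy)). In it, Jx and Jy are the sums of squared allele frequencies within each population, and Jxy is the sum of products across populations, each averaged over loci. The sketch below computes it from a hypothetical nested-list layout. The Cavalli-Sforza and Reynolds measures are not shown.

```python
import math

def nei_distance(pop_x, pop_y):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    pop_x, pop_y: lists of loci; each locus is a list of allele frequencies
    (same allele order in both populations, frequencies summing to 1).
    """
    loci = len(pop_x)
    jx = sum(sum(f * f for f in locus) for locus in pop_x) / loci
    jy = sum(sum(f * f for f in locus) for locus in pop_y) / loci
    jxy = sum(sum(a * b for a, b in zip(lx, ly)) for lx, ly in zip(pop_x, pop_y)) / loci
    return -math.log(jxy / math.sqrt(jx * jy))
```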

FACTOR Takes discrete multistate data with character state trees and produces the corresponding data set with two states (0 and 1). Written by Christopher Meacham.

Tree manipulation, plotting, consensus 

DRAWGRAM Plots rooted phylogenies, cladograms, and phenograms in a wide variety of user-controllable formats. The program is interactive and allows previewing of the tree on PC graphics screens, and Tektronix or DEC graphics terminals. Final output can be on a laser printer (such as the Apple Laserwriter or HP Laserjet), on graphics screens or terminals, in files readable by drawing programs such as PC Paintbrush, MacDraw, Idraw, and Xfig, on pen plotters (Hewlett-Packard or Houston Instruments) or on dot matrix printers capable of graphics.

DRAWTREE Similar to DRAWGRAM but plots unrooted phylogenies.

CONSENSE Computes consensus trees by the majority-rule consensus tree method, which also allows one to easily find the strict consensus tree. Does NOT compute the Adams consensus tree. Trees are input in a tree file in standard nested-parenthesis notation, which is produced by many of the tree estimation programs in the package. This program can be used as the final step in doing bootstrap analyses for many of the methods in the package.
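Majority-rule consensus counts how often each group of taxa (clade) appears across the input trees. It keeps those present in more than half of them; clades found in every tree form the strict consensus. The sketch below represents each rooted tree directly as a collection of clades. Newick parsing and the unrooted-split bookkeeping that CONSENSE performs are not shown, and the function name is hypothetical.

```python
from collections import Counter

def majority_rule_clades(trees, threshold=0.5):
    """Clades present in more than `threshold` of the trees, mapped to their frequency.

    Each tree is a collection of clades; each clade is a set of taxon names.
    threshold=0.5 gives majority rule; frequency 1.0 marks strict-consensus clades.
    """
    counts = Counter(clade for tree in trees for clade in {frozenset(c) for c in tree})
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}
```

Clades that each occur in more than half of the trees are always mutually compatible, so the retained clades can be assembled into a single tree.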

Interactive tree manipulation 

DNAMOVE Interactive construction of phylogenies from nucleic acid sequences, with their evaluation by parsimony and compatibility and the display of reconstructed ancestral bases. This can be used to find parsimony or compatibility estimates by hand.

MOVE Interactive construction of phylogenies from discrete character data with two states (0 and 1). Evaluates parsimony and compatibility criteria for those phylogenies and displays reconstructed states throughout the tree. This can be used to find parsimony or compatibility estimates by hand.

DOLMOVE Interactive construction of phylogenies from discrete character data with two states (0 and 1) using the Dollo or polymorphism parsimony criteria. Evaluates parsimony and compatibility criteria for those phylogenies and displays reconstructed states throughout the tree. This can be used to find parsimony or compatibility estimates by hand.

RETREE Reads in a tree (with branch lengths if necessary) and allows you to reroot the tree, to flip branches, to change species names and branch lengths, and then write the result out. Can be used to convert between rooted and unrooted trees. Does not refer to any data.
 

List of Other Phylogenetic Analysis Tools (http://evolution.genetics.washington.edu/phylip/software.html)

EBI Tools  http://www.ebi.ac.uk/Tools/index.html 

Homology & Similarity  http://www.ebi.ac.uk/Tools/homology.html  programs that can be used to look for sequence similarity
 http://www.ebi.ac.uk/blast/index.html  BLAST
 http://www.ebi.ac.uk/fasta/index.html  FASTA

Protein Functional Analysis  http://www.ebi.ac.uk/Tools/protein.html  can be used to search for motifs in your protein sequence
 http://www.ebi.ac.uk/interpro/scan.html  InterProScan

Structural Analysis  http://www.ebi.ac.uk/Tools/structural.html  can be used to query your protein structure and compare it to those in the Protein Data Bank (PDB)
 http://www.ebi.ac.uk/msd-srv/ssm  MSDfold
 http://www.ebi.ac.uk/dali/  DALI

Sequence Analysis  http://www.ebi.ac.uk/Tools/sequence.html
 http://www.ebi.ac.uk/clustalw/index.html  ClustalW, a sequence alignment tool

Miscellaneous Tools  http://www.ebi.ac.uk/Tools/misc.html
 http://www.ebi.ac.uk/microarray/ExpressionProfiler/ep.html  Expression Profiler: A set of tools for clustering, analysis and visualization of gene expression and other genomic data


EXPASY TOOLS  http://expasy.ch/ 

Proteomics and sequence analysis tools  http://expasy.ch/tools/  
  
Proteomics  http://expasy.ch/tools/  
 http://expasy.ch/tools/peptident.html PeptIdent
 http://expasy.ch/tools/peptide-mass.html PeptideMass
DNA -> Protein  http://expasy.ch/tools/ 
 http://expasy.ch/tools/dna.html Translate
Similarity searches  http://expasy.ch/tools/  
 http://expasy.ch/tools/blast/ BLAST
Pattern and profile searches  http://expasy.ch/tools/  
 http://expasy.ch/tools/scanprosite/ ScanProsite
Post-translational modification and topology prediction  http://expasy.ch/tools/  
Primary structure analysis  http://expasy.ch/tools/  
 http://expasy.ch/tools/protparam.html ProtParam
 http://expasy.ch/tools/pi_tool.html pI/MW
 http://expasy.ch/cgi-bin/protscale.pl ProtScale
Secondary and tertiary structure prediction  http://expasy.ch/tools/  
 http://expasy.ch/swissmod/SWISS-MODEL.html SWISS-MODEL
 http://expasy.ch/spdbv/ Swiss-PdbViewer
Alignment  http://expasy.ch/tools/ 
 http://www.ch.embnet.org/software/TCoffee.html T-COFFEE
 http://expasy.ch/tools/sim-prot.html SIM

Biological text analysis  http://expasy.ch/tools/  
 http://expasy.ch/melanie/ Software for 2-D PAGE analysis
  
Roche Applied Science's Biochemical Pathways  http://expasy.ch/cgi-bin/search-biochem-index  


RCSB-Developed Software  

mmCIF Resources  
CIFTr   http://pdb.rutgers.edu/mmcif/CIFTr/index.html 
CIFLIB  http://pdb.rutgers.edu/mmcif/CIFLIB/index.html  C language application program interface
CIFOBJ  http://pdb.rutgers.edu/mmcif/CIFOBJ/index.html  A class library of mmCIF dictionary access tools
CIFPARSE  http://pdb.rutgers.edu/mmcif/CIFPARSE/index.html  A library of access tools for mmCIF
CIFPARSE-OBJ  http://pdb.rutgers.edu/mmcif/CIFPARSE-OBJ/index.html  A library of access tools for mmCIF in C++
CIFTABLE (SSTable)  http://pdb.rutgers.edu/mmcif/SSTABLE/index.html  A class library of table access tools (old version)
CIFTABLE (ISTable)  http://pdb.rutgers.edu/mmcif/ISTABLE/index.html  A class library of table access tools
mmCIF loader  http://pdb.rutgers.edu/mmcif/MMCIF-LOADER/index.html  An application to load mmCIF data into relational databases and XML
OpenMMS Toolkit  http://openmms.sdsc.edu A suite of Java source code that includes an mmCIF parser, RDBMS loader, XML translator, and Corba server 
STAR (CIF) parser  http://pdb.sdsc.edu/index.html  Several object-oriented Perl modules for parsing mmCIF files and other STAR-compliant files without nested loops
Deposition Resources  
ADIT - Workstation Version (alpha release)  http://pdb.rutgers.edu/mmcif/ADIT/index.html  A package for editing and checking structure data entries
MAXIT  http://pdb.rutgers.edu/mmcif/MAXIT/index.html  An application for processing and curation of macromolecular structure data
PDB_EXTRACT  http://pdb.rutgers.edu/mmcif/demo.tar.gz  (download) Tools and examples for extracting mmCIF data from structure determination applications
PDB Validation Suite (beta version)  http://pdb.rutgers.edu/mmcif/VAL/index.html  A tool for processing and checking structure data
FTP Archive Resources  
bnl2rcsb  ftp://ftp.rcsb.org/pub/pdb/software/  Perl script to convert a BNL FTP directory structure to an RCSB FTP directory structure
getPdbUpdate  ftp://ftp.rcsb.org/pub/pdb/software/  Perl script to retrieve files from any update found at

Other Software Links

mmCIF software tools  
CBFLib  http://www.bernstein-plus-sons.com/software/CBF/ 
A library of ANSI-C functions providing a simple mechanism for accessing Crystallographic Binary Files (CBF files) and Image-supporting CIF (imgCIF) files   
cif2pdb  http://www.bernstein-plus-sons.com/software/cif2pdb/ Program to convert mmCIF to pseudo-PDB format
CIFtbx2   http://www.bernstein-plus-sons.com/software/ciftbx/ 
Extended CIF Tool Box (Fortran) with CYCLOPS and cif2cif   
OOSTAR  http://www.sdsc.edu/pb/cif/OOSTAR.html 
Applications to manipulate STAR files (Objective-C)   
pdb2cif   http://www.bernstein-plus-sons.com/software/pdb2cif/ 
Scripts to filter a PDB entry and produce mmCIF   

Crystallography  

ARP/wARP  http://www.embl-hamburg.de/ARP/ A system for the refinement of protein structures via automatic updating and re-building of the model and solvent structure

CCP4  http://www.dl.ac.uk/CCP/CCP4/main.html A suite of programs covering all aspects of crystallographic structure determination, refinement and analysis

CNS  http://cns.csb.yale.edu/v1.0/ A system for structure determination from crystallographic or NMR data

MAIN  http://www-bmb.ijs.si/doc/ An interactively driven suite of programs for molecular modeling, density modification, model refinement and structure analysis

O  http://imsb.au.dk/~mok/o/ An interactive system for building and manipulating models in electron density maps

SHELX  http://shelx.uni-ac.gwdg.de/SHELX/ A set of programs for direct structure solution and refinement with high resolution diffraction data

SOLVE  http://www.solve.lanl.gov/ An automated system for phase determination from MIR and MAD data

X-PLOR 3.851  http://xplor.csb.yale.edu/xplor-info/xploronline.html A program for structure determination from crystallographic or NMR data (Yale version)

X-PLOR/CNX  http://www.accelrys.com/products/cnx/ A program for structure determination from crystallographic or NMR data (Accelrys version)

XtalView  http://www.scripps.edu/pub/dem-web/toc.html An interactive system for building and manipulating models in electron density maps and for phase determination from MIR or MAD data.

NMR  

CNS  http://cns.csb.yale.edu/v1.0/ A system for structure determination from crystallographic or NMR data

CYANA  http://www.guentert.com/Cyana.html A program for the structure calculation of biological macromolecules on the basis of conformational constraints from NMR

Fantom  http://www.scsb.utmb.edu/fantom/fm_home.html A program for structure calculation and refinement using torsion angle minimization with NMR data

X-PLOR 3.851  http://xplor.csb.yale.edu/xplor-info/xploronline.html A program for structure determination from crystallographic or NMR data (Yale version)
  
Structure Analysis and Verification  

CE/CL  http://cl.sdsc.edu/ Software for structure comparison by Combinatorial Extension (CE) and Compound Likeness (CL)
ENDscript  http://genopole.toulouse.inra.fr/ENDscript 
A Web server for searching homologous sequences and giving information on secondary structure elements, accessibility, hydropathy and protein-protein contacts   

ESPript  http://genopole.toulouse.inra.fr/ESPript Easy Sequencing in Postscript

Non-covalent bond finder  http://www.umass.edu/microbio/chime/find-ncb/index.htm Software for finding non-covalent interactions for use with Chime 2 or higher

PASS  http://www.delanet.com/~bradygp/pass A fast cavity-detection program for the identification and visualization of possible protein binding sites

Procheck  http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html A program that checks the stereochemical quality of a protein structure

ProFit  http://www.biochem.ucl.ac.uk/~martin/text/ProFit.readme A program for fitting protein structures on to each other

SARF2  http://123d.ncifcrf.gov/sarf2.html A program which searches for similar structural motifs (via an analysis of backbone fragments) in protein structures
Surface Racer  http://monte.biochem.wisc.edu/~tsodikov/surface.html 
A program that calculates exact accessible surface area, molecular surface area and average curvature of molecular surface, and analyzes cavities in the protein interior inaccessible from the outside.   

SURFNET  http://www.biochem.ucl.ac.uk/~roman/surfnet/surfnet.html A program which generates surfaces and void regions between molecular surfaces

WHAT_CHECK  http://www.sander.embl-heidelberg.de/whatcheck/ A system for protein structure validation derived from the WHAT IF program

WHAT IF  http://www.cmbi.kun.nl/whatif/ A protein structure analysis program that may be used for mutant prediction, structure verification and molecular graphics

Modeling and Simulation  


ANALYZE  http://www.tc.cornell.edu/reports/NIH/resource/CompBiologyTools/analyze/index.asp Cornell Theory Center program to classify and analyze conformations obtained from global searches; includes the capability to compare NMR intensities and coupling constants to experimental data

AMBER  http://www.amber.ucsf.edu/amber/amber.html Assisted Model Building with Energy Refinement - a molecular dynamics and energy minimization program
AutoDock3.0  http://www.scripps.edu/pub/olson-web/dock/autodock 
A suite of automated docking tools designed to predict how small molecules, such as substrate or drug candidates, bind to a receptor of known 3D structure   

CHARMM  http://yuri.harvard.edu/ Chemistry at HARvard Molecular Mechanics - a molecular dynamics and energy minimization program

ECEPPAK  http://www.tc.cornell.edu/reports/NIH/resource/CompBiologyTools/eceppak/index.asp Cornell Theory Center package to carry out global conformational searches using the ECEPP/3 force field

FTDOCK  http://www.bmm.icnet.uk/docking/ A program for carrying out rigid-body docking between biomolecules

GROMOS  http://www.igc.ethz.ch/gromos/ A general-purpose molecular dynamics computer simulation package for the study of biomolecular systems

GROMACS  http://md.chem.rug.nl/~gmx Complete modelling package for proteins, membrane systems and more, including fast molecular dynamics, normal mode analysis, essential dynamics analysis and many trajectory analysis utilities

ICM  http://www.molsoft.com/ MolSoft ICM programs and modules for applications including structure analysis, modeling, docking, homology modeling and virtual ligand screening
JACKAL  http://trantor.bioc.columbia.edu/~xiang/jackal/ 
Suite of tools for model building, structure prediction and refinement, reconstruction, and minimization; for SGI, Linux, and Sun Solaris   

LOOPP  http://www.tc.cornell.edu/reports/NIH/resource/CompBiologyTools/loopp/index.asp Linear Optimization of Protein Potentials. Cornell Theory Center program for potential optimization and alignments of sequences and structures
MAMMOTH  http://icb.mssm.edu/services/mammoth/mammoth 
MAtching Molecular Models Obtained from THeory - a program for automated pairwise and multiple structural alignments; for SGI, Linux, and Sun Solaris   

MidasPlus  http://www.cgl.ucsf.edu/Outreach/midasplus/ A program for displaying, manipulating and analysing macromolecules

MODELLER  http://guitar.rockefeller.edu/modeller/modeller.html A program for automated protein homology modeling

MOIL  http://www.tc.cornell.edu/reports/NIH/resource/CompBiologyTools/moil/index.asp Cornell Theory Center package for molecular dynamics simulation of biological molecules

NAMD  http://www.ks.uiuc.edu/Research/namd/ A parallel object-oriented molecular dynamics simulation program

WAM - Web Antibody Modelling  http://antibody.bath.ac.uk A server for automated structure modeling from antibody Fv sequences

123D  http://123d.ncifcrf.gov/123D+.html A program which threads a sequence through a set of structures using substitution matrix, secondary structure prediction and contact capacity potential

Molecular Graphics  

BioEditor  http://bioeditor.sdsc.edu/ A tool for creating and viewing dynamic, formatted structure annotations; for Windows

Shockwave 3D PDB Viewer  http://www.candomultimedia.com/medical 
Free, easy-to-use tool for viewing molecular structures through a Web page; streams data directly from the PDB on PCs and Macs; developed in Ireland
Chemscape Chime  http://www.mdlchime.com/chime/ 
From MDL Information Systems. This program allows visualisation of structures within WWW browser pages. For further information about Chime see the UMass Chime Resources Page  http://www.umass.edu/microbio/chime/ 
Java3D Molecular Visualisation System  http://www.adcworks.com/projects/jmvs 
Free Java/Java3D program and source code

Mage and Kinemages  http://kinemage.biochem.duke.edu/kinemage/kinemage.php Interactive molecular display for research and educational uses. Free, open source for Macintosh, PC, Unix, and Linux. A Java version does 3-D Web display without plug-ins.
MOLMOL  http://www.mol.biol.ethz.ch/wuthrich/software/molmol/ 
A program for displaying, analyzing, and manipulating the 3-D structure of biological macromolecules, with special emphasis on the study of protein or DNA structures determined by NMR   

RasMol  http://www.bernstein-plus-sons.com/software/rasmol/ A free viewing system for PDB coordinate files that runs on Macintosh, PC and UNIX systems. Open source versions: http://www.openrasmol.org/

Raster3D  http://www.bmsc.washington.edu/raster3d/raster3d.html A set of tools for generating high quality raster images of proteins or other molecules. Freeware for UNIX, LINUX and PC.

RasTop (v. 2.0)  http://www.geneinfinity.org/rastop A free user-friendly graphical interface to RasMol molecular visualization software (v. 2.7.2.1), available for Windows platforms

Ribbons  http://sgce.cbse.uab.edu/ribbons/ A program for molecular illustration and error analysis
RmscopII  http://rmscopii.sourceforge.net/ 
A Tcl/Tk script that redirects PDB files or RasMol scripts to multiple RasMol sessions; can be used as a Web browser helper application or as a standalone program.
Swiss PDB viewer  http://www.expasy.ch/spdbv/  available from Switzerland (also mirrored in Australia)
A 3D graphics and molecular modeling program for the simultaneous analysis of multiple models and for model-building into electron density maps. The software is available for Macintosh or PC  
Uppsala Electron Density Server  http://portray.bmc.uu.se/eds/ Generated density maps

MolScript  http://www.avatar.se/molscript/ A program for displaying structures in both detailed and schematic formats and writing images in various formats

MolView and MolView Lite  http://www.danforthcenter.org/smith/MolView/molview.html Free molecular visualization programs for the Macintosh
PDB2MGIF  http://www.dkfz-heidelberg.de/spec/pdb2mgif/ 
Free, user-friendly server that converts PDB files to animated gif files that can be used in Web pages and presentations. Simple step-by-step instructions can be found here  http://www.rcsb.org/pdb/animation.html .
PocketMol  http://birg.cs.wright.edu/pocketmol/pocketmol.html 
Program to view and manipulate PDB files on a PocketPC   
ProteinScope  http://www.proteinscope.com 
Free viewer to display and manipulate PDB files and create animations and slides of proteins   
PyMOL  http://www.pymol.org 
A free and open-source molecular graphics system for visualization, animation, editing, and publication-quality imagery. PyMOL is scriptable and can be extended using the Python language. Supports Windows, Mac OSX, and Unix   
Qmol  http://lancelot.bio.cornell.edu/jason/qmol.html 
A lightweight OpenGL based molecular viewer for Windows 95/NT/00 and X Windows   
ViewerLite and ViewerPro (Discovery Studio)  http://www.accelrys.com/dstudio/ds_viewer/ Molecular visualization programs for Macintosh and PC from Accelrys
VMD  http://www.ks.uiuc.edu/Research/vmd/ VMD (Visual Molecular Dynamics) runs on many platforms including MacOS X, and several versions of Unix and Windows. VMD provides visualization, analysis, and Tcl/Python scripting features, and has recently added sequence browsing and volumetric rendering features. VMD is distributed free of charge.
WebMol  http://www.cmpharm.ucsf.edu/~walther/webmol.html A Java PDB Viewer. WebMol was designed to display and analyze structural information contained in the Protein Data Bank (PDB). It can be run as an applet or as a stand-alone application.
World Index of Molecular Visualization Resources  http://molvis.sdsc.edu/visres/
A Visitor-Maintained Indices (VMI™) site by Eric Martz and Trevor D. Kramer. Contains many links to visualization tools, tutorials, and other resources.


TIGR Tools  http://www.tigr.org/software/ 

Gene Finding/Annotation  

MANATEE  http://manatee.sourceforge.net/  is a web-based gene evaluation and genome annotation tool. Manatee can store and view annotation for prokaryotic and eukaryotic genomes. The Manatee interface allows biologists to quickly identify genes and make high quality functional assignments, such as GO classifications, using search data, paralogous families, and annotation suggestions generated from automated analysis.

GlimmerM  http://www.tigr.org/software/glimmerm/  A gene finder derived from Glimmer, but developed specifically for eukaryotes. It is based on a dynamic programming algorithm that considers all combinations of possible exons for inclusion in a gene model and chooses the best of these combinations. The choice of the best gene model combines the strength of the splice sites and the score of the exons generated by an interpolated Markov model (IMM). The system has been trained for Arabidopsis thaliana, Oryza sativa (rice), and Plasmodium falciparum (the malaria parasite), and should work well on closely related organisms.

Glimmer  http://www.tigr.org/software/glimmer/  Glimmer (Gene Locator and Interpolated Markov Modeler) is a system for finding genes in microbial DNA, especially the genomes of bacteria and archaea. It uses interpolated Markov models (IMMs) to identify coding regions and distinguish them from noncoding DNA.
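
An interpolated Markov model blends predictions from Markov chains of several orders, trusting a longer context more the more often it was seen in training. The toy class below illustrates that idea only. The weighting rule `lam = total / (total + pseudo)` and the `pseudo` constant are assumptions chosen for simplicity; Glimmer's published weighting scheme is more elaborate (for example, it also applies a chi-square test), and this is not Glimmer's code.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable

ALPHABET = "ACGT"


class SimpleIMM:
    """Toy interpolated Markov model over DNA (illustrative, not Glimmer's exact scheme)."""

    def __init__(self, max_order: int = 5, pseudo: float = 40.0):
        self.max_order = max_order
        self.pseudo = pseudo  # assumed constant: how quickly longer contexts are trusted
        # counts[context][next_base] = number of times next_base followed context
        self.counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def train(self, seqs: Iterable[str]) -> None:
        for seq in seqs:
            for i, base in enumerate(seq):
                # Record the base under every context length 0..max_order that fits.
                for k in range(min(i, self.max_order) + 1):
                    self.counts[seq[i - k:i]][base] += 1

    def prob(self, context: str, base: str) -> float:
        # Order 0 with add-one smoothing, so no probability is ever zero.
        c0 = self.counts.get("", {})
        p = (c0.get(base, 0) + 1) / (sum(c0.values()) + len(ALPHABET))
        # Blend in progressively longer contexts, each weighted by how often it was seen.
        for k in range(1, min(self.max_order, len(context)) + 1):
            ctx_counts = self.counts.get(context[-k:])
            if not ctx_counts:
                continue
            total = sum(ctx_counts.values())
            lam = total / (total + self.pseudo)
            p = lam * ctx_counts.get(base, 0) / total + (1 - lam) * p
        return p

    def log_score(self, seq: str) -> float:
        """Log-likelihood of seq under the model."""
        return sum(
            math.log(self.prob(seq[max(0, i - self.max_order):i], base))
            for i, base in enumerate(seq)
        )
```

Comparing `log_score` under a model trained on coding sequence with one trained on noncoding sequence gives a simple coding/noncoding discriminator in the spirit of the description above.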

GeneSplicer: a computational method for splice site prediction  http://www.tigr.org/tdb/GeneSplicer/gene_spl.html  A fast, flexible system for detecting splice sites in the genomic DNA of various eukaryotes. The system has been trained and tested successfully on Plasmodium falciparum (malaria), Arabidopsis thaliana, and human genomes. Training data sets for human and Arabidopsis thaliana are included. It is fully described in Pertea M, Lin X, Salzberg SL. GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res. 2001 Mar 1;29(5):1185-90.

TransTerm  http://www.tigr.org/software/transterm.html  is a program that finds rho-independent transcription terminators in bacterial genomes. Each terminator found by the program is assigned a confidence value that estimates its probability of being a true terminator. TransTerm is described in Ermolaeva, M.D., Khalak, H.G., White, O., Smith, H.O., and Salzberg, S.L. (2000) Prediction of transcription terminators in bacterial genomes. Journal of Molecular Biology 301, 27-33.

EXONomy  http://www.tigr.org/software/Exonomy/index.shtml  is a new gene finder based on the Generalized Hidden Markov Model (GHMM) framework, similar to Genscan and Genie. It is highly reconfigurable and includes software for retraining. The replaceable submodels of the GHMM include homogeneous and inhomogeneous Markov models of selectable order, nonstationary Markov chains, windowed and non-windowed Weight Array Matrices (WWAM/WAM/WMM), Maximal Dependence Decomposition (MDD) trees, and codon bias. An EXONomy Web Interface is available.

Unveil  http://www.tigr.org/software/Unveil/index.shtml   is a new gene finder based on a 283-state Hidden Markov Model (HMM) similar to that described in [Henderson,J., Salzberg,S., and Fasman,K.H. (1997) J. Comput. Biol. 4, 127-141]. An Unveil Web Interface is available.

ELPH  http://www.tigr.org/software/ELPH/index.shtml  is a general-purpose Gibbs sampler for finding motifs in a set of DNA or protein sequences. The program takes as input a set containing anywhere from a few dozen to thousands of sequences, and searches through them for the most common motif, assuming that each sequence contains one copy of the motif.
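
The one-copy-per-sequence assumption is the setting of the classic site-sampling Gibbs procedure. The sketch below illustrates that procedure for DNA; it is not ELPH's implementation, and the function name, pseudocounts of 1, background model, and fixed iteration count are all assumptions. Each step removes one sequence, builds a profile from the motif occurrences currently chosen in the others, and resamples that sequence's motif position in proportion to the profile-to-background likelihood ratio.

```python
import math
import random
from typing import List

ALPHABET = "ACGT"


def gibbs_motif_search(seqs: List[str], width: int, iterations: int = 2000, seed: int = 0) -> List[int]:
    """Illustrative site sampler: one motif start per sequence (one-copy-per-sequence model).

    Not ELPH's implementation; pseudocounts of 1 and a fixed iteration count are assumptions.
    """
    rng = random.Random(seed)
    # Background base frequencies over all input sequences (add-one smoothed).
    all_bases = "".join(seqs)
    bg = {b: (all_bases.count(b) + 1) / (len(all_bases) + len(ALPHABET)) for b in ALPHABET}
    # Start from random motif positions.
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]

    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence whose motif position is resampled
        # Profile from the other sequences' current motif occurrences, pseudocount 1 per base.
        counts = [{b: 1.0 for b in ALPHABET} for _ in range(width)]
        for j, other in enumerate(seqs):
            if j != i:
                for k, base in enumerate(other[positions[j]:positions[j] + width]):
                    counts[k][base] += 1
        column_total = len(seqs) - 1 + len(ALPHABET)
        # Weight each candidate start in sequence i by its profile/background likelihood ratio.
        target = seqs[i]
        weights = []
        for start in range(len(target) - width + 1):
            log_ratio = sum(
                math.log(counts[k][target[start + k]] / column_total / bg[target[start + k]])
                for k in range(width)
            )
            weights.append(math.exp(log_ratio))
        # Sample the new position in proportion to the weights (the Gibbs step).
        positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions
```

Because the sampler is stochastic and can settle in local optima, it is usual to run it from several seeds and keep the highest-scoring alignment.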

   
RepeatFinder  ftp://ftp.tigr.org/pub/software/repeatFinder/  is a computational system for analyzing the repetitive structure of genomic sequences. The method uses suffix trees to compute exact repeats efficiently and organizes those repeats into classes. It can be applied to individual genome sequences or to sets of sequences. The output is a multi-FASTA file of the repeat sequences found, which can be used as the target of searches.
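
RepeatFinder uses suffix trees; the sketch below swaps in a simpler and slower suffix array, which exposes the same information: exact repeats appear as shared prefixes of suffixes that sort next to each other. The function name and `min_len` parameter are hypothetical, and RepeatFinder's grouping of repeats into classes is not reproduced.

```python
from typing import List, Tuple


def exact_repeats(seq: str, min_len: int = 2) -> List[Tuple[str, List[int]]]:
    """Report exact repeats of length >= min_len with all their start positions.

    Candidate repeats are the longest common prefixes of lexicographically adjacent
    suffixes in a (naively built) suffix array.
    """
    sa = sorted(range(len(seq)), key=lambda i: seq[i:])  # O(n^2 log n); fine for a sketch
    found = set()
    for a, b in zip(sa, sa[1:]):
        lcp = 0  # longest common prefix of the two adjacent suffixes
        while a + lcp < len(seq) and b + lcp < len(seq) and seq[a + lcp] == seq[b + lcp]:
            lcp += 1
        if lcp >= min_len:
            found.add(seq[a:a + lcp])
    result = []
    for rep in sorted(found, key=lambda r: (-len(r), r)):  # longest repeats first
        positions = [i for i in range(len(seq) - len(rep) + 1) if seq.startswith(rep, i)]
        result.append((rep, positions))
    return result
```

For the string "banana", the repeats of length 2 or more are "ana" (at positions 1 and 3) and "na" (at 2 and 4).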

RBSfinder  ftp://ftp.tigr.org/pub/software/RBSfinder/  is a Perl script that implements an algorithm to find ribosome binding sites for genes in bacterial and archaeal genomes. It is normally run as a post-processor to the Glimmer gene finder or to other prokaryotic gene finders.

Combiner  http://www.tigr.org/software/combiner/  is a program that predicts gene models using the output from other annotation software. It uses a statistical algorithm to identify patterns of evidence corresponding to gene models.

HBQCM  ftp://ftp.tigr.org/pub/software/qc/  Hexamer-Based Quality Control Method, as described in White, O., Dunning, T., Sutton, G., Adams, M., Venter, J.C., and Fields, C. (1993) A quality control algorithm for DNA sequencing projects. Nucleic Acids Research 21:3829-3838.

Alignment  

MUMmer  http://www.tigr.org/software/mummer/  A system for aligning whole genome sequences. Using an efficient data structure called a suffix tree, the system can rapidly align sequences containing millions of nucleotides. It is fully described in: A.L. Delcher, S. Kasif, R.D. Fleischmann, J. Peterson, O. White, and S.L. Salzberg. Alignment of whole genomes. Nucleic Acids Research, 27:11 (1999), 2369-2376. A graphical viewer for MUMmer output is also available.
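
MUMmer anchors its alignments on maximal unique matches (MUMs): exact matches that occur exactly once in each sequence and cannot be extended in either direction. The brute-force sketch below only illustrates that definition; it runs in roughly quadratic time, whereas MUMmer uses a suffix tree to find MUMs efficiently and then orders them and closes the gaps between them (not shown). The function names and `min_len` parameter are hypothetical.

```python
from typing import List, Tuple


def _count(text: str, pat: str) -> int:
    """Count possibly overlapping occurrences of pat in text."""
    return sum(1 for i in range(len(text) - len(pat) + 1) if text.startswith(pat, i))


def find_mums(a: str, b: str, min_len: int = 3) -> List[Tuple[int, int, int]]:
    """Return (pos_in_a, pos_in_b, length) for maximal unique matches (naive, illustrative)."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Only start at left-maximal positions (the match cannot be extended leftward).
            if a[i] != b[j] or (i > 0 and j > 0 and a[i - 1] == b[j - 1]):
                continue
            k = 0
            # Extend rightward until a mismatch or the end of either sequence (right-maximal).
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            match = a[i:i + k]
            # Keep the match only if it is long enough and unique in both sequences.
            if k >= min_len and _count(a, match) == 1 and _count(b, match) == 1:
                mums.append((i, j, k))
    return mums
```

For instance, "GATTACA" and "TTACAGG" share the single MUM "TTACA", reported as `(2, 0, 5)`.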

AAT  ftp://ftp.tigr.org/pub/software/AAT/  A tool for analyzing and annotating genomic sequences (Huang, X., Adams, M.D., Zhou, H., and Kerlavage, A.R. (1997) Genomics 46, 37-45). The AAT package includes two sets of programs: one (DPS/NAP) for comparing the query sequence with a protein database, and the other (DDS/GAP2) for comparing the query with a cDNA database.

Sequencing/Finishing  
   
TIGR Assembler  http://www.tigr.org/software/assembler/  A tool for assembling large sets of overlapping sequence data such as ESTs, BACs, or small genomes. This updated version delivers better performance and results than the previous one, taking greater care with repeat detection and contig-level overlaps when assembling EST, BAC, and genome data. TIGR Assembler has been published (Sutton, G., White, O., Adams, M., and Kerlavage, A. (1995) TIGR Assembler: A new tool for assembling large shotgun sequencing projects. Genome Science & Technology 1:9-19). Also available, without a license, is the utility ta2ace for converting TIGR Assembler output into the "new" .ACE format used by Consed and other sequence assembly editors.

BAMBUS  http://www.tigr.org/software/bambus/  is the first publicly available genome sequence scaffolding program. It orders and orients contigs into scaffolds based on various types of linking information. Additionally, BAMBUS allows users to build scaffolds in a hierarchical fashion by prioritizing the order in which links are used. BAMBUS runs on Unix systems.

Lucy  http://www.tigr.org/software/lucy/  A sequence cleanup program. Lucy is a utility that prepares raw DNA sequence fragments for sequence assembly, possibly with TIGR Assembler. The cleanup process includes quality assessment, confidence reassurance, vector trimming, and vector removal. The primary advantage of Lucy over similar utilities is that it is a fully integrated, stand-alone program. A Windows version of Lucy is available from Hui-Hsien Chou's web page. Lucy is fully described in: H.-H. Chou and M.H. Holmes. DNA sequence quality trimming and vector removal. Bioinformatics, 17:12, pp. 1093-1104, 2001.
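
Quality trimming keeps the part of a read whose error rate is acceptably low. The sketch below illustrates the idea in simplified form, assuming Phred-scaled quality values (error probability 10^(-q/10)) and a single average-error threshold; the 0.025 default and the function name are assumptions. It returns the longest region whose mean error probability stays at or below the threshold. Lucy's published algorithm applies further window-based criteria and vector screening that are not reproduced here.

```python
from typing import List, Tuple


def longest_good_region(quals: List[int], max_avg_error: float = 0.025) -> Tuple[int, int]:
    """Return (start, end), end exclusive, of the longest stretch of a read whose mean
    error probability is at most max_avg_error. Phred q maps to error 10**(-q/10).
    Simplified illustration; (0, 0) means no acceptable region was found."""
    errors = [10 ** (-q / 10) for q in quals]
    prefix = [0.0]  # prefix sums make each window's mean an O(1) lookup
    for e in errors:
        prefix.append(prefix[-1] + e)
    best = (0, 0)
    for start in range(len(errors)):
        for end in range(start + 1, len(errors) + 1):
            length = end - start
            if length > best[1] - best[0] and (prefix[end] - prefix[start]) / length <= max_avg_error:
                best = (start, end)
    return best
```

The exhaustive search is quadratic in read length, which is acceptable for a sketch but not for production use.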

Microarray  

TM4: a package of open-source software programs for microarray analysis  http://www.tigr.org/software/tm4/

TIGR Microarray Data Analysis System (MIDAS) is a microarray data quality filtering and normalization tool that lets raw experimental data be processed through various normalizations, filters, and transformations in a user-designed analysis pipeline. Currently implemented normalization and data analysis algorithms include total-intensity normalization, Lowess (Locfit) normalization, flip-dye consistency checking, replicate analysis, and intensity-dependent z-score filtering (slice analysis), among others. MIDAS is written in Java and is therefore platform independent; it requires JDK 1.3 or higher. Refer to the included manual for details.
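
Total-intensity normalization, the first method listed for MIDAS, rests on the assumption that most genes on a two-channel array are not differentially expressed, so both channels should have the same summed intensity. A minimal sketch, assuming the Cy5 channel is the one rescaled and log2 ratios are reported (MIDAS's own conventions may differ; the function name is hypothetical):

```python
import math
from typing import List


def total_intensity_normalize(cy5: List[float], cy3: List[float]) -> List[float]:
    """Illustrative total-intensity normalization for one two-channel array.

    Scales the Cy5 channel so both channels have the same summed intensity (assuming
    most genes are not differentially expressed), then returns per-spot log2(Cy5/Cy3).
    Intensities must be positive.
    """
    factor = sum(cy3) / sum(cy5)
    return [math.log2(red * factor / green) for red, green in zip(cy5, cy3)]
```

After this scaling, a spot with a log2 ratio of 0 is unchanged relative to the array as a whole, while -1 and +1 correspond to two-fold lower and higher relative expression in the Cy5 sample.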

MADAM (MicroArray DAta Manager)  Even the simplest microarray experiments produce large amounts of data. To analyze data from many experiments, those data must be stored in an accessible form, such as a database. MADAM is a Java-based application designed to load microarray data into, and retrieve it from, a database (also supplied with the software). MADAM provides data entry forms, data report forms, and the additional applications needed to maintain microarray data for further analysis. MADAM requires JRE 1.3.1.

TIGR MultiExperiment Viewer (MEV) is a Java application for analyzing microarray data to identify patterns of gene expression and differentially expressed genes. Numerous normalization, clustering, and distance algorithms have been implemented, along with a variety of graphical displays for presenting the results. MEV was written to be flexible and expandable, and supports a variety of input and output formats. MEV requires version 1.2 or higher of Sun's JRE and the Java 3D (J3D) package.

TIGR Spotfinder is a software tool for microarray image processing that works with the TIFF image files generated by most microarray scanners. TIGR Spotfinder was written in C/C++ for PCs running Windows NT/2000/ME/XP.


ArrayViewer  http://www.tigr.org/tigr-scripts/license/new.pl?genre=soft&program=ArrayViewer  A software tool designed to facilitate the presentation and analysis of microarray expression data, leading to the identification of differentially expressed genes. ArrayViewer is written in Java for cross-platform compatibility and reads and writes data using flat files or a database through stored procedures. Machines that lack the requirements for the MultiExperiment Viewer can use ArrayViewer for single-experiment analysis.

TIGR McCoder  ftp://ftp.tigr.org/pub/software/Microarray/McCoder/  is a software package that lets a portable bar code scanner running Palm OS collect bar codes and transfer them to a PC as a plain text file. The package includes two programs: one that runs on the handheld scanner and one that runs on a regular PC with Windows 95/98/2000/NT. Once transferred to the PC, the scanned bar codes can be manipulated easily with McCoder.

Scheduler  ftp://ftp.tigr.org/pub/software/Microarray/Scheduler/  is a web-based tool that provides an efficient reservation method for managing lab instruments and office facilities. The Scheduler is designed as a two-tier system running on the Internet and can be configured to meet a variety of requirements.


NCBI Tools  http://www.ncbi.nlm.nih.gov/ 

The Basic Local Alignment Search Tool (BLAST  http://www.ncbi.nlm.nih.gov/BLAST/),  for comparing gene and protein sequences against others in public databases, now comes in several flavors including PSI-BLAST, PHI-BLAST, and BLAST 2 sequences. Specialized BLASTs are also available for human, microbial, malaria, and other genomes, as well as for vector contamination, immunoglobulins, and tentative human consensus sequences.

Clusters of Orthologous Groups (COGs  http://www.ncbi.nlm.nih.gov/COG/)  currently covers 21 complete genomes from 17 major phylogenetic lineages. A COG is a cluster of very similar proteins found in at least three species. The presence or absence of a protein in different genomes can tell us about the evolution of the organisms, as well as point to new drug targets.

Map Viewer  http://www.ncbi.nlm.nih.gov/mapview/static/MVstart.html  shows integrated views of chromosome maps for 17 organisms. Used to view the NCBI assembly of complete genomes, including human, Map Viewer is a valuable tool for the identification and localization of genes, particularly those that contribute to diseases.  

LocusLink  http://www.ncbi.nlm.nih.gov/LocusLink/  combines descriptive and sequence information on genetic loci through a single query interface. LocusLink covers information on official nomenclature, aliases, sequence accessions, phenotypes, EC numbers, OMIM numbers, UniGene clusters, homology, map information, and related web sites.

UniGene  http://www.ncbi.nlm.nih.gov/UniGene/  Each UniGene cluster is a non-redundant set of sequences that represents a unique gene. Well-characterized genes, as well as thousands of expressed sequence tag (EST) sequences, have been included. Each cluster record also contains information such as the tissue types in which the gene has been expressed and its map location. UniGene can assist in gene discovery, gene mapping projects, and large-scale expression analysis.

ORF finder  http://www.ncbi.nlm.nih.gov/gorf/gorf.html   identifies all possible ORFs in a DNA sequence by locating the standard and alternative stop and start codons. The deduced amino acid sequences can then be used to BLAST against GenBank. ORF finder is also packaged in the sequence submission software Sequin. 
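
The basic ORF search can be sketched in a few lines: scan each of the six reading frames for an ATG and report the stretch up to the next in-frame stop codon. This is a simplified illustration with hypothetical names; it uses only ATG as a start and TAA/TAG/TGA as stops (the standard genetic code), whereas NCBI's ORF finder also handles alternative start codons and genetic codes. Nested ORFs are collapsed to the longest one per stop codon.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[str, int, int]]:
    """Return (strand, start, end) for ATG...stop ORFs in all six frames.

    Coordinates are 0-based and end-exclusive, relative to the strand scanned
    ("-" means the reverse complement). min_codons counts the stop codon too.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop opens an ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs
```

For example, `find_orfs("CCATGAAATTTTAGCC", min_codons=4)` finds the four-codon ORF ATG AAA TTT TAG on the forward strand, at positions 2 to 14.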

Electronic PCR  http://www.ncbi.nlm.nih.gov/genome/sts/epcr.cgi   allows you to search your DNA sequence for sequence tagged sites (STSs), which have been used as landmarks in various types of genomic maps. It compares the query sequence against data in NCBI's UniSTS, a unified, non-redundant view of STSs from a wide range of sources.
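
The core idea of electronic PCR is that an STS is detected when both of its primers match the query in the correct orientation, separated by about the expected product size. The sketch below illustrates only that idea. The exact-match rule, the `margin` tolerance, the top-strand-only search, and all names are assumptions; NCBI's e-PCR is more sophisticated and is not reproduced here.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def electronic_pcr(seq: str, fwd: str, rev: str, expected_size: int, margin: int = 50) -> List[Tuple[int, int]]:
    """Return (start, end) of in-silico amplicons on the top strand (illustrative sketch).

    An amplicon is the forward primer followed downstream by the reverse complement of
    the reverse primer, with a product length within `margin` bp of expected_size.
    Exact primer matches only.
    """
    seq, fwd, rev_rc = seq.upper(), fwd.upper(), revcomp(rev.upper())
    hits = []
    for i in range(len(seq) - len(fwd) + 1):
        if seq.startswith(fwd, i):
            for j in range(i, len(seq) - len(rev_rc) + 1):
                end = j + len(rev_rc)
                if seq.startswith(rev_rc, j) and abs((end - i) - expected_size) <= margin:
                    hits.append((i, end))
    return hits
```

An STS lying on the opposite strand can be found by running the same search on the reverse complement of the query.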

VAST Search  http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html   is a structure-structure similarity search service. It compares 3D coordinates of a newly determined protein structure to those in the MMDB/PDB database. VAST Search computes a list of similar structures that can be browsed interactively, using molecular graphics to view superimpositions and alignments.  

The Cancer Chromosome Aberration Project (CCAP)  http://www.ncbi.nlm.nih.gov/CCAP/  compiles information on the distinct chromosome aberrations that are associated with different cancers. The identification of chromosomal abnormalities by clinicians can enable the diagnosis of, classification of, and treatment selection for a given cancer. 

Human-Mouse Homology Maps  http://www.ncbi.nlm.nih.gov/Homology/  compare genes in homologous segments of DNA from human and mouse, sorted by position in each genome. A total of 1793 loci are presented, most of which are genes. Because further information is lacking for about half the loci, the map should be interpreted as reflecting probable, not confirmed, homology relationships.

VecScreen  http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html  is a tool for identifying segments of a nucleic acid sequence that may be of vector, linker, or adapter origin prior to sequence analysis or submission. VecScreen was developed to combat the problem of vector contamination in public sequence databases.

dbMHC provides an open, publicly accessible platform for DNA and clinical data related to the human Major Histocompatibility Complex (MHC). In addition, dbMHC will provide tools for submitting and analyzing research data linked to the MHC.

The Cancer Genome Anatomy Project (CGAP)  http://www.ncbi.nlm.nih.gov/ncicgap/  aims to decipher the molecular anatomy of cancer cells. CGAP develops profiles of cancer cells by comparing gene expression in normal, precancerous, and malignant cells from a wide variety of tissues. 

mRNA to Genomic Alignments: Spidey  http://www.ncbi.nih.gov/IEB/Research/Ostell/Spidey  aligns one or more mRNA sequences to a single genomic sequence. Spidey will try to determine the exon/intron structure, returning one or more models of the genomic structure, including the genomic/mRNA alignments for each exon. 


Biology WorkBench  http://biowb.sdsc.edu/

Protein Tools (PS = protein sequence, NS = nucleotide sequence, DB = database)

Ndjinn  Multiple Database Search
BL2SEQ  Compare proteins to each other with BLAST
BL2SEQX  Compare a protein to nucleotide sequences with BLAST
BLASTP  Compare a PS to a PS DB
TBLASTN  Compare a PS to a translated DB
PSIBLASTP  Position Specific Iterative BLAST
FASTA  Heuristic Sequence Similarity Search (PS or DB)
TFASTA  Compare a PS to a NS, PS DB
TFASTX  Compare PS to Translated DNA (NS or DB)
TFASTY  Compare PS to Translated DNA (NS or DB)
SSEARCH  Smith Waterman Local Alignment of Proteins
CLUSTALW  Multiple Sequence Alignment
CLUSTALWPROF  Align Sequences to Existing Alignment (Profile)
ALIGN  Optimal Global Alignment of Two PS
MSA  Multiple Sequence Alignment (Sum of Pairs Criterion)
LALIGN  Calculate N Best Local PS Alignments
LFASTA  Local Alignment of Two PS
ROBUST  Global alignment of Two PS (Show Robust Pairs)
SIM  N Best Local Similarities Using Affine Weights
BESTSCOR  Calculate the Best Self Comparison Score
CTREE  Align protein sequences with confidence estimates
PRSS  Compare a PS to a Shuffled PS
SAPS  Statistical Analysis of PS
AASTATS  Statistics Based on Amino Acid Abundance, including weight and specific volume
GREASE  Kyte-Doolittle Hydropathy Profile
RPSBLAST  Compare a PS to a Conserved Domain DB
FINGERPRINTSCAN  PRINTS fingerprint identification
PROSEARCH  Search Prosite DB for Patterns in a PS
PPSEARCH  Search Prosite DB for Patterns in a PS
PFSCAN  Sequence Search Against a Set of Profiles (PROSITE and PFAM)
HMMPFAM  Search against Pfam HMM database
BLIMPS  Sequence Search Against a Set of Profiles (BLOCKS)
PATTERNMATCHDB  Search for Regular Expressions (Patterns) in a protein sequence DB
PATTERNMATCH  Search for Regular Expressions (Patterns) in a protein sequence
GOR4  Predict Secondary Structure of PS
RANDSEQ  Randomize a Sequence
CHOFAS  Predict Secondary Structure of PS(s) (Chou-Fasman)
HTH  Predict HTH Motifs in Protein Chains
PELE  Protein Structure Prediction
DSSP  Secondary Structure/Solvent Exposure of PDB Proteins
TMAP  Prediction of Transmembrane Segments
TMHMM  Predict location of transmembrane helices and location of intervening loop regions
EXTCOEF  Extinction coefficient calculation
PI  Isoelectric point determination
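
A hydropathy profile such as the one GREASE plots is a sliding-window average of per-residue Kyte-Doolittle values; long stretches of high values suggest buried or membrane-spanning segments. A minimal sketch using the published Kyte-Doolittle scale; the function name and the default window of 9 residues are assumptions, not GREASE's settings.

```python
from typing import List

# Kyte & Doolittle (1982) hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> List[float]:
    """Mean Kyte-Doolittle value for each full window along the sequence (illustrative)."""
    values = [KD[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]
```

Wider windows smooth the profile more; the window length is usually chosen to match the length of the feature being sought.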

Nucleic Acid Tools 

 BL2SEQ  Compare nucleotides to each other with BLAST
 BL2SEQX  Compare a nucleotide to protein sequences with BLAST
 BLASTN  Compare a NS to a NS DB
 BLASTX  Compare a PS Derived from NS to a PS DB
 TBLASTX  Compare a translated NS to a translated DB
 FASTA  Nucleic Acid Sequence Comparisons (NS or DB)
 FASTX  Compare Translated NS to PS DB
 FASTY  Compare Translated NS to PS DB
 SSEARCH  Smith-Waterman Local Sequence Alignment
 CLUSTALW  Multiple Sequence Alignment
 CLUSTALWPROF  Align Sequences to Existing Alignment (Profile)
 ALIGN  Optimal Global Sequence Alignment
 LALIGN  Calculate Optimal Local Sequence Alignments
 LFASTA  Calculate Local Sequence Alignments (Heuristic)
 PATTERNMATCHDB  Search for Regular Expressions (Patterns) in a nucleic sequence DB
 PATTERNMATCH  Search for Regular Expressions (Patterns) in a nucleic sequence
 TACG  Analyze a NS for Restriction Enzyme Sites
 PRIMER3  Design Primer Pairs and Probes
 NASTATS  Nucleic Acid Statistics
 BESTSCOR  Calculate the Best Self Comparison Score
 PFSCAN  Sequence Search Against a Set of Profiles (PROSITE)
 PRIMERCHECK  Calculates melting point, length, %GC for a primer sequence
 PRIMERTM  Designs end primers based on a minimum Tm
 SIXFRAME  Generate & Import 6 Frame Translations on a NS
 REVCOM  Generate Reverse Complement of NS
 RANDSEQ  Randomize a Sequence
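
Two of the simpler nucleic acid utilities listed above, REVCOM and PRIMERCHECK, can be sketched directly. The melting temperature below uses the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, a rough approximation for short oligonucleotides; the formula PRIMERCHECK actually uses is not stated here, so this choice and the function names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercased first)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_stats(primer: str) -> dict:
    """Length, %GC and a Wallace-rule melting temperature: Tm = 2(A+T) + 4(G+C) degrees C."""
    p = primer.upper()
    gc = p.count("G") + p.count("C")
    at = p.count("A") + p.count("T")
    return {"length": len(p), "percent_gc": 100.0 * gc / len(p), "tm_wallace": 2 * at + 4 * gc}
```

For the primer ATGCATGC this gives a length of 8, 50% GC, and a Wallace Tm of 24 °C.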


Alignment Tools 

 Ndjinn  Multiple Database Search
 SPLIT  Split Alignment Into Component Sequences
 DEGAP_SPLIT  Split Alignment Into Component Sequences and Remove Gap Characters
Download Aligned Sequences 
 TEXSHADE  Color Coded Plots of Pre Aligned Sequences
 BOXSHADE  Color Coded Plots of Pre Aligned Sequences
 CLUSTALWPROF  Align Two Existing Alignments (Profiles)
 TMAP  Prediction of Transmembrane Segments
 DRAWTREE  Draw Unrooted Phylogenetic Tree from Alignment
 DRAWGRAM  Draw Rooted Phylogenetic Tree from Alignment
 CLUSTALDIST  Generate Distance Matrix with Clustal W
 CLUSTALTREE  Phylogenetic Analysis with Clustal W
 DNADIST  Compute Evolutionary Distance Matrix from NS Alignment
 PROTDIST  Compute Evolutionary Distance Matrix from PS Alignment
 DNAPARS  Infer an Unrooted Phylogeny from NS Alignment
 PROTPARS  Infer an Unrooted Phylogeny from PS Alignment
 MVIEW  Multiple Alignment Display
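
DNADIST-style tools turn an alignment into a matrix of evolutionary distances that tree-building programs then use. DNADIST supports several substitution models; the sketch below uses the simplest, the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites. Skipping gapped columns pair by pair is an assumption about gap handling, and the function name is hypothetical.

```python
import math
from typing import List


def jukes_cantor_matrix(aligned: List[str]) -> List[List[float]]:
    """Pairwise Jukes-Cantor distances d = -3/4 * ln(1 - 4p/3) for aligned DNA sequences.

    p is the proportion of differing sites; columns with a gap ('-') in either sequence
    of a pair are skipped (an assumption about gap handling).
    """
    n = len(aligned)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            pairs = [(a, b) for a, b in zip(aligned[i], aligned[j]) if a != "-" and b != "-"]
            if not pairs:
                raise ValueError(f"sequences {i} and {j} share no ungapped columns")
            p = sum(a != b for a, b in pairs) / len(pairs)
            # The correction is undefined once p reaches 3/4 (saturation).
            d = math.inf if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)
            dist[i][j] = dist[j][i] = d
    return dist
```

For two aligned sequences that differ at one site in four (p = 0.25), the corrected distance is about 0.304 substitutions per site, slightly larger than the raw proportion because it accounts for multiple substitutions at the same site.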


Structure Tools 

 PDF  PDF Knowledge
 CONVERT  File format conversion utility
 TNT  Macromolecular Refinement Package