Link to Application:
Plot Protein with Conservation
Description:
This tool takes mutation information at the protein level and plots each mutation above a schematic of the protein. It also plots the protein's domains and provides a table (under the Table tab) that annotates each amino acid change with its domain and post-translational modification information.
Notes:
All files must refer to the same isoform of the protein; this is essential for drawing the plot correctly. Images can be saved by right-clicking on the plot and saving it as a picture. For best results, run this application in Firefox or Google Chrome.
Required files:
Mutation file: tab-delimited file containing 5 columns (ProteinId, GeneName, ProteinPositionOfMutation, ReferenceAminoAcid, AlternateAminoAcid). NO HEADER IS NEEDED FOR THIS FILE.
Protein architecture file: tab-delimited file containing 3 columns (architecture_name, start_site, end_site). This file NEEDS a header row, with the column names exactly as written above. This information can be downloaded from HPRD (http://hprd.org/) or compiled from the literature.
Post-translational modification file: tab-delimited file with a single column, the site. This file NEEDS a header row, with the column name exactly as written above (site).
Alignment file: a multiple sequence alignment in FASTA format, such as one produced by MUSCLE (http://www.ebi.ac.uk/Tools/msa/muscle/).
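The four input formats above can be read with a few lines of standard-library Python. This is a minimal sketch, not the application's own parser: the function names (read_mutations, read_architecture, read_ptm_sites, read_fasta) are hypothetical, and it assumes the PTM header is the single word site and that blank lines can be ignored.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Mutation:
    protein_id: str
    gene_name: str
    position: int   # 1-based position in the reference isoform
    ref_aa: str
    alt_aa: str


def read_mutations(text: str) -> list[Mutation]:
    """Parse the headerless, 5-column tab-delimited mutation file."""
    mutations = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not "".join(row).strip():
            continue  # skip blank lines (assumption: they may occur)
        protein_id, gene, pos, ref, alt = row
        mutations.append(Mutation(protein_id, gene, int(pos), ref, alt))
    return mutations


def read_architecture(text: str) -> list[tuple[str, int, int]]:
    """Parse the architecture file; its header row supplies the column names."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [(r["architecture_name"], int(r["start_site"]), int(r["end_site"]))
            for r in reader]


def read_ptm_sites(text: str) -> list[int]:
    """Parse the one-column PTM file (header assumed to be 'site')."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [int(r["site"]) for r in reader]


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, aligned sequence) pairs in file order."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            records.append((line[1:], ""))
        elif line and records:
            name, seq = records[-1]
            records[-1] = (name, seq + line)  # join wrapped sequence lines
    return records
```

Keeping the file order in read_fasta matters, because the Reference Sequence Position In File value refers to that order.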
Additional Information Required:
Reference Sequence Position In File: the position of your reference sequence within the aligned FASTA file. For example, if human is your reference and it is the 4th sequence in the aligned file, enter 4.
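Because this position is counted from 1 in file order, a tool presumably has to convert it to a 0-based index. And since mutation positions refer to the ungapped protein while the alignment contains gaps, each residue number of the reference also has to be mapped to an alignment column. The sketch below shows one way to do that. The function name reference_column_map is hypothetical, and treating '-' and '.' as gap characters is an assumption.

```python
def reference_column_map(aligned: list[str], ref_position: int) -> dict[int, int]:
    """Map each 1-based residue number of the reference to its 0-based alignment column.

    ref_position is the 1-based 'Reference Sequence Position In File'.
    Assumption: '-' and '.' are the only gap characters.
    """
    if not 1 <= ref_position <= len(aligned):
        raise ValueError(f"reference position {ref_position} is outside 1..{len(aligned)}")
    reference = aligned[ref_position - 1]  # 1-based input -> 0-based list index
    mapping = {}
    residue = 0
    for column, aa in enumerate(reference):
        if aa in "-.":
            continue  # gap in the reference: no protein position at this column
        residue += 1
        mapping[residue] = column
    return mapping
```

For example, with the alignment ["MKALV", "M--LI", "MRALV", "MK-LV"] and a reference position of 4, the reference is MK-LV, so residues 1 to 4 map to columns 0, 1, 3 and 4.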
Additional Options:
Name of Query: Any name you want to show up on the plot
X-axis Tick Size: What tick size you want on the x-axis
Show Labels: Whether or not you want to see the mutation labels (these are derived from the information you provide in your mutation file).
Show Conservation Score: If selected, this will show the score as calculated by the formula in the Calculation of Conservation section below. (This works best when zoomed into a region)
Show Reference Sequence: If selected, this will show the reference sequence for your user-defined reference. (This works best when zoomed into a region)
Show Gridlines at Ticks: If selected, this will show full vertical grid lines at each of your x-axis ticks.
Zoom In: Lets you specify the start and end positions of a region you would like to look at specifically.
Second Mutation File for Plot: Lets you plot a second set of mutation data. This is useful when you want to compare 2 sets, such as cases and controls.
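As a rough illustration of how the Zoom In, X-axis Tick Size and Show Gridlines at Ticks options interact, the sketch below picks tick (and gridline) positions inside a zoom window and keeps only the mutations in that window. It is a hypothetical sketch: the function names are invented, and placing ticks at multiples of the tick size and treating the window as inclusive at both ends are assumptions, not documented behaviour of the application.

```python
def ticks_in_window(start: int, end: int, tick_size: int) -> list[int]:
    """Return every multiple of tick_size inside [start, end], inclusive."""
    if tick_size <= 0:
        raise ValueError("tick size must be a positive integer")
    if start > end:
        raise ValueError("zoom start must not be greater than zoom end")
    first = -(-start // tick_size) * tick_size  # ceiling division: first multiple >= start
    return list(range(first, end + 1, tick_size))


def mutations_in_window(positions: list[int], start: int, end: int) -> list[int]:
    """Keep only mutation positions that fall inside the zoomed region."""
    return [p for p in positions if start <= p <= end]
```

For a zoom window of 95 to 260 with a tick size of 50, ticks_in_window returns [100, 150, 200, 250]; gridlines would be drawn at the same positions.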
Example Input Files:
Example files for this application are attached at the bottom of the page. The reference sequence position in the example alignment file is 4.
Calculation of Conservation:
To generate the conservation track, the multiple sequence alignment is read in and, at each position, all sequences are compared to the user-defined reference. The score (s) is defined as:
s = n / t
where n is the number of sequences with the same amino acid as the reference at that position and t is the total number of sequences queried at that position.
The score is between 0 and 1 with 0 indicating no other sequences matching the reference at that position and 1 indicating all sequences matching the reference at that position.
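The formula can be sketched in a few lines of Python. This is an illustration, not the application's code: the function name conservation_scores is hypothetical, and three details the description leaves open are filled in as labeled assumptions. The reference is excluded from both n and t (so that s can reach 0, as stated above), a gap in another sequence counts toward t but never toward n, and columns where the reference has a gap are skipped.

```python
def conservation_scores(aligned: list[str], ref_position: int) -> list[float]:
    """Return s = n / t for each residue of the reference, in protein order.

    Assumptions (not taken from the application's source code):
    - the reference is excluded from both n and t, so s can reach 0;
    - a gap in another sequence counts toward t but never toward n;
    - alignment columns where the reference has a gap are skipped.
    """
    if len(aligned) < 2:
        raise ValueError("need the reference plus at least one other sequence")
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    if not 1 <= ref_position <= len(aligned):
        raise ValueError("reference position is outside the alignment")

    ref_index = ref_position - 1  # 1-based input -> 0-based index
    reference = aligned[ref_index].upper()
    others = [seq.upper() for i, seq in enumerate(aligned) if i != ref_index]
    t = len(others)  # every other sequence is queried at every column

    scores = []
    for column, ref_aa in enumerate(reference):
        if ref_aa in "-.":
            continue  # no reference residue at this column
        n = sum(1 for seq in others if seq[column] == ref_aa)
        scores.append(n / t)
    return scores
```

With the alignment ["MKLI", "MRLV", "M-LV", "MKLV"] and a reference position of 4, the reference is MKLV and the scores are 1, 1/3, 1 and 2/3: at the second residue only one of the three other sequences has K.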
Example Output Using Few Options and Not Zoomed In
Example Output Using Many Options and Zoomed In
Table