Pro-origami
Pro-origami is a system for automatically generating protein structure cartoons. The cartoons are intended to make protein structure easy to interpret by laying out the secondary and super-secondary structure in two dimensions in a manner that makes the structure clear.
Installing the Pro-origami software
The Pro-origami software for automatically generating protein structure cartoons is usually used via the Pro-origami server. However, you can also install the software on your own system should you need to, for example to keep using it if the server is down, or to handle large structures that cause gateway timeout errors on the server. A 64-bit Linux system is required. You do not need root (administrator) access on the system, and you can install it anywhere (somewhere under your home directory, for example). It requires only about 122 MB of disk space.
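Before installing, it may be worth confirming the two requirements just mentioned. The following is only a minimal sketch, assuming a POSIX shell with uname, df and awk available; the function names are hypothetical and not part of Pro-origami, and accepting only x86_64 as "64-bit" is an assumption about the binaries in the package.

```shell
# Hypothetical pre-install checks; these helpers are not part of Pro-origami.

# Succeed if the kernel name ($1) and machine type ($2) describe 64-bit Linux.
# Accepting only x86_64 is an assumption about the binaries in the CDE package.
is_64bit_linux() {
    [ "$1" = "Linux" ] && [ "$2" = "x86_64" ]
}

# Succeed if the filesystem holding directory $1 has at least $2 MB free
# (default 122, the approximate figure quoted above).
has_free_space() (
    need_mb=${2:-122}
    # df -Pk prints one line per filesystem; field 4 is the free space in KB.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_mb * 1024)) ]
)

# Example: check the current machine and the intended install location.
# is_64bit_linux "$(uname -s)" "$(uname -m)" && has_free_space "$HOME" &&
#     echo "requirements look satisfied"
```

The kernel name and machine type are passed in as arguments rather than read inside the function, so the check can be tried with other values without changing machines.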
As described on the Pro-origami "About" page, Pro-origami consists of a large number of software components depending on many libraries, so the easiest way to install it is via the CDE (Code, Data, and Environment) package which you can download from http://munk.cis.unimelb.edu.au/pro-origami/proorigami-cde-package.tar.gz. I have also stored a copy at https://stivalaa.github.io/AcademicWebsite/software/proorigami-cde-package.tar.gz in case the original server is unavailable.
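Since there are two download locations, one way to fetch the package is to try the original server first and fall back to the mirror copy. This is a sketch only: fetch_first and the FETCH_CMD variable are hypothetical names introduced here, and wget is assumed to be installed (it is the downloader used in the transcript further down).

```shell
# URLs as given above; the second is the author's mirror copy.
PRIMARY_URL=http://munk.cis.unimelb.edu.au/pro-origami/proorigami-cde-package.tar.gz
MIRROR_URL=https://stivalaa.github.io/AcademicWebsite/software/proorigami-cde-package.tar.gz

# fetch_first OUTFILE URL...   (hypothetical helper)
# Try each URL in order, stop at the first successful download, and print
# the URL that worked. FETCH_CMD may name another downloader that accepts
# "OUTFILE URL" as its last two arguments.
fetch_first() (
    out=$1
    shift
    for url in "$@"; do
        if ${FETCH_CMD:-wget -q -O} "$out" "$url"; then
            printf '%s\n' "$url"
            exit 0
        fi
        rm -f "$out"    # discard any partial download before the next attempt
    done
    exit 1
)

# Example:
# fetch_first proorigami-cde-package.tar.gz "$PRIMARY_URL" "$MIRROR_URL"
```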
The instructions for doing this on the Pro-origami "About" page are very brief and a little misleading, as the third and final step shows the command
./make_cartoon.sh.cde /tmp/pdb1qlp.ent.gz
to make a cartoon from a PDB file in the /tmp directory. This works, but it gives the impression that you can use a PDB file from anywhere. That is not really true: because of the CDE environment, it only works for locations that are not intercepted by the CDE system. So it will fail if you try to use a PDB file from under /home or /usr, for example. The simplest solution is to copy the PDB files you want into the same directory as the make_cartoon.sh.cde script (which will be your working directory when you run it).
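The copy-first workaround can be wrapped in a small helper so that the script is always given a bare file name in the working directory. This is a minimal sketch; stage_pdb is a hypothetical name and the paths in the example are placeholders.

```shell
# stage_pdb SRC WORKDIR   (hypothetical helper, not part of Pro-origami)
# Copy a structure file into the package working directory and print the
# bare file name, which is what should be passed to ./make_cartoon.sh.cde.
stage_pdb() (
    src=$1
    workdir=$2
    cp "$src" "$workdir"/ || exit 1
    basename "$src"
)

# Example (placeholder paths):
# cd ~/proorigami-cde-package/cde-root/home/proorigami
# ./make_cartoon.sh.cde "$(stage_pdb ~/structures/1abc.pdb.gz .)"
```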
Here is a transcript of a shell session showing exactly how to install and use Pro-origami on your own system, to create a cartoon for the cryo-EM structure of glutamate dehydrogenase from Thermococcus profundus in complex with NADP, PDB identifier 8HIZ (the commands you enter are those that follow the $ prompt):
alex@alex-Inspiron-15-3567:~$ wget http://munk.cis.unimelb.edu.au/pro-origami/proorigami-cde-package.tar.gz
--2023-02-09 09:07:48-- http://munk.cis.unimelb.edu.au/pro-origami/proorigami-cde-package.tar.gz
Resolving munk.cis.unimelb.edu.au (munk.cis.unimelb.edu.au)... 128.250.59.61
Connecting to munk.cis.unimelb.edu.au (munk.cis.unimelb.edu.au)|128.250.59.61|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 45761074 (44M) [application/x-gzip]
Saving to: ‘proorigami-cde-package.tar.gz’
proorigami-cde-pack 100%[===================>] 43.64M 3.00MB/s in 15s
2023-02-09 09:08:03 (2.97 MB/s) - ‘proorigami-cde-package.tar.gz’ saved [45761074/45761074]
alex@alex-Inspiron-15-3567:~$ tar zxf proorigami-cde-package.tar.gz
alex@alex-Inspiron-15-3567:~$ cd proorigami-cde-package/cde-root/home/proorigami/
alex@alex-Inspiron-15-3567:~/proorigami-cde-package/cde-root/home/proorigami$ cp ~/Downloads/8hiz.pdb.gz .
alex@alex-Inspiron-15-3567:~/proorigami-cde-package/cde-root/home/proorigami$ ./make_cartoon.sh.cde 8hiz.pdb.gz
!!! Residue GLN 13 A has 1 instead of expected 5 sidechain atoms. Calculated solvent accessibility refers to incomplete sidechain !!!
!!! Residue ASP 22 A has 1 instead of expected 4 sidechain atoms. Calculated solvent accessibility refers to incomplete sidechain !!!
!!! Residue LYS 113 A has 1 instead of expected 5 sidechain atoms. Calculated solvent accessibility refers to incomplete sidechain !!!
!!! Residue LYS 211 A has 1 instead of expected 5 sidechain atoms. Calculated solvent accessibility refers to incomplete sidechain !!!
!!! Residue ARG 246 A has 1 instead of expected 7 sidechain atoms. Calculated solvent accessibility refers to incomplete sidechain !!!
!!! Residue ARG 265 A has 1 instead of expected 7 sidechain atoms. Calculated solvent accessibility refers to incomplete sidechain !!!
!!! Residue GLU 266 A has 1 instead of expected 5 sidechain atoms. Calculated solvent accessibility refers to incomplete sidechain !!!
!!! Residue LYS 271 A has 1 instead of expected 5 sidechain atoms. Calculated solvent accessibility refers to incomplete sidechain !!!
!!! Residue LYS 304 A has 1 instead of expected 5 sidechain atoms. Calculated solvent accessibility refers to incomplete sidechain !!!
!!! Residue LYS 419 A has 6 instead of expected 5 sidechain atoms. last sidechain atom name is OXT calculated solvent accessibility includes extra atoms !!!
WARNING: (helix clustering) no reference strand for helix ALPHA ALPHAHELIX_A_8[366..391]
WARNING: (helix clustering) no reference strand for helix ALPHA ALPHAHELIX_A_9[395..414]
overlap count for 8HIZ-1.svg (default) is 0
going back to to default gapsize (overlap count 0) for 8HIZ-1.svg
overlap count for 8HIZ-2.svg (default) is 0
going back to to default gapsize (overlap count 0) for 8HIZ-2.svg
Gtk-Message: Failed to load module "gail"
Gtk-Message: Failed to load module "atk-bridge"
Gtk-Message: Failed to load module "canberra-gtk-module"
Background RRGGBBAA: ffffffff
Area 0:0:284:1193 exported to 284 x 1193 pixels (90 dpi)
Bitmap saved as: 8HIZ-1.png
Gtk-Message: Failed to load module "gail"
Gtk-Message: Failed to load module "atk-bridge"
Gtk-Message: Failed to load module "canberra-gtk-module"
Background RRGGBBAA: ffffffff
Area 0:0:1063:779 exported to 1063 x 779 pixels (90 dpi)
Bitmap saved as: 8HIZ-2.png
alex@alex-Inspiron-15-3567:~/proorigami-cde-package/cde-root/home/proorigami$ display 8HIZ-1.png &
alex@alex-Inspiron-15-3567:~/proorigami-cde-package/cde-root/home/proorigami$ inkscape 8HIZ-1.svg
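In the session above, the run produced the files 8HIZ-1.svg, 8HIZ-2.svg, 8HIZ-1.png and 8HIZ-2.png in the working directory. To collect the cartoons for a given structure, something like the following sketch may help. It assumes, based only on this one transcript, that output files are named with the upper-case PDB identifier followed by a hyphen and a number; list_cartoons is a hypothetical helper.

```shell
# list_cartoons PDB_ID [DIR]   (hypothetical helper, not part of Pro-origami)
# Print the SVG and PNG cartoons found for a PDB identifier in DIR
# (default: current directory). The ID-N.svg / ID-N.png naming pattern is
# an assumption taken from the transcript above.
list_cartoons() (
    id=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    dir=${2:-.}
    for f in "$dir/$id"-*.svg "$dir/$id"-*.png; do
        # An unmatched glob stays as a literal pattern, so test for existence.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
)

# Example:
# list_cartoons 8hiz
```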
Unfortunately, I no longer have access to log in to the Pro-origami server to fix problems or update documentation. The CDE software also no longer seems to be available anywhere, and the links to its documentation on the Pro-origami "About" page no longer work. For more information on CDE, see:
Guo, P. J., & Engler, D. R. (2011, June). CDE: Using System Call Interposition to Automatically Create Portable Software Packages. In USENIX Annual Technical Conference (Vol. 21). https://www.usenix.org/legacy/event/atc11/tech/final_files/GuoEngler.pdf
Citation
If you use Pro-origami in your research, please cite:
Dwyer, T., Marriott, K., Wybrow, M. (2009). Dunnart: A Constraint-Based Network Diagram Authoring Tool. In: Tollis, I.G., Patrignani, M. (eds) Graph Drawing. GD 2008. Lecture Notes in Computer Science, vol 5417, pp. 420-431. Springer, Berlin, Heidelberg. doi:10.1007/978-3-642-00219-9_41
Stivala, A., Wybrow, M., Wirth, A., Whisstock, J. C., & Stuckey, P. J. (2011). Automatic generation of protein structure cartoons with Pro-origami. Bioinformatics, 27(23), 3315-3316. doi:10.1093/bioinformatics/btr575
Changing the options
The options for generating the cartoon can be changed by editing the PTGRAPH2_OPTIONS string in the make_cartoon.sh script. The available options can be shown by running the ptgraph2.py script with no parameters (note: always run it via cde-exec, from the proorigami-cde-package/cde-root/home/proorigami directory):
[stivala@icslogin01 proorigami]$ pwd
/home/stivala/proorigami-cde-package/cde-root/home/proorigami
[stivala@icslogin01 proorigami]$ ../../../cde-exec ../../usr/local/proorigami-prod/ptgraph/ptgraph2.py
Usage: ../../usr/local/proorigami-prod/ptgraph/ptgraph2.py [-35acdnhrmvixgqzuwy] [ -o sse_color_scheme] [ -l connector_color_scheme ] [ -b sse_label_scheme ] [ -k <color> ] [ -g <separation> ] [ -e <color_list>|auto ] [ -f <color_list>|auto ] [-p domain_prog] [-t struct_prog] PDBfile
  -3 include 3_10 helices in diagram
  -5 include pi helices in diagram
  -a use Dunnart automatic graph layout
  -c use HELIX and SHEET cards from PDB file
  -d use GraphViz dot instead of Dunnart SVG
  -n use GraphViz neato instead of Dunnart SVG
  -p domain_prog : use domain_prog to parse domains
     supported is 'none' or 'ddomain' (default) or 'cath:cdf_file_name'
  -h graph hydrogen bonds with GraphViz
  -b SSE labelling scheme: 'none', 'sequential', 'separate' (default)
  -t struct_prog : use struct_prog define secondary structure
     supported is 'stride' or 'dssp' (default)
  -r compute angles internally, not with external TableauCreator
  -m write MATLAB M-files to plot strand axes
  -s write PyMOL .pml command file to show SSE definitions
  -v print verbose debugging messages to stderr
  -i use distance matrix information instead of heuristic/aesthetic algorithm for helix placement
  -j only valid when not using -i.
     Don't align helices on strand axes if they would push sheets apart
  -k <color> cluster helices, shading them all <color>
  -e <color_list>|auto shade nearby helix clusters the same color
  -x draw connector arrowheads
  -f <color_list>|auto shade each sheet a different color
  -g <separation> set the strand and minimum object separation
  -l connector color scheme: 'all[:<color>]' (default), 'chain[:<color_list>]', 'domain[:<intra_color>,<inter_color>'], crossing:<color_list>
  -o SSE color scheme: 'none' (default), 'simple:sheet=<sheet_colors>.helixcluster=<helixcluster_colors>.alpha=<helix_alpha_colors>.pi=<helix_pi_colors>.310=<helix_310_colors>.terminus=<terminus_colors>', 'gradient', 'sheet', 'fold'
  -u multidomain cartoon: place all domains in the one SVG file instead of one per file
  -w interdomain connectors: when using multidomain cartoons, draw connectors between domains (only in conjunction with -u)
  -q label start and end of helices and strands with first and last PDB residue id in that SSE.
  -y use uniform scaling to try to avoid overlaps. Ugly and often does not work anyway, use only as last resort
  -z print version information and exit
For example, the default is:
PTGRAPH2_OPTIONS="-r35 -t dssp -k purple -l crossing:black,red,green,navy,blue -b sequential -j -e auto -f auto -o gradient -p ddomain"
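Rather than editing the script by hand, the options line could be rewritten with a small helper. This is a sketch under the assumption that make_cartoon.sh sets the variable on a single line beginning with PTGRAPH2_OPTIONS= (as in the default shown above); set_ptgraph2_options is a hypothetical name, and a backup copy with the suffix .orig is kept.

```shell
# set_ptgraph2_options SCRIPT "NEW OPTIONS"   (hypothetical helper)
# Replace the PTGRAPH2_OPTIONS="..." line in SCRIPT, keeping the previous
# version as SCRIPT.orig. Assumes the assignment sits on one line that
# starts with PTGRAPH2_OPTIONS= and that the options contain no backslashes.
set_ptgraph2_options() (
    script=$1
    opts=$2
    cp "$script" "$script.orig" || exit 1
    awk -v opts="$opts" '
        /^PTGRAPH2_OPTIONS=/ { print "PTGRAPH2_OPTIONS=\"" opts "\""; next }
        { print }
    ' "$script.orig" > "$script"
)

# Example: the default options plus -x (draw connector arrowheads).
# set_ptgraph2_options make_cartoon.sh \
#     "-r35 -t dssp -k purple -l crossing:black,red,green,navy,blue -b sequential -j -e auto -f auto -o gradient -p ddomain -x"
```

Because the script is overwritten in place through a redirect, its existing permissions (including the executable bit) are left unchanged.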
Protein substructure search
The Pro-origami server can also be used as a query interface to a substructure search program. The two programs it incorporates for this are QP Tableau Search and SA Tableau Search. Source code for these programs is available from my GitHub repository:
QP Tableau Search: Tableau-based protein substructure search using quadratic programming (Fortran)
SA Tableau Search: Fast and accurate protein substructure searching with simulated annealing and GPUs (CUDA C)
Related software
As I have not been able to update the Pro-origami server since 2017, the links to related software there are out of date, and some new software has been published since then. Here are some other programs and servers that could potentially be used as alternatives to Pro-origami (or, in one case, that incorporate Pro-origami themselves).
Bond, C. S. (2003). TopDraw: a sketchpad for protein structure topology cartoons. Bioinformatics, 19(2), 311-312. https://doi.org/10.1093/bioinformatics/19.2.311
Westhead, D. R., Slidel, T. W., Flores, T. P., & Thornton, J. M. (1999). Protein structural topology: Automated analysis and diagrammatic representation. Protein Science, 8(4), 897-904. https://doi.org/10.1110/ps.8.4.897
Laskowski, R. A. (2001). PDBsum: summaries and analyses of PDB structures. Nucleic Acids Research, 29(1), 221-222. https://doi.org/10.1093/nar/29.1.221
You can generate PDBsum analyses (including diagram) for an uploaded PDB file with PDBsum Generate.
A standalone version of PDBsum Generate is available to download for local installation from PDBsum1.
Laskowski, R. A. (2022). PDBsum1: A standalone program for generating PDBsum analyses. Protein Science, 31(12), e4473. https://doi.org/10.1002/pro.4473
PDBsum generates secondary structure diagrams using PROMOTIF:
Hutchinson, E. G., & Thornton, J. M. (1996). PROMOTIF—a program to identify and analyze structural motifs in proteins. Protein Science, 5(2), 212-220. https://doi.org/10.1002/pro.5560050204
and HERA:
Hutchinson, E. G., & Thornton, J. M. (1990). HERA—a program to draw schematic diagrams of protein secondary structures. Proteins: Structure, Function, and Bioinformatics, 8(3), 203-212. https://doi.org/10.1002/prot.340080303
Wolf, J. N., Keßler, M., Ackermann, J., & Koch, I. (2021). PTGL: extension to graph-based topologies of cryo-EM data for large protein structures. Bioinformatics, 37(7), 1032-1034. https://doi.org/10.1093/bioinformatics/btaa706
Hutařová Vařeková, I., Hutař, J., Midlik, A., Horský, V., Hladká, E., Svobodová, R., & Berka, K. (2021). 2DProts: database of family-wide protein secondary structure diagrams. Bioinformatics, 37(23), 4599-4601. https://doi.org/10.1093/bioinformatics/btab505
Penev, P. I., McCann, H. M., Meade, C. D., Alvarez-Carreño, C., Maddala, A., Bernier, C. R., ... & Petrov, A. S. (2021). ProteoVision: web server for advanced visualization of ribosomal proteins. Nucleic Acids Research, 49(W1), W578-W588. https://doi.org/10.1093/nar/gkab351
"If an external 3D structure is provided instead using the ‘Upload a custom PDB’ option, ProteoVision takes it as an input for the Mol* Viewer, and generates a custom topology diagram using the Pro-Origami program (41,42)." [p. W582]
Midlik, A., Hutařová Vařeková, I., Hutař, J., Chareshneu, A., Berka, K., & Svobodová, R. (2022). OverProt: secondary structure consensus for protein families. Bioinformatics, 38(14), 3648-3650. https://doi.org/10.1093/bioinformatics/btac384