Visualization of Protein Models
Roger S. Rowlett
Gordon & Dorothy Kline Professor, Emeritus
Colgate University Department of Chemistry
Pymol is a molecular rendering program capable of producing high-quality, publication-ready images of protein structures from PDB files. Once you have solved an X-ray crystal structure, you can use a rendering program like Pymol to create images that let you explore the structure in a meaningful way, and to selectively view and communicate its interesting features to your scientific peers.
Downloading and Installing Pymol
An unsupported version of Pymol for Windows XP or 7/8/10 (version 0.99) can be downloaded here. This version is not current and may lack features found in more recent versions. Unzip all of the files into a directory, e.g. "Pymol", and double-click on setup.exe. You may need an unzipping program like WinZip to extract the installation files. Follow the appropriate instructions for installation. A current educational version of Pymol for Mac or Windows can be downloaded directly from Schrodinger if you register. You can also purchase a license for the full version.
Installing open-source Pymol
Linux
You can install the current version of open-source Pymol in Linux with the command snap install pymol-oss. You may have to install the snap package first.
Windows
Pre-compiled files for installing Pymol in Windows are available from the Gohlke laboratory at the University of California Irvine. You will need to install Python on your local machine and assemble several files:
- Install Python 3.7 for Windows. Make sure you obtain the 64-bit executable installer file.
- In the first installation window, check boxes to install python for all users, and add Python to your path
- All options on the Optional Features installation window should be checked
- In the Advanced Options installation window, check "install for all users" and verify your install location is C:\Program Files\Python37. (If not, you can edit it.)
- Complete the installation
- Download the following package files and put them in an installation folder, e.g. C:\pymol-install. Verify that the files you download are appropriate for your version of Python and for 64-bit OS.
- pip (e.g., pip-19.1.1-py2.py3-none-any.whl is v.19.1.1 for Python 2 or 3)
- numpy (e.g., numpy-1.16.4+mkl-cp37-cp37m-win_amd64.whl is v.1.16.4 for Python 3.7 and 64-bit OS)
- pymol (e.g., pymol-2.3.0-cp37-cp37m-win_amd64.whl is v.2.3.0 for Python 3.7 and 64-bit OS)
- pymol_launcher (e.g., pymol_launcher-2.1-cp37-cp37m-win_amd64.whl is v.2.1 for Python 3.7 and 64-bit OS)
- pmw (e.g., Pmw-2.0.1-py3-none-any.whl is v.2.0.1 for Python 3)
- Open Command Prompt from the start menu and run as administrator
- Navigate to C:\pymol-install
- Issue the following command (with the appropriate pip file name) to install Pymol from the downloaded wheel files:
python pip-19.1.1-py2.py3-none-any.whl/pip install --no-index --find-links="%CD%" pymol_launcher
- The executable pymol file (PyMOL.exe) will be found in the C:\Program Files\Python37 directory. Right click on this file and pin to your start menu or create a shortcut on your desktop or taskbar.
- To update pymol in the future download the desired pymol wheel file and issue the following command, where pymol-xxx.whl is the new pymol wheel file:
pip install --upgrade --no-deps pymol-xxx.whl
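Before downloading the wheel files, it can help to confirm which Python the command prompt will actually run. The following is a minimal sketch using only the standard library. The helper names describe_interpreter and wheel_matches are made up for illustration, the "cp" tag assumes the standard CPython interpreter, and the file-name check is deliberately rough.

```python
import struct
import sys


def describe_interpreter():
    """Return (python_tag, bits) for the running interpreter.

    python_tag is e.g. 'cp37' for CPython 3.7; bits is 32 or 64.
    """
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    bits = struct.calcsize("P") * 8  # size of a C pointer, in bits
    return python_tag, bits


def wheel_matches(wheel_name, python_tag, bits):
    """Rough check that a wheel file name fits a given interpreter.

    Pure-Python wheels ('none-any') fit any interpreter. Compiled wheels
    should carry the matching CPython tag, and 'amd64' only on 64-bit Python.
    """
    if wheel_name.endswith("none-any.whl"):
        return True
    platform_ok = ("amd64" in wheel_name) == (bits == 64)
    return python_tag in wheel_name and platform_ok


if __name__ == "__main__":
    tag, bits = describe_interpreter()
    print("Python tag:", tag, "| bits:", bits)
    print(wheel_matches("pymol-2.3.0-cp37-cp37m-win_amd64.whl", tag, bits))
```

If the last line prints False, the example pymol wheel named in the list above was probably built for a different Python version or word size than the one installed.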
Tutorial
The following tutorial is a crash course in Pymol. More documentation, tricks, tips, and secrets can be found in the Pymol wiki.
Downloading Protein Data Bank Coordinate Files
The Protein Data Bank is maintained by Rutgers University, and is the official world repository of protein and nucleic acid structures. Most of these structures have been determined by X-ray crystallographic methods. Each protein structure in the data bank has a unique 4-character identifier, which is normally cited in scientific publications. For example, the structure of H. influenzae β-carbonic anhydrase complexed with bicarbonate ion, determined in our laboratory at Colgate, bears the identifier 2A8D. To retrieve a coordinate file, enter the PDB identifier in the search box, click on the download PDB icon right next to the identifier name, and save the file in an appropriate folder on your computer.
Alternatively, you can download files directly into pymol using the fetch command. For example, fetch 2A8D would load the PDB file with the identifier 2A8D directly into pymol and display it as lines.
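If you would rather script the download outside of Pymol, something like the sketch below may work. It assumes that coordinate files are served at the URL pattern https://files.rcsb.org/download/XXXX.pdb; that pattern and the function names pdb_url and download_pdb are assumptions of this example, not something taken from the instructions above.

```python
import urllib.request

# Assumed URL pattern for the RCSB file server; adjust if it changes.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{}.pdb"


def pdb_url(pdb_id):
    """Build the download URL for a 4-character PDB identifier."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("PDB identifiers are 4 alphanumeric characters")
    return RCSB_DOWNLOAD.format(pdb_id)


def download_pdb(pdb_id, destination=None):
    """Save the coordinate file locally and return its file name.

    This function needs a network connection.
    """
    destination = destination or pdb_id.strip().lower() + ".pdb"
    urllib.request.urlretrieve(pdb_url(pdb_id), destination)
    return destination


if __name__ == "__main__":
    print(pdb_url("2a8d"))
```

Calling download_pdb("2A8D") would then save 2a8d.pdb in the current folder, ready to be opened from the File menu.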
Format of Protein Data Bank coordinate files
The following snippet is taken from the atomic coordinates section of a Protein Data Bank file. You can use Pymol to create selections for display based on this data. Each data line that starts with the text "ATOM" contains identifier information and atomic coordinates for a single atom. Left to right across each line after the "ATOM" identifier are:
- Atom number (not usually used in Pymol)
- Atom name (N = amide nitrogen, CA = alpha-carbon, C = carbonyl carbon, O = carbonyl oxygen, CB = beta-carbon, CG = gamma carbon, etc.). Pymol selector is name
- Residue name (ALA, GLN, etc.). Pymol selector is resname or resn
- Chain identifier (A, B, C, etc.) Pymol selector is chain
- Residue number. Pymol selector is residue or resi
- Cartesian coordinates (three numbers: X, Y, Z)
- Occupancy (typically 1.0)
- B-factor (a measure of coordinate uncertainty)
- Element symbol (C, N, O, ZN, etc.). Pymol selector is element or elem
ATOM    721  CD1 LEU A  88      -4.217  52.133  39.459  1.00 33.25           C
ATOM    722  CD2 LEU A  88      -5.863  52.571  41.292  1.00 33.79           C
ATOM    723  N   LYS A  89      -1.460  53.809  44.308  1.00 34.86           N
ATOM    724  CA  LYS A  89      -0.285  53.535  45.133  1.00 35.87           C
ATOM    725  C   LYS A  89       0.649  52.506  44.509  1.00 35.39           C
ATOM    726  O   LYS A  89       1.290  51.734  45.222  1.00 36.79           O
ATOM    727  CB  LYS A  89      -0.714  53.043  46.520  1.00 37.53           C
ATOM    728  CG  LYS A  89      -1.526  54.047  47.316  1.00 38.84           C
ATOM    729  CD  LYS A  89      -0.693  55.255  47.669  1.00 43.13           C
ATOM    730  CE  LYS A  89      -1.462  56.215  48.556  1.00 46.49           C
ATOM    731  NZ  LYS A  89      -2.679  56.722  47.869  1.00 49.51           N
ATOM    732  N   ILE A  90       0.723  52.479  43.183  1.00 34.33           N
ATOM    733  CA  ILE A  90       1.607  51.535  42.505  1.00 32.21           C
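To make the field list above concrete, here is a minimal Python sketch that splits one ATOM record into named fields. It assumes the standard fixed-column layout that PDB files use on disk (the spacing may look collapsed when a file is shown on a web page). The field names were chosen to echo the Pymol selectors, and the Atom tuple and parse_atom_line function are made up for illustration.

```python
from collections import namedtuple

Atom = namedtuple(
    "Atom", "serial name resn chain resi x y z occupancy bfactor elem"
)

# One record from the snippet above, written out in fixed columns.
SAMPLE = "ATOM    724  CA  LYS A  89      -0.285  53.535  45.133  1.00 35.87           C"


def parse_atom_line(line):
    """Split one ATOM/HETATM record into named fields using fixed columns."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),         # atom number
        name=line[12:16].strip(),       # atom name, e.g. CA
        resn=line[17:20].strip(),       # residue name, e.g. LYS
        chain=line[21],                 # chain identifier
        resi=int(line[22:26]),          # residue number
        x=float(line[30:38]),           # Cartesian coordinates
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        bfactor=float(line[60:66]),
        elem=line[76:78].strip(),       # element symbol
    )


if __name__ == "__main__":
    atom = parse_atom_line(SAMPLE)
    print(atom.resn, atom.resi, atom.name, atom.bfactor)
```

Slicing by column rather than splitting on spaces is the safer choice here, because adjacent fields in a real PDB file are allowed to touch when the numbers are large.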
Basic Pymol Commands
When Pymol is started, two windows will open. The upper window contains standard graphical pull-down menus that are largely self-explanatory. The lower window contains a viewer window and a selections window that will keep track of various portions of the displayed structure that you have identified. The main power of Pymol arises from its rich command language. Typing commands in either the upper or lower windows can alter the rendering of the molecule being viewed. Examples of commonly used commands are given below.
Mouse actions
You can control the orientation, clipping, and slabbing of the molecule with the mouse:
- L-button drag rotates the molecule
- M-button drag translates the molecule
- R-button drag zooms the molecule
- Shift-R-button drag changes the clipping planes: NW-SE changes the viewing slab depth; NE-SW changes the distance of the viewing slab from the viewer
Selections
Making selections, i.e. identifying specific substructures within the displayed molecule, is key to the operation of Pymol. Simple selections can be made by using the mouse to click on a portion of the structure. The mouse can be configured to select atoms, residues, chains, or molecules by clicking on the "Selecting" box in the Mouse Mode window. "Residues" is the default mouse selection mode. Clicking on a portion of the structure will also give you information about its atom type, residue number, and protein chain in the main (text) window.
In general, you will be able to make more powerful and specific selections by typing selection commands in either of the Pymol GUI windows. The general syntax of command line selections is select somename, selectiontype selecteditems, where
- somename is a name you make up that can be used to represent the selection
- selectiontype is a selection category:
- resi selects residue numbers e.g. 1, 2, 5, 24, 327 etc.
- resn selects residue names, e.g. GLY, ALA, ASP, LYS, GLU etc.
- elem selects element types e.g. ZN, CA, CL, I, S etc. (If you have defined a charge state in the PDB file, the charge must be included in the selection as it appears in the PDB file, e.g. "ZN+2")
- name selects PDB atom types, e.g. C, O, N, CA, CB, CG, CD etc.
- chain selects a protein chain, e.g., A, B, C, D etc.
- ss selects a secondary structural element, e.g. s (sheet), h (helix)
- selecteditems represents the items of that selection type to be selected, e.g. 94+96+119 for resi
Examples of selection syntax
Study these examples to see how you can use selections to define specific portions of a molecule or molecules described by a PDB file. Please note that Pymol selections are case-sensitive unless you issue the command set ignore_case, on first.
Command - Action
- select protein, polymer - select everything in the molecule that is polymeric (i.e., protein) and name the object “protein”
- select zinc, resn ZN - select everything in the molecule with residue name “ZN” and name the object “zinc”
- select ligands, resi 94+96+119 and not name C+O+N - select side chains only of residues 94, 96, & 119 and name the object “ligands”
- select achain, chain A - selects all atoms in chain A of the molecule and names the selection "achain"
- select segment, resi 25-32 - selects all atoms for residues 25 through 32 and names the selection "segment"
- select tetramer, chain A+B+C+D - select chains A, B, C, & D of a molecule and names the selection “tetramer”
- select stuff, chain A and ss s and name C+O+N+CA+CB - selects only main chain and beta-carbon atoms in chain A that are in a beta sheet; selection is named "stuff"
- select sidechain, not name C+O+N - selects all side chain atoms in a protein, including alpha carbons, and names the selection "sidechain". This is a useful selection to combine with other selections to display just sidechains including the alpha carbons.
- select mainchain, name C+O+N+CA - selects main chain atoms in a protein, and names the selection "mainchain". This is a useful selection to combine with other selections to show just the main chain atoms.
- select sheet, ss s - selects all residues in the molecule that have a β-sheet secondary structure and assigns them to the name "sheet". The property “h” can be used to select α-helical secondary structural elements and “l” for loops. Note: The ss selection command selects only c-alpha atoms. If you wish to display sticks or lines, it is necessary to combine this command with an atom selection, e.g., 'select ss h & name C+O+N+CA+CB'. The name selection here will display only main chain and beta-carbons.
- select segment, resi 155-167 and name C+N+O+CA+CB - select main chain and α- and β-carbon atoms of residues 155-167, and name the object “segment”
- select chloride, elem CL - select all atoms of element type chlorine and name the object “chloride”
- select water, resi 263 and resn HOH - select water molecule 263 and name the object “water”
Combining selections
Once selections are defined, you can combine them, e.g., select stuff, mysheet and mainchain and chain A. If mysheet and mainchain have already been defined, the new selection stuff will contain only mysheet atoms in chain A that also meet the criterion of the mainchain definition. Please note that the and operator is Boolean: that is, the resulting selection will include only atoms that meet both criteria surrounding the and operator. Within a list of items, the plus sign acts as the Boolean or, e.g. select stuff, resi 42+44+98+101. In this case, all atoms in residue numbers 42, 44, 98, and 101 will be combined in the selection stuff.
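One way to picture the Boolean logic is to treat every selection as a set of atoms: and is a set intersection, and the plus sign (or) is a set union. The toy Python model below illustrates only that idea. It does not use Pymol, and the four atoms are invented for the example.

```python
# Toy model: every atom is a dict, and a "selection" is the set of atom
# indices that satisfy a test. The four atoms below are invented.
ATOMS = [
    {"chain": "A", "resi": 42, "name": "CA", "ss": "S"},
    {"chain": "A", "resi": 42, "name": "CB", "ss": "S"},
    {"chain": "B", "resi": 42, "name": "CA", "ss": "S"},
    {"chain": "A", "resi": 44, "name": "N", "ss": "H"},
]


def select(test):
    """Return the indices of all atoms for which test(atom) is true."""
    return {i for i, atom in enumerate(ATOMS) if test(atom)}


mysheet = select(lambda a: a["ss"] == "S")
mainchain = select(lambda a: a["name"] in {"C", "O", "N", "CA"})
chain_a = select(lambda a: a["chain"] == "A")

# "mysheet and mainchain and chain A" keeps atoms present in all three sets.
stuff = mysheet & mainchain & chain_a

# "resi 42+44" keeps atoms whose residue number is 42 or 44.
either = select(lambda a: a["resi"] == 42) | select(lambda a: a["resi"] == 44)

print(sorted(stuff))   # prints [0]
print(sorted(either))  # prints [0, 1, 2, 3]
```

Only the first atom survives the three-way and, because it is the only one that is in a sheet, is a main chain atom, and belongs to chain A.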
Display
You can change the display characteristics of selections (objects) by either typing commands in either window, or using the (A)ctions (S)how (H)ide (L)abel (C)olor buttons in the selections window. If you leave off the selection in any command, the action will be applied to the entire molecule! If you accidentally create a selection and want to delete it, or you want to rename a selection, use the (A)ctions button in the selections window.
Command - Action
- hide everything - makes everything in all displayed objects invisible
- show cartoon, protein - display the object “protein” in cartoon format
- show sphere, zinc - display the object “zinc” as spheres
- color white, zinc - color the object “zinc” white
- show sticks, segment - show the object “segment” as sticks
- show lines, segment - show the object “segment” as lines
- show surface, protein - show the object “protein” as a molecular surface
- (Note: the show surface command will ignore HETATM records if not selected. If your surface rendering has "holes" in it, it is likely that your solvent chain and/or other non-protein atom coordinates are specified as ATOM records instead of HETATM records. You can edit the PDB file to convert these to HETATM records to avoid this.)
Settings
There are many settings in Pymol that can be used to customize the display. Examples of some commonly used settings are shown below:
Command - Action
- zoom protein - zoom and center the display on the object “protein”
- center zinc - center the molecule and its rotation axis on the object “zinc”
- rebuild - re-construct the display including any alterations made by the alter command
- png d:/docfiles/image.png - save the current view as a PNG file named image.png in the folder d:/docfiles/
Selected Pymol tasks
Altering Van der Waals radii
It is often necessary to alter VDW radii of atoms to account for ionic state or to make pretty ball and stick models. Pymol attaches the atomic radius to elements; ionic radii are considerably different.
- To set global sphere radius by element, for example:
- alter elem ZN, vdw=0.85
- rebuild
- To set sphere radius by selection, for example:
- alter (resi 6 and resn HOH), vdw=vdw/2
- rebuild
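The difference between the two alter examples above is that vdw=0.85 assigns an absolute value, while vdw=vdw/2 is evaluated against each atom's current radius. The toy Python function below imitates that behavior for a list of atom dictionaries. It is not Pymol's implementation, the starting radii are placeholders rather than real values, and exec should never be used this way on untrusted input.

```python
def alter(atoms, selected, expression):
    """Evaluate an assignment such as 'vdw=vdw/2' once per selected atom."""
    for atom in atoms:
        if selected(atom):
            namespace = dict(atom)           # atom properties become variables
            exec(expression, {}, namespace)  # run the assignment
            atom.update(namespace)           # copy changed values back


if __name__ == "__main__":
    # Placeholder atoms; the vdw numbers are not real radii.
    atoms = [
        {"elem": "ZN", "resn": "ZN", "resi": 262, "vdw": 2.0},
        {"elem": "O", "resn": "HOH", "resi": 6, "vdw": 1.5},
    ]
    alter(atoms, lambda a: a["elem"] == "ZN", "vdw=0.85")
    alter(atoms, lambda a: a["resn"] == "HOH" and a["resi"] == 6, "vdw=vdw/2")
    print([a["vdw"] for a in atoms])
```

Note that repeating a relative expression such as vdw=vdw/2 halves the radius again each time, whereas repeating an absolute assignment changes nothing.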
Sample Pymol session
The following Pymol session displays a ribbon structure of human carbonic anhydrase II (PDB 1CA2), highlights the active site residues, and displays a partially transparent molecular surface to show the active site cavity. Lines that begin with a Pymol command (hide, select, show, alter, rebuild) can be typed in either window. The other listed tasks are carried out using the top menu or the Pymol GUI menus.
File…Open… choose 1ca2.pdb
hide everything
select protein, polymer
show cartoon, protein
Color protein by ss using (C)olor menu
select zn, resn ZN
show sphere, zn
alter zn, vdw = 0.85
rebuild
select ligand, resi 94+96+119 and not name C+O+N
show sticks, ligand
Color ligand by element using (C)olor menu
show sticks, zn
select hoh, resn HOH and resi 263
show sphere, hoh
alter hoh, vdw=vdw/2
rebuild
show sticks, znoh
select his64, resi 64 and not name C+O+N
show sticks, his64
Color his64 by element using (C)olor menu
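The typed commands in a session like the one above can also be collected in a Pymol script file (a plain-text .pml file). The sketch below is one possible way to generate such a file from Python. The function name session_script and its parameters are made up for illustration, and the menu-driven coloring steps are left out.

```python
def session_script(pdb_file, metal_resn, ligand_residues, out_image=None):
    """Return the text of a .pml script that repeats the typed commands."""
    residues = "+".join(str(r) for r in ligand_residues)
    lines = [
        "load {}".format(pdb_file),
        "hide everything",
        "select protein, polymer",
        "show cartoon, protein",
        "select metal, resn {}".format(metal_resn),
        "show sphere, metal",
        "select ligand, resi {} and not name C+O+N".format(residues),
        "show sticks, ligand",
    ]
    if out_image:
        # Optionally finish with a ray-traced image saved to disk.
        lines += ["ray", "png {}".format(out_image)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = session_script("1ca2.pdb", "ZN", [94, 96, 119])
    with open("session.pml", "w") as handle:
        handle.write(text)
    print(text)
```

Opening the resulting session.pml with Pymol, or typing @session.pml at the Pymol command line, should replay the commands. Keeping a figure's commands in a script makes the figure easy to regenerate after the model is re-refined.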
Advanced Tasks
Visualizing Electron Density Maps
One way to visualize electron density maps in Pymol is to use electron density map information from CCP4. A common task is producing publication-quality figures of "omit maps" to justify the interpretation of a bound ligand to a protein. Directions for producing such Fo-Fc omit maps are described elsewhere. The following instructions will allow you to read an electron density map (e.g., a 2Fo-Fc or Fo-Fc map) into Pymol and display it.
- Load the target .pdb file into Pymol
- Render the molecule to look the way you want
- Create a selection, e.g., "site", describing the portion of the molecule around which you wish to see the electron density
- Use File...Open... to read in the electron density map. It should have an extension of .ccp4
- Center on the desired selection and set the desired viewport size
- The following commands will create a nicely displayed structure with electron density suitable for publication. A comment in parentheses follows each command. You may want to modify these slightly as required:
- map_double mymap.map, -1 (doubles the sampling rate of mymap.map for a nicer display)
- isomesh map, mymap.map, 2.0, site, carve=1.6 (creates an electron density mesh named "map", contoured at 2.0 sigma around the selection site, with a 1.6 Å buffer zone around the selection)
- bgcolor white (colors the background white, or another color as desired)
- set ray_opaque_background, 1 (shows an opaque background color; if 0, the background will be transparent)
- color grey50, map (sets the map color to mid-gray)
- set ray_shadow, 0 (turns off ray-tracing shadows)
- set mesh_width, 0.5 (makes the electron density meshes thinner)
- ray (produces a ray-traced image)
- png myimage.png (saves a .png file image of your rendering)
- Labels can be added in Gimp or Photoshop
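The carve=1.6 option in the isomesh command above limits the displayed density to the neighborhood of the selection. The Python sketch below illustrates the underlying idea of keeping only map grid points that lie within a cutoff distance of at least one selected atom. It is a conceptual illustration with invented coordinates, not Pymol's own implementation.

```python
def carve(grid_points, atom_coords, cutoff):
    """Keep only grid points within `cutoff` angstroms of at least one atom."""
    cutoff_sq = cutoff * cutoff
    kept = []
    for gx, gy, gz in grid_points:
        for ax, ay, az in atom_coords:
            # Compare squared distances to avoid taking a square root.
            dist_sq = (gx - ax) ** 2 + (gy - ay) ** 2 + (gz - az) ** 2
            if dist_sq <= cutoff_sq:
                kept.append((gx, gy, gz))
                break  # one nearby atom is enough
    return kept


if __name__ == "__main__":
    site = [(0.0, 0.0, 0.0)]                  # one invented atom position
    grid = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
    print(carve(grid, site, 1.6))
```

A larger carve value shows more of the surrounding density, which can clutter a figure. A smaller value trims the mesh more tightly around the atoms of interest.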
Pymol Resources
The Pymol Wiki is a comprehensive, user-based resource for all things Pymol. It includes user-installable Pymol extensions as well as a fairly complete description of Pymol settings and commands.
Citing Pymol
If you use Pymol in your work, you should properly cite it. The following format is suggested:
- DeLano, W.L. The PyMOL Molecular Graphics System (2002) DeLano Scientific, Palo Alto, CA, USA. http://www.pymol.org
Coot
Examining or re-refining structures and electron density maps from the Protein Data Bank
Preparing data files
- Download the atomic coordinates (a .pdb file) as a text file and save
- Download the structure factors (a .gz file) and extract the .cif file using gunzip filename.gz (Linux) or WinZip (Windows)
- Start CCP4i and open a task window from Reflection Data Utilities...Convert to/modify/extend MTZ
- Select mmCIF as the input format
- Select "Create full unique set of reflections and keep existing FreeR data"
- Enter the input and output file paths and names (the output will be an .mtz file)
- Assign sensible, short crystal, project, and dataset names (e.g. HICA, wt-enzyme, all)
- Enter the space group (or space group number) and cell dimensions as given in the CRYST1 record of the PDB file
- Run the task to generate the .mtz file
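Two of the file-handling steps in the list above, decompressing the structure factor file and looking up the cell and space group in the CRYST1 record, can also be done with a few lines of Python. This is a minimal sketch using only the standard library. The function names are made up, the column positions assume the standard fixed-column CRYST1 layout, and the EXAMPLE record contains illustrative values that do not belong to any structure discussed here.

```python
import gzip
import shutil


def gunzip(src, dst):
    """Decompress a single gzip-compressed file (e.g. a .cif.gz download)."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def parse_cryst1(line):
    """Read cell dimensions and space group from a CRYST1 record."""
    if not line.startswith("CRYST1"):
        raise ValueError("not a CRYST1 record")
    return {
        "a": float(line[6:15]),       # cell edges, in angstroms
        "b": float(line[15:24]),
        "c": float(line[24:33]),
        "alpha": float(line[33:40]),  # cell angles, in degrees
        "beta": float(line[40:47]),
        "gamma": float(line[47:54]),
        "space_group": line[55:66].strip(),
    }


# Illustrative record only; the values are not from a real data set.
EXAMPLE = "CRYST1   52.000   58.600   61.900  90.00  90.00  90.00 P 21 21 21    8"

if __name__ == "__main__":
    print(parse_cryst1(EXAMPLE))
```

In practice you would pass the first line of your own .pdb file that starts with CRYST1, then type the reported values into the CCP4i task window.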
Direct visualization in Coot
- Open Coot and load the coordinate (.pdb) file
- Select Auto-Open MTZ and open the .mtz file
- Coot will ask you from which molecule you want to calculate phases
- Choose the molecule you loaded from the .pdb file
- Coot will run an instance of Refmac to generate a 2Fo-Fc map (You must have CCP4 installed to do this.)
- After a little number crunching, Coot will display the 2Fo-Fc map around your molecule.
Note: If you want a traditional display of 2Fo-Fc and Fo-Fc (difference) maps, use the alternate method of map generation described in the next section.
Generation of Refmac-style maps for visualization in Coot
- Open a Refmac task window and select rigid body refinement without prior phase information
- Use your .pdb and .mtz files created from your Protein Data Bank downloads as input files
- Choose sensible names for the output .pdb and .mtz files
- Set appropriate refinement parameters as described elsewhere
- Because these data are already highly refined, no more than 3-5 cycles of refinement are necessary to generate maps
- Open Coot and load the coordinate file
- Select Auto-Open MTZ and open the .mtz file
- Coot will display the familiar blue 2Fo-Fc and red/green Fo-Fc difference maps
- You can use the .pdb and .mtz files for further refinement or alteration of the structure if desired.