Visualization of Protein Models

Roger S. Rowlett

Gordon & Dorothy Kline Professor, Emeritus

Colgate University Department of Chemistry

Pymol is a molecular rendering program that can produce high-quality, publication-ready images of protein structures from PDB files. Once you have solved an X-ray crystal structure, a rendering program like Pymol lets you explore the structure in a meaningful way and selectively view and communicate its interesting features to your scientific peers.

Downloading and Installing Pymol

An unsupported version of Pymol for Windows XP or 7/8/10 (version 0.99) can be downloaded here. This version is not current and may be missing features found in more recent versions. Unzip all the files into a directory, e.g. "Pymol", and double-click on setup.exe. You may need an unzipping program like WinZip to extract the installation files. Follow the installation instructions. A current educational version of Pymol for the Mac or Windows can be downloaded directly from Schrodinger if you register. Of course, you can always purchase a license for the full version.

Installing open-source Pymol

Linux

You can install the current version of open-source Pymol in Linux with the command snap install pymol-oss. You may have to install the snap package manager (snapd) first.

Windows

Pre-compiled files for installing Pymol in Windows are available from the Gohlke laboratory at the University of California Irvine. You will need to install Python on your local machine and assemble several files:

  • Install Python 3.7 for Windows. Make sure you obtain the 64-bit executable installer file.
    • In the first installation window, check the boxes to install Python for all users and to add Python to your PATH
    • All options on the Optional Features installation window should be checked
    • In the Advanced Options installation window, check "install for all users" and verify that your install location is C:\Program Files\Python37. (If not, you can edit it.)
    • Complete the installation
  • Download the following package files and put them in an installation folder, e.g. C:\pymol-install. Verify that the files you download are appropriate for your version of Python and for 64-bit OS.
    • pip (e.g., pip-19.1.1-py2.py3-none-any.whl is v.19.1.1 for Python 2 or 3)
    • numpy (e.g., numpy-1.16.4+mkl-cp37-cp37m-win_amd64.whl is v.1.16.4 for Python 3.7 and 64-bit OS)
    • pymol (e.g., pymol-2.3.0-cp37-cp37m-win_amd64.whl is v.2.3.0 for Python 3.7 and 64-bit OS)
    • pymol_launcher (e.g., pymol_launcher-2.1-cp37-cp37m-win_amd64.whl is v.2.1 for Python 3.7 and 64-bit OS)
    • pmw (e.g., Pmw-2.0.1-py3-none-any.whl is v.2.0.1 for Python 3)
  • Open Command Prompt from the Start menu and run it as administrator
  • Navigate to C:\pymol-install
  • Issue the following command (with the appropriate pip file name) to install Pymol from the downloaded files:
python pip-19.1.1-py2.py3-none-any.whl/pip install --no-index --find-links="%CD%" pymol_launcher
  • The Pymol executable (PyMOL.exe) will be found in the C:\Program Files\Python37 directory. Right-click on this file and pin it to your Start menu, or create a shortcut on your desktop or taskbar.
  • To update Pymol in the future, download the desired Pymol wheel file and issue the following command, where pymol-xxx.whl is the new Pymol wheel file:
pip install --upgrade --no-deps pymol-xxx.whl
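Before running pip, you can double-check that each downloaded wheel matches your Python version and 64-bit Windows by reading the tags in its file name. The short Python sketch below does this. wheel_tags and wheel_fits are hypothetical helper names, and the check is a simplified stand-in for pip's own, more thorough tag matching: it only accepts cpXY, pyX, or pyXY interpreter tags and the "any" or win_amd64 platform tags.

```python
import sys


def wheel_tags(filename):
    """Split a wheel name (name-version[-build]-python-abi-platform.whl) into parts."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file")
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename}")
    name, version = parts[0], parts[1]
    python, abi, platform = parts[-3:]
    # Compressed tag sets such as "py2.py3" are split on the dots
    return name, version, python.split("."), abi.split("."), platform.split(".")


def wheel_fits(filename, major=sys.version_info.major,
               minor=sys.version_info.minor, platform="win_amd64"):
    """Rough check that a wheel suits a given Python version and platform tag."""
    _, _, python_tags, _, platform_tags = wheel_tags(filename)
    wanted = {f"cp{major}{minor}", f"py{major}", f"py{major}{minor}"}
    python_ok = any(tag in wanted for tag in python_tags)
    platform_ok = "any" in platform_tags or platform in platform_tags
    return python_ok and platform_ok


if __name__ == "__main__":
    for wheel in ("pip-19.1.1-py2.py3-none-any.whl",
                  "numpy-1.16.4+mkl-cp37-cp37m-win_amd64.whl",
                  "pymol-2.3.0-cp37-cp37m-win_amd64.whl"):
        print(wheel, wheel_fits(wheel, 3, 7))
```

Run it with the Python you installed above (leaving major and minor at their defaults); a False result suggests downloading a different build of that wheel.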

Tutorial

The following tutorial is a crash course in Pymol. More documentation, tricks, tips, and secrets can be found in the Pymol wiki.

Downloading Protein Data Bank Coordinate Files

The Protein Data Bank is maintained by Rutgers University, and is the official world repository of protein and nucleic acid structures. Most of these structures have been determined by X-ray crystallographic methods. Each protein structure in the data bank has a unique 4-character identifier, which is normally cited in scientific publications. For example, the structure of H. influenzae β-carbonic anhydrase complexed with bicarbonate ion, determined in our laboratory at Colgate, bears the identifier 2A8D. To retrieve a coordinate file, enter the PDB identifier in the search box, click on the download PDB icon right next to the identifier name, and save the file in an appropriate folder on your computer.

Alternatively, you can download files directly into pymol using the fetch command. For example, fetch 2A8D would load the PDB file with the identifier 2A8D directly into pymol and display it as lines.
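If you prefer to fetch coordinate files with a script outside Pymol, a small Python sketch like the one below builds the download address from the 4-character identifier. It assumes the RCSB file server pattern https://files.rcsb.org/download/<ID>.pdb; pdb_url and download_pdb are hypothetical helper names, and download_pdb needs network access.

```python
import urllib.request
from pathlib import Path

# Assumed RCSB download pattern: https://files.rcsb.org/download/<ID>.pdb
RCSB_URL = "https://files.rcsb.org/download/{}.pdb"


def pdb_url(pdb_id):
    """Return the download URL for a 4-character PDB identifier."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a 4-character PDB identifier: {pdb_id!r}")
    return RCSB_URL.format(pdb_id)


def download_pdb(pdb_id, folder="."):
    """Save <ID>.pdb into folder and return the saved path (needs network access)."""
    target = Path(folder) / f"{pdb_id.strip().upper()}.pdb"
    urllib.request.urlretrieve(pdb_url(pdb_id), str(target))
    return target
```

For example, download_pdb("2A8D") would save 2A8D.pdb in the current folder, ready for File…Open… in Pymol.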

Format of Protein Data Bank coordinate files

The following snippet is taken from the atomic coordinates section of a Protein Data Bank file. You can use Pymol to create selections for display based on this data. Each data line that starts with the text "ATOM" contains identifier information and atomic coordinates for a single atom. From left to right across each line, after the "ATOM" identifier, the fields are:

  • Atom number (not usually used in Pymol)
  • Atom name (N = amide nitrogen, CA = alpha-carbon, C = carbonyl carbon, O = carbonyl oxygen, CB = beta-carbon, CG = gamma-carbon, etc.). Pymol selector is name
  • Residue name (ALA, GLN, etc.) Pymol selector is resname or resn
  • Chain identifier (A, B, C, etc.) Pymol selector is chain
  • Residue number. Pymol selector is residue or resi
  • Cartesian coordinates (three numbers: X, Y, Z)
  • Occupancy (typically 1.0)
  • B-factor (a measure of coordinate uncertainty)
  • Element symbol (C, N, O, ZN, etc.). Pymol selector is element or elem
ATOM    721  CD1 LEU A  88      -4.217  52.133  39.459  1.00 33.25           C  
ATOM    722  CD2 LEU A  88      -5.863  52.571  41.292  1.00 33.79           C  
ATOM    723  N   LYS A  89      -1.460  53.809  44.308  1.00 34.86           N  
ATOM    724  CA  LYS A  89      -0.285  53.535  45.133  1.00 35.87           C  
ATOM    725  C   LYS A  89       0.649  52.506  44.509  1.00 35.39           C  
ATOM    726  O   LYS A  89       1.290  51.734  45.222  1.00 36.79           O  
ATOM    727  CB  LYS A  89      -0.714  53.043  46.520  1.00 37.53           C  
ATOM    728  CG  LYS A  89      -1.526  54.047  47.316  1.00 38.84           C  
ATOM    729  CD  LYS A  89      -0.693  55.255  47.669  1.00 43.13           C  
ATOM    730  CE  LYS A  89      -1.462  56.215  48.556  1.00 46.49           C  
ATOM    731  NZ  LYS A  89      -2.679  56.722  47.869  1.00 49.51           N  
ATOM    732  N   ILE A  90       0.723  52.479  43.183  1.00 34.33           N  
ATOM    733  CA  ILE A  90       1.607  51.535  42.505  1.00 32.21           C  
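Because PDB records use fixed column positions rather than delimiters, the fields above can be read by slicing each line at fixed columns. The Python sketch below assumes the standard ATOM/HETATM column layout (for example, atom name in columns 13-16 and residue number in columns 23-26) and maps each field to the Pymol selector named in the list; parse_atom_line and AtomRecord are hypothetical names, not part of Pymol.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    serial: int
    name: str       # Pymol selector: name
    resn: str       # Pymol selector: resn
    chain: str      # Pymol selector: chain
    resi: int       # Pymol selector: resi
    x: float
    y: float
    z: float
    occupancy: float
    b: float        # B-factor
    elem: str       # Pymol selector: elem


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one ATOM or HETATM record using fixed PDB columns.

    Python slices are 0-based and end-exclusive, so PDB columns 13-16
    (atom name) become line[12:16], and so on.
    """
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return AtomRecord(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        resn=line[17:20].strip(),
        chain=line[21].strip(),
        resi=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b=float(line[60:66]),
        elem=line[76:78].strip(),
    )


if __name__ == "__main__":
    record = "ATOM    724  CA  LYS A  89      -0.285  53.535  45.133  1.00 35.87           C"
    atom = parse_atom_line(record)
    # Pymol selection expression that picks out this single atom
    print(f"chain {atom.chain} and resi {atom.resi} and name {atom.name}")
```

Running it on atom 724 prints chain A and resi 89 and name CA, a selection that isolates the alpha-carbon of Lys 89.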

Basic Pymol Commands

When Pymol is started, two windows will open. The upper window contains standard graphical pull-down menus that are largely self-explanatory. The lower window contains a viewer window and a selections window that will keep track of various portions of the displayed structure that you have identified. The main power of Pymol arises from its rich command language. Typing commands in either the upper or lower windows can alter the rendering of the molecule being viewed. Examples of commonly used commands are given below.

Mouse actions

You can control the orientation, clipping, and slabbing of the molecule with the mouse:

  • L-button drag rotates the molecule
  • M-button drag translates the molecule
  • R-button drag zooms the molecule
  • Shift-R-button drag changes the clipping planes: NW-SE changes the viewing slab depth; NE-SW changes the distance of the viewing slab from the viewer

Selections

Making selections, i.e. identifying specific substructures within the displayed molecule, is key to the operation of Pymol. Simple selections can be made by using the mouse to click on a portion of the structure. The mouse can be configured to select atoms, residues, chains, or molecules by clicking on the "Selecting" box in the Mouse Mode window. "Residues" is the default mouse selection mode. Clicking on a portion of the structure will also give you information about its atom type, residue number, and protein chain in the main (text) window.

In general, you will be able to make more powerful and specific selections by typing selection commands in either of the Pymol GUI windows. The general syntax of command line selections is select somename, selectiontype selecteditems, where

  • somename is a name you make up that can be used to represent the selection
  • selectiontype is a selection category:
    • resi selects residue numbers e.g. 1, 2, 5, 24, 327 etc.
    • resn selects residue names, e.g. GLY, ALA, ASP, LYS, GLU etc.
    • elem selects element types e.g. ZN, CA, CL, I, S etc. (If you have defined a charge state in the PDB file, the charge must be included in the selection as it appears in the PDB file, e.g. "ZN+2")
    • name selects PDB atom types, e.g. C, O, N, CA, CB, CG, CD etc.
    • chain selects a protein chain, e.g., A, B, C, D etc.
    • ss selects a secondary structural element, e.g. s (sheet), h (helix)
  • selecteditems represents one or more items of the selection type; multiple items are joined with a plus sign, e.g. resi 94+96+119
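The pattern select somename, selectiontype selecteditems is regular enough to assemble programmatically, which can help when a residue list comes from another program. A minimal Python sketch, assuming only the six selectors listed above; select_command is a hypothetical helper, and the resulting string can be typed into Pymol or, inside Pymol's own Python interpreter, passed to cmd.do.

```python
# Selection keywords covered in this tutorial
SELECTORS = {"resi", "resn", "elem", "name", "chain", "ss"}


def select_command(name, selector, items, extra=""):
    """Return 'select somename, selectiontype selecteditems [and extra]'."""
    if selector not in SELECTORS:
        raise ValueError(f"unknown selector: {selector}")
    # Several items of one selector are joined with '+'
    expression = f"{selector} {'+'.join(str(item) for item in items)}"
    if extra:
        expression += f" and {extra}"
    return f"select {name}, {expression}"


if __name__ == "__main__":
    print(select_command("achain", "chain", ["A"]))
    # select achain, chain A
    print(select_command("ligands", "resi", [94, 96, 119], extra="not name C+O+N"))
    # select ligands, resi 94+96+119 and not name C+O+N
```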

Examples of selection syntax

Study these examples to see how you can use selections to define specific portions of a molecule or molecules described by a PDB file. Please note that Pymol selections are case-sensitive unless you issue the command set ignore_case, on first.

Command - Action

  • select protein, polymer - select everything in the molecule that is polymeric (i.e., protein) and name the object “protein”
  • select zinc, resn ZN - select everything in the molecule with residue name “ZN” and name the object “zinc”
  • select ligands, resi 94+96+119 and not name C+O+N - select side chains only of residues 94, 96, & 119 and name the object “ligands”
  • select achain, chain A - selects all atoms in chain A of the molecule and names the selection "achain"
  • select segment, resi 25-32 - selects all atoms for residues 25 through 32 and names the selection "segment"
  • select tetramer, chain A+B+C+D - select chains A, B, C, & D of a molecule and names the selection “tetramer”
  • select stuff, chain A and ss s and name C+O+N+CA+CB - selects only main chain and beta-carbon atoms in chain A that are in a beta sheet; selection is named "stuff"
  • select sidechain, not name C+O+N - selects all side chain atoms in a protein, including alpha carbons, and names the selection "sidechain". This is a useful selection to combine with other selections to display just side chains including the alpha carbons.
  • select mainchain, name C+O+N+CA - selects main chain atoms in a protein, and names the selection "mainchain". This is a useful selection to combine with other selections to show just the main chain atoms.
  • select sheet, ss s - selects all residues in the molecule that have β-sheet secondary structure and assigns them to the object name "sheet". The property “h” can be used to select α-helical secondary structural elements and “l” for loops. Note: The ss selection command selects only alpha-carbon atoms. If you wish to display sticks or lines, it is necessary to combine this command with an atom selection, e.g., select helix, ss h and name C+O+N+CA+CB. The name selection here will display only main chain and beta-carbon atoms.
  • select segment, resi 155-167 and name C+N+O+CA+CB - select main chain and α- and β-carbon atoms of residues 155-167, and name the object “segment”
  • select chloride, elem CL - select all atoms of element type chlorine and name the object “chloride”
  • select water, resi 263 and resn HOH - select water molecule 263 and name the object “water”


Combining selections

Once selections are defined, you can combine them, e.g., select stuff, mysheet and mainchain and chain A. If mysheet and mainchain have already been defined, the new object stuff will contain only mysheet atoms in chain A that also meet the criterion of the mainchain definition. Please note that the and operator is Boolean: that is, the selection will include only atoms that meet the criteria on both sides of the and operator. The plus sign acts as a Boolean or among the values of a single selector, e.g. select stuff, resi 42+44+98+101. In this case, all atoms in residue numbers 42, 44, 98, and 101 will be combined in the selection stuff. To combine separate selections, use the or operator, e.g. select stuff, mysheet or mainchain.
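Selections behave like sets of atoms: and is a set intersection, while or (and the plus sign inside one selector) is a union. The Python sketch below illustrates this with the thirteen atoms of the PDB snippet shown earlier, using atom serial numbers as stand-ins for Pymol's internal atom identifiers; where, mainchain, lys89, and ends are hypothetical names, and this is a model of the logic rather than Pymol code.

```python
# Atoms from the PDB snippet: serial -> (atom name, chain, residue number)
ATOMS = {
    721: ("CD1", "A", 88), 722: ("CD2", "A", 88),
    723: ("N", "A", 89), 724: ("CA", "A", 89), 725: ("C", "A", 89),
    726: ("O", "A", 89), 727: ("CB", "A", 89), 728: ("CG", "A", 89),
    729: ("CD", "A", 89), 730: ("CE", "A", 89), 731: ("NZ", "A", 89),
    732: ("N", "A", 90), 733: ("CA", "A", 90),
}


def where(test):
    """Return the serial numbers of atoms for which test(name, chain, resi) is true."""
    return {serial for serial, (name, chain, resi) in ATOMS.items()
            if test(name, chain, resi)}


mainchain = where(lambda name, chain, resi: name in {"C", "O", "N", "CA"})
lys89 = where(lambda name, chain, resi: resi == 89)
# 'resi 88+90': the plus sign lists alternative values of one selector
ends = where(lambda name, chain, resi: resi in {88, 90})

print(sorted(mainchain & lys89))  # mainchain and resi 89 -> [723, 724, 725, 726]
print(sorted(mainchain | ends))   # mainchain or resi 88+90 -> [721, 722, 723, 724, 725, 726, 732, 733]
```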

Display

You can change the display characteristics of selections (objects) by typing commands in either window or by using the (A)ctions (S)how (H)ide (L)abel (C)olor buttons in the selections window. If you leave off the selection in any command, the action will be applied to the entire molecule! If you accidentally create a selection and want to delete it, or you want to rename a selection, use the (A)ctions button in the selections window.

Command -Action

  • hide everything - makes everything in all displayed objects invisible
  • show cartoon, protein - display the object “protein” in cartoon format
  • show sphere, zinc - display the object “zinc” as spheres
  • color white, zinc - color the object “zinc” white
  • show sticks, segment - show the object “segment” as sticks
  • show lines, segment - show the object “segment” as lines
  • show surface, protein - show the object “protein” as a molecular surface
    • (Note: the show surface command ignores HETATM records unless they are explicitly selected. If your surface rendering has "holes" in it, your solvent chain and/or other non-protein atom coordinates are probably specified as ATOM records instead of HETATM records. You can edit the PDB file to convert these to HETATM records to avoid this.)
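Converting ATOM records to HETATM records only requires rewriting the 6-character record name at the start of each affected line, since both record types share the same column layout. A minimal Python sketch, assuming the residues to convert are identified by residue name (HOH by default); atom_to_hetatm and convert_file are hypothetical helpers, and the original file is left untouched.

```python
from pathlib import Path


def atom_to_hetatm(lines, resnames):
    """Relabel ATOM records whose residue name is in resnames as HETATM.

    The record name occupies PDB columns 1-6 and the residue name
    columns 18-20, so only those fixed slices are inspected or replaced.
    """
    converted = []
    for line in lines:
        if line.startswith("ATOM  ") and line[17:20].strip() in resnames:
            line = "HETATM" + line[6:]
        converted.append(line)
    return converted


def convert_file(src, dst, resnames=("HOH",)):
    """Write a copy of a PDB file with the chosen residues relabelled."""
    lines = Path(src).read_text().splitlines(keepends=True)
    Path(dst).write_text("".join(atom_to_hetatm(lines, set(resnames))))
```

For example, convert_file("1ca2.pdb", "1ca2_hetatm.pdb", ("HOH", "ZN")) would write a corrected copy (the file names are illustrative).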


Settings

There are many settings in Pymol that can be used to customize the display. Examples of some commonly used settings are shown below:

Command - Action

  • zoom protein - zoom and center the display on the object “protein”
  • center zinc - center the molecule and its rotation axis on the object “zinc”
  • rebuild - re-construct the display including any alterations made by the alter command
  • png d:/docfiles/image.png - save the current view as a PNG file named image.png in the folder d:/docfiles/


Selected Pymol tasks

Altering Van der Waals radii

It is often necessary to alter the van der Waals (vdW) radii of atoms to account for ionic state or to make pretty ball and stick models. Pymol assigns each element its atomic radius, but ionic radii can be considerably different.

  • To set global sphere radius by element, for example:
    • alter elem ZN, vdw=0.85
    • rebuild
  • To set sphere radius by selection, for example:
    • alter (resi 6 and resn HOH), vdw=vdw/2
    • rebuild
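The expression vdw=vdw/2 works because alter evaluates a small Python statement once for each atom in the selection, with that atom's properties (such as vdw) available as variables. The stand-alone Python sketch below mimics that behavior on plain dictionaries so the arithmetic can be checked without Pymol; mock_alter is a hypothetical function, and the starting radii are made-up illustration values, not Pymol's defaults.

```python
def mock_alter(atoms, selected, expression):
    """Illustrate alter-style evaluation on dictionaries (this is not Pymol).

    For every atom chosen by selected(atom), the statement in expression is
    executed in a fresh namespace holding that atom's properties; values
    assigned there are copied back onto the atom.
    """
    for atom in atoms:
        if not selected(atom):
            continue
        namespace = dict(atom)
        exec(expression, {"__builtins__": {}}, namespace)
        for key in atom:
            atom[key] = namespace[key]
    return atoms


# Hypothetical starting radii, for illustration only
atoms = [
    {"resn": "ZN", "vdw": 1.39},
    {"resn": "HOH", "resi": 263, "vdw": 1.52},
]
mock_alter(atoms, lambda a: a["resn"] == "ZN", "vdw=0.85")
mock_alter(atoms, lambda a: a["resn"] == "HOH" and a["resi"] == 263, "vdw=vdw/2")
print([a["vdw"] for a in atoms])  # [0.85, 0.76]
```

The first call sets an absolute radius; the second scales whatever radius the atom already has, which is why repeating it would keep shrinking the sphere.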


Sample Pymol session

The following Pymol session displays a ribbon structure of human carbonic anhydrase II (PDB 1CA2), highlights the active site residues, and displays a partially transparent molecular surface to show the active site cavity. Lines written as Pymol commands can be typed in either window; the other listed tasks are carried out using the top menu or the Pymol GUI menus.

File…Open… choose 1ca2.pdb

hide everything

select protein, polymer

show cartoon, protein

Color protein by ss using (C)olor menu

select zn, resn ZN

show sphere, zn

alter zn, vdw = 0.85

rebuild

select ligand, resi 94+96+119 and not name C+O+N

show sticks, ligand

Color ligand by element using (C)olor menu

show sticks, zn

select hoh, resn HOH and resi 263

show sphere, hoh

alter hoh, vdw=vdw/2

rebuild

show sticks, zn or hoh

select his64, resi 64 and not name C+O+N

show sticks, his64

Color his64 by element using (C)olor menu


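The typed commands of this session can also be saved as a Pymol script (.pml file) and replayed in one step. The Python sketch below writes such a script. Menu-only steps are kept as comment lines beginning with #, load 1ca2.pdb stands in for File…Open… and assumes the file is in Pymol's working directory, and session_text, write_pml, and the file name are hypothetical.

```python
from pathlib import Path

# Typed steps of the sample session; menu-only steps become # comments
SESSION = [
    "load 1ca2.pdb",
    "hide everything",
    "select protein, polymer",
    "show cartoon, protein",
    "# color protein by secondary structure using the (C)olor menu",
    "select zn, resn ZN",
    "show sphere, zn",
    "alter zn, vdw=0.85",
    "rebuild",
    "select ligand, resi 94+96+119 and not name C+O+N",
    "show sticks, ligand",
    "# color ligand by element using the (C)olor menu",
    "show sticks, zn",
    "select hoh, resn HOH and resi 263",
    "show sphere, hoh",
    "alter hoh, vdw=vdw/2",
    "rebuild",
    "select his64, resi 64 and not name C+O+N",
    "show sticks, his64",
    "# color his64 by element using the (C)olor menu",
]


def session_text(commands):
    """Join commands into the text of a .pml script, one command per line."""
    return "\n".join(commands) + "\n"


def write_pml(path, commands=SESSION):
    """Save the commands as a Pymol script file and return its path."""
    path = Path(path)
    path.write_text(session_text(commands))
    return path
```

After write_pml("ca2_session.pml"), the script can be run from the Pymol command line with @ca2_session.pml, and the menu-based coloring steps can then be done by hand.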
Advanced Tasks

Visualizing Electron Density Maps

One way to visualize electron density maps in Pymol is to use electron density map information from CCP4. A common task is producing publication-quality figures of "omit maps" to justify the interpretation of a bound ligand to a protein. Directions for producing such Fo-Fc omit maps are described elsewhere. The following instructions will allow you to read in an electron density map (e.g., 2Fo-Fc or Fo-Fc map) into pymol and display it.

  • Load the target .pdb file into Pymol
  • Render the molecule to look the way you want
  • Create a selection, e.g., "site", describing the portion of the molecule around which you wish to see the electron density
  • Use File...Open... to read in the electron density map. It should have an extension of .ccp4
  • Center on the desired selection and set the desired viewport size
  • The following commands will create a nicely displayed structure with electron density suitable for publication. Each command is followed by a description of what it does; mymap stands for the name of the map object as it appears in the object list (Pymol names it after the map file, without the extension). You may want to modify these slightly as required:
    • map_double mymap, -1 - doubles the sampling rate of mymap for a nicer display
    • isomesh map, mymap, 2.0, site, carve=1.6 - creates an electron density mesh named "map", contoured at 2.0 sigma around the selection site, with a 1.6 Å buffer zone around the selection
    • bgcolor white - color the background white (or another color as desired)
    • set ray_opaque_background, 1 - show an opaque background color (if 0, the background will be transparent)
    • color grey50, map - set the map color to mid-gray
    • set ray_shadow, 0 - turn off ray-tracing shadows
    • set mesh_width, 0.5 - make the electron density meshes thinner
    • ray - produce a ray-traced image
    • png myimage.png - save a .png file image of your rendering
  • Labels can be added in Gimp or Photoshop
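When comparing contour levels or carve radii, it can help to generate the command list above from a few parameters. The Python sketch below reproduces those commands as strings; density_commands is a hypothetical helper, mymap and site are the placeholder names used above, and each generated script is meant for a fresh session, since running map_double again would double the sampling again.

```python
def density_commands(map_name, site="site", level=2.0, carve=1.6,
                     mesh="map", image="myimage.png"):
    """Return the map-display commands listed above, with adjustable values."""
    return [
        f"map_double {map_name}, -1",
        f"isomesh {mesh}, {map_name}, {level}, {site}, carve={carve}",
        "bgcolor white",
        "set ray_opaque_background, 1",
        f"color grey50, {mesh}",
        "set ray_shadow, 0",
        "set mesh_width, 0.5",
        "ray",
        f"png {image}",
    ]


if __name__ == "__main__":
    # A variant contoured at 1.5 sigma, saved under its own image name
    print("\n".join(density_commands("mymap", level=1.5, image="density_1.5.png")))
```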

Pymol Resources

The Pymol Wiki is a comprehensive, user-based resource for all things Pymol. It includes user-installable Pymol extensions as well as a fairly complete description of Pymol settings and commands.

Citing Pymol

If you use Pymol in your work, you should properly cite it. The following format is suggested:

  • DeLano, W.L. The PyMOL Molecular Graphics System (2002) DeLano Scientific, Palo Alto, CA, USA. http://www.pymol.org

Coot

Examining or re-refining structures and electron density maps from the Protein Data Bank

Preparing data files

  • Download the atomic coordinates (a .pdb file) as a text file and save it
  • Download the structure factors (a .gz file) and extract the .cif file using gunzip filename.gz (Linux) or WinZip (Windows)
  • Start CCP4i and open a task window from Reflection Data Utilities...Convert to/modify/extend MTZ
    • Select mmCIF as the input format
    • Select "Create full unique set of reflections and keep existing FreeR data"
    • Enter the input and output file paths and names (the output will be an .mtz file)
    • Assign sensible, short crystal, project, and dataset names (e.g. HICA, wt-enzyme, all)
    • Enter the space group (or space group number) and cell dimensions as given in the CRYST1 record of the PDB file
  • Run the task to generate the .mtz file
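The space group and cell dimensions can be read from the CRYST1 record with fixed-column slicing, just as with ATOM records. The Python sketch below assumes the standard CRYST1 layout (a, b, c in columns 7-33, the three angles in columns 34-54, and the space group symbol in columns 56-66); parse_cryst1 and cryst1_from_pdb are hypothetical helpers, and the demonstration cell is invented.

```python
def parse_cryst1(line):
    """Read cell dimensions and space group from a PDB CRYST1 record."""
    if not line.startswith("CRYST1"):
        raise ValueError("not a CRYST1 record")
    return {
        "a": float(line[6:15]),
        "b": float(line[15:24]),
        "c": float(line[24:33]),
        "alpha": float(line[33:40]),
        "beta": float(line[40:47]),
        "gamma": float(line[47:54]),
        "space_group": line[55:66].strip(),
    }


def cryst1_from_pdb(path):
    """Return the first CRYST1 record of a PDB file, parsed."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("CRYST1"):
                return parse_cryst1(line)
    raise ValueError(f"no CRYST1 record in {path}")


if __name__ == "__main__":
    # Hypothetical cell, for illustration only
    demo = "CRYST1   50.000   60.000   70.000  90.00  90.00  90.00 P 21 21 21    4"
    print(parse_cryst1(demo))
```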

Direct visualization in Coot

  • Open Coot and load the coordinate (.pdb) file
  • Select Auto-Open MTZ and open the .mtz file
    • Coot will ask you from which molecule you want to calculate phases
    • Choose the molecule you loaded from the .pdb file
    • Coot will run an instance of Refmac to generate a 2Fo-Fc map (You must have CCP4 installed to do this.)
  • After a little number crunching, Coot will display the 2Fo-Fc map around your molecule.


Note: If you want a traditional display of 2Fo-Fc and Fo-Fc (difference) maps, use the alternate method of map generation described in the next section.


Generation of Refmac-style maps for visualization in Coot

  • Open a Refmac task window and select rigid body refinement without prior phase information
  • Use your .pdb and .mtz files created from your Protein Data Bank downloads as input files
  • Choose sensible names for the output .pdb and .mtz files
  • Set appropriate refinement parameters as described elsewhere
  • Because this data is already highly refined, no more than 3-5 cycles of refinement are necessary to generate maps
  • Open Coot and load the coordinate file
  • Select Auto-Open MTZ and open the .mtz file
  • Coot will display the familiar blue 2Fo-Fc and red/green Fo-Fc difference maps
  • You can use the .pdb and .mtz files for further refinement or alteration of the structure if desired.