Understanding the Outputs

The PDB File

The output PDB file contains the designed structure from the inference run. There should be one PDB file for each design, and the file names depend on inference.output_prefix and inference.design_startnum. Remember that the output from RFdiffusion is ONLY THE BACKBONE STRUCTURE: any designed pieces of the backbone are composed entirely of glycine residues. Other tools, such as ProteinMPNN, can then perform sequence design based on that structure.
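One way to see the glycine-only nature of designed regions is to read the residue names from the CA atom records of an output PDB. The following is a minimal sketch that assumes standard fixed-column PDB ATOM records; the function names are hypothetical helpers and are not part of RFdiffusion.

```python
def ca_residues(pdb_text):
    """Return (chain, residue number, residue name) for every CA atom record."""
    residues = []
    for line in pdb_text.splitlines():
        # Fixed PDB columns: atom name 13-16, residue name 18-20,
        # chain ID 22, residue sequence number 23-26 (1-based columns).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residues.append((line[21], int(line[22:26]), line[17:20]))
    return residues


def fraction_glycine(pdb_text):
    """Fraction of residues (counted by CA atoms) that are glycine."""
    residues = ca_residues(pdb_text)
    if not residues:
        return 0.0
    glycines = sum(1 for _, _, name in residues if name == "GLY")
    return glycines / len(residues)


# Tiny hand-written example: one designed (GLY) residue, one kept (ALA) residue.
example_pdb = "\n".join([
    "ATOM      2  CA  GLY A   1      11.000   6.000  -6.000  1.00  0.00           C",
    "ATOM      6  CA  ALA A   2      12.000   7.000  -5.000  1.00  0.00           C",
])
print(ca_residues(example_pdb))
print(fraction_glycine(example_pdb))
```

A fully unconditional design would be expected to give a fraction of 1.0, while a design that keeps residues from a reference structure would give a lower value, since those residues keep their original names.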

The .trb File

This file stores metadata about the specific inference run, including the specific contig used along with the full contig used by RFdiffusion. The file also contains information about how residues in any given inputs map to the residues in the output structure. There should be one .trb file for each requested design, with the same name as the corresponding PDB file but a different file extension.

If you would like to view the contents of this file, you can extract the information using the Python pickle module.

Example script:

import pickle

# Open the .trb file in binary mode and load the pickled metadata.
with open('path/to/your/outputs/output_file.trb', 'rb') as file:
    data = pickle.load(file)

print(data)

A Rosetta Commons member, Federico Olivieri, has created a VS Code plugin that automatically unpickles TRB files when opened. You can find it here.

Note: The Python pickle module is not secure. Only unpickle files from a known and trusted source. Learn more here.
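Printing the whole object can be hard to read, so it may help to list each key with its type and size first. The sketch below assumes the unpickled TRB object behaves like a dictionary; summarize_trb is a hypothetical helper, and the mock dictionary stands in for real data loaded with pickle.

```python
def summarize_trb(data):
    """Map each key to (type name, size), where size is a shape, a length, or None."""
    summary = {}
    for key in sorted(data):
        value = data[key]
        shape = getattr(value, "shape", None)
        if shape is not None:
            # Array-like objects (for example NumPy arrays) expose a shape.
            size = tuple(shape)
        elif isinstance(value, (list, tuple, dict, str)):
            size = len(value)
        else:
            size = None
        summary[key] = (type(value).__name__, size)
    return summary


# Mock data for illustration only; real TRB files contain more keys.
mock_trb = {
    "time": 12.5,
    "con_hal_idx0": [0, 1, 2],
    "config": {"inference": {}},
}
for key, (type_name, size) in summarize_trb(mock_trb).items():
    print(key, type_name, size)
```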

TRB Variables

The TRB file contains information about what occurred in the inference process to generate each design. Below are variable definitions to help you understand the information in these output files. It is important to note that:

Here, a residue number is the position of a residue in a PDB file, counted without chain IDs and without resetting between chains. Residue numbering starts at 0.
A residue ID is the chain and residue number of a residue in the PDB file; the numbering resets between chains.
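The difference between the two numbering schemes can be illustrated with a short sketch. It assumes residue IDs are given as (chain, number) pairs in the order they appear in the PDB file; the helper name is hypothetical.

```python
def residue_numbers_from_ids(residue_ids):
    """Assign each residue ID a 0-based residue number that ignores chain breaks."""
    return {residue_id: number for number, residue_id in enumerate(residue_ids)}


# Two chains: the residue ID restarts at 1 on chain B,
# while the residue number keeps counting.
residue_ids = [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2)]
numbers = residue_numbers_from_ids(residue_ids)
for residue_id, number in numbers.items():
    print(residue_id, "->", number)
```

In this example the first residue of chain B has residue ID ("B", 1) but residue number 3.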

The following variables may or may not be present in your TRB files depending on the type of inference calculation you ran:

complex_con_hal_idx0: An array containing a list of residues taken from the reference structure. They are represented by their residue numbers in the generated structure.
complex_con_hal_pdb_idx: An array of arrays describing the residues taken from the reference structure. They are represented by their residue ID in the generated structure.
complex_con_ref_idx0: An array containing the residues taken from the reference structure. They are represented by their residue number in the reference PDB.
complex_con_ref_pdb_idx: An array of arrays of residues taken from the reference structure. They are represented by their residue ID in the reference PDB.
receptor_con_hal_idx0: An array of residues taken from the reference structure in non-designed chains. They are represented by their residue number in the generated structure.
receptor_con_hal_pdb_idx: An array of arrays containing the residues taken from the reference structure in non-designed chains. They are represented by their residue IDs in the generated structure.
receptor_con_ref_idx0: An array of residues taken from the reference structure in non-designed chains. They are represented by their residue numbers in the reference PDB.
receptor_con_ref_pdb_idx: An array of arrays containing residues from the reference structure in non-designed chains. They are represented by their residue IDs in the reference PDB.
con_hal_idx0: An array containing residues taken from the reference structure and present in the redesigned chains. They are represented by their residue numbers in the generated structure.
con_hal_pdb_idx: An array of arrays of residues taken from the reference structure and present in the redesigned chains. They are represented by their residue IDs in the generated structure.
con_ref_idx0: An array of residues taken from the reference structure and present in the redesigned chains. They are represented by their residue numbers in the reference PDB.
con_ref_pdb_idx: An array of arrays containing residues taken from the reference structure and present in the redesigned chains. They are represented by their residue IDs in the reference PDB.
config: This is a dictionary of dictionaries specifying the configuration for the inference run. Some of these will be settings you specified when running inference.py, others will have been chosen automatically based on the type of calculation.
inpaint_seq: An array of booleans where True means that the residue in the generated structure (represented by its residue number) was fixed during the inference calculation. If False, that residue was inpainted. If you supplied an inpaint_seq array, this should be the same as your input, but possibly reordered based on the differences in the residue numbers for the reference and generated structures.
inpaint_str: An array of booleans where True means that the residue (represented by its residue number) came from the reference structure.
mask_1d: An array of booleans that indicates whether a given residue, represented by residue ID, is masked (diffused), shown as True, or unmasked (fixed), shown as False.
plddt: Stands for "predicted local distance difference test" and is a per-residue confidence metric originally introduced by AlphaFold.
sampled_max: An array of strings representing the blocks of a protein chain that are to be "sampled" or inpainted by the model. It is the result of parsing the contig string.
time: The time in seconds it took for the inference calculation to complete.
device: The device used for the calculation, either CPU or GPU, depending on whether RFdiffusion was able to detect CUDA.
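A common use of these variables is to look up where a residue from the reference structure ended up in the design. The sketch below assumes that con_ref_pdb_idx and con_hal_pdb_idx are aligned entry by entry, so that the i-th reference residue ID corresponds to the i-th generated residue ID. The helper name and the mock values are hypothetical.

```python
def map_reference_to_design(trb):
    """Build a {reference residue ID: generated residue ID} lookup table."""
    reference_ids = [tuple(entry) for entry in trb["con_ref_pdb_idx"]]
    generated_ids = [tuple(entry) for entry in trb["con_hal_pdb_idx"]]
    if len(reference_ids) != len(generated_ids):
        raise ValueError("con_ref_pdb_idx and con_hal_pdb_idx differ in length")
    return dict(zip(reference_ids, generated_ids))


# Mock TRB content for illustration: three reference residues (A25-A27)
# that were placed at positions A11-A13 of the generated structure.
mock_trb = {
    "con_ref_pdb_idx": [("A", 25), ("A", 26), ("A", 27)],
    "con_hal_pdb_idx": [("A", 11), ("A", 12), ("A", 13)],
}
mapping = map_reference_to_design(mock_trb)
print(mapping[("A", 26)])
```

The same pattern would apply to the complex_con_* and receptor_con_* pairs, and to the idx0 variants if you prefer to work with residue numbers instead of residue IDs.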

The Trajectory Files

Trajectory files are automatically placed in a traj folder within the directory where your output PDB and .trb files are saved. These files can be visualized in PyMOL as multi-step PDBs, but note that they are ordered in reverse! The first PDB is for the t=1 (last) prediction made by RFdiffusion during inference. This ordering reflects how the generative process (generating the backbone structures from model data) is discussed in the literature, where it is seen as the reverse of the noising process. See the original RFdiffusion Nature paper, specifically Figure 1a,b, for more information. There will be two trajectory files for each designed backbone: one labeled pX0, which stores what the model predicted at each timestep, and one labeled Xt-1, which stores the structure that was fed into the model at each timestep.
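If you would rather step through a trajectory from the first prediction to the last, the frames can be reordered before loading the file into a viewer. The sketch below assumes the trajectory file separates frames with MODEL and ENDMDL records, as multi-model PDB files conventionally do; reverse_models is a hypothetical helper and not part of RFdiffusion.

```python
def reverse_models(pdb_text):
    """Return multi-model PDB text with the models in reverse order."""
    models = []
    current = None
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            current = []  # start collecting a new frame
        elif line.startswith("ENDMDL"):
            if current is not None:
                models.append(current)
            current = None
        elif current is not None:
            current.append(line)

    output = []
    # Renumber the frames so the reversed file still counts up from 1.
    for number, body in enumerate(reversed(models), start=1):
        output.append(f"MODEL     {number:>4}")
        output.extend(body)
        output.append("ENDMDL")
    return "\n".join(output) + "\n"


# Two placeholder frames; real frames would contain ATOM records.
example_trajectory = (
    "MODEL        1\nREMARK last prediction\nENDMDL\n"
    "MODEL        2\nREMARK first prediction\nENDMDL\n"
)
print(reverse_models(example_trajectory))
```

Lines outside any MODEL/ENDMDL pair are dropped by this sketch, so check the result against the original file before relying on it.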
