This documentation is in Beta mode! Please give feedback by suggesting/commenting in embedded documents or via the Feedback Form!
If you have any questions or suggestions, please leave them as comments in the embedded Google Sheet!
The embedded documents can be opened in a new tab/window by clicking on the grey box that appears in the upper right hand corner of the document when you hover over it.
This is a comprehensive list of the configuration options that users can modify in RFdiffusion. Configuration options are the pieces of information given to RFdiffusion via the command line to specify the backbone structure to be generated.
The spreadsheets and information below are meant to be used as reference documentation. If you are a new user we recommend starting with the Unconditional Monomer Generation tutorial and going through some of the examples in the GitHub repo to learn how to use RFdiffusion.
Although contigs are also covered in the contigmap section below, the contig string is the key piece of information for running inference calculations and has some unique formatting rules, which are discussed here.
Due to the use of Hydra, the contigs must be passed as a single-item list in single quotes, for example 'contigmap.contigs=[140-160]'. In this example, if no other configuration options are included, this tells RFdiffusion to generate a backbone that is at least 140aa (140 amino acids) and at most 160aa long. If you want a protein of an exact length, make the lower bound and the upper bound the same: 'contigmap.contigs=[150-150]'.
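To make the length range concrete, here is a minimal Python sketch of how a single-range contig such as [140-160] could be turned into a sampled length. The function name sample_length is hypothetical, and uniform sampling over the inclusive range is an assumption made for illustration; this is not RFdiffusion's own code.

```python
import random
import re


def sample_length(contig: str, rng: random.Random) -> int:
    """Sample a backbone length from a single-range contig such as '[140-160]'.

    Hypothetical helper: assumes uniform sampling with both bounds inclusive.
    """
    match = re.fullmatch(r"\[(\d+)-(\d+)\]", contig.strip())
    if match is None:
        raise ValueError(f"not a single-range contig: {contig!r}")
    low, high = int(match.group(1)), int(match.group(2))
    if low > high:
        raise ValueError("lower bound exceeds upper bound")
    # random.Random.randint includes both endpoints.
    return rng.randint(low, high)


rng = random.Random(0)
print(sample_length("[140-160]", rng))  # some value between 140 and 160
print(sample_length("[150-150]", rng))  # always 150
```

When both bounds are equal the range collapses to one value, which is why [150-150] always gives a 150-residue backbone.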
But what if we don't want to generate an entirely new backbone? What if we have pieces or motifs that we know we want in the protein structure? The input structure(s) must be given to the calculation in PDB format using inference.input_pdb (see below). Anything in the contig string that is prefixed by a letter is assumed to be a motif, and the letter must correspond to the chain letter in the given PDB file. Anything not prefixed by a letter is assumed to need building. / separates the different portions of the structure. So the contig string 'contigmap.contigs=[5-15/A10-25/30-40]' tells RFdiffusion to build 5-15 residues N-terminal to residues 10-25 of chain A from the input PDB, then build an additional 30-40 residues C-terminal to them. The only residues taken from the input PDB will be those represented by A10-25, even if other residues or chains are included in the input structure.
Finally, if you would like to specify a chain break, you can do so using /0. This tells RFdiffusion to add a large residue jump (200aa) to the input so that the model sees what follows as a separate chain. So something like 'contigmap.contigs=[5-15/A10-25/30-40/0 B1-100]' will place a break between the protein chain built as described above and residues 1-100 of chain B in the input PDB. Note that in this example /0 is followed by a space before the next chain.
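As a sketch of how the contig string and the input PDB fit together on the command line, the Python snippet below assembles the arguments and prints a shell-safe version. The helper build_command, the file name input.pdb, and the script path scripts/run_inference.py are assumptions for illustration; adjust them to your own installation.

```python
import shlex
from typing import List, Optional


def build_command(
    contigs: str,
    input_pdb: Optional[str] = None,
    script: str = "scripts/run_inference.py",  # assumed script location
) -> List[str]:
    """Assemble Hydra-style overrides for an inference run (hypothetical helper)."""
    args = [script, f"contigmap.contigs={contigs}"]
    if input_pdb is not None:
        args.append(f"inference.input_pdb={input_pdb}")
    return args


command = build_command("[5-15/A10-25/30-40]", input_pdb="input.pdb")
# shlex.join quotes any argument containing shell-special characters,
# which is why the contig override ends up in single quotes.
print(shlex.join(command))
# scripts/run_inference.py 'contigmap.contigs=[5-15/A10-25/30-40]' inference.input_pdb=input.pdb
```

If the argument list is passed directly to subprocess.run without a shell, no extra quoting is needed; the single quotes matter only when typing the command into a shell, where square brackets would otherwise be treated as glob characters.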
This notation will be used for all other configuration options that define the backbone structure to be generated.
○ [x-y] tells RFdiffusion to generate a backbone with a randomly sampled length of at least x and at most y amino acids. If x and y are the same a chain of exactly that length will be generated.
○ [Ax-y] tells RFdiffusion to use residues x through y of chain A from an input PDB structure in the model.
○ / separates the individual contig instructions.
○ /0 denotes a chain break.
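The notation rules above can be illustrated with a small Python parser. This is a sketch under stated assumptions, not RFdiffusion's own parser: the names Segment, parse_contigs, and length_bounds are hypothetical, chains are assumed to be separated by whitespace after /0, and motif ranges such as A10-25 are assumed to be inclusive (16 residues).

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    chain: Optional[str]  # chain letter for a motif, None for residues to build
    start: int
    end: int


_SEGMENT = re.compile(r"([A-Za-z]?)(\d+)-(\d+)")


def parse_contigs(contigs: str) -> List[List[Segment]]:
    """Split a contig string into chains, each a list of segments."""
    body = contigs.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    chains = []
    for chain_text in body.split():  # assumed: whitespace follows each /0
        segments = []
        for token in chain_text.split("/"):
            if token == "0":  # chain-break marker, not a segment
                continue
            match = _SEGMENT.fullmatch(token)
            if match is None:
                raise ValueError(f"unrecognised contig segment: {token!r}")
            chain, start, end = match.groups()
            segments.append(Segment(chain or None, int(start), int(end)))
        chains.append(segments)
    return chains


def length_bounds(segments: List[Segment]) -> Tuple[int, int]:
    """Return the (minimum, maximum) number of residues in one chain."""
    low = high = 0
    for seg in segments:
        if seg.chain is None:  # built region: length is sampled in [start, end]
            low += seg.start
            high += seg.end
        else:  # motif: fixed length, assuming an inclusive residue range
            fixed = seg.end - seg.start + 1
            low += fixed
            high += fixed
    return low, high


chains = parse_contigs("[5-15/A10-25/30-40/0 B1-100]")
print([length_bounds(chain) for chain in chains])  # [(51, 71), (100, 100)]
```

In this example the first chain contains 16 motif residues plus 35 to 55 newly built residues, and the second chain consists only of residues taken from chain B of the input PDB.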
These are the configuration options that can be found in config/inference/base.yaml. There are three tabs in the embedded spreadsheet:
○ Commonly Used Options: these are the configuration options that are commonly used for backbone generation with RFdiffusion. Each of these options is used in at least one of the example scripts provided in the GitHub repository.
○ Advanced Options: these options are not used in any of the examples in the GitHub repository and typically have much more niche uses.
○ Should NOT be Changed: these options should not be changed by the user, either because changing them might cause the inference run to produce unrealistic backbones or because they do not change the results of the inference run.
These are configuration settings that can be used in inference runs that sample symmetric assemblies. These options can be found in config/inference/symmetry.yaml.