A simplified computer-aided drug design (CADD) pipeline often looks like this:
Target protein sequence (FASTA) → Structure (AlphaFold prediction, homology model, or an existing 3D structure from the PDB) → Docking / Virtual screening → Molecular dynamics simulation → Analysis.
In this tutorial, we focus on the first block, the protein: how to manipulate a protein sequence and apply other important modifications.
All the code used in this tutorial is available in the accompanying Google Colab notebook.
Google Colab Setup
Open a new notebook
Click New Notebook.
To run the code in a cell, press Shift + Enter (Figure 1).
Figure 1: Google Colab cell.
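Copy and paste the following into a code cell to install Biopython (Figure 2). The leading ! tells Colab to run the line as a shell command, pip is the Python package installer, and -q means “quiet” (less output).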
Code:
!pip -q install biopython
Figure 2: Installation of Biopython.
Copy and paste the following into a code cell to confirm the installation. If a version number prints without errors, you’re ready (Figure 3).
Code:
import Bio
print("Biopython version:", Bio.__version__)
Figure 3: Confirmation of Biopython installation.
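If the import fails, the plain check stops with an ImportError traceback, which can be confusing in a fresh notebook. A slightly more defensive variant is sketched below. The helper name biopython_version is hypothetical (it is not part of Biopython); it relies only on the standard-library importlib module and on the Bio.__version__ attribute already used in the check.

```python
import importlib.util


def biopython_version():
    """Return the installed Biopython version string, or None if it is missing.

    Hypothetical helper for this tutorial, not a Biopython function.
    """
    # find_spec looks the package up without importing it, so a missing
    # install is reported cleanly instead of raising ImportError.
    if importlib.util.find_spec("Bio") is None:
        return None
    import Bio
    return Bio.__version__


version = biopython_version()
if version is None:
    print("Biopython not found: rerun the install cell (pip install biopython).")
else:
    print("Biopython version:", version)
```

If the message says Biopython was not found, rerun the installation cell and then run this cell again.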