Introducing ProteinPrep: Your Automated Preparation Tool
ProteinPrep is an automated, open-source pipeline written in Python. It is designed to streamline and standardize the preparation of protein structures for docking.
ProteinPrep is a small, user-friendly tool that fetches PDB structures, cleans them (removes water/heteroatoms), optionally retains only selected chains, protonates (adds hydrogens) using OpenBabel, and converts to PDBQT for AutoDock Vina. It supports both a command-line interface (CLI) and a graphical user interface (GUI). 🧪
Features of ProteinPrep
Download PDB by ID from the RCSB database with built-in retries.
Clean PDB: Remove water molecules and heteroatoms.
Select Chains: Keep only specified chains (e.g., --keep-chains A,C).
Keep Ligands: Optionally preserve specific heteroatoms/ligands.
Add Hydrogens: Protonate the structure with OpenBabel (--auto-add-h).
Convert: Generate PDBQT files for docking (--auto-pdbqt).
Batch Mode: Process multiple PDB IDs from a file (--batch-file).
Simple GUI: An intuitive graphical interface for all features.
Logging: Generates a proteinprep_log.json file with a full report of the operations performed.
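To make the cleaning and chain-selection features concrete, here is a minimal sketch of how such a filter could work on the fixed-column PDB format. It is not ProteinPrep's actual code: the function name clean_pdb_lines and its parameters are hypothetical, and this sketch keeps only ATOM/HETATM coordinate records.

```python
def clean_pdb_lines(lines, keep_chains=None, keep_ligands=None):
    """Drop waters/heteroatoms and, optionally, unwanted chains.

    Illustrative sketch only; not ProteinPrep's implementation.
    """
    keep_chains = set(keep_chains or [])
    keep_ligands = set(keep_ligands or [])
    cleaned = []
    for line in lines:
        record = line[:6].strip()          # columns 1-6: record name
        if record not in ("ATOM", "HETATM"):
            continue                       # this sketch keeps coordinates only
        res_name = line[17:20].strip()     # columns 18-20: residue name
        chain_id = line[21:22]             # column 22: chain identifier
        if keep_chains and chain_id not in keep_chains:
            continue                       # chain not requested
        if record == "HETATM" and res_name not in keep_ligands:
            continue                       # removes water (HOH) and other heteroatoms
        cleaned.append(line)
    return cleaned


# Tiny made-up example: two protein atoms, one water, one zinc ion
sample = [
    "HEADER    EXAMPLE",
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N",
    "ATOM      2  CA  GLY B   2      12.000  14.000   3.000  1.00 20.00           C",
    "HETATM    3  O   HOH A 101      15.000  16.000   5.000  1.00 30.00           O",
    "HETATM    4 ZN    ZN A 201      18.000  19.000   8.000  1.00 25.00          ZN",
]

for kept in clean_pdb_lines(sample, keep_chains=["A"], keep_ligands=["ZN"]):
    print(kept)
```

Passing no keep_chains keeps every chain, and passing no keep_ligands removes every heteroatom, which mirrors the optional behaviour described in the feature list.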
Please refer to my GitHub repository for instructions on using ProteinPrep via the Command Line Interface (CLI) and Graphical User Interface (GUI). We will utilize Google Colab here for those who are not familiar with coding.
Using ProteinPrep via Google Colab
Google Colab is a cloud-hosted Jupyter notebook environment with free CPU and GPU access that can run Python scripts. Here’s how to run the ProteinPrep CLI in Colab:
Step 1: Download the script and install its dependencies in your Colab notebook:
# Download proteinprep.py directly to the session filesystem
!wget https://raw.githubusercontent.com/akinwumiishola5000/Proteinprep/main/proteinprep.py
# Install required Python packages
!pip install typer requests
Step 2: Install OpenBabel in Colab. OpenBabel is required for the protonation and PDBQT conversion steps.
!apt-get update -qq
!apt-get install -y openbabel
# Confirm the installation:
!obabel -V
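If you prefer to check from Python rather than from a shell command, a small sketch like the one below does the same job. The helper name obabel_available is made up for illustration; it only looks for an obabel executable on the PATH and prints its version if one is found.

```python
import shutil
import subprocess


def obabel_available():
    """Return True if an `obabel` executable is found on the PATH."""
    return shutil.which("obabel") is not None


if obabel_available():
    # `obabel -V` prints the Open Babel version string
    result = subprocess.run(["obabel", "-V"], capture_output=True, text=True)
    print(result.stdout.strip())
else:
    print("obabel not found; re-run the apt-get install step above")
```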
Step 3: Run the proteinprep.py script with a PDB ID and the options you need:
!python3 proteinprep.py 8fv4 --auto-add-h --auto-pdbqt --keep-chains A,B
This will:
Download PDB 8fv4.
Clean it.
Keep only chains A and B.
Protonate it with OpenBabel.
Convert it to PDBQT format.
Output files are saved in the Colab session’s current directory (/content).
If you have no chain preference, run the script without --keep-chains:
!python3 proteinprep.py 8fv4 --auto-add-h --auto-pdbqt
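For readers curious about what the protonation and conversion steps involve, the sketch below builds roughly equivalent Open Babel commands by hand. This is an assumption about how such steps could be done, not a description of what proteinprep.py runs internally: the helper obabel_commands is hypothetical, the file names follow the output names used in Step 4, -h asks Open Babel to add hydrogens, and -xr asks its PDBQT writer to treat the molecule as rigid (a common choice for receptors).

```python
import shlex


def obabel_commands(pdb_id):
    """Build (but do not run) obabel commands for protonation and PDBQT export.

    Illustrative sketch; the flags ProteinPrep actually uses may differ.
    """
    pdb_id = pdb_id.upper()
    clean = f"{pdb_id}_clean.pdb"
    protonated = f"{pdb_id}_protonated.pdb"
    pdbqt = f"{pdb_id}.pdbqt"
    return [
        ["obabel", clean, "-O", protonated, "-h"],   # add hydrogens
        ["obabel", protonated, "-O", pdbqt, "-xr"],  # write a rigid-receptor PDBQT
    ]


for cmd in obabel_commands("8fv4"):
    print(shlex.join(cmd))
```

To execute the commands yourself in Colab, you could pass each list to subprocess.run(cmd, check=True) once the cleaned PDB file exists.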
Step 4: Download the output files from Colab. You can either:
Use the file browser in Colab (left pane → Files) to download files manually, or
Run the following code to download them programmatically:
from google.colab import files
files.download('8FV4_clean.pdb')
files.download('8FV4_protonated.pdb')
files.download('8FV4.pdbqt')
files.download('proteinprep_log.json')
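If you ran the script without one of the options, some of these files will not exist and the matching download call will fail. A small sketch like the following downloads only the files that are actually present. The helper existing_outputs is hypothetical, and the file names are assumed to follow the pattern shown above.

```python
from pathlib import Path


def existing_outputs(pdb_id, directory="."):
    """Return the expected output files that actually exist in `directory`."""
    pdb_id = pdb_id.upper()
    names = [
        f"{pdb_id}_clean.pdb",
        f"{pdb_id}_protonated.pdb",
        f"{pdb_id}.pdbqt",
        "proteinprep_log.json",
    ]
    folder = Path(directory)
    return [folder / name for name in names if (folder / name).exists()]


try:
    from google.colab import files  # only importable inside Colab
except ImportError:
    files = None

for path in existing_outputs("8fv4"):
    if files is not None:
        files.download(str(path))   # triggers a browser download in Colab
    else:
        print("found", path)        # fallback when run outside Colab
```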
Rather than running each command in its own cell, you can paste all of the code into a single cell and run it once; just make sure you copy every line correctly.
The PDBQT file is now ready for the next stage of your CADD project! Open the output in a molecular viewer like PyMOL or Chimera. You will see that the water molecules, heteroatoms, and any chains you excluded are gone, and that hydrogens have been added.
Congratulations! 😅 You have successfully transformed a raw structure into a high-quality model, ready for drug discovery. This is a foundational skill in computational chemistry and biology.