AlphaFold provides accurate protein structure predictions[1]. It is installed locally on the cluster, so containers are not required to run it.
Important Notes:
Check the bug reports on the AlphaFold GitHub: https://github.com/deepmind/alphafold
If you encounter memory issues, try the reduced database search or use a GPU with more memory (check HPC Resources, High Memory GPU job). For example, on the Pioneer cluster, the GPUs in the aisc partition have up to 80 GB of GPU memory.
Access the Pioneer cluster (pioneer.case.edu); see the Quickstart Guide.
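Log in over SSH; a minimal example, where <CaseID> is a placeholder for your own Case ID:
ssh <CaseID>@pioneer.case.edu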
Submit the job (uses 2 GPUs and the reduced databases for the search):
sbatch /mnt/pan/AlphaFoldScripts/multimer_reduced_db.slurm <your_seq_file_name>
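The multimer preset expects a FASTA file with one record per chain. A hypothetical two-chain input (the file name and sequences below are placeholders, not a tested example):
>chain_A
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
>chain_B
MADEEKLPPGWEKRMSRSSGRVYYFNHITNASQ
It would be submitted the same way:
sbatch /mnt/pan/AlphaFoldScripts/multimer_reduced_db.slurm my_dimer.fasta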
Please review the script, and if you would like to modify the search, make a copy specific to your research needs. Refer to the AlphaFold documentation to determine which runtime flags suit your own data and objectives.
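For example, to predict a single chain against the full databases, you might copy the script and adjust two flags (the destination path is just an example; monomer and full_dbs are standard run_alphafold.py presets):
cp /mnt/pan/AlphaFoldScripts/multimer_reduced_db.slurm ~/my_alphafold.slurm
# in the copy, change:
#   --model_preset=monomer
#   --db_preset=full_dbs
# and remove --num_multimer_predictions_per_model, which only applies to the multimer preset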
The database files are stored at /mnt/pan/AlphaFold031023.
Pass your FASTA file name to the job script; inside the script it is accessible as the shell positional parameter $1.
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:2
#SBATCH --mem=100g
#SBATCH -c 10
#SBATCH -o alphafold_%j.log
# Copy the sequence file to scratch; use its base name so a path argument also works
SEQ=$(basename "$1")
cp "$1" "$PFSDIR"
mkdir "$PFSDIR/output"
# Set up the environment
module purge
module load AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0
# Run AlphaFold
export ALPHAFOLD_DATA_DIR="/mnt/pan/AlphaFold031023"
/usr/local/easybuild_allnodes/software/AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0/bin/run_alphafold.py \
    --fasta_paths=$PFSDIR/$SEQ \
    --data_dir=$ALPHAFOLD_DATA_DIR \
    --output_dir=$PFSDIR/output \
    --max_template_date=2022-11-01 \
    --model_preset=multimer \
    --db_preset=reduced_dbs \
    --use_gpu_relax=True \
    --num_multimer_predictions_per_model=2
# Copy results back to the submit directory
cp -r "$PFSDIR/output" "$SLURM_SUBMIT_DIR/alphafold_result_$SLURM_JOBID"
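After submitting, you can watch the queue and follow the log; the job ID below is a placeholder:
squeue -u $USER                # PD = pending, R = running
tail -f alphafold_1234567.log  # log file named by the #SBATCH -o directive
When the job completes, the results are copied back to alphafold_result_<jobid> in the directory you submitted from, as the last line of the script shows.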
The output reports the numerous stages of the AlphaFold workflow. A simple example is shown here to illustrate the different sections; the T1050 single-sequence data was obtained as a FASTA file and passed to AlphaFold. The sections typically include:
Jackhmmer (2 stages)
Finished Jackhmmer (uniref90.fasta) query in 455.763 seconds
Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 472.964 seconds
HHsearch
Finished HHsearch query in 146.808 seconds
HHblits
Finished HHblits query in 7292.667 seconds
Running Model, Invoking TensorFlow
run_alphafold.py:225] Running model model_1_ptm on T1050
model.py:165] Running predict with shape(feat) = {'aatype': (4, 779), 'residue_index': (4, 779), ...}
run_alphafold.py:237] Total JAX model model_1_ptm on T1050 predict time (includes compilation time, see --benchmark): 574.1s
run_alphafold.py:225] Running model model_2_ptm on T1050
model.py:165] Running predict with shape(feat) = {'aatype': (4, 779), 'residue_index': (4, 779), ...}
run_alphafold.py:237] Total JAX model model_2_ptm on T1050 predict time (includes compilation time, see --benchmark): 489.1s
run_alphafold.py:225] Running model model_3_ptm on T1050
run_alphafold.py:225] Running model model_4_ptm on T1050
run_alphafold.py:225] Running model model_5_ptm on T1050
run_alphafold.py:306] Final timings for T1050: {'features': 8557.561393976212, 'process_features_model_1_ptm': 38.50683522224426, 'predict_and_compile_model_1_ptm': 574.1196734905243, 'process_features_model_2_ptm': 8.790717363357544, 'predict_and_compile_model_2_ptm': 489.09383630752563, 'process_features_model_3_ptm': 8.145345687866211, 'predict_and_compile_model_3_ptm': 465.3740735054016, 'process_features_model_4_ptm': 7.742178916931152, 'predict_and_compile_model_4_ptm': 461.7440278530121, 'process_features_model_5_ptm': 7.70080304145813, 'predict_and_compile_model_5_ptm': 424.1679847240448}
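These timings are also written to a timings.json file in each target's output subdirectory (assuming AlphaFold's default output layout), alongside the ranked_*.pdb predictions, where ranked_0.pdb is the highest-confidence model. To pretty-print them:
python -m json.tool alphafold_result_<jobid>/T1050/timings.json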
Full sample output for T1050: alphafold_T1050.out
[1] Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583-589 (2021). https://www.nature.com/articles/s41586-021-03819-2