AlphaFold provides accurate protein structure predictions[1]. It is installed locally on the cluster, so containers are not required to run it.
Important Notes:
Check the bug reports on the AlphaFold GitHub: https://github.com/deepmind/alphafold
If you encounter memory issues, try the reduced database search or use a GPU with more memory (check HPC Resources, High Memory GPU job). For example, on the Pioneer cluster, the GPUs in the aisc partition have up to 80 GB of GPU memory.
Access the Pioneer cluster (pioneer.case.edu); see the Quickstart Guide.
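Log in over SSH; a minimal example, where <CaseID> is a placeholder for your own Case ID:
ssh <CaseID>@pioneer.case.edu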
Submit the job (uses 2 GPUs and the reduced databases for the search):
sbatch /mnt/pan/AlphaFoldScripts/multimer_reduced_db.slurm <your_seq_file_name>
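The multimer preset expects a FASTA file with one record per chain. A hypothetical two-chain input (the file name and sequences below are placeholders, not a tested example):
>chain_A
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
>chain_B
MADEEKLPPGWEKRMSRSSGRVYYFNHITNASQ
It would be submitted the same way:
sbatch /mnt/pan/AlphaFoldScripts/multimer_reduced_db.slurm my_dimer.fasta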
Please review the script, and if you would like to modify the search, make a copy specific to your research needs. Refer to the AlphaFold documentation to determine which runtime flags suit your own data and objectives.
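For example, to predict a single chain against the full databases, you might copy the script and adjust two flags (the destination path is just an example; monomer and full_dbs are standard run_alphafold.py presets):
cp /mnt/pan/AlphaFoldScripts/multimer_reduced_db.slurm ~/my_alphafold.slurm
# in the copy, change:
#   --model_preset=monomer
#   --db_preset=full_dbs
# and remove --num_multimer_predictions_per_model, which only applies to the multimer preset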
The database files are stored at /mnt/pan/AlphaFold031023.
Pass your FASTA file name to the job script; inside the script it is accessible as the shell positional parameter $1.
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:2
#SBATCH --mem=100g
#SBATCH -c 10
#SBATCH -o alphafold_%j.log
# Copy the sequence file to scratch; use its base name so a path argument also works
SEQ=$(basename "$1")
cp "$1" "$PFSDIR"
mkdir "$PFSDIR/output"
# Set up the environment
module purge
module load AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0
# Run AlphaFold
export ALPHAFOLD_DATA_DIR="/mnt/pan/AlphaFold031023"
/usr/local/easybuild_allnodes/software/AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0/bin/run_alphafold.py \
    --fasta_paths=$PFSDIR/$SEQ \
    --data_dir=$ALPHAFOLD_DATA_DIR \
    --output_dir=$PFSDIR/output \
    --max_template_date=2022-11-01 \
    --model_preset=multimer \
    --db_preset=reduced_dbs \
    --use_gpu_relax=True \
    --num_multimer_predictions_per_model=2
# Copy results back to the submit directory
cp -r "$PFSDIR/output" "$SLURM_SUBMIT_DIR/alphafold_result_$SLURM_JOBID"
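After submitting, you can watch the queue and follow the log; the job ID below is a placeholder:
squeue -u $USER                # PD = pending, R = running
tail -f alphafold_1234567.log  # log file named by the #SBATCH -o directive
When the job completes, the results are copied back to alphafold_result_<jobid> in the directory you submitted from, as the last line of the script shows.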
The output reports the numerous stages of the AlphaFold workflow. A simple example is shown here to illustrate the different sections; the T1050 single-sequence data was obtained as a FASTA file and passed to AlphaFold. The sections typically include:
Jackhmmer (2 stages)
Finished Jackhmmer (uniref90.fasta) query in 455.763 seconds
Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 472.964 seconds
HHsearch
Finished HHsearch query in 146.808 seconds
HHblits
Finished HHblits query in 7292.667 seconds
Running Model, Invoking TensorFlow
run_alphafold.py:225] Running model model_1_ptm on T1050
model.py:165] Running predict with shape(feat) = {'aatype': (4, 779), 'residue_index': (4, 779), ...}
run_alphafold.py:237] Total JAX model model_1_ptm on T1050 predict time (includes compilation time, see --benchmark): 574.1s
run_alphafold.py:225] Running model model_2_ptm on T1050
model.py:165] Running predict with shape(feat) = {'aatype': (4, 779), 'residue_index': (4, 779), ...}
run_alphafold.py:237] Total JAX model model_2_ptm on T1050 predict time (includes compilation time, see --benchmark): 489.1s
run_alphafold.py:225] Running model model_3_ptm on T1050
run_alphafold.py:225] Running model model_4_ptm on T1050
run_alphafold.py:225] Running model model_5_ptm on T1050
run_alphafold.py:306] Final timings for T1050: {'features': 8557.561393976212, 'process_features_model_1_ptm': 38.50683522224426, 'predict_and_compile_model_1_ptm': 574.1196734905243, 'process_features_model_2_ptm': 8.790717363357544, 'predict_and_compile_model_2_ptm': 489.09383630752563, 'process_features_model_3_ptm': 8.145345687866211, 'predict_and_compile_model_3_ptm': 465.3740735054016, 'process_features_model_4_ptm': 7.742178916931152, 'predict_and_compile_model_4_ptm': 461.7440278530121, 'process_features_model_5_ptm': 7.70080304145813, 'predict_and_compile_model_5_ptm': 424.1679847240448}
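These timings are also written to a timings.json file in each target's output subdirectory (assuming AlphaFold's default output layout), alongside the ranked_*.pdb predictions, where ranked_0.pdb is the highest-confidence model. To pretty-print them:
python -m json.tool alphafold_result_<jobid>/T1050/timings.json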
Full sample output for T1050: alphafold_T1050.out
[1] Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583-589 (2021). https://www.nature.com/articles/s41586-021-03819-2