RELION [1] stands for REgularised LIkelihood OptimisatioN and is a program for cryo-electron microscopy (cryo-EM) image processing. The software was developed in Sjors H.W. Scheres' group at the MRC Laboratory of Molecular Biology in Cambridge. The program has been used to resolve large macromolecular cryo-EM structures such as ribosomes.
Please do not press the "Run" button in the RELION GUI; the job would then run on the head node instead of on a proper compute node.
Instead, use the GUI to construct the command, click "Print command", and put that command in a Slurm submit script to run the job (an example script is sketched after these notes).
If you are using the GUI, make sure that the "Number of MPI procs" field on the "Running" tab matches the total number of requested processors (# of nodes * ppn) in the standard submission script.
Request enough memory for your job with #SBATCH -n <x> -c <y> --mem-per-cpu=<m>gb.
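For illustration, a minimal submission script might look like the sketch below. The resource values, output name (Refine3D/job001/run), and input file (particles.star) are placeholders; replace the relion_refine_mpi line with the exact command printed by the GUI via "Print command".
#!/bin/bash
#SBATCH -n 3                      # total number of MPI tasks
#SBATCH -c 6                      # processors (threads) per task
#SBATCH --mem-per-cpu=4gb         # memory per processor
#SBATCH --time=24:00:00           # walltime

module load relion

# Paste the command printed by the RELION GUI here, for example:
mpirun -n 3 relion_refine_mpi --o Refine3D/job001/run --i particles.star --j 6 <remaining options printed by the GUI>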
All available versions of RELION can be viewed by issuing the following command (this applies to other applications as well):
module spider relion
Output:
Versions:
relion/3.1.3
Loading the module:
module load relion
Request a compute node with X-forwarding enabled. The node request options can be, for example, -n 1 -c 4 --mem=8gb.
srun --x11 -n 1 -c 4 --mem=8gb --pty bash
Load the Relion module
module load relion
Execute Relion:
relion
You will see the RELION GUI.
As noted above, do not press the "Run" button in the GUI; use "Print command" and submit the resulting command through a Slurm script instead.
Download the RELION benchmark file (tar.gz) from the RELION website, extract it, and cd to the relion_benchmark directory. The archive is large, so you may want to download it to /scratch space.
wget ftp://ftp.mrc-lmb.cam.ac.uk/pub/scheres/relion_benchmark.tar.gz
tar xzvf relion_benchmark.tar.gz
cd relion_benchmark
Copy the job file from /usr/local/doc/RELION/relion-batch to the relion_benchmark directory:
cp /usr/local/doc/RELION/relion-batch/2gpu-j6-p100.sh .
Submit the job. Please check the job file if you want to adjust the RELION flags; note that there are options for using the local SSD or $PFSDIR as scratch space. An illustrative sketch of such a script follows the sbatch command below.
sbatch 2gpu-j6-p100.sh
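For reference, the core of the GPU job script is a relion_refine_mpi call on the benchmark data. The sketch below is only illustrative; the actual Slurm directives (e.g. how GPUs are requested) and RELION flags in 2gpu-j6-p100.sh may differ, so always consult the copied file.
#!/bin/bash
#SBATCH -n 3                      # MPI tasks: 2 GPUs + 1 master process
#SBATCH -c 6                      # threads per MPI task
#SBATCH --gres=gpu:2              # request 2 GPUs (site-specific syntax may vary)
#SBATCH --time=24:00:00

module load relion

mpirun -n 3 relion_refine_mpi \
  --i Particles/shiny_2sets.star --ref emd_2660.map:mrc --ini_high 60 \
  --ctf --iter 25 --tau2_fudge 4 --K 6 --particle_diameter 360 \
  --zero_mask --oversampling 1 --healpix_order 2 --offset_range 5 \
  --offset_step 2 --sym C1 --norm --scale --o class3d \
  --pool 100 --gpu --j 6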
Monitor your job:
We are using 3 MPI tasks (-n 3, i.e. the number of GPUs (2) + 1 for the master process) and 6 processors per task (-c 6 in Slurm, --j 6 in RELION), i.e. 3 * 6 = 18 processors.
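For example, you can first check the job's state and the node it was assigned with standard Slurm commands (<caseID> and <jobid> are placeholders):
squeue -u <caseID>            # job state, elapsed time, and assigned node
scontrol show job <jobid>     # detailed view of the requested resources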
Check the CPU utilization:
ssh -t <gpu-node> top
Output:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1672 <caseID>  20   0   23.7g   4.7g 108728 S 127.6  3.7 352:32.10 relion_refine_m
 1671 <caseID>  20   0   23.7g   4.7g 108948 S 122.6  3.7 340:40.66 relion_refine_m
 1670 <caseID>  20   0 2830744   2.2g   9052 R  66.8  1.8 151:46.84 relion_refine_m
Check GPU utilization. (Note: for the Pascal architecture, only one GPU is used at a time and the job takes much longer; see the benchmarks at https://www3.mrc-lmb.cam.ac.uk/relion/index.php?title=Benchmarks_%26_computer_hardware):
ssh <gpu-node> nvidia-smi -l 5
Output:
Fri May 17 14:26:48 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:02:00.0 Off |                  N/A |
| 33%   55C    P2   180W / 250W |   9885MiB / 10989MiB |     52%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:03:00.0 Off |                  N/A |
| 33%   56C    P2   180W / 250W |   9885MiB / 10989MiB |     62%      Default |
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1671      C   ...relion/3.0.5-c9.2/bin/relion_refine_mpi  9875MiB |
|    1      1672      C   ...relion/3.0.5-c9.2/bin/relion_refine_mpi  9875MiB |
+-----------------------------------------------------------------------------+
Check the output log file:
cat slurm-<jobid>.out
Output:
Fri May 17 11:46:57 EDT 2019
RELION version: 3.0.5
Precision: BASE=double, CUDA-ACC=single
=== RELION MPI setup ===
+ Number of MPI processes = 3
+ Number of threads per MPI process = 6
+ Total number of threads therefore = 18
...
Expectation iteration 25 of 25
3.47/3.47 min ............................................................~~(,_,">
Maximization ...
1.65/1.65 min ............................................................~~(,_,">
real 170m37.856s
user 854m2.372s
sys 103m30.085s
Fri May 17 14:37:35 EDT 2019
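While the job is still running, you can also follow the log as it is written (assuming the default Slurm output file name):
tail -f slurm-<jobid>.out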