Relion
RELION [1] stands for REgularised LIkelihood OptimisatioN and is a program for cryo-electron microscopy (cryo-EM) image processing. The software was developed in Sjors H.W. Scheres' lab at the MRC Laboratory of Molecular Biology in Cambridge. The program has been used to resolve large macromolecular cryo-EM structures such as ribosomes.
Important Notes
Please do not press the "Run!" button in the GUI, as the job would then run on the head nodes instead of on the proper compute nodes.
Instead, use the Relion GUI to construct the command, use "Print command", and put the printed command into a Slurm submit script to run the job.
If you are using the GUI, make sure that the "Number of MPI procs" field in the "Running" tab matches the total number of requested processors (number of nodes * ppn) in the "Standard submission script".
Request enough memory for your job with #SBATCH -n <x> -c <y> --mem-per-cpu=<m>gb.
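As an illustration, the directives below request 3 tasks with 4 CPUs each at 3 GB per CPU, for 3 * 4 * 3 = 36 GB in total (the counts are placeholder values, not site recommendations):

```shell
#!/bin/bash
#SBATCH -n 3              # MPI tasks
#SBATCH -c 4              # CPUs per task
#SBATCH --mem-per-cpu=3gb # memory per CPU

# Sanity-check the total: tasks * cpus-per-task * mem-per-cpu
tasks=3
cpus_per_task=4
mem_per_cpu_gb=3
echo "Total memory: $((tasks * cpus_per_task * mem_per_cpu_gb)) GB"
```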
Installed Versions
All the available versions of RELION can be listed with the following command. This applies to other applications as well.
module spider relion
Output:
Versions:
relion/2.1.b1
relion/3.0-beta_cpu
relion/3.0-beta
relion/3.0.5-c9.2
relion/3.0.5
Some of the versions are compiled with GCC, which requires switching the compiler module:
module switch intel gcc
Now check which module is the default (marked with "(D)"):
module avail relion
-------- /usr/local/share/modulefiles/MPI/gcc/6.3.0/openmpi/2.0.1 ---------
relion/2.1.b1 relion/3.0-beta_cpu relion/3.0-beta relion/3.0.5-c9.2 (D) relion/3.0.5
The default version is identified by "(D)" after the module name and can be loaded with:
module load relion
The other versions of Relion can be loaded as:
module load relion/<version>
Running Relion on HPC
Interactive Job Submission
Request a compute node with X-forwarding on. The node-request options can be, for example, -n 1 -c 4 --mem=8gb.
srun --x11 -n 1 -c 4 --mem=8gb --pty bash
Load the Relion module
module swap intel gcc
module load relion
Execute Relion:
relion
You will see the Relion GUI.
As noted above, do not press the "Run!" button in the GUI; use "Print command" and run the printed command through a Slurm submit script.
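For example, the printed command can be dropped into a minimal Slurm script like the sketch below (the resource values and the relion_refine_mpi options are illustrative placeholders, not a recommended configuration):

```shell
#!/bin/bash
#SBATCH -n 3
#SBATCH -c 6
#SBATCH --mem-per-cpu=4gb
#SBATCH --time=24:00:00

module swap intel gcc
module load relion

# Paste the command from the GUI's "Print command" here, for example:
mpirun -n 3 relion_refine_mpi --o Refine3D/run1 --i particles.star --j 6
```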
Batch Job Submission
Download the Relion benchmark file (tar.gz) from the Relion website, extract it, and cd into the relion_benchmark directory. The directory is large, so you may want to download it to /scratch space.
wget ftp://ftp.mrc-lmb.cam.ac.uk/pub/scheres/relion_benchmark.tar.gz
tar xzvf relion_benchmark.tar.gz
cd relion_benchmark
Copy the job file from /usr/local/doc/RELION/relion-batch to relion_benchmark
cp /usr/local/doc/RELION/relion-batch/2gpu-j6-p100.sh .
Submit the job. Check the job file to see the different flags associated with relion. Note that there are options for using SSD or $PFSDIR scratch space.
sbatch 2gpu-j6-p100.sh
Monitor your job:
We are using 3 tasks (-n 3, i.e. the number of GPUs (2) + 1) and 6 processors per task (-c 6, matching relion's --j 6), i.e. 3 * 6 = 18 processors.
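The corresponding Slurm directives might look like the sketch below, assuming the cluster requests GPUs via --gres (the gres specification is a placeholder for your site's actual value):

```shell
#SBATCH -n 3         # 2 GPU worker tasks + 1 master task
#SBATCH -c 6         # must match relion's --j 6
#SBATCH --gres=gpu:2 # two GPUs, one per worker task
```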
Check the CPU utilization:
ssh -t <gpu-node> top
Output:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1672 <caseID> 20 0 23.7g 4.7g 108728 S 127.6 3.7 352:32.10 relion_refine_m
1671 <caseID> 20 0 23.7g 4.7g 108948 S 122.6 3.7 340:40.66 relion_refine_m
1670 <caseID> 20 0 2830744 2.2g 9052 R 66.8 1.8 151:46.84 relion_refine_m
Check GPU utilization (note: on the Pascal architecture, only one GPU is used at a time and the job takes much longer - see the benchmarks: https://www3.mrc-lmb.cam.ac.uk/relion/index.php?title=Benchmarks_%26_computer_hardware):
ssh <gpu-node> nvidia-smi -l 5
Output:
Fri May 17 14:26:48 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:02:00.0 Off | N/A |
| 33% 55C P2 180W / 250W | 9885MiB / 10989MiB | 52% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:03:00.0 Off | N/A |
| 33% 56C P2 180W / 250W | 9885MiB / 10989MiB | 62% Default |
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1671 C ...relion/3.0.5-c9.2/bin/relion_refine_mpi 9875MiB |
| 1 1672 C ...relion/3.0.5-c9.2/bin/relion_refine_mpi 9875MiB |
+-----------------------------------------------------------------------------+
Check the output log file
cat slurm-<jobid>.out
Output:
Fri May 17 11:46:57 EDT 2019
RELION version: 3.0.5
Precision: BASE=double, CUDA-ACC=single
=== RELION MPI setup ===
+ Number of MPI processes = 3
+ Number of threads per MPI process = 6
+ Total number of threads therefore = 18
...
Expectation iteration 25 of 25
3.47/3.47 min ............................................................~~(,_,">
Maximization ...
1.65/1.65 min ............................................................~~(,_,">
real 170m37.856s
user 854m2.372s
sys 103m30.085s
Fri May 17 14:37:35 EDT 2019
Older Stuff
Request a compute node with X-forwarding on. The node-request options can be, for example, -n 2 -c 4 --mem=5gb.
srun <node request options> --x11 --pty bash
Copy the relion-interactive directory from /usr/local/doc/RELION directory to your home directory. You will find the submission script (qsub.csh), executable (ctffind3.exe) and Micrographs directory.
cp -r /usr/local/doc/RELION/relion-interactive .
Load the relion module
module load relion
Change directory to relion-interactive
cd <path-to-relion-interactive>
Run Relion (important note: the GUI may not be available in later versions of RELION, so use the batch script instead):
relion &
Relion on hpc1/hpc2 is installed and ready to use, including the relion executable with GUI access.
However, as noted above, do not press the "Run!" button in the GUI; construct the command with the GUI, use "Print command", and run the command through a Slurm submit script.
In the GUI, keep everything as it is except for the following changes:
In "CTF estimation" option, in the "CTFFIND" tab, at "CTFFIND Executable Field", browse to "ctffind3.exe" in your working directory <path-to-relion-interactive>. Make sure that "Run CTFFIND3" option is Yes.
In "Extract" tab, choose Yes option in "Generate particle STAR file" field.
In the "Running" tab, you need to make the following changes:
Here, the "Number of MPI procs" is the same as the total number of processors in the job file (qsub.csh). Browse to the qsub.csh file in the "Standard submission script" field. You can customize the script to your needs.
qsub.csh:
#!/bin/tcsh
#SBATCH -n 2
#SBATCH -c 4
#SBATCH --mem=5gb
#SBATCH --time=10:00:00
module load relion
# Environment
source ~/.cshrc
mpiexec --bynode -n XXXmpinodesXXX XXXcommandXXX
(Important) Do not click the "Run!" button. Instead, submit the job as a Slurm script.
In the session where you typed "relion &", you will see the batch job assigned a JOBID. Once the job completes, you should see the particles.star file and a new Particles directory with a Micrographs sub-directory containing .star and .mrcs files.
See the content of the particles.star file
less particles.star
output:
data_
loop_
_rlnMicrographName #1
_rlnCoordinateX #2
_rlnCoordinateY #3
_rlnImageName #4
_rlnDefocusU #5
_rlnDefocusV #6
_rlnDefocusAngle #7
_rlnVoltage #8
_rlnSphericalAberration #9
_rlnAmplitudeContrast #10
_rlnMagnification #11
_rlnDetectorPixelSize #12
_rlnCtfFigureOfMerit #13
Micrographs/006.mrc 453.000000 604.000000 000001@Particles/Micrographs/006_particles.mrcs 5707.600098 5798.459961 48.970001 300.000000 2.000000 0.100000 60000.000000 14.000000 0.113710
...
Following the tutorial [2] (section 4.2), you can plot the .star file using GNUplot as shown below. You can also copy the PrecalculatedResults directory from /usr/local/doc/RELION to plot the graph.
module load relion
relion_star_plottable Class3D/run1_it025_model.star data_model_class_1 rlnResolution rlnSsnrMap
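A minimal sketch of the plotting step, assuming relion_star_plottable writes the selected columns to standard output (the intermediate file name is hypothetical):

```shell
# Redirect the plottable columns to a file, then plot them with gnuplot
relion_star_plottable Class3D/run1_it025_model.star data_model_class_1 rlnResolution rlnSsnrMap > class1_ssnr.dat
gnuplot -e "set xlabel 'Resolution'; set ylabel 'SSNR'; plot 'class1_ssnr.dat' with lines"
```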
Relion 2 (under construction)
This is an example of a Slurm submit script that includes the Relion 2 command properly; the task count (-n 8 and mpirun -n 8), the thread count (-c 4 and --j 4), and the memory (--mem-per-cpu=2g and --memory_per_thread 2) must match between the Slurm directives and the relion command:
#!/bin/bash
#SBATCH -n 8
#SBATCH -c 4
#SBATCH --mem-per-cpu=2g
#SBATCH -o refine-%j.out
module load relion/2.0
mpirun -n 8 relion_refine_mpi --o Down1Class3D/run1 --i particles_grouped.star --particle_diameter 360 --angpix 1.12 --ref Box488_3D.mrc --firstiter_cc --ini_high 50 --no_parallel_disc_io --ctf --iter 25 --tau2_fudge 2 --K 3 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --offset_range 1 --offset_step 2 --sym C1 --norm --scale --j 4 --memory_per_thread 2
Troubleshooting
Memory errors
If you see the following error:
File: ml_model.cpp line: 1328
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
This error occurs when the MPI tasks need more memory. To solve the problem, do not just increase the memory allocation in your Slurm script. Instead of (or in addition to) that, increase the number of CPUs per task (-c); with --mem-per-cpu, this gives each MPI task more memory.
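For example, with --mem-per-cpu=2gb, raising -c from 4 to 8 doubles each task's memory without changing the per-CPU request (the values are illustrative):

```shell
# Before: each MPI task gets 4 * 2 GB = 8 GB
#SBATCH -c 4
#SBATCH --mem-per-cpu=2gb

# After: each MPI task gets 8 * 2 GB = 16 GB
#SBATCH -c 8
#SBATCH --mem-per-cpu=2gb
# Remember to raise relion's --j flag to match the new -c value
```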
Segmentation Error:
For datasets of smaller particles, a higher value of --j (with a matching -c) may not be a problem. However, for datasets of larger particles, the value of --j must be reduced to avoid segmentation faults.
References:
[1] HOME: http://www2.mrc-lmb.cam.ac.uk/relion/index.php/Main_Page
[2] Tutorial: http://www2.mrc-lmb.cam.ac.uk/groups/scheres/relion_tutorial.pdf