RELION

RELION [1] stands for REgularised LIkelihood OptimisatioN and is a program for cryo-electron microscopy (cryo-EM) image processing. The software was developed in Sjors H.W. Scheres' lab at the MRC Laboratory of Molecular Biology in Cambridge. The program has been used to resolve large macromolecular cryo-EM structures such as ribosomes.

Important Notes

Installed Versions

All the available versions of RELION can be viewed by issuing the following command (this works for other applications as well):

module spider relion

Output:

Versions:

        relion/2.1.b1

        relion/3.0-beta_cpu

        relion/3.0-beta

        relion/3.0.5-c9.2

        relion/3.0.5

Some of the versions are compiled with GCC, for which you need to switch from the Intel compiler to GCC:

module switch intel gcc

Now, check which module is the default, marked with (D):

 module avail relion

-------- /usr/local/share/modulefiles/MPI/gcc/6.3.0/openmpi/2.0.1 ---------

   relion/2.1.b1    relion/3.0-beta_cpu    relion/3.0-beta    relion/3.0.5-c9.2 (D)    relion/3.0.5

The default version is identified by "(D)" after the module name and can be loaded with:

module load relion

The other versions of Relion can be loaded as:

module load relion/<version>
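For example, to load the CPU-only build listed above:

module load relion/3.0-beta_cpu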

Running Relion on HPC

Interactive Job Submission

Request a compute node with X-forwarding enabled. The node request options can be, e.g., -n 1 -c 4 --mem=8gb.

srun --x11 -n 1 -c 4 --mem=8gb --pty bash

Switch to GCC and load the RELION module:

module swap intel gcc

module load relion

Execute Relion:

relion

You will see the RELION GUI.

We sincerely ask users not to press the "Run" button in the GUI, because the job would then run on the head nodes instead of the proper compute nodes.

Instead, use the RELION GUI to construct the command ("print command"), and then use that command in a Slurm submit script to run the job.
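For reference, here is a minimal sketch of such a submit script. The resource numbers are placeholders and the commented line stands in for whatever command the GUI prints; adjust both to your job.

#!/bin/bash
#SBATCH -n 3                    # MPI tasks
#SBATCH -c 6                    # threads per task
#SBATCH --mem=24gb
#SBATCH --time=24:00:00
#SBATCH -o relion-%j.out

module swap intel gcc
module load relion

# Paste the command printed by the RELION GUI here, for example:
# mpirun -n 3 relion_refine_mpi <options printed by the GUI> --j 6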

Batch Job Submission

Download the RELION benchmark file (tar.gz) from the RELION website, extract it, and cd into the relion_benchmark directory. The directory is huge, so you may want to download it to the /scratch space.

wget ftp://ftp.mrc-lmb.cam.ac.uk/pub/scheres/relion_benchmark.tar.gz

tar xzvf relion_benchmark.tar.gz

cd relion_benchmark

Copy the job file from /usr/local/doc/RELION/relion-batch to relion_benchmark

cp  /usr/local/doc/RELION/relion-batch/2gpu-j6-p100.sh .

Submit the job. Please check the job file for the different flags associated with RELION; note that there are options for using SSD or $PFSDIR as scratch space.
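For orientation, the job file has roughly the following shape. This is only a sketch: the actual 2gpu-j6-p100.sh under /usr/local/doc/RELION/relion-batch is the authoritative version, and the GPU request line, scratch option, and refine options below are assumptions.

#!/bin/bash
#SBATCH -n 3                    # NGpus (2) + 1
#SBATCH -c 6                    # threads per MPI task, matches --j 6
#SBATCH --gres=gpu:2            # assumption: the exact GPU request syntax may differ on this cluster
#SBATCH --time=24:00:00
#SBATCH -o relion-%j.out

module swap intel gcc
module load relion

# Benchmark refinement; take the full option list from the RELION benchmark page.
# --scratch_dir can point to the SSD or $PFSDIR scratch space mentioned above.
time mpirun -n 3 relion_refine_mpi <benchmark options> --gpu --j 6 --scratch_dir $PFSDIR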

sbatch 2gpu-j6-p100.sh

Monitor your job:

We are using 3 tasks (-n 3, i.e. NGpus (2) + 1) and 6 processors per task (-c 6 in Slurm, --j 6 in RELION), i.e. 3*6 = 18 processors.
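To find out which GPU node the job is running on (needed for the monitoring commands below), query Slurm:

squeue -u <caseID>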

Check the CPU utilization:

 ssh -t <gpu-node> top

Output:

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND

 1672 <caseID>    20   0   23.7g   4.7g 108728 S 127.6  3.7 352:32.10 relion_refine_m

 1671 <caseID>    20   0   23.7g   4.7g 108948 S 122.6  3.7 340:40.66 relion_refine_m

 1670 <caseID>    20   0 2830744   2.2g   9052 R  66.8  1.8 151:46.84 relion_refine_m

Check GPU utilization (Note: for the Pascal architecture, only one GPU is used at a time and the run takes much longer; see the benchmarks at https://www3.mrc-lmb.cam.ac.uk/relion/index.php?title=Benchmarks_%26_computer_hardware):

ssh <gpu-node> nvidia-smi -l 5

Output:

Fri May 17 14:26:48 2019

+-----------------------------------------------------------------------------+

| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |

|-------------------------------+----------------------+----------------------+

| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |

| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |

|===============================+======================+======================|

|   0  GeForce RTX 208...  Off  | 00000000:02:00.0 Off |                  N/A |

| 33%   55C    P2   180W / 250W |   9885MiB / 10989MiB |     52%      Default |

+-------------------------------+----------------------+----------------------+

|   1  GeForce RTX 208...  Off  | 00000000:03:00.0 Off |                  N/A |

| 33%   56C    P2   180W / 250W |   9885MiB / 10989MiB |     62%      Default |

+-----------------------------------------------------------------------------+

| Processes:                                                       GPU Memory |

|  GPU       PID   Type   Process name                             Usage      |

|=============================================================================|

|    0      1671      C   ...relion/3.0.5-c9.2/bin/relion_refine_mpi  9875MiB |

|    1      1672      C   ...relion/3.0.5-c9.2/bin/relion_refine_mpi  9875MiB |

+-----------------------------------------------------------------------------+

Check the output log file

cat slurm-<jobid>.out

Output:

Fri May 17 11:46:57 EDT 2019

RELION version: 3.0.5

Precision: BASE=double, CUDA-ACC=single

 === RELION MPI setup ===

 + Number of MPI processes             = 3

 + Number of threads per MPI process  = 6

 + Total number of threads therefore  = 18

...

 Expectation iteration 25 of 25

3.47/3.47 min ............................................................~~(,_,">

 Maximization ...

1.65/1.65 min ............................................................~~(,_,">

real    170m37.856s

user    854m2.372s

sys     103m30.085s

Fri May 17 14:37:35 EDT 2019

Older Stuff

Request a compute node with X-forwarding enabled. The node request options can be, e.g., -n 2 -c 4 --mem=5gb.

srun <node request options> --x11 --pty bash

Copy the relion-interactive directory from the /usr/local/doc/RELION directory to your home directory. It contains the submission script (qsub.csh), the executable (ctffind3.exe), and the Micrographs directory.

cp -r /usr/local/doc/RELION/relion-interactive .

Load the relion module

module load relion

Change directory to relion-interactive

cd <path-to-relion-interactive>

Run RELION (important note: the GUI may not be available for later versions of RELION, so use the batch script instead):

relion &

RELION on hpc1/hpc2 is installed properly and ready to be used (this now includes the relion executable with GUI access).

However, we sincerely ask users not to press the "Run" button in the GUI, because the job would then run on the head nodes instead of the proper compute nodes.

Instead, use the RELION GUI to construct the command ("print command") and use that command in the Slurm submit script to run the job.

In the GUI, keep everything as it is except for the following changes:

Here, the Number of MPI procs should equal the total number of tasks in the job file (qsub.csh). Browse to the qsub.csh file in the "Standard submission script" field. You can customize the script to your needs.

qsub.csh:

#!/bin/tcsh

#SBATCH -n 2

#SBATCH -c 4

#SBATCH --mem=5gb

#SBATCH --time=10:00:00

module load relion

# Environment

source ~/.cshrc

mpiexec --bynode -n XXXmpinodesXXX  XXXcommandXXX

When the job is submitted through the GUI, RELION replaces XXXmpinodesXXX with the Number of MPI procs and XXXcommandXXX with the constructed command. In the session where you typed "relion &", you will see the batch job being assigned a JOBID. Once the job completes, you should see the particles.star file and a new directory Particles with a sub-directory Micrographs containing .star and .mrcs files.

See the content of the particles.star file

less particles.star

output:

data_

loop_ 

_rlnMicrographName #1 

_rlnCoordinateX #2 

_rlnCoordinateY #3 

_rlnImageName #4 

_rlnDefocusU #5 

_rlnDefocusV #6 

_rlnDefocusAngle #7 

_rlnVoltage #8 

_rlnSphericalAberration #9 

_rlnAmplitudeContrast #10 

_rlnMagnification #11 

_rlnDetectorPixelSize #12 

_rlnCtfFigureOfMerit #13 

Micrographs/006.mrc   453.000000   604.000000 000001@Particles/Micrographs/006_particles.mrcs  5707.600098  5798.459961    48.970001   300.000000     2.000000     0.100000 60000.000000    14.000000     0.113710

...

Following the tutorial [2] (section 4.2), you can plot data from the .star file using gnuplot as shown below. You can also copy the PrecalculatedResults directory from /usr/local/doc/RELION to plot the graph.

module load relion

relion_star_plottable Class3D/run1_it025_model.star data_model_class_1 rlnResolution rlnSsnrMap
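If relion_star_plottable is not available in your RELION version, a rough alternative (a sketch, assuming the same data block and metadata labels as in the command above) is to dump the two columns with relion_star_printtable and plot them with gnuplot:

relion_star_printtable Class3D/run1_it025_model.star data_model_class_1 rlnResolution rlnSsnrMap > ssnr.dat

gnuplot -persist -e "set xlabel 'rlnResolution'; set ylabel 'rlnSsnrMap'; plot 'ssnr.dat' using 1:2 with lines"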


Relion 2 (under construction)

This is an example of a Slurm submit script that includes the RELION 2 command properly; the number of MPI processes (-n 8 in the #SBATCH directives and in mpirun -n 8) and the number of threads per process (-c 4 and --j 4) need to match between the Slurm part and the RELION command:

#!/bin/bash

#SBATCH -n 8

#SBATCH -c 4

#SBATCH --mem-per-cpu=2g

#SBATCH -o refine-%j.out

 

module load relion/2.0

 

mpirun -n 8 relion_refine_mpi --o Down1Class3D/run1 --i particles_grouped.star --particle_diameter 360 --angpix 1.12 --ref Box488_3D.mrc --firstiter_cc --ini_high 50 --no_parallel_disc_io --ctf --iter 25 --tau2_fudge 2 --K 3 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --offset_range 1 --offset_step 2 --sym C1 --norm --scale  --j 4 --memory_per_thread 2

Troubleshooting

Memory errors

If you see the following error:

File: ml_model.cpp line: 1328

--------------------------------------------------------------------------

MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD

with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.

You may or may not see output from other processes, depending on

exactly when Open MPI kills them.

--------------------------------------------------------------------------

This error occurs when the MPI tasks need more memory. To solve this problem, do not just increase the memory allocation in the Slurm script. Instead of (or in addition to) that, increase the number of CPUs per task; this ensures that your MPI tasks have enough memory.
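For example, with the RELION 2 script above, one way to give each task more memory is to raise the CPUs per task (a sketch; the right numbers depend on your dataset and the nodes you run on):

#SBATCH -n 8
#SBATCH -c 8                    # was -c 4; more CPUs per task means more memory available to each MPI task
#SBATCH --mem-per-cpu=2g

mpirun -n 8 relion_refine_mpi <same options as before> --j 8    # keep --j in step with -c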

Segmentation faults

For datasets of smaller particles, a higher value of --j (e.g. -c 6) may not be a problem. However, for datasets of larger particles, the value of --j must be reduced to avoid a segmentation fault.
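For example, if a run submitted with -c 6 / --j 6 hits a segmentation fault on a large-particle dataset, lower both together (a sketch; pick values that fit your data):

#SBATCH -c 2

mpirun -n 3 relion_refine_mpi <same options as before> --j 2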

References:

[1] HOME: http://www2.mrc-lmb.cam.ac.uk/relion/index.php/Main_Page

[2] Tutorial: http://www2.mrc-lmb.cam.ac.uk/groups/scheres/relion_tutorial.pdf