Relion
RELION [1] stands for REgularised LIkelihood OptimisatioN and is a program for cryo-electron microscopy (cryo-EM) image processing. The software was developed in Sjors H.W. Scheres' lab at the MRC Laboratory of Molecular Biology in Cambridge. The program has been used to resolve large macromolecular cryo-EM structures such as ribosomes.
Important Notes
Please do not press the "Run!" button in the GUI, as the job would then run on the head nodes instead of on the proper compute nodes.
Instead, use the Relion GUI to construct the command, use "Print command", and put the printed command into a Slurm submit script to run the job.
If you are using the GUI, make sure that the "Number of MPI procs" field in the "Running" tab matches the total number of requested processors (number of nodes * ppn) in the "Standard submission script".
Request enough memory for your job with #SBATCH -n <x> -c <y> --mem-per-cpu=<m>gb.
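As an illustration, the directives below request 3 tasks with 4 CPUs each at 3 GB per CPU, for 3 * 4 * 3 = 36 GB in total (the counts are placeholder values, not site recommendations):

```shell
#!/bin/bash
#SBATCH -n 3              # MPI tasks
#SBATCH -c 4              # CPUs per task
#SBATCH --mem-per-cpu=3gb # memory per CPU

# Sanity-check the total: tasks * cpus-per-task * mem-per-cpu
tasks=3
cpus_per_task=4
mem_per_cpu_gb=3
echo "Total memory: $((tasks * cpus_per_task * mem_per_cpu_gb)) GB"
```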
Installed Versions
All the available versions of RELION can be listed with the following command. This applies to other applications as well.
module spider relion
Output:
Versions:
relion/2.1.b1
relion/3.0-beta_cpu
relion/3.0-beta
relion/3.0.5-c9.2
relion/3.0.5
Some of the versions are compiled with GCC, which requires switching the compiler module:
module switch intel gcc
Now check which module is the default (marked with "(D)"):
module avail relion
-------- /usr/local/share/modulefiles/MPI/gcc/6.3.0/openmpi/2.0.1 ---------
relion/2.1.b1 relion/3.0-beta_cpu relion/3.0-beta relion/3.0.5-c9.2 (D) relion/3.0.5
The default version is identified by "(D)" after the module name and can be loaded with:
module load relion
The other versions of Relion can be loaded as:
module load relion/<version>
Running Relion on HPC
Interactive Job Submission
Request a compute node with X-forwarding on. The node-request options can be, for example, -n 1 -c 4 --mem=8gb.
srun --x11 -n 1 -c 4 --mem=8gb --pty bash
Load the Relion module
module swap intel gcc
module load relion
Execute Relion:
relion
You will see the Relion GUI.
As noted above, do not press the "Run!" button in the GUI; use "Print command" and run the printed command through a Slurm submit script.
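For example, the printed command can be dropped into a minimal Slurm script like the sketch below (the resource values and the relion_refine_mpi options are illustrative placeholders, not a recommended configuration):

```shell
#!/bin/bash
#SBATCH -n 3
#SBATCH -c 6
#SBATCH --mem-per-cpu=4gb
#SBATCH --time=24:00:00

module swap intel gcc
module load relion

# Paste the command from the GUI's "Print command" here, for example:
mpirun -n 3 relion_refine_mpi --o Refine3D/run1 --i particles.star --j 6
```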
Batch Job Submission
Download the Relion benchmark file (tar.gz) from the Relion website, extract it, and cd into the relion_benchmark directory. The directory is large, so you may want to download it to /scratch space.
wget ftp://ftp.mrc-lmb.cam.ac.uk/pub/scheres/relion_benchmark.tar.gz
tar xzvf relion_benchmark.tar.gz
cd relion_benchmark
Copy the job file from /usr/local/doc/RELION/relion-batch to relion_benchmark
cp /usr/local/doc/RELION/relion-batch/2gpu-j6-p100.sh .
Submit the job. Check the job file to see the different flags associated with relion. Note that there are options for using SSD or $PFSDIR scratch space.
sbatch 2gpu-j6-p100.sh
Monitor your job:
We are using 3 tasks (-n 3, i.e. the number of GPUs (2) + 1) and 6 processors per task (-c 6, matching relion's --j 6), i.e. 3 * 6 = 18 processors.
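The corresponding Slurm directives might look like the sketch below, assuming the cluster requests GPUs via --gres (the gres specification is a placeholder for your site's actual value):

```shell
#SBATCH -n 3         # 2 GPU worker tasks + 1 master task
#SBATCH -c 6         # must match relion's --j 6
#SBATCH --gres=gpu:2 # two GPUs, one per worker task
```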
Check the CPU utilization:
ssh -t <gpu-node> top
Output:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1672 <caseID> 20 0 23.7g 4.7g 108728 S 127.6 3.7 352:32.10 relion_refine_m
1671 <caseID> 20 0 23.7g 4.7g 108948 S 122.6 3.7 340:40.66 relion_refine_m
1670 <caseID> 20 0 2830744 2.2g 9052 R 66.8 1.8 151:46.84 relion_refine_m
Check GPU utilization (note: on the Pascal architecture, only one GPU is used at a time and the job takes much longer - see the benchmarks: https://www3.mrc-lmb.cam.ac.uk/relion/index.php?title=Benchmarks_%26_computer_hardware):
ssh <gpu-node> nvidia-smi -l 5
Output:
Fri May 17 14:26:48 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:02:00.0 Off | N/A |
| 33% 55C P2 180W / 250W | 9885MiB / 10989MiB | 52% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:03:00.0 Off | N/A |
| 33% 56C P2 180W / 250W | 9885MiB / 10989MiB | 62% Default |
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1671 C ...relion/3.0.5-c9.2/bin/relion_refine_mpi 9875MiB |
| 1 1672 C ...relion/3.0.5-c9.2/bin/relion_refine_mpi 9875MiB |
+-----------------------------------------------------------------------------+
Check the output log file
cat slurm-<jobid>.out
Output:
Fri May 17 11:46:57 EDT 2019
RELION version: 3.0.5
Precision: BASE=double, CUDA-ACC=single
=== RELION MPI setup ===
+ Number of MPI processes = 3
+ Number of threads per MPI process = 6
+ Total number of threads therefore = 18
...
Expectation iteration 25 of 25
3.47/3.47 min ............................................................~~(,_,">
Maximization ...
1.65/1.65 min ............................................................~~(,_,">
real 170m37.856s
user 854m2.372s
sys 103m30.085s
Fri May 17 14:37:35 EDT 2019
Older Stuff
Request a compute node with X-forwarding on. The node-request options can be, for example, -n 2 -c 4 --mem=5gb.
srun <node request options> --x11 --pty bash
Copy the relion-interactive directory from /usr/local/doc/RELION directory to your home directory. You will find the submission script (qsub.csh), executable (ctffind3.exe) and Micrographs directory.
cp -r /usr/local/doc/RELION/relion-interactive .
Load the relion module
module load relion
Change directory to relion-interactive
cd <path-to-relion-interactive>
Run Relion (important note: the GUI may not be available in later versions of RELION, so use the batch script instead):
relion &
Relion on hpc1/hpc2 is installed and ready to use, including the relion executable with GUI access.
However, as noted above, do not press the "Run!" button in the GUI; construct the command with the GUI, use "Print command", and run the command through a Slurm submit script.
In the GUI, keep everything as it is except for the following changes:
In "CTF estimation" option, in the "CTFFIND" tab, at "CTFFIND Executable Field", browse to "ctffind3.exe" in your working directory <path-to-relion-interactive>. Make sure that "Run CTFFIND3" option is Yes.
In "Extract" tab, choose Yes option in "Generate particle STAR file" field.
In the "Running" tab, you need to make the following changes:
Here, the "Number of MPI procs" is the same as the total number of processors in the job file (qsub.csh). Browse to the qsub.csh file in the "Standard submission script" field. You can customize the script to your needs.
qsub.csh:
#!/bin/tcsh
#SBATCH -n 2
#SBATCH -c 4
#SBATCH --mem=5gb
#SBATCH --time=10:00:00
module load relion
# Environment
source ~/.cshrc
mpiexec --bynode -n XXXmpinodesXXX XXXcommandXXX
(Important) Do not click the "Run!" button. Instead, submit the job as a Slurm script.
In the session where you typed "relion &", you will see the batch job assigned a JOBID. Once the job completes, you should see the particles.star file and a new Particles directory with a Micrographs sub-directory containing .star and .mrcs files.
See the content of the particles.star file
less particles.star
output:
data_
loop_
_rlnMicrographName #1
_rlnCoordinateX #2
_rlnCoordinateY #3
_rlnImageName #4
_rlnDefocusU #5
_rlnDefocusV #6
_rlnDefocusAngle #7
_rlnVoltage #8
_rlnSphericalAberration #9
_rlnAmplitudeContrast #10
_rlnMagnification #11
_rlnDetectorPixelSize #12
_rlnCtfFigureOfMerit #13
Micrographs/006.mrc 453.000000 604.000000 000001@Particles/Micrographs/006_particles.mrcs 5707.600098 5798.459961 48.970001 300.000000 2.000000 0.100000 60000.000000 14.000000 0.113710
...
Following the tutorial [2] (section 4.2), you can plot the .star file using GNUplot as shown below. You can also copy the PrecalculatedResults directory from /usr/local/doc/RELION to plot the graph.
module load relion
relion_star_plottable Class3D/run1_it025_model.star data_model_class_1 rlnResolution rlnSsnrMap
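A minimal sketch of the plotting step, assuming relion_star_plottable writes the selected columns to standard output (the intermediate file name is hypothetical):

```shell
# Redirect the plottable columns to a file, then plot them with gnuplot
relion_star_plottable Class3D/run1_it025_model.star data_model_class_1 rlnResolution rlnSsnrMap > class1_ssnr.dat
gnuplot -e "set xlabel 'Resolution'; set ylabel 'SSNR'; plot 'class1_ssnr.dat' with lines"
```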
Relion 2 (under construction)
This is an example of a Slurm submit script that includes the Relion 2 command properly; the task count (-n 8 and mpirun -n 8), the thread count (-c 4 and --j 4), and the memory (--mem-per-cpu=2g and --memory_per_thread 2) must match between the Slurm directives and the relion command:
#!/bin/bash
#SBATCH -n 8
#SBATCH -c 4
#SBATCH --mem-per-cpu=2g
#SBATCH -o refine-%j.out
module load relion/2.0
mpirun -n 8 relion_refine_mpi --o Down1Class3D/run1 --i particles_grouped.star --particle_diameter 360 --angpix 1.12 --ref Box488_3D.mrc --firstiter_cc --ini_high 50 --no_parallel_disc_io --ctf --iter 25 --tau2_fudge 2 --K 3 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --offset_range 1 --offset_step 2 --sym C1 --norm --scale --j 4 --memory_per_thread 2
Troubleshooting
Memory errors
If you see the following error:
File: ml_model.cpp line: 1328
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
This error occurs when the MPI tasks need more memory. To solve the problem, do not just increase the memory allocation in your Slurm script. Instead of (or in addition to) that, increase the number of CPUs per task (-c); with --mem-per-cpu, this gives each MPI task more memory.
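For example, with --mem-per-cpu=2gb, raising -c from 4 to 8 doubles each task's memory without changing the per-CPU request (the values are illustrative):

```shell
# Before: each MPI task gets 4 * 2 GB = 8 GB
#SBATCH -c 4
#SBATCH --mem-per-cpu=2gb

# After: each MPI task gets 8 * 2 GB = 16 GB
#SBATCH -c 8
#SBATCH --mem-per-cpu=2gb
# Remember to raise relion's --j flag to match the new -c value
```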
Segmentation Error:
For datasets of smaller particles, a higher value of --j (with a matching -c) may not be a problem. However, for datasets of larger particles, the value of --j must be reduced to avoid segmentation faults.
References:
[1] HOME: http://www2.mrc-lmb.cam.ac.uk/relion/index.php/Main_Page
[2] Tutorial: http://www2.mrc-lmb.cam.ac.uk/groups/scheres/relion_tutorial.pdf