GPU cryo-EM data analysis on AWS  

With the introduction of GPU-accelerated Relion and the newly available GPU-compute virtual machines from AWS ('p2' instances), Relion-GPU jobs can now run directly on these virtual machines. This means that you don't have to start a StarCluster cluster on AWS in order to have access to powerful computing resources.

We have installed all the software needed to run Relion2.0-beta on GPUs and CPUs, along with the associated software packages:

  • Relion2.0-beta
  • Gctf
  • CTFFIND-4.1
  • Unblur
  • Summovie 
  • MotionCorr 
  • MotionCor2
  • ResMap
  • Aspera Connect (for downloading data from EMDB)

We also included the tutorial dataset, which can be found in your home directory when you boot up the virtual machine.

Virtual Machine Choices

Available only in US-East-1 (N. Virginia), US-West-2 (Oregon), EU-West-1 (Ireland)

  1. p2.xlarge
    • vCPUs = 4
    • NVIDIA Tesla K80 GPUs = 1
    • Memory = 61 GiB
    • GPU Memory = 12 GiB
  2. p2.8xlarge
    • vCPUs = 32
    • NVIDIA Tesla K80 GPUs = 8
    • Memory = 488 GiB
    • GPU Memory = 96 GiB
  3. p2.16xlarge
    • vCPUs = 64
    • NVIDIA Tesla K80 GPUs = 16
    • Memory = 732 GiB
    • GPU Memory = 192 GiB 

Benchmark Tests

To benchmark the performance of these virtual machines, we used the benchmarking routine described on the Relion website.

We ran the following commands while varying the number of MPI processes, the number of GPUs, and the number of threads (--j) on three different GPU configurations using EBS-backed volumes (SSD).

NOTE: As of 10/8/16 we are still tweaking the job submission parameters to optimize performance, so check back regularly to see if this command has changed.

Command to run Relion in the background for 3D classification:

nohup mpirun -np [#MPIused] relion_refine_mpi --i Particles/shiny_2sets.star --ref emd_2660.map:mrc --firstiter_cc --ini_high 60 --ctf --ctf_corrected_ref --iter 25 --tau2_fudge 4 --particle_diameter 360 --K 6 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --offset_range 5 --offset_step 2 --sym C1 --norm --scale --random_seed 0 --o class3d --gpu --pool 100 --j [#jthreads] --dont_combine_weights_via_disc --preread_images &

Command to run Relion in the background for 2D classification:

nohup mpirun -np [#MPIused] relion_refine_mpi --i Particles/shiny_2sets.star --ctf --iter 25 --tau2_fudge 2 --particle_diameter 360 --K 200 --zero_mask --oversampling 1 --psi_step 6 --offset_range 5 --offset_step 2 --norm --scale --random_seed 0 --o class2d --j [#jthreads] --dont_combine_weights_via_disc --preread_images --gpu --pool 100 &
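
The [#MPIused] and [#jthreads] placeholders can be filled in by a small helper. Below is a minimal sketch for the 3D classification command. It assumes one MPI master plus one worker per GPU, which matches the benchmark runs on this page (for example 9 MPI processes for 8 GPUs) but is not something Relion itself enforces. The function names mpi_ranks_for_gpus and build_class3d_cmd are hypothetical and only print the command so it can be checked before launching.

```shell
#!/usr/bin/env bash
# Sketch: fill in the [#MPIused] and [#jthreads] placeholders of the
# Class3D benchmark command. Nothing is executed; the command is printed.

# Assumption: one MPI master plus one worker per GPU, the pattern seen in
# the benchmark runs (e.g. 8 GPUs -> 9 MPI processes).
mpi_ranks_for_gpus() {
    echo $(( $1 + 1 ))
}

# build_class3d_cmd NUM_GPUS J_THREADS
build_class3d_cmd() {
    local gpus=$1 jthreads=$2 np
    np=$(mpi_ranks_for_gpus "$gpus")
    echo "mpirun -np $np relion_refine_mpi" \
        "--i Particles/shiny_2sets.star --ref emd_2660.map:mrc" \
        "--firstiter_cc --ini_high 60 --ctf --ctf_corrected_ref" \
        "--iter 25 --tau2_fudge 4 --particle_diameter 360 --K 6" \
        "--flatten_solvent --zero_mask --oversampling 1" \
        "--healpix_order 2 --offset_range 5 --offset_step 2" \
        "--sym C1 --norm --scale --random_seed 0 --o class3d" \
        "--gpu --pool 100 --j $jthreads" \
        "--dont_combine_weights_via_disc --preread_images"
}

# Example: print the command for a p2.8xlarge (8 GPUs, 3 threads each).
build_class3d_cmd 8 3
```

Once the printed command looks right, it can be launched in the background with: nohup $(build_class3d_cmd 8 3) > class3d.log 2>&1 &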

SDSC command: 
ibrun -np 5 -tpr 6 relion_refine_mpi --i Particles/shiny_2sets.star --ref emd_2660.map:mrc --firstiter_cc --ini_high 60 --ctf --ctf_corrected_ref --iter 25 --tau2_fudge 4 --particle_diameter 360 --K 6 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --offset_range 5 --offset_step 2 --sym C1 --norm --scale --random_seed 0 --o class3d --j 6 --dont_combine_weights_via_disc --gpu 0:1:2:3 --pool 100

Instance     Job      # MPI  # GPU  --j      # CPU      Instance     EBS   Elapsed time  Total
                      used   used   threads  available  volume type  type  (HH:MM)       cost*
-----------  -------  -----  -----  -------  ---------  -----------  ----  ------------  ------
p2.xlarge    Class3D  2      1      4        4          gp2          gp2   12:06         $10.89
p2.8xlarge   Class3D  9      8               32         gp2          gp2   2:05          $21.60
p2.8xlarge   Class3D  9      8      3        32         gp2          io2   2:30          $21.60
p2.16xlarge  Class3D  17     16     3        64         gp2          gp2   1:50          $14.40
g3.16xlarge  Class3D  5      4      12       64         gp2          gp2   2:53          $13.68
g3.16xlarge  Class2D  5      4      12       64         gp2          gp2   tbd           tbd
SDSC Comet   Class3D  5      4      24       P100 GPUs  n/a                3:37          $0.00


*Total cost was estimated using on-demand instance pricing:
  • p2.xlarge = $0.90 / hr.
  • p2.8xlarge = $7.20 / hr.
  • p2.16xlarge = $14.40 / hr.
  • g3.16xlarge = $4.56 / hr.

Note that there are two types of SSD-backed EBS volumes on AWS: gp2 and io2. When we compared them on p2.8xlarge, io2 was not faster than gp2, and was perhaps slower (2:30 vs. 2:05).
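
The cost arithmetic can be reproduced from an elapsed time and an hourly rate. The sketch below is an illustration only: estimate_cost is a hypothetical helper, and whether a given figure should be pro-rated by the minute or rounded up to whole billed hours is an assumption the caller has to choose through the third argument.

```shell
#!/usr/bin/env bash
# estimate_cost ELAPSED(HH:MM) RATE_PER_HOUR [prorate|ceil]
#   prorate (default): charge for the exact fraction of an hour
#   ceil:              round the elapsed time up to whole hours first
estimate_cost() {
    local elapsed=$1 rate=$2 mode=${3:-prorate}
    awk -v t="$elapsed" -v rate="$rate" -v mode="$mode" 'BEGIN {
        split(t, p, ":")                 # p[1] = hours, p[2] = minutes
        hours = p[1] + p[2] / 60
        if (mode == "ceil" && hours > int(hours))
            hours = int(hours) + 1       # bill every started hour in full
        printf "%.2f\n", hours * rate
    }'
}

estimate_cost 12:06 0.90           # p2.xlarge, pro-rated    -> 10.89
estimate_cost 2:05 7.20 ceil       # p2.8xlarge, whole hours -> 21.60
```

The rates are the on-demand prices listed above. Current AWS prices may differ, so pass in whatever rate applies to your account and region.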

2D classification benchmarking for sorting 'junk'

Instead of relying on the 2D classification benchmark from Relion's page, we wanted more realistic numbers for 2D classification. Here are our test datasets:

Dataset #1: 

  • 187,931 particles
  • 80 x 80 pixels
  • 200 classes // psi_step=10 // offset_range=5 // offset_step=2
  • 25 iterations

Dataset #2: 

  • 187,00 particles
  • 144 x 144 pixels
  • 200 classes // psi_step=10 // offset_range=5 // offset_step=2
  • 18 iterations

Dataset #3: 

  • 382,885 particles
  • 64 x 64 pixels
  • 150 classes // psi_step=10 // offset_range=5 // offset_step=2
  • 25 iterations

Command for 2D classification:

nohup mpirun -np [#MPIused] relion_refine_mpi --i particles.star --ctf --iter 25 --tau2_fudge 2 --particle_diameter 190 --K 200 --zero_mask --oversampling 1 --psi_step 10 --offset_range 5 --offset_step 2 --norm --scale --random_seed 0 --o class2d --preread_images --gpu --j [#jthreads] --dont_combine_weights_via_disc --pool 100  &
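
Because the job runs in the background under nohup, it helps to check how far it has progressed. The sketch below assumes that Relion writes one file named <rootname>_itNNN_data.star per completed iteration (so class2d_it025_data.star for --o class2d). The helper latest_iteration is hypothetical and not part of Relion.

```shell
#!/usr/bin/env bash
# latest_iteration ROOTNAME
# Print the highest iteration number for which ROOTNAME_itNNN_data.star
# exists. Assumes the three-digit _itNNN_ naming described above.
latest_iteration() {
    local root=$1 f n max=-1
    for f in "${root}"_it[0-9][0-9][0-9]_data.star; do
        [ -e "$f" ] || continue      # glob matched nothing
        n=${f##*_it}                 # drop everything through the last "_it"
        n=${n%%_data.star}           # keep only the digits, e.g. "025"
        n=$((10#$n))                 # force base 10 so "025" is 25, not octal
        if [ "$n" -gt "$max" ]; then
            max=$n
        fi
    done
    if [ "$max" -lt 0 ]; then
        echo "no iteration files found for '$root'" >&2
        return 1
    fi
    echo "$max"
}

# Usage, from the directory where the job was started:
#   latest_iteration class2d
#   tail -f nohup.out        # nohup's default log file
```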


Instance     Job      # MPI  --j threads  # GPUs  Dataset  Elapsed time (HH:MM)  Cost
-----------  -------  -----  -----------  ------  -------  --------------------  ------
SDSC Comet   Class2D  192    1            0       #1       12:25                 --
p2.8xlarge   Class2D         3            8       #1       5:45                  $41.40
p2.8xlarge   Class2D         3            8       #2       5:15                  $37.80
p2.16xlarge  Class2D  17     3            16      #1       2:40                  $38.88
p2.16xlarge  Class2D  17     3            16      #3       6:20                  $90.72


Recommended virtual machine

Based on these performance tests, we recommend p2.8xlarge instances for cryo-EM jobs, as they are about 6X faster than p2.xlarge instances. While p2.16xlarge shortens the Expectation step in Relion2, the data-writing and Maximization steps limit the overall speedup.
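
The speedup figure follows directly from the Class3D elapsed times in the benchmark table (12:06 on p2.xlarge against 2:05 on p2.8xlarge). A small sketch of that arithmetic, using hypothetical helpers to_minutes and speedup:

```shell
#!/usr/bin/env bash
# Convert HH:MM to minutes.
to_minutes() {
    awk -v t="$1" 'BEGIN { split(t, p, ":"); print p[1] * 60 + p[2] }'
}

# speedup SLOW(HH:MM) FAST(HH:MM): ratio of the two elapsed times
speedup() {
    awk -v slow="$(to_minutes "$1")" -v fast="$(to_minutes "$2")" \
        'BEGIN { printf "%.1f\n", slow / fast }'
}

speedup 12:06 2:05     # 726 min / 125 min -> 5.8
```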

How to use this GPU computing resource

  1. Create AWS account & set up security settings
  2. Launch p2.xlarge or p2.8xlarge from console in US-East-1, US-West-2, or EU-West-1:
    • Search public AMIs for 'EM-Packages-in-the-Cloud-4.0-GPU'
    • Select Spot Instance or On-Demand instance when selecting machine:
      • p2.xlarge on-demand price = $0.90/hr
    • Select availability zone that is the same as your EBS volume (e.g. us-east-1d)
    • Log onto virtual machine using public IP address: 
      • $ ssh -X -i {keypair}.pem ubuntu@{Public IP Address}
  3. Mount and upload data onto an EBS volume
    • Attach EBS volume to your running instance via console
    • Then mount EBS volume via command line on our virtual machine: 
      • $ sudo mount /dev/xvdf /data
  4. Run Relion-2.0 jobs in a directory on your EBS volume
  5. To close everything down when finished: 
    • Unmount your EBS volume via command line: 
      • $ cd 
      • $ sudo umount /dev/xvdf
    • Exit virtual machine
    • Detach EBS volume from running virtual machine
      • Via console: Select EBS volume, then Action -> Detach
    • Terminate virtual machine 
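
The mount and unmount commands from steps 3 and 5 can be wrapped in a short script. This is a sketch under a few assumptions: the volume appears as /dev/xvdf and is mounted at /data as in the steps above, the volume already contains a filesystem, and the helper names run, mount_ebs and unmount_ebs are hypothetical. By default it only prints the commands (DRY_RUN=1), so it can be checked before anything is executed.

```shell
#!/usr/bin/env bash
# Sketch of the EBS mount/unmount steps. With DRY_RUN=1 (the default here)
# each command is printed with a leading "+" instead of being executed.
# Set DRY_RUN=0 on the virtual machine to actually run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# mount_ebs [DEVICE] [MOUNTPOINT]
mount_ebs() {
    local device=${1:-/dev/xvdf} mountpoint=${2:-/data}
    run lsblk "$device"                    # confirm the attached volume is visible
    run sudo mkdir -p "$mountpoint"        # harmless if the directory already exists
    run sudo mount "$device" "$mountpoint"
    run df -h "$mountpoint"                # check size and free space
}

# unmount_ebs [DEVICE]
unmount_ebs() {
    local device=${1:-/dev/xvdf}
    run cd "$HOME"                         # leave the volume so it is not busy
    run sudo umount "$device"
}

mount_ebs /dev/xvdf /data
unmount_ebs /dev/xvdf
```

A brand-new, empty EBS volume has no filesystem and cannot be mounted until one is created. Creating one erases the volume, so do this only on a volume that holds no data.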

Example run times


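Instance    Job       # MPI  # GPU  --j  # CPU avail.  Run time  # Particles  Box size  Iterations  Resolution  Total Cost
----------  --------  -----  -----  ---  ------------  --------  -----------  --------  ----------  ----------  ----------
p2.8xlarge  Refine3D  9      8           32            1:54      69,455       324       21          7.8 Å       $14.40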