CryoSPARC

What is CryoSPARC?

CryoSPARC is a state-of-the-art scientific software platform for cryo-electron microscopy (cryo-EM) used in research and drug discovery pipelines. More information about CryoSPARC can be found on their website.

Building CryoSPARC on the Cluster

CryoSPARC must be built and run on GPU nodes, since it is designed with the GPU as its main computational device. It also relies heavily on SSD storage as its working space while processing data.

Requesting GPU node and Environment

Request a GPU node with the appropriate partition or feature (see the HPC resource view for GPU resources and for GPU cards with SSD drives). Use /tmp ($TMPDIR) as scratch space so that the SSDs are used. This node will be your CryoSPARC master node. Request all GPU cards available on that node (see the HPC resource view). In the example below, 16gb of memory is requested; adjust the memory value to match your job's requirement.

srun -p gpu -C gpu2080 --gres=gpu:2 --x11 --mem=16gb --pty bash # using partition gpu and feature gpu2080
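Once the session starts, it can help to confirm that the allocated GPUs and the SSD-backed scratch space are visible before proceeding. A quick sanity check (standard commands; $TMPDIR is set by the scheduler):

nvidia-smi                # list the GPU cards allocated to this job
echo $TMPDIR              # node-local scratch directory backed by the SSD
df -h $TMPDIR             # available space on the SSD scratch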

It is recommended to install cryosparc in another location (/mnt/pan or /mnt/vstor) rather than your /home directory. Create a cryosparc version directory in that location (e.g. /mnt/pan/cryoem/<user>/cryosparc/<cryosparc-version>) and change directory (cd) to it.

cd <path-to-cryosparc-version-dir>
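If the version directory does not exist yet, create it first. For example, using an illustrative path (substitute your own group directory, caseID, and CryoSPARC version):

mkdir -p /mnt/pan/cryoem/<user>/cryosparc/<cryosparc-version>
cd /mnt/pan/cryoem/<user>/cryosparc/<cryosparc-version>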

Create "environment.sh" file with the following content and update the field accordingly:

#!/bin/bash

install_path="<path-to-the-version-of-cryosparc>"
license_id="<license ID from cryosparc - https://guide.cryosparc.com/licensing>"
worker_path="<path-to-the-version-of-cryosparc>/cryosparc_worker"
cuda_path=/usr/local/cuda
ssd_path=/tmp                   # use the appropriate SSD path that you have access to
user_email="<first-name>.<last-name>@case.edu"
user_name="<caseID>"            # your actual caseID
user_password="<password>"
user_firstname="<first-name>"
user_lastname="<Last-name>"

export LICENSE_ID=$license_id

Source the environment:

source ./environment.sh 
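A quick check that the shell picked up the variables defined above (the names are the ones from environment.sh):

echo $LICENSE_ID
echo $install_path $worker_path $ssd_path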

Standalone Installation with CryoSPARC Master and Worker

Follow the "standalone" installation instructions using the "Single Workstation (Master and Worker Combined)" tab - https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/downloading-and-installing-cryosparc 

Verify TCP Port

Check whether TCP ports in the 39000 range are already in use (source: CryoSPARC forum):

netstat -tuplen | grep :3900

output:

(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 0.0.0.0:39001           0.0.0.0:*               LISTEN      533150     33147754   7185/mongod
tcp        0      0 0.0.0.0:39002           0.0.0.0:*               LISTEN      533150     32909929   7273/python
tcp        0      0 0.0.0.0:39003           0.0.0.0:*               LISTEN      533150     33170486   7325/python
tcp        0      0 0.0.0.0:39005           0.0.0.0:*               LISTEN      533150     32909946   7330/python
tcp        0      0 0.0.0.0:39006           0.0.0.0:*               LISTEN      533150     33170504   7414/node
tcp        0      0 0.0.0.0:39007           0.0.0.0:*               LISTEN      533150     33171485   7430/node
tcp        0      0 0.0.0.0:39000           0.0.0.0:*               LISTEN      533150     32909961   7405/node

In general, do NOT use ports under 1024. If a cryoSPARC instance A is already hosted at port 39000 (set with the installation flag "--port <port_number>"), as shown in the output above, then, since cryoSPARC only requires a 10-port range, you can safely start your cryoSPARC instance B on port 39010.
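If netstat is not available on the node, ss reports the same information. For example, to check whether the alternative range starting at 39010 is free before passing it to "--port" (no output means nothing is listening on ports 39010-39019):

ss -tlpn | grep ':3901'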


CryoSPARC Installation

Download Tarball files:

curl -L https://get.cryosparc.com/download/master-latest/$LICENSE_ID -o cryosparc_master.tar.gz

curl -L https://get.cryosparc.com/download/worker-latest/$LICENSE_ID -o cryosparc_worker.tar.gz

Extract the tar files. This may take several minutes:

tar -xvf cryosparc_master.tar.gz cryosparc_master

tar -xvf cryosparc_worker.tar.gz cryosparc_worker

Change directory to cryosparc_master:

cd cryosparc_master/

Install the cryosparc_master package. (Note: use a different port with the flag "--port <port_number>" if the default port 39000 is already in use.)

 ./install.sh --standalone --license $LICENSE_ID --worker_path $worker_path --cudapath $cuda_path --ssdpath $ssd_path --initial_email $user_email --initial_password $user_password --initial_username $user_name --initial_firstname $user_firstname --initial_lastname $user_lastname

Change Directory to cryosparc_worker:

cd ../cryosparc_worker/

Install cryosparc_worker package:

 ./install.sh --license $LICENSE_ID --cudapath $cuda_path

Check the Installation Status:

<path to cryosparc_master>/bin/cryosparcm status

output:

----------------------------------------------------------------------------
CryoSPARC System master node installed at
/usr/local/cryosparc/v3.2/cryosparc_master
Current cryoSPARC version: v3.2.0
----------------------------------------------------------------------------

CryoSPARC process status:

app                              RUNNING   pid 9227, uptime 2:14:05
app_dev                          STOPPED   Not started
command_core                     RUNNING   pid 9089, uptime 2:14:25
command_rtp                      RUNNING   pid 9147, uptime 2:14:15
command_vis                      RUNNING   pid 9138, uptime 2:14:16
database                         RUNNING   pid 9004, uptime 2:14:28
liveapp                          RUNNING   pid 9258, uptime 2:14:04
liveapp_dev                      STOPPED   Not started
webapp                           RUNNING   pid 9210, uptime 2:14:07
webapp_dev                       STOPPED   Not started

----------------------------------------------------------------------------

global config variables:

export CRYOSPARC_LICENSE_ID="xxxxx"
export CRYOSPARC_MASTER_HOSTNAME="hpc2"
export CRYOSPARC_DB_PATH="/usr/local/cryosparc/v3.2/cryosparc_database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false
export CRYOSPARC_CLICK_WRAP=true
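Besides status, the cryosparcm wrapper also manages the instance lifecycle; the subcommands you will typically use are:

<path to cryosparc_master>/bin/cryosparcm start                # start all CryoSPARC services
<path to cryosparc_master>/bin/cryosparcm stop                 # stop all CryoSPARC services
<path to cryosparc_master>/bin/cryosparcm restart              # restart after editing config.sh
<path to cryosparc_master>/bin/cryosparcm log command_core     # follow a service log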

Run the worker command "cryosparcw" from the worker GPU node to include it in a lane. This needs to be done once for each node (if the worker node has not been connected previously):

<path-to-cryosparc_worker>/bin/cryosparcw connect --worker <present-node> --master <present-node> --ssdpath /tmp --lane <name-of-the-lane> --newlane
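As a concrete sketch, assuming you run the command on the allocated GPU node itself (so the worker and master hostnames are both the current node) and you name the lane after the GPU feature, e.g. gpu2080:

node=$(hostname)
<path-to-cryosparc_worker>/bin/cryosparcw connect --worker $node --master $node --ssdpath /tmp --lane gpu2080 --newlane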

Check the status of GPUs in one of the worker nodes:

<path-to-cryosparc_worker>/bin/cryosparcw gpulist

output:

Detected 2 CUDA devices.

   id           pci-bus  name
   ---------------------------------------------------------------
       0      0000:02:00.0  GeForce RTX 2080 Ti
       1      0000:81:00.0  GeForce RTX 2080 Ti
   ---------------------------------------------------------------

Display the GPU node(s) in the lane

<path-to-cryosparc_master>/bin/cryosparcm cli "get_worker_nodes()"

output:

[{'cache_path': '/tmp', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11554717696, 'name': 'GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11554717696, 'name': 'GeForce RTX 2080 Ti'}], 'hostname': '<gpu-node>', 'lane': 'gpu2080', 'monitor_port': None, 'name': '<gpu-node>', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], 'GPU': [0, 1], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': '<caseID>@<gpu-node>', 'title': 'Worker node <gpu-node>', 'type': 'node', 'worker_bin_path': '/usr/local/cryosparc/v3.2/cryosparc_worker/bin/cryosparcw'},

Running CryoSPARC from the Portal

Open a Firefox browser from the compute node or use SSH tunneling. SSH tunneling tends to be more reliable and responsive than running Firefox on the HPC.

SSH Tunnel to CryoSPARC:

ssh -N -L 39000:<ip-address>:39000 <caseID>@pioneer.case.edu
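Here <ip-address> is the hostname or IP of the GPU node running the CryoSPARC master. If you installed with a different base port, tunnel that port instead and browse to the matching localhost port; for example, with a hypothetical node gput045 and base port 39010:

ssh -N -L 39010:gput045:39010 <caseID>@pioneer.case.edu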

Open a web browser from your PC and navigate to http://localhost:39000

OR

Firefox

From the compute node, open the Firefox browser:

firefox &

Access the CryoSPARC portal from the browser by navigating to http://localhost:39000/ and logging in with the credentials you set in the "environment.sh" file.

If you are accessing the portal for the first time, accept the terms.

At the bottom, click on Resource Manager and find the "Instance Information" tab:

gpu2080
Target 1: <gpu-node> NODE
Cores: 20
Memory: 128 GB
GPUs: 2
Worker bin path: /usr/local/cryosparc/v3.2/cryosparc_worker/bin/cryosparcw
Hostname: <gpu-node>
Name: <gpu-node>
Cache path: /tmp
Cache quota (MB):
SSH String: <caseID>@<gpu-node>
Cache Reserve (MB): 10000

Testing:

For testing, follow the instructions at the Data Processing Introductory Tutorial - https://guide.cryosparc.com/processing-data/cryo-em-data-processing-in-cryosparc-introductory-tutorial

Once the job is completed, stop the cryosparc process.

<path to cryosparc_master>/bin/cryosparcm stop

After Installation and testing

When you request a GPU node again later, you may get a different GPU node. In that case, you need to change the master node name in the "config.sh" file in the cryosparc_master directory accordingly.

export CRYOSPARC_MASTER_HOSTNAME="<gpu-node>"
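One way to make this edit without opening an editor is a sed one-liner run on the newly allocated node; this is a sketch that assumes config.sh already contains the export line above:

sed -i "s/^export CRYOSPARC_MASTER_HOSTNAME=.*/export CRYOSPARC_MASTER_HOSTNAME=\"$(hostname)\"/" <path to cryosparc_master>/config.sh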

Also, the default TCP port (39000) may already be in use on that node. Check the "Verify TCP Port" section above and set an unused port in "config.sh":

export CRYOSPARC_BASE_PORT=<port>

Start the cryosparc process:

<path to cryosparc_master>/bin/cryosparcm start

Then follow the procedure above, starting from running the worker command "cryosparcw" from the new worker GPU node to include it in a lane.


Troubleshooting

Please refer to the CryoSPARC FAQ or contact CryoSPARC support [1] if you encounter issues. Some frequently encountered issues are included below:

If you accidentally leave the node while CryoSPARC is running, you will need to remove the *.sock files in /tmp before starting the CryoSPARC process again. You may also need to kill any remaining processes that may be interfering, using "kill <PID>" after finding them with the commands below:

ps -ax | grep "supervisord"

ps -ax | grep "cryosparc2_command"

ps -ax | grep "mongod"
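Putting the cleanup together, a cautious sketch that removes the leftover socket files and lists any stale processes so that you can kill them by PID (the socket file names below are typical for a CryoSPARC instance, but check what is actually present in /tmp first):

ls /tmp/*.sock                                                   # see which socket files were left behind
rm -f /tmp/cryosparc-supervisor-*.sock /tmp/mongodb-*.sock
ps -ax | grep -E "supervisord|cryosparc|mongod" | grep -v grep   # note the PIDs
kill <PID>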

Database Failure: Please kill the mongo processes, delete the lock files, and restart cryoSPARC.

ps -ax | grep "mongod"

kill <process_pid>

# delete the .lock file(s) at <cryosparc-install-dir>/cryosparc_database (see the sketch below)

cryosparcm start
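For reference, the lock removal step might look like the following; the exact lock file names depend on the MongoDB version, with mongod.lock and WiredTiger.lock being the usual ones:

rm -f <cryosparc-install-dir>/cryosparc_database/mongod.lock <cryosparc-install-dir>/cryosparc_database/WiredTiger.lock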

Database Spawn Error: To work around this, rename the cryosparc database directory in the install folder, then start CryoSPARC and immediately stop it. You can then merge the renamed database back into the newly spawned database. CryoSPARC should then run normally and the database information should be preserved. It is a good idea to keep the backup database until you confirm everything is running smoothly and the projects are in place.


References: