CryoSPARC
What is CryoSPARC?
CryoSPARC is a state-of-the-art scientific software platform for cryo-electron microscopy (cryo-EM), used in research and drug-discovery pipelines. More information about CryoSPARC can be found on their website.
Building CryoSPARC on the Cluster
CryoSPARC must be built and run on GPU nodes, since the GPU is its primary computational device. It also relies heavily on local SSD storage for caching data during processing.
Requesting GPU node and Environment
Request a GPU node with the appropriate partition or feature (see the HPC resource view for GPU resources and GPU cards with SSD drives). Use /tmp ($TMPDIR) as scratch space to take advantage of the SSDs. This node will be your CryoSPARC master node. Request all GPU cards available on that node (see the HPC resource view). Here, 16 GB of memory is requested; adjust the memory value to match your job's requirements.
srun -p gpu -C gpu2080 --gres=gpu:2 --x11 --mem=16gb --pty bash # using partition gpu and feature gpu2080
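To see which GPU nodes, cards, and features are currently available before requesting one, a Slurm query such as the one below can help (the partition name "gpu" is taken from the srun example above; the columns printed are nodelist, GPU resources, and features):
sinfo -p gpu -o "%N %G %f" # list nodes, GRES, and features in the gpu partition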
It is recommended to install CryoSPARC in a project location (/mnt/pan or /mnt/vstor) rather than your /home directory. Create a CryoSPARC version directory in that location (e.g. /mnt/pan/cryoem/<user>/cryosparc/<cryosparc-version>) and change directory (cd) to it.
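For example (the path below is illustrative; substitute your own user and CryoSPARC version):
mkdir -p /mnt/pan/cryoem/$USER/cryosparc/<cryosparc-version>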
cd <path-to-cryosparc-version-dir>
Create "environment.sh" file with the following content and update the field accordingly:
#!/bin/bash
install_path="<path-to-the-version-of-cryosparc>"
license_id="<license-ID>" # obtain from https://guide.cryosparc.com/licensing
worker_path="<path-to-the-version-of-cryosparc>/cryosparc_worker"
cuda_path=/usr/local/cuda
ssd_path=/tmp # use an appropriate SSD path that you have access to
user_email="<first-name>.<last-name>@case.edu"
user_name="<caseID>" # your actual caseID
user_password="<password>"
user_firstname="<first-name>"
user_lastname="<last-name>"
export LICENSE_ID=$license_id
Source the environment:
source ./environment.sh
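As a quick sanity check that the variables were picked up by your shell, something like the following can be run (the echo lines are only an illustration; the license ID is truncated so it is not printed in full):
echo "license: ${LICENSE_ID:0:8}..." # first 8 characters only
echo "worker path: $worker_path"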
Standalone Installation with CryoSPARC Master and Worker
Follow the "standalone" installation instructions using the "Single Workstation (Master and Worker Combined)" tab - https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/downloading-and-installing-cryosparc
Verify TCP Ports
Check whether TCP ports in the 39000 range are already in use (source: CryoSPARC forum):
netstat -tuplen | grep :3900
output:
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:39001 0.0.0.0:* LISTEN 533150 33147754 7185/mongod
tcp 0 0 0.0.0.0:39002 0.0.0.0:* LISTEN 533150 32909929 7273/python
tcp 0 0 0.0.0.0:39003 0.0.0.0:* LISTEN 533150 33170486 7325/python
tcp 0 0 0.0.0.0:39005 0.0.0.0:* LISTEN 533150 32909946 7330/python
tcp 0 0 0.0.0.0:39006 0.0.0.0:* LISTEN 533150 33170504 7414/node
tcp 0 0 0.0.0.0:39007 0.0.0.0:* LISTEN 533150 33171485 7430/node
tcp 0 0 0.0.0.0:39000 0.0.0.0:* LISTEN 533150 32909961 7405/node
In general, try NOT to use ports under 1024. For your purposes, if your cryoSPARC instance A was hosted at port 39000 (with the installation flag "--port <port_number>") as shown in the output above, then, since cryoSPARC only requires a 10-port range, you can safely instantiate your cryoSPARC instance B on port 39010.
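A minimal sketch for locating a free base port, assuming instances are spaced 10 ports apart starting at 39000 (it only checks the base port of each block; a stricter check would test all ten ports):
for port in $(seq 39000 10 39090); do
  if ! netstat -tln | grep -q ":${port} "; then
    echo "base port ${port} appears free"
    break
  fi
done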
CryoSPARC Installation
Download Tarball files:
curl -L https://get.cryosparc.com/download/master-latest/$LICENSE_ID -o cryosparc_master.tar.gz
curl -L https://get.cryosparc.com/download/worker-latest/$LICENSE_ID -o cryosparc_worker.tar.gz
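Before extracting, it is worth confirming that both downloads completed; a failed download (for example, from a mistyped license ID) typically shows up as a file of only a few bytes:
ls -lh cryosparc_master.tar.gz cryosparc_worker.tar.gz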
Extract the tar files (this may take several minutes):
tar -xvf cryosparc_master.tar.gz cryosparc_master
tar -xvf cryosparc_worker.tar.gz cryosparc_worker
Change directory to cryosparc_master:
cd cryosparc_master/
Install the cryosparc_master package. (Note: use a different port with the flag "--port <port_number>" if the default port 39000 is already in use.)
./install.sh --standalone --license $LICENSE_ID --worker_path $worker_path --cudapath $cuda_path --ssdpath $ssd_path --initial_email $user_email --initial_password $user_password --initial_username $user_name --initial_firstname $user_firstname --initial_lastname $user_lastname
Change directory to cryosparc_worker:
cd ../cryosparc_worker/
Install cryosparc_worker package:
./install.sh --license $LICENSE_ID --cudapath $cuda_path
Check the Installation Status:
<path to cryosparc_master>/bin/cryosparcm status
output:
----------------------------------------------------------------------------
CryoSPARC System master node installed at
/usr/local/cryosparc/v3.2/cryosparc_master
Current cryoSPARC version: v3.2.0
----------------------------------------------------------------------------
CryoSPARC process status:
app RUNNING pid 9227, uptime 2:14:05
app_dev STOPPED Not started
command_core RUNNING pid 9089, uptime 2:14:25
command_rtp RUNNING pid 9147, uptime 2:14:15
command_vis RUNNING pid 9138, uptime 2:14:16
database RUNNING pid 9004, uptime 2:14:28
liveapp RUNNING pid 9258, uptime 2:14:04
liveapp_dev STOPPED Not started
webapp RUNNING pid 9210, uptime 2:14:07
webapp_dev STOPPED Not started
----------------------------------------------------------------------------
global config variables:
export CRYOSPARC_LICENSE_ID="xxxxx"
export CRYOSPARC_MASTER_HOSTNAME="hpc2"
export CRYOSPARC_DB_PATH="/usr/local/cryosparc/v3.2/cryosparc_database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false
export CRYOSPARC_CLICK_WRAP=true
Run the worker command "cryosparcw" from the worker GPU node to include it in a lane. This needs to be done once for each node (if the worker node has not been used previously):
<path-to-cryosparc_worker>/bin/cryosparcw connect --worker <present-node> --master <present-node> --ssdpath /tmp --lane <name-of-the-lane> --newlane
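For example, on a standalone installation where master and worker are the same node, something like the following should work (the lane name "gpu2080" is illustrative; run this from the cryosparc_worker directory):
./bin/cryosparcw connect --worker $(hostname) --master $(hostname) --ssdpath /tmp --lane gpu2080 --newlane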
Check the status of GPUs in one of the worker nodes:
<path-to-cryosparc_worker>/bin/cryosparcw gpulist
output:
Detected 2 CUDA devices.
id pci-bus name
---------------------------------------------------------------
0 0000:02:00.0 GeForce RTX 2080 Ti
1 0000:81:00.0 GeForce RTX 2080 Ti
---------------------------------------------------------------
Display the GPU node(s) in the lane:
<path-to-cryosparc_master>/bin/cryosparcm cli "get_worker_nodes()"
output:
[{'cache_path': '/tmp', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11554717696, 'name': 'GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11554717696, 'name': 'GeForce RTX 2080 Ti'}], 'hostname': '<gpu-node>', 'lane': 'gpu2080', 'monitor_port': None, 'name': '<gpu-node>', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], 'GPU': [0, 1], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': '<caseID>@<gpu-node>', 'title': 'Worker node <gpu-node>', 'type': 'node', 'worker_bin_path': '/usr/local/cryosparc/v3.2/cryosparc_worker/bin/cryosparcw'},
Running CryoSPARC from the Portal
Open the Firefox browser from the compute node, or use SSH tunneling. SSH tunneling is generally more reliable and responsive than running Firefox on the HPC.
SSH Tunnel to CryoSPARC:
ssh -N -L 39000:<ip-address>:39000 <caseID>@pioneer.case.edu
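To find <ip-address>, run the following on the GPU node before setting up the tunnel (on some systems "hostname -I", with a capital i, is needed instead):
hostname -i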
Open a web browser from your PC and navigate to http://localhost:39000
OR
Firefox
From the compute node, open the Firefox browser:
firefox &
Access the CryoSPARC portal from the browser by navigating to http://localhost:39000/ and logging in with the credentials you set in the "environment.sh" file.
If you are accessing it for the first time, accept the terms.
At the bottom, click on Resource Manager and find the "Instance Information" tab:
gpu2080
Target 1: <gpu-node> (NODE)
Cores: 20
Memory: 128 GB
GPUs: 2
Worker bin path: /usr/local/cryosparc/v3.2/cryosparc_worker/bin/cryosparcw
Hostname: <gpu-node>
Name: <gpu-node>
Cache path: /tmp
Cache quota (MB): (not set)
SSH String: <caseID>@<gpu-node>
Cache Reserve (MB): 10000
Testing:
For testing, follow the instructions at the Data Processing Introductory Tutorial - https://guide.cryosparc.com/processing-data/cryo-em-data-processing-in-cryosparc-introductory-tutorial
Once the job is completed, stop the CryoSPARC process:
<path to cryosparc_master>/bin/cryosparcm stop
After Installation and Testing
When you request a GPU node again later, you may get a different node, so you need to update the master node name in the "config.sh" file (located in the cryosparc_master directory) accordingly:
export CRYOSPARC_MASTER_HOSTNAME="<gpu-node>"
Also, the default TCP port (39000) may already be in use on that server. Check the section "Verify TCP Ports" above and set an unused port in the "config.sh" file:
export CRYOSPARC_BASE_PORT=<port>
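As a convenience, both settings can be updated non-interactively with sed; this is a sketch under the assumption that config.sh contains one export line for each variable, and 39010 is just an example port:
cfg=<path-to-cryosparc_master>/config.sh
sed -i "s/^export CRYOSPARC_MASTER_HOSTNAME=.*/export CRYOSPARC_MASTER_HOSTNAME=\"$(hostname)\"/" "$cfg"
sed -i "s/^export CRYOSPARC_BASE_PORT=.*/export CRYOSPARC_BASE_PORT=39010/" "$cfg"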
Start the CryoSPARC process:
<path to cryosparc_master>/bin/cryosparcm start
Follow the procedure above, starting from running the worker command "cryosparcw" from the new worker GPU node to include it in a lane.
Troubleshooting
Please refer to the CryoSPARC FAQ or contact CryoSPARC support [1] if you encounter issues with CryoSPARC. Some frequently encountered issues are included below:
If you accidentally leave the node while CryoSPARC is running, you will need to remove the *.sock files in /tmp before starting the CryoSPARC process again. You may also need to kill any other remaining processes that may be interfering, using "kill <PID>" after finding them with the commands below:
ps -ax | grep "supervisord"
ps -ax | grep "cryosparc2_command"
ps -ax | grep "mongod"
Database Failure: Kill the mongod processes, delete the lock file, and restart cryoSPARC:
ps -ax | grep "mongod"
kill <process_pid>
# delete the .lock file at <cryosparc-install-dir>/cryosparc_database
cryosparcm start
Database Spawn Error: To get around this, change the name of the CryoSPARC database directory in the install folder. Then start cryoSPARC and immediately stop it. You can then merge the renamed database back into the newly spawned database. CryoSPARC should then run normally and the database information should be preserved. It is a good idea to keep the backup database until you confirm everything is running smoothly and the projects are in place.
stop cryosparc
copy the database folder to a backup:
cp -rav cryosparc2_database cryosparc2_database_backup
remove the original folder:
rm -rf cryosparc2_database
start cryosparc --> this will recreate the database
stop cryosparc
sync the files from the backup into the new database:
rsync -av --delete-after --progress cryosparc2_database_backup/ cryosparc2_database/
start cryosparc again
References:
[1] CryoSPARC Help: https://discuss.cryosparc.com/ -> Login -> New Topic -> Submit
CryoSPARC Installation: https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/downloading-and-installing-cryosparc
CryoSPARC Guide: https://guide.cryosparc.com/
Command Line Guide: https://guide.cryosparc.com/setup-configuration-and-management/management-and-monitoring/cryosparcm
Data Processing Introductory Tutorial: https://guide.cryosparc.com/processing-data/cryo-em-data-processing-in-cryosparc-introductory-tutorial