CryoSPARC
What is CryoSPARC?
CryoSPARC is a state-of-the-art scientific software platform for cryo-electron microscopy (cryo-EM), used in research and drug-discovery pipelines. More information about CryoSPARC can be found on their website.
Building CryoSPARC on the Cluster
CryoSPARC must be built and run on GPU nodes, since the GPU is its primary computational device. It also relies heavily on local SSD storage for caching data during processing.
Requesting GPU node and Environment
Request a GPU node with the appropriate partition or feature (see the HPC resource view for GPU resources and GPU cards with SSD drives). Use /tmp ($TMPDIR) as scratch space to take advantage of the SSDs. This node will be your CryoSPARC master node. Request all GPU cards available on that node (see the HPC resource view). Here, 16 GB of memory is requested; adjust the memory value to match your job's requirements.
srun -p gpu -C gpu2080 --gres=gpu:2 --x11 --mem=16gb --pty bash # using partition gpu and feature gpu2080
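To see which GPU nodes, cards, and features are currently available before requesting one, a Slurm query such as the one below can help (the partition name "gpu" is taken from the srun example above; the columns printed are nodelist, GPU resources, and features):
sinfo -p gpu -o "%N %G %f" # list nodes, GRES, and features in the gpu partition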
It is recommended to install CryoSPARC in a project location (/mnt/pan or /mnt/vstor) rather than your /home directory. Create a CryoSPARC version directory in that location (e.g. /mnt/pan/cryoem/<user>/cryosparc/<cryosparc-version>) and change directory (cd) to it.
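For example (the path below is illustrative; substitute your own user and CryoSPARC version):
mkdir -p /mnt/pan/cryoem/$USER/cryosparc/<cryosparc-version>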
cd <path-to-cryosparc-version-dir>
Create "environment.sh" file with the following content and update the field accordingly:
#!/bin/bash
install_path="<path-to-the-version-of-cryosparc>"
license_id="<license-ID>" # obtain from https://guide.cryosparc.com/licensing
worker_path="<path-to-the-version-of-cryosparc>/cryosparc_worker"
cuda_path=/usr/local/cuda
ssd_path=/tmp # use an appropriate SSD path that you have access to
user_email="<first-name>.<last-name>@case.edu"
user_name="<caseID>" # your actual caseID
user_password="<password>"
user_firstname="<first-name>"
user_lastname="<last-name>"
export LICENSE_ID=$license_id
Source the environment:
source ./environment.sh
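As a quick sanity check that the variables were picked up by your shell, something like the following can be run (the echo lines are only an illustration; the license ID is truncated so it is not printed in full):
echo "license: ${LICENSE_ID:0:8}..." # first 8 characters only
echo "worker path: $worker_path"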
Standalone Installation with CryoSPARC Master and Worker
Follow the "standalone" installation instructions using the "Single Workstation (Master and Worker Combined)" tab - https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/downloading-and-installing-cryosparc
Verify TCP Ports
Check whether TCP ports in the 39000 range are already in use (source: CryoSPARC forum):
netstat -tuplen | grep :3900
output:
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:39001 0.0.0.0:* LISTEN 533150 33147754 7185/mongod
tcp 0 0 0.0.0.0:39002 0.0.0.0:* LISTEN 533150 32909929 7273/python
tcp 0 0 0.0.0.0:39003 0.0.0.0:* LISTEN 533150 33170486 7325/python
tcp 0 0 0.0.0.0:39005 0.0.0.0:* LISTEN 533150 32909946 7330/python
tcp 0 0 0.0.0.0:39006 0.0.0.0:* LISTEN 533150 33170504 7414/node
tcp 0 0 0.0.0.0:39007 0.0.0.0:* LISTEN 533150 33171485 7430/node
tcp 0 0 0.0.0.0:39000 0.0.0.0:* LISTEN 533150 32909961 7405/node
In general, try NOT to use ports under 1024. For your purposes, if your cryoSPARC instance A was hosted at port 39000 (with the installation flag "--port <port_number>") as shown in the output above, then, since cryoSPARC only requires a 10-port range, you can safely instantiate your cryoSPARC instance B on port 39010.
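A minimal sketch for locating a free base port, assuming instances are spaced 10 ports apart starting at 39000 (it only checks the base port of each block; a stricter check would test all ten ports):
for port in $(seq 39000 10 39090); do
  if ! netstat -tln | grep -q ":${port} "; then
    echo "base port ${port} appears free"
    break
  fi
done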
CryoSPARC Installation
Download Tarball files:
curl -L https://get.cryosparc.com/download/master-latest/$LICENSE_ID -o cryosparc_master.tar.gz
curl -L https://get.cryosparc.com/download/worker-latest/$LICENSE_ID -o cryosparc_worker.tar.gz
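Before extracting, it is worth confirming that both downloads completed; a failed download (for example, from a mistyped license ID) typically shows up as a file of only a few bytes:
ls -lh cryosparc_master.tar.gz cryosparc_worker.tar.gz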
Extract the tar files (this may take several minutes):
tar -xvf cryosparc_master.tar.gz cryosparc_master
tar -xvf cryosparc_worker.tar.gz cryosparc_worker
Change directory to cryosparc_master:
cd cryosparc_master/
Install the cryosparc_master package. (Note: use a different port with the flag "--port <port_number>" if the default port 39000 is already in use.)
./install.sh --standalone --license $LICENSE_ID --worker_path $worker_path --cudapath $cuda_path --ssdpath $ssd_path --initial_email $user_email --initial_password $user_password --initial_username $user_name --initial_firstname $user_firstname --initial_lastname $user_lastname
Change directory to cryosparc_worker:
cd ../cryosparc_worker/
Install cryosparc_worker package:
./install.sh --license $LICENSE_ID --cudapath $cuda_path
Check the Installation Status:
<path to cryosparc_master>/bin/cryosparcm status
output:
----------------------------------------------------------------------------
CryoSPARC System master node installed at
/usr/local/cryosparc/v3.2/cryosparc_master
Current cryoSPARC version: v3.2.0
----------------------------------------------------------------------------
CryoSPARC process status:
app RUNNING pid 9227, uptime 2:14:05
app_dev STOPPED Not started
command_core RUNNING pid 9089, uptime 2:14:25
command_rtp RUNNING pid 9147, uptime 2:14:15
command_vis RUNNING pid 9138, uptime 2:14:16
database RUNNING pid 9004, uptime 2:14:28
liveapp RUNNING pid 9258, uptime 2:14:04
liveapp_dev STOPPED Not started
webapp RUNNING pid 9210, uptime 2:14:07
webapp_dev STOPPED Not started
----------------------------------------------------------------------------
global config variables:
export CRYOSPARC_LICENSE_ID="xxxxx"
export CRYOSPARC_MASTER_HOSTNAME="hpc2"
export CRYOSPARC_DB_PATH="/usr/local/cryosparc/v3.2/cryosparc_database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false
export CRYOSPARC_CLICK_WRAP=true
Run the worker command "cryosparcw" from the worker GPU node to include it in a lane. This needs to be done once for each node (if the worker node has not been used previously):
<path-to-cryosparc_worker>/bin/cryosparcw connect --worker <present-node> --master <present-node> --ssdpath /tmp --lane <name-of-the-lane> --newlane
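For example, on a standalone installation where master and worker are the same node, something like the following should work (the lane name "gpu2080" is illustrative; run this from the cryosparc_worker directory):
./bin/cryosparcw connect --worker $(hostname) --master $(hostname) --ssdpath /tmp --lane gpu2080 --newlane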
Check the status of GPUs in one of the worker nodes:
<path-to-cryosparc_worker>/bin/cryosparcw gpulist
output:
Detected 2 CUDA devices.
id pci-bus name
---------------------------------------------------------------
0 0000:02:00.0 GeForce RTX 2080 Ti
1 0000:81:00.0 GeForce RTX 2080 Ti
---------------------------------------------------------------
Display the GPU node(s) in the lane:
<path-to-cryosparc_master>/bin/cryosparcm cli "get_worker_nodes()"
output:
[{'cache_path': '/tmp', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11554717696, 'name': 'GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11554717696, 'name': 'GeForce RTX 2080 Ti'}], 'hostname': '<gpu-node>', 'lane': 'gpu2080', 'monitor_port': None, 'name': '<gpu-node>', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], 'GPU': [0, 1], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': '<caseID>@<gpu-node>', 'title': 'Worker node <gpu-node>', 'type': 'node', 'worker_bin_path': '/usr/local/cryosparc/v3.2/cryosparc_worker/bin/cryosparcw'},
Running CryoSPARC from the Portal
Open the Firefox browser from the compute node, or use SSH tunneling. SSH tunneling is generally more reliable and responsive than running Firefox on the HPC.
SSH Tunnel to CryoSPARC:
ssh -N -L 39000:<ip-address>:39000 <caseID>@pioneer.case.edu
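To find <ip-address>, run the following on the GPU node before setting up the tunnel (on some systems "hostname -I", with a capital i, is needed instead):
hostname -i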
Open a web browser from your PC and navigate to http://localhost:39000
OR
Firefox
From the compute node, open the Firefox browser:
firefox &
Access the CryoSPARC portal from the browser by navigating to http://localhost:39000/ and logging in with the credentials you set in the "environment.sh" file.
If you are accessing it for the first time, accept the terms.
At the bottom, click on Resource Manager and find the "Instance Information" tab:
gpu2080
Target 1: <gpu-node> (NODE)
Cores: 20
Memory: 128 GB
GPUs: 2
Worker bin path: /usr/local/cryosparc/v3.2/cryosparc_worker/bin/cryosparcw
Hostname: <gpu-node>
Name: <gpu-node>
Cache path: /tmp
Cache quota (MB): (not set)
SSH String: <caseID>@<gpu-node>
Cache Reserve (MB): 10000
Testing:
For testing, follow the instructions at the Data Processing Introductory Tutorial - https://guide.cryosparc.com/processing-data/cryo-em-data-processing-in-cryosparc-introductory-tutorial
Once the job is completed, stop the CryoSPARC process:
<path to cryosparc_master>/bin/cryosparcm stop
After Installation and Testing
When you request a GPU node again later, you may get a different node, so you need to update the master node name in the "config.sh" file (located in the cryosparc_master directory) accordingly:
export CRYOSPARC_MASTER_HOSTNAME="<gpu-node>"
Also, the default TCP port (39000) may already be in use on that server. Check the section "Verify TCP Ports" above and set an unused port in the "config.sh" file:
export CRYOSPARC_BASE_PORT=<port>
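As a convenience, both settings can be updated non-interactively with sed; this is a sketch under the assumption that config.sh contains one export line for each variable, and 39010 is just an example port:
cfg=<path-to-cryosparc_master>/config.sh
sed -i "s/^export CRYOSPARC_MASTER_HOSTNAME=.*/export CRYOSPARC_MASTER_HOSTNAME=\"$(hostname)\"/" "$cfg"
sed -i "s/^export CRYOSPARC_BASE_PORT=.*/export CRYOSPARC_BASE_PORT=39010/" "$cfg"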
Start the CryoSPARC process:
<path to cryosparc_master>/bin/cryosparcm start
Follow the procedure above, starting from running the worker command "cryosparcw" from the new worker GPU node to include it in a lane.
Troubleshooting
Please refer to the CryoSPARC FAQ or contact CryoSPARC support [1] if you encounter issues with CryoSPARC. Some frequently encountered issues are included below:
If you accidentally leave the node while CryoSPARC is running, you will need to remove the *.sock files in /tmp before starting the CryoSPARC process again. You may also need to kill any other remaining processes that may be interfering, using "kill <PID>" after finding them with the commands below:
ps -ax | grep "supervisord"
ps -ax | grep "cryosparc2_command"
ps -ax | grep "mongod"
Database Failure: Kill the mongod processes, delete the lock file, and restart cryoSPARC:
ps -ax | grep "mongod"
kill <process_pid>
# delete the .lock file at <cryosparc-install-dir>/cryosparc_database
cryosparcm start
Database Spawn Error: To get around this, change the name of the CryoSPARC database directory in the install folder. Then start cryoSPARC and immediately stop it. You can then merge the renamed database back into the newly spawned database. CryoSPARC should then run normally and the database information should be preserved. It is a good idea to keep the backup database until you confirm everything is running smoothly and the projects are in place.
stop cryosparc
copy the database folder to a backup:
cp -rav cryosparc2_database cryosparc2_database_backup
remove the original folder:
rm -rf cryosparc2_database
start cryosparc --> this will recreate the database
stop cryosparc
sync the files from the backup into the new database:
rsync -av --delete-after --progress cryosparc2_database_backup/ cryosparc2_database/
start cryosparc again
References:
[1] CryoSPARC Help: https://discuss.cryosparc.com/ -> Login -> New Topic -> Submit
CryoSPARC Installation: https://guide.cryosparc.com/setup-configuration-and-management/how-to-download-install-and-configure/downloading-and-installing-cryosparc
CryoSPARC Guide: https://guide.cryosparc.com/
Command Line Guide: https://guide.cryosparc.com/setup-configuration-and-management/management-and-monitoring/cryosparcm
Data Processing Introductory Tutorial: https://guide.cryosparc.com/processing-data/cryo-em-data-processing-in-cryosparc-introductory-tutorial