Cluster/HPC configuration

In computer cluster or High-Performance Computing (HPC) environments, CONN can automatically distribute your processing/analyses across multiple nodes/CPUs. This can reduce processing time very significantly and, given enough available nodes/CPUs, it allows hundreds or thousands of subjects to be processed and analyzed in the time it would normally take to process just one or a few subjects. CONN automatically handles all of the complexities associated with dividing each processing/analysis step into multiple jobs, submitting and tracking each job, and merging their results, independently of the underlying architecture or job-scheduler technology used (e.g. a distributed cluster environment, or a single multi-processor machine).

To configure CONN in a distributed cluster, HPC, or multi-processor environment follow the steps described in the sections below.

To use your cluster computing resources from CONN's graphical interface, simply select the 'distributed processing' option when starting your analysis/processing steps. To access the same functionality from batch scripts, include a batch.Setup.parallel field in your batch structure and specify there the number of jobs and the desired parallelization profile, as in the sketch below.
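
For example, a minimal batch sketch might look like the following. The project filename and the parallel subfield names (N, profile) used here are assumptions for illustration only; see help conn_batch for the exact field names supported by your CONN version:

clear batch;
batch.filename = '/data/myproject/conn_project.mat';   % hypothetical CONN project file
batch.Setup.done = 1;                                  % run the Setup step on this project
batch.Setup.parallel.N = 8;                            % number of parallel jobs to submit
batch.Setup.parallel.profile = 'Slurm';                % one of the pre-defined profiles listed below
conn_batch(batch);                                     % submit the jobs using the selected profile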

Basic configuration settings in distributed cluster or multi-processor environments

Pre-defined cluster configuration options are available for the following schedulers/environments:

 Grid Engine : for Sun/Oracle Grid Engine, StarCluster, Open Grid Scheduler, or compatible
               e.g. Amazon EC2, NITRC-CE, Boston University SCC, UCLA Hoffman

 PBS/Torque  : for Portable Batch System, or compatible
               e.g. MIT Mindhive, MGH Launchpad, Yale Omega

 LSF         : for Platform Load Sharing Facility, or compatible
               e.g. HMS Orchestra, Yale Grace

 Slurm       : for Simple Linux Utility for Resource Management, or compatible
               e.g. MIT Openmind, NIH Biowulf, Berkeley Savio, Princeton Tiger, Stanford Sherlock

 Background  : for running multiple single-processor background jobs on your local machine
               e.g. any Mac or Linux multi-processor system (no cluster environment required)

After installing CONN (either the Matlab or the standalone release), configure your cluster/HPC settings using one of the following methods:

Method 1: using the GUI (recommended)

In CONN's GUI, select the Tools.Cluster/HPC Settings menu.

Then select one of the default configuration profiles (Grid Engine, PBS/Torque, LSF, Slurm, or Background) and click 'Test profile'. During the test, CONN will attempt to submit simple jobs using the specified configuration options, track their progress, and evaluate whether they finish correctly. This test may take up to a few minutes to complete. If you see a 'Test finished correctly' message, the configuration options are working correctly.

note: if you see a 'failed' message, select 'Advanced options' and then 'See log' to inspect the logs recorded by CONN during this test and identify the likely reason for the failure.

After successfully testing the cluster configuration settings, select this profile as your default and click 'Save' to keep these settings for future sessions and/or for other users.

Method 2: using Matlab or Linux commands (for text-only environments)

Type the following command to configure your system to use the pre-defined configuration options for Grid Engine clusters, and have those settings apply to all CONN users:

conn jobmanager setdefault 'Grid Engine' save all

note: if using CONN's Matlab release, type the above syntax in the Matlab command window; if using CONN's standalone release, type the same syntax in your Linux terminal command-line

The "save all" option in the command above will store configuration settings in your CONN installation folder (this requires write permissions into this folder). This configuration settings will then be available to all users that use CONN in your cluster. Using "save current" instead will store configuration settings in the current user home folder ~/ and these configuration settings will then be available only to this user. See help conn_jobmanager for additional options.

Advanced configuration settings in distributed cluster / HPC environments

Use the field 'in-line additional submit options' to define optional job specifications

In-line additional submit options can be used by system administrators to facilitate user-specific edits to a general-purpose parallelization profile. For example, the default command to submit a job in a 'Grid Engine' environment is:

qsub -N JOBLABEL -e STDERR -o STDOUT OPTS SCRIPT

The OPTS entry can then be defined separately through the cmd_submitoptions field (batch.parallel.cmd_submitoptions when using batch commands, or the 'in-line additional submit options' field when using the GUI). While this field is generally left empty, individual users may edit it to add their desired configuration options to the qsub command. For example, a user may enter in the cmd_submitoptions field the line

-l h_rt=24:00:00 -m ae

in order to request a 24-hour walltime and email notifications for their jobs.
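
With that setting, the command submitted by CONN would then take a form similar to the following (JOBLABEL, STDERR, STDOUT, and SCRIPT are placeholders filled in automatically by CONN):

qsub -N JOBLABEL -e STDERR -o STDOUT -l h_rt=24:00:00 -m ae SCRIPT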

In addition, administrators may also define the cmd_submitoptions field directly and add a '?' symbol at the end in order to request user input as part of the qsub command. For example, entering in the cmd_submitoptions field the line:

-q [queue name:myqueue] -A [account name:myaccount]?

will have CONN query users for their queue and account names before submitting jobs, and then automatically include the user responses as part of the qsub command.
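
For illustration, if a user answered 'long' and 'brainlab' to those two prompts (hypothetical queue and account names), the submitted command would look similar to:

qsub -N JOBLABEL -e STDERR -o STDOUT -q long -A brainlab SCRIPT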

Use the field 'in-file additional submit options' for additional configuration and/or initialization steps

An example script submitted by CONN (Matlab release) to your job scheduler, and then run by an individual node/CPU, may look like this:

#!/bin/bash
/usr/local/apps/matlab-2013a/bin/matlab -nodesktop -nodisplay -nosplash -singleCompThread -logfile '/projectnb/busplab/connectomedb/conn_vol/conn_hcp.qlog/161013132422408/node.0001161013132422408.stdlog' -r "addpath /project/busplab/software/spm12; addpath /project/busplab/software/conn; cd /projectnb/busplab/connectomedb/conn_vol/conn_hcp.qlog/161013132422408; conn_jobmanager('rexec','/projectnb/busplab/connectomedb/conn_vol/conn_hcp.qlog/161013132422408/node.0001161013132422408.mat'); exit"
echo _NODE END_

while an example script submitted by CONN (standalone release) may look like this:

#!/bin/bash
/share/pkg/conn_standalone/R2017a/run_conn.sh /share/pkg/mcr/9.2/install/v92 jobmanager rexec '/projectnb/busplab/connectomedb/conn_vol/conn_hcp.qlog/170605215826702/node.0001170605215826702.mat'
echo _NODE END_

Any commands defined in the 'in-file additional submit options' field will be automatically added to these scripts right after the #!/bin/bash line (and before the line invoking Matlab or the standalone CONN executable). This can be used for a variety of purposes. Some job schedulers can automatically interpret configuration settings from these initial lines: for example, adding the lines

#$ -m ae 
#$ -M myname@gmail.com 

may be used in a Grid Engine environment to request that the job scheduler send an email when jobs are aborted or end normally.

In addition, you may also add your own initialization commands in these lines, such as:

module load mcr/9.2
module load conn_standalone/R2017a

to have individual cluster nodes load the MCR and CONN modules before executing the requested CONN jobs.
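
Combining both uses, the beginning of a standalone-release script submitted with these in-file additional submit options might then look similar to the following sketch (paths are simply the examples shown above, and the node-specific .mat filename is abbreviated here):

#!/bin/bash
#$ -m ae
#$ -M myname@gmail.com
module load mcr/9.2
module load conn_standalone/R2017a
/share/pkg/conn_standalone/R2017a/run_conn.sh /share/pkg/mcr/9.2/install/v92 jobmanager rexec '...'
echo _NODE END_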

Use user-specific cluster-configuration settings for added flexibility

User-specific configuration settings (those saved to a user's home folder, e.g. using the "save current" option) take precedence over global configuration settings (those saved to the CONN installation folder, e.g. using the "save all" option). Administrators may define global cluster-configuration settings that users can then fine-tune to their specific requirements (e.g. a user may change a generic "-A [account]?" value to "-A brainlab" and save the new settings using the "save current" option in order to stop CONN from asking for an account name when submitting jobs).

Mixed Matlab/standalone environments

If both the Matlab and the standalone CONN releases are installed on your system, by default CONN will have individual cluster nodes use the same release as the job-submitting node. For example, preprocessing your data using parallelization options from CONN's standalone release will have all involved distributed cluster nodes use the same standalone release, while running the same procedure from the Matlab release will have all nodes use the Matlab release. If you prefer submitted jobs to always use the standalone release, independently of the release used to start the parallelization procedure (e.g. to avoid depleting available Matlab licenses while still allowing users to run CONN's Matlab release interactively), simply check the 'nodes use pre-compiled CONN only' field when defining your cluster configuration settings.

www.conn-toolbox.org

For any questions about CONN, visit the CONN toolbox public forum.