Independent R Tasks Example

If your jobs can run independently on different nodes, the example below may give you a fast start.

Here we will implement:

* multiple independent jobs on different nodes
* parallel execution within each node

Slurm provides a job array mechanism for exactly this: a single submission launches many copies of the same job, and each copy can identify itself through the SLURM_ARRAY_TASK_ID environment variable.

* submit.sbatch

#!/bin/bash

#SBATCH --job-name=my-job
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=5GB
#SBATCH --time=24:00:00
#SBATCH --array=1-8

# env. variable SLURM_ARRAY_TASK_ID will be available to each task

module purge
module load r/gcc/4.1.0

R --no-save -q -f main.R

This will submit an array of 8 jobs; each runs on 1 node and uses 4 CPUs.
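To check on the array while it runs, the usual Slurm commands apply; each task shows up under the parent job ID with a task suffix:

# list your pending and running jobs; array tasks appear as e.g. 123456_1, 123456_2, ...
squeue -u $USER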

* main.R

The example is based on https://www.glennklockwood.com/data-intensive/r/foreach-parallelism.html

data <- read.csv('dataset.csv')

library(future)
n_cores <- availableCores()

library(foreach)
library(doMC)
registerDoMC(n_cores)

# each array task gets its own ID; use it to seed the RNG so the task is reproducible
taskId <- Sys.getenv("SLURM_ARRAY_TASK_ID")

results <- foreach( i = c(25,25,25,25) ) %dopar% {
    set.seed(as.numeric(taskId))
    kmeans( x=data, centers=4, nstart=i )
}

# kmeans objects do not fit a flat CSV, so save the list as an R object
outDir <- paste0("output_", taskId)
dir.create(outDir, showWarnings = FALSE)
saveRDS(results, file = file.path(outDir, "results.rds"))
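Once all 8 array tasks finish, their outputs can be combined; a minimal sketch, assuming the output_<id>/results.rds layout produced by main.R above:

# hypothetical post-processing, run in the job directory after the array completes
all_results <- lapply(1:8, function(id) {
    readRDS(file.path(paste0("output_", id), "results.rds"))
})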

Let us assume your job files are inside a directory named R-test. We will use the following script, which copies the whole project directory and then submits an sbatch job from the copy:

* bash script submitting multiple jobs (say 'run_experiment.sh')

#!/bin/bash

###########################################################################
# * create file run_experiment.sh in the directory containing the
#   R-test directory
# * make it executable:
#     chmod +x run_experiment.sh
# * run the desired experiment number:
#     ./run_experiment.sh 2
###########################################################################

# experiment number
experiment=$1

if [ -z "$experiment" ]; then
    echo "Please provide the number of the experiment after the command call"
    exit 1
fi

# check if directory already exists
if [ -d "R-test-$experiment" ]; then
    echo "Directory R-test-$experiment already exists. Exiting"
    exit 1
fi

cp -r R-test "R-test-$experiment"
cd "R-test-$experiment" || exit 1

# pass the experiment number through to the batch script
sbatch submit.sbatch "$experiment"
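Since every invocation works in its own copy of R-test, several experiments can be launched back to back; for example:

# launch experiments 1 through 5 (an arbitrary range, chosen for illustration)
for n in 1 2 3 4 5; do
    ./run_experiment.sh "$n"
done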

* batch file ('submit.sbatch') with parameter

#!/bin/bash

#SBATCH --job-name=my_job
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=5GB
#SBATCH --time=24:00:00

module purge
module load r/gnu/3.5.1

# forward the experiment number (the batch script's first argument) to R;
# everything after --args is visible in R via commandArgs()
R --no-save -q -f main.R --args $1
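Anything placed after the script name on the sbatch command line becomes a positional parameter ($1, $2, ...) inside the batch script, which is how the experiment number reaches the R command above:

sbatch submit.sbatch 2    # inside submit.sbatch, $1 is now "2"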

* R script ('main.R') accepting the parameter

The above scripts were based on this example.

Make sure your R script uses relative paths when you use this approach, so that each copied experiment directory reads and writes its own files.

(You actually want to use relative paths all the time; they make your projects portable, reproducible, and easier to manage with R projects, renv, GitHub/Bitbucket, etc.)

data <- read.csv('dataset.csv')

library(future)
n_cores <- availableCores()

library(foreach)
library(doMC)
registerDoMC(n_cores)

# the experiment number arrives after --args on the R command line
args <- commandArgs(trailingOnly = TRUE)
experiment <- as.numeric(args[1])

results <- foreach( i = c(25,25,25,25) ) %dopar% {
    set.seed(experiment)
    kmeans( x=data, centers=4, nstart=i )
}

# kmeans objects do not fit a flat CSV, so save the list as an R object;
# the relative path keeps the output inside the experiment directory
saveRDS(results, file = "results.rds")
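After several experiments have finished, their outputs can be collected from the copied directories; a minimal sketch, assuming experiments 1 through 5 and the results.rds filename used above:

# hypothetical: run from the directory that contains R-test-1, R-test-2, ...
per_experiment <- lapply(1:5, function(n) {
    readRDS(file.path(paste0("R-test-", n), "results.rds"))
})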