Independent R Tasks Example

If your jobs can run independently on different nodes, the example below may give you a fast start.

Here we will implement:

* multiple independent jobs on different nodes
* parallel execution within each node

Slurm provides a job array mechanism for exactly this: a single submission launches many copies of the same job, and each copy can identify itself through the SLURM_ARRAY_TASK_ID environment variable.

* submit.sbatch

#!/bin/bash

#SBATCH --job-name=my-job
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=5GB
#SBATCH --time=24:00:00
#SBATCH --array=1-8

# env. variable SLURM_ARRAY_TASK_ID will be available to each task

module purge
module load r/gcc/4.1.0

R --no-save -q -f main.R

This will submit an array of 8 jobs; each runs on 1 node and uses 4 CPUs.
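To check on the array while it runs, the usual Slurm commands apply; each task shows up under the parent job ID with a task suffix:

# list your pending and running jobs; array tasks appear as e.g. 123456_1, 123456_2, ...
squeue -u $USER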

* main.R

The example is based on https://www.glennklockwood.com/data-intensive/r/foreach-parallelism.html

data <- read.csv('dataset.csv')

library(future)
n_cores <- availableCores()

library(foreach)
library(doMC)
registerDoMC(n_cores)

# each array task gets its own ID; use it to seed the RNG so the task is reproducible
taskId <- Sys.getenv("SLURM_ARRAY_TASK_ID")

results <- foreach( i = c(25,25,25,25) ) %dopar% {
    set.seed(as.numeric(taskId))
    kmeans( x=data, centers=4, nstart=i )
}

# kmeans objects do not fit a flat CSV, so save the list as an R object
outDir <- paste0("output_", taskId)
dir.create(outDir, showWarnings = FALSE)
saveRDS(results, file = file.path(outDir, "results.rds"))
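Once all 8 array tasks finish, their outputs can be combined; a minimal sketch, assuming the output_<id>/results.rds layout produced by main.R above:

# hypothetical post-processing, run in the job directory after the array completes
all_results <- lapply(1:8, function(id) {
    readRDS(file.path(paste0("output_", id), "results.rds"))
})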

Let us assume your job files are inside a directory named R-test. We will use the following script, which copies the whole project directory and then submits an sbatch job from the copy:

* bash script submitting multiple jobs (say 'run_experiment.sh')

#!/bin/bash

###########################################################################
# * create file run_experiment.sh in the directory containing the
#   R-test directory
# * make it executable:
#     chmod +x run_experiment.sh
# * run the desired experiment number:
#     ./run_experiment.sh 2
###########################################################################

# experiment number
experiment=$1

if [ -z "$experiment" ]; then
    echo "Please provide the number of the experiment after the command call"
    exit 1
fi

# check if directory already exists
if [ -d "R-test-$experiment" ]; then
    echo "Directory R-test-$experiment already exists. Exiting"
    exit 1
fi

cp -r R-test "R-test-$experiment"
cd "R-test-$experiment" || exit 1

# pass the experiment number through to the batch script
sbatch submit.sbatch "$experiment"
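Since every invocation works in its own copy of R-test, several experiments can be launched back to back; for example:

# launch experiments 1 through 5 (an arbitrary range, chosen for illustration)
for n in 1 2 3 4 5; do
    ./run_experiment.sh "$n"
done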

* batch file ('submit.sbatch') with parameter

#!/bin/bash

#SBATCH --job-name=my_job
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=5GB
#SBATCH --time=24:00:00

module purge
module load r/gnu/3.5.1

# forward the experiment number (the batch script's first argument) to R;
# everything after --args is visible in R via commandArgs()
R --no-save -q -f main.R --args $1
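Anything placed after the script name on the sbatch command line becomes a positional parameter ($1, $2, ...) inside the batch script, which is how the experiment number reaches the R command above:

sbatch submit.sbatch 2    # inside submit.sbatch, $1 is now "2"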

* R script ('main.R') accepting the parameter

The above scripts were based on this example.

Make sure your R script uses relative paths when you use this approach, so that each copied experiment directory reads and writes its own files.

(You actually want to use relative paths all the time; they make your projects portable, reproducible, and easier to manage with R projects, renv, GitHub/Bitbucket, etc.)

data <- read.csv('dataset.csv')

library(future)
n_cores <- availableCores()

library(foreach)
library(doMC)
registerDoMC(n_cores)

# the experiment number arrives after --args on the R command line
args <- commandArgs(trailingOnly = TRUE)
experiment <- as.numeric(args[1])

results <- foreach( i = c(25,25,25,25) ) %dopar% {
    set.seed(experiment)
    kmeans( x=data, centers=4, nstart=i )
}

# kmeans objects do not fit a flat CSV, so save the list as an R object;
# the relative path keeps the output inside the experiment directory
saveRDS(results, file = "results.rds")
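After several experiments have finished, their outputs can be collected from the copied directories; a minimal sketch, assuming experiments 1 through 5 and the results.rds filename used above:

# hypothetical: run from the directory that contains R-test-1, R-test-2, ...
per_experiment <- lapply(1:5, function(n) {
    readRDS(file.path(paste0("R-test-", n), "results.rds"))
})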