Independent R Tasks Example
If your jobs can run independently on different nodes, the example below may give you a quick start.
Here we will implement:
* Multiple independent jobs on different nodes
* Parallel execution within each node
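Slurm provides a job array mechanism for this: one sbatch call submits several tasks that share the same script, and each task receives its own index.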
* submit.sbatch
    #!/bin/bash
    #SBATCH --job-name=my-job
    #SBATCH --nodes=1
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=5GB
    #SBATCH --time=24:00:00
    #SBATCH --array=1-8
    # env. variable SLURM_ARRAY_TASK_ID will be available
    module purge
    module load r/gcc/4.1.0
    R --no-save -q -f main.R
This will submit 8 jobs, each using 4 CPUs on 1 node.
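Inside each array task, Slurm sets SLURM_ARRAY_TASK_ID to one value from the --array range (1 to 8 here). The following is a minimal sketch of how a task might use that value to pick its own input. The helper name input_for_task and the dataset_<id>.csv naming scheme are hypothetical and not part of the original example.

```shell
#!/bin/bash
# Hypothetical helper: map an array task ID to a per-task input file name.
# The dataset_<id>.csv naming scheme is an assumption for illustration only.
input_for_task() {
    local task_id="$1"
    printf 'dataset_%s.csv\n' "$task_id"
}

# Slurm sets SLURM_ARRAY_TASK_ID inside each array task. Default to 1 so the
# snippet can also be tried outside of a job.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
input_file="$(input_for_task "$task_id")"
echo "Task ${task_id} would process ${input_file}"
```

An R script can read the same variable with Sys.getenv("SLURM_ARRAY_TASK_ID"), which returns it as a character string.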
* main.R
Example based on https://www.glennklockwood.com/data-intensive/r/foreach-parallelism.html
We will run multiple independent jobs on different nodes, with parallel execution within each node.
Some of our users need an independent directory structure for every job, so this example sets one up.
Assume your job files are inside the directory R-test. The following script copies the whole project directory and then submits an sbatch job from inside the copy:
* bash script submitting multiple jobs (say 'run_experiment.sh')
    #!/bin/bash
    ##########################################################################
    # * create file run_experiment.sh in directory containing R-test directory
    # * make it executable:
    #     chmod +x run_experiment.sh
    # * run desired job number
    #     ./run_experiment.sh 2
    ##########################################################################

    # experiment number
    experiment=$1
    if [ -z "$experiment" ]; then
        echo "Please provide number of the experiment after the command call"
        exit 1  # non-zero status signals the error to the caller
    fi

    ## check if directory already exists
    if [ -d "R-test-$experiment" ]; then
        echo "Directory R-test-$experiment already exists. Exiting"
        exit 1
    fi

    # copy the project, then submit the job from inside the copy;
    # stop if either step fails so the job is not submitted from the wrong directory
    cp -r R-test "R-test-$experiment" || exit 1
    cd "R-test-$experiment" || exit 1
    sbatch submit.sbatch
* batch file ('submit.sbatch') with parameter
    #!/bin/bash
    #SBATCH --job-name=my_job
    #SBATCH --nodes=1
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=5GB
    #SBATCH --time=24:00:00
    module purge
    module load r/gnu/3.5.1
    R --no-save -q -f main.R
* R script accepting parameter
The above scripts were based on this example.
Make sure your R script uses relative paths when you use this approach, because each job runs from its own copy of the project directory.
(You actually want to use relative paths all the time, to make your projects movable, reproducible, and easier to manage with R projects, renv, GitHub/Bitbucket, etc.)
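    # read the input data (relative path, resolved against the job's working directory)
    data <- read.csv('dataset.csv')

    # number of cores available to this job
    library(future)
    n_cores <- availableCores()

    # register a multicore backend for foreach
    library(foreach)
    library(doMC)
    registerDoMC(n_cores)

    # run 4 independent k-means fits in parallel, 25 random starts each
    results <- foreach( i = c(25,25,25,25) ) %dopar% {
        kmeans( x=data, centers=4, nstart=i )
    }

    # results is a list of kmeans objects, so store it in R's binary format
    save(results, file = "results.RData")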
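Since every experiment runs from a copied directory, an absolute path left in an R script would silently point back to the original location. One rough way to catch this before submitting is to search the project for quoted strings that start with "/" or "~". This is only a heuristic sketch: the function name find_absolute_paths and the search pattern are assumptions, and the pattern can report false positives (for example a "/" used as a separator string).

```shell
#!/bin/bash
# Heuristic check: list lines in R scripts that contain a quoted string
# starting with "/" or "~", which usually indicates an absolute path.
# The function name and the pattern are illustrative assumptions.
find_absolute_paths() {
    local project_dir="${1:-.}"
    grep -rnE --include='*.R' "[\"'](/|~)" "$project_dir"
}

# Example: scan the R-test project directory if it exists here.
# grep exits with a non-zero status when nothing matches.
if [ -d R-test ]; then
    find_absolute_paths R-test || echo "No absolute paths found in R-test"
fi
```

A line such as read.csv('dataset.csv') is not reported, while read.csv('/home/user/dataset.csv') is.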