R is a language and environment for statistical computing and graphics. It is similar to the S language and environment developed by John Chambers and colleagues at Bell Labs. Much code written for S runs under R. R provides a variety of statistical and graphical techniques, and is highly extensible.
Users can install specific R packages in their home directory if they are not available in the installed versions of R. Follow the instructions at Software Installation Guide - R Packages. If you encounter any issues, contact hpc-supportATcase.edu.
To view all available versions of R, issue the following command. The same approach works for other applications.
module spider R
output:
---------------------------------------------------------------------
R:
---------------------------------------------------------------------
Description:
R is a free software environment for statistical computing and graphics.
Versions:
R/4.0.3-foss-2020b
R/4.1.0-foss-2021a
R/4.1.2-foss-2021b
R/4.1.3-foss-2021a
R/4.2.0-foss-2021b
R/4.2.1-foss-2021a
R/4.2.1-foss-2021b
R/4.2.1-foss-2022a
R/4.2.2-foss-2022a
R/4.2.2-foss-2022b
---------------------------------------------------------------
As usual, for detailed information about a specific "R" version (including how to load the module), pass the module's full name to module spider, for example: module spider R/4.2.1-foss-2022a
Load the desired R version directly. If you are in an ongoing session, consider first purging the modules that are already loaded:
module purge # optional
module load R/4.2.1-foss-2022a
To use R interactively for a serial job, first request interactive use of a processor on a compute node:
srun --x11 -N 1 -c 1 --time=1:00:00 --pty /bin/bash
and wait until you are connected to a shell on a compute node. (The default length of an interactive session is ten hours.) Then load the R module and, if you have installed custom packages for this version of R, set the path to them (substitute your own directory):
$ module load R/4.2.1-foss-2022a
$ export R_LIBS_USER=/home/mrd20/.local/pioneer/R/4.2.1
and then invoke R using the shell command "R":
[<caseID>@<computeNode>]$ R
R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
Type 'q()' to quit R.
> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
> x
[1] 10.4 5.6 3.1 6.4 21.7
> 1/x
[1] 0.09615385 0.17857143 0.32258065 0.15625000 0.04608295
> q()
Save workspace image? [y/n/c]: n
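The R_LIBS_USER export shown before starting R points at one user's home directory. A minimal sketch of building a per-version path under your own home directory follows. The directory layout and the R_VERSION variable are assumptions for illustration, not a site requirement. R only adds R_LIBS_USER directories that already exist to its library search path, so the sketch creates the directory first.

```shell
# Hypothetical values: adjust to the R version you load and the location you prefer.
R_VERSION="4.2.1"
R_LIBS_USER="$HOME/.local/R/$R_VERSION"   # assumed layout, not a site requirement

# Create the directory if needed; R skips R_LIBS_USER entries that do not exist.
mkdir -p "$R_LIBS_USER"
export R_LIBS_USER

echo "R_LIBS_USER=$R_LIBS_USER"
```

Run these lines after module load and before starting R, so that the session picks up the personal library.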
You can also list the packages available in each R library by calling library() with no arguments at the R prompt:
> library()
Packages in library ‘/home/mrd20/.local/pioneer/R/4.2.1’:
fontBitstreamVera Fonts with 'Bitstream Vera Fonts' License
fontLiberation Liberation Fonts
fontquiver Set of Installed Fonts
gfonts Offline 'Google' Fonts for 'Markdown' and 'Shiny'
Packages in library ‘/usr/local/easybuild_allnodes/software/R/4.2.1-foss-2022a/lib64/R/library’:
abc Tools for Approximate Bayesian Computation (ABC)
abc.data Data Only: Tools for Approximate Bayesian Computation (ABC)
abe Augmented Backward Elimination
abind Combine Multidimensional Arrays
...
To install packages that are not included with the system-installed R, please refer to Software Installation Guide - R Packages.
You can run an R script from within an R session using the source() function:
> source("R-script")
You can also run the script from the terminal by typing:
R CMD BATCH <scriptfile.r>
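As a rough illustration of batch mode, the sketch below writes a tiny R script and runs it with R CMD BATCH when R is on the PATH. The file names hello.R and hello.Rout and the temporary directory are hypothetical. The output file is named explicitly so there is no need to rely on the default output name.

```shell
# Write a tiny R script (hypothetical file name) into a temporary directory.
workdir="$(mktemp -d)"
cat > "$workdir/hello.R" <<'EOF'
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
print(mean(x))
EOF

# Run it in batch mode only if R is on the PATH (that is, an R module is loaded).
if command -v R >/dev/null 2>&1; then
    # --no-save / --no-restore keep the run independent of any saved workspace.
    (cd "$workdir" && R CMD BATCH --no-save --no-restore hello.R hello.Rout)
    cat "$workdir/hello.Rout"
else
    echo "R not found; load an R module first" >&2
fi
```

In batch mode nothing is printed to the terminal by R itself; the echoed commands and their results end up in the output file.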
For interactive use of R that requires multiple processors, modify the srun call to request them, for example 4 processors:
srun --x11 -N 1 -n 4 --time=1:00:00 --pty /bin/bash
After getting connected to the shell on the assigned compute node, load the needed modules; OpenMPI has been loaded by default.
module load R/4.2.1-foss-2022a
Then start R and load the 'parallel' package:
> library(parallel)
>
After exiting from R, remember to also exit from the shell on the compute node that was allocated using the "logout" or "exit" command.
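One way to keep the number of R workers in line with the allocation is to read it from the Slurm environment instead of hard-coding it. The sketch below is an assumption-laden example, not a site recipe: NWORKERS is a hypothetical variable name, and the R one-liner uses mclapply from the 'parallel' package, which forks processes on a single node. Whether every allocated CPU is usable from one R process depends on how the site configures task binding.

```shell
# SLURM_CPUS_PER_TASK is set when -c is used, SLURM_NTASKS when -n is used.
# Fall back to 1 when neither is set (for example, outside a Slurm job).
NWORKERS="${SLURM_CPUS_PER_TASK:-${SLURM_NTASKS:-1}}"
export NWORKERS   # hypothetical name, read by the R code below
echo "Using $NWORKERS worker(s)"

# Run a small parallel computation only if Rscript is available.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e 'n <- as.integer(Sys.getenv("NWORKERS", "1")); library(parallel); res <- mclapply(1:8, function(i) i^2, mc.cores = n); print(unlist(res))'
else
    echo "Rscript not found; load an R module first" >&2
fi
```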
To run an R batch job, prepare a SLURM script that contains the command to load the R module. A simple single-processor example is as follows:
#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -c 1
#SBATCH -o serial-R.out%j # capture jobid in output file name
# load the R module
module load R/4.2.1-foss-2022a
# copy files to scratch space for job execution
# (not strictly necessary unless data size is large)
cp R-example/* "$PFSDIR"
cd "$PFSDIR"
# Run R script
R CMD BATCH <scriptfile.r>
# copy all work from scratch-space back to submit directory
cp * "$SLURM_SUBMIT_DIR"
In the above sample script, note that <scriptfile.r> should be replaced by the name of your file containing the R commands to be carried out.
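Once the script is saved, it can be handed to the scheduler with sbatch. The sketch below assumes the script was saved as serial-R.slurm, a hypothetical file name, and does nothing if sbatch or the file is missing.

```shell
JOB_SCRIPT="serial-R.slurm"   # hypothetical file name for the script above

if command -v sbatch >/dev/null 2>&1 && [ -f "$JOB_SCRIPT" ]; then
    # --parsable prints the job ID (optionally followed by ";cluster").
    jobid="$(sbatch --parsable "$JOB_SCRIPT")"
    jobid="${jobid%%;*}"      # keep only the numeric job ID
    echo "Submitted job $jobid"
    squeue -j "$jobid"        # show the job's state in the queue
else
    echo "sbatch or $JOB_SCRIPT not available here" >&2
fi
```

With the -o pattern used in the sample script, the job's standard output should appear in a file whose name ends with the job ID.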
There are different flavors of parallelism in R. Spend time with the primary R documentation [1] to understand the methods before trying R scripts authored by others.
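How resources are allocated depends on whether the R script uses MPI or process-based (non-MPI) parallelism. Keep only one of the two resource requests below:
#!/bin/bash
#SBATCH --time=10:00:00
#SBATCH -N 2 -n 4 # MPI: -n sets the number of MPI tasks, usually used when more than one node is required
#SBATCH -N 1 -c 8 # non-MPI: -c sets the number of cores for processes on a single node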
#
RStudio is a free and open-source integrated development environment for R. For more information visit the HPC RStudio Guide.
[1] https://cran.r-project.org/doc/manuals/r-release/R-intro.html