R Project
Under Development
R
R is a language and environment for statistical computing and graphics. It is similar to the S language and environment developed by John Chambers and colleagues at Bell Labs. Much code written for S runs under R. R provides a variety of statistical and graphical techniques, and is highly extensible.
Important Notes
Users can install R packages in their home directory if the packages are not available in the installed versions of R. Follow the instructions at Software Installation Guide - R Packages. If you encounter any issues, contact hpc-supportATcase.edu.
Installed Versions
To view all versions of R available on the cluster, issue the following command. The same command works for other applications.
module spider R
output:
---------------------------------------------------------------------
R:
---------------------------------------------------------------------
Description:
R is a free software environment for statistical computing and graphics.
Versions:
R/4.0.3-foss-2020b
R/4.1.0-foss-2021a
R/4.1.2-foss-2021b
R/4.1.3-foss-2021a
R/4.2.0-foss-2021b
R/4.2.1-foss-2021a
R/4.2.1-foss-2021b
R/4.2.1-foss-2022a
R/4.2.2-foss-2022a
R/4.2.2-foss-2022b
---------------------------------------------------------------
For detailed information about a specific R version (including how to load the module), pass the module's full name to module spider, for example: module spider R/4.2.1-foss-2022a
Load the desired R version directly. If you are in an ongoing session, consider purging the modules that are already loaded first:
module purge # optional
module load R/4.2.1-foss-2022a
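After loading a module, it can be worth confirming which R the shell will actually run. The following is a minimal sketch, not part of the cluster's tooling; the function name check_r is a hypothetical helper introduced here for illustration.

```shell
# check_r: hypothetical helper that reports which R (if any) is on PATH.
check_r() {
    if command -v R >/dev/null 2>&1; then
        # Show the path to the executable and the first line of its version banner.
        echo "R found at: $(command -v R)"
        R --version | head -n 1
    else
        echo "R is not on PATH; run 'module load R/<version>' first"
    fi
}

check_r
```

If the reported version is not the one you expected, another module loaded earlier in the session may be taking precedence, which is the case the optional module purge addresses.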
Interactive Job Submission
Interactive Serial
To use R interactively for a serial job, first request interactive use of one processor on a compute node:
srun --x11 -N 1 -c 1 --time=1:00:00 --pty /bin/bash
and wait until you are connected to a shell on a compute node. (The default length of an interactive session is ten hours.) Then load the R module and, if needed, set the path to any custom packages that you have installed for this version of R:
$ module load R/4.2.1-foss-2022a
$ export R_LIBS_USER=/home/mrd20/.local/pioneer/R/4.2.1
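Because the library path above embeds the R version, it may help to derive it from the module name so the two stay in step. The sketch below assumes the same directory layout as the example path, rooted at your own $HOME; the helper name r_user_lib is hypothetical.

```shell
# r_user_lib: hypothetical helper that maps a module name such as
# "R/4.2.1-foss-2022a" to a per-version library path under a base directory.
r_user_lib() {
    module_name="$1"
    base_dir="$2"
    version="${module_name#R/}"   # drop the leading "R/"
    version="${version%%-*}"      # drop the toolchain suffix, e.g. "-foss-2022a"
    echo "$base_dir/$version"
}

# Assumed layout, mirroring the example path above.
R_LIBS_USER="$(r_user_lib "R/4.2.1-foss-2022a" "$HOME/.local/pioneer/R")"
export R_LIBS_USER

# R ignores library directories that do not exist, so create it up front.
mkdir -p "$R_LIBS_USER"
echo "R_LIBS_USER=$R_LIBS_USER"
```

Keeping one library directory per R version avoids mixing packages built against different versions of R.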
and then invoke R using the shell command "R":
[<caseID>@<computeNode>]$ R
R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
Type 'q()' to quit R.
> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
> x
[1] 10.4 5.6 3.1 6.4 21.7
> 1/x
[1] 0.09615385 0.17857143 0.32258065 0.15625000 0.04608295
> q()
Save workspace image? [y/n/c]: n
You can also list the packages available in R by calling library() with no arguments:
> library()
Packages in library ‘/home/mrd20/.local/pioneer/R/4.2.1’:
fontBitstreamVera Fonts with 'Bitstream Vera Fonts' License
fontLiberation Liberation Fonts
fontquiver Set of Installed Fonts
gfonts Offline 'Google' Fonts for 'Markdown' and 'Shiny'
Packages in library ‘/usr/local/easybuild_allnodes/software/R/4.2.1-foss-2022a/lib64/R/library’:
abc Tools for Approximate Bayesian Computation (ABC)
abc.data Data Only: Tools for Approximate Bayesian Computation (ABC)
abe Augmented Backward Elimination
abind Combine Multidimensional Arrays
...
If you want to install packages that are not included with the system-installed R, please refer to Software Installation Guide - R Packages.
You can run an R script from within an R session using the source() function:
> source("R-script")
You can also run the script from the terminal by typing:
R CMD BATCH <scriptfile.r>
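As a rough illustration of the terminal route, the sketch below writes a small script and runs it with R CMD BATCH. The file names example.R and example.Rout and the temporary directory are assumptions made for this example. The --no-save option and the explicit output file name are optional additions, used here so that no workspace image is written and the transcript lands in a known place.

```shell
# Create a small, hypothetical script in a temporary directory.
workdir="$(mktemp -d)"
cat > "$workdir/example.R" <<'EOF'
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
print(1/x)
EOF

# Run it only if R is on PATH (that is, an R module has been loaded).
if command -v R >/dev/null 2>&1; then
    # --no-save: do not write a workspace image at exit.
    # The second file name is where the session transcript is written.
    (cd "$workdir" && R CMD BATCH --no-save example.R example.Rout)
    cat "$workdir/example.Rout"
else
    echo "R not found; load an R module before running this"
fi
```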
Interactive Parallel
For interactive use of R that requires multiple processors, modify the call to srun to request them. The following requests 4 processors:
srun --x11 -N 1 -n 4 --time=1:00:00 --pty /bin/bash
After connecting to the shell on the assigned compute node, load the needed modules; OpenMPI is loaded by default.
module load R/4.2.1-foss-2022a
Then start R and load the 'parallel' package:
> library(parallel)
>
After exiting from R, remember to also exit from the shell on the compute node that was allocated using the "logout" or "exit" command.
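For the parallel session described above, the number of worker processes started in R should normally match what was requested from Slurm. The following sketch reads the core count from the environment variables Slurm sets inside a job and passes it to mclapply from the 'parallel' package. The helper name slurm_cores and the toy computation are assumptions for illustration only.

```shell
# slurm_cores: hypothetical helper that reports how many cores the current
# allocation provides. Slurm sets SLURM_CPUS_PER_TASK when -c is given and
# SLURM_NTASKS when -n is given; fall back to 1 outside a job.
slurm_cores() {
    echo "${SLURM_CPUS_PER_TASK:-${SLURM_NTASKS:-1}}"
}

cores="$(slurm_cores)"
echo "Using $cores core(s)"

# Square four numbers in parallel worker processes, if R is available.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e "library(parallel); print(unlist(mclapply(1:4, function(i) i^2, mc.cores = $cores)))"
fi
```

Reading the count from the environment rather than hard-coding it means the same commands keep working if the srun request changes.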
Batch Job Submission
Serial Job
To run an R batch job, prepare a SLURM script that contains the command to load the R module. A simple single-processor example is as follows:
#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -c 1
#SBATCH -o serial-R.out%j # capture jobid in output file name
# load the R module
module load R/4.2.1-foss-2022a
# copy files to scratch space for job execution
# (not strictly necessary unless data size is large)
cp R-example/* "$PFSDIR"
cd "$PFSDIR"
# Run R script
R CMD BATCH <scriptfile.r>
# copy all work from scratch-space back to submit directory
cp * "$SLURM_SUBMIT_DIR"
In the above sample script, note that '<scriptfile.r>' should be replaced by the name of your input file containing the R commands to be carried out.
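One way to put the sample script to use is sketched below: it writes the job file with the script name filled in, checks its syntax, and submits it with sbatch where Slurm is available. The names analysis.R and serial-R.slurm are hypothetical, and the job file is written to a temporary directory here only so the sketch has no side effects in your working directory.

```shell
# Hypothetical names; adjust to your own R script and job file.
r_script="analysis.R"
job_file="$(mktemp -d)/serial-R.slurm"

# Write the job script. \$ keeps PFSDIR and SLURM_SUBMIT_DIR literal so they
# are expanded when the job runs; $r_script is filled in now.
cat > "$job_file" <<EOF
#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -c 1
#SBATCH -o serial-R.out%j
module load R/4.2.1-foss-2022a
cp R-example/* "\$PFSDIR"
cd "\$PFSDIR"
R CMD BATCH $r_script
cp * "\$SLURM_SUBMIT_DIR"
EOF

# Check the syntax without executing anything.
bash -n "$job_file" && echo "syntax OK: $job_file"

# Submit only where Slurm is available.
if command -v sbatch >/dev/null 2>&1; then
    sbatch "$job_file"
else
    echo "sbatch not found; not submitting"
fi
```

Run the sketch from the directory that contains R-example, since the relative path in the job is resolved from the directory where sbatch is invoked.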
Parallel Job
There are different flavors of parallelism in R. Spend time with the primary R documentation [1] to understand the methods before trying R scripts authored by others.
How you allocate resources depends on whether the job uses MPI or process-based (non-MPI) parallelism:
#!/bin/bash
#SBATCH --time=10:00:00
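# Choose ONE of the two resource requests below and delete the other.
#SBATCH -N 2 -n 4 # -n sets the number of MPI tasks; typically used when more than one node is required
#SBATCH -N 1 -c 8 # -c sets the number of cores for non-MPI processes on a single node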
#
RStudio
RStudio is a free and open-source integrated development environment for R. For more information visit the HPC RStudio Guide.
Documentation
[1] https://cran.r-project.org/doc/manuals/r-release/R-intro.html