R Project

Under Development

R

R is a language and environment for statistical computing and graphics. It is similar to the S language and environment developed by John Chambers and colleagues at Bell Labs. Much code written for S runs under R. R provides a variety of statistical and graphical techniques, and is highly extensible. 

Important Notes

Users can install specific R packages in their home directory if they are not available in the installed versions of R. Follow the instructions at Software Installation Guide - R Packages. If you encounter any issues, contact hpc-supportATcase.edu.

Installed Versions

To list all available versions of R, run the following command. The same command works for other applications as well.

module spider R

output:

---------------------------------------------------------------------

    R:

---------------------------------------------------------------------

    Description:

      R is a free software environment for statistical computing and graphics.


     Versions:

        R/4.0.3-foss-2020b

        R/4.1.0-foss-2021a

        R/4.1.2-foss-2021b

        R/4.1.3-foss-2021a

        R/4.2.0-foss-2021b

        R/4.2.1-foss-2021a

        R/4.2.1-foss-2021b

        R/4.2.1-foss-2022a

        R/4.2.2-foss-2022a

        R/4.2.2-foss-2022b


---------------------------------------------------------------

  As usual, for detailed information about a specific "R" package (including how to load the modules) use the module's full name.

Load the desired R version directly. If you are in an ongoing session, consider purging the already loaded modules first:

module purge   # optional

module load R/4.2.1-foss-2022a

Interactive Job Submission

Interactive Serial

To use R interactively for a serial job, first request interactive use of one processor on a compute node:

srun --x11 -N 1 -c 1 --time=1:00:00 --pty /bin/bash

and wait until you are connected to a shell on a compute node. (If --time is omitted, the default length of an interactive session is ten hours.) Then load the R module and, if needed, set the path to any custom packages that you have installed for this version of R:

$ module load R/4.2.1-foss-2022a

$ export R_LIBS_USER=/home/mrd20/.local/pioneer/R/4.2.1

and then invoke R using the shell command "R":

[<caseID>@<computeNode>]$ R


R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"

Copyright (C) 2022 The R Foundation for Statistical Computing

Platform: x86_64-pc-linux-gnu (64-bit)

Type 'q()' to quit R.

> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

> x

[1] 10.4  5.6  3.1  6.4 21.7

> 1/x

[1] 0.09615385 0.17857143 0.32258065 0.15625000 0.04608295

> q()

Save workspace image? [y/n/c]: n

You can also list the packages available in R by calling library() with no arguments:

> library()

Packages in library ‘/home/mrd20/.local/pioneer/R/4.2.1’:

fontBitstreamVera       Fonts with 'Bitstream Vera Fonts' License

fontLiberation          Liberation Fonts

fontquiver              Set of Installed Fonts

gfonts                  Offline 'Google' Fonts for 'Markdown' and 'Shiny'


Packages in library ‘/usr/local/easybuild_allnodes/software/R/4.2.1-foss-2022a/lib64/R/library’:

abc                     Tools for Approximate Bayesian Computation (ABC)

abc.data                Data Only: Tools for Approximate Bayesian Computation (ABC)

abe                     Augmented Backward Elimination

abind                   Combine Multidimensional Arrays

...

To install packages that are not included with the system-installed R, refer to Software Installation Guide - R Packages.
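As a minimal sketch of what a personal library setup might look like, the following creates a per-version library directory and points R at it through R_LIBS_USER. The directory layout is an assumption modeled on the R_LIBS_USER path shown earlier, not a site requirement; the linked guide remains the authoritative procedure.

```shell
# Hypothetical per-version library directory; adjust to your own layout.
R_VERSION="4.2.1"
export R_LIBS_USER="$HOME/.local/R/$R_VERSION"

# R only adds library directories that exist, so create it first.
mkdir -p "$R_LIBS_USER"

echo "User library: $R_LIBS_USER"
```

With this in place, and assuming R_LIBS is unset, the user library should come first in .libPaths(), so install.packages() called from within R would normally install into it by default.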

You can run an R script from within an R session using the source() function:

> source("R-script")

You can also run the script from the terminal by typing:

R CMD BATCH <scriptfile.r>
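To make the batch invocation concrete, here is a small sketch. The file names hello.R and hello.Rout are hypothetical, and the --no-save and --no-restore options are an addition of this example that keeps R from reading or writing a workspace image. The run step is guarded so that the snippet only reports a message on a machine where R is not on the PATH.

```shell
# Write a tiny R script (hypothetical file name).
cat > hello.R <<'EOF'
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
print(mean(x))
EOF

# Run it non-interactively; the commands and their output go to hello.Rout.
if command -v R >/dev/null 2>&1; then
    R CMD BATCH --no-save --no-restore hello.R hello.Rout
    cat hello.Rout
else
    echo "R not found; load an R module first" >&2
fi
```

Naming the output file explicitly, as done here, avoids having to guess where R CMD BATCH wrote its results.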

Interactive Parallel

For interactive use of R that requires multiple processors, modify the srun call to request them, for example 4 processors:

srun --x11 -N 1 -n 4 --time=1:00:00 --pty /bin/bash

After getting connected to the shell on the assigned compute node, load the needed modules; OpenMPI has been loaded by default.

module load R/4.2.1-foss-2022a

Then start R and load the 'parallel' package:

> library(parallel)

After exiting from R, remember to also exit from the shell on the compute node that was allocated using the "logout" or "exit" command.
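As an illustration of what can follow library(parallel), the sketch below uses mclapply to spread a toy computation over the allocated processors. It assumes that Slurm exports SLURM_NTASKS for the "-n 4" request above and falls back to 1 when the variable is unset; the file name par-demo.R and the function slow_square are hypothetical. Whether the forked workers actually land on separate cores depends on how the site binds tasks to CPUs.

```shell
# Write a small R script (hypothetical name) that uses the 'parallel' package.
cat > par-demo.R <<'EOF'
library(parallel)

# Use the number of tasks granted by Slurm; fall back to 1 outside a job.
n_cores <- as.integer(Sys.getenv("SLURM_NTASKS", unset = "1"))

# Toy workload: each call sleeps briefly, then squares its argument.
slow_square <- function(i) { Sys.sleep(0.1); i^2 }

# mclapply forks worker processes on the current node.
res <- mclapply(1:8, slow_square, mc.cores = n_cores)
print(unlist(res))
EOF

# Run the script if R is available in this shell.
if command -v Rscript >/dev/null 2>&1; then
    Rscript par-demo.R
else
    echo "Rscript not found; load an R module first" >&2
fi
```

The same lines between the EOF markers can also be typed directly at the R prompt in the interactive session.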

Batch Job Submission

Serial Job

To run an R batch job, prepare a SLURM script that contains the command to load the R module. A simple single-processor example is as follows:

#!/bin/bash

#SBATCH -t 1:00:00

#SBATCH -c 1

#SBATCH -o serial-R.out%j # capture jobid in output file name


# load the R module

module load R/4.2.1-foss-2022a


# copy files to scratch space for job execution

# (not strictly necessary unless data size is large)

cp R-example/* $PFSDIR

cd $PFSDIR


# Run R script

R CMD BATCH <scriptfile.r>


# copy all work from scratch-space back to submit directory

cp * $SLURM_SUBMIT_DIR

In the above sample script, note that <scriptfile.r> should be replaced by the name of your file containing the R commands to be carried out.
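Once the job script is saved, it still has to be handed to the scheduler. As a sketch, assuming the script was saved under the hypothetical name serial-R.slurm, the standard Slurm commands sbatch and squeue would submit the job and show its state:

```shell
JOB_SCRIPT="serial-R.slurm"   # hypothetical file name for the job script

if command -v sbatch >/dev/null 2>&1 && [ -f "$JOB_SCRIPT" ]; then
    sbatch "$JOB_SCRIPT"      # prints the assigned job ID
    squeue -u "$USER"         # lists your pending and running jobs
else
    echo "sbatch or $JOB_SCRIPT not available; run this on a login node" >&2
fi
```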

Parallel Job

There are different flavors of parallelism in R. Spend time with the primary R documentation [1] to understand the methods before trying R scripts authored by others.

How resources are allocated depends on whether the R script uses MPI or process-based parallel methods (such as the 'parallel' package):

#!/bin/bash

#SBATCH --time=10:00:00 

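# Choose ONE of the two allocations below and delete the other.

# MPI: -n sets the number of MPI tasks, usually used when more than one node is required

#SBATCH -N 2 -n 4

# Non-MPI: -c sets the number of CPUs for a single multi-process task

#SBATCH -N 1 -c 8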
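To show how the non-MPI allocation might be completed, the sketch below writes a small R script and a matching Slurm job script. The file names par-batch.R and parallel-R.slurm, the one-hour time limit, and the toy workload are all assumptions of this example. The R script reads SLURM_CPUS_PER_TASK, which Slurm sets when -c is given, so the worker count follows the request.

```shell
# Write the R script (hypothetical name) that the job will run.
cat > par-batch.R <<'EOF'
library(parallel)
# Match the worker count to the CPUs requested with -c; default to 1.
n_cores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
res <- mclapply(1:100, function(i) sqrt(i), mc.cores = n_cores)
cat("sum of square roots:", sum(unlist(res)), "\n")
EOF

# Write the Slurm job script (hypothetical name) for the non-MPI case.
cat > parallel-R.slurm <<'EOF'
#!/bin/bash
#SBATCH --time=1:00:00
#SBATCH -N 1 -c 8
#SBATCH -o parallel-R.out%j

module load R/4.2.1-foss-2022a
Rscript par-batch.R
EOF

# Check the job script for shell syntax errors without running it.
bash -n parallel-R.slurm && echo "job script syntax OK"
```

Submitting the result with "sbatch parallel-R.slurm" would then run the R script with 8 CPUs on one node. The MPI case needs an MPI-aware R package and a different launch command, which this sketch does not cover.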

RStudio

RStudio is a free and open-source integrated development environment for R. For more information visit the HPC RStudio Guide.


Documentation

[1] https://cran.r-project.org/doc/manuals/r-release/R-intro.html