R Statistics

R

R is a language and environment for statistical computing and graphics. It is similar to the S language and environment developed by John Chambers and colleagues at Bell Labs. Much code written for S runs under R. R provides a variety of statistical and graphical techniques, and is highly extensible. 

Important Notes

Installed Versions

To list the installed versions of R, issue the following command. The same command works for other applications as well.

module spider R

output:

---------------------------------------------------------------------

  R:

---------------------------------------------------------------------

    Description:

      R is a free software environment for statistical computing and graphics.

     Versions:

...     R/4.0.2

        R/4.1.1


Load the desired R version:

module load gcc

module load R/4.1.1

Interactive Job Submission

Interactive Serial

To use R interactively for a serial job, first request interactive use of one processor on a compute node

srun --x11 -N 1 -c 1 --time=1:00:00 --pty /bin/bash

and wait until you are connected to a shell on the compute node. (If --time is omitted, the default length of an interactive session is ten hours.) Then load the R module

module load R

and then invoke R using the shell command "R":

[<caseID>@<computeNode> ~]$ R

R version 2.9.2 (2009-08-24)

 Type 'q()' to quit R.

> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

> x

[1] 10.4  5.6  3.1  6.4 21.7

> 1/x

[1] 0.09615385 0.17857143 0.32258065 0.15625000 0.04608295

> q()

Save workspace image? [y/n/c]: n

You can also list the packages available in R:

> library()

Packages in library ‘/home/sxg125/R/x86_64-unknown-linux-gnu-library/3.2’:

mnormt                  The Multivariate Normal and t Distributions

psych                   Procedures for Psychological, Psychometric, and

                        Personality Research

Packages in library ‘/usr/local/R/3.2.0/lib64/R/library’:

acepack                 ace() and avas() for selecting regression

                        transformations

ade4                    Analysis of Ecological Data : Exploratory and

...

Here, the psych package is installed only in the personal library under /home/sxg125, so it is available only to that user. If you want to install the packages you need, please refer to Software Installation Guide - R Packages.

You can run an R script from within an R session using the source() function:

> source("R-script")

You can also run the script from the terminal by typing:

R CMD BATCH <scriptfile.r>
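Before launching a batch run from the terminal, it can help to check that the script exists and that R is actually on the PATH. The following is a minimal sketch of such a guard; the function name run_r_script and its return codes are hypothetical and not part of R or the cluster environment.

```shell
#!/bin/bash
# Sketch: guard an R batch run with two basic checks.
# run_r_script and its return codes (2, 3) are hypothetical.

run_r_script() {
    script="$1"
    # Fail early if the script file does not exist
    if [ ! -f "$script" ]; then
        echo "R script not found: $script" >&2
        return 2
    fi
    # Fail early if R is not on the PATH (e.g. the module was not loaded)
    if ! command -v R >/dev/null 2>&1; then
        echo "R not found; try 'module load R' first" >&2
        return 3
    fi
    R CMD BATCH "$script"
}
```

After defining the function, call it as run_r_script followed by the path to your script file.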

Interactive Parallel

For interactive use of R that requires multiple processors, modify the srun call to request 4 processors:

srun --x11 -N 1 -n 4 --time=1:00:00 --pty /bin/bash

After getting connected to the shell on the assigned compute node, load the needed modules; OpenMPI has been loaded by default.

module load R

Then start R and load the Rmpi package:

> library(Rmpi)

>

After exiting from R, remember to also exit from the shell on the compute node that was allocated using the "logout" or "exit" command.

Batch Job Submission

Serial Job

To run an R batch job, prepare a SLURM script that contains the command to load the R module. A simple single-processor example is as follows:

#!/bin/bash

#SBATCH -t 1:00:00

#SBATCH -N 1

#SBATCH -o serial-R.out%j # capture jobid in output file name


# load the R module

module load R


# copy files to scratch space for job execution

# (not strictly necessary unless data size is large)

cp R-example/* $PFSDIR

cd $PFSDIR


# Run R script

R CMD BATCH <scriptfile.r>


# copy all work from scratch-space back to submit directory

cp * $SLURM_SUBMIT_DIR

In the above sample script, note that <scriptfile.r> should be replaced by the name of your input file containing the R commands to be carried out.
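The staging pattern used in the script above (copy inputs to scratch, run there, copy results back) can be tried outside of a SLURM job with a small sketch. The function name run_in_scratch is hypothetical, and the fallbacks to a temporary directory and to the current directory are assumptions added for illustration; within a job on the cluster, $PFSDIR and $SLURM_SUBMIT_DIR are expected to be set by the job environment.

```shell
#!/bin/bash
# Sketch: stage inputs to scratch, run a command there, copy results back.
# Assumption: outside a SLURM job, PFSDIR and SLURM_SUBMIT_DIR are unset,
# so fall back to a temporary directory and the current directory.

run_in_scratch() {
    # $1 = directory holding the input files; remaining args = command to run
    local input_dir="$1"; shift
    local scratch="${PFSDIR:-$(mktemp -d)}"
    local submit_dir="${SLURM_SUBMIT_DIR:-$PWD}"

    # Copy the contents of the input directory into scratch
    cp -r "$input_dir"/. "$scratch"/ || return 1
    # Run in a subshell so the caller's working directory is unchanged
    ( cd "$scratch" && "$@" ) || return 1
    # Copy everything in scratch (inputs and results) back
    cp -r "$scratch"/. "$submit_dir"/
}

# Usage inside a job script (hypothetical script name):
#   run_in_scratch R-example R CMD BATCH myscript.r
```

Running the command in a subshell keeps the job script's own working directory unchanged, so later steps do not depend on where the R run happened.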

Example:

Copy the R-example directory from /usr/local/doc/R to your home directory and change into it:

 cp -r /usr/local/doc/R/R-example .

 cd R-example

The directory contains five sets of data in sub-directories Data1-Data5, a SLURM script file logplot.sh, and other R parallel sample files.

Use the SLURM script to submit the job.

sbatch logplot.sh

You can find the plots in PDF format in the directory Data1.

To view the plots, first make sure that X forwarding is enabled (refer to Graphical Access), then issue the command:

evince Data[1-5].Rplots.pdf

which will open one window for each plot generated.
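The pattern Data[1-5].Rplots.pdf is expanded by the shell before evince starts, which is why one command can open several files. The sketch below shows the expansion on empty placeholder files in a temporary directory; the function name count_matches and the extra Data6 file are assumptions for illustration only.

```shell
#!/bin/bash
# Sketch: show how many file names the pattern Data[1-5].Rplots.pdf matches.
# The files are empty placeholders created in a temporary directory.
demo_dir=$(mktemp -d)
for i in 1 2 3 4 5 6; do
    : > "$demo_dir/Data$i.Rplots.pdf"
done

count_matches() {
    # [1-5] matches exactly one character in the range 1..5, so Data6 is excluded.
    # Note: if nothing matched, the unexpanded pattern itself would be counted as 1.
    ( cd "$1" && set -- Data[1-5].Rplots.pdf && echo "$#" )
}

echo "Files matched: $(count_matches "$demo_dir")"
```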

Parallel Job

R supports several flavors of parallelism, including Rmpi, SNOW, and multicore. This section covers some frequently used packages.

Rmpi

For multiple nodes, you can use Rmpi. It is less convenient than snow, but is more flexible. 

The parallel Rmpi job file "RmpiParallel.job" is provided in /usr/local/doc/R/R-example. Make sure that you have copied the R-example directory to your home directory and changed into it.

#!/bin/bash

#SBATCH --time=10:00:00

#SBATCH -N 1 -n 4

#

# Load modules

module load R

#

# copy the R script to scratch space and cd there

cp -r RmpiTest.r $PFSDIR

cd $PFSDIR

nproc=$SLURM_NPROCS

echo "Number of Processors = $nproc"

# Run R

Rscript RmpiTest.r # In the R batch file be sure to load the Rmpi package.

cp * $SLURM_SUBMIT_DIR

Submit your job

sbatch RmpiParallel.job

See the output in slurm-<jobid>.out

Number of Processors = 4

        4 slaves are spawned successfully. 0 failed.

master (rank 0, comm 1) of size 5 is running on: comp147t 

slave1 (rank 1, comm 1) of size 5 is running on: comp147t 

...

[386] -1.213096e-03  6.216035e-05 -2.230406e-03  7.407010e-04  7.331668e-04

[391] -1.188971e-04 -1.096079e-04 -5.707893e-04 -3.259391e-04  6.037836e-04

[396]  1.031454e-03  8.025910e-04 -1.313067e-03 -7.767290e-04 -4.051707e-04
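The job script above reads the processor count from $SLURM_NPROCS. When a script may also be run outside of a SLURM allocation, a fallback avoids an empty value. The following is a minimal sketch; the function name get_nproc and the default of 1 are assumptions, while SLURM_NTASKS is the variable SLURM sets when a task count is requested with -n.

```shell
#!/bin/bash
# Sketch: determine the task count with a fallback for runs outside SLURM.
# Prefers SLURM_NTASKS, then SLURM_NPROCS, then an assumed default of 1.
get_nproc() {
    echo "${SLURM_NTASKS:-${SLURM_NPROCS:-1}}"
}

nproc=$(get_nproc)
echo "Number of Processors = $nproc"
```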

Parallel Cluster

The parallel cluster examples use the SNOW library and were adapted from scripts authored by Jean-Eudes DAZARD, PhD. The job file "snowMakeCluster.job" and the R script "snowMakeClusterTest.R" are provided in /usr/local/doc/R/R-example. Make sure that you have copied the R-example directory to your home directory and changed into it, then submit the job.

sbatch snowMakeCluster.job

See the output in slurm-<jobid>.out

Number of Processors = 4

Loading required package: Rmpi

Replicate: 1

[1] -0.6808924

Replicate: 2

[1] -0.7905958

Replicate: 3

[1] 0.4070604

SNOW with Rmpi

The job file "snow.job" and the R script "snowTest.r" are provided in /usr/local/doc/R/R-example. Make sure that you have copied the R-example directory to your home directory and changed into it, then submit the job.

sbatch snow.job

See the output in slurm-<jobid>.out

Number of Processors = 4

4 slaves are spawned successfully. 0 failed.

[1] "Hello from comp110t with CPU type x86_64"

[2] "Hello from comp110t with CPU type x86_64"

[3] "Hello from comp110t with CPU type x86_64"

[4] "Hello from comp110t with CPU type x86_64"

[1] -8213.361

user system elapsed

4.400 1.214 5.618

[1] 1

[1] "Detaching Rmpi. Rmpi cannot be used unless relaunching R."

Multicore

The multicore approach provides parallelism across the cores of a single node. The job file "multiCore.job" and the R script "multiCoreTest.R" are provided in /usr/local/doc/R/R-example. Make sure that you have copied the R-example directory to your home directory and changed into it, then submit the job.

sbatch multiCore.job

See the output in slurm-<jobid>.out

Number of Processors = 4

Loading required package: foreach

Loading required package: iterators

Loading required package: parallel

[1] "4"

elapsed

  9.787

RStudio

RStudio is a free and open-source integrated development environment for R. For more information visit the HPC RStudio Guide.

GPU R (needs revision)

(In RHEL6 only)

The gmatrix package [3] provides a general framework for using NVIDIA GPUs from R. Its "gmatrix" and "gvector" classes allow for easy management of the separate device and host memory spaces.

Request a GPU node:

qsub -q gpufermi -I

Load the R module:

module load R/3.0.2

Load CUDA Module:

module load cuda

Run R interactively by typing: R

Set the environment variables from within R using the following commands:

Sys.setenv(CUDA_LIB_PATH="/usr/local/cuda-5.0/lib64")

Sys.setenv(R_INC_PATH="/usr/local/R/3.0.2/lib64/R/include")

Sys.setenv(NVCC_ARCH="-gencode arch=compute_30,code=sm_30") 

Install the GPU R package. The command install.packages can install a source package from a local .tar.gz file by setting argument repos to NULL: this will be selected automatically if the name given is a single .tar.gz file.

download.file("http://solomon.case.edu/gmatrix/gmatrix_0.1.tar.gz", "gmatrix.tar.gz")

install.packages("gmatrix.tar.gz", repos = NULL)

file.remove("gmatrix.tar.gz")

 Testing:

Load the Library: 

library(gmatrix)

Output:

Now using device 0 - "Tesla M2090"

Starting cublas on device 0.

Creating new states on device 0.

 Issue test command:

gtest()

Output:

Checking matrix multiplication, crossprod and tcrossprod...

Checking outer product and kronecker product...

Checking Binary Operations... * + == != & | - / ^ > < >= <=

Checking Unary Operations/special functions... sqrt exp expm1 log log2 log10 log1p sin cos tan asin acos atan sinh cosh tanh asinh acosh atanh abs lgamma gamma sign round ceiling floor is.na is.nan is.finite is.infinite ! - +

...

No errors or warnings

[1] TRUE

 Getting Started:

References:

[1] An Introduction to R

[2] Parallel Programming Guide

[3] GPU R: https://github.com/njm18/gmatrix

[4] Flavors of R: UChicago

[5] GitHub site of Jean-Eudes DAZARD, PhD