R Statistics
R
R is a language and environment for statistical computing and graphics. It is similar to the S language and environment developed by John Chambers and colleagues at Bell Labs. Much code written for S runs under R. R provides a variety of statistical and graphical techniques, and is highly extensible.
Important Notes
Users can install specific R packages in their home directory if those packages are not available in the installed versions of R. Follow the instructions at Software Installation Guide - R Packages. If you encounter any issues, contact hpc-supportATcase.edu.
Installed Versions
To list all available versions of R, issue the following command. The same approach works for other applications as well.
module spider R
output:
---------------------------------------------------------------------
R:
---------------------------------------------------------------------
Description:
R is a free software environment for statistical computing and graphics.
Versions:
... R/4.0.2
R/4.1.1
Load the R version that is desired:
module load gcc
module load R/4.1.1
Interactive Job Submission
Interactive Serial
To use R interactively for a serial job, first request interactive use of one processor on a compute node:
srun --x11 -N 1 -c 1 --time=1:00:00 --pty /bin/bash
and wait until you are connected to a shell in a compute node. (The default length of an interactive session is ten hours.) Then load the R module
> module load R
and then invoke R using the shell command "R":
[<caseID>@<computeNode> ~]$ R
R version 2.9.2 (2009-08-24)
Type 'q()' to quit R.
> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
> x
[1] 10.4 5.6 3.1 6.4 21.7
> 1/x
[1] 0.09615385 0.17857143 0.32258065 0.15625000 0.04608295
> q()
Save workspace image? [y/n/c]: n
You can also check which packages are available in R:
> library()
Packages in library ‘/home/sxg125/R/x86_64-unknown-linux-gnu-library/3.2’:
mnormt The Multivariate Normal and t Distributions
psych Procedures for Psychological, Psychometric, and
Personality Research
Packages in library ‘/usr/local/R/3.2.0/lib64/R/library’:
acepack ace() and avas() for selecting regression
transformations
ade4 Analysis of Ecological Data : Exploratory and
...
Here, the psych package is installed only in the personal library under /home/sxg125, so only that user can load it. If you want to install the packages you need, please refer to Software Installation Guide - R Packages.
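To check from a script whether a package is installed, and which library it would be loaded from, something like the following minimal sketch may help. It uses only base R; the helper name pkg_location is hypothetical and not part of R.

```r
# Hypothetical helper: return the directory a package would be loaded from,
# or NA if it is not found in any library on the search path.
pkg_location <- function(pkg) {
  path <- find.package(pkg, quiet = TRUE)   # character(0) when not found
  if (length(path) == 0) NA_character_ else path[1]
}

# Libraries are searched in this order; a personal library, if one exists,
# is normally listed before the system-wide one.
print(.libPaths())

print(pkg_location("stats"))   # a base package, so a path is returned
print(pkg_location("psych"))   # NA unless psych has been installed
```

If the result is NA, the package has to be installed before library() can load it.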
You can run an R script from within an R session using the source() function:
> source("R-script")
You can also run the script from the terminal by typing:
R CMD BATCH <scriptfile.r>
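As an illustration of a script suited to non-interactive use, here is a minimal sketch. The file name summarize.R and the helper parse_values are assumptions made for this example. It reads optional numbers from the command line and prints a few summaries.

```r
# summarize.R (hypothetical file name): a small non-interactive script.
# Example shell invocation:  Rscript summarize.R 10.4 5.6 3.1

# Convert command-line arguments to numbers; fall back to a default vector
# when no arguments are given.
parse_values <- function(args) {
  if (length(args) == 0) return(c(10.4, 5.6, 3.1, 6.4, 21.7))
  as.numeric(args)
}

x <- parse_values(commandArgs(trailingOnly = TRUE))
cat("n =", length(x), "\n")
cat("mean =", mean(x), "\n")
cat("reciprocals:", 1 / x, "\n")
```

When run with Rscript, the output is printed to the terminal; when run with R CMD BATCH, it is written to a file ending in .Rout instead.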
Interactive Parallel
For interactive use of R that requires multiple processors, modify the srun call to request 4 processors:
srun --x11 -N 1 -n 4 --time=1:00:00 --pty /bin/bash
After getting connected to the shell on the assigned compute node, load the needed modules; OpenMPI has been loaded by default.
module load R
Then start R and load the Rmpi package:
> library(Rmpi)
>
After exiting from R, remember to also exit from the shell on the compute node that was allocated using the "logout" or "exit" command.
Batch Job Submission
Serial Job
To run an R batch job, prepare a SLURM script that contains the command to load the R module. A simple single-processor example is as follows:
#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -o serial-R.out%j # capture jobid in output file name
# load the R module
module load R
# copy files to scratch space for job execution
# (not strictly necessary unless data size is large)
cp R-example/* $PFSDIR
cd $PFSDIR
# Run R script
R CMD BATCH <scriptfile.r>
# copy all work from scratch-space back to submit directory
cp * $SLURM_SUBMIT_DIR
In the above sample script, note that <scriptfile.r> should be replaced by the name of your file containing the R commands to be carried out.
Sample Example:
Copy the R-example directory from /usr/local/doc/R to your home directory and cd into it:
cp -r /usr/local/doc/R/R-example .
cd R-example
The directory contains five sets of data in sub-directories Data1-Data5, a SLURM script file logplot.sh, and other R parallel sample files.
Use the SLURM script to submit the job.
sbatch logplot.sh
You can find the plots in PDF format in the directory Data1.
To view the plots, make sure that X forwarding is enabled (refer to Graphical Access), then issue the command:
evince Data[1-5].Rplots.pdf
which will open one window for each plot generated.
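For context, a batch R script can write a plot to a PDF file without a display. The following is a minimal sketch of that pattern, not the code used by logplot.sh; the data and the output file name are made up for illustration.

```r
# Write a log-scale plot to a PDF file without needing a display.
# The data and the file name are invented for this example.
out_file <- file.path(tempdir(), "example-logplot.pdf")

x <- 1:100
y <- x^2

pdf(out_file)                       # open a PDF graphics device
plot(x, y, log = "xy", type = "l",  # draw with log-log axes
     main = "y = x^2 on log-log axes")
dev.off()                           # close the device so the file is written

cat("Plot written to", out_file, "\n")
```

If a non-interactive R session plots without opening a device explicitly, R by default writes the figures to a file named Rplots.pdf in the working directory.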
Parallel Job
R offers several flavors of parallelism, such as Rmpi, SNOW, and Multicore. Some frequently used packages are covered here.
Rmpi
For multiple nodes, you can use Rmpi. It is less convenient than snow, but is more flexible.
Find the parallel Rmpi job file "RmpiParallel.job" in /usr/local/doc/R/R-example. Make sure that you have copied the R-example directory to your home directory and changed into it.
#!/bin/bash
#SBATCH --time=10:00:00
#SBATCH -N 1 -n 4
#
# Load modules
module load R
#
# copy the R script to scratch space and cd there
cp -r RmpiTest.r $PFSDIR
cd $PFSDIR
nproc=$SLURM_NPROCS
echo "Number of Processors = $nproc"
# Run R
Rscript RmpiTest.r # In the R batch file be sure to load the Rmpi package.
cp * $SLURM_SUBMIT_DIR
Submit your job:
sbatch RmpiParallel.job
See the output in slurm-<jobid>.out:
Number of Processors = 4
4 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 5 is running on: comp147t
slave1 (rank 1, comm 1) of size 5 is running on: comp147t
...
[386] -1.213096e-03 6.216035e-05 -2.230406e-03 7.407010e-04 7.331668e-04
[391] -1.188971e-04 -1.096079e-04 -5.707893e-04 -3.259391e-04 6.037836e-04
[396] 1.031454e-03 8.025910e-04 -1.313067e-03 -7.767290e-04 -4.051707e-04
Parallel Cluster
The parallel cluster examples use the SNOW library and were adapted from scripts authored by Jean-Eudes DAZARD, PhD. Find the job file "snowMakeCluster.job" and the R script "snowMakeClusterTest.R" in /usr/local/doc/R/R-example. Make sure that you have copied the R-example directory to your home directory and changed into it, then submit the job from there:
sbatch snowMakeCluster.job
See the output in slurm-<jobid>.out:
Number of Processors = 4
Loading required package: Rmpi
Replicate: 1
[1] -0.6808924
Replicate: 2
[1] -0.7905958
Replicate: 3
[1] 0.4070604
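The cluster pattern behind this kind of output can be sketched with R's bundled parallel package, which provides a snow-style interface. This is not the content of snowMakeClusterTest.R. It assumes a socket cluster on a single node instead of MPI, and the worker count and the replicate function are chosen only for illustration.

```r
library(parallel)

n_workers <- 2                        # assumed small worker count for illustration
cl <- makeCluster(n_workers)          # socket (PSOCK) cluster on the local node
clusterSetRNGStream(cl, iseed = 123)  # independent random streams per worker

# Each replicate draws 1000 standard normal values and returns their mean.
one_replicate <- function(i) mean(rnorm(1000))
results <- parSapply(cl, 1:3, one_replicate)

stopCluster(cl)                       # always release the workers

for (i in seq_along(results)) {
  cat("Replicate:", i, "\n")
  print(results[i])
}
```

Calling stopCluster() at the end matters in batch jobs, since worker processes left running can keep the job from finishing cleanly.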
SNOW with Rmpi
Find the job file "snow.job" and the R script "snowTest.r" in /usr/local/doc/R/R-example. Make sure that you have copied the R-example directory to your home directory and changed into it, then submit the job from there:
sbatch snow.job
See the output in slurm-<jobid>.out:
Number of Processors = 4
4 slaves are spawned successfully. 0 failed.
[1] "Hello from comp110t with CPU type x86_64"
[2] "Hello from comp110t with CPU type x86_64"
[3] "Hello from comp110t with CPU type x86_64"
[4] "Hello from comp110t with CPU type x86_64"
[1] -8213.361
user system elapsed
4.400 1.214 5.618
[1] 1
[1] "Detaching Rmpi. Rmpi cannot be used unless relaunching R."
Multicore
The multicore approach parallelizes work across the cores of a single node. Find the job file "multiCore.job" and the R script "multiCoreTest.R" in /usr/local/doc/R/R-example. Make sure that you have copied the R-example directory to your home directory and changed into it, then submit the job from there:
sbatch multiCore.job
See the output in slurm-<jobid>.out:
Number of Processors = 4
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
[1] "4"
elapsed
9.787
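As a rough illustration of single-node parallelism, the sketch below uses mclapply from the bundled parallel package. This is a different interface from the foreach-based one that the output above suggests multiCoreTest.R uses. The work function slow_square is hypothetical, and reading the core count from SLURM_NPROCS is an assumption that mirrors the job scripts on this page.

```r
library(parallel)

# Use the processor count granted by SLURM; fall back to 1 when the
# variable is unset or empty (for example, outside a SLURM job).
n_cores <- suppressWarnings(as.integer(Sys.getenv("SLURM_NPROCS", unset = "1")))
if (is.na(n_cores) || n_cores < 1) n_cores <- 1L

# Hypothetical stand-in for real work: sleep briefly, then square the input.
slow_square <- function(i) {
  Sys.sleep(0.1)
  i^2
}

# Fork-based parallel map over the inputs, timed for comparison.
timing <- system.time(
  results <- mclapply(1:8, slow_square, mc.cores = n_cores)
)

print(unlist(results))
print(timing["elapsed"])
```

With more cores the elapsed time should drop, while the results stay the same as those from a plain lapply call.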
RStudio
RStudio is a free and open-source integrated development environment for R. For more information visit the HPC RStudio Guide.
GPU R (needs revision)
(In RHEL6 only)
The gmatrix package [3] implements a general framework for using R to harness the power of NVIDIA GPUs. The "gmatrix" and "gvector" classes allow for easy management of the separate device and host memory spaces.
Request a GPU node:
qsub -q gpufermi -I
Load the R module:
module load R/3.0.2
Load CUDA Module:
module load cuda
Run R interactively by typing: R
Set the environment variables in R through the R commands:
Sys.setenv(CUDA_LIB_PATH="/usr/local/cuda-5.0/lib64")
Sys.setenv(R_INC_PATH="/usr/local/R/3.0.2/lib64/R/include")
Sys.setenv(NVCC_ARCH="-gencode arch=compute_30,code=sm_30")
Install the GPU R package. The command install.packages can install a source package from a local .tar.gz file by setting argument repos to NULL: this will be selected automatically if the name given is a single .tar.gz file.
download.file("http://solomon.case.edu/gmatrix/gmatrix_0.1.tar.gz", "gmatrix.tar.gz")
install.packages("gmatrix.tar.gz", repos = NULL)
file.remove("gmatrix.tar.gz")
Testing:
Load the Library:
library(gmatrix)
Output:
Now using device 0 - "Tesla M2090"
Starting cublas on device 0.
Creating new states on device 0.
Issue test command:
gtest()
Output:
Checking matrix multiplication, crossprod and tcrossprod...
Checking outer product and kronecker product...
Checking Binary Operations... * + == != & | - / ^ > < >= <=
Checking Unary Operations/special functions... sqrt exp expm1 log log2 log10 log1p sin cos tan asin acos atan sinh cosh tanh asinh acosh atanh abs lgamma gamma sign round ceiling floor is.na is.nan is.finite is.infinite ! - +
...
No errors or warnings
[1] TRUE
Getting Started:
Load the library for each session using: library(gmatrix)
To list available gpu devices use: listDevices()
To set the device use: setDevice()
To move an object to the device use: g()
To move an object to the host use: h()
Objects on the device can be manipulated in much the same way as other R objects.
A list of help topics may be obtained using: help(package="gmatrix")
References:
[2] Parallel Programming Guide
[3] GPU R: https://github.com/njm18/gmatrix
[4] Flavors of R: UChicago
[5] GitHub site of Jean-Eudes DAZARD, PhD