Step 0: To use R on the FASRC cluster, load the appropriate version available via our module system. See the modules list for available versions.
You should first have taken our Introduction to the FASRC training and be familiar with running jobs on the cluster.
An interactive job is the best way to get a test environment while you are still developing your scripts.
salloc -p test --mem 1000 -t 30
To submit R jobs to the cluster via SLURM, the R command in your SLURM batch file should be in the format:
R CMD BATCH --quiet --no-restore --no-save scriptfile outputfile
where
--quiet
silences the startup messages so that they won’t appear in your output
--no-restore
does not restore the R workspace at startup
--no-save
does not save your R batch environment at exit
scriptfile
is your R script
outputfile
is where all output will be sent
If you wish to pass along command line arguments in your SLURM batch script, you need to use the format:
R CMD BATCH --no-save --no-restore '--args a=1 b=c(2,5,6)' test.R test.out
and include the following lines in your R script:
## First read in the arguments listed at the command line
args <- commandArgs(TRUE)
## args is now a character vector with one element per argument (e.g. "a=1")
## First check to see if arguments are passed.
## Then cycle through each element and evaluate it as an R expression.
if (length(args) == 0) {
  print("No arguments supplied.")
  ## supply default values
  a <- 1
  b <- c(1, 1, 1)
} else {
  for (i in seq_along(args)) {
    eval(parse(text = args[[i]]))
  }
}
print(a)
print(b)
Your output file test.out should have the following lines in it:
> print(a)
[1] 1
> print(b)
[1] 2 5 6
More examples and detail can be found at this helpful Stack Overflow webpage and the R doc pages.
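The quoting in the command above is easy to get wrong once the values come from shell variables. The following is a minimal sketch, assuming the same test.R and test.out names used above; the variable names a, b, and r_args are chosen here for illustration only. It builds the argument string and prints the resulting command line rather than running R.

```shell
#!/bin/bash
# Values to pass to the R script. b is an R expression, so it is kept
# as a single shell string (the parentheses must not reach the shell unquoted).
a=1
b='c(2,5,6)'

# Everything after --args has to arrive in R CMD BATCH as ONE word,
# so the whole string is assembled first and quoted when used.
r_args="--args a=${a} b=${b}"

# Print the command line for inspection (dry run, R is not started here).
echo "R CMD BATCH --no-save --no-restore '${r_args}' test.R test.out"
```

In a real batch script the last line would instead be R CMD BATCH --no-save --no-restore "$r_args" test.R test.out, with the double quotes keeping the argument string together.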
You can also use the Rscript command. For the differences between R CMD BATCH and Rscript, please consult the O’Reilly book R Cookbook, available at O’Reilly Books Online for Harvard (valid Harvard ID required).
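As a rough illustration of the Rscript form, the sketch below passes the same two values to the test.R script shown earlier. With Rscript, arguments placed after the script name are returned by commandArgs(TRUE), so no --args wrapper is needed. The file names test.R and test.out and the variable names arg_a and arg_b are assumptions carried over for illustration; the block falls back to a dry run when Rscript or test.R is not available.

```shell
#!/bin/bash
# Each argument is its own shell word; b is quoted because of the parentheses.
arg_a='a=1'
arg_b='b=c(2,5,6)'

if command -v Rscript >/dev/null 2>&1 && [ -f test.R ]; then
  # Unlike R CMD BATCH, Rscript writes to stdout/stderr, so redirect explicitly.
  Rscript test.R "$arg_a" "$arg_b" > test.out 2>&1
else
  # Dry run: show what would be executed once an R module is loaded.
  echo "dry run: Rscript test.R '$arg_a' '$arg_b' > test.out 2>&1"
fi
```

Note that Rscript does not echo the commands into the output the way R CMD BATCH does, so test.out would contain only the printed values.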
To run an R script as a SLURM batch job, create a file named R.batch from the following template. Adjust the requested runtime, memory, and other settings to fit your needs.
#!/bin/bash
#SBATCH -c 1 # Number of cores
#SBATCH -t 0-00:10 # Runtime in D-HH:MM, minimum of 10 minutes
#SBATCH -p shared # Partition to submit to
#SBATCH --mem=1000 # Memory pool for all cores (see also --mem-per-cpu)
#SBATCH -o myRjob_%j.out # File to which STDOUT will be written, %j inserts jobid
#SBATCH -e myRjob_%j.err # File to which STDERR will be written, %j inserts jobid
module load R #Load R module
R CMD BATCH --quiet --no-restore --no-save scriptfile outputfile
To submit the script you created, run sbatch R.batch
If you need to submit a large number of files (e.g. varying the parameters for jobs submitted), please see our documentation on Submitting Large Numbers of Files to the Cluster.
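For a small parameter sweep, one simple approach (not necessarily the one recommended in the linked documentation) is to generate one batch file per parameter value from the template above. The sketch below is a dry run under stated assumptions: the parameter values 1 2 3, the file names R_a*.batch and test_a*.out, and the temporary working directory are all hypothetical. It writes the batch files and prints the sbatch commands without submitting anything.

```shell
#!/bin/bash
# Scratch directory for the generated batch files (hypothetical location).
workdir=$(mktemp -d)

for a in 1 2 3; do
  # Unquoted EOF so that ${a} is expanded while the file is written.
  cat > "$workdir/R_a${a}.batch" <<EOF
#!/bin/bash
#SBATCH -c 1
#SBATCH -t 0-00:10
#SBATCH -p shared
#SBATCH --mem=1000
#SBATCH -o myRjob_%j.out
#SBATCH -e myRjob_%j.err
module load R
R CMD BATCH --quiet --no-restore --no-save '--args a=${a}' test.R test_a${a}.out
EOF
  # Dry run: print the submission command instead of calling sbatch.
  echo "would run: sbatch $workdir/R_a${a}.batch"
done
```

Each generated file gives its R job a separate output file, so results from different parameter values do not overwrite each other.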
When you load R through the Lmod module system, hundreds of common packages are already installed. The list is available here. However, if you need to install new packages locally, the process is fairly straightforward.
See also: R – Basics
Before attempting to install your own R packages, you will first need to create a directory for your local R package installs to live in. You’ll only need to do this once for each version of R you use. This is the path you will then point the R_LIBS_USER variable to.
mkdir -pv ~/apps/R_version
It’s highly recommended that you “tag” your package folder with the specific version of R you used to install the packages, so that you don’t later forget and accidentally use them with a different version of R.
The R_LIBS_USER environment variable is used by R to determine where packages you install should be located when the install.packages() function is called and when you later use them. It is set using:
export R_LIBS_USER=$HOME/apps/R_version:$R_LIBS_USER
Note: You can also add this to your .bashrc if you wish, but we recommend calling it directly after loading the module in your scripts or when running R interactively. This ensures that your local library is the first one R checks when installing and loading packages.
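The version tagging and the export can be combined so that the library path always follows the R version you name. This is a minimal sketch, assuming R version 3.5.1 and the ~/apps/R_version layout described above; the variable names r_version, lib_dir, and first_entry are introduced here for illustration. It also avoids leaving a trailing colon when R_LIBS_USER was previously unset.

```shell
#!/bin/bash
# Assumed version; set this to match the R module you load.
r_version="3.5.1"
lib_dir="$HOME/apps/R_${r_version}"

# Warn (but do not fail) if the directory has not been created yet.
if [ ! -d "$lib_dir" ]; then
  echo "Note: $lib_dir does not exist yet; create it with mkdir -pv first." >&2
fi

# Prepend the local library; the ':old_value' suffix is added only
# when R_LIBS_USER already had a non-empty value.
export R_LIBS_USER="${lib_dir}${R_LIBS_USER:+:$R_LIBS_USER}"

# The first entry is the library R will check first.
first_entry="${R_LIBS_USER%%:*}"
echo "First library path: $first_entry"
```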
To install packages, you will need to load an R module, set your R_LIBS_USER variable, and run R. We recommend choosing a specific R module rather than simply using module load R. Look up available R modules here: https://portal.rc.fas.harvard.edu/apps/modules/R. Example:
module load R/3.5.1-fasrc01
export R_LIBS_USER=$HOME/apps/R_3.5.1:$R_LIBS_USER
R
Now when you use R’s install.packages() function, the package will be installed in the specified directory.
Examples:
install.packages("ape") (You will be asked to pick a mirror site to download from)
install.packages("ape", repos="http://cran.r-project.org") (You can also specify a mirror)
Example
This example requests an interactive job, loads the R module, sets the library path for your R packages, starts the R shell, and finally installs an R package.
[user@rclogin ~]$ salloc -p test -t 60 -n1 --mem 4000
[user@computenode ~]$ module load R/3.5.1-fasrc01
[user@computenode ~]$ mkdir -pv ~/apps/R_3.5.1
mkdir: created directory ‘/n/home00/user/apps’
mkdir: created directory ‘/n/home00/user/apps/R_3.5.1’
[user@computenode ~]$ export R_LIBS_USER=$HOME/apps/R_3.5.1:$R_LIBS_USER
[user@computenode ~]$ R --quiet
> install.packages('ape',repos="http://cran.r-project.org")
Installing package into ‘/n/home00/user/apps/R_3.5.1’
trying URL 'http://cran.r-project.org/src/contrib/ape_5.2.tar.gz'
Content type 'application/x-gzip' length 790069 bytes (771 KB)
==================================================
downloaded 771 KB
* installing *source* package ‘ape’ ...
... omitted output ...
** testing if installed package can be loaded
* DONE (ape)
Installing sp, rgdal, rgeos, and sf
For the packages sp, rgdal, rgeos, and sf, refer to our documentation on the FASRC GitHub.
Available packages
List of available packages in the R-project repository.
Installed packages
To see the installed packages in the R shell:
> installed.packages()
...
Version Priority
ADGofTest "0.3" NA
AnDE "1.0" NA
BB "2014.1-1" NA
Brobdingnag "1.2-4" NA
CpGassoc "2.11" NA
DBI "0.2-7" NA
DEoptimR "1.0-1" NA
Defaults "1.1-1" NA
FNN "1.1" NA
Formula "1.1-1" NA
... omitted output ...
spatial "7.3-8" "recommended"
splines "3.1.0" "base"
stats "3.1.0" "base"
stats4 "3.1.0" "base"
survival "2.37-7" "recommended"
tcltk "3.1.0" "base"
tools "3.1.0" "base"
utils "3.1.0" "base"
R parallel packages
In the FASRC GitHub documentation, you can find a brief explanation of parallel R packages and a few examples.
Hard-to-install packages
Some R packages have many dependencies and/or require additional software to be installed on the cluster (e.g. protobuf, geojsonio). Properly configuring these installs with R can become problematic. To overcome that, we documented how to install R packages within a Singularity container.
Inside your R session you can interact with the module system by using the module function provided by the script in /n/helmod/apps/lmod/7.7.32/init/R.
For example:
[user@boslogin02 ~]$ srun --pty -t 60 --mem 2000 -p test /bin/bash
[user@holy7c19316 scratch]$ module load R/3.5.1-fasrc01
[user@holy7c19316 scratch]$ R --quiet
> source("/n/helmod/apps/lmod/7.7.32/init/R")
> module('load','bcftools')
...omitted lot of output ...
> system('bcftools --version')
bcftools 1.5
Using htslib 1.5