The HPC environment includes a library of software, installed system-wide, that is managed through the module system. This library generally focuses on providing up-to-date versions of lower-level scientific software dependencies, such as FFTW3, BLAS, or SciPy, upon which many higher-level software systems rely. As a result, researchers must sometimes install the software required for a research task themselves. Although the operating system directories of the HPC systems are read-only, there are many ways to install software that support batch and interactive work on the cluster. This guide outlines the different approaches and when each may be most appropriate, and includes example commands that can be used as templates or starting points.
Commands are blue
File system paths are red
Variable text (changes relative to task) is underlined
Building software can be resource-intensive, and it is generally better to build on a compute node, in the same environment where you expect to run*. For Markov (the class cluster), use the appropriate partition (e.g. -p markov_cpu) and class account (e.g. -A csds438).
salloc -N 1 -c 4 --mem=4g srun --pty /bin/bash
* An exception is addressed under the R package section where it is sometimes necessary to build on a login node if OpenSSL development files are required.
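As a minimal sketch of how the partition and account options fit into the request above, the helper below assembles the full salloc command and prints it for review before you run it. The function name build_request is hypothetical and used only for illustration; the partition and account values are the examples given above, and the core and memory values match the earlier command.

```shell
# Hypothetical helper: print an interactive build request for review.
# $1 = partition, $2 = account, $3 = cores, $4 = memory
build_request() {
    echo "salloc -p $1 -A $2 -N 1 -c $3 --mem=$4 srun --pty /bin/bash"
}

# Example values taken from the text above; substitute your own.
build_request markov_cpu csds438 4 4g
# prints: salloc -p markov_cpu -A csds438 -N 1 -c 4 --mem=4g srun --pty /bin/bash
```

Printing the command first makes it easy to check the variable parts before requesting resources; once it looks right, paste the printed line into the terminal.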
Checking dependencies before installing is particularly important for software that is very new, or that has not had a new release within the last 3 years. Review any information provided by the software authors to determine which dependencies are already satisfied and which are missing, and use that to decide how best to install the software:
No dependencies or all dependencies satisfied = Try to install directly
Unmet software dependencies = Try to provide dependencies through modules and/or virtual environment
Unmet operating system dependencies = Try to install using a container (don’t mix this with modules)
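One rough way to see which dependencies are unmet is to check whether the tools named in the software's documentation are already on your PATH. The sketch below assumes the dependencies are command-line tools; it does not detect missing libraries or headers. The function name missing_commands is hypothetical, and cmake and gfortran are only illustrative dependency names.

```shell
# Hypothetical helper: print each required command that is NOT on the PATH.
missing_commands() {
    for cmd in "$@"; do
        # command -v succeeds only when the shell can find the command
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Illustrative dependency names; replace with those listed by the software authors.
missing_commands cmake gfortran
```

If nothing is printed, the listed tools are all available and a direct install is worth trying. Any name that is printed is a candidate to provide through a module or a virtual environment.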
EasyBuild - EasyBuild (EB) is a software build and installation framework that allows you to manage (scientific) software on High Performance Computing (HPC) systems in an efficient way.
Binary Files - The application is distributed in a machine readable form that is invoked directly on the command line (like typing ls in the terminal)
Java - The application is distributed as one or more JAR files that are invoked via java -jar filename.jar
Modules - We want to add our own module that can be managed via module load modulename
Python - The application is distributed as a Python package available via PyPI or source
R - The application is distributed as an R package available via URL or CRAN