Apptainer
Introduction
Apptainer -- formerly called Singularity -- is used to create portable application containers. Red Hen makes use of HPC facilities in three locations: UCLA, Case and RRZE. To facilitate the exchange of pipelines and to minimize setup time, software that is to be run at multiple HPC centers, is difficult to install, or has esoteric dependencies (e.g. old versions of libraries, or libraries too large to be sensibly installed on HPC systems) can be installed within Apptainer containers. Coders in Red Hen Google Summer of Code are expected to house their final product in an Apptainer container. Please leave time to put everything on CWRU HPC in your gallina home. The purpose of this document is to describe how to set up such a container.
Apptainer images can be built automatically with GitHub tooling or manually on a desktop computer; both methods are described below.
Related pages
Beware: The documentation below here may be out of date.
Building Apptainer Images with Docker and GitHub tooling
The current recommended way of creating an Apptainer image is to use a Dockerfile, GitHub actions, and the GitHub container registry. We will:
Create a Dockerfile specifying the base for our container and the steps to build it.
Create a GitHub actions configuration file to build the image and push it to GitHub container registry (this file is boilerplate which must be copy-pasted for each project).
Pull and run the image with Singularity. It is possible to use Docker to build and Singularity to run container images because Docker images are OCI images, an open standard supported by many container runtimes, including Singularity.
(Previously it was possible to build Singularity images automatically using Singularity definition files and Singularity Hub; however, this is no longer the case. More information justifying the use of Dockerfiles and OCI containers is given here.)
The advantage of using a Dockerfile is that it ensures your container can be built reproducibly, enabling close collaboration with others. People with different areas of expertise can rapidly contribute to a shared project. In addition, the images are built automatically with GitHub Actions and can be downloaded to your Linux server with a simple command. It is also possible to run Singularity images on Mac and Windows laptops, but the procedure is still convoluted. Native support for OS X is in the planning stage. However, you may be able to run the resulting images with Docker instead for local usage.
Create a Dockerfile with the recipe to build the container
First create a GitHub repository with a Dockerfile recipe. If there is already a Dockerfile in the repository you can name it something else e.g. Dockerfile.deepspeech or Dockerfile.shotsegmentor. The complete reference for Dockerfiles is here.
Here is an example of installing some common Python dependencies for Red Hen projects in a container starting from a Debian base. It assumes you have a Python package mypackage with a script script.py, and sets this as the entrypoint so that the script is executed when the container is invoked with singularity run.
FROM debian:bullseye-slim
RUN apt-get update
RUN apt-get install --assume-yes --no-install-recommends --quiet \
    python3 \
    python3-pip \
    ffmpeg
RUN pip3 install --no-cache-dir --upgrade pip setuptools
RUN pip3 install numpy \
    scikit-learn \
    torch \
    nltk \
    matplotlib \
    h5py \
    opencv-python
ADD . /mypackage/
ENTRYPOINT ["python3", "-m", "mypackage.script"]
As more Red Hen projects begin to adopt Dockerfiles, you can use the existing ones as examples to start from. If you can edit this page, please link to any examples you know.
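Before wiring up the automated build, it can save time to verify that the Dockerfile builds locally. This is a sketch; it assumes Docker is installed on your machine, and the tag name mypackage-test is illustrative:

```shell
# Build the image locally from the repository root (tag is illustrative):
docker build -t mypackage-test .
# Run the entrypoint once to verify the container starts at all:
docker run --rm mypackage-test --help
```

If the local build succeeds, the GitHub Actions build described below should succeed with the same Dockerfile.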
Setting up GitHub actions and GitHub container registry
Next create a file in your GitHub repository .github/workflows/docker-publish.yml containing the following:
name: Docker

on:
  push:
  pull_request:

jobs:
  build_publish_containers:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v2
        with:
          submodules: recursive
      - name: Build/push
        uses: whoan/docker-build-with-cache-action@v5
        with:
          registry: ghcr.io
          image_name: ${{ github.actor }}/$__IMAGE_NAME__$
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
Replace $__IMAGE_NAME__$ with whatever you want the image to be called. If you already have a file called docker-publish.yml, you can add an extra job to it, or add another file with a different name in the same directory. If you called your Dockerfile something else, you need to add e.g. dockerfile: Dockerfile.shotsegmentor under the with: section of the Build/push step.
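One pitfall worth knowing: OCI image references must be all lowercase, while GitHub usernames may contain capitals. If the push to ghcr.io fails with an invalid-reference error, lowercase the name first. A bash sketch (the username and image name are illustrative):

```shell
#!/bin/bash
# OCI registries such as ghcr.io reject image references containing
# uppercase letters, so lowercase the GitHub username before composing one.
ACTOR="RedHenLab"                          # illustrative username
IMAGE_NAME="shotsegmentor"                 # illustrative image name
IMAGE="ghcr.io/${ACTOR,,}/${IMAGE_NAME}"   # ${var,,} lowercases (bash 4+)
echo "$IMAGE"                              # ghcr.io/redhenlab/shotsegmentor
```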
After you commit and push the Dockerfile and GitHub actions yml file you can see how the build goes by looking in the Actions tab in the GitHub interface:
In the Actions tab you can see your build and its status: yellow = running; green = succeeded; red = failed. Once the build runs successfully, you should see the Docker image in the Packages section of the right sidebar of your GitHub repo:
Download the image
If you are running on HPC make sure to load the newest version of Singularity as your first step:
$ module load singularity/3.8.1
To download an automatically built image to a laptop or desktop, you can use either Singularity or Docker; on a high-performance computing cluster node you can only use Singularity. For Docker instructions, see elsewhere. For Singularity, issue:
$ singularity pull image.sif docker://ghcr.io/$__USER__$/$__IMAGE_NAME__$
On the Case high-performance computing cluster, make sure you use Gallina, Red Hen's server inside the HPC, which is always reachable from your home directory on the login nodes. The Singularity image should never be downloaded to your home directory on a login node.
$ cd /mnt/rds/redhen/gallina/Singularity/
$ singularity pull image.sif docker://ghcr.io/$__USER__$/$__IMAGE_NAME__$
Progress |===================================| 100.0%
Done. Container is at:
/mnt/rds/redhen/gallina/Singularity/image.sif
On the Case HPC cluster, use the Gallina server (/mnt/rds/redhen/gallina/Singularity) for images—we have massive amounts of storage available there, but a strictly limited quota in our home directories.
Run the image
To enter a shell within the Singularity container, use the -e switch to avoid importing your local environment and the -H switch to set the home directory to the current directory:
$ singularity shell -e -H `pwd` image.sif
Singularity: Invoking an interactive shell within container...
Run the image in an HPC job
1. Request a node with a GPU:
$ srun -p gpu -C gpuk40 --gres=gpu:1 --pty bash
2. Load the Singularity software module (the version number may be important; the latest version should load by default):
$ module load singularity/3.8.1
3. Run the container:
$ singularity run image.sif
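For non-interactive work, the same three steps can be combined into a Slurm batch script submitted with sbatch. This is a sketch: the partition, GPU constraint, time limit, and image path are illustrative, and the --nv flag makes the host's NVIDIA driver and GPU devices visible inside the container:

```shell
#!/bin/bash
#SBATCH -p gpu                 # GPU partition (illustrative)
#SBATCH -C gpuk40              # constrain to K40 nodes (illustrative)
#SBATCH --gres=gpu:1           # request one GPU
#SBATCH --time=02:00:00        # wall-clock limit (illustrative)

module load singularity/3.8.1
# --nv exposes the host NVIDIA libraries and devices to the container
singularity run --nv /mnt/rds/redhen/gallina/Singularity/image.sif
```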
Other Singularity tips
This tutorial has shown how to build Singularity containers. This blog post outlines a few other techniques for using Singularity efficiently, such as using bind mounts to avoid needless container rebuilds and using monolithic containers to simplify your workflow management.
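As a concrete illustration of the bind-mount tip, data can be mounted into a running container instead of being baked into the image, so the image never needs rebuilding when the data changes. A sketch (the host path, mount point, and image name are illustrative):

```shell
# -B host_path:container_path makes the host directory visible inside the
# container at the given mount point, with no image rebuild required.
singularity exec -B /mnt/rds/redhen/gallina/mydata:/data image.sif ls /data
```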
Desktop Build Procedure
This section is out of date. Later it will be updated to fit with the new recommended Dockerfile based build approach.
Prerequisites
You will need a machine on which you have root rights, sufficient free space on the hard disk (5 GB plus whatever your application needs should suffice for most tasks), and sufficient RAM to build the software (2 GB is not enough for TensorFlow; 6 GB is). This can be a virtual machine, for which a 20 GB hard disk is recommended.
This document assumes you are running Ubuntu 16.04 LTS.
Installing Singularity
Follow the instructions for Linux or for Mac. If you have Windows, you will have to use a virtual machine with Linux installed. [Todo: provide virtual machine]
Copy-Paste for this step in Linux:
VERSION=2.5.1
wget https://github.com/singularityware/singularity/releases/download/$VERSION/singularity-$VERSION.tar.gz
tar xvf singularity-$VERSION.tar.gz
cd singularity-$VERSION
./configure --prefix=/usr/local
make
sudo make install
Alternatively, you can install the packaged version, in Debian/Raspbian called singularity-container.
Creating an image
The Quickstart Guide covers image creation well, so there is no need to repeat it here. The only issue you may encounter is that SquashFS containers (the default format in Singularity 2.4 and above) cannot be created on NFS mounts (e.g. your home directory on the HPC system). Instead, you will have to build on a physical disk (try /tmp).
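To check whether a candidate build location is on NFS before starting, you can inspect its filesystem type with standard coreutils. A sketch (/tmp is just an example path):

```shell
# Print the filesystem type of the target directory; "nfs" or "nfs4"
# means the SquashFS build will fail there, so pick a local disk instead.
df -PT /tmp | awk 'NR==2 {print $2}'
```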
The following command is outdated and is here only for documentation of our earlier processes.
sudo singularity create --size 10000 mycontainer.img
This creates an empty image of roughly 10 GB. Note that this is a sparse file, so it will not take up as much space initially, even if it looks like it:
redhen@server:~$ sudo singularity create --size 10000 singularity_containers/mycontainer.img
Creating a new image with a maximum size of 10000MiB...
Executing image create helper
Formatting image with ext3 file system
Done.
redhen@server:~$ ls -lh singularity_containers/mycontainer.img
-rwxr-xr-x 1 root root 9,8G Dez 22 15:18 singularity_containers/mycontainer.img
redhen@server:~$ du -h singularity_containers/mycontainer.img
289M singularity_containers/mycontainer.img
redhen@server:~$
Unfortunately, sparse files can inflate when they are copied (use rsync --sparse, not scp) or when they are stored on NFS mounts. You should therefore expect your sparse file to reach its full size eventually, and avoid unnecessarily large containers.
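The sparse-file behaviour shown above is easy to reproduce without Singularity. A sketch (the file name and size are arbitrary):

```shell
# Create a 100 MB sparse file: the apparent size is 100M, but almost no
# disk blocks are allocated until data is actually written into it.
truncate -s 100M sparse_demo.img
ls -lh sparse_demo.img    # reports the apparent size: 100M
du -h sparse_demo.img     # reports actual usage: close to 0
rm sparse_demo.img
```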
Filling the image
To create a Debian or Ubuntu image, you have to install debootstrap:
sudo apt-get install debootstrap
Next you will have to create a bootstrap definition. You can use the one attached to this page, which is based on an example from the Singularity website. The relevant parts read as follows; explanations are given in the comments:
BootStrap: debootstrap # The bootstrap command used
OSVersion: xenial # This is Ubuntu 16.04 LTS, codename Xenial Xerus
MirrorURL: http://us.archive.ubuntu.com/ubuntu/ # This is where the process can find the data needed. If you are in Europe, you can choose another mirror at https://launchpad.net/ubuntu/+archivemirrors or just use the archive provided by FAU: http://ftp.fau.de/ubuntu/
%runscript
echo "This is what happens when you run the container..." # Replace this with the default command including all necessary options, so that running the image will directly run the software for which you built the container.
%post
# Everything in this section will be executed after bootstrapping the operating system is complete
sed -i 's/$/ universe/' /etc/apt/sources.list # Some software is only available via the universe repository, so this will be added to apt's sources.list
apt-get -y update # This makes sure the sources from the universe repository are loaded
apt-get -y install nano git # Change this at your leisure. Neither of them is necessary.
# Add all commands needed to install the software
While it is possible to install software interactively, it is generally recommended to do so via the definition file to make transparent and reproducible what was actually done to arrive at the image.
Example basic Singularity recipe, assuming you have root access
sudo singularity build --sandbox abstract_pipeline_v1 abstract_pipeline_v1.def
This creates a directory abstract_pipeline_v1, which is the “container.” abstract_pipeline_v1.def is a file whose contents are:
BootStrap: debootstrap
OSVersion: xenial
#MirrorURL: http://ftp.fau.de/ubuntu/
# Use the following URL in the US:
MirrorURL: http://us.archive.ubuntu.com/ubuntu/
%runscript
#!/bin/bash
cd /localdata/abstracts/
./annotate_all.sh "$@"
%post
/bin/bash
echo "Hello from inside the container"
sed -i 's/$/ universe/' /etc/apt/sources.list
apt-get update
apt-get upgrade -y
apt-get install -y --force-yes git wget nano autoconf automake bzip2 python3-dev unzip python3-pip subversion python
apt-get -y autoremove
mkdir /localdata
mkdir /localdata/abstracts
#cd /localdata/abstracts
#git clone WHATEVER
#wget WHATEVER
#./configure
#make
#make install
chmod a+w /.singularity.d/runscript
You can shell into the container as follows (-w means "writable"):
sudo singularity shell -w abstract_pipeline_v1
To create the final portable unchangeable image (after training etc.):
sudo singularity build abstract_pipeline_v1_production.simg abstract_pipeline_v1
To run the image:
./abstract_pipeline_v1_production.simg FILE_TO_BE_ANNOTATED > OUTFILE.TXT