Apptainer

Introduction

Apptainer -- formerly called Singularity -- is used to create portable applications. Red Hen makes use of HPC facilities in three locations: UCLA, Case and RRZE. To facilitate the exchange of pipelines and to minimize setup time, software that is to be run at multiple HPC centers, is difficult to install, or has esoteric dependencies (e.g. old versions of libraries, or libraries so large that they cannot be sensibly installed on HPC systems) can be installed within Apptainer containers. Coders in Red Hen Google Summer of Code are expected to house their final product in an Apptainer container. Please leave time to put everything on CWRU HPC in your gallina home. The purpose of this document is to describe how to set up such a container.

Apptainer images can be built automatically with Docker and GitHub tooling or manually on a desktop computer; both methods are described below.


Beware: the documentation below may be out of date.

Building Apptainer Images with Docker and GitHub tooling

The current recommended way of creating an Apptainer image is to use a Dockerfile, GitHub Actions, and the GitHub Container Registry. We will create a Dockerfile recipe, set up GitHub Actions to build the image and publish it to the registry, and then download and run the image with Singularity.


(Previously it was possible to automatically build Singularity images using Singularity definition files and SingularityHub, but this is no longer the case. More information justifying the use of Dockerfiles and OCI containers is given here.)

The advantage of using a Dockerfile is that it ensures your container can be built reproducibly, enabling close collaboration with others: people with different areas of expertise can rapidly contribute to a shared project. In addition, the images are built automatically with GitHub Actions and can be downloaded to your Linux server with a simple command. It is also possible to run Singularity images on Mac and Windows laptops, but the procedure is still convoluted; native support for OS X is in the planning stage. However, you may be able to run the resulting images with Docker instead for local usage.
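
For instance, once the image has been published (as described below), a rough local test with Docker looks like this (a sketch; the $__USER__$ and $__IMAGE_NAME__$ placeholders match the pull commands later on this page):

$ docker pull ghcr.io/$__USER__$/$__IMAGE_NAME__$
$ docker run --rm ghcr.io/$__USER__$/$__IMAGE_NAME__$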

Create a Dockerfile with the recipe to build the container

First create a GitHub repository with a Dockerfile recipe. If there is already a Dockerfile in the repository, you can name it something else, e.g. Dockerfile.deepspeech or Dockerfile.shotsegmentor. The complete reference for Dockerfiles is here.

Here is an example of installing some common Python dependencies for Red Hen projects in a container starting from a Debian base. It assumes you have a Python package mypackage with a script script.py, and sets this as an entrypoint so that the script runs when the container is invoked with singularity run.

FROM debian:bullseye-slim

# Install Python and ffmpeg from the Debian repositories
RUN apt-get update
RUN apt-get install --assume-yes --no-install-recommends --quiet \
    python3 \
    python3-pip \
    ffmpeg

# Upgrade the Python packaging tools
RUN pip3 install --no-cache-dir --upgrade pip setuptools

# Install the Python dependencies
RUN pip3 install numpy \
    scikit-learn \
    torch \
    nltk \
    matplotlib \
    h5py \
    opencv-python

# Copy the package into the image
ADD . /mypackage/

# Run the package's script when the container is invoked
ENTRYPOINT ["python3", "-m", "mypackage.script"]

As more Red Hen projects begin to adopt Dockerfiles, you can use the existing ones as examples to start from. If you can edit this page, please link to any examples you know.

Setting up GitHub Actions and the GitHub Container Registry

Next, create a file in your GitHub repository at .github/workflows/docker-publish.yml containing the following:

name: Docker

on:
  push:
  pull_request:

jobs:
  build_publish_containers:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repo
        uses: actions/checkout@v2
        with:
          submodules: recursive

      - name: Build/push
        uses: whoan/docker-build-with-cache-action@v5
        with:
          registry: ghcr.io
          image_name: ${{ github.actor }}/$__IMAGE_NAME__$
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

Replace $__IMAGE_NAME__$ with whatever you want the image to be called. If you already have a file called docker-publish.yml, you can add an extra job to it, or add another file with a different name in the same directory. If you called your Dockerfile something else, then you need to add e.g. dockerfile: Dockerfile.shotsegmentor under the with: section of the Build/push step, as shown below.
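
For example, with a renamed Dockerfile the Build/push step might look like this (illustrative; only the dockerfile line differs from the workflow above):

      - name: Build/push
        uses: whoan/docker-build-with-cache-action@v5
        with:
          registry: ghcr.io
          dockerfile: Dockerfile.shotsegmentor
          image_name: ${{ github.actor }}/$__IMAGE_NAME__$
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}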

After you commit and push the Dockerfile and the GitHub Actions yml file, you can see how the build goes by looking in the Actions tab of the GitHub interface.

In the Actions tab you can see your build and its status: yellow = running; green = has run successfully; red = has run and failed. Once it runs successfully, you should see the Docker image in the Packages section of the right sidebar of your GitHub repo.

Download the image

If you are running on HPC, make sure to load the newest version of Singularity as your first step:

$ module load singularity/3.8.1

To download an automatically built image to your user on a laptop or desktop, you can use either Singularity or Docker. On a high-performance computing cluster node you can only use Singularity. For Docker instructions, see elsewhere. For Singularity, issue:

$ singularity pull image.sif docker://ghcr.io/$__USER__$/$__IMAGE_NAME__$
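
If the GitHub package is private, Singularity needs registry credentials. One approach (assuming Singularity 3.x) is to supply them through environment variables, with a GitHub personal access token as the password:

$ export SINGULARITY_DOCKER_USERNAME=$__USER__$
$ export SINGULARITY_DOCKER_PASSWORD=<personal access token>
$ singularity pull image.sif docker://ghcr.io/$__USER__$/$__IMAGE_NAME__$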

On the Case high-performance computing cluster, make sure you use Gallina, Red Hen's storage server inside the HPC, which is always reachable from your home directory on the login nodes. The Singularity image should never be downloaded to your home directory on a login node.

$ cd /mnt/rds/redhen/gallina/Singularity/

$ singularity pull image.sif docker://ghcr.io/$__USER__$/$__IMAGE_NAME__$

Progress |===================================| 100.0% 

Done. Container is at: 

/mnt/rds/redhen/gallina/Singularity/image.sif

On the Case HPC cluster, use the Gallina server (/mnt/rds/redhen/gallina/Singularity) for images: we have massive amounts of storage available there, but a strictly limited quota in our home directories.

Run the image

To enter a shell within the Singularity container, use the -e switch to avoid importing your local environment and the -H switch to set the home directory to the current directory:

$ singularity shell -e -H `pwd` image.sif

Singularity: Invoking an interactive shell within container...
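
To run a single command inside the container instead of an interactive shell, singularity exec takes the same switches; for example (python3 --version is just an illustration):

$ singularity exec -e -H `pwd` image.sif python3 --version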

Remember that on the Case HPC, home directories have a limited storage quota, while gallina has more or less unlimited space, so work from gallina.

Run the image in an HPC job

1. Request a node with a GPU: 

$ srun -p gpu -C gpuk40 --gres=gpu:1 --pty bash

2. Load the Singularity software module (the version number may be important; the latest version should load by default):

$ module load singularity/3.8.1

3. Run the container:

$ singularity run image.sif
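
For non-interactive jobs, the same steps can be put in a batch script and submitted with sbatch. A minimal sketch (the partition, constraint, and GPU request mirror the interactive example above; the time limit is illustrative, and --nv exposes the node's NVIDIA GPU inside the container):

#!/bin/bash
#SBATCH -p gpu
#SBATCH -C gpuk40
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
module load singularity/3.8.1
singularity run --nv image.sif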

Other Singularity tips

This tutorial has shown how to build Singularity containers. This blog post outlines a couple of other things you might like to do to use Singularity efficiently, like using bind mounts to avoid needless container rebuilds and using monolithic containers to simplify your workflow management.
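
For example, a bind mount makes host data visible inside the container without rebuilding the image (the paths are illustrative; programs in the container then see the gallina tree under /data):

$ singularity run -B /mnt/rds/redhen/gallina:/data image.sif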

Desktop Build Procedure

This section is out of date. It will later be updated to fit with the new recommended Dockerfile-based build approach.

Prerequisites

You will need a machine on which you have root rights, sufficient free space on the hard disk (5 gigabytes plus whatever your application needs should do for most tasks), and sufficient RAM to build the software (2 gigabytes is not enough for TensorFlow; 6 gigabytes is). This can be a virtual machine, for which a 20 gigabyte hard disk is recommended.

This document assumes you are running Ubuntu 16.04 LTS.

Installing Singularity

Follow the instructions for Linux or for Mac. If you have Windows, you will have to use a virtual machine with Linux installed. [Todo: provide virtual machine]

Copy-Paste for this step in Linux:

VERSION=2.5.1

wget https://github.com/singularityware/singularity/releases/download/$VERSION/singularity-$VERSION.tar.gz

tar xvf singularity-$VERSION.tar.gz

cd singularity-$VERSION

./configure --prefix=/usr/local

make

sudo make install

Alternatively, you can install the packaged version, which in Debian/Raspbian is called singularity-container.

Creating an image

In fact, the Quickstart Guide is just perfect, so there is no need to repeat it here. The only issue you may encounter is that SquashFS containers (the default format in Singularity 2.4 and above) cannot be created on NFS mounts (e.g. your home directory in the HPC system). Instead, you will have to build them on a physical disk (try /tmp).
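
For example, to build on a physical disk (a sketch, assuming Singularity 2.4+ and a definition file mycontainer.def like the ones described below):

$ cd /tmp
$ sudo singularity build mycontainer.simg mycontainer.def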

The following command is outdated and is here only for documentation of our earlier processes.

sudo singularity create --size 10000 mycontainer.img

This creates an empty image of roughly 10 gigabytes. Note that this is a sparse file, so initially it will not take up as much space as it appears to:

redhen@server:~$ sudo singularity create --size 10000 singularity_containers/mycontainer.img

Creating a new image with a maximum size of 10000MiB...

Executing image create helper

Formatting image with ext3 file system

Done.

redhen@server:~$ ls -lh singularity_containers/mycontainer.img

-rwxr-xr-x 1 root root 9.8G Dec 22 15:18 singularity_containers/mycontainer.img

redhen@server:~$ du -h singularity_containers/mycontainer.img

289M    singularity_containers/mycontainer.img

redhen@server:~$

Unfortunately, sparse files can inflate when they are copied (use rsync with its --sparse option, not scp) or when they are stored on NFS mounts. You should therefore expect your sparse file to end up at its full size eventually, and thus avoid unnecessarily large containers.
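
For example, rsync can preserve sparseness during a copy (the host and destination path are hypothetical):

$ rsync --sparse mycontainer.img user@server:/path/to/destination/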

Filling the image

To create a Debian or Ubuntu image, you have to install debootstrap:

sudo apt-get install debootstrap

Next you will have to create a bootstrap definition. You can use the one attached to this page, which is based on an example found on the Singularity website. The relevant parts read as follows; explanations are given in the comments:

BootStrap: debootstrap # The bootstrap command used
OSVersion: xenial # This is Ubuntu 16.04 LTS, codename Xenial Xerus
MirrorURL: http://us.archive.ubuntu.com/ubuntu/ # This is where the process can find the data needed. If you are in Europe, you can choose another mirror at https://launchpad.net/ubuntu/+archivemirrors or just use the archive provided by FAU: http://ftp.fau.de/ubuntu/

%runscript
    echo "This is what happens when you run the container..." # Replace this with the default command, including all necessary options, so that running the image will directly run the software for which you built the container.

%post
    # Everything in this section will be executed after bootstrapping the operating system is complete
    sed -i 's/$/ universe/' /etc/apt/sources.list # Some software is only available via the universe repository, so this will be added to apt's sources.list
    apt-get -y update # This makes sure the sources from the universe repository are loaded
    apt-get -y install nano git # Change this at your leisure. Neither of them is necessary.
    # Add all commands needed to install the software

While it is possible to install software interactively, it is generally recommended to do so via the definition file to make transparent and reproducible what was actually done to arrive at the image.

Example basic Singularity recipe, assuming you have root access

sudo singularity build --sandbox abstract_pipeline_v1 abstract_pipeline_v1.def

This creates a directory abstract_pipeline_v1, which is the “container.” abstract_pipeline_v1.def is a file whose contents are:

BootStrap: debootstrap
OSVersion: xenial
#MirrorURL: http://ftp.fau.de/ubuntu/
# Use the following URL in the US:
MirrorURL: http://us.archive.ubuntu.com/ubuntu/

%runscript
cd /localdata/abstracts/
./annotate_all.sh "$@"

%post
echo "Hello from inside the container"
sed -i 's/$/ universe/' /etc/apt/sources.list
apt-get update
apt-get upgrade -y
apt-get install -y --force-yes git wget nano autoconf automake bzip2 python3-dev unzip python3-pip subversion python
apt-get -y autoremove
mkdir /localdata
mkdir /localdata/abstracts
#cd /localdata/abstracts
#git clone WHATEVER
#wget WHATEVER
#./configure
#make
#make install
chmod a+w /.singularity.d/runscript

You can shell into the container as follows (-w means "writable"):

sudo singularity shell -w abstract_pipeline_v1

To create the final portable, unchangeable image (after training etc.):

sudo singularity build abstract_pipeline_v1_production.simg abstract_pipeline_v1

To run the image:

./abstract_pipeline_v1_production.simg FILE_TO_BE_ANNOTATED > OUTFILE.TXT