(How to) Make a docker image the easy way using a base image

IMPORTANT: This is the legacy GATK documentation. This information is only valid until Dec 31st 2019. For the latest documentation and forum, click here.

Created by Geraldine_VdAuwera on 2017-04-22

Making a docker image to run something in FireCloud doesn't have to be horribly difficult. The main thing to avoid is reading the actual Docker documentation, because it's mostly aimed at experienced people who are trying to do much more complicated things. In contrast, this tutorial assumes you have minimal to no experience with programming or configuring computers, and will start by setting the stage with the minimum amount of jargon possible.

Just remember that when you put software on a docker and publish it, you're responsible for checking that you are complying with the licenses of everything you put on there!

Contents

    1. Prerequisites
    2. The basic concepts
    3. Make a Dockerfile
    4. Build the docker
    5. Test that it works
    6. Share your Docker image
    7. Wrap-up

1. Prerequisites

    • Get and install Docker. You'll need to have the Docker program itself installed on your local machine (laptop etc.); see these instructions for guidance.
    • Get the Picard toolkit. We're going to use this Java command-line program as an example of a piece of software you might want to "dockerize", i.e. for which you might want to build a container image. Get the latest version of the Picard toolkit and save the picard.jar file anywhere you want (we'll put it in the right place later).

2. The basic concepts

When you're making a docker image, you're basically setting up a computer system in a box that you can then copy and run on other machines without worrying about what kind of system they're running. (Strictly speaking, the box you build and share is the "image"; each running copy of it is called a "container".) It's not exactly the same thing as a virtual machine (or VM), but for the purposes of our tutorial that doesn't really matter, so if it helps you to think of it as a VM, go right ahead. The point is that your docker will need to include everything that is required to run your program: at the minimum it has to have an operating system, and typically you need accessory software like Java, Python and/or R, with various libraries installed.

So how do I build it without having to know a lot about system configuration?

Think about how you interact with your laptop. When you get a new laptop, it already has an operating system and a bunch of software preinstalled. Then you can add more by either copying programs (like Java JAR files) or installing them if they have to be compiled in place (like Samtools). Well, in the Docker world you can get something like a new laptop with stuff preinstalled: that's what we call a base image. There are all sorts of base images available, with more or less software already bundled in, that are designed specifically for that purpose. But ultimately any docker image can serve as a base image on which you can add your own special thing, by either running an installation command or copying program files. Oh, and the docker has a filesystem just like your laptop, so you can create directories and put program files or dependencies in specific locations as appropriate.

No really, how do I build it? How do I install things on a machine that doesn't exist?

That's where the Dockerfile comes in. The Dockerfile is a recipe for building the docker that essentially recapitulates every step that a system administrator would take if they were installing and configuring this virtual laptop for you. You just write out, line by line, what the virtual sysadmin should do. Then you tell the Docker program (which you do have to install first on your real-world laptop) to build the docker image as specified. Then you can run it, share it, and bask in the joy of having done something complicated really quite easily.

Enough talk -- let's make a docker!

Specifically, let's make a docker that has Java 8, Picard tools and R with the ggplot2 library installed.

3. Make a Dockerfile

    1. Make a new working directory for this tutorial project; in your terminal, navigate to that directory.
    2. Copy the picard.jar file (see prerequisites above) to this directory.
    3. Create the Dockerfile. This is a text file named Dockerfile (no extension), containing the following:

````

# Specify the base image -- here we're using one that bundles the OpenJDK version of Java 8 on top of a generic Debian Linux OS
FROM openjdk:8-jdk-slim

# Set the working directory to be used when the docker gets run
WORKDIR /usr

# Do a few updates of the base system and install R (via the r-base package)
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y r-base

# Install the ggplot2 library and a few other dependencies we want to have available
RUN echo "r <- getOption('repos'); r['CRAN'] <- 'http://cran.us.r-project.org'; options(repos = r);" > ~/.Rprofile
RUN Rscript -e "install.packages('reshape')"
RUN Rscript -e "install.packages('gplots')"
RUN Rscript -e "install.packages('ggplot2')"

# Add the Picard jar file (assumes the jar file is in the same directory as the Dockerfile, but you could provide a path to another location)
COPY picard.jar /usr/picard.jar

````
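The COPY step is the one most likely to trip you up: if picard.jar isn't sitting next to the Dockerfile, the build stops partway through with an error. Here is a minimal bash sketch of a pre-build check, assuming you build from the directory that holds both files. check_build_context is a hypothetical helper name, not a Docker command.

```shell
# Hypothetical helper (not part of Docker): check that a build context
# directory contains the files this tutorial's Dockerfile relies on.
check_build_context() {
  local dir="${1:-.}"   # build context; defaults to the current directory
  local missing=0
  local f
  for f in Dockerfile picard.jar; do
    # -s is true only if the file exists and is not empty
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_build_context . && docker build -t <username>/<repo>:<tag> .
```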

4. Build the docker

Run the following command:

docker build -t <username>/<repo>:<tag> .

where <username> is your Dockerhub username, <repo> is the name of the repository where you will store the docker (which can be an existing repo or the name of a repo that will be created when you push the docker to Dockerhub), and <tag> is a keyword or version number that you want to attach to identify a specific image. Don't forget the . at the end of the command: it tells Docker to use the current working directory (which holds your Dockerfile and picard.jar) as the build context.
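If you build more than one version, it can help to assemble the <username>/<repo>:<tag> name in one place. Here is a bash sketch under that assumption. image_ref and build_image are hypothetical helper names, and setting DOCKER=echo makes the function print the docker command instead of running it.

```shell
# DOCKER can be overridden (e.g. DOCKER=echo) to print commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: assemble <username>/<repo>:<tag>, refusing empty parts.
image_ref() {
  if [ "$#" -ne 3 ] || [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "usage: image_ref <username> <repo> <tag>" >&2
    return 1
  fi
  printf '%s/%s:%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: build from the Dockerfile in the current directory (the final ".").
build_image() {
  local ref
  ref="$(image_ref "$@")" || return 1
  "$DOCKER" build -t "$ref" .
}
```

For example, build_image vdauwera tutorial_example picard-2.9.0 runs docker build -t vdauwera/tutorial_example:picard-2.9.0 . in the current directory. The sketch doesn't check that the repository name is lowercase, which Docker requires.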

This will run for a few minutes and output a lot of logs to the terminal. Most of it is what you'd see if you were installing a Linux operating system on your machine, then R and the libraries specified in the Dockerfile. Eventually the mad scrolling gobbledygook will stop and you should see something like Successfully built 084e949b60cb.

5. Test that it works

At this point, you've technically achieved your goal of building a docker image -- but let's test that it works before celebrating. Run this:

docker run -it <username>/<repo>:<tag>

where -it (short for -i -t) runs the docker in interactive mode with a terminal attached; if you omit it, nothing will seem to happen, because your docker isn't set up to actually do anything by default. If everything worked, your terminal command prompt will change to root@ca9af9b92f3d:/usr# (but with a different ID after the @, since that's the ID of your container). At this point you are at the helm of your shiny new virtual laptop!
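You can also check the image without opening an interactive session, by running one-off commands in throwaway containers. Here is a bash sketch, assuming the image was built from the Dockerfile above (so picard.jar lives at /usr/picard.jar). smoke_test is a hypothetical helper name, and DOCKER=echo prints the commands instead of running them.

```shell
# DOCKER can be overridden (e.g. DOCKER=echo) to print commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: non-interactive checks that Java, ggplot2 and Picard are in the image.
smoke_test() {
  local image="$1"
  if [ -z "$image" ]; then
    echo "usage: smoke_test <username>/<repo>:<tag>" >&2
    return 1
  fi
  # --rm deletes each container as soon as its command exits.
  "$DOCKER" run --rm "$image" java -version || return 1
  # library() raises an error (non-zero exit) if ggplot2 was not installed.
  "$DOCKER" run --rm "$image" Rscript -e 'library(ggplot2)' || return 1
  # The Dockerfile copied the jar to /usr/picard.jar.
  "$DOCKER" run --rm "$image" test -f /usr/picard.jar
}
```

The function returns a non-zero status at the first check that fails, so it can also be used in a script right after docker build.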

You can further test that you can use Picard tools in your docker by running this command inside your docker session:

java -jar picard.jar

which should output the list of tools available in that release of Picard. When you're done, you can shut it down by running exit. For more detailed instructions on how to use tools inside a Docker container (including mounting a volume so you can access files on your local machine), see this tutorial.
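To make files on your own machine visible inside the container, you can mount a directory with the -v option of docker run. Here is a bash sketch, assuming you want that directory to appear at /data inside the container. The /data path and the run_with_data name are illustrative choices, not requirements.

```shell
# DOCKER can be overridden (e.g. DOCKER=echo) to print commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: open an interactive session with a host directory mounted at /data.
run_with_data() {
  local image="$1"
  local host_dir
  if [ -z "$image" ]; then
    echo "usage: run_with_data <image> [host_dir]" >&2
    return 1
  fi
  # Resolve to an absolute path (default: current directory); a bare name
  # like "mydata" would otherwise be treated as a named volume, not a directory.
  host_dir="$(cd "${2:-.}" && pwd)" || return 1
  # -v HOST:CONTAINER mounts host_dir at /data; --rm removes the container on exit.
  "$DOCKER" run -it --rm -v "$host_dir":/data "$image"
}
```

Inside the container, anything in that directory shows up under /data, so a Picard command can read its input from /data and write its output there too.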

6. Share your Docker image

Sharing is caring, so let's push your shiny new container image to Dockerhub by running:

docker push <username>/<repo>:<tag>

Again, this may take a few minutes, as the docker is around 400 MB. The good news is that Docker uploads images in layers, so if you make additional dockers that share some of the same components (such as the same base image), those layers won't need to be uploaded again. So updating dockers is usually faster than the initial push.
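Pushing requires that you're logged in to Dockerhub from the command line. Here is a bash sketch of the whole publish step. publish_image is a hypothetical helper name, and DOCKER=echo prints the commands instead of running them.

```shell
# DOCKER can be overridden (e.g. DOCKER=echo) to print commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: log in, then push an image you already built.
publish_image() {
  local ref="$1"   # <username>/<repo>:<tag>, as passed to docker build -t
  if [ -z "$ref" ]; then
    echo "usage: publish_image <username>/<repo>:<tag>" >&2
    return 1
  fi
  # docker push needs an authenticated session with Dockerhub.
  "$DOCKER" login || return 1
  # Upload the image; layers that already exist on Dockerhub are skipped.
  "$DOCKER" push "$ref"
}
```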

See this tutorial for instructions to push the image to the Google Container Registry (GCR) instead.

7. Wrap-up

And that's it! You can see the docker that I made for this tutorial, vdauwera/tutorial_example:picard-2.9.0, at https://hub.docker.com/r/vdauwera/tutorial_example/.

There are lots of options to refine your docker's setup (including adding labels, setting environment variables, and making it run commands when it starts up), which you can read about in the Docker documentation -- or you can search for them on Google or Stack Overflow, which I generally find more helpful. As they say, YMMV.

Updated on 2019-11-07