created by shlee
on 2017-08-15
Document is in BETA. It may be incomplete and/or inaccurate. Post suggestions to the Comments section and be sure to read about updates also within the Comments section.
Install Docker for your system, e.g. Mac, Windows or a Linux server, from https://docs.docker.com/engine/installation/. There is also a program called Docker Toolbox. I have it installed, but I don't think it is necessary for running Docker containers locally or on a server.
On my Mac, I just double-click on the Docker whale icon to start the application. Check that Docker is running in the Mac menu bar at top by clicking on the icon that looks like a whale-container-ship.
See the Docker version with docker --version.
$ docker --version
Docker version 17.06.0-ce, build 02c1d87
If you have trouble, you may need to run one or more of the following commands.
docker-machine restart default
docker-machine regenerate-certs
docker-machine env
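Before trying the commands above, it can help to confirm whether the Docker client can reach the Docker daemon at all. The following is a minimal sketch, assuming a POSIX shell. The function name docker_ready is made up for illustration; docker info is a standard Docker command that exits with an error when the daemon is unreachable.

```shell
# docker_ready: hypothetical helper, not part of Docker.
# Returns 0 if the docker client is installed and the daemon answers.
docker_ready() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker client not found on PATH" >&2
        return 1
    fi
    # `docker info` exits non-zero when it cannot contact the daemon.
    if ! docker info >/dev/null 2>&1; then
        echo "docker client found, but the daemon is not reachable" >&2
        return 1
    fi
    echo "docker is running"
}
```

For example, docker_ready && docker --version prints the version only when Docker is actually up.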
In Docker, an image is the original from which we launch containers. We pull images from Dockerhub (https://hub.docker.com/), using Git-like lingo. For example, the following command downloads a GATK4 docker image.
docker pull broadinstitute/gatk:4.beta.3
The part after the colon is the tag, which here gives the version of the image we pull. You can see which images you have locally with docker image ls. Here we see I have two different versions of broadinstitute/gatk, v4.beta.3 and v4.beta.2.
$ docker image ls
REPOSITORY            TAG        IMAGE ID       CREATED       SIZE
broadinstitute/gatk   4.beta.3   5c138c493794   2 weeks ago   2.87GB
broadinstitute/gatk   4.beta.2   507406cb4d85   3 weeks ago   2.88GB
There are two ways to inspect an image. One is with docker inspect 5c138c493794. The other is to launch a container off the image and root around within it much like you would a file system.
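docker inspect prints a long JSON document. Its --format option takes a Go template that selects individual fields. The sketch below pulls out three fields that are often of interest. The function name image_summary is hypothetical, and what each field contains depends on how the image was built.

```shell
# image_summary: hypothetical helper that prints a few fields of an image.
# Usage: image_summary <image name:tag or image ID>
image_summary() {
    # Working directory that a container started from this image opens in.
    docker inspect --format 'workdir: {{.Config.WorkingDir}}' "$1"
    # Default command the image runs when none is given.
    docker inspect --format 'cmd: {{json .Config.Cmd}}' "$1"
    # Labels set by the image authors, if any.
    docker inspect --format 'labels: {{json .Config.Labels}}' "$1"
}
```

For example, image_summary 5c138c493794 or image_summary broadinstitute/gatk:4.beta.3.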
The broadinstitute/gatk image is built automatically from a script documented at https://github.com/broadinstitute/gatk/blob/master/scripts/docker/. For tools that the script installs, see https://github.com/broadinstitute/gatk/blob/master/scripts/docker/gatkbase/Dockerfile.
Launch a container with its tag or image ID. Whichever you use to launch the container, the tag or the image ID, is what docker ps then lists as the container's image name.
docker run -i -t 5c138c493794
or
docker run -i -t broadinstitute/gatk:4.beta.3
We then see that bash opens in a location in the container preset by those who built the image.
root@f944f81ff6d7:/gatk#
We can check the contents of the current directory and the java version.
root@f944f81ff6d7:/gatk# ls -ltrh
total 148K
drwxr-xr-x  4 root root 4.0K Jul 26 15:49 docs
-rw-r--r--  1 root root  428 Jul 26 15:49 codecov.yml
-rwxr-xr-x  1 root root 4.5K Jul 26 15:49 build_docker.sh
-rw-r--r--  1 root root  21K Jul 26 15:49 build.gradle
-rw-r--r--  1 root root  33K Jul 26 15:49 README.md
-rw-r--r--  1 root root 1.5K Jul 26 15:49 LICENSE.TXT
-rw-r--r--  1 root root  690 Jul 26 15:49 Dockerfile
-rw-r--r--  1 root root  775 Jul 26 15:49 AUTHORS
drwxr-xr-x  1 root root 4.0K Jul 26 15:49 src
-rw-r--r--  1 root root   26 Jul 26 15:49 settings.gradle
drwxr-xr-x 10 root root 4.0K Jul 26 15:49 scripts
drwxr-xr-x  2 root root 4.0K Jul 26 15:49 resources_for_CI
-rwxr-xr-x  1 root root 5.2K Jul 26 15:49 gradlew
drwxr-xr-x  3 root root 4.0K Jul 26 15:49 gradle
-rwxr-xr-x  1 root root  19K Jul 26 15:49 gatk-launch
drwxr-xr-x  9 root root 4.0K Jul 26 15:53 build
-rw-r--r--  1 root root   40 Jul 26 15:55 run_unit_tests.sh
lrwxrwxrwx  1 root root   25 Jul 26 15:55 gatk.jar -> /gatk/build/libs/gatk.jar
-rw-r--r--  1 root root 1017 Jul 26 15:55 install_R_packages.R
root@96d91017226e:/gatk#
root@f944f81ff6d7:/gatk# java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
root@f944f81ff6d7:/gatk#
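The same checks can be run without opening an interactive shell by appending the command to docker run. The following is a sketch rather than a documented GATK workflow. The names GATK_IMAGE and run_once are made up here, and it assumes the image defines no entrypoint that reinterprets the arguments. The --rm option removes the container once the command finishes.

```shell
# Image to use; the tag is the one pulled earlier in this tutorial.
GATK_IMAGE="broadinstitute/gatk:4.beta.3"

# run_once: hypothetical helper. Runs one command in a fresh container and
# removes the container when the command finishes (--rm).
# Usage: run_once java -version
run_once() {
    docker run --rm "$GATK_IMAGE" "$@"
}
```

For example, run_once java -version or run_once ls -ltrh.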
When we exit the container by typing exit, we also stop it from running. We can check all the stopped container instances that docker saves automatically with docker ps -a.
$ docker ps -a
CONTAINER ID   IMAGE                          COMMAND   CREATED              STATUS                     PORTS   NAMES
28035a3b71f1   broadinstitute/gatk:4.beta.3   "bash"    About a minute ago   Exited (0) 8 seconds ago           silly_davinci
f944f81ff6d7   5c138c493794                   "bash"    6 minutes ago        Exited (0) 4 minutes ago           fervent_wing
62fb9991a939   5c138c493794                   "bash"    6 minutes ago        Exited (0) 6 minutes ago           tender_mirzakhani
96d91017226e   5c138c493794                   "bash"    3 days ago           Exited (0) 2 days ago              vigilant_montalcini
As you can see, I have multiple containers launched from the same image. Notice, however, each container has a unique ID (under CONTAINER ID) and name (under NAMES). Whatever changes I make within a container get saved to that container. We can remove containers with docker container rm using either the container ID or name.
$ docker container rm silly_davinci
silly_davinci
$ docker ps -a
CONTAINER ID   IMAGE          COMMAND   CREATED          STATUS                      PORTS   NAMES
f944f81ff6d7   5c138c493794   "bash"    11 minutes ago   Exited (0) 9 minutes ago            fervent_wing
62fb9991a939   5c138c493794   "bash"    11 minutes ago   Exited (0) 11 minutes ago           tender_mirzakhani
96d91017226e   5c138c493794   "bash"    3 days ago       Exited (0) 2 days ago               vigilant_montalcini
$ docker container rm f944f81ff6d7
f944f81ff6d7
$ docker ps -a
CONTAINER ID   IMAGE          COMMAND   CREATED          STATUS                      PORTS   NAMES
62fb9991a939   5c138c493794   "bash"    12 minutes ago   Exited (0) 12 minutes ago           tender_mirzakhani
96d91017226e   5c138c493794   "bash"    3 days ago       Exited (0) 2 days ago               vigilant_montalcini
We can run one of these containers with docker start.
docker start 96d91017226e
It may take a minute for a container to start up. We can see the running containers with docker container ls.
$ docker container ls
CONTAINER ID   IMAGE          COMMAND   CREATED      STATUS              PORTS   NAMES
96d91017226e   5c138c493794   "bash"    3 days ago   Up About a minute           vigilant_montalcini
Finally, we can reattach to the running container.
docker attach vigilant_montalcini
On my local Mac, there is a glitch and I must press enter twice to show the docker container's bash prompt. You can also use the container ID instead of the name in the command. To exit out of a running container without stopping it, press Ctrl+P followed by Ctrl+Q.
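docker attach connects to the container's main bash process, so typing exit in that session stops the container. An alternative is docker exec, which starts a separate bash process in the running container. This is a sketch, and the function name open_shell is hypothetical.

```shell
# open_shell: hypothetical helper that opens an additional bash session in a
# running container. Unlike `docker attach`, typing `exit` in this session
# ends only the extra shell; the container's main process keeps running.
# Usage: open_shell vigilant_montalcini
open_shell() {
    docker exec -i -t "$1" bash
}
```

As with docker attach, the argument can be the container name or the container ID.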
There are two ways to copy files into a container: from within the container and from outside the container. I only know how to copy files from outside the container. The container can be stopped or running.
docker cp file_you_want_to_copy <container_id>:<file_path_to_target_directory>
For example,
docker cp tumor.seg 96d91017226e:/gatk
This command copies the file tumor.seg into the /gatk directory of container 96d91017226e.
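docker cp also works in the other direction, from the container to the host, by swapping the two arguments. A minimal sketch follows. The function name copy_out is made up for illustration, and the paths in the usage line are only examples.

```shell
# copy_out: hypothetical helper that copies a file or directory from a
# container to the host.
# Usage: copy_out <container> <path in container> [host directory]
copy_out() {
    # The host directory defaults to the current directory.
    docker cp "${1}:${2}" "${3:-.}"
}
```

For example, copy_out 96d91017226e /gatk/tumor.seg copies the file back into the current directory on the host.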
If you plan to modify a container and save it, remember that environment variables set in, e.g., bashrc are not reliable in Docker containers, because bash reads that file only for interactive shells. Symlinks, however, work well, and you should create these in, e.g., /usr/bin with ln -s path/to/item short_cut_name.
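The symlink step can be wrapped in a small function that checks the target exists before linking. This is a sketch to run inside the container. The function name link_tool is hypothetical, and the link directory defaults to /usr/bin as suggested above.

```shell
# link_tool: hypothetical helper that exposes a tool under a short name by
# symlinking it into a directory on PATH.
# Usage: link_tool /path/to/item short_cut_name [link_dir]
link_tool() {
    if [ ! -e "$1" ]; then
        echo "target does not exist: $1" >&2
        return 1
    fi
    # -s makes a symbolic link; -f replaces an existing link of the same name.
    ln -sf "$1" "${3:-/usr/bin}/$2"
}
```

Use an absolute path for the target, since a relative target is resolved relative to the link's own directory.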
First, log into your Dockerhub account with docker login. If you don't have one, create one at https://hub.docker.com. My account is called spacecade7. For the container you have modified and wish to save a snapshot image of, use the following command.
docker commit 96d91017226e spacecade7/mygatk:versioning_tag1
Here, the string that follows commit is the container ID. The last part points to my Dockerhub account followed by what I would like to call the image and an image version tag. This saves the image locally.
To save the image to Dockerhub, use docker push spacecade7/mygatk:versioning_tag1. The image should appear in your Dockerhub account.
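The commit and push steps can be chained so that a failed commit is never pushed. This is a sketch that assumes docker login has already succeeded. The function name snapshot_and_push is hypothetical.

```shell
# snapshot_and_push: hypothetical helper that commits a container to an
# image and then pushes that image. Requires a prior `docker login`.
# Usage: snapshot_and_push <container> <account/name:tag>
snapshot_and_push() {
    # Stop at the first failing step so a failed commit is never pushed.
    docker commit "$1" "$2" || return 1
    docker push "$2"
}
```

For example, snapshot_and_push 96d91017226e spacecade7/mygatk:versioning_tag1.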
Updated on 2017-08-17
From EADG on 2017-08-18
Hi @shlee,
Nice tutorial! Two short suggestions from my experience working with Docker/GATK.
First, instead of copying single files/dirs to the container, you can mount a directory from the host inside the container with the run -v option:
```run -v, --volume=[host-src:]container-dest[:<options>]```
See manual-page for more information: [Docker run manual](https://docs.docker.com/v1.10/engine/reference/commandline/run/)
Second, mostly for security reasons, you should not work with root privileges all the time. To change this, you can add a new user from inside the container and then save the image to DockerHub or locally as described.
To start the container as this user, add
```--user userName```
to your run command, where userName is the user you added.
Greetings EADG
From shlee on 2017-08-18
Thank you @EADG for the compliment and the additional information! The community will appreciate your instructions on mounting a local directory to the container. I was hoping someone would add this.
From Tiffany_at_Broad on 2017-09-21
I’ve come back to this doc a few times to remind myself how to do this so – THANK YOU!
My typical use case is to figure out which versions the tools are. One command I found handy is ‘cat Dockerfile’.
When I did this for the genomes in the cloud docker, I got this output which was exactly what I needed:
LABEL GOTC_PICARD_VER=1.1150
LABEL GOTC_GATK34_VER=3.4-g3c929b0
LABEL GOTC_GATK35_VER=3.5-0-g36282e4
LABEL GOTC_GATK36_VER=3.6-44-ge7d1cd2
LABEL GOTC_GATK4_VER=4.beta.1
LABEL GOTC_SAMTOOLS_VER=1.3.1
LABEL GOTC_BWA_VER=0.7.15.r1140
LABEL GOTC_TABIX_VER=0.2.5_r1005
LABEL GOTC_BGZIP_VER=1.3
LABEL GOTC_SVTOOLKIT_VER=2.00-1650
Just passing along in case others find it helpful!
From Tiffany_at_Broad on 2017-09-21
Interesting, version info is not provided if you run ‘cat Dockerfile’ in this GATK image.
From shlee on 2017-10-02
Thanks @Tiffany_at_Broad, I’ll request that we be able to get versioning with the command you shared.
From moxu on 2018-05-31
Very good docker tutorial! Thanks, @shlee !
From shlee on 2018-05-31
Thanks, @moxu.
From lcarvalho on 2019-02-25
Hello, I already installed Docker and the tests were OK. I’m trying to run BaseRecalibrator in Docker, but I fail to link the dbSNP file for --known-sites. I already used the “docker run -v” option for my input files and the reference genome. Unfortunately, the dbSNP file is too big (more than 10 GB), so I cannot link it to Docker using the -v option. This is a required file, so I cannot run the tool without it.
From NicolasK on 2019-08-08
@lcarvalho
Maybe my answer is too late, as you asked some time ago.
Try to link the folder with your dbSNP file.
In my case I copied all the files I need to the folder I linked.
Here is the command I used to link the folder:
docker run -v /media/data/analysis:/gatk/my_data -it 9e737a9f562c