created by shlee
on 2017-08-15
Document is in BETA. It may be incomplete and/or inaccurate. Post suggestions to the Comments section and be sure to read about updates also within the Comments section.
Install Docker for your system from https://docs.docker.com/engine/installation/, e.g. for Mac, Windows or Linux servers. There is also a program called Docker Toolbox. I have it installed, but I don't think it is necessary for running Docker containers locally or on a server.
On my Mac, I just double-click on the Docker whale icon to start the application. Check that Docker is running by clicking the icon in the Mac menu bar at the top that looks like a whale-container-ship.
See the Docker version with docker --version.
$ docker --version
Docker version 17.06.0-ce, build 02c1d87
If you have trouble, you may need to run one or more of the following commands.
docker-machine restart default
docker-machine regenerate-certs
docker-machine env
In Docker, an image is the original from which we launch containers. We pull images from Docker Hub (https://hub.docker.com/), using Git-like lingo. For example, the following command downloads a GATK4 Docker image.
docker pull broadinstitute/gatk:4.beta.3
The part after the colon is the tag, i.e. the version of the image we pull. You can see which images you have locally with docker image ls. Here we see I have two different versions of broadinstitute/gatk, 4.beta.3 and 4.beta.2.
$ docker image ls
REPOSITORY            TAG        IMAGE ID       CREATED       SIZE
broadinstitute/gatk   4.beta.3   5c138c493794   2 weeks ago   2.87GB
broadinstitute/gatk   4.beta.2   507406cb4d85   3 weeks ago   2.88GB
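To make the repository:tag convention concrete, here is a minimal sketch that splits an image reference into its two parts with plain shell parameter expansion. The helper name split_image_ref is made up for this illustration and is not a Docker command. It assumes the reference carries no registry host with a port (something like localhost:5000/...), which would need extra handling.

```shell
# Hypothetical helper (not part of Docker): split "repository:tag" into its parts.
# Assumption: the reference has no registry host with a port in front of it.
split_image_ref() {
    ref=$1
    case $ref in
        *:*) repo=${ref%:*}; tag=${ref##*:} ;;   # text before / after the last colon
        *)   repo=$ref;      tag=latest ;;       # Docker uses "latest" when no tag is given
    esac
    printf '%s %s\n' "$repo" "$tag"
}

split_image_ref broadinstitute/gatk:4.beta.3
# prints: broadinstitute/gatk 4.beta.3
```

Leaving the tag off a docker pull command pulls the latest tag, which is why the sketch falls back to that value.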
There are two ways to inspect an image. One is with docker inspect 5c138c493794. The other is to launch a container off the image and root around within it much like you would a file system.
The broadinstitute/gatk image is built automatically from a script documented at https://github.com/broadinstitute/gatk/blob/master/scripts/docker/. For tools that the script installs, see https://github.com/broadinstitute/gatk/blob/master/scripts/docker/gatkbase/Dockerfile.
Launch a container with its tag or image ID. Whichever you use to launch a container, the tag or image ID, becomes the image name that docker ps reports for it.
docker run -i -t 5c138c493794
or
docker run -i -t broadinstitute/gatk:4.beta.3
The bash prompt then opens in a location in the container preset by those who built the image.
root@f944f81ff6d7:/gatk#
We can check the contents of the current directory and the java version.
root@f944f81ff6d7:/gatk# ls -ltrh
total 148K
drwxr-xr-x  4 root root 4.0K Jul 26 15:49 docs
-rw-r--r--  1 root root  428 Jul 26 15:49 codecov.yml
-rwxr-xr-x  1 root root 4.5K Jul 26 15:49 build_docker.sh
-rw-r--r--  1 root root  21K Jul 26 15:49 build.gradle
-rw-r--r--  1 root root  33K Jul 26 15:49 README.md
-rw-r--r--  1 root root 1.5K Jul 26 15:49 LICENSE.TXT
-rw-r--r--  1 root root  690 Jul 26 15:49 Dockerfile
-rw-r--r--  1 root root  775 Jul 26 15:49 AUTHORS
drwxr-xr-x  1 root root 4.0K Jul 26 15:49 src
-rw-r--r--  1 root root   26 Jul 26 15:49 settings.gradle
drwxr-xr-x 10 root root 4.0K Jul 26 15:49 scripts
drwxr-xr-x  2 root root 4.0K Jul 26 15:49 resources_for_CI
-rwxr-xr-x  1 root root 5.2K Jul 26 15:49 gradlew
drwxr-xr-x  3 root root 4.0K Jul 26 15:49 gradle
-rwxr-xr-x  1 root root  19K Jul 26 15:49 gatk-launch
drwxr-xr-x  9 root root 4.0K Jul 26 15:53 build
-rw-r--r--  1 root root   40 Jul 26 15:55 run_unit_tests.sh
lrwxrwxrwx  1 root root   25 Jul 26 15:55 gatk.jar -> /gatk/build/libs/gatk.jar
-rw-r--r--  1 root root 1017 Jul 26 15:55 install_R_packages.R
root@96d91017226e:/gatk#
root@f944f81ff6d7:/gatk# java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
root@f944f81ff6d7:/gatk#
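Quick checks like these can also be run without opening an interactive shell. The sketch below wraps docker run with the --rm flag, which removes the container as soon as the command finishes so that no stopped container is left behind. The wrapper name run_once and the IMAGE variable are made up for this illustration. It assumes the image does not define an entrypoint that would intercept the command.

```shell
# Sketch: run a single command in a temporary container, then discard the container.
# IMAGE and run_once are illustrative names, not part of Docker or GATK.
IMAGE=broadinstitute/gatk:4.beta.3

run_once() {
    # --rm deletes the container once the command exits
    docker run --rm "$IMAGE" "$@"
}

# Usage (needs a running Docker daemon and the image available locally):
#   run_once java -version
#   run_once ls -ltrh /gatk
```

Because the container is discarded, this approach suits read-only checks, not changes you want to keep.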
Typing exit leaves the container and also stops it from running. We can check all the stopped container instances that Docker saves automatically with docker ps -a.
$ docker ps -a
CONTAINER ID   IMAGE                          COMMAND   CREATED              STATUS                     PORTS   NAMES
28035a3b71f1   broadinstitute/gatk:4.beta.3   "bash"    About a minute ago   Exited (0) 8 seconds ago           silly_davinci
f944f81ff6d7   5c138c493794                   "bash"    6 minutes ago        Exited (0) 4 minutes ago           fervent_wing
62fb9991a939   5c138c493794                   "bash"    6 minutes ago        Exited (0) 6 minutes ago           tender_mirzakhani
96d91017226e   5c138c493794                   "bash"    3 days ago           Exited (0) 2 days ago              vigilant_montalcini
As you can see, I have multiple containers launched from the same image. Notice, however, that each container has a unique ID (under CONTAINER ID) and name (under NAMES). Whatever changes I make within a container get saved to that container. We can remove containers with docker container rm using either the container ID or name.
$ docker container rm silly_davinci
silly_davinci
$ docker ps -a
CONTAINER ID   IMAGE          COMMAND   CREATED          STATUS                      PORTS   NAMES
f944f81ff6d7   5c138c493794   "bash"    11 minutes ago   Exited (0) 9 minutes ago            fervent_wing
62fb9991a939   5c138c493794   "bash"    11 minutes ago   Exited (0) 11 minutes ago           tender_mirzakhani
96d91017226e   5c138c493794   "bash"    3 days ago       Exited (0) 2 days ago               vigilant_montalcini
$ docker container rm f944f81ff6d7
f944f81ff6d7
$ docker ps -a
CONTAINER ID   IMAGE          COMMAND   CREATED          STATUS                      PORTS   NAMES
62fb9991a939   5c138c493794   "bash"    12 minutes ago   Exited (0) 12 minutes ago           tender_mirzakhani
96d91017226e   5c138c493794   "bash"    3 days ago       Exited (0) 2 days ago               vigilant_montalcini
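When many stopped containers have piled up, removing them one at a time gets tedious. As a rough sketch, the function below collects the IDs of all containers with status exited and passes them to docker container rm in one go. The function name remove_exited_containers is made up for this illustration. Be aware that it removes every exited container, including any whose changes you may still want.

```shell
# Sketch: remove every container whose status is "exited".
# remove_exited_containers is an illustrative name, not a Docker command.
remove_exited_containers() {
    # -a lists all containers, -q prints only IDs, --filter narrows to exited ones
    ids=$(docker ps -a -q --filter status=exited)
    if [ -z "$ids" ]; then
        echo "no stopped containers to remove"
        return 0
    fi
    # $ids is left unquoted on purpose so that each ID becomes its own argument
    docker container rm $ids
}

# Usage (needs a running Docker daemon):
#   remove_exited_containers
```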
We can restart one of these stopped containers with docker start.
docker start 96d91017226e
It may take a minute for a container to start up. We can see the running containers with docker container ls.
$ docker container ls
CONTAINER ID   IMAGE          COMMAND   CREATED      STATUS              PORTS   NAMES
96d91017226e   5c138c493794   "bash"    3 days ago   Up About a minute           vigilant_montalcini
Finally, we can reattach to the running container.
docker attach vigilant_montalcini
On my local Mac, there is a glitch and I must press enter twice to show the Docker container's bash prompt. You can also use the container ID instead of the name in the command. To exit out of a running container without stopping it, press Ctrl+P followed by Ctrl+Q.
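The start and attach steps can also be combined. As a minimal sketch, docker start accepts -a to attach to the container's output and -i to keep its standard input open, so a stopped container can be resumed at its prompt in one command. The wrapper name resume_container is made up for this illustration.

```shell
# Sketch: restart a stopped container and attach to it in a single step.
# resume_container is an illustrative name, not a Docker command.
resume_container() {
    # -a attaches to the container's output, -i keeps standard input open
    docker start -a -i "$1"
}

# Usage (container name or ID as listed by `docker ps -a`):
#   resume_container vigilant_montalcini
```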
There are two ways to copy files into a container: from within the container and from outside the container. I only know how to copy files from outside the container. The container can be stopped or running.
docker cp file_you_want_to_copy <container_id>:<file_path_to_target_directory>
For example,
docker cp tumor.seg 96d91017226e:/gatk
This copies the file tumor.seg into the /gatk directory of container 96d91017226e.
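docker cp also works in the opposite direction, which is handy for getting results out of a container. The sketch below swaps the source and target so that the container path comes first. The wrapper name copy_from_container and the ./results directory in the usage line are made up for this illustration.

```shell
# Sketch: copy a file or directory out of a container onto the host.
# copy_from_container is an illustrative name, not a Docker command.
copy_from_container() {
    container=$1
    src=$2            # path inside the container
    dest=${3:-.}      # host destination, defaults to the current directory
    docker cp "$container:$src" "$dest"
}

# Usage (the container can be stopped or running):
#   copy_from_container 96d91017226e /gatk/tumor.seg
#   copy_from_container 96d91017226e /gatk/tumor.seg ./results
```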
If you plan to modify a container and save it, then remember that environment variables set in, e.g., .bashrc are not reliably picked up in Docker containers. However, symlinks work well and you should create these in, e.g., /usr/bin with ln -s path/to/item short_cut_name.
First, log into your Docker Hub account with docker login. If you don't have one, create one at https://hub.docker.com. My account is called spacecade7. For the container you have modified and wish to save a snapshot image of, use the following command.
docker commit 96d91017226e spacecade7/mygatk:versioning_tag1
Here, the string that follows commit is the container ID. The last part points to my Docker Hub account, followed by what I would like to call the image and an image version tag. This saves the image locally.
To save the image to Docker Hub, use docker push spacecade7/mygatk:versioning_tag1. The image should then appear in your Docker Hub account.
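If you snapshot containers regularly, the two steps can be chained so that the push only happens when the commit succeeds. This is a minimal sketch. The function name snapshot_and_push is made up for this illustration, and it assumes you have already run docker login.

```shell
# Sketch: save a container as a local image, then push that image to Docker Hub.
# snapshot_and_push is an illustrative name, not a Docker command.
# Assumption: `docker login` has already been run for the target account.
snapshot_and_push() {
    container=$1      # container ID or name
    image=$2          # account/name:tag, e.g. spacecade7/mygatk:versioning_tag1
    # && makes sure nothing is pushed if the commit fails
    docker commit "$container" "$image" && docker push "$image"
}

# Usage:
#   snapshot_and_push 96d91017226e spacecade7/mygatk:versioning_tag1
```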
Updated on 2017-08-17
From EADG on 2017-08-18
Hi @shlee,
nice tutorial! Two short suggestions from my side, based on my experience working with Docker/GATK.
First, instead of copying single files/dirs to the container, you can mount a directory from the host inside the container with the run -v option:
```run -v, --volume=[host-src:]container-dest[:<options>]```
See manual-page for more information: [Docker run manual](https://docs.docker.com/v1.10/engine/reference/commandline/run/)
Second, mostly for security reasons, you should not work with root privileges all the time. To change this, you can easily add a new user to the container when you are inside, and then save the image on Docker Hub or locally as described.
To start the container with this user, add:
```--user userName```
to your run command.
Greetings EADG
From shlee on 2017-08-18
Thank you @EADG for the compliment and the additional information! The community will appreciate your instructions on mounting a local directory to the container. I was hoping someone would add this.
From Tiffany_at_Broad on 2017-09-21
I’ve come back to this doc a few times to remind myself how to do this so – THANK YOU!
My typical use case is to figure out which versions of tools are installed. One command I found handy is ‘cat Dockerfile’.
When I did this for the genomes in the cloud docker, I got this output which was exactly what I needed:
LABEL GOTC_PICARD_VER=1.1150
LABEL GOTC_GATK34_VER=3.4-g3c929b0
LABEL GOTC_GATK35_VER=3.5-0-g36282e4
LABEL GOTC_GATK36_VER=3.6-44-ge7d1cd2
LABEL GOTC_GATK4_VER=4.beta.1
LABEL GOTC_SAMTOOLS_VER=1.3.1
LABEL GOTC_BWA_VER=0.7.15.r1140
LABEL GOTC_TABIX_VER=0.2.5_r1005
LABEL GOTC_BGZIP_VER=1.3
LABEL GOTC_SVTOOLKIT_VER=2.00-1650
Just passing along in case others find it helpful!
From Tiffany_at_Broad on 2017-09-21
Interesting, version info is not provided if you run ‘cat Dockerfile’ in this GATK image.
From shlee on 2017-10-02
Thanks @Tiffany_at_Broad, I’ll request that we be able to get versioning with the command you shared.
From moxu on 2018-05-31
Very good docker tutorial! Thanks, @shlee !
From shlee on 2018-05-31
Thanks, @moxu.
From lcarvalho on 2019-02-25
Hello, I already installed Docker and the tests were OK. I’m trying to run BaseRecalibrator in Docker, but I fail to link the dbSNP file as --known-sites. The problem is that I already used “docker run -v options” with my input files and the reference genome. Unfortunately, the dbSNP file is too big (more than 10 GB), so I cannot link it to Docker using the -v option. This is a required file, so I cannot run without it.
From NicolasK on 2019-08-08
@lcarvalho
Maybe my answer is too late, as you posted some time ago.
Try to link the folder that contains your dbSNP file.
In my case, I copied all the files I need to the folder I linked.
Here is the command I used to link the folder:
docker run -v /media/data/analysis:/gatk/my_data -it 9e737a9f562c