created by GATK_Team
on 2017-12-24
The GATK resource bundle is a collection of standard files for working with human sequencing data. We provide several versions of the bundle, one for each supported reference build, but be aware that we no longer support very old versions (b36/hg18). In addition, we are currently transitioning to the GRCh38/hg38 reference build, which will eventually become the default; b37/hg19 will then be considered legacy and eventually phased out. See the Dictionary entry on human genome reference builds for more information.
We do not currently provide any non-human resources in the resource bundle.
See the Resource Bundle page. In a nutshell, there's a Google Cloud bucket and an FTP server. These resources are also available through FireCloud, our cloud-based analysis portal, in workspaces that are preconfigured for the major Best Practices analysis use cases.
This contains all the resource files needed for Best Practices germline short variant discovery in whole-genome sequencing data (WGS). Exome files and itemized resource list coming soon(ish). Somatic resources are in development.
Note that many of these resources are out of date and will eventually be retired. All new development is being done against GRCh38/hg38.
Additionally, these files all have supplementary indices, statistics, and other QC data available.
All resources below this are available only on the FTP server, not on the cloud.
Includes the UCSC-style hg19 reference along with all lifted-over VCF files.
Includes the UCSC-style hg18 reference along with all lifted-over VCF files. The refGene track and BAM files are not available. For this genome build we only provide data files that can be lifted over "easily" from our master b37 repository. We apologize for any inconvenience this may cause.
Also includes a chain file to lift over to b37.
Includes the 1000 Genomes pilot b36-formatted reference sequence (humanb36both.fasta) along with all lifted-over VCF files. The refGene track and BAM files are not available. For this genome build we only provide data files that can be lifted over "easily" from our master b37 repository. We apologize for any inconvenience this may cause.
Also includes a chain file to lift over to b37.
From Bekir on 2018-09-10
Hello,
There was a problem with the file “1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf” which prevented me from using it to annotate chromosomes chr16-chr22. The lengths of chr16 and chr17 look the same in the header:
> contig=
> contig=
I am guessing that positions of chr16-chr22 were all shifted due to a concatenation issue. Can you please fix this issue?
Best, Bekir
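For readers who want to check a VCF for the symptom described above, here is a minimal Python sketch that scans the `##contig` header lines and reports contigs that share the same declared length. The function names and the sample header values are hypothetical and made up for illustration; they are not taken from the file in question. The parser also assumes that no header value contains a comma inside quotes.

```python
import re
from collections import defaultdict

# Matches VCF header lines of the form: ##contig=<ID=...,length=...>
CONTIG_RE = re.compile(r"^##contig=<(.*)>\s*$")


def parse_contigs(header_lines):
    """Return a list of (contig_id, length) pairs found in VCF header lines."""
    contigs = []
    for line in header_lines:
        match = CONTIG_RE.match(line)
        if not match:
            continue
        # Simplification: split on commas, so quoted values containing
        # commas are not handled.
        fields = dict(
            part.split("=", 1)
            for part in match.group(1).split(",")
            if "=" in part
        )
        if "ID" in fields and "length" in fields:
            contigs.append((fields["ID"], int(fields["length"])))
    return contigs


def shared_lengths(contigs):
    """Map each length declared by more than one contig to those contig IDs."""
    by_length = defaultdict(list)
    for contig_id, length in contigs:
        by_length[length].append(contig_id)
    return {length: ids for length, ids in by_length.items() if len(ids) > 1}


if __name__ == "__main__":
    # Made-up header lines, for illustration only.
    demo = [
        "##fileformat=VCFv4.2",
        "##contig=<ID=chrA,length=1000>",
        "##contig=<ID=chrB,length=2000>",
        "##contig=<ID=chrC,length=2000>",
    ]
    print(shared_lengths(parse_contigs(demo)))  # {2000: ['chrB', 'chrC']}
```

To run this on a real file, pass in the lines that start with `##` from the top of the VCF. Two contigs sharing a length is only a hint that the header may be wrong, not proof of it.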
From oneillkza on 2018-11-26
As I discovered, the hg19 resources actually are available on the cloud:
https://console.cloud.google.com/storage/browser/gatk-legacy-bundles/b37
From pwaltman on 2018-11-27
> @oneillkza said:
> As I discovered, the hg19 resources actually are available on the cloud:
>
> https://console.cloud.google.com/storage/browser/gatk-legacy-bundles/b37
Nice catch!! Wish I had seen this before I spent multiple hours downloading this from the FTP site last night, but hopefully someone else will see it!
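For anyone who would rather script the download than click through the web console, here is a small Python sketch that turns a Cloud Console browser link like the one above into the `gs://` form that command-line tools expect. The helper name `console_url_to_gs_uri` is hypothetical, and the sketch assumes the link follows the `/storage/browser/<bucket>/<prefix>` pattern.

```python
from urllib.parse import urlparse

# Path prefix used by Cloud Console storage browser links.
CONSOLE_PREFIX = "/storage/browser/"


def console_url_to_gs_uri(url):
    """Convert a Cloud Console storage browser URL to a gs:// URI."""
    parsed = urlparse(url)
    if parsed.netloc != "console.cloud.google.com":
        raise ValueError("not a Cloud Console URL: " + url)
    if not parsed.path.startswith(CONSOLE_PREFIX):
        raise ValueError("not a storage browser URL: " + url)
    # What remains after the prefix is "<bucket>/<optional prefix>".
    bucket_and_prefix = parsed.path[len(CONSOLE_PREFIX):].strip("/")
    if not bucket_and_prefix:
        raise ValueError("no bucket found in URL: " + url)
    return "gs://" + bucket_and_prefix


if __name__ == "__main__":
    link = "https://console.cloud.google.com/storage/browser/gatk-legacy-bundles/b37"
    print(console_url_to_gs_uri(link))  # gs://gatk-legacy-bundles/b37
```

The resulting URI can then be passed to a tool such as `gsutil ls` or `gsutil cp -r`. Whether the bucket allows access without authentication is an assumption you would need to check for yourself.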