created by Geraldine_VdAuwera
on 2012-07-31
A GATKReport is simply a text document that contains a well-formatted, easy-to-read representation of some tabular data. Many GATK tools output their results as GATKReports, so it's important to understand how they are formatted and how you can use them in further analyses.
Here's a simple example:
#:GATKReport.v1.0:2
#:GATKTable:true:2:9:%.18E:%.15f:;
#:GATKTable:ErrorRatePerCycle:The error rate per sequenced position in the reads
cycle  errorrate.61PA8.7     qualavg.61PA8.7
0      7.451835696110506E-3  25.474613284804366
1      2.362777171937477E-3  29.844949954504095
2      9.087604507451836E-4  32.875909752547310
3      5.452562704471102E-4  34.498999090081895
4      9.087604507451836E-4  35.148316651501370
5      5.452562704471102E-4  36.072234352256190
6      5.452562704471102E-4  36.121724890829700
7      5.452562704471102E-4  36.191048034934500
8      5.452562704471102E-4  36.003457059679770

#:GATKTable:false:2:3:%s:%c:;
#:GATKTable:TableName:Description
key     column
1:1000  T
1:1001  A
1:1002  C
This report contains two individual GATK report tables. Every table begins with a header for its metadata and then a header for its name and description. The next row contains the column names followed by the data.
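To make the layout concrete, here is a minimal Python sketch that reads a report of this shape into plain dictionaries. It is not part of GATK or gsalib: the function name `parse_gatk_report` and the returned structure are made up for illustration. It assumes that each header and each row sits on its own line and that cells contain no whitespace. The metadata fields are kept as raw strings rather than interpreted, and all cell values are left as strings.

```python
PREFIX = "#:GATKTable:"


def parse_gatk_report(text):
    """Return {table_name: {"metadata", "description", "columns", "rows"}}.

    Hypothetical helper for illustration only; not a GATK or gsalib API.
    """
    lines = [line.strip() for line in text.splitlines()]
    tables = {}
    i = 0
    while i < len(lines):
        if not lines[i].startswith(PREFIX):
            i += 1  # skip the "#:GATKReport" line and blank lines
            continue
        # First header: metadata fields, kept as raw, uninterpreted strings.
        metadata = lines[i][len(PREFIX):].split(":")
        # Second header: table name, then a free-text description.
        name, _, description = lines[i + 1][len(PREFIX):].partition(":")
        # Next line: column names (assumed to contain no whitespace).
        columns = lines[i + 2].split()
        i += 3
        rows = []
        # Data rows run until a blank line, the next header, or end of input.
        while i < len(lines) and lines[i] and not lines[i].startswith("#:"):
            rows.append(dict(zip(columns, lines[i].split())))
            i += 1
        tables[name] = {
            "metadata": metadata,
            "description": description,
            "columns": columns,
            "rows": rows,
        }
    return tables


# The example report from this article.
sample = """\
#:GATKReport.v1.0:2
#:GATKTable:true:2:9:%.18E:%.15f:;
#:GATKTable:ErrorRatePerCycle:The error rate per sequenced position in the reads
cycle  errorrate.61PA8.7     qualavg.61PA8.7
0      7.451835696110506E-3  25.474613284804366
1      2.362777171937477E-3  29.844949954504095
2      9.087604507451836E-4  32.875909752547310
3      5.452562704471102E-4  34.498999090081895
4      9.087604507451836E-4  35.148316651501370
5      5.452562704471102E-4  36.072234352256190
6      5.452562704471102E-4  36.121724890829700
7      5.452562704471102E-4  36.191048034934500
8      5.452562704471102E-4  36.003457059679770

#:GATKTable:false:2:3:%s:%c:;
#:GATKTable:TableName:Description
key     column
1:1000  T
1:1001  A
1:1002  C
"""

report = parse_gatk_report(sample)
for name, table in report.items():
    print(name, table["columns"], len(table["rows"]))
```

Each table ends up under its own name, so `report["ErrorRatePerCycle"]["rows"]` is a list of dictionaries keyed by column name. Converting values to numbers is left to the caller, since the sketch does not interpret the format strings in the metadata header.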
We provide an R library called gsalib that allows you to load GATKReport files into R for further analysis. Here are four simple steps for getting gsalib, installing it and loading a report.
1. Start R (or open RStudio)
$ R

R version 2.11.0 (2010-04-22)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
2. Get the gsalib library from CRAN
The gsalib library is available on the Comprehensive R Archive Network, so you can just do:
> install.packages("gsalib")
Run this from within R (we use RStudio for convenience).
In some cases you need to tell R explicitly where to find the library. You can do this by adding a .libPaths() call to your .Rprofile file, whose contents would then look like this:
$ cat .Rprofile
.libPaths("/path/to/Sting/R/")
3. Load the gsalib library
> library(gsalib)
4. Finally, load the GATKReport file and have fun
> d = gsa.read.gatkreport("/path/to/my.gatkreport")
> summary(d)
              Length Class      Mode
CountVariants 27     data.frame list
CompOverlap   13     data.frame list
Updated on 2015-08-14
From bioinfo_89 on 2014-07-05
Hi Geraldine!!
I might sound silly, but I wanted to ask where the Gatkreport is stored. After I run any command, it says the Gatkreport is generated!! Where exactly is it stored? How do I find the report generated for my data?
Thanx
From Geraldine_VdAuwera on 2014-07-06
Hi @bioinfo_89 ,
The gatkreport you are referring to is related to the Phone Home function, which sends us a brief report about the tools you ran and whether they worked ok (see the FAQ for more details). This report is not stored on your machine and would not be useful to you as it does not contain any actual data about your analysis.
From bioinfo_89 on 2014-07-07
Ok! So which Gatkreport is the one you are referring to above, then?
From Geraldine_VdAuwera on 2014-07-07
The example shown above is a GATKReport generated by the BaseRecalibrator. Other tools may produce other types of GATKReport. This article aims to explain the general format.
From bioinfo_89 on 2014-07-08
OK!! Thnx!
From Hasani on 2015-01-25
Hello Geraldine,
I’m using BaseRecalibrator and trying to wrap my head around the report it generated. Would you please clarify the difference between RecalTable0, RecalTable1 and RecalTable2? If I simply want to show the quality scores before and after, what should I use? I’m afraid I could not find anything helpful on the web!
Many thanks in advance!
Hasani
From Sheila on 2015-01-26
@Hasani
Hi Hasani,
This article should help you: http://gatkforums.broadinstitute.org/discussion/44/base-quality-score-recalibration-bqsr
GATK does not output a table that shows the quality scores before and after for each site. The only way you can get them is by extracting them from the BAM files (e.g. with some kind of script).
-Sheila
From mcvu on 2017-08-14
Hello,
I am trying to install gsalib to be able to run AnalyzeCovariates. However, I am getting the error:
Warning: unable to access index for repository https://ftp.heanet.ie/mirrors/cran.r-project.org/src/contrib
Warning message:
In getDependencies(pkgs, dependencies, available, lib) : package ‘gsalib’ is not available (for R version 2.13.1)
Thoughts?
Thanks!
From Sheila on 2017-08-21
@mcvu
Hi,
Do you get the same error if you use the latest version of R? It looks like R is up to version 3.4.1 now. I just tried with version 3.3.1 and it works for me.
-Sheila
From myourshaw on 2018-03-21
Hi Geraldine,
I ported the R gsalib to Python, which will load a GATK report file into a dict-like object containing a pandas DataFrame for each table. Install with `pip install gsalib`.
Source at https://github.com/myourshaw/gsalib
From shlee on 2018-03-21
Hi @myourshaw,
Did everything with your installation go alright? To ping Geraldine, remember to include the `@` sign with Geraldine’s handle, e.g. @Geraldine_VdAuwera.
From Geraldine_VdAuwera on 2018-03-21
(Thanks @shlee!)
@myourshaw, that’s great! I expect that will be quite useful for people who prefer to use Python over R. Thanks!