created by Geraldine_VdAuwera
on 2013-03-13
Most GATK tools apply several read filters by default. You can look up exactly what are the defaults for each tool in their respective Technical Documentation pages.
But sometimes you want to specify additional filters yourself (and before you ask, no, you cannot disable the default read filters used by a given tool). This is how you do it:
The --read-filter
argument (or -rf
for short) allows you to apply whatever read filters you'd like. For example, to add the MaxReadLengthFilter
filter above to PrintReads
, you just add this to your command line:
--read_filter MaxReadLength
Notice that when you specify a read filter, you need to strip the Filter part of its name off!
The read filter will be applied with its default value (which you can also look up in the Tech Docs for that filter). Now, if you want to specify a different value from the default, you pass the relevant argument by adding this right after the read filter:
--read_filter MaxReadLength -maxReadLength 76
It's important that you pass the argument right after the filter itself, otherwise the command line parser won't know that they're supposed to go together.
And of course, you can add as many filters as you like by using multiple copies of the --read_filter
parameter:
--read_filter MaxReadLength --maxReadLength 76 --read_filter ZeroMappingQualityRead
Updated on 2015-12-19
From everestial007 on 2016-09-12
I want to select the aligned read that have mapQ below 40. I have this script
java -jar GenomeAnalysisTK.jar \ -T PrintReads \ -R reference.fasta \ -I input1.bam \ -o output.bam \ —read_filter MappingQuality -mmq 40
How can I inverse the —read_filter flag?
Thanks,
From everestial007 on 2016-09-12
Sheila
Geraldine_VdAuwera
I want to select the aligned read that have mapQ below 40. I have this script
java -jar GenomeAnalysisTK.jar \
-T PrintReads \
-R reference.fasta \
-I input1.bam \
-o output.bam \
—read_filter MappingQuality -mmq 40
How can I inverse the —read_filter flag/value?
Thanks,
From shlee on 2016-09-12
Hi @everestial007 ,
The `-mmq` option retains higher mapping quality reads from the value you provide. I’m not aware of any way to directly inverse this feature within GATK tools. Can I ask you why you want to select the lower mapping quality reads? I imagine you can do this using an awk command.
From Sheila on 2016-09-12
@everestial007
Hi,
Soo Hee is right. There is no simple way to do it. However, you can run PrintReads twice, once with -mmq 40 and again with -mmq 0. Then you can use [DiffObjects](https://software.broadinstitute.org/gatk/documentation/tooldocs/org_broadinstitute_gatk_tools_walkers_diffengine_DiffObjects.php) to get a list of the reads that are different between the two. Please note I have never used that particular tool so I cannot comment much about it. You can also use AWK or some other tool like Soo Hee suggested.
-Sheila
From everestial007 on 2016-09-13
Sheila
shlee
Hi I want to retain the reads below mapQ40 for downstream purposes – mainly to compare low mapQ reads from different samples, blast search and/or de novo assembly (if need be).
I realized that awk would do it and already have a script, but its just taking too much time and resources for so many samples I have.
You need to convert the file to sam – remove header info – filter based on 5th column (mapQ flag) – reattach header – convert to bam. For just 1 sample it took about a day. Lol. Samtools and Picard both don’t it, haven’t found other programs too. Probably GATK can add such functionality :smiley:
From shlee on 2016-09-13
@everestial007, you should be able to stream in part some of the processes you list by taking advantage of [Samtools functionality](http://www.htslib.org/doc/samtools.html) as shown in the following set of commands. I came up with these, so you should double check the output is as expected. It’s my impression that this kind of usage of Samtools is fairly standard.
````
set -o pipefail
samtools view file.bam | awk ‘{if ($5 < 40) print $0;}’ > lessthanforty.sam
samtools view -H file.bam > headeronly.txt
cat headeronly.txt lessthanforty.sam > headerandlessthanforty.sam
samtools view -b headerandlessthanforty.sam > headerandlessthanforty.bam
````
If you need to process many BAMs in the same manner, and are interested in scripting this, then you should check out our [WDL documentation](https://software.broadinstitute.org/wdl/). You could write a WDL script that takes an array of BAM files as input and parallelizes processing of these files over these multiple commands. I imagine even as a newbie to WDL scripts, it will take you ~ a day, perhaps two, to learn the convention, write up a script and get it running over your samples.
From Geraldine_VdAuwera on 2016-09-13
It actually would be fairly trivial to add this functionality. I can see how that would be useful. @Sheila, care to put in a feature request? This would be a good training task for a new dev (I think we got a new person recently, possibly today?).
From everestial007 on 2016-09-13
@shlee
Actually, I had almost the same command lines you provided.
I will check the WDL documentation, as it might become helpful.
Thanks !
From everestial007 on 2017-01-31
>
Geraldine_VdAuwera said: > It actually would be fairly trivial to add this functionality. I can see how that would be useful.
Sheila, care to put in a feature request? This would be a good training task for a new dev (I think we got a new person recently, possibly today?).
- I was wondering if this functionality was added. Thanks !
From Sheila on 2017-02-01
@everestial007
Hi,
Not yet. I put in a feature request for GATK4.
-Sheila
From Geraldine_VdAuwera on 2017-03-08
@everestial007 This has just now been added in GATK4.
From everestial007 on 2017-04-06
@Geraldine_VdAuwera : Thanks for the update. I am now revisiting this issue after a long time. Needs some wrapping up. :confused: