created by shlee
on 2015-11-23
Here we outline how to generate an unmapped BAM (uBAM) from either a FASTQ or aligned BAM file. We use Picard's FastqToSam to convert a FASTQ (Option A) or Picard's RevertSam to convert an aligned BAM (Option B).
Sections on this page:
- (A) Convert FASTQ to uBAM and add read group information using FastqToSam
- (B) Convert aligned BAM to uBAM and discard problematic records using RevertSam
Download example data
Tutorial data reads were originally aligned to the advanced tutorial bundle's human_g1k_v37_decoy.fasta reference and subset to the interval 10:91,000,000-92,000,000.
Picard's FastqToSam transforms a FASTQ file into an unmapped BAM. It requires two read group fields and makes the other read group fields optional. In the command below, we note which fields are required for GATK Best Practices Workflows; all other read group fields are optional.
java -Xmx8G -jar picard.jar FastqToSam \
    FASTQ=6484_snippet_1.fastq \
    FASTQ2=6484_snippet_2.fastq \
    OUTPUT=6484_snippet_fastqtosam.bam \
    READ_GROUP_NAME=H0164.2 \
    SAMPLE_NAME=NA12878 \
    LIBRARY_NAME=Solexa-272222 \
    PLATFORM_UNIT=H0164ALXX140820.2 \
    PLATFORM=illumina \
    SEQUENCING_CENTER=BI \
    RUN_DATE=2014-08-20T00:00:00-0400
Notes on the command: FASTQ is the first read file of the pair and FASTQ2 the second. READ_GROUP_NAME (changed from its default of A), SAMPLE_NAME and LIBRARY_NAME are required for GATK Best Practices Workflows; PLATFORM is recommended.
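Before running FastqToSam, it can help to confirm that the two FASTQ files list mates in the same order and to derive the PLATFORM_UNIT value from the read names. The shell sketch below is not part of Picard; the function names `pu_from_readname` and `check_fastq_pair_order` are hypothetical, and it assumes four-line FASTQ records and read names that begin with run_barcode:lane, as the tutorial's do (`@H0164ALXX140820:2:...` gives `H0164ALXX140820.2`).

```bash
# Hypothetical helper: derive a PLATFORM_UNIT value (run_barcode.lane) from a
# read name such as @H0164ALXX140820:2:1101:10003:49022/1.
# Assumes the first two colon-separated fields are the run barcode and lane.
pu_from_readname() {
  printf '%s\n' "$1" | awk -F: '{ sub(/^@/, "", $1); print $1 "." $2 }'
}

# Hypothetical helper: check that two FASTQ files (4-line records) list mates
# in the same order. Both files are read in lockstep, so memory use is constant.
# Prints the first problem found and returns 1; returns 0 if all names match.
check_fastq_pair_order() {
  awk -v f2="$2" '
    {
      # Read the line at the same position in the second file.
      if ((getline line2 < f2) <= 0) { print "second file is shorter"; bad = 1; exit }
      if (NR % 4 != 1) next          # only header lines carry read names
      n1 = $1                        # first token drops any trailing comment
      split(line2, b); n2 = b[1]
      sub(/\/[12]$/, "", n1)         # ignore /1 and /2 suffixes when comparing
      sub(/\/[12]$/, "", n2)
      if (n1 != n2) {
        printf "record %d: %s vs %s\n", (NR + 3) / 4, n1, n2
        bad = 1; exit
      }
    }
    END {
      if (!bad && (getline line2 < f2) > 0) { print "second file is longer"; bad = 1 }
      exit bad
    }' "$1"
}
```

For example, `check_fastq_pair_order 6484_snippet_1.fastq 6484_snippet_2.fastq && echo 'mates in order'`. A mismatch here corresponds to the "read name 1 does not match read name 2" error that FastqToSam reports in paired mode.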
Some details on select parameters:
- FASTQ and FASTQ2 specify the first read file and the second read file, respectively. Records in each file must be queryname sorted, as the tool assumes identical ordering for pairs. The tool automatically strips the /1 and /2 read name suffixes and adds SAM flag values to indicate reads are paired. Do not provide a single interleaved FASTQ file, as the tool will assume reads are unpaired and the SAM flag values will reflect single-end reads.
- The FASTQ QUALITY_FORMAT is detected automatically if unspecified.
- SORT_ORDER is queryname by default.
- PLATFORM_UNIT is often in run_barcode.lane format. Include it if the sample is multiplexed.
- RUN_DATE is in ISO 8601 date format.

Paired reads will have SAM flag values that reflect pairing and the fact that the reads are unmapped, as shown in the example read pair below.
Original first read
@H0164ALXX140820:2:1101:10003:49022/1
ACTTTAGAAATTTACTTTTAAGGACTTTTGGTTATGCTGCAGATAAGAAATATTCTTTTTTTCTCCTATGTCAGTATCCCCCATTGAAATGACAATAACCTAATTATAAATAAGAATTAGGCTTTTTTTTGAACAGTTACTAGCCTATAGA
+
-FFFFFJJJJFFAFFJFJJFJJJFJFJFJJJ<<FJJJJFJFJFJJJJ<JAJFJJFJJJJJFJJJAJJJJJJFFJFJFJJFJJFFJJJFJJJFJJFJJFJAJJJJAJFJJJJJFFJJ<<<JFJJAFJAAJJJFFFFFJJJAJJJF<AJFFFJ
Original second read
@H0164ALXX140820:2:1101:10003:49022/2
TGAGGATCACTAGATGGGGGAGGGAGAGAAGAGATGTGGGCTGAAGAACCATCTGTTGGGTAATATGTTTACTGTCAGTGTGATGGAATAGCTGGGACCCCAAGCGTCAGTGTTACACAACTTACATCTGTTGATCGACTGTCTATGACAG
+
AA<FFJJJAJFJFAFJJJJFAJJJJJ7FFJJ<F-FJFJJJFJJFJJFJJF<FJJA<JF-AFJFAJFJJJJJAAAFJJJJJFJJF-FF<7FJJJJJJ-JA<<J<F7-<FJFJJ7AJAF-AFFFJA--J-F######################
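The trailing `#` characters in the second read's quality string use the same Phred+33 encoding as the rest of the string: each character's ASCII code minus 33 gives its Phred-scaled base quality, so `J` is 41 and `#` is 2. A minimal sketch for decoding a quality string, assuming Phred+33 encoding; `phred33_scores` is a hypothetical name, not a Picard or samtools command.

```bash
# Hypothetical helper: decode a Phred+33 quality string into numeric scores.
# Each character's ASCII code minus 33 is its Phred-scaled base quality.
phred33_scores() {
  # od prints the unsigned decimal value of each byte; -v disables od's
  # suppression of repeated lines, which long runs of '#' could trigger.
  printf '%s' "$1" | od -An -v -tu1 |
    awk '{ for (i = 1; i <= NF; i++) printf "%s%d", (n++ ? " " : ""), $i - 33 }
         END { print "" }'
}
```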
After FastqToSam
H0164ALXX140820:2:1101:10003:49022 77 * 0 0 * * 0 0 ACTTTAGAAATTTACTTTTAAGGACTTTTGGTTATGCTGCAGATAAGAAATATTCTTTTTTTCTCCTATGTCAGTATCCCCCATTGAAATGACAATAACCTAATTATAAATAAGAATTAGGCTTTTTTTTGAACAGTTACTAGCCTATAGA -FFFFFJJJJFFAFFJFJJFJJJFJFJFJJJ<<FJJJJFJFJFJJJJ<JAJFJJFJJJJJFJJJAJJJJJJFFJFJFJJFJJFFJJJFJJJFJJFJJFJAJJJJAJFJJJJJFFJJ<<<JFJJAFJAAJJJFFFFFJJJAJJJF<AJFFFJ RG:Z:H0164.2
H0164ALXX140820:2:1101:10003:49022 141 * 0 0 * * 0 0 TGAGGATCACTAGATGGGGGAGGGAGAGAAGAGATGTGGGCTGAAGAACCATCTGTTGGGTAATATGTTTACTGTCAGTGTGATGGAATAGCTGGGACCCCAAGCGTCAGTGTTACACAACTTACATCTGTTGATCGACTGTCTATGACAG AA<FFJJJAJFJFAFJJJJFAJJJJJ7FFJJ<F-FJFJJJFJJFJJFJJF<FJJA<JF-AFJFAJFJJJJJAAAFJJJJJFJJF-FF<7FJJJJJJ-JA<<J<F7-<FJFJJ7AJAF-AFFFJA--J-F###################### RG:Z:H0164.2
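The flag values 77 and 141 above can be decoded bit by bit. In the sketch below, the labels are short descriptions chosen here for readability; they paraphrase the SAM specification's bit descriptions and are not an official API. `sam_flag_names` is a hypothetical name; Picard's Explain SAM flags webpage gives the same breakdown interactively.

```bash
# Hypothetical helper: list which SAM FLAG bits are set, from 0x1 up to 0x800.
# The labels are short descriptions chosen here, paraphrasing the SAM spec.
sam_flag_names() {
  flag=$1
  bit=1
  names=""
  for name in PAIRED PROPER_PAIR UNMAPPED MATE_UNMAPPED REVERSE MATE_REVERSE \
              FIRST_OF_PAIR SECOND_OF_PAIR SECONDARY QC_FAIL DUPLICATE SUPPLEMENTARY; do
    if [ $(( flag & bit )) -ne 0 ]; then
      names="$names,$name"
    fi
    bit=$(( bit * 2 ))
  done
  printf '%s\n' "${names#,}"
}
```

Here 77 decodes to PAIRED,UNMAPPED,MATE_UNMAPPED,FIRST_OF_PAIR and 141 to PAIRED,UNMAPPED,MATE_UNMAPPED,SECOND_OF_PAIR: both reads are paired and unmapped, and differ only in which end of the pair they are.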
We use Picard's RevertSam to remove alignment information and generate an unmapped BAM (uBAM). For our tutorial file, we have to call on some additional parameters that we explain below. This illustrates the need to tailor the tool's parameters to each dataset. As such, it is a good idea to test the reversion process on a subset of reads before committing to reverting the entirety of a large BAM. Follow the directions in this How-to to create a snippet of aligned reads corresponding to a genomic interval.
We use the following parameters.
java -Xmx8G -jar /path/picard.jar RevertSam \
    I=6484_snippet.bam \
    O=6484_snippet_revertsam.bam \
    SANITIZE=true \
    MAX_DISCARD_FRACTION=0.005 \
    ATTRIBUTE_TO_CLEAR=XT \
    ATTRIBUTE_TO_CLEAR=XN \
    ATTRIBUTE_TO_CLEAR=AS \
    ATTRIBUTE_TO_CLEAR=OC \
    ATTRIBUTE_TO_CLEAR=OP \
    SORT_ORDER=queryname \
    RESTORE_ORIGINAL_QUALITIES=true \
    REMOVE_DUPLICATE_INFORMATION=true \
    REMOVE_ALIGNMENT_INFORMATION=true
Notes on the command: MAX_DISCARD_FRACTION is informational and does not affect processing. Picard releases from 9/2015 onward clear AS by default. SORT_ORDER, RESTORE_ORIGINAL_QUALITIES, REMOVE_DUPLICATE_INFORMATION and REMOVE_ALIGNMENT_INFORMATION are shown at their default values.
To process large files, also designate a directory for temporary working files with Picard's TMP_DIR option, e.g.:
TMP_DIR=/path/shlee
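Choosing which ATTRIBUTE_TO_CLEAR values to pass requires knowing which tags the BAM carries. A minimal sketch, assuming SAM text on standard input (for example from `samtools view`); `list_sam_tags` is a hypothetical name. It relies on optional fields starting at column 12 of a SAM record and counts how many records carry each two-letter tag.

```bash
# Hypothetical helper: count records carrying each optional-field tag in SAM
# text on standard input. Optional fields (TAG:TYPE:VALUE) start at column 12;
# header lines beginning with @ are skipped.
list_sam_tags() {
  awk -F '\t' '
    /^@/ { next }
    { for (i = 12; i <= NF; i++) count[substr($i, 1, 2)]++ }
    END { for (t in count) print t "\t" count[t] }
  ' | LC_ALL=C sort
}
```

For example, `samtools view 6484_snippet.bam | list_sam_tags` would list tags such as XT that are not in the default clear list.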
We invoke or change multiple RevertSam parameters to generate an unmapped BAM:
- We remove nonstandard alignment tags with the ATTRIBUTE_TO_CLEAR option. Standard tags cleared by default are NM, UQ, PG, MD, MQ, SA, MC, and AS (AS for Picard releases starting 9/2015). Additionally, the OQ tag is removed by the default RESTORE_ORIGINAL_QUALITIES parameter. Remove all other nonstandard tags by specifying each with the ATTRIBUTE_TO_CLEAR option. For example, we clear the XT tag for our tutorial file so that it is free for use by other tools, e.g. MarkIlluminaAdapters. To list all tags within a BAM, you can pipe the output of samtools view through standard UNIX text tools.
- We invoke the SANITIZE option to remove reads that cause problems for certain tools, e.g. MarkIlluminaAdapters. Downstream tools will have problems with paired reads with missing mates, duplicated records, and records with mismatches in length of bases and qualities. Any paired-read file subset to a genomic interval requires sanitizing to remove reads whose mates aligned outside of the interval and were therefore lost.
- We change MAX_DISCARD_FRACTION to a stricter threshold of 0.005 instead of the default 0.01. Whether or not this fraction is reached, the tool informs you of the number and fraction of reads it discards. If the discarded fraction exceeds the threshold, the tool additionally reports it via an exception as it finishes processing.

Some comments on options kept at default:
- SORT_ORDER=queryname: For paired read files, because each read in a pair has the same query name, sorting results in interleaved reads. This means that reads in a pair are listed consecutively within the same file. This replaces the previous sort order of the aligned BAM. Coordinate-sorted reads cause the aligner to estimate insert size incorrectly from blocks of paired reads, as they are not randomly distributed.
- RESTORE_ORIGINAL_QUALITIES=true: Restoring original base qualities to the QUAL field requires OQ tags listing original qualities. The OQ tag uses the same encoding as the QUAL field, e.g. ASCII Phred-scaled base quality+33 for tutorial data. After restoring the QUAL field, RevertSam removes the tag.
- REMOVE_ALIGNMENT_INFORMATION=true removes program group records and alignment flag and tag information. For example, flags reset to unmapped values, e.g. 77 and 141 for paired reads. The parameter also invokes the default ATTRIBUTE_TO_CLEAR parameter, which removes standard alignment tags. RevertSam ignores ATTRIBUTE_TO_CLEAR when REMOVE_ALIGNMENT_INFORMATION=false.

Below we show a read pair from the tutorial data before and after RevertSam. Notice that the first listed read in the pair becomes reverse-complemented after RevertSam. This restores how reads are represented when they come off the sequencer: 5' to 3' of the read being sequenced.
For 6484_snippet.bam, SANITIZE removes 2,202 out of 279,796 (0.787%) reads, leaving us with 277,594 reads.
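As an arithmetic check, 2,202 / 279,796 is about 0.00787. That exceeds the MAX_DISCARD_FRACTION of 0.005 set above but not the default of 0.01, so with this setting the tool would report the fraction via an exception, per the description of that parameter. The hypothetical helper below only reproduces the comparison; it does not call Picard.

```bash
# Hypothetical helper: report a discarded-read fraction and compare it with a
# MAX_DISCARD_FRACTION-style threshold. Returns 1 when the threshold is exceeded.
check_discard_fraction() {
  awk -v d="$1" -v t="$2" -v m="$3" 'BEGIN {
    f = d / t
    printf "discarded %d of %d reads (%.3f%%)\n", d, t, 100 * f
    exit (f > m ? 1 : 0)
  }'
}
```

With the tutorial numbers, `check_discard_fraction 2202 279796 0.005` prints "discarded 2202 of 279796 reads (0.787%)" and returns 1.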
Original BAM
H0164ALXX140820:2:1101:10003:23460 83 10 91515318 60 151M = 91515130 -339 CCCATCCCCTTCCCCTTCCCTTTCCCTTTCCCTTTTCTTTCCTCTTTTAAAGAGACAAGGTCTTGTTCTGTCACCCAGGCTGGAATGCAGTGGTGCAGTCATGGCTCACTGCCGCCTCAGACTTCAGGGCAAAAGCAATCTTTCCAGCTCA :<<=>@AAB@AA@AA>6@@A:>,*@A@<@??@8?9>@==8?:?@?;?:><??@>==9?>8>@:?>>=>;<==>>;>?=?>>=<==>>=>9<=>??>?>;8>?><?<=:>>>;4>=>7=6>=>>=><;=;>===?=>=>>?9>>>>??==== MC:Z:60M91S MD:Z:151 PG:Z:MarkDuplicates RG:Z:H0164.2 NM:i:0 MQ:i:0 OQ:Z:<FJFFJJJJFJJJJJF7JJJ<F--JJJFJJJJ<J<FJFF<JAJJJAJAJFFJJJFJAFJAJJAJJJJJFJJJJJFJJFJJJJFJFJJJJFFJJJJJJJFAJJJFJFJFJJJFFJJJ<J7JJJJFJ<AFAJJJJJFJJJJJAJFJJAFFFFA UQ:i:0 AS:i:151
H0164ALXX140820:2:1101:10003:23460 163 10 91515130 0 60M91S = 91515318 339 TCTTTCCTTCCTTCCTTCCTTGCTCCCTCCCTCCCTCCTTTCCTTCCCCCCCCCCCCCCCCCTCCCCCCCCCCCCCCCCCTCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCTTCCCCTCTCCCACCCCTCTCTCCCCCCCTCCCACCC :0;.=;8?7==?794<<;:>769=,<;0:=<0=:9===/,:-==29>;,5,98=599;<=########################################################################################### SA:Z:2,33141573,-,37S69M45S,0,1; MC:Z:151M MD:Z:48T4T6 PG:Z:MarkDuplicates RG:Z:H0164.2 NM:i:2 MQ:i:60 OQ:Z:<-<-FA<F<FJF<A7AFAAJ<<AA-FF-AJF-FA<AFF--A-FA7AJA-7-A<F7<<AFF########################################################################################### UQ:i:49 AS:i:50
After RevertSam
H0164ALXX140820:2:1101:10003:23460 77 * 0 0 * * 0 0 TGAGCTGGAAAGATTGCTTTTGCCCTGAAGTCTGAGGCGGCAGTGAGCCATGACTGCACCACTGCATTCCAGCCTGGGTGACAGAACAAGACCTTGTCTCTTTAAAAGAGGAAAGAAAAGGGAAAGGGAAAGGGAAGGGGAAGGGGATGGG AFFFFAJJFJAJJJJJFJJJJJAFA
H0164ALXX140820:2:1101:10003:23460 141 * 0 0 * * 0 0 TCTTTCCTTCCTTCCTTCCTTGCTCCCTCCCTCCCTCCTTTCCTTCCCCCCCCCCCCCCCCCTCCCCCCCCCCCCCCCCCTCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCTTCCCCTCTCCCACCCCTCTCTCCCCCCCTCCCACCC <-<-FA
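The per-record changes in this pair can be sketched in shell. The flag rule below is an assumption that reproduces the example (83 becomes 77 and 163 becomes 141): it keeps the paired, first/second-of-pair and QC-fail bits, sets the unmapped bits and clears the rest. It is not RevertSam's implementation. The second helper reverse-complements SEQ and reverses QUAL for records with the reverse-strand bit (0x10) set, which is why the first read changes and the second does not. In the example, the QUAL shown after reversion is the reversed OQ string, because RESTORE_ORIGINAL_QUALITIES restores it first. Both function names are hypothetical.

```bash
# Hypothetical helper: one flag rule that reproduces the example (83 -> 77,
# 163 -> 141). Assumption: keep paired (0x1), first/second-of-pair
# (0x40/0x80) and QC-fail (0x200); set unmapped (0x4); set mate unmapped
# (0x8) for paired reads; clear everything else.
reverted_flag() {
  f=$(( ($1 & (0x1 | 0x40 | 0x80 | 0x200)) | 0x4 ))
  if [ $(( f & 0x1 )) -ne 0 ]; then
    f=$(( f | 0x8 ))
  fi
  echo "$f"
}

# Hypothetical helper: print SEQ and QUAL (tab-separated) in sequencer
# orientation. Records with the reverse-strand bit (0x10) set get SEQ
# reverse-complemented and QUAL reversed; other records print unchanged.
orient_for_sequencer() {
  FLAG=$1 SEQ=$2 QUAL=$3 awk 'BEGIN {
    flag = ENVIRON["FLAG"]; seq = ENVIRON["SEQ"]; qual = ENVIRON["QUAL"]
    if (int(flag / 16) % 2 == 1) {
      comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"; comp["N"] = "N"
      s = ""; q = ""
      for (i = length(seq); i >= 1; i--) {
        b = substr(seq, i, 1)
        s = s ((b in comp) ? comp[b] : b)
        q = q substr(qual, i, 1)
      }
      seq = s; qual = q
    }
    print seq "\t" qual
  }'
}
```

For instance, the last eight bases of the first read, CCAGCTCA, come back as TGAGCTGG, matching the start of the reverted read.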
Updated on 2017-06-23
From aneek on 2016-04-18
Hi,
How do I run FastqToSam in Picard tools when the FASTQ files are not interleaved, i.e. when forward (R1) and reverse (R2) reads are in separate FASTQ files? Picard tools is not accepting two FASTQ inputs.
Error message: “Option ‘FASTQ’ cannot be specified more than once.”
From Sheila on 2016-04-18
@aneek
Hi,
If you look at the [FastqToSam documentation](http://broadinstitute.github.io/picard/command-line-overview.html#FastqToSam), you will find your answer.
-Sheila
From aneek on 2016-04-19
@Sheila
Thank you very much. I have to use FASTQ2 for the 2nd read.
Another thing I would like to ask is, whether there is any need to use FastQC and Trimmomatic to correct the quality of the raw data and remove the bad reads, if I use these steps of making a clean BAM from short read sequence recommended by GATK.
Also, can I use the threading option (-t) in all GATK commands to increase the speed of the tasks?
Thank you.
From Sheila on 2016-04-19
@aneek
Hi,
No, it is best to stick to the [Best Practices](https://www.broadinstitute.org/gatk/guide/best-practices). There should be no need to correct for anything before doing the Best Practices pre-processing steps.
You can find which tools accept -nt in the tool documentation.
-Sheila
From aneek on 2016-04-19
@Sheila
Hi, thank you very much for all the information.
From MUHAMMADSOHAILRAZA on 2016-07-11
@Sheila
Hi,
I cannot download the example data sets “tutorial_6484_RevertSam.tar.gz”/“tutorial_6484_FastqToSam.tar.gz” from the links above.
I am in China, where we don’t have complete access to Google, which may be why I cannot download the data from Google Drive. Could you please make it available somewhere else, like you do for the “bundle” as an “example” directory?
Thanks
From shlee on 2016-07-11
Hi @MUHAMMADSOHAILRAZA,
I’ve uploaded the tutorial datasets to the ftp site. You can find the data for this and other tutorials at: . Like with the bundle, remember to keep the password field blank.
From MUHAMMADSOHAILRAZA on 2016-07-12
Thanks! @shlee
From MUHAMMADSOHAILRAZA on 2016-07-12
@shlee
Hi,
I am curious why the RG tag is absent for the first read in the “After RevertSam” results section above.
Could you also please explain what the values 77 and 141 in column 2 represent?
Thanks
From shlee on 2016-07-12
@MUHAMMADSOHAILRAZA,
You can check yourself why this may be with the command:
samtools view 6484_snippet_revertsam.bam | grep 'H0164ALXX140820:2:1101:10003:23460'
Then you’ll see that both reads should have an RG tag. I’ve updated the example—thanks for pointing this out.
For an explanation of SAM flags, I refer you to the following:
- Picard’s [Explain SAM flags](https://broadinstitute.github.io/picard/explain-flags.html) webpage where you can type in a flag value to see their meaning
- Blogpost [Sam flags down a boat](https://www.broadinstitute.org/gatk/blog?id=7019) for an introduction
- [Sequence Alignment/Map Format Specification](https://samtools.github.io/hts-specs/SAMv1.pdf) for official specs
From huangk on 2016-09-15
Hi, I tried to convert my paired-end fastqs to SAM using FastqToSam, but got the error: “In paired mode, read name 1 (HWI-D00377:30:H8EJDADXX:1:2209:8491:93586) does not match read name 2 (HWI-D00377:30:H8EJDADXX:1:2104:20024:30303)”. Is there any way to fix it? Thanks!
From shlee on 2016-09-26
This question has been answered [here](http://gatkforums.broadinstitute.org/gatk/discussion/8308/whats-the-version-of-bwa-being-implemented-in-gatk4-bwaspark-tool#latest).
From abor on 2016-11-28
Hi,
I am using the script reported in the paragraph “(B) Convert aligned BAM to uBAM and discard problematic records using RevertSam”.
As suggested, I set the SANITIZE option to true (indeed, the process already has problems at the revertsam->revertmark step when it is set to false).
However, for some older BAM files, I lose as much as half of the reads, meaning that my interleaved FASTQ file as well as my remapped BAM are much smaller in size and read count. Is there a way to circumvent such a big loss?
Thanks!
From shlee on 2016-11-28
Hi @abor,
When you say you lose up to half of the reads, do you mean alignment records or unique reads? I ask because an aligned BAM can represent the same read in multiple alignment records and the reversion to uBAM has the effect of paring these down to one unique read.
Can you also describe further the problem you are running into? It would be helpful to us if you can post the tool error message.
From abor on 2016-11-28
Hi, thanks. The err file reports:
“INFO 2016-11-25 15:41:50 RevertSam Discarded 23263512 out of 61936806 (37.560%) reads in order to sanitize output.”
Consistently, the read count in my original bam file is 61,955,464, in my interleaved fastq is 38673294 and in my final recalibrated bam (after realignment/bqsr) is 38690072
From shlee on 2016-11-30
Ok, @abor. So I think the question becomes why do you want to save the discarded reads? Consider what property of the discarded reads caused them to be discarded given SANITIZE discards paired reads with missing mates, duplicated records, and records with mismatches in length of bases and qualities. Then consider what impact these reads have on your analyses. E.g. do the remaining reads provide sufficient coverage or do you need all the coverage you can salvage? Do the discarded reads qualitatively differ, e.g. have lower base qualities, than the remaining reads?
I think if they are qualitatively equivalent, then only the paired reads with missing mates would be worth salvaging. If this is the case, then we’d have to think about how to retain such reads to the exclusion of the other sanitized reads. So you would first RevertSam without sanitizing. Then, I believe you can filter records with mismatches in length of bases and qualities using GATK’s [MalformedReadFilter](https://software.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_engine_filters_MalformedReadFilter.php)’s optional parameter `--filter_mismatching_base_and_quals`. Because your BAM at this point is unaligned, and therefore not coordinate-sorted, you have to run [PrintReads](https://software.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_readutils_PrintReads.php) in `--unsafe` mode and discard the bai index that the tool outputs. As for duplicated records, a UNIX `sort` and `uniq` should remove these.
For example, for your unaligned uBAM:
```
samtools view ubam.bam | sort | uniq > ubam_uniquify.sam
```
This will print the unique reads to a SAM-format file. You’ll have to reattach your header before converting back to BAM format. Then:
```
java -jar GenomeAnalysisTK.jar -T PrintReads \
    -I uniquify_ubam.bam \
    -o ubam_filter_mismatching_base_and_quals.bam \
    -R your_reference.fasta \
    --filter_mismatching_base_and_quals \
    --unsafe
```
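If you would rather keep the header attached throughout, one alternative to `sort | uniq` is to drop exact duplicate records with awk while passing header lines through. This is a sketch, not part of the advice above; `dedupe_sam_records` is a hypothetical name, it keeps every distinct record in memory (so it suits modest file sizes), and unlike `sort` it preserves the input record order.

```bash
# Hypothetical helper: remove exact duplicate records from SAM text on stdin.
# Header lines (@HD, @RG, ...) are printed as they arrive, so they stay at the
# top; for other lines only the first occurrence of each line is printed.
dedupe_sam_records() {
  awk '/^@/ { print; next } !seen[$0]++'
}
```

For example: `samtools view -h ubam.bam | dedupe_sam_records | samtools view -b -o uniquify_ubam.bam -`.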
Also, take a look at the [ValidateSamFile document](https://software.broadinstitute.org/gatk/guide/article?id=7571). I hope this is helpful.
From abor on 2017-01-17
Hi @shlee, sorry for the long delay.
I went through the steps you suggest: on my original BAM file I use RevertSam without SANITIZE, then use sort/uniq and then PrintReads with --filter_mismatching_base_and_quals --unsafe. The number of reads remains exactly the same as in the original BAM; in addition, running ValidateSamFile MODE=SUMMARY gives me the following error: ERROR:MATE_NOT_FOUND, and the number of reads with this error is exactly the same as the number of reads that are discarded when I use RevertSam with SANITIZE=true. It turns out that the discarded reads are reads with missing mates (that were, however, regularly mapped in my original BAM), and I would like to keep them, given that they are as much as 15-45% of total reads for some samples. However, the MarkIlluminaAdapters tool doesn’t process the uBam file unless they have been discarded. Do you think there may be a solution for this step to work properly also in the presence of reads with missing pairs?
Thanks
From abor on 2017-01-17
Plus, I wonder why this happens. When using BAM files in which the RealignerTargetCreator and BaseRecalibrator steps are performed through the whole genome, I can pass the BAM files again through the pipeline (RevertSam, MarkIlluminaAdapters, SamToFastq, BWA, MarkDuplicates and GATK RealignerTargetCreator/BaseRecalibrator) several times without losing any reads; instead, I notice that when using the -L option on the bait regions at the RealignerTargetCreator and BaseRecalibrator steps, reads are discarded if the BAMs are subsequently re-passed through RevertSam.
Of course I don’t need to re-run RevertSam several times, but I could need to do it in the future. So the question is whether the -L option at the RealignerTargetCreator and BaseRecalibrator steps introduces any tag in the read so that it is then discarded as “missing mate” when passed again through RevertSam. I hope I have been clear enough.
Thanks again for your help
From Geraldine_VdAuwera on 2017-01-17
Removing reads that cause validation errors is what SANITIZE=TRUE is for. If you don’t want to remove them, you need to disable that option, but then some other tools may refuse to run on that data. Our supported pipelines all discard unpaired reads as a quality assurance measure.
When you use -L in the preprocessing steps, do you use it also with IndelRealigner and PrintReads?
From abor on 2017-01-17
Edited post: no, -L was used in RealignerTargetCreator, BaseRecalibrator and PrintReads, but not in IndelRealigner
From abor on 2017-01-17
The question is why reads that were previously successfully mapped (they are mapped in the BAM file I use as input for RevertSam) are detected as unpaired reads and discarded
From shlee on 2017-01-17
Hi @abor,
Since you mention that you are using older BAM files, I assume you don’t have access to the original FASTQ files and the missing mates are forever gone. Is this true?
It’s my understanding that aligners such as BWA align each read independently, even ends of mated reads. That GATK tools have built in quality control features that are different from that of aligners is to a user’s benefit. At the least, you become aware of the quality of the data you are working with and can come to a decision on how to proceed with the discrepant data.
In answer to your question,
> Do you think there may be a solution for this step to work properly also in the presence of reads with missing pairs?
my answer is yes, there is a solution. My understanding is that you wish to (i) process data through MarkIlluminaAdapters as well as (ii) keep the mate not found reads. Roughly, I think you can separate your paired reads and mate not found reads and process them as PE and SE reads separately, respectively. For example, because processing through MarkIlluminaAdapters only makes sense for PE reads (see [Tutorial#6483](http://gatkforums.broadinstitute.org/wdl/discussion/6483#step2) for why), you would process only the PE BAM through this step ~~and skip this step for the SE BAM.~~ `Sorry, it appears MarkIlluminaAdapters has a parameter to process SE reads (MIN_MATCH_BASES_SE).`
To merge the reads together for joint processing for downstream steps, well, here’s where you’d have to do some exploration of tools. I think you would use either [MergeSamFiles](https://broadinstitute.github.io/picard/command-line-overview.html#MergeSamFiles) or [MergeBamAlignment](https://broadinstitute.github.io/picard/command-line-overview.html#MergeBamAlignment). [Tutorial#6483](http://gatkforums.broadinstitute.org/wdl/discussion/6483/how-to-map-and-clean-up-short-read-sequence-data-efficiently) gives details on MergeBamAlignment. It will take as input multiple `ALIGNED_BAM` files, so you can provide it both the aligned PE BAM and aligned SE BAM.
I hope this is helpful. Let us know what ends up working for you and if you have additional questions.
From dnousome on 2017-01-17
Hi all,
I currently have some RNA-Seq TopHat-aligned files (no original FASTQ), both the aligned BAM and the unmapped reads files, and want to generate a FASTQ to align with another aligner (STAR or Salmon).
Do you think I should try to merge the Aligned and unmapped files and then perform the conversion?
I have a feeling that might not be correct because it would insert potential reads with no mate. Otherwise I’m a little stuck as to what to do with the unmapped file.
Thanks for any help!
Darryl
From Geraldine_VdAuwera on 2017-01-17
@abor When you use -L with PrintReads, you’re telling GATK to throw away any reads outside of those intervals. Then any reads inside the intervals that had mates outside become mate-not-founds.
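To gauge how many reads end up mate-not-found after interval-restricted processing, a rough check is to count primary paired records per read name. The sketch below is hypothetical (`classify_mates` is not a GATK or Picard tool); it holds all read names in memory and treats a name with exactly two primary paired records as a complete pair.

```bash
# Hypothetical helper: label each read name in SAM text (stdin) as "paired"
# (two primary records), "orphan" (one) or "other" (more than two). Only
# records flagged as paired (0x1) are counted; secondary (0x100) and
# supplementary (0x800) records are skipped. Keeps all names in memory.
classify_mates() {
  awk -F '\t' '
    /^@/ { next }
    $2 % 2 == 0 { next }
    int($2 / 256) % 2 == 1 || int($2 / 2048) % 2 == 1 { next }
    {
      if (!($1 in n)) name[++k] = $1
      n[$1]++
    }
    END {
      for (i = 1; i <= k; i++) {
        c = n[name[i]]
        print name[i] "\t" (c == 2 ? "paired" : (c == 1 ? "orphan" : "other"))
      }
    }
  '
}
```

For example, with a placeholder file name: `samtools view input.bam | classify_mates | awk '$2 == "orphan"' | wc -l`.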
From shlee on 2017-01-19
Hi Darryl ( @dnousome ),
Whichever file contains the complete set of read pairs, you can use this particular file. STAR takes FASTQ files and I assume Salmon also takes FASTQ files. If the file you start with is the aligned BAM, you can revert using RevertSam, convert to FASTQ and then align with your aligner of choice. If it is the uBAM, then you can proceed with converting to FASTQ and then align. If you wish to pipe conversion to FASTQ and the alignment step, which allows you to avoid storing a FASTQ file, then check out the workflow detailed in [Tutorial#6483](http://gatkforums.broadinstitute.org/gatk/discussion/6483) for BWA alignment. I believe the concepts should still apply to your case. The tutorial then explains how and why you would use MergeBamAlignment.
I hope this is helpful.
From dnousome on 2017-01-19
@shlee
Most definitely! Thanks for your help.
From shlee on 2017-01-20
@abor, I’ve modified my last answer to you above.
From shlee on 2017-01-20
An issue described by this forum question has been fixed by a Picard repo code change documented here as a pull request and here as the related github issue. The effect of this code change is that when encountering mate-missing reads in a PE data file, RevertSam will now remove all mate information and thereby effectively turn mate-missing records into SE reads.
From inesentis on 2017-05-19
Dear all,
I have downloaded BAM files deposited in EGA from a study conducted some years ago. I do not have information on the “state” of the BAM files. My idea is to process them again and then do the somatic variant calling with Mutect2. I have checked the BAM files a bit and they are mapped and sorted by coordinates. Just to make sure, do I have to follow part B of this tutorial? If I directly convert the BAM files (without applying RevertSam) into FASTQ files, do I have to unmap the BAM file first and then convert it to FASTQ?
Thanks!
From shlee on 2017-05-19
Hi @inesentis,
There are many factors for you to consider, including:
1. How were the samples in the Panel of Normals (PoN) you will use preprocessed?
2. Do your BAM files contain multiple read groups? Remember that RevertSam can separate these out with the OUTPUT_BY_READGROUP option and that BQSR is performed at the lane level (different lanes must have different RG IDs). Alternatively, you could potentially queryname-sort your reads to identify the blocks that correspond to different read groups, if your BAMs follow the convention that read names correspond to the RG ID or RG PU fields.
3. How are the different readgroups organized? Is there a PU field that also distinguishes them?
4. Do your BAM files contain QC-failed reads (NON_PF, marked by the 0x200 SAM flag)? You may consider removing these to ensure a conservative analysis.
5. Do your BAM files contain SE reads or mate-missing reads from a particular read group or mixed in with PE reads?
6. Are your tumor and normal pair samples named to differentiate them in the @RG SM field? MuTect2 requires these be different.
A first step towards answering these questions is to run your BAMs through ValidateSamFile as well as samtools flagstat.
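As a rough illustration of separating lane-level data by read name, the sketch below counts records per flowcell:lane prefix. It assumes the read names are the original Illumina names, either the five-field form used in this tutorial (`H0164ALXX140820:2:1101:10003:23460`, run barcode then lane) or the seven-field `instrument:run:flowcell:lane:tile:x:y` form; renamed reads will not group meaningfully. `count_by_flowcell_lane` is a hypothetical name.

```bash
# Hypothetical helper: count SAM records (stdin) per flowcell:lane prefix of
# the read name. Seven-field names use fields 3 and 4; shorter names, like
# this tutorial's, use fields 1 and 2. Header lines are skipped.
count_by_flowcell_lane() {
  awk -F '\t' '
    /^@/ { next }
    {
      nf = split($1, f, ":")
      key = (nf >= 7) ? (f[3] ":" f[4]) : (f[1] ":" f[2])
      count[key]++
    }
    END { for (k in count) print k "\t" count[k] }
  ' | LC_ALL=C sort
}
```

More than one key in the output suggests the BAM mixes lanes that would each merit their own RG ID for BQSR.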
If you will use MuTect2, if possible please be sure to use the GATK4 MuTect2, instead of the GATK3 MuTect2. Our developers say that GATK4’s MuTect2 (in beta) is now ready for testing. It has improvements to GATK3’s MuTect2. You can find the jar at . Remember that beta status means it is only for testing and not necessarily ready for production purposes.
From inesentis on 2017-05-22
1. I was not thinking of using a PoN since the data that I am downloading are tumor/normal pairs. Is it necessary? How do I build a PoN for a study based on tumor/normal sample pairs?
2. The problem with these BAM files is that there are no different RG IDs. In other words, all the files (even from different patients and irrespective of tumor/normal) have the same RG ID. To me it seems that they have sequenced all in the same lane, or maybe at some point they aggregated or collapsed all the readgroups in the same BAM file and lost the unique IDs. I am not sure how to proceed in this case.
3. No RG PU tag found.
4. The QC-failed reads must have been removed, since I do not have any when running
```
>samtools flagstat anybam.bam
```
5. I do have some singletons. Is there a threshold from which one can not worry or be worried about?
6. Yes! They do have different sample names
From shlee on 2017-05-22
Hi @inesentis,
I highly recommend the use of a PoN for somatic variant calling. You can see how to make one in section 1 of the MuTect2 hands-on tutorial listed [here](https://software.broadinstitute.org/gatk/blog?id=9044). The tutorial’s PoN, I made it myself using publicly available 1000 Genomes Project data. I used forty WES datasets to match my tumor-normal pair WES dataset. The [MuTect2 tool documentation](https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_cancer_m2_MuTect2.php) also tells you how to go about making a PoN.
If you follow along the hands-on tutorial, using the provided example data, you see the value of using a PoN, albeit a minimal PoN made from 40 samples, in sections 5 and 6. I have [WDL scripts](https://software.broadinstitute.org/wdl/) for creating this PoN in the cloud that I can share with you. You’ll have to make your own Docker image however as the Docker I use is my personal private Docker container and is unshareable.
Even if there are no RGIDs, it may be possible to differentiate lane-level data from the read names themselves as I allude to in my previous post. This depends on if the read names are the original or if they were changed.
> Alternatively, you could potentially queryname-sort your reads to identify the blocks that correspond to different read groups, if your BAMs follow the convention that read names correspond to the RG ID or RG PU fields.
In addition to the QC-failed reads that you say are already removed, you should see if reads from short inserts, where there is adaptor sequence read-through, still remain in the file. Depending on the extent of these, you may wish to remove them or proceed knowing there is a minimal amount of these. You can use Picard’s MarkIlluminaAdapters for such metrics.
Singletons can still inform your analyses. Just make sure the unmapped mate is present in the file as well. Otherwise, this may cause problems with some tools, e.g. MarkIlluminaAdapters.
From jfiksel on 2017-06-28
I was successfully able to do this step for Tutorial #6483. However, when using RevertSam with the same settings, except with one of my own BAM files, I encountered the following error:
“CIGAR M operator maps off end of reference”
Is there a processing step I have to do to the BAM file before I use RevertSam to get a uBAM?
From shlee on 2017-06-29
Hi @jfiksel,
That doesn’t sound right. Sanitizing and reverting alignments to uBAM should not care about alignment information. We would want to keep these reads, since presumably we are reverting for fresh (and hopefully better) alignment results.
Would you mind submitting a bug report with a snippet of your data that allows us to recapitulate the error you are seeing? This will really help us. Instructions are in Article#1894: .
From jfiksel on 2017-06-29
Hi @shlee,
Thanks for the response. I believe that I have uploaded the necessary files to the ftp and they should be located in `jfiksel_revertsam_bug.zip`. If they’re not there, it would be great if you could link to how to upload files to the ftp server, since I have never used ftp file transfer before.
Jacob
From shlee on 2017-06-30
Hi @jfiksel,
I can recapitulate the error with your test data. I’ve requested that RevertSam, when REMOVE_ALIGNMENT_INFORMATION=true, not get hung up on odd alignments due to read filters.
The cause of the error was that the CIGAR string did not correctly represent the beyond-end-of-reference alignment. For the particular read in the error message, instead of a 100M, the CIGAR should be 74M26S.
To get you going to where you need to go, you can run CleanSam on your data before running RevertSam. CleanSam will soft-clip beyond-end-of-reference alignments and correct the CIGAR string.
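To see why a 100M alignment near the end of a contig needs soft-clipping, sum the CIGAR operations that consume reference bases (M, D, N, = and X, per the SAM specification) and compare the alignment end with the contig length. A minimal sketch; the function names and the example coordinates below are hypothetical, and CleanSam remains the tool that actually fixes the records.

```bash
# Hypothetical helper: reference span of a CIGAR string, i.e. the sum of the
# lengths of operations that consume reference bases (M, D, N, =, X).
cigar_ref_span() {
  printf '%s\n' "$1" | awk '{
    span = 0; s = $0
    while (match(s, /^[0-9]+[MIDNSHP=X]/)) {
      len = substr(s, 1, RLENGTH - 1) + 0
      op = substr(s, RLENGTH, 1)
      if (op ~ /[MDN=X]/) span += len
      s = substr(s, RLENGTH + 1)
    }
    print span
  }'
}

# Hypothetical helper: bases aligned past the contig end.
# $1 = 1-based POS, $2 = CIGAR, $3 = contig length.
bases_past_contig_end() {
  end=$(( $1 + $(cigar_ref_span "$2") - 1 ))
  if [ "$end" -gt "$3" ]; then echo $(( end - $3 )); else echo 0; fi
}
```

With an illustrative contig length of 1000 and POS 927, `bases_past_contig_end 927 100M 1000` prints 26, while the corrected `74M26S` prints 0, mirroring the 100M-to-74M26S fix described above.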
Thanks again for the test data.
From jfiksel on 2017-06-30
Thanks for the quick response @shlee ! I was able to successfully complete these steps after first running CleanSam and then running AddOrReplaceReadGroups. I noticed that if I did this without adding in read groups (but after running CleanSam), I got a null pointer exception error. It would be great if you could keep me updated with the status of this issue, since running CleanSam can add a significant amount of processing time for some of the larger WGS files I’m working with.
From shlee on 2017-06-30
@jfiksel, you can track the status of this bug fix at https://github.com/broadinstitute/picard/issues/849. You can also comment in these issue tickets as Picard is an open-source repo. Depending on who picks up this work, someone here at the Broad or external, they may ask that we share your test data. Would you be okay with that?
From jfiksel on 2017-07-03
Thanks @shlee . And yes, that is no problem.
From dannykwells on 2017-07-21
Hi folks,
We’re running into a problem with Revert Sam.
When we run:
“java -Xmx8G -jar ${PICARD_PATH}/picard.jar RevertSam I=${SAMPLE_NAME}.bam O=${TMP_DIR}/${SAMPLE_NAME}_rev.bam ATTRIBUTE_TO_CLEAR=XT ATTRIBUTE_TO_CLEAR=XN ATTRIBUTE_TO_CLEAR=AS ATTRIBUTE_TO_CLEAR=OC \ ATTRIBUTE_TO_CLEAR=OP ATTRIBUTE_TO_CLEAR=X0 ATTRIBUTE_TO_CLEAR=AM ATTRIBUTE_TO_CLEAR=SM”
we are getting the following on 2/22 files:
Running RevertSam
[Wed Jul 19 00:41:20 UTC 2017] picard.sam.RevertSam INPUT=R7495_N.bam OUTPUT=/TMP_R7495_N/R7495_N_rev.bam OUTPUT_BY_READGROUP=false OUTPUT_BY_READGROUP_FILE_FORMAT=dynamic SORT_ORDER=queryname RESTORE_ORIGINAL_QUALITIES=true REMOVE_DUPLICATE_INFORMATION=true REMOVE_ALIGNMENT_INFORMATION=true ATTRIBUTE_TO_CLEAR=[NM, UQ, PG, MD, MQ, SA, MC, AS] SANITIZE=false MAX_DISCARD_FRACTION=0.01 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Wed Jul 19 00:41:20 UTC 2017] Executing as root@135b4c4f0e37 on Linux 3.16.0-4-amd64 amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11; Picard version: 2.9.2-SNAPSHOT
…
Exception in thread “main” picard.PicardException: Two reads with same name but not correctly marked as 1st/2nd of pair: C4JL4ACXX140706:6:1101:10015:69192
at picard.illumina.MarkIlluminaAdapters.doWork(MarkIlluminaAdapters.java:224)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
Any thoughts about what is going on, or how we can get around it? We aren’t using SANITIZE=true because of its potentially destructive nature, but I would be fine throwing single reads out (or weird pairs, like this) if we could avoid this issue.
All help will be great!
From shlee on 2017-07-21
Hi @dannykwells,
Any reason to suspect your BAM contains a mix of PE and SE reads? If so, try setting `OUTPUT_BY_READGROUP` to true. Or is this RNA data that have been split by Ns?
The error:
> Exception in thread “main” picard.PicardException: Two reads with same name but not correctly marked as 1st/2nd of pair: C4JL4ACXX140706:6:1101:10015:69192
> at picard.illumina.MarkIlluminaAdapters.doWork(MarkIlluminaAdapters.java:224)
occurs when the tool encounters reads with the same name, as would typically occur for mates or for secondary/supplementary alignments. The tool checks for the first-of-pair and second-of-pair flags, and if the two reads lack these, it throws the exception you see.
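To look for records like the one in the error before rerunning, here is a minimal sketch of the check described above: it groups primary records by read name and flags any name whose two records are not exactly one first-of-pair (0x40) plus one second-of-pair (0x80). This is an approximation written for illustration, not Picard's code, and the function name is hypothetical. It reads SAM text lines such as the output of `samtools view input.bam`.

```python
from collections import defaultdict

# SAM FLAG bits (SAM specification)
FLAG_FIRST = 0x40           # first segment in the template
FLAG_SECOND = 0x80          # last segment in the template
FLAG_SECONDARY = 0x100      # secondary alignment
FLAG_SUPPLEMENTARY = 0x800  # supplementary alignment
MATE_BITS = FLAG_FIRST | FLAG_SECOND


def find_badly_flagged_pairs(sam_lines):
    """Return read names whose primary records are not one first-of-pair plus one second-of-pair."""
    flags_by_name = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # same-name secondary/supplementary records are expected
        flags_by_name[qname].append(flag)

    bad = []
    for qname, flags in flags_by_name.items():
        if len(flags) < 2:
            continue  # a lone primary record has no same-name partner to compare
        # A well-formed pair has exactly one record with 0x40 only and one with 0x80 only.
        if sorted(f & MATE_BITS for f in flags) != [FLAG_FIRST, FLAG_SECOND]:
            bad.append(qname)
    return bad
```

For example, `find_badly_flagged_pairs(sys.stdin)` on piped `samtools view` output lists the offending names. Note that this holds every read name in memory, so on a large WGS BAM it is more practical to first pull out the reads named in the error.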
From dannykwells on 2017-07-21
Hi,
These samples are DNA, and all were sequenced together (at the Broad, actually). 22 samples were done together, and 5 are throwing this error.
Any other thoughts on how to get around this?
From shlee on 2017-07-21
@dannykwells, can you post the records? That is, can we take a look at, e.g. `C4JL4ACXX140706:6:1101:10015:69192`?
From huangkang on 2017-09-29
Hi @shlee ,
I got an error
> Exception in thread “main” picard.PicardException: In paired mode, read name 1 (SRR4340020.1.1) does not match read name 2 (SRR4340020.1.2)
> at picard.sam.FastqToSam.getBaseName(FastqToSam.java:445)
> at picard.sam.FastqToSam.doPaired(FastqToSam.java:337)
> at picard.sam.FastqToSam.makeItSo(FastqToSam.java:308)
> at picard.sam.FastqToSam.doWork(FastqToSam.java:281)
> at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
> at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:96)
> at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:106)
I suspect the reason is that when I split the SRA file into two separate FASTQ files, the reads look like this:
```
SRR4340020.1.1 HISEQ:470:HHMLKBCXX:1:1101:1538:1978 length=100
CAGGCAGCAAGCAGTGGTATCAACGCAGAGTACCAGGCAGCAAGCAGTGGTATCAACGCAGAGTACCAGGCAGCAAGCAGTGGTATCAACGCAGAGTACC
```
```
SRR4340020.1.2 HISEQ:470:HHMLKBCXX:1:1101:1538:1978 length=100
GTACTCTGCGTTGATACCACTGCTTGCTGCCTGGTACTCTGCGTTGATACCACTGCTTGCTGCCTGGTACTCTGCGTTGATACCACTGCTTGCTGCCTGG
```
They don’t have the `/1` and `/2` read name suffixes, and the two mates have different header names, `@SRR4340020.1.1` and `@SRR4340020.1.2`.
Have you ever met the same error? Thanks!
From Sheila on 2017-10-02
@huangkang
Hi,
Soo Hee is just getting back from vacation and getting caught up. She will respond asap :smiley:
-Sheila
From shlee on 2017-10-03
Hi @huangkang,
Yes, I have seen a similar error before, from another user. The short answer to your issue is that the tool expects `/1` and `/2` demarcations, if any, at the end of paired read names.
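The name comparison behind this error can be approximated as follows. This sketch assumes that only the first whitespace-delimited token of each header counts as the read name (consistent with the error quoting `SRR4340020.1.1` rather than the full header) and that a trailing `/1` or `/2` is removed before the two names are compared. It is an illustration, not Picard's implementation, and the function names are hypothetical.

```python
def base_name(header: str) -> str:
    """Read name without the leading '@', any trailing comment, or a /1 or /2 suffix."""
    name = header[1:] if header.startswith("@") else header
    name = name.split()[0]           # keep only the first whitespace-delimited token
    if name.endswith(("/1", "/2")):
        name = name[:-2]             # drop the mate suffix
    return name


def names_match(header1: str, header2: str) -> bool:
    """True when the two mate headers reduce to the same base name."""
    return base_name(header1) == base_name(header2)
```

Under this rule, `SRR4340020.1.1` and `SRR4340020.1.2` remain different after stripping, which matches the "does not match" error. `.../47369/1` and `.../47369/2` reduce to the same name.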
One thing is that the reads you list do not appear to be in the conventional FASTQ format. Here is an example pair from one of my files:
```
@H25T3CCXX150306:2:2122:7546:47369/1
AGCAATAGGAAAGTGCCTCCTGATGGTTTACAGTGTTCACCTGCTTCGGTAACTGCTAATTTTAAACCAGAACCTACAGTCCATATTCATTAAAGAAGAGCTAGCTTACCAACATCATTCAAATTCAGGAGATAAGATTGGCCAGAGAAAG
+
==????==<========<===<====<<><=====<<=<=====<<7==>=<====>==<<<>==<===<=<=>>=>>>=>>>?>==>>=?>>>=>>=>>>?>>>=?=>>>=??>??==????>>???>???@??>@?@@@?@@??<==@
@H25T3CCXX150306:2:2122:7546:47369/2
GGAGAGAGACATCTATGGGCCATCATGTCATTGCTTTCCACAGCTGCAGAACAAATGTGTTTCTCCAGGTGAAAAACTATAAACCTGCTTTCTCTGGCCAATCTTATCTCCTGAATTTGAATGATGTTGGTAAGCTAGCTCTTCTTTAATG
+
<;<=>><=<<==<=>=====<==<===<<==<===<<<==<========<=<===========>=>>>>>>=>>>>=>?>?>>=>>5>;=:=>>??8>:??=8?=@?>.;??;>@??<0?@>=?@<>
```
So be sure the format of your FASTQ is compatible with FastqToSam.
Second, I want to point you to a section I wrote to update the read group document at https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups. This section is titled Deriving ID and PU fields from read names. Notice that the example read names I give also indicate tile number, x-coordinate of cluster and y-coordinate of cluster. This information is important towards differentiating optical duplicates (which are omitted from estimating library complexity) and possibly for BQSR.
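As an illustration of the fields mentioned above, here is a sketch that takes the Illumina name out of an SRA-style header and splits it into instrument, run, flowcell, lane, tile, x and y. It also rebuilds a conventional `@<name>/<mate>` header. The following are assumptions and are not taken from Picard or SRA documentation: the second header token is a 7-field Illumina name, and the last dot-separated part of the accession token (`...1.1` / `...1.2`) is the mate number. `flowcell.lane` is only a common starting point for read group values, so check the read groups article for the exact ID/PU convention.

```python
from typing import NamedTuple


class IlluminaName(NamedTuple):
    instrument: str
    run: str
    flowcell: str
    lane: str
    tile: str
    x: str  # x-coordinate of the cluster
    y: str  # y-coordinate of the cluster


def parse_sra_header(header: str):
    """Return (IlluminaName, mate) for e.g. '@SRR4340020.1.1 HISEQ:470:HHMLKBCXX:1:1101:1538:1978 length=100'."""
    text = header[1:] if header.startswith("@") else header
    tokens = text.split()
    if len(tokens) < 2:
        raise ValueError(f"no Illumina read name in {header!r}")
    accession, illumina = tokens[0], tokens[1]
    fields = illumina.split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 colon-separated fields, got {illumina!r}")
    mate = accession.rsplit(".", 1)[-1]  # assumed: trailing .1 / .2 marks the mate
    if mate not in ("1", "2"):
        raise ValueError(f"cannot infer mate number from {accession!r}")
    return IlluminaName(*fields), mate


def conventional_header(header: str) -> str:
    """Rebuild the header as '@<illumina name>/<mate>'."""
    name, mate = parse_sra_header(header)
    return "@" + ":".join(name) + "/" + mate


def flowcell_lane(header: str) -> str:
    """flowcell.lane, a possible basis for read group ID/PU values."""
    name, _ = parse_sra_header(header)
    return f"{name.flowcell}.{name.lane}"
```

Keeping the tile and x/y coordinates in the rewritten name preserves the information used to distinguish optical duplicates.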
Given this, consider the following options.
1. Use `HISEQ:470:HHMLKBCXX:1:1101:1538:1978`, or portions thereof, as the read name.
2. Add `/1` and `/2` demarcations, as presumably indicated by the `SRR4340020.1.1` and `SRR4340020.1.2` names. It is important to distinguish which read of the pair was sequenced first and which second, as the second read tends to have lower base qualities and you want to track this information.
From Vzzarr on 2017-10-11
I am running FastqToSam, but it seems to be single-threaded. Is it possible to improve efficiency with parameters similar to `-nt` or `-nct`?
From Sheila on 2017-10-12
@Vzzarr
Hi,
I don’t think there is an easy way to multi-thread FastqToSam. Some Picard tools have an option NUM_PROCESSORS or THREAD_COUNT, but that tool does not.
-Sheila
From Vzzarr on 2017-10-14
@Sheila
I found a way to parallelize FastqToSam with Spark, but only with more input samples.
Thank you very much for your answer,
Nicholas
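Because FastqToSam itself runs single-threaded, the straightforward gain is to run several samples at once. The sketch below uses a plain thread pool that launches one `java` process per sample. This is a substitute for the Spark approach mentioned above, not a description of it. The sample names, file naming pattern and `picard.jar` path are hypothetical placeholders. Each process here requests `-Xmx8G`, so `max_parallel=4` can use up to about 32 GB of heap.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def fastqtosam_command(sample: str, picard_jar: str = "picard.jar") -> list:
    """Build a FastqToSam argument list for one sample (file names are hypothetical)."""
    return [
        "java", "-Xmx8G", "-jar", picard_jar, "FastqToSam",
        f"FASTQ={sample}_1.fastq.gz",
        f"FASTQ2={sample}_2.fastq.gz",
        f"OUTPUT={sample}_fastqtosam.bam",
        f"SAMPLE_NAME={sample}",
    ]


def run_all(samples, max_parallel: int = 4, runner=subprocess.run) -> dict:
    """Run one FastqToSam process per sample, at most max_parallel at a time.

    Returns {sample: exit code}. `runner` defaults to subprocess.run and can be
    replaced, e.g. for a dry run.
    """
    def run_one(sample):
        return sample, runner(fastqtosam_command(sample)).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return dict(pool.map(run_one, samples))
```

Threads are sufficient here because each worker only waits on an external `java` process.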
From mzabidi on 2017-10-16
`samtools view input.bam | cut -f 12- | tr '\t' '\n' | cut -d ':' -f 1 | awk '{ if(!x[$1]++) { print }}'`
could be simplified to
`samtools view input.bam | cut -f 12- | tr '\t' '\n' | cut -d ':' -f 1 | awk '!x[$1]++'`
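For readers less familiar with awk, `!x[$1]++` prints a line only the first time its key is seen. The same "first occurrence of each tag name" logic over SAM text lines (optional fields start at column 12) can be sketched in Python like this. The function name is illustrative.

```python
def unique_tag_names(sam_lines):
    """Optional-field tag names (columns 12 onward) in order of first appearance."""
    seen = set()
    ordered = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines carry no optional fields
        for field in line.rstrip("\n").split("\t")[11:]:
            tag = field.split(":", 1)[0]  # same as cut -d ':' -f 1
            if tag not in seen:           # same as awk's !x[$1]++
                seen.add(tag)
                ordered.append(tag)
    return ordered
```

Calling it with `sys.stdin` while piping in `samtools view input.bam` gives the same list as the one-liner.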
From Sheila on 2017-10-18
@mzabidi
Hi,
Thanks for the tip :smiley:
-Sheila
From bwray on 2017-10-26
Hi brilliant GATK team, I’ve run into an issue with converting some aligned bam files to unaligned, and I thought that maybe you could help me.
I’ve got some bam files that were aligned to a custom reference which includes hg19 in addition to sequences from cancer-related viruses as well as [ERCC spike-ins](http://tools.thermofisher.com/downloads/ERCC92.fa “ERCC spike-ins”) and some other stuff. I don’t have access to the raw reads, though I do have access to the reference that the reads were aligned to when making the bam files.
My bam files are from paired samples (one each control and tumor) and I’m ultimately interested in looking for somatic mutations via something like mutect2.
OK, so I searched my bam files for which alignment tags to target and composed the following command using picard.jar downloaded on 25-sep-2017:
java -Xmx8G -jar /path/to/picard.jar RevertSam \
    I=firstSample.bam \
    O=firstSample_revertsam.bam \
    SANITIZE=true \
    MAX_DISCARD_FRACTION=0.005 \
    ATTRIBUTE_TO_CLEAR=OC \
    ATTRIBUTE_TO_CLEAR=OP \
    ATTRIBUTE_TO_CLEAR=RG \
    ATTRIBUTE_TO_CLEAR=XS \
    SORT_ORDER=queryname \
    RESTORE_ORIGINAL_QUALITIES=true \
    REMOVE_DUPLICATE_INFORMATION=true \
    REMOVE_ALIGNMENT_INFORMATION=true \
    TMP_DIR=/path/to/tempDir
The process moves along well enough until it gets into the sequences that are aligning to the viral elements of the reference, at which point the output can no longer give a read position. Here is the error message in context (at this point it had already reverted the sequences aligned to autosomal chromosomes of GRCh37):
> read position: X:154,005,195
> INFO 2017-10-26 13:59:30 RevertSam Reverted 283,000,000 records. Elapsed time: 00:44:40s. Time for last 1,000,000: 9s. Last read position: GL000236.1:13,855
> INFO 2017-10-26 13:59:39 RevertSam Reverted 284,000,000 records. Elapsed time: 00:44:49s. Time for last 1,000,000: 8s. Last read position: hs37d5:1,465,475
> INFO 2017-10-26 13:59:49 RevertSam Reverted 285,000,000 records. Elapsed time: 00:44:58s. Time for last 1,000,000: 9s. Last read position: hs37d5:20,357,882
> INFO 2017-10-26 13:59:58 RevertSam Reverted 286,000,000 records. Elapsed time: 00:45:08s. Time for last 1,000,000: 9s. Last read position: /
> INFO 2017-10-26 14:00:04 RevertSam Reverted 287,000,000 records. Elapsed time: 00:45:13s. Time for last 1,000,000: 5s. Last read position: /
> INFO 2017-10-26 14:00:07 RevertSam Reverted 288,000,000 records. Elapsed time: 00:45:17s. Time for last 1,000,000: 3s. Last read position: /
> INFO 2017-10-26 14:00:11 RevertSam Reverted 289,000,000 records. Elapsed time: 00:45:20s. Time for last 1,000,000: 3s. Last read position: /
> INFO 2017-10-26 14:00:12 RevertSam Detected quality format for HCGCJBBXX_1_J02043: Standard
> INFO 2017-10-26 14:00:12 RevertSam Detected quality format for HCGCJBBXX_2_J02043: Standard
> INFO 2017-10-26 14:00:12 RevertSam Detected quality format for HCGCJBBXX_3_J02043: Standard
> INFO 2017-10-26 14:00:12 RevertSam Detected quality format for HCGCJBBXX_4_J02043: Standard
> [Thu Oct 26 14:01:09 CDT 2017] picard.sam.RevertSam done. Elapsed time: 46.33 minutes.
> Runtime.totalMemory()=3386376192
> To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
> Exception in thread "main" java.lang.NullPointerException
> at picard.sam.RevertSam.sanitize(RevertSam.java:400)
> at picard.sam.RevertSam.doWork(RevertSam.java:274)
> at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
> at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
> at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
I’d appreciate any direction you could give me, and thanks!
B
From bwray on 2017-10-27
Sorry, I noticed a small typo in the log leading up to the error. Where the read position is just a slash, it should instead be a slash bracketed by asterisks, i.e. `*/*`.
From Sheila on 2017-10-29
@bwray
Hi B,
What happens when you try validating your input BAM file with [ValidateSamFile](http://broadinstitute.github.io/picard/command-line-overview.html#ValidateSamFile)?
Thanks,
Sheila
From bwray on 2017-10-30
Good morning [Sheila](https://gatkforums.broadinstitute.org/firecloud/profile/Sheila),
Thanks for responding.
I’ve run ValidateSamFile on the first six samples so far, and all of the outputs report No errors found.
However, the “Last read position” for each sample is inconsistent.
Sample 1
Last read position: \*/\*
Sample 2
Last read position: X:111,833,845
Sample 3
Last read position: \*/\*
Sample 4
Last read position: X:148,817,477
Sample 5
Last read position: \*/\*
Sample 6
Last read position: GL000220.1:146,293
Here’s the template for the command I ran:
java -jar /path/to/picard.jar ValidateSamFile I=input.bam MODE=SUMMARY O=summary_input.bam
Thanks,
B
From Sheila on 2017-11-01
@bwray
Hi B,
Interesting. Can you confirm you are using the latest version of Picard? I need to check with the team on what may be happening here. We will get back to you soon.
-Sheila
From shlee on 2017-11-01
Hi @bwray,
Based on what you say:
> The process moves along well enough until it gets into the sequences that are aligning to the viral elements of the reference, at which point the output can no longer give a read position.
I see two options for you to move forward.
1. If you do not care about reads that align to the viral elements and later contigs in the reference, then you can subset your alignments to those contigs you care about using PrintReads + `-L` and revert just these.
2. If you do care about all the reads, then see if setting RevertSam’s `VALIDATION_STRINGENCY` to LENIENT or SILENT lets the process complete. Picard v2.10.6+ updated RevertSam to interpret the LENIENT/SILENT settings to ignore any alignment information that causes ValidateSam to error, which in turn in prior versions caused RevertSam to fail as well.
Solution 2 would only work if the validation step is causing your RevertSam step to error. Otherwise, can you post a few of the read records that align to contigs that error the reversion? Thanks.
From namratha on 2017-12-19
hello,
I followed the procedure to generate a uBAM file from a FASTQ file, but my new uBAM file is created with a size of 0. I am not sure why; I followed the code exactly, changing only the FASTQ files and the run date.
From shlee on 2017-12-20
Hi @namratha,
We’ll need more information from you to help diagnose what may be going on. For starters, please follow instruction #6 below:
1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (capture kit) or WGS (PCR-free or PCR), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running [ValidateSamFile](https://software.broadinstitute.org/gatk/guide/article?id=7571) for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.
From Kemool on 2018-04-08
Hello, I apologize in advance for the lack of knowledge/experience in this, I am just starting and am trying to figure things out. After generating a BAM file using the FastqToSam I do not get any error messages in terminal but do not know how to view the new BAM file. I tried opening it in IGV but it gives me the error “Error loading /Users/kellymoolick/Downloads/picard-2.18.2/tutorial_6484_FastqToSam/6484_snippet_fastqtosam.bam: An error occurred while accessing: /Users/kellymoolick/Downloads/picard-2.18.2/tutorial_6484_FastqToSam/6484_snippet_fastqtosam.bam Error loading BAM file: org.broad.igv.exceptions.DataLoadException: An error occurred while accessing: /Users/kellymoolick/Downloads/picard-2.18.2/tutorial_6484_FastqToSam/6484_snippet_fastqtosam.bam.bai Index file not found. Tried /Users/kellymoolick/Downloads/picard-2.18.2/tutorial_6484_FastqToSam/6484_snippet_fastqtosam.bai”
Thank you!!
From Sheila on 2018-04-09
@Kemool
Hi,
It looks like you don’t have index files for the BAM files. You can index them using samtools index. Have a look at [this article](https://gatkforums.broadinstitute.org/gatk/discussion/2909) for more help.
-Sheila
From FPBarthel on 2018-05-01
Hi, thanks for providing this great tool to the community. Do you have any information about the impact of RevertSam (esp. reversal of base recalibration scores) on downstream variant calling? One reason I ask is that I noticed that the GDC pipeline (https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/) does seem to use this step.
P.S. Is there a good reference that explains all the BAM/FASTQ attributes, e.g. XT, CO, AS, etc.? I have tried a lot of googling but have not found a satisfactory resource.
From shlee on 2018-05-01
Hi @FPBarthel,
When you perform RevertSam, we recommend you `RESTORE_ORIGINAL_QUALITIES` and go through BQSR afresh with the new alignment data. We do not recommend performing BQSR on data that has already undergone recalibration. If your data to be reverted did not save the original qualities, and if you are curious, then you can see how well the base qualities match expectations by undergoing BQSR with the new alignments. Again, do not use twice-recalibrated reads. As far as I know, we have not looked into the impact of twice-recalibrated alignments.
The GDC pipeline performs BQSR after indel realignment.

Check out the SAM specification (SAM v1) at https://samtools.github.io/hts-specs/. The tags are described in a supplementary document (SAM tags). The links to the specs are on the left side.
From FPBarthel on 2018-05-01
Thanks @shlee ! That clears it up.
I had found the SAM specifications already, but they do not list the tags I was looking for information on (FT, XN and XT). I tried searching Google for a reference on these “end-user specific tags” (specific to GATK and related tools) without much result, except that FT is related to BQSR?
From Samir on 2018-05-01
```
XT tag, e.g. XT:A:U = unique mapper; XT:A:R = more than 1 high-scoring matches
```
FN is probably number of read features in each record [3].
[1]: https://genome.sph.umich.edu/wiki/SAM
[2]: http://physiology.med.cornell.edu/faculty/elemento/lab/R/CSHL_2012/CSHL-basic-HTS-2012.pptx
[3]: https://samtools.github.io/hts-specs/CRAMv3.pdf
From Sheila on 2018-05-08
@FPBarthel
Hi,
FT tells you the reason why the record has been unaligned or marked as not-passing filters.
XN tells you the number of ambiguous bases in the reference.
XT tells you whether the read is uniquely mapped or not.
-Sheila
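Since these tags all use the same `TAG:TYPE:VALUE` layout from the SAM specification, a small parser can make them easier to inspect. The sketch below converts each value by its type code (`A`, `i`, `f`, `Z`, `H`, `B`). It only parses the tags. What a tag such as XT, XN or FT means depends on the tool that wrote it, as discussed above. The function names are illustrative.

```python
def parse_optional_field(field: str):
    """Split e.g. 'XT:A:U' into ('XT', 'A', 'U'), converting the value by its type code."""
    tag, type_code, value = field.split(":", 2)  # Z values may themselves contain ':'
    if type_code == "i":
        return tag, type_code, int(value)
    if type_code == "f":
        return tag, type_code, float(value)
    if type_code == "B":
        # Numeric array: element subtype first, then comma-separated values.
        subtype, *items = value.split(",")
        cast = float if subtype == "f" else int
        return tag, type_code, [cast(x) for x in items]
    # A (single character), Z (string) and H (hex string) stay as text.
    return tag, type_code, value


def parse_record_tags(sam_line: str) -> dict:
    """Map tag -> converted value for the optional fields (column 12 onward) of one SAM line."""
    fields = sam_line.rstrip("\n").split("\t")[11:]
    return {tag: value for tag, _, value in map(parse_optional_field, fields)}
```

For example, a BWA record carrying `XT:A:U` yields `{"XT": "U", ...}`, which by the meaning given above marks a unique mapper.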
From jejacobs23 on 2018-06-19
There was a previous tutorial, (howto) Revert a BAM file to FastQ format (https://gatkforums.broadinstitute.org/gatk/discussion/2908/howto-revert-a-bam-file-to-fastq-format), that indicated that the shuffle step was an important component. If the goal is to take a .bam file and revert it back to a uBAM, and ultimately a FASTQ file, so that the user can then analyze the reads in their own pipeline, is this current tutorial the appropriate tool to use? Is there a shuffle-like procedure built into RevertSam?
From shlee on 2018-06-19
Hi @jejacobs23,
To quote a passage from the above tutorial:

From Ojo on 2019-08-28
Hi,
I tried converting paired WGS fastq files (SRR7665095_1.fastq.gz and SRR7665095_2.fastq.gz from 1000 Genomes) to uBAM format but ran into a problem.
First read of SRR7665095_1.fastq.gz:
```
@SRR7665095.1 1/1
ATCCCTCCCCACTTCCCCCACCCCACAACAGTCCCTGGGGGGTTATTTTCTTAGGTTTGATTTCTGTTTTTATTGCCTCATGGTCCAAAAGTGTGGTAGGTATGATTTCACCTTTTTTTGAATTTTCTTAGTGTTGTGTTATTGCTGTTG
+
AAAF7FFJJJ-AFJJJJJJJJJJJAAJFFJAF-AAJFFJJJ-AAFJJJJJA7-AAJA--7F----
```
first read of SRR7665095_2.fastq.gz:
```
@SRR7665095.1 1/2
TGCAAATAGACGTAAATAATTACACAATAATAGTGGGAGACTTTAACAGCCCACTGACCATGTTAGATAGATCACTGAGACATAAAACTAATTAAGCATTTAAGACCTGAACTTGACACTTTACCAAATCGAACTAACAGACATCTACAA
+
AAAFFJFFJJJFFJJJAJJ-AJAJJJJJJJJFAJJJJ
```
I used the following command line used which generated an error message:
```java -Xmx8G -jar ~/bin/picard.jar FastqToSam FASTQ=SRR7665095_1.fastq.gz FASTQ2=SRR7665095_2.fastq.gz OUTPUT=SRR7665095_fastqtosam.bam SAMPLE_NAME=SRR7665095 PLATFORM=illumina RUN_DATE=2019-08-28T11:02:04+01:00```
Error message:
```
Exception in thread "main" htsjdk.samtools.SAMException: invalid code lengths set at line 8065173 in fastq SRR7665095_1.fastq.gz
    at htsjdk.samtools.fastq.FastqReader.readNextRecord(FastqReader.java:139)
    at htsjdk.samtools.fastq.FastqReader.next(FastqReader.java:152)
    at picard.sam.FastqToSam.doPaired(FastqToSam.java:404)
    at picard.sam.FastqToSam.makeItSo(FastqToSam.java:380)
    at picard.sam.FastqToSam.doWork(FastqToSam.java:353)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
Caused by: java.util.zip.ZipException: invalid code lengths set
    at java.util.zip.InflaterInputStream.read(java.base@9-internal/InflaterInputStream.java:164)
    at java.util.zip.GZIPInputStream.read(java.base@9-internal/GZIPInputStream.java:117)
    at sun.nio.cs.StreamDecoder.readBytes(java.base@9-internal/StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(java.base@9-internal/StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(java.base@9-internal/StreamDecoder.java:178)
    at java.io.InputStreamReader.read(java.base@9-internal/InputStreamReader.java:185)
    at java.io.BufferedReader.fill(java.base@9-internal/BufferedReader.java:161)
    at java.io.BufferedReader.readLine(java.base@9-internal/BufferedReader.java:325)
    at java.io.BufferedReader.readLine(java.base@9-internal/BufferedReader.java:390)
    at htsjdk.samtools.fastq.FastqReader.readLineConditionallySkippingBlanks(FastqReader.java:207)
    at htsjdk.samtools.fastq.FastqReader.readNextRecord(FastqReader.java:114)
    ... 7 more
```
Java version: openjdk version "9-internal"
Picard Version: 2.20.6-SNAPSHOT
Here’s the read in SRR7665095_1.fastq.gz file starting at the troublesome line 8065173 (@SRR7665095.2016294 2016294/1):
```
@SRR7665095.2016294 2016294/1
TGAGTCCCCAAGATTTATTTTCCCTTCGTAAGTGTTCCTATGAGTATTAATTATTCATTGTGTCTTTTATTACACAAATAAGGCACAGATTTTTAAGAAATCATCAACTTCATGGCTACCTATATAGACATAATTACACAGAAGCTCAAC
+
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
```
I’m not sure what is wrong with line 8065173 in the file. Can someone help please?
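"invalid code lengths set" is an inflate (zlib) error. It usually means the gzip stream itself is damaged, for example by an incomplete or corrupted download, rather than that the FASTQ text is malformed. The record printed from that line may look fine because a different tool decompressed it, or because the damage lies in the compressed bytes around it. Below is a minimal sketch that decompresses a whole `.fastq.gz` file and reports whether it reads cleanly. The function name is illustrative.

```python
import gzip
import zlib


def check_gzip(path: str):
    """Decompress the whole file; return (ok, lines_read, error_message)."""
    lines = 0
    try:
        with gzip.open(path, "rb") as handle:
            for _ in handle:  # stream line by line so memory use stays small
                lines += 1
    except (OSError, EOFError, zlib.error) as exc:
        # zlib.error: corrupt deflate data (e.g. "invalid code lengths set")
        # EOFError: truncated file; OSError: bad gzip header (gzip.BadGzipFile)
        return False, lines, str(exc)
    return True, lines, ""
```

If the check fails, re-downloading the file and comparing checksums against the source is the usual remedy. `gzip -t SRR7665095_1.fastq.gz` performs a similar integrity test from the shell.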
can buy ielts certificate in brazil
ielts backdoor in Brazil
get ielts certificate in Brazil
buy original/real/registered ielts in brazil
get your desired band 7 in brazil
buy original ielts certificate in China
buy ielts certificate in china
Take an IELTS test in England
Obtain Ielts Band 7 in China
Obtain Ielts Band 7 in spain
get ielts certificate in spain
buy ielts certificate in spain
apply for ielts in spain
acquire registered ielts certificate in spain
get ielts online without exam
get ielts certificate without test
ielts backdoor in spain
buy/get/obtain ielts asap
get ielts certificate online without exam
Apply for a registered ielts online
ielts certificates registration online
new ielts certificate proxy
Request/register/apply for real ielts certificate
Can I have another copy of my IELTS Test Report Form
get real Ielts Certificate Without exam in brazil
get real Ielts Certificate Without test in iran
how to pay for ielts exam in uk
how ro pay for ielts exam in Hawally, Kuwait
get real Ielts Certificate Without test
buy ielts certificte without exam
get real Ielts Certificate Without test in dubai
get real Ielts Certificate Without test in suadi arabia
buy ielts certificates in Autralia
cambridge ielts book 12 academic free download
get real Ielts Certificate Without test in united kingdom
get real Ielts Certificate Without test in malaysia
get real Ielts Certificate Without test in canada
get real Ielts Certificate Without test in usa
get real Ielts Certificate Without test in pakistan
get real Ielts Certificate Without test in oman
get real Ielts Certificate Without test online
get real Ielts Certificate Without test in multan
buy ielts certificates in multan
buy ielts certificates in UK
buy ielts certificate in punjab
buy original ielts certificate
get ielts certificate without exam in pakistan
buy ielts certificate uk
ielts score without exam
buy ielts certificate uk
ielts certificate for sale in india
buy valid ielts certificate
ielts certificates in in china
ielts my status page
buy ielts certificates in india
buy ielts certificates in united kingdom
ielts certificates in united kingdom
buy ielts certificates in pakistan
buy ielts certificates in karachi
buy ielts certificates in Islamabad
buy ielts certificates in Bahawalpur
IELTS Registration How do I register
buy ielts certificates in china
how can i get a registered ielts certificate
Buy Ielts Certificate Without Exam in uae
How do I register ielts
get ielts without exam n pakistan
how can i get ielts certificate without exam
Buy Ielts Certificate Without Exam in dubai
Buy Ielts Certificate Without Exam in kuwait
Buy Ielts Certificate Without Exam in india
Buy Ielts Certificate Without Exam in saudi arabia
Buy Ielts Certificate Without Exam uae
get real Ielts Certificate Without test online
how do i obtain a registered ielts
ielts results 2018 how do i check ?
buy ielts certificates in united kingdom
Obtain/Get IELTS certificates in Bahawalpur
Obtain/Order Real IELTS certificates without Exams
how can i see my ielts result online
Get IELTS certificate Band 7 Bahawalpur
Get IELTS certificate Band 7 in China
ielts band 7 for immigration in canada
Obtain/Order Real IELTS certificates without Exams IN UK
register IELTS certificate online
Get high band score in ielts – 100% verified and trusted?
We can help you to get IELTS Certificate without attending the exams ,our certificates are British Council certified.
You can use our certificates for University admission and any immigration issues. The regions we cover are UAE, Qatar, Oman, Saudi Arabia, Kuwait , Jordan , Australia ,Asia, Canada , Europe , Africa and US . Because the business is confidential , very little information is provided to the public and details of our certificates are only provided to paying clients. Our organisation is well connected with various invigilators,British council data base managers and test centers which enables us to register your scores in any ielts center around the world . All our certificates are original and British Council certified. We do not make fake certificates!!!!
WhatApp: +1 (925) 391-833
or
Email: (immigration.certificate01@yahoo.com)
Websitelink:immigrationcertificate.wordpress.com