Created by shlee on 2015-11-30
PrintReads merges or subsets sequence data. The tool automatically applies MalformedReadFilter and BadCigarFilter to filter out certain types of reads that cause problems for downstream GATK tools, e.g. reads with mismatching numbers of bases and base qualities or reads with CIGAR strings containing the N operator.
To retain reads with the N operator in their CIGAR strings, add -U ALLOW_N_CIGAR_READS.

Subsetting reads corresponding to a genomic interval using PrintReads requires reads that are aligned to a reference genome, coordinate-sorted and indexed. Place the .bai index in the same directory as the .bam file.
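The index requirement can be checked before launching the tool. The following is a minimal sketch, not part of the original tutorial: `has_bam_index` is a hypothetical helper name, and it assumes the index follows one of the two common naming conventions (`sample.bai` or `sample.bam.bai`).

```shell
# Hypothetical helper: succeed if an index file sits next to the given BAM.
# Assumes one of two common index names: sample.bai or sample.bam.bai.
has_bam_index() {
    bam=$1
    [ -f "${bam%.bam}.bai" ] || [ -f "${bam}.bai" ]
}

# Example: warn before running PrintReads with -L on an unindexed file.
if ! has_bam_index 6517_2Mbp_input.bam; then
    echo "No .bai index found next to 6517_2Mbp_input.bam" >&2
fi
```

The check only confirms that an index file exists. It does not verify that the BAM is coordinate-sorted or that the index is up to date.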
# -R: reference fasta
# -L: desired genomic interval, chr:start-end
# -I: input
java -Xmx8G -jar /path/GenomeAnalysisTK.jar \
    -T PrintReads \
    -R /path/human_g1k_v37_decoy.fasta \
    -L 10:91000000-92000000 \
    -I 6517_2Mbp_input.bam \
    -o 6517_1Mbp_output.bam
This creates a subset of reads from the input file, 6517_2Mbp_input.bam, that align to the interval defined by the -L option, here a 1 Mbp region on chromosome 10. The tool creates two new files, 6517_1Mbp_output.bam and the corresponding index 6517_1Mbp_output.bai.
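To confirm the size of the region passed to -L, the interval string can be parsed in the shell. This is an illustrative sketch only: `interval_length` is a hypothetical helper, and it assumes a simple `chr:start-end` string with no colon in the contig name. GATK intervals are 1-based and inclusive, so the length is end minus start plus one.

```shell
# Hypothetical helper: print the number of bases in a chr:start-end interval.
# Intervals are 1-based and inclusive, so length = end - start + 1.
interval_length() {
    range=${1#*:}      # drop the contig name, leaving "start-end"
    start=${range%-*}  # text before the dash
    end=${range#*-}    # text after the dash
    echo $(( end - start + 1 ))
}

interval_length 10:91000000-92000000   # prints 1000001
```

Because both endpoints are included, the interval spans 1,000,001 bases, which the text rounds to 1 Mbp.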
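Interval subsetting keeps only the reads that overlap the interval, so a paired read whose mate maps outside it loses its mate. If you later revert such a subset to unaligned BAM with Picard's RevertSam, use the SANITIZE option.

To process large files, also designate a temporary directory.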
TMP_DIR=/path/shlee #sets environment variable for temporary directory
Updated on 2015-12-02