Post date: May 10, 2017 2:50:1 AM
3v17. All of the alignment (bam) files are in /uufs/chpc.utah.edu/common/home/u6000989/data/lycaeides/lycaeides_gbs/AssembliesGwa/. Some of these are new alignments and some are symbolic links. You have 1895 alignments (individuals) total (they are all listed in the bams file in that directory). We are using samtools (version 1.2, using htslib 1.2.1) and bcftools (version 1.3 using htslib 1.3) for variant calling.
The submission script is callvar.sh. Here are the exact commands from the script (it is a SLURM submission script, like PBS):
#!/bin/sh
#SBATCH --time=168:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=20
#SBATCH --account=gompert-kp
#SBATCH --partition=gompert-kp
#SBATCH --job-name=samtools
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
module load samtools
cd /uufs/chpc.utah.edu/common/home/u6000989/data/lycaeides/lycaeides_gbs/AssembliesGwa/
samtools mpileup -b bams -C 50 -d 250 -f /uufs/chpc.utah.edu/common/home/u6000989/data/lycaeides/melissa_genome/final.assembly.fasta -q 30 -Q 20 -I -u -g -t DP,DPR -o varsLycGwa.bcf
bcftools call -v -m -p 0.01 -P 0.001 -O v -o varsLycGwa.vcf varsLycGwa.bcf
And here are descriptions (and defaults) for all of the command line options in case you want them.
###################################
Usage: samtools mpileup [options] in1.bam [in2.bam [...]]
Input options:
-6, --illumina1.3+ quality is in the Illumina-1.3+ encoding
-A, --count-orphans do not discard anomalous read pairs
-b, --bam-list FILE list of input BAM filenames, one per line
-B, --no-BAQ disable BAQ (per-Base Alignment Quality)
-C, --adjust-MQ INT adjust mapping quality; recommended:50, disable:0 [0]
-d, --max-depth INT max per-BAM depth; avoids excessive memory usage [250]
-E, --redo-BAQ recalculate BAQ on the fly, ignore existing BQs
-f, --fasta-ref FILE faidx indexed reference sequence file
-G, --exclude-RG FILE exclude read groups listed in FILE
-l, --positions FILE skip unlisted positions (chr pos) or regions (BED)
-q, --min-MQ INT skip alignments with mapQ smaller than INT [0]
-Q, --min-BQ INT skip bases with baseQ/BAQ smaller than INT [13]
-r, --region REG region in which pileup is generated
-R, --ignore-RG ignore RG tags (one BAM = one sample)
--rf, --incl-flags STR|INT required flags: skip reads with mask bits unset []
--ff, --excl-flags STR|INT filter flags: skip reads with mask bits set
[UNMAP,SECONDARY,QCFAIL,DUP]
-x, --ignore-overlaps disable read-pair overlap detection
Output options:
-o, --output FILE write output to FILE [standard output]
-g, --BCF generate genotype likelihoods in BCF format
-v, --VCF generate genotype likelihoods in VCF format
Output options for mpileup format (without -g/-v):
-O, --output-BP output base positions on reads
-s, --output-MQ output mapping quality
Output options for genotype likelihoods (when -g/-v is used):
-t, --output-tags LIST optional tags to output: DP,DPR,DV,DP4,INFO/DPR,SP []
-u, --uncompressed generate uncompressed VCF/BCF output
SNP/INDEL genotype likelihoods options (effective with -g/-v):
-e, --ext-prob INT Phred-scaled gap extension seq error probability [20]
-F, --gap-frac FLOAT minimum fraction of gapped reads [0.002]
-h, --tandem-qual INT coefficient for homopolymer errors [100]
-I, --skip-indels do not perform indelcalling
-L, --max-idepth INT maximum per-sample depth for INDEL calling [250]
-m, --min-ireads INT minimum number gapped reads for indel candidates [1]
-o, --open-prob INT Phred-scaled gap open seq error probability [40]
-p, --per-sample-mF apply -m and -F per-sample for increased sensitivity
-P, --platforms STR comma separated list of platforms for indels [all]
###################################################
About: SNP/indel variant calling from VCF/BCF. To be used in conjunction with samtools mpileup.
This command replaces the former "bcftools view" caller. Some of the original
functionality has been temporarily lost in the process of transition to htslib,
but will be added back on popular demand. The original calling model can be
invoked with the -c option.
Usage: bcftools call [options] <in.vcf.gz>
File format options:
-o, --output <file> write output to a file [standard output]
-O, --output-type <b|u|z|v> output type: 'b' compressed BCF; 'u' uncompressed BCF; 'z' compressed VCF; 'v' uncompressed VCF [v]
--ploidy <assembly>[?] predefined ploidy, 'list' to print available settings, append '?' for details
--ploidy-file <file> space/tab-delimited list of CHROM,FROM,TO,SEX,PLOIDY
-r, --regions <region> restrict to comma-separated list of regions
-R, --regions-file <file> restrict to regions listed in a file
-s, --samples <list> list of samples to include [all samples]
-S, --samples-file <file> PED file or a file with an optional column with sex (see man page for details) [all samples]
-t, --targets <region> similar to -r but streams rather than index-jumps
-T, --targets-file <file> similar to -R but streams rather than index-jumps
--threads <int> number of extra output compression threads [0]
Input/output options:
-A, --keep-alts keep all possible alternate alleles at variant sites
-f, --format-fields <list> output format fields: GQ,GP (lowercase allowed) []
-g, --gvcf <int>,[...] group non-variant sites into gVCF blocks by minimum per-sample DP
-i, --insert-missed output also sites missed by mpileup but present in -T
-M, --keep-masked-ref keep sites with masked reference allele (REF=N)
-V, --skip-variants <type> skip indels/snps
-v, --variants-only output variantsites only
Consensus/variant calling options:
-c, --consensus-caller the originalcalling method (conflicts with -m)
-C, --constrain <str> one of: alleles, trio (see manual)
-m, --multiallelic-caller alternative model for multiallelic and rare-variant calling(conflicts with -c)
-n, --novel-rate <float>,[...] likelihood of novel mutation for constrained trio calling, see man page for details [1e-8,1e-9,1e-9]
-p, --pval-threshold <float> variant if P(ref|D)<FLOAT with -c [0.5]
-P, --prior <float> mutation rate (use bigger for greater sensitivity) [1.1e-3]