Post date: Apr 16, 2016 4:49:54 PM
Variants on scaffolds that belong to sex chromosomes should show distinctive coverage patterns.
Males are ZZ and females are ZW, so:
Z markers: male coverage should be twice female coverage
W markers: should only show up in females
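To make these expectations concrete, here is a minimal Python sketch of the arithmetic. It assumes that depth scales with chromosome copy number and that every individual is sequenced to the same effort, which is an idealization. The function name and the dictionary layout are made up for illustration; the sample sizes (18 females, 20 males in total) are the ones used in this post.

```python
def expected_depth(n_female, n_male):
    """Expected relative total depth (female, male) per marker class.

    Assumes depth scales with chromosome copy number and that every
    individual is sequenced to the same effort (an idealization).
    """
    copies = {
        "autosome": (2, 2),  # both sexes carry two copies
        "Z": (1, 2),         # ZW females carry one Z, ZZ males carry two
        "W": (1, 0),         # only females carry a W
    }
    return {
        name: (n_female * f, n_male * m)
        for name, (f, m) in copies.items()
    }


# Hypothetical usage with the sample sizes from this post
exp = expected_depth(n_female=18, n_male=20)
for name, (f, m) in exp.items():
    print(name, f, m)
```

With 18 females and 20 males, autosomal markers are expected at 36:40 (female:male), Z markers at 18:40, and W markers at 18:0, so the male:female ratio for Z markers is twice the autosomal ratio.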
I am going to test for this with the GBS data from GLA and SLA (these data are already published). I am calling variants for just these individuals (10 GLA F, 10 GLA M, 8 SLA F, 10 SLA M):
#!/bin/sh
#SBATCH --time=48:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --account=gompert
#SBATCH --partition=kingspeak
#SBATCH --job-name=samtools
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
echo ------------------------------------------------------
echo SLURM: job identifier is $SLURM_JOBID
echo SLURM: job name is $SLURM_JOB_NAME
echo ------------------------------------------------------
module load samtools
cd /uufs/chpc.utah.edu/common/home/u6000989/data/lycaeides/lycaeides_gbs/AssembliesHost/melissaGbs2genome/
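# Pile up reads from the BAMs listed in melbams (mapping quality >= 20, base quality >= 20, indels skipped); write per-sample DP and DPR to a BCF
samtools mpileup -b melbams -C 50 -d 250 -f /uufs/chpc.utah.edu/common/home/u6000989/data/lycaeides/melissa_genome/final.assembly.fasta -q 20 -Q 20 -I -u -g -t DP,DPR -o melGLASLA.bcf
# Call variant sites only (-v) with the consensus caller (-c) and write an uncompressed VCF
bcftools call -v -c -p 0.01 -P 0.001 -O v -o melGLASLA.vcf melGLASLA.bcf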
Actually, I decided the above wasn't very useful. We don't need SNPs per se; we just need information on read depth. So I used samtools to get read depth separately for male and female GLA and SLA individuals:
# -Q 20 keeps only reads with mapping quality >= 20; options are placed before the input files
samtools depth -Q 20 aln*LA*M.sorted.bam > depthGLA-SLA-M.txt
samtools depth -Q 20 aln*LA*F.sorted.bam > depthGLA-SLA-F.txt
I then used the Perl script calScafSexDepth.pl to calculate total depth (summed across reads, positions, and individuals) for each scaffold. The output file is fmdepth.txt (scaffold, female count, male count). This was all done in king:data/lycaeides/lycaeides_gbs/AssembliesHost/melissaGbs2genome/.
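The per-scaffold summation can be sketched in a few lines of Python. This is not calScafSexDepth.pl itself, only a rough stand-in under the assumption that the input has the usual samtools depth layout (scaffold, position, then one depth column per BAM file, tab-separated). The function names and the toy input lines are made up for illustration.

```python
from collections import defaultdict


def scaffold_depth(lines):
    """Sum depth over positions and individuals for each scaffold.

    Each line is expected to look like samtools depth output:
    scaffold <tab> position <tab> depth_bam1 <tab> depth_bam2 ...
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        totals[fields[0]] += sum(int(d) for d in fields[2:])
    return dict(totals)


def merge_sexes(female, male):
    """Return sorted (scaffold, female_total, male_total) rows."""
    scaffolds = sorted(set(female) | set(male))
    return [(s, female.get(s, 0), male.get(s, 0)) for s in scaffolds]


# Tiny made-up example with two individuals per sex
f_lines = ["scaf1\t10\t3\t2\n", "scaf1\t11\t1\t0\n", "scaf2\t5\t4\t4\n"]
m_lines = ["scaf1\t10\t6\t5\n", "scaf2\t5\t4\t3\n"]
rows = merge_sexes(scaffold_depth(f_lines), scaffold_depth(m_lines))
for row in rows:
    print(*row, sep="\t")
```

For the real data, the same functions could be fed open file handles for depthGLA-SLA-F.txt and depthGLA-SLA-M.txt instead of the toy lists; a scaffold missing from one file gets a count of 0 for that sex.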
I then began working on my laptop (I will put this all somewhere later) in /home/zgompert/Documents/Local/melmap/. I used addScafSize.pl to add the scaffold lengths to the data file (sc_fmdepth.txt), but I am not sure this is necessary.
Plotting male vs. female depth for each scaffold immediately reveals that most scaffolds fall on a 1:1 line (actually 18:20 female:male because of sample sizes), but that some scaffolds are well off this line, with nearly twice the depth for males as for females, as expected for Z markers when males are ZZ. These are potentially on the Z chromosome. I peeked at the scaffolds farthest from the main 1:1 line, and most of them are on a single LG based on my initial clustering with one family (not done with this). Note that this includes the first 7 or 8 scaffolds (kind of interesting). I want to get 'real' LGs (i.e., combine families), but this all looks very promising.
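As a rough numerical complement to eyeballing the plot, the sketch below flags scaffolds whose male:female depth ratio is about twice the typical ratio, which is the Z expectation stated at the top of this post (males ZZ, females ZW). It is only an illustration: the function name, the cutoffs of 1.5 and 2.5, and the toy rows are all assumptions, and it presumes that most scaffolds are autosomal so that the median ratio can serve as the baseline.

```python
from statistics import median


def flag_z_candidates(rows, low=1.5, high=2.5):
    """Return scaffolds whose normalized male:female depth ratio is near 2.

    rows: iterable of (scaffold, female_depth, male_depth).
    low/high: illustrative cutoffs, not values from the post.
    """
    # Scaffolds with no female reads are skipped to avoid dividing by zero
    ratios = {s: m / f for s, f, m in rows if f > 0}
    # Most scaffolds are assumed autosomal, so the median ratio
    # absorbs the sample-size imbalance (20 males vs. 18 females).
    baseline = median(ratios.values())
    return sorted(s for s, r in ratios.items() if low <= r / baseline <= high)


# Made-up rows in the fmdepth.txt column order (scaffold, female, male)
rows = [
    ("scafA", 900, 1000),
    ("scafB", 1800, 2000),
    ("scafC", 450, 1000),  # male depth about twice the baseline ratio
    ("scafD", 2700, 3000),
    ("scafE", 300, 0),     # female-only depth, a possible W pattern
]
candidates = flag_z_candidates(rows)
print(candidates)
```

Normalizing by the median rather than assuming an exact 20:18 ratio also tolerates uneven sequencing effort between the sexes, at the cost of failing if sex-linked scaffolds were the majority.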