sexchromosome

Updated on Dec 7th 2018.

I am trying to plot coverage per SNP across males and females. I am using samtools depth command on bam files to do this. Here is the bash script I am running to get the output file (note the working directory):

#!/bin/bash

#SBATCH -n 12

#SBATCH -N 1

#SBATCH -t 300:00:00

#SBATCH -p usubio-kp

#SBATCH -A usubio-kp

#SBATCH -J samtools

cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/lycaeides_dubois/Alignments/bamfiles

module load samtools

samtools depth *M.fastq.sorted.bam > males_depth.txt

samtools depth *F.fastq.sorted.bam > females_depth.txt

After I created the files, I did the following to plot coverage across individuals for each SNP on the Z (1631) chromosome.

#check the total number of SNPs in the files

wc -l females_depth.txt #67586022

wc -l males_depth.txt #81760391

#grab the SNPs on Scaffold 1631 in both files above

grep ^Scaffold_1631 males_depth.txt > scaf1631_males_depth.txt

wc -l z_males_depth.txt #7843564

grep ^Scaffold_1631 females_depth.txt > scaf1631_females_depth.txt

wc -l scaf1631_females_depth.txt #5502804

######################################################################################################################################

Determining mean coverage across scaffolds

awk -F, '{print $1,$3,$2,$4}' OFS=, lyc_v2.csv > lyc_v2.csv

I have so far created a temp folder in the bamfiles directory to test for renaming files. The script works. I need to reorder the columns.

Folder where this is done: /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/lycaeides_dubois/Alignments/bamfiles

Files needed for this are bam files (input to samtools) and .ids files.

Number of males = 479

Number of females = 352

Unknown = 4

Mapped scripts across males: 478640161

Mapped scripts across females: 354286447

1. I used samtools idxstats to create ids files for each bamfile. This command takes bam file as an input and then writes out a file in which number of mapped reads are listed in a column.

Usage for running across files: for f in mem*.sorted.bam; do samtools idxstats $f >> ./$f.ids; done

Number of lines or scaffolds for each file should be 1652.

Here is what ouput file looks like:

Scaffold Sequence length Number of mapped reads Number of unmapped reads

Scaffold_1;HRSCAF=6 969 0 0

Scaffold_2;HRSCAF=10 67112 7 0

Scaffold_3;HRSCAF=25 63677 7 0

Scaffold_4;HRSCAF=30 15840280 6005 0

Scaffold_5;HRSCAF=41 51524 9 0

Scaffold_6;HRSCAF=43 22424 47 0

Once, I had all these files I could use the information on number of mapped reads to calculate numbers for males and females.

2. Create master file for males and females.

I created one master file for males and females which combined all the ids files for each individual male and female respectively.

I did this by:

cat mem*F*.ids > outFemales.ids #581504 (Number of female ind. (352) * 1652).

cat mem*M*.ids > outMales.ids #791308 (Number of male individuals (479) * 1652).

3. Used R to get mean number of reads across individuals for each scaffold and then plot:

library(plyr)

males = read.table("outMales.ids", header = FALSE)

females = read.table("outFemales.ids", header = FALSE)

library(plyr)

m.scafs = ddply(males, "males[,1]", numcolwise(mean))

f.scafs = ddply(females, "females[,1]", numcolwise(mean))

plot(m.scafs$V3,f.scafs$V3, xlab= "males", ylab = "females")

dev.copy(pdf, 'sexchromosomes_mappedreads.pdf')

dev.off()

#sort the combined file to find out odd scaffolds for possible W chrom.

mf = cbind(m.scafs, f.scafs)

mf.sort = mf[order(mf[,3]),]

write.table(mf.sort, file = "mf_averages_mappedreads.txt")

Possible Z chromosome: row 705: Scaffold_1631;HRSCAF=2129 21870511 55486.95 0

Possible W chromosome: Scaffold_1353;HRSCAF=1677 1259 0 0 Scaffold_1353;HRSCAF=1677 1259 0.00852272727272727 0

Page updated

Google Sites

Report abuse