Stygoparnus lane: remove contaminants from raw sequence reads, parse barcodes

Post date: Aug 28, 2013 10:1:26 PM

I did the following @sunflower.uwyo.edu:

#ID the phix contaminants

tap_contam_analysis --db /data/public/contaminants/phix174 --pct 50 lane3_Undetermined_R1.cat.fastq > phix_lane3_Undetermined_R1.cat.txt &

#Get rid of the contaminants and make new, clean fastq file:

cat lane3_Undetermined_R1.cat.fastq | fqu_cull -r phix_lane3_Undetermined_R1.cat.txt > clean_lane3_Undetermined_R1.cat.fastq

#reminder: to move a running job to the background, suspend it with ctrl z, then resume it with bg

ctrl z
bg

#Number of reads I got rid of:

wc -l phix_lane3_Undetermined_R1.cat.txt

#1804070

#Number of lines in the "clean" fastq file (each read takes 4 lines, so divide by 4 for the number of reads):

wc -l clean_lane3_Undetermined_R1.cat.fastq

#674037252
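Since wc -l counts lines rather than reads, here is a small sketch for turning the line count into a read count. It assumes a standard uncompressed fastq with exactly 4 lines per read (no wrapped sequence lines); fastq_reads is a helper name made up for this note, not an existing tool.

```shell
# Hypothetical helper: number of reads in an uncompressed fastq file,
# assuming exactly 4 lines per read.
fastq_reads() {
    # $1: path to the fastq file
    echo $(( $(wc -l < "$1") / 4 ))
}

# The line count reported above, converted to reads:
echo $(( 674037252 / 4 ))   # 168509313
```

Usage would be: fastq_reads clean_lane3_Undetermined_R1.cat.fastq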

#Then I copied the barcode file to sunflower:

scp Desktop/Stygoparnus_barcodes.csv lauren@sunflower.uwyo.edu:/data/local/july13_ut/

#Then I parsed barcodes on node4. Note: Some Illumina encodings now use @ as a quality score character, so a line that starts with @ is not necessarily a header line. You therefore need to supply the name of the machine that generated the sequence, which comes right after the @ in the header lines of your fastq file (HWI-ST1097).
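One way to look up the machine name without being fooled by quality lines that start with @ is to read only every fourth line, starting at line 1. This is a sketch that assumes 4 lines per read and colon-separated header fields with the machine name first; instrument_names is a made-up helper name.

```shell
# Hypothetical helper: list the distinct machine names found in the
# header lines of a fastq file. Only lines 1, 5, 9, ... are inspected,
# so quality lines that happen to start with @ are never examined.
instrument_names() {
    # $1: path to the fastq file
    awk 'NR % 4 == 1 {
        sub(/^@/, "")        # drop the leading @ of the header
        split($0, f, ":")    # the machine name is the first colon-separated field
        print f[1]
    }' "$1" | sort -u
}
```

Usage would be: instrument_names clean_lane3_Undetermined_R1.cat.fastq
For a file this size, piping head -n 4000 into the same awk command would be much faster if you only need a quick look.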

parse_barcodes768.pl /data/local/july13_ut/Stygoparnus_barcodes.csv /data/local/july13_ut/clean_lane3_Undetermined_R1.cat.fastq HWI-ST1097

##Note: the above command for parsing barcodes is not correct. Use filenames without the paths! I had to rerun it in the /data/local/july13_ut/ directory with:

parse_barcodes768.pl Stygoparnus_barcodes.csv clean_lane3_Undetermined_R1.cat.fastq HWI-ST1097

##Note: this didn't work either because my barcode file already includes the cut sites. So, I copied a version of parse_barcodes768.pl to /data/local/july13_ut/

cp /usr/local/bin/parse_barcodes768.pl ./

##Then I edited the script:

#...

#$bcode = "$line[1]"."CAATTC"; # add restriction site, not necessary if barcode + res. site is included

$bcode = $line[1];
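Before deciding which of the two $bcode lines to use, it may help to check whether the barcodes in the file already end with the cut site. The sketch below assumes a comma-separated file with the barcode in the second column (my reading of $line[1]) and no header row; count_barcodes_with_site is a made-up helper name, and CAATTC is the site string from the commented-out line.

```shell
# Hypothetical helper: report how many barcodes already end in the
# restriction-site string. Assumes a CSV with the barcode in column 2.
count_barcodes_with_site() {
    # $1: barcode CSV file, $2: restriction-site suffix (e.g. CAATTC)
    awk -F, -v site="$2" '
        {
            sub(/\r$/, "")                   # tolerate Windows line endings
            total++
            if ($2 ~ (site "$")) with_site++ # barcode ends with the site
        }
        END { printf "%d of %d barcodes end in %s\n", with_site, total, site }
    ' "$1"
}
```

Usage would be: count_barcodes_with_site Stygoparnus_barcodes.csv CAATTC
If every barcode already ends in the site, the unmodified $bcode = $line[1]; line is the one to use.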

#...

#Then I executed it from this directory:

./parse_barcodes768.pl Stygoparnus_barcodes.csv clean_lane3_Undetermined_R1.cat.fastq HWI-ST1097

#Stygoparnus barcode parsing results:

Total number of good mids: 149,068,564

I have 53 individuals, and there is data for all individuals, but only 478 reads for one of the individuals.

(Compared to 41.8 million for Eurycea, <10 million for Heterelmis, 33.8 million for Stygobromus)
