Post date: Aug 28, 2013 10:1:26 PM
I did the following on sunflower.uwyo.edu:
#Identify the PhiX contaminants
tap_contam_analysis --db /data/public/contaminants/phix174 --pct 50 lane3_Undetermined_R1.cat.fastq > phix_lane3_Undetermined_R1.cat.txt &
#Get rid of the contaminants and make a new, clean fastq file:
cat lane3_Undetermined_R1.cat.fastq | fqu_cull -r phix_lane3_Undetermined_R1.cat.txt > clean_lane3_Undetermined_R1.cat.fastq
#reminder: stop job and put in background
ctrl z
bg
#Number of reads I got rid of:
wc -l phix_lane3_Undetermined_R1.cat.txt
#1804070
#Number of lines in the "clean" file (a fastq record is 4 lines, so reads = lines / 4):
wc -l clean_lane3_Undetermined_R1.cat.fastq
#674037252 lines, i.e. 168509313 reads
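#Note: wc -l reports lines, not reads. A minimal sketch for getting the read count directly, assuming every record is exactly 4 lines (no wrapped sequence or quality lines). The function name count_fastq_reads is made up for this note.

```shell
# Count reads in a FASTQ file, assuming exactly 4 lines per record.
count_fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Usage:
# count_fastq_reads clean_lane3_Undetermined_R1.cat.fastq
```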
#Then I copied the barcode file to sunflower:
scp Desktop/Stygoparnus_barcodes.csv lauren@sunflower.uwyo.edu:/data/local/july13_ut/
#Then I parsed barcodes on node4. Note: some Illumina quality encodings now use @ as a quality score character, so a line that starts with @ is not necessarily a header line; it could be a quality line. The script therefore needs the name of the machine that generated the sequences, which comes right after the @ in the header lines of the fastq file (here HWI-ST1097).
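#A quick way to look up the machine name and to see why it matters. This is only a sketch: it assumes the first line of the file is a header of the form @MACHINE:..., and the function names are made up for this note.

```shell
# Print the instrument name from the first header line:
# the text between the leading @ and the first colon.
fastq_machine_name() {
    head -n 1 "$1" | sed -e 's/^@//' -e 's/:.*$//'
}

# Count lines that start with @ followed by the machine name.
# Matching on "^@" alone could also count quality lines that start with @.
count_header_lines() {
    grep -c "^@$2" "$1"
}

# Usage:
# fastq_machine_name clean_lane3_Undetermined_R1.cat.fastq
# count_header_lines clean_lane3_Undetermined_R1.cat.fastq HWI-ST1097
```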
parse_barcodes768.pl /data/local/july13_ut/Stygoparnus_barcodes.csv /data/local/july13_ut/clean_lane3_Undetermined_R1.cat.fastq HWI-ST1097
##Note: the above command for parsing barcodes is not correct. Use file names without the paths! I had to rerun it in the /data/local/july13_ut/ directory with:
parse_barcodes768.pl Stygoparnus_barcodes.csv clean_lane3_Undetermined_R1.cat.fastq HWI-ST1097
##Note: this didn't work either, because my barcode file already includes the cut sites. So I copied a version of parse_barcodes768.pl to /data/local/july13_ut/:
cp /usr/local/bin/parse_barcodes768.pl ./
##Then I edited the script:
#...
#$bcode = "$line[1]"."CAATTC"; # add restriction site, not necessary if barcode + res. site is included
$bcode = $line[1];
#...
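#Before deciding which version of the script to use, it may help to check whether the barcodes in the file already end in the restriction site. A sketch only: it assumes the barcode file is comma-separated with the barcode in the second column (matching $line[1] in the script) and that the site is CAATTC as in the commented-out line. The function name is made up, and a header row, if there is one, will also be listed.

```shell
# List barcodes (second comma-separated column) that do NOT end in the
# given site. No output means every barcode already includes the site.
barcodes_missing_site() {
    awk -F, -v site="$2" '{
        sub(/\r$/, "")                      # drop a Windows line ending, if any
        if ($2 !~ (site "$")) print $2
    }' "$1"
}

# Usage:
# barcodes_missing_site Stygoparnus_barcodes.csv CAATTC
```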
#Then I executed it from this directory:
./parse_barcodes768.pl Stygoparnus_barcodes.csv clean_lane3_Undetermined_R1.cat.fastq HWI-ST1097
#Stygoparnus barcode parsing results:
Total number of good mids: 149,068,564
I have 53 individuals, and there is data for all individuals, but only 478 reads for one of the individuals.
(Compared to 41.8 million for Eurycea, <10 million for Heterelmis, 33.8 million for Stygobromus)
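#To put these numbers in context, a rough calculation from the counts recorded in these notes. It assumes 4 lines per fastq record and that each good mid corresponds to one read; the variable names are made up for this note.

```shell
# Numbers copied from the notes above.
clean_lines=674037252     # wc -l of the clean fastq file
good_mids=149068564       # "Total number of good mids"
individuals=53

clean_reads=$(( clean_lines / 4 ))          # assumes 4 lines per record
pct_good=$(LC_ALL=C awk -v g="$good_mids" -v r="$clean_reads" \
    'BEGIN { printf "%.1f", 100 * g / r }')
mean_reads=$(( good_mids / individuals ))   # integer division

echo "clean reads: $clean_reads"
echo "percent of clean reads with a good mid: $pct_good"
echo "mean reads per individual: $mean_reads"
```

#The individual with only 478 reads is far below the mean, so it is worth checking that sample separately.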