Post date: Nov 05, 2013 5:12:18 PM
My attempts to parse barcodes for the new data kept failing because the jobs ran out of wall time on the dorc cluster. To get around this, I used the unix split command to break each fastq file into files of 50 million lines (12.5 million sequences). These split files are in data/lycaeides/lycaeides_gbs/Sequences/Split_Melissa.

I am now parsing these files, and the results are being written to data/lycaeides/lycaeides_gbs/Parsed_Melissa. All of the jobs are running, and some have already finished. Note that some sequence name lines use ' -- ' as the separator and others use '-'. The latter are reads that had a barcode correction, and these corrections appear to be driven mainly by a single N in the cut site. I fixed parse_barcodes768.pl to always use ' -- ', but I will need to be careful with this for now.

The next step is to split the sequences into separate files for each individual. I moved splitFastq.pl to a Scripts directory; I will first merge all of the parsed sequences and then use this script to split them into individual files.
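The chunk size works out cleanly because each fastq record spans four lines (header, sequence, '+', quality), so 50 million lines is exactly 12.5 million whole records and split -l never cuts a read in half. Below is a minimal Python sketch of the same record-aligned split, assuming standard four-line records with no wrapped sequence lines. The function name split_fastq and the chunk naming scheme are hypothetical; what was actually run was the unix split command.

```python
LINES_PER_RECORD = 4  # header, sequence, '+' separator, quality


def split_fastq(path, out_prefix, records_per_chunk=12_500_000):
    """Split a fastq file into chunk files that contain only whole records.

    Hypothetical helper: streams the input line by line (no large buffers)
    and returns the list of chunk file names written, in order.
    """
    lines_per_chunk = records_per_chunk * LINES_PER_RECORD
    out_paths = []
    out = None
    n_lines = 0
    try:
        with open(path) as fh:
            for line in fh:
                # Open a new chunk at every record-aligned boundary.
                if n_lines % lines_per_chunk == 0:
                    if out is not None:
                        out.close()
                    out_path = f"{out_prefix}{len(out_paths):03d}.fastq"
                    out = open(out_path, "w")
                    out_paths.append(out_path)
                out.write(line)
                n_lines += 1
    finally:
        if out is not None:
            out.close()
    # A line count that is not a multiple of 4 means the last record is truncated.
    if n_lines % LINES_PER_RECORD != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return out_paths
```

For example, split_fastq("lane1.fastq", "lane1_part") with the default of 12.5 million records per chunk should give the same chunk boundaries as split -l 50000000 lane1.fastq lane1_part (file names hypothetical), although split names its output with alphabetic suffixes (aa, ab, ...) rather than numbers.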