Post date: Mar 22, 2016 2:18:9 AM
Matt has experimental data (chemistry, etc.) from individual plants at a site called Fallon. I have GBS data for these same individuals (48 individuals, not quite sequentially numbered). The data are in king:/uufs/chpc.utah.edu/common/home/gompert-group1/data/alfalfa/gbs/fallon/. Note that the samples were sequenced twice, once on a HiSeq4000 and once on a HiSeq2500. There might have been problems with the first run, so I am using the HiSeq2500 data, which are in gomp011_NoIndex_L005_R1_001.fastq (231,973,765 sequences).
I first split the file into 13 chunks (with split) to make parsing quicker. I then ran the barcode-parsing script through a Perl wrapper that iterates over the chunks, using the following command:
perl wrap_qsub_slurm_parse.pl xa*
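A minimal sketch of the arithmetic behind the chunking step, assuming a standard four-line FASTQ record and split's default output names (xaa, xab, ...). The function names are hypothetical, and this is not necessarily how the chunk size was actually chosen. The point is that the line count passed to split should be a multiple of 4 so that no record is cut across two chunks.

```python
from string import ascii_lowercase


def lines_per_chunk(n_records, n_chunks, lines_per_record=4):
    """Smallest per-chunk line count that keeps whole records together.

    Assumes every record spans exactly `lines_per_record` lines
    (4 for standard FASTQ).
    """
    # Ceiling division with integers only, to avoid float rounding.
    records_per_chunk = -(-n_records // n_chunks)
    return records_per_chunk * lines_per_record


def split_style_names(n_chunks, prefix="x"):
    """Names that split produces by default: xaa, xab, ..., xaz, xba, ..."""
    if n_chunks > 26 * 26:
        raise ValueError("more chunks than two-letter suffixes allow")
    return [
        prefix + ascii_lowercase[i // 26] + ascii_lowercase[i % 26]
        for i in range(n_chunks)
    ]


if __name__ == "__main__":
    # Sequence count taken from the post; 13 chunks as described.
    print(lines_per_chunk(231973765, 13))
    print(split_style_names(13))
```

With 13 chunks the names run from xaa to xam, so the xa* glob in the command above picks up all of them.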
Note that sequence IDs will be unique only within a file (xa*), not across files. I will need to fix this before continuing.
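One possible way to make the IDs unique across files is to prepend the chunk name to each header line. The sketch below assumes the parsed output is still four-line FASTQ with headers starting with "@"; the actual output format of the parse script is not shown here, and the function name and the underscore separator are hypothetical.

```python
def tag_fastq_ids(lines, tag):
    """Yield FASTQ lines with `tag` prepended to every record ID.

    Assumes four-line records whose first line starts with '@'.
    `tag` would typically be the chunk name (e.g. 'xaa').
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            # Header line of a record: must begin with '@'.
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1} is not a FASTQ header: {line!r}")
            yield "@" + tag + "_" + line[1:]
        else:
            # Sequence, separator, and quality lines pass through unchanged.
            yield line


if __name__ == "__main__":
    # Two toy chunks that reuse the same ID, as described in the note above.
    chunk_a = ["@1\n", "ACGT\n", "+\n", "IIII\n"]
    chunk_b = ["@1\n", "TTGA\n", "+\n", "IIII\n"]
    for name, chunk in (("xaa", chunk_a), ("xab", chunk_b)):
        print("".join(tag_fastq_ids(chunk, name)), end="")
```

Because the tag is derived from the file name, IDs that collide between chunks end up distinct after the chunks are concatenated.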