Before moving on to alignments, I had to split the parsed fastq file into individual fastq files, one per sample, using the sample IDs. Here are the steps:
The input is parsed_gomp027_S8_L002_R1_001_phiXfiltered.fastq (the parsed file), created by phiX filtering followed by barcode parsing.
The sample IDs are in sampleids.txt. These are the same IDs I used to parse the barcodes, and they are listed in the order of the samples in the 96-well plates used for library prep.
I used the Perl script splitFastq.pl to split the parsed file. Usage: perl splitFastq.pl sampleids.txt parsed_gomp027_S8_L002_R1_001_phiXfiltered.fastq
I wrote a shell script, subSplitFastq.sh, that runs the Perl script above as a job on the cluster. Usage: sbatch subSplitFastq.sh
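For reference, a submission script of this kind might look roughly like the sketch below. This is not the actual subSplitFastq.sh: the job name, time limit, memory request, and log file name are all assumed values, and most clusters also need a partition or account line that is specific to the site.

```shell
#!/bin/bash
# All #SBATCH values below are assumptions; adjust them for your cluster.
# Hypothetical job name
#SBATCH --job-name=splitFastq
# Assumed wall-clock limit
#SBATCH --time=24:00:00
# Assumed memory request
#SBATCH --mem=8G
# Log file; %j expands to the SLURM job ID
#SBATCH --output=splitFastq_%j.log

# Input files named in the steps above
ids="sampleids.txt"
fastq="parsed_gomp027_S8_L002_R1_001_phiXfiltered.fastq"

# Run the split from the directory the job was submitted in
perl splitFastq.pl "$ids" "$fastq"
```

Keeping the file names in variables at the top makes it easy to reuse the same script for another lane or library.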
This script creates 192 files, one for each individual in each population sequenced for the GBS data. To cross-check, count the lines in the sample IDs file (sampleids.txt); that number should match the number of files created.
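The cross-check can be scripted. The sketch below is only an illustration and makes two assumptions that are not stated in these notes: the split files end in .fastq, and they sit by themselves in one directory (called split_out here, a hypothetical name), so the parsed input file is not counted along with them.

```shell
# Compare the number of sample IDs with the number of split fastq files.
# Assumption: the split files end in .fastq and are alone in one directory.
check_split_count() {
    local ids_file="$1"   # one sample ID per line
    local out_dir="$2"    # hypothetical directory holding the split files
    local n_ids n_files

    # Count non-empty lines so a trailing blank line is not counted as a sample
    n_ids=$(grep -c . "$ids_file")
    # Count the .fastq files directly inside out_dir (no subdirectories)
    n_files=$(find "$out_dir" -maxdepth 1 -type f -name '*.fastq' | wc -l | tr -d ' ')

    echo "sample IDs: $n_ids, split files: $n_files"
    # Succeeds only when the two counts agree
    [ "$n_ids" -eq "$n_files" ]
}
```

Calling check_split_count sampleids.txt split_out prints both counts and returns a non-zero exit status if they differ. For this run both counts should be 192.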