Post date: Sep 12, 2013 11:39:24 PM
The file and folder names for the whole-genome resequencing data from Oxford are less informative than they could be. I have now collected the information needed to track individuals, and it lives in a README file and two other files, genotypeids.txt and sequencing-submission-form-P130133_stick_insect_Q130022.ods (all in data/timema/timema_wgrs/). The lookups chain together like this: the project names are in the P*md5sum files. These match plate numbers (see the README file), which also appear in the ods file. That file lists DNA numbers and their associated plate positions, which must correspond to the individual numbers on the fastq files. The DNA numbers are then cross-referenced with population information in genotypeids.txt. Thus, once the sequence data are associated with plates, it is not too hard to keep track of things.
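As a rough illustration of the lookup chain (project name → plate → plate position → DNA number → population), here is a minimal Python sketch. The dictionaries stand in for the README, the ods sheet, and genotypeids.txt. Every project name, position, DNA number, and population label in it is a hypothetical placeholder, not the real contents of those files, and parsing the actual files is not shown.

```python
# Hypothetical sketch: resolve a fastq individual to its population by
# chaining the lookups described above. All keys and values are
# illustrative placeholders, NOT the real contents of the README,
# the .ods submission form, or genotypeids.txt.
from typing import Dict, Tuple

# project name (from the P*md5sum file names) -> plate number (from README)
project_to_plate: Dict[str, int] = {"P_example_A": 5, "P_example_B": 6}

# (plate number, plate position / individual number) -> DNA number (.ods sheet)
position_to_dna: Dict[Tuple[int, int], str] = {
    (5, 1): "DNA_0001",
    (6, 3): "DNA_0002",
}

# DNA number -> population (genotypeids.txt)
dna_to_population: Dict[str, str] = {"DNA_0001": "pop_X", "DNA_0002": "pop_Y"}


def population_for(project: str, individual: int) -> str:
    """Follow project -> plate -> DNA number -> population.

    Raises KeyError if any link in the chain is missing.
    """
    plate = project_to_plate[project]
    dna = position_to_dna[(plate, individual)]
    return dna_to_population[dna]


if __name__ == "__main__":
    print(population_for("P_example_A", 1))
```

Keeping each file as its own mapping makes a broken link easy to spot: a missing entry raises a `KeyError` at the step where the chain fails, rather than silently mislabeling an individual.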
All of the natural-population individuals are on plates 5 and 6, and these are the samples I will assemble first. I am using the dorc to copy the sequence data for these plates into plate5 and plate6 directories and to unzip the files. Once this is done, I should be ready to run bwa.
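The copy-and-unzip step could look roughly like the Python sketch below. It assumes the raw files are gzip-compressed with a `.gz` suffix (for example `*.fastq.gz`). The function name `stage_plate` and all paths are hypothetical. Only the copy in the plate directory is decompressed, and that `.gz` copy is then removed, mimicking `gunzip`, so the originals stay untouched.

```python
# Hypothetical sketch: copy gzipped fastq files into a plate directory and
# decompress them there. Assumes gzip compression with a ".gz" suffix;
# the function name and all paths are placeholders.
import gzip
import shutil
from pathlib import Path
from typing import Iterable, List


def stage_plate(fastq_gz_files: Iterable[Path], plate_dir: Path) -> List[Path]:
    """Copy each .gz file into plate_dir, decompress it, and return the
    paths of the uncompressed files. The source files are left as-is."""
    plate_dir.mkdir(parents=True, exist_ok=True)
    outputs: List[Path] = []
    for src in fastq_gz_files:
        if src.suffix != ".gz":
            raise ValueError(f"expected a .gz file, got {src}")
        local_gz = plate_dir / src.name
        shutil.copy2(src, local_gz)             # copy, keeping timestamps
        out_path = local_gz.with_suffix("")     # "x.fastq.gz" -> "x.fastq"
        with gzip.open(local_gz, "rb") as fin, open(out_path, "wb") as fout:
            shutil.copyfileobj(fin, fout)       # stream-decompress
        local_gz.unlink()                       # like gunzip, drop the .gz copy
        outputs.append(out_path)
    return outputs
```

A call might look like `stage_plate(sorted(Path(raw_dir).glob("*.fastq.gz")), Path("plate5"))`, where `raw_dir` is a placeholder for wherever the plate 5 data live. Streaming with `copyfileobj` avoids loading whole fastq files into memory.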