MSG on Cetus

Multiplexed shotgun genotyping (MSG) from the Andolfatto Lab is a pipeline of scripts that assigns ancestry to genomic segments using next-generation sequencing data. The method can identify recombination breakpoints in many individuals simultaneously, at a resolution sufficient for most mapping purposes, such as quantitative trait locus (QTL) mapping and mapping of induced mutations.

MSG has been installed on the Genomics cetus grid to facilitate use by researchers at Princeton.

Below are instructions based on a document from Deniz Erezyilmaz.

Step 0 - Set PATH to use the required version of samtools.

MSG requires an old version of samtools (0.1.9), which is not the default version on cetus.

To make MSG use the old version, prepend its directory to your PATH.

Run: export PATH=/usr/local/samtools/0.1.9/bin:$PATH

You can put this line in your .bashrc, but be aware that every time you run samtools on cetus you will then be using version 0.1.9 unless you explicitly call /usr/local/bin/samtools.
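A minimal sketch, assuming the samtools 0.1.9 location given above, of a .bashrc-friendly way to prepend that directory without adding it twice, plus a quick check of which samtools your shell will actually run. The names use_old_samtools, check_samtools and MSG_SAMTOOLS_DIR are hypothetical helpers, not part of MSG.

```shell
# Hypothetical helpers for Step 0; the directory is the one given above.
MSG_SAMTOOLS_DIR=/usr/local/samtools/0.1.9/bin

# Prepend the samtools 0.1.9 directory unless it is already first on PATH,
# so sourcing .bashrc repeatedly does not keep growing PATH.
use_old_samtools() {
  case ":$PATH:" in
    ":$MSG_SAMTOOLS_DIR:"*) ;;  # already first: nothing to do
    *) PATH="$MSG_SAMTOOLS_DIR:$PATH" ;;
  esac
  export PATH
}

# Report which samtools the shell will run; return non-zero if it is not
# the one in MSG_SAMTOOLS_DIR.
check_samtools() {
  local found
  found=$(command -v samtools) || {
    echo "samtools not found on PATH" >&2
    return 1
  }
  if [ "$found" = "$MSG_SAMTOOLS_DIR/samtools" ]; then
    echo "OK: using $found"
  else
    echo "WARNING: using $found, not $MSG_SAMTOOLS_DIR/samtools" >&2
    return 1
  fi
}
```

For example, with these functions in your .bashrc followed by a call to use_old_samtools, running check_samtools before starting MSG should report the 0.1.9 binary.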

Step 1 - Create a directory and put the following files in it:

  • Sequence Reads - fastq formatted file
  • Barcodes - tab-delimited unix text file
  • Configuration - msg.cfg file, see sample msg.cfg
  • Two parental genomes - fasta formatted files
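Before editing msg.cfg it can help to confirm the directory really contains these files. The sketch below is a hypothetical helper (check_msg_inputs is not part of MSG); it assumes the configuration file is named msg.cfg and does only crude format checks on the first character of each sequence file, so a gzipped reads file, for instance, would be reported as not FASTQ.

```shell
# Hypothetical helper: check the Step 1 inputs in an MSG working directory.
# Usage: check_msg_inputs DIR READS BARCODES PARENT1 PARENT2
check_msg_inputs() {
  if [ "$#" -ne 5 ]; then
    echo "usage: check_msg_inputs DIR READS BARCODES PARENT1 PARENT2" >&2
    return 2
  fi
  local dir=$1 reads=$2 barcodes=$3 parent1=$4 parent2=$5
  local status=0 f

  # Every input, plus msg.cfg, must exist and be non-empty.
  for f in msg.cfg "$reads" "$barcodes" "$parent1" "$parent2"; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      status=1
    fi
  done
  [ "$status" -eq 0 ] || return "$status"

  # Cheap format checks on the first character of each sequence file:
  # FASTQ records start with '@', FASTA records with '>'.
  if [ "$(head -c 1 "$dir/$reads")" != "@" ]; then
    echo "not FASTQ? $reads" >&2
    status=1
  fi
  for f in "$parent1" "$parent2"; do
    if [ "$(head -c 1 "$dir/$f")" != ">" ]; then
      echo "not FASTA? $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, check_msg_inputs . reads.fastq barcodes.txt parent1.fa parent2.fa, where the file names are placeholders for your own.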

Step 2 - Tailor the msg.cfg to your analysis. In particular, update:

  • barcodes filename
  • sequence reads filename
  • two parental genome filenames
  • order of chromosomes, depending on the parental genomes' alignment
  • priors (if you are doing an F2 cross)
  • rfac and pnathresh (try the defaults first, but you will probably have to adjust them)
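To review these settings in one place, the sketch below prints each of them from msg.cfg. It assumes msg.cfg uses key=value lines and that the keys are named barcodes, reads, parent1, parent2, chroms, priors, rfac and pnathresh; those names are an assumption here, so check them against the sample msg.cfg and edit MSG_CFG_KEYS to match. show_msg_cfg is a hypothetical helper, not part of MSG.

```shell
# Hypothetical helper: print the msg.cfg settings Step 2 asks you to review.
# ASSUMPTION: msg.cfg holds key=value lines and these key names match the
# sample msg.cfg; edit the list if your copy names them differently.
MSG_CFG_KEYS="barcodes reads parent1 parent2 chroms priors rfac pnathresh"

show_msg_cfg() {
  local cfg=${1:-msg.cfg} key line
  [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 1; }
  for key in $MSG_CFG_KEYS; do
    # Lines starting with '#' are skipped; if a key appears more than
    # once, the last occurrence is shown.
    line=$(grep -E "^[[:space:]]*${key}[[:space:]]*=" "$cfg" | tail -n 1)
    if [ -n "$line" ]; then
      printf '%-10s %s\n' "$key" "${line#*=}"
    else
      printf '%-10s (not set)\n' "$key"
    fi
  done
}
```

Running show_msg_cfg in your data directory lists each key with its current value, or (not set) if no matching line was found.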

Step 3 - Update your barcodes file
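Because the barcodes file must be tab-delimited unix text, a file saved on Windows or exported from a spreadsheet can cause problems. A minimal sketch, using hypothetical helpers that are not part of MSG, that flags Windows (CRLF) line endings and lines with no tab, and strips carriage returns in place. It does not check what the columns contain. The dos2unix utility, where installed, also removes carriage returns.

```shell
# Hypothetical helpers: check that a barcodes file is tab-delimited unix text.
# They do not check what the columns contain.

# Return non-zero (with messages on stderr) if the file has Windows (CRLF)
# line endings or any non-empty line without a tab.
check_barcodes() {
  local file=$1 status=0 cr
  cr=$(printf '\r')
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  if grep -q "$cr" "$file"; then
    echo "$file has Windows (CRLF) line endings" >&2
    status=1
  fi
  if ! awk -F '\t' 'NF == 1 { printf "line %d has no tab: %s\n", NR, $0; bad = 1 }
                    END { exit bad }' "$file" >&2; then
    status=1
  fi
  return "$status"
}

# Strip carriage returns in place; copying back with cat keeps the
# original file's permissions.
fix_line_endings() {
  local file=$1 tmp
  tmp=$(mktemp) || return 1
  tr -d '\r' < "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, run check_barcodes barcodes.txt (your own file name) and, if it reports CRLF line endings, fix_line_endings barcodes.txt.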

Step 4 - Create a symlink to the msg program in the directory with the data files: ln -s /usr/local/msg .

Step 5 - Run the program with the command: perl msg/msgCluster.pl
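Steps 4 and 5 can be wrapped in a small function that refuses to run outside the directory holding msg.cfg and only creates the msg symlink if it is missing. This is a hypothetical convenience wrapper around the two commands given in those steps; MSG_HOME is an invented override, and the default /usr/local/msg is the install path from Step 4.

```shell
# Hypothetical wrapper for Steps 4 and 5. MSG_HOME is an invented override;
# the default is the install path from Step 4.
run_msg() {
  local msg_home=${MSG_HOME:-/usr/local/msg}
  if [ ! -f msg.cfg ]; then
    echo "run this from the directory containing msg.cfg" >&2
    return 1
  fi
  # Step 4: create the symlink only if nothing named "msg" exists here yet.
  if [ ! -e msg ] && [ ! -L msg ]; then
    ln -s "$msg_home" msg || return 1
  fi
  if [ ! -f msg/msgCluster.pl ]; then
    echo "msg/msgCluster.pl not found; check the msg symlink" >&2
    return 1
  fi
  # Step 5: start the pipeline.
  perl msg/msgCluster.pl
}
```

Change into your data directory and call run_msg; it behaves like running the two commands by hand, but stops early with a message if msg.cfg or msg/msgCluster.pl is missing.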

Step 6 - Monitor the progress.

You will get an email if problems occur.

You can also run: qstat -u username (substituting your own username).

The "state" column shows the status of each job. It will be one of d(eletion), E(rror), h(old), r(unning), R(estarted), s(uspended), S(uspended), t(ransferring), T(hreshold) or w(aiting). Run man qstat for more information about these states.
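If MSG has queued many jobs, a short awk filter can reduce the qstat listing to job IDs and states and count the jobs in an error state. This sketch assumes the default Grid Engine qstat layout (two header lines, job ID in the first column, state in the fifth); verify that against the output you see on cetus. summarize_qstat is a hypothetical helper.

```shell
# Hypothetical helper: summarize `qstat -u username` output read from stdin.
# ASSUMPTION: default Grid Engine layout with two header lines, the job ID
# in column 1 and the state in column 5.
summarize_qstat() {
  awk 'NR > 2 && NF >= 5 {
         print $1, $5
         if ($5 ~ /E/) err++   # E marks an error state, e.g. Eqw
       }
       END { if (err) printf "%d job(s) in an error state\n", err }'
}
```

Usage: qstat -u "$USER" | summarize_qstat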

See the MSG Readme for more information on using MSG.