Post date: Sep 15, 2014 2:41:56 PM
We used allpaths-lg to assemble the original Lycaeides melissa genome, and I am trying this software first for the new genomes, starting with the Sierra Nevada population. Here is a link to the allpaths-lg manual. All of the assembly results will be in /labs/evolution/data/lycaeides/whole_genomes/, with one directory per taxon; the first one is Lsierra. The taxon directory contains the scripts and a DATA directory with the processed data (see below). There are two steps to the assembly.
First, I convert the raw data to allpaths-lg format (script = prepareData.sh):
#!/bin/bash
PrepareAllPathsInputs.pl \
DATA_DIR=/labs/evolution/data/lycaeides/whole_genomes/Lsierra/DATA \
PLOIDY=2 HOSTS=16
This requires in_libs.csv and in_groups.csv files in the taxon directory; together they describe the data.
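As a rough illustration of what these two files might look like, here is a minimal sketch that writes them out. The column headers are the ones I recall from the allpaths-lg manual and should be checked against it. The library names, read paths, and size values are hypothetical placeholders, not the real Lsierra libraries.

```shell
#!/bin/bash
# Sketch only: library names, file paths, and size values below are
# hypothetical placeholders, not the real Lsierra values.
set -euo pipefail

# Hypothetical output directory; defaults to the current directory.
outdir="${1:-.}"

# in_groups.csv: one row per read group, pointing at the raw read files.
cat > "${outdir}/in_groups.csv" <<'EOF'
group_name, library_name, file_name
frag1, Lsierra_frag, /path/to/raw/frag_R?.fastq
jump1, Lsierra_jump, /path/to/raw/jump_R?.fastq
EOF

# in_libs.csv: one row per library, giving its type and size statistics.
cat > "${outdir}/in_libs.csv" <<'EOF'
library_name, project_name, organism_name, type, paired, frag_size, frag_stddev, insert_size, insert_stddev, read_orientation, genomic_start, genomic_end
Lsierra_frag, Lycaeides, Lsierra, fragment, 1, 180, 20, , , inward, 0, 0
Lsierra_jump, Lycaeides, Lsierra, jumping, 1, , , 3000, 300, outward, 0, 0
EOF
```

Each library_name in in_groups.csv should match a row in in_libs.csv. Fragment libraries fill in the frag_size columns and jumping libraries fill in the insert_size columns.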
Next I run the actual assembly. The assembly script gives the directory structure and any other options, such as the minimum contig size to retain. I started with 250 bp for this, as this worked best for the melissa genome assembly. Here is the qsub bash script (script = qsub_runassem_15sept14.sh)
#!/bin/sh
#PBS -N genome
#PBS -l nodes=1:ppn=48
#PBS -l walltime=96:00:00
#PBS -l mem=960g
#PBS -q batch
. /rc/tools/utils/dkinit
reuse ALLPATHS-LG
cd /home/A01963476/data/lycaeides/whole_genomes/Lsierra
# commands for allpaths-lg
basedir="/labs/evolution/data/lycaeides/whole_genomes"
RunAllPathsLG \
PRE=${basedir} \
REFERENCE_NAME=Lsierra \
DATA_SUBDIR=DATA \
RUN=RUN \
SUBDIR=assem15sept14 \
TARGETS=standard \
HAPLOIDIFY=True \
MIN_CONTIG=250 \
THREADS=48 \
OVERWRITE=True \
| tee -a ${basedir}/$0.out
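To make the directory structure implied by these arguments explicit, here is a small sketch that rebuilds the expected output path from the same values. The ASSEMBLIES level and the final.assembly.fasta file name are assumptions based on the usual allpaths-lg layout, so check them against the manual before relying on them.

```shell
#!/bin/bash
# Sketch: rebuild the assembly output path from the RunAllPathsLG arguments.
# The ASSEMBLIES level and final.assembly.fasta name are assumptions about
# the usual allpaths-lg layout, not something confirmed by the scripts above.
basedir="/labs/evolution/data/lycaeides/whole_genomes"
reference="Lsierra"      # REFERENCE_NAME
data_subdir="DATA"       # DATA_SUBDIR
run="RUN"                # RUN
subdir="assem15sept14"   # SUBDIR

assem_dir="${basedir}/${reference}/${data_subdir}/${run}/ASSEMBLIES/${subdir}"
echo "Expected assembly directory: ${assem_dir}"

fasta="${assem_dir}/final.assembly.fasta"
if [ -s "${fasta}" ]; then
    # Count scaffolds by counting FASTA header lines.
    echo "Scaffolds: $(grep -c '^>' "${fasta}")"
else
    echo "No final assembly yet at ${fasta}"
fi
```

The job itself would be submitted with qsub qsub_runassem_15sept14.sh, and this check can be run once the job has finished.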