test with sierra, preparing data and trying assembly with allpaths-lg

Post date: Sep 15, 2014 2:41:56 PM

We used allpaths-lg to assemble the original Lycaeides melissa genome, so I am trying this software first for the new genomes, starting with the Sierra Nevada population. Here is a link to the allpaths-lg manual. All of the assembly results will be in /labs/evolution/data/lycaeides/whole_genomes/, with one directory per taxon; the first one is Lsierra. The taxon directory contains the scripts and a DATA directory with the processed data (see below). There are two steps to the assembly.

First I convert the raw data to allpaths-lg format (script = prepareData.sh)

#!/bin/bash

PrepareAllPathsInputs.pl \
  DATA_DIR=/labs/evolution/data/lycaeides/whole_genomes/Lsierra/DATA \
  PLOIDY=2 HOSTS=16

This requires in_libs.csv and in_groups.csv files in the taxon directory that describe the data.
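For reference, here is a minimal sketch of what the two CSV files can look like, written out with a small shell script. The column headers follow the layout I understand from the allpaths-lg manual, so check them against the manual before use. Every library name, group name, file path, and insert size below is a hypothetical placeholder, not the real Lsierra data, and the output directory name example_csv is also just an assumption for illustration.

```shell
#!/bin/bash
# Hypothetical example inputs: all names, paths and sizes are placeholders.
outdir="${1:-example_csv}"
mkdir -p "${outdir}"

# in_libs.csv: one row per sequencing library (fragment vs. jumping, sizes, orientation)
cat > "${outdir}/in_libs.csv" <<'EOF'
library_name, project_name, organism_name, type, paired, frag_size, frag_stddev, insert_size, insert_stddev, read_orientation, genomic_start, genomic_end
frag180, Lsierra, Lycaeides, fragment, 1, 180, 20, , , inward, 0, 0
jump3kb, Lsierra, Lycaeides, jumping, 1, , , 3000, 300, outward, 0, 0
EOF

# in_groups.csv: one row per read group, pointing each group at its library and read files
# (the ? wildcard is meant to match the two paired read files)
cat > "${outdir}/in_groups.csv" <<'EOF'
group_name, library_name, file_name
frag_lane1, frag180, /path/to/reads/frag180_R?.fastq
jump_lane1, jump3kb, /path/to/reads/jump3kb_R?.fastq
EOF

echo "wrote example CSV files to ${outdir}"
```

The library_name column is what ties the two files together: each group in in_groups.csv should name a library that has a row in in_libs.csv.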

Next I run the actual assembly. The assembly script sets the directory structure and any other options, such as the minimum contig size to retain. I started with a minimum contig size of 250 bp, as this worked best for the melissa genome assembly. Here is the qsub bash script (script = qsub_runassem_15sept14.sh)

#!/bin/sh
#PBS -N genome
#PBS -l nodes=1:ppn=48
#PBS -l walltime=96:00:00
#PBS -l mem=960g
#PBS -q batch

# load the allpaths-lg environment
. /rc/tools/utils/dkinit
reuse ALLPATHS-LG

cd /home/A01963476/data/lycaeides/whole_genomes/Lsierra

# commands for allpaths-lg
basedir="/labs/evolution/data/lycaeides/whole_genomes"

# keep a space before each trailing backslash so the arguments stay separate
RunAllPathsLG \
  PRE=${basedir} \
  REFERENCE_NAME=Lsierra \
  DATA_SUBDIR=DATA \
  RUN=RUN \
  SUBDIR=assem15sept14 \
  TARGETS=standard \
  HAPLOIDIFY=True \
  MIN_CONTIG=250 \
  THREADS=48 \
  OVERWRITE=True \
  | tee -a ${basedir}/$0.out
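The options above combine into a nested directory path. As far as I understand from the manual, the finished assembly should end up under PRE/REFERENCE_NAME/DATA_SUBDIR/RUN/ASSEMBLIES/SUBDIR, with files such as final.contigs.fasta and final.assembly.fasta. The following is a small sketch, under that assumption, that builds the path from the same values and reports whether those files exist yet; the lowercase variable names are mine, not allpaths-lg options.

```shell
#!/bin/bash
# Same values as passed to RunAllPathsLG in the qsub script
pre="/labs/evolution/data/lycaeides/whole_genomes"
reference_name="Lsierra"
data_subdir="DATA"
run="RUN"
subdir="assem15sept14"

# Assumed layout: PRE/REFERENCE_NAME/DATA_SUBDIR/RUN/ASSEMBLIES/SUBDIR
assembly_dir="${pre}/${reference_name}/${data_subdir}/${run}/ASSEMBLIES/${subdir}"
echo "assembly directory: ${assembly_dir}"

# Report which of the expected (assumed) output files are present and non-empty
for f in final.contigs.fasta final.assembly.fasta; do
  if [ -s "${assembly_dir}/${f}" ]; then
    echo "found ${f}"
  else
    echo "missing ${f}"
  fi
done
```

Running this after the job finishes is a quick way to confirm the assembly reached its final stage before looking at any statistics.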
