VAMP on cetus

VAMP installation instructions for the cetus cluster at Princeton

The Virus AsseMbly Pipeline (VAMP) (https://bitbucket.org/lance_parsons/vamp) is a system designed to assemble viral genomes from paired-end Illumina sequence data. (Paper in progress, Moriah Szpara and Lance Parsons)

Set PATH in .bash_profile

Add to .bash_profile:

export PATH=$HOME/bin:$HOME/.local/bin:$PATH

Log out and log back in so that the new PATH takes effect.
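
After logging back in, a quick check can confirm that both directories made it onto PATH. The snippet below is a minimal sketch; the helper name path_contains is made up for illustration and is not part of VAMP.

```shell
# Hypothetical helper: succeed if directory $1 is one of the
# colon-separated entries in the PATH-style string $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on the two directories added in .bash_profile.
for dir in "$HOME/bin" "$HOME/.local/bin"; do
    if path_contains "$dir" "$PATH"; then
        echo "ok: $dir is on PATH"
    else
        echo "missing: $dir is not on PATH"
    fi
done
```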

Python Dependencies

Install distribute and pip

curl -O http://python-distribute.org/distribute_setup.py
python distribute_setup.py --user
easy_install --user pip

Install BioPython, cutadapt, pybedtools, and paired_sequence_utils

pip install BioPython --user
pip install cutadapt --user
pip install pybedtools --user
pip install paired_sequence_utils --user
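
To confirm that the packages landed where Python can find them, something like the following sketch may help. The function name check_py_modules and the PYTHON override variable are illustrative assumptions. The import names Bio, cutadapt and pybedtools are the usual ones for those packages; the import name for paired_sequence_utils is not given here, so it is left out.

```shell
# Hypothetical helper: report whether each named Python module can be imported.
# PYTHON is an illustrative override for the interpreter to test (default: python).
check_py_modules() {
    for module in "$@"; do
        if "${PYTHON:-python}" -c "import $module" >/dev/null 2>&1; then
            echo "ok: $module"
        else
            echo "MISSING: $module"
        fi
    done
}

check_py_modules Bio cutadapt pybedtools
```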

Other Dependencies

Bedtools

cd ~
wget http://bedtools.googlecode.com/files/BEDTools.v2.16.2.tar.gz
tar xzvf BEDTools.v2.16.2.tar.gz
cd BEDTools-Version-2.16.2
make
mkdir -p ~/bin
cp bin/* ~/bin

Install libgtextutils

cd ~
wget http://hannonlab.cshl.edu/fastx_toolkit/libgtextutils-0.6.1.tar.bz2
tar xjvf libgtextutils-0.6.1.tar.bz2
cd libgtextutils-0.6.1
./configure --prefix=$HOME
make
make install
cd ..

Install Fastx_toolkit

wget http://hannonlab.cshl.edu/fastx_toolkit/fastx_toolkit-0.0.13.2.tar.bz2
tar xjvf fastx_toolkit-0.0.13.2.tar.bz2
cd fastx_toolkit-0.0.13.2
export PKG_CONFIG_PATH=$HOME/lib/pkgconfig
./configure --prefix=$HOME
make
make install
cd ..

Install FastQC

cd ~
wget http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.10.1.zip
unzip fastqc_v0.10.1.zip
chmod 755 FastQC/fastqc
ln -s ~/FastQC/fastqc ~/bin

Bowtie is already installed in /usr/local/bin/
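
Once the tools above are built, a short check along these lines can show whether each one is reachable from PATH. This is only a sketch: missing_tools is a made-up helper, and the command names probed (bedtools, fastx_trimmer, fastqc, bowtie) are assumed to be representative executables of each package.

```shell
# Hypothetical helper: print the name of every command in the
# argument list that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}

missing=$(missing_tools bedtools fastx_trimmer fastqc bowtie)
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All tools found."
fi
```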

Install and setup VAMP

cd ~
tar xzvf /Genomics/grid/users/lparsons/lance_parsons-vamp-6ee1c60a3cc9.tar.gz
cp lance_parsons-vamp-6ee1c60a3cc9/makefiles/config.mk.template lance_parsons-vamp-6ee1c60a3cc9/makefiles/config.mk

Mugsy Instructions

Mugsy (http://mugsy.sourceforge.net/) is multiple alignment software that can be used to align similar genomes to one another. The output of this or a similar alignment program is used after assembly to compare genomes.

Installing Mugsy

cd ~
wget "http://sourceforge.net/projects/mugsy/files/mugsy_x86-64-v1r2.3.tgz/download" -O "mugsy_x86-64-v1r2.3.tgz"
tar xzvf mugsy_x86-64-v1r2.3.tgz
cd mugsy_x86-64-v1r2.3

Edit mugsyenv.sh and add path to the installation area

export MUGSY_INSTALL=$HOME/mugsy_x86-64-v1r2.3

Running Mugsy

Before running Mugsy, source mugsyenv.sh and pass the -V parameter to qsub:

source ~/mugsy_x86-64-v1r2.3/mugsyenv.sh

Alternatively, add the three lines from mugsyenv.sh to your .bash_profile.

Go to the directory with the fasta files and use qsub to execute Mugsy. The -V parameter ensures that the proper environment variables are available to Mugsy, and the -cwd parameter ensures that the job runs from the current directory and not your home directory.

cd path/to/genomes
qsub -V -cwd mugsy --directory . --prefix mugsy_alignment NC_001806_1.fasta GU734771_1.fasta GU734772_1.fasta
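
The same submission can be wrapped in a small function that checks the inputs first. This is a sketch under assumptions: submit_mugsy and the QSUB override variable are invented for illustration, and the qsub and mugsy options are exactly those used in the command above.

```shell
# Hypothetical helper: submit a Mugsy alignment after checking the inputs.
# Usage: submit_mugsy PREFIX GENOME.fasta GENOME.fasta [...]
# QSUB is an illustrative override; QSUB=echo previews the arguments
# instead of submitting a job.
submit_mugsy() {
    if [ "$#" -lt 3 ]; then
        echo "usage: submit_mugsy PREFIX FASTA FASTA [...]" >&2
        return 1
    fi
    prefix=$1
    shift
    # Every genome file must exist and be readable before submitting.
    for fasta in "$@"; do
        if [ ! -r "$fasta" ]; then
            echo "cannot read $fasta" >&2
            return 1
        fi
    done
    "${QSUB:-qsub}" -V -cwd mugsy --directory . --prefix "$prefix" "$@"
}

# Example (run from the directory that holds the genomes):
#   submit_mugsy mugsy_alignment NC_001806_1.fasta GU734771_1.fasta GU734772_1.fasta
```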

SNPEffector

SNPEffector (http://snpeff.sourceforge.net) is used to analyze the differences found during the alignment and summarized by compare_genomes.py

Download and install core program (http://snpeff.sourceforge.net/download.html)

cd ~
wget "http://sourceforge.net/projects/snpeff/files/snpEff_v2_1b_core.zip/download" -O "snpEff_v2_1b_core.zip"
unzip snpEff_v2_1b_core.zip
ln -s snpEff_2_1b snpEff

Setup New Custom Genome (http://snpeff.sourceforge.net/supportNewGenome.html)

Copy genome files to snpEff data directory

mkdir -p ~/snpEff/data/NC_001806_1
cp /path/to/genomes/NC_001806_1.fasta ~/snpEff/data/NC_001806_1/sequences.fa
cp /path/to/genomes/NC_001806_1.gb.gtf ~/snpEff/data/NC_001806_1/genes.gtf

Add the genome to the config file (http://snpeff.sourceforge.net/supportNewGenome.html#conf)

Add the following lines to ~/snpEff/snpEff.config

# HSV1 Strain 17 genome, RefSeq NC_001806.1
NC_001806_1.genome : HSV1_strain_17
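
If the setup is scripted, the entry can be appended only when it is not already present, so that rerunning the script does not duplicate it. The sketch below assumes a made-up helper name, add_genome_entry, and a genome ID without regular-expression metacharacters (true for NC_001806_1).

```shell
# Hypothetical helper: append "<genome>.genome : <label>" to a snpEff
# config file unless an entry for that genome is already there.
add_genome_entry() {
    config=$1
    genome=$2
    label=$3
    if grep -q "^${genome}\.genome[[:space:]]*:" "$config" 2>/dev/null; then
        echo "$genome already listed in $config"
    else
        printf '%s.genome : %s\n' "$genome" "$label" >> "$config"
    fi
}

# Example:
#   add_genome_entry ~/snpEff/snpEff.config NC_001806_1 HSV1_strain_17
```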

Create the database

java -jar ~/snpEff/snpEff.jar build -gtf22 -v NC_001806_1 -c ~/snpEff/snpEff.config

Running SnpEff on output

java -jar ~/snpEff/snpEff.jar -c ~/snpEff/snpEff.config NC_001806_1 GU734771_1.vcf

Output from SnpEff

  1. Table to STDOUT of variants and predictions
  2. Summary in snpEff_summary.html
  3. Gene summary in snpEff_genes.txt
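
Because the variant table goes to STDOUT, it is easy to lose unless it is redirected. A minimal sketch for saving it next to each VCF follows; run_snpeff, the JAVA override variable and the .snpeff.txt suffix are illustrative choices, while the snpEff arguments are the ones shown above.

```shell
# Hypothetical helper: annotate one VCF and save the STDOUT table as
# <name>.snpeff.txt alongside it.
# JAVA is an illustrative override; JAVA=echo writes the arguments
# to the output file instead of running snpEff.
run_snpeff() {
    genome=$1
    vcf=$2
    out="${vcf%.vcf}.snpeff.txt"
    "${JAVA:-java}" -jar "$HOME/snpEff/snpEff.jar" \
        -c "$HOME/snpEff/snpEff.config" "$genome" "$vcf" > "$out"
}

# Example:
#   for vcf in *.vcf; do run_snpeff NC_001806_1 "$vcf"; done
```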