Velvet is a de novo genomic assembler designed for short-read sequencing technologies such as Solexa or 454. Velvet takes in short read sequences, removes errors, then produces high-quality unique contigs. It then uses paired-end read and long read information, when available, to resolve the repeated areas between contigs. Velvet consists of two programs: velveth and velvetg.
For more detailed information about Velvet refer to:
Paper (cite when publishing results)
To run Velvet on the Coeus cluster, load the module (refer to the Example SLURM sbatch Scripts section below). Once the module is loaded, you can copy the test data sets and useful scripts to your home directory. Load the module with:
> module load Biosciences/Velvet/1.2.10
Velveth takes in a number of sequence files, produces a hash table, then writes two files, Sequences and Roadmaps, to an output directory (creating it if necessary). Velvetg requires both files. The syntax is as follows:
> velveth /scratch/$USER/output_directory hash_length [-file_format][-read_type] filename
The hash length, also known as k-mer length, is the length, in base pairs, of the words being hashed. Refer to the Velvet website for more information on choosing a hash length.
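As a rough aid when choosing a hash length, the Velvet manual relates k-mer coverage Ck to nucleotide coverage C, read length L and hash length k as Ck = C * (L - k + 1) / L. The sketch below evaluates that relation for a few odd hash lengths (the manual asks for odd values). The function name kmer_coverage and the sample numbers (50x coverage, 36 bp reads) are illustrative assumptions, not values taken from this page.

```shell
#!/bin/bash
# Hypothetical helper: k-mer coverage Ck = C * (L - k + 1) / L
#   $1 = nucleotide coverage C, $2 = read length L (bp), $3 = hash length k
kmer_coverage() {
    LC_ALL=C awk -v c="$1" -v l="$2" -v k="$3" \
        'BEGIN { printf "%.2f\n", c * (l - k + 1) / l }'
}

# Illustrative values only: 50x coverage, 36 bp reads, odd hash lengths
for k in 21 25 29 31; do
    echo "k=$k  Ck=$(kmer_coverage 50 36 "$k")"
done
```

Longer hash lengths lower the k-mer coverage, so the table this prints can help rule out values of k that leave too little coverage for your data.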
File format options (refer to the manual for the full list):
fasta (default)
fastq
fasta.gz
fastq.gz
sam
Read type options (refer to the manual for the full list):
short (default)
shortPaired
long (for Sanger, 454 or even reference sequences)
Run the following for command-line help:
> velveth | less
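The velveth syntax above can be sketched as a small helper that checks the file format and read type against the values listed on this page before printing the command line. The function name velveth_cmd, the output directory and the file name reads.fastq.gz are hypothetical, and the check covers only the abbreviated lists shown here, not the full set in the manual.

```shell
#!/bin/bash
# Hypothetical helper: print a velveth command line after validating the
# file format and read type against the values listed on this page.
#   usage: velveth_cmd <output_dir> <hash_length> <format> <read_type> <files...>
velveth_cmd() {
    local outdir=$1 hash_length=$2 format=$3 read_type=$4
    shift 4
    case "$format" in
        fasta|fastq|fasta.gz|fastq.gz|sam) ;;
        *) echo "unsupported file format: $format" >&2; return 1 ;;
    esac
    case "$read_type" in
        short|shortPaired|long) ;;
        *) echo "unsupported read type: $read_type" >&2; return 1 ;;
    esac
    echo "velveth $outdir $hash_length -$format -$read_type $*"
}

# Example: gzipped FASTQ holding paired short reads (file name is a placeholder)
velveth_cmd "/scratch/$USER/velvet_fastq" 31 fastq.gz shortPaired reads.fastq.gz
```

Printing the command first makes it easy to review before pasting it into an sbatch script.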
Velvetg is the core of Velvet where the de Bruijn graph is built then manipulated. Note that although velvetg saves some files during the process to avoid useless recalculations, the parameters are not saved from one run to the next.
The syntax for running velvetg is as follows:
> velvetg /scratch/$USER/output_directory [options]
Commonly used options (refer to the manual for a comprehensive list):
-cov_cutoff <floating-point|auto> : removal of low coverage nodes AFTER tour bus or allow the system to infer it
-ins_length <integer> : expected distance between two paired end reads (default: no read pairing)
-read_trkg <yes|no> : tracking of short read positions in assembly (default: no tracking)
-min_contig_lgth <integer> : minimum contig length exported to contigs.fa file (default: hash length * 2)
-amos_file <yes|no> : export assembly to AMOS file (default: no export)
-exp_cov <floating-point|auto> : expected coverage of unique regions or allow the system to infer it
-long_cov_cutoff <floating-point> : removal of nodes with low long-read coverage AFTER tour bus
Run the following for command-line help:
> velvetg | less
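Because velvetg does not save its parameters between runs, one way to keep reruns consistent is to hold the options in a single variable and pass the full set each time. The sketch below does that with a dry-run default. The helper name run_velvetg, the DRY_RUN switch and the option values (insert length 200, minimum contig length 100) are assumptions for illustration, not recommendations.

```shell
#!/bin/bash
# velvetg does not remember options between runs, so keep them in one place.
# The values below are placeholders, not recommendations.
VELVETG_OPTS="-cov_cutoff auto -exp_cov auto -ins_length 200 -min_contig_lgth 100"

# Hypothetical helper: print the command by default; set DRY_RUN=0 to run it.
run_velvetg() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: velvetg $1 $VELVETG_OPTS"
    else
        # Left unquoted on purpose so the options split into separate words
        velvetg "$1" $VELVETG_OPTS
    fi
}

run_velvetg "/scratch/$USER/output_directory"
```

Changing one value in VELVETG_OPTS and rerunning then reapplies every other option as well, which avoids silently falling back to defaults.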
To use Velvet on the Coeus cluster you must submit a job through the SLURM job scheduler. To do so, create an sbatch script. The scripts below are based on an example from the Velvet manual.
sub_velvet_ex1.sh:
#!/bin/bash
#SBATCH --job-name Velvet_example
#SBATCH --partition medium
#SBATCH --output=velvet_%j
module purge
module load Biosciences/Velvet/1.2.10
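# Hash the test reads with a hash length of 21, treating them as paired short reads
srun velveth /scratch/$USER/velvet_example 21 -shortPaired $VELVET_DATA/test_reads.fa
# Build the de Bruijn graph and write contigs using default settings
srun velvetg /scratch/$USER/velvet_example
# Rerun with a coverage cutoff of 5, read tracking and AMOS export enabled
srun velvetg /scratch/$USER/velvet_example -cov_cutoff 5 -read_trkg yes -amos_file yes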
Then submit the sbatch script:
> sbatch sub_velvet_ex1.sh
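Once the job has finished, a quick sanity check is to count the sequences in the contigs.fa file that velvetg writes to the output directory. The sketch below assumes the output directory used in the example script. The helper name count_contigs is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count FASTA records (header lines start with '>')
count_contigs() {
    grep -c '^>' "$1"
}

# Path assumes the output directory from the example script above
contigs="/scratch/$USER/velvet_example/contigs.fa"
if [ -f "$contigs" ]; then
    echo "contigs: $(count_contigs "$contigs")"
else
    echo "no contigs.fa yet; check the velvet_<jobid> output file for errors"
fi
```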
Velvet has been compiled with OpenMP support. Note that OpenMP allows the use of multiple CPUs on a single node, not the use of multiple nodes. See the example script below:
#!/bin/bash
#SBATCH --job-name Velvet_OPENMP_example
#SBATCH --nodes 1
#SBATCH --partition medium
#SBATCH --output=velvet_openmp_%j
module purge
module load Biosciences/Velvet/1.2.10
export OMP_THREAD_LIMIT=7
export OMP_NUM_THREADS=6
srun velveth /scratch/$USER/velvet_example 21 -shortPaired $VELVET_DATA/test_reads.fa
srun velvetg /scratch/$USER/velvet_example
srun velvetg /scratch/$USER/velvet_example -cov_cutoff 5 -read_trkg yes -amos_file yes
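Instead of hard-coding the thread counts, they can be derived from the CPUs that SLURM grants the task. The sketch below assumes, following the Velvet manual, that Velvet uses up to OMP_NUM_THREADS + 1 threads, capped by OMP_THREAD_LIMIT, which is why the script above pairs 7 with 6. It also assumes the job requests CPUs with `#SBATCH --cpus-per-task`, since SLURM_CPUS_PER_TASK is only set in that case; the example script above does not request it. The function name set_velvet_threads is hypothetical.

```shell
#!/bin/bash
# Derive the OpenMP settings from the CPUs SLURM granted to the task.
# Assumption (from the Velvet manual): Velvet uses up to OMP_NUM_THREADS + 1
# threads, capped by OMP_THREAD_LIMIT.
set_velvet_threads() {
    # Fall back to a single CPU when --cpus-per-task was not requested
    local cpus=${SLURM_CPUS_PER_TASK:-1}
    export OMP_THREAD_LIMIT=$cpus
    if [ "$cpus" -gt 1 ]; then
        export OMP_NUM_THREADS=$((cpus - 1))
    else
        export OMP_NUM_THREADS=1
    fi
}

set_velvet_threads
echo "OMP_THREAD_LIMIT=$OMP_THREAD_LIMIT OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

With `#SBATCH --cpus-per-task 7` this yields the same 7 and 6 as the hard-coded exports, and it adjusts automatically if the request changes.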