Post date: Mar 30, 2016 11:5:48 PM
I am working in king:/uufs/chpc.utah.edu/common/home/u6000989/data/callosobruchus/genome_lucigen/
I modified the file information to specify the correct (inward) orientation of the jumping libraries. I also corrected the mean insert size for the fragment library. The new files look like this:
in_groups.csv
group_name, library_name, file_name
cmacFrag, Fragment, /uufs/chpc.utah.edu/common/home/u6000989/data/callosobruchus/genome_lucigen/fragment/AHGJVMBCXX/SeedBeetleFragment*.fastq
cmac3kb, Jump3kb, /uufs/chpc.utah.edu/common/home/u6000989/data/callosobruchus/genome_lucigen/R*3kb*.fastq
cmac8kb, Jump8kb, /uufs/chpc.utah.edu/common/home/u6000989/data/callosobruchus/genome_lucigen/R*_IJS9_mates_ICC5_8KB_S1_L001_*.fastq
cmac20kb, Jump20kb, /uufs/chpc.utah.edu/common/home/u6000989/data/callosobruchus/genome_lucigen/R*_IJS9_mates_ICC5_20KB_S2_L001_*.fastq
in_libs.csv
library_name, project_name, organism_name, type, paired, frag_size, frag_stddev, insert_size, insert_stddev, read_orientation, genomic_start, genomic_end
Fragment, cmacwgs, cmac, fragment, 1, 360, 100, , , inward , ,
Jump3kb, cmacwgs, cmac, jumping, 1, , , 3000, 500, inward , ,
Jump8kb, cmacwgs, cmac, jumping, 1, , , 8000, 500, inward , ,
Jump20kb, cmacwgs, cmac, jumping, 1, , , 20000, 500, inward , ,
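Before submitting the preparation job, it may be worth confirming that every library named in in_groups.csv is defined in in_libs.csv and that every file_name glob matches at least one file. The following is a minimal sketch, not part of ALLPATHS-LG; the function name check_inputs is hypothetical, and it assumes both files are comma-separated with a single header row, as shown above.

```bash
#!/bin/bash
# Pre-flight check for the ALLPATHS-LG input tables (illustrative sketch only).
# check_inputs is a hypothetical helper, not an ALLPATHS-LG command.
# Usage: check_inputs in_groups.csv in_libs.csv

check_inputs() {
    local groups="$1" libs="$2" status=0 lib pattern
    while IFS=$'\t' read -r lib pattern; do
        # Every library named in in_groups.csv must have a row in in_libs.csv.
        if ! awk -F',' -v lib="$lib" '
                NR > 1 { gsub(/^[ \t]+|[ \t]+$/, "", $1); if ($1 == lib) found = 1 }
                END { exit !found }' "$libs"; then
            echo "library $lib is not defined in $libs"
            status=1
        fi
        # The file_name glob must match at least one existing file.
        if ! compgen -G "$pattern" > /dev/null; then
            echo "no files match $pattern"
            status=1
        fi
    done < <(awk -F',' 'NR > 1 && NF >= 3 {
                 # Trim whitespace around the first three fields, then emit
                 # library_name and file_name separated by a tab.
                 for (i = 1; i <= 3; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
                 print $2 "\t" $3
             }' "$groups")
    return "$status"
}
```

The function prints one line per problem and returns a non-zero status if anything is missing, so it can be used to gate the sbatch submission.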
I prepared the data with PrepareAllPathsInputs.pl, using this submission script:
#!/bin/bash
#SBATCH --time=240:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --account=gompert-kp
#SBATCH --partition=gompert-kp
#SBATCH --job-name=allpaths-lg
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
ml gcc/4.9.2 allpaths-lg
## run job
cd /uufs/chpc.utah.edu/common/home/u6000989/data/callosobruchus/genome_lucigen/
PrepareAllPathsInputs.pl \
DATA_DIR=/uufs/chpc.utah.edu/common/home/u6000989/data/callosobruchus/genome_lucigen/Cmac/DATA \
PLOIDY=2 HOSTS=30
This worked much better. The insert sizes for the jumping libraries now match expectations, and about 28% of the fragment read pairs overlap (see the overlap column below), which could be enough. Here are the numbers (from slurm-1111107.out):
lib_name lib_ID sep dev n_reads
-------- ------ --- --- -------
Fragment 0 58 100 315228010
lib_name lib_ID sep dev n_reads
-------- ------ --- --- -------
Jump3kb 0 2750 500 22246424
Jump8kb 1 7752 500 12035108
lib_name lib_ID sep dev n_reads
-------- ------ --- --- -------
Jump20kb 0 19776 500 9331270
===================================================================================================================================
prefix_out, n_groups, n_libraries, reads_in, fraction, reads_out, coverage, phys_cov, overlap, min_mean_max_len, mean_Qs
frag_reads_orig, 1, 1, 315228010, 100.0 %, 315228010, -, -, 28 %, [ 151 151 151], (35 32)
jump_reads_orig, 2, 2, 34281532, 100.0 %, 34281532, -, -, , [ 26 125 223], (36 33)
long_jump_reads_orig, 1, 1, 9331270, 100.0 %, 9331270, -, -, , [ 26 112 223], (36 35)
===================================================================================================================================
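The fragment library's sep of 58 is consistent with the frag_size of 360 in in_libs.csv and the 151 bp read length reported above, if sep is taken to be the gap between the inner ends of the two reads (an assumption about how the column is defined). A minimal check of that arithmetic, with frag_sep as a hypothetical helper name:

```bash
# Assumed definition: sep = frag_size - 2 * read_len
# (the gap between the inner ends of a read pair; negative means overlap).
frag_sep() {
    local frag_size="$1" read_len="$2"
    echo $(( frag_size - 2 * read_len ))
}

frag_sep 360 151   # prints 58, matching the sep column for Fragment
```

With dev = 100, a pair overlaps when its fragment is shorter than 302 bp, which is 0.58 standard deviations below the mean. Under a normal approximation that is roughly 28% of pairs, in line with the reported overlap.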
I started the assembly with FIX_LOCAL = True (this failed with Lycaeides, but is worth trying again). The submission script is qsub_runassem_cmac.sh, with these contents:
#!/bin/bash
#SBATCH --time=480:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --account=gompert-kp
#SBATCH --partition=gompert-kp
#SBATCH --job-name=allpaths-lg
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
ml gcc/4.9.2 allpaths-lg
## run job
cd /uufs/chpc.utah.edu/common/home/u6000989/data/callosobruchus/genome_lucigen/
basedir="/uufs/chpc.utah.edu/common/home/u6000989/data/callosobruchus/genome_lucigen"
RunAllPathsLG \
 PRE=${basedir} \
 REFERENCE_NAME=Cmac \
 DATA_SUBDIR=DATA \
 RUN=RUN \
 SUBDIR=assem30March16 \
 TARGETS=standard \
 HAPLOIDIFY=False \
 MIN_CONTIG=250 \
 THREADS=32 \
 OVERWRITE=True \
 FIX_LOCAL=True \
 | tee -a ${basedir}/$0.out
Summary of results:
Libraries statistics tables:
Table 1: library names, number of pairs (N), original (L0) and new sizes (L)
--------------------------------------------------------------------------
id library name num pairs N orig size L0 new size L
--- --------------------- ------------ ----------------- -----------------
0 Jump3kb 1601815 3303 +/- 768 3924 +/- 1284
1 Jump8kb 1174813 4587 +/- 1537 6464 +/- 2496
tot total 2776628
--------------------------------------------------------------------------
Table 2: fraction of reads in each length interval
---------------------------------------------------------------------------
id <L> L < 0 0-500 500-1k 1k-2k 2k-4k 4k-8k 8k-16k >16k
--- ----- ------- ------- ------- ------- ------- ------- ------- -------
0 3924 1.5% 2.3% 51.7% 44.1% 0.3%
1 6464 0.8% 0.6% 0.6% 1.6% 11.7% 58.2% 26.3%
---------------------------------------------------------------------------
Table 3: number of bridging links over a specific gap size
--------------------------------------------------------------------
id <L> <= 0 0 1k 2k 3k 4k 6k 8k 12k 16k
--- ----- ---- ----- ----- ----- ----- ----- ----- ----- ----- -----
0 3924 10 7 5 3 1
1 6464 1% 13 11 9 7 5 2 1
tot 23 18 14 10 6 2 1
--------------------------------------------------------------------
------------------ ErrorCorrectJump -> long_jump_reads_ec.fastb
6.87 % of jump reads pairs that are error corrected
------------------ AllPathsReport -> assembly_stats.report
1000 contig minimum size for reporting
203403 number of contigs
216.4 number of contigs per Mb
119214 number of scaffolds
673766454 total contig length
939793913 total scaffold length, with gaps
5.1 N50 contig size in kb
28 N50 scaffold size in kb
35 N50 scaffold size in kb, with gaps
126.85 number of scaffolds per Mb
1935 median size of gaps in scaffolds
871 median dev of gaps in scaffolds
27.64 % of bases in captured gaps
0.93 % of bases in negative gaps (after 5 devs)
85.94 %% of ambiguous bases
6.34 ambiguities per 10,000 bases
------------------ LibCoverage -> library_coverage.report
LibCoverage table:
LEGEND
n_reads: number of reads in input
%_used: % of reads assembled
scov: sequence coverage
n_pairs: number of valid pairs assembled
pcov: physical coverage
type lib_name lib_stats n_reads %_used scov n_pairs pcov
frag Fragment 36 +/- 100 315,228,010 54.5 37.5 102,442,113 56.0
jump Jump3kb 3024 +/- 768 22,246,424 28.2 1.1 1,466,793 12.6
jump Jump8kb 4354 +/- 1537 12,035,108 30.3 0.7 880,473 11.4
jump === total === 34,281,532 28.9 1.8 2,347,266 24.0
long_jump Jump20kb 19776 +/- 500 9,331,270 0.2 0.0 399 0.0
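As a sanity check on the assembly_stats.report figures above, the per-Mb densities appear to be computed relative to the total scaffold length with gaps (939793913 bp). That is an inference from the numbers, not something the report states. A small sketch, with per_mb as a hypothetical helper name:

```bash
# Density per Mb, using counts and lengths copied from assembly_stats.report.
# Assumes the denominator is the total scaffold length with gaps.
per_mb() {
    local count="$1" total_bp="$2"
    awk -v n="$count" -v bp="$total_bp" 'BEGIN { printf "%.2f\n", n / (bp / 1e6) }'
}

per_mb 203403 939793913   # contigs per Mb
per_mb 119214 939793913   # scaffolds per Mb
```

The two calls give 216.43 and 126.85, which agree with the reported 216.4 contigs per Mb and 126.85 scaffolds per Mb.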