Post date: Feb 11, 2015 7:31:16 PM
I used ALLPATHS-LG to assemble the alfalfa genome. There were two steps, each submitted with qsub:
qsub prepareData.sh
qsub qsub_runassem_alfalfa.sh
Here are the assembly details from the second script (qsub_runassem_alfalfa.sh):
basedir="/labs/evolution/data/alfalfa/genome"
RunAllPathsLG \
    PRE=${basedir} \
    REFERENCE_NAME=Msativa \
    DATA_SUBDIR=DATA \
    RUN=RUN \
    SUBDIR=assem1feb15 \
    TARGETS=standard \
    HAPLOIDIFY=True \
    MIN_CONTIG=250 \
    THREADS=48 \
    OVERWRITE=True \
    FIX_LOCAL=False \
    | tee -a ${basedir}/$0.out
The results are in /labs/evolution/data/alfalfa/genome/Msativa/. Here is a summary (these numbers are on par with the L. melissa genomes, but not as good as the L. sierra genome):
------------------ AllPathsReport -> assembly_stats.report
1000 contig minimum size for reporting
100386 number of contigs
149.2 number of contigs per Mb
41319 number of scaffolds
341814662 total contig length
672807206 total scaffold length, with gaps
5.5 N50 contig size in kb
40 N50 scaffold size in kb
37 N50 scaffold size in kb, with gaps
61.41 number of scaffolds per Mb
1395 median size of gaps in scaffolds
128 median dev of gaps in scaffolds
41.37 % of bases in captured gaps
0.66 % of bases in negative gaps (after 5 devs)
13.36 % of ambiguous bases
11.05 ambiguities per 10,000 bases
------------------ LibCoverage -> library_coverage.report
LibCoverage table:
LEGEND
n_reads: number of reads in input
%_used: % of reads assembled
scov: sequence coverage
n_pairs: number of valid pairs assembled
pcov: physical coverage
type  lib_name       lib_stats         n_reads  %_used  scov      n_pairs   pcov
frag  Fragment       -22 +/- 30    364,168,368    58.6  51.7  118,217,102   85.0
jump  Jump3kb1       3011 +/- 195   85,075,706    21.4   4.5    2,438,963   28.5
jump  Jump3kb2       3006 +/- 200   84,442,370    21.5   4.5    2,446,003   28.5
jump  Jump5kb1       4960 +/- 277   76,553,550    15.0   2.8    1,228,158   24.4
jump  Jump5kb2       4957 +/- 273   76,183,580    15.1   2.8    1,221,116   24.3
jump  === total ===                322,255,206    18.4  14.5    7,334,240  105.7
------------------ Memory and CPU usage
64 available cpus
1009.9 GB of total available memory
562.9 GB of available disk space
213.53 hours of total elapsed time
148.63 hours of total per-module elapsed time
726.98 hours of total per-module user time
4.89 effective parallelization factor
331.96 GB memory usage peak
Wed Feb 11 06:12:18 2015 : ALLPATHS-LG Pipeline Finished.
Run directory: /labs/evolution/data/alfalfa/genome/Msativa/DATA/RUN
Log directory: /labs/evolution/data/alfalfa/genome/Msativa/make_log/DATA/RUN/assem1feb15/
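A few of the derived figures in the report can be re-computed from the raw counts as a sanity check. This is a minimal sketch, not part of the original run. It assumes that the per-Mb densities are taken over the total scaffold length with gaps, and that the effective parallelization factor is per-module user time divided by per-module elapsed time. Both are my inferences from the numbers, not something stated in the report. The helper names per_mb and ratio are hypothetical.

```shell
#!/bin/sh
# Values copied from the report above.
n_contigs=100386
n_scaffolds=41319
scaffold_len_with_gaps=672807206   # bp, "total scaffold length, with gaps"
module_elapsed_h=148.63            # hours of total per-module elapsed time
module_user_h=726.98               # hours of total per-module user time

# Hypothetical helper: count per Mb for a length given in bp (2 decimals).
per_mb() {
    awk -v n="$1" -v len="$2" 'BEGIN { printf "%.2f\n", n / (len / 1e6) }'
}

# Hypothetical helper: plain ratio of two numbers (2 decimals).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Sum of n_reads over the four jump libraries (integer arithmetic, exact).
jump_reads=$((85075706 + 84442370 + 76553550 + 76183580))

echo "contigs per Mb:         $(per_mb "$n_contigs" "$scaffold_len_with_gaps")"
echo "scaffolds per Mb:       $(per_mb "$n_scaffolds" "$scaffold_len_with_gaps")"
echo "parallelization factor: $(ratio "$module_user_h" "$module_elapsed_h")"
echo "jump reads total:       $jump_reads"
```

Under those assumptions the results agree, after rounding, with the reported 149.2 contigs per Mb, 61.41 scaffolds per Mb, and 4.89 parallelization factor, and the jump read total matches the 322,255,206 in the total row.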