Post date: Sep 18, 2014 4:24:52 PM
The Sierra Nevada genome assembly stalled in the FixLocal module (no CPU usage, but no evidence of waiting on I/O either). According to the ALLPATHS-LG FAQ, this module is not required, so I ran the assembly without it by setting FIX_LOCAL=False. This isn't necessarily ideal, but I don't have another way around the problem for now.
The assembly (minus FIX_LOCAL) finished and is in /labs/evolution/data/lycaeides/whole_genomes/Lsierra/DATA/RUN/ASSEMBLIES/assem15sept14/.
Things look relatively good. The scaffold N50 for this assembly was 198 kb (171 kb excluding Ns) versus 62 kb for the L. melissa genome (44 kb excluding Ns). The total scaffold length is also slightly larger (361 Mb vs. 348 Mb). Perhaps most interestingly, we used a much greater proportion and total number of the jumping library reads. The Sierra Nevada genome assembly used 103 million fragment read pairs (78.9% of the fragment reads we had) and 10.8 million 3 kb read pairs (31.9% of what we had). In contrast, we used only 47.4% of the fragment reads for L. melissa (though we had more total reads, such that the total number used is similar), and only 7.5% of the 3 kb reads (3.4 million). The L. melissa genome included 5 and 10 kb reads too, but even if you add these together we didn't use as many of them as we used 3 kb reads this time. So, it looks like we used more of our data (both in total and proportionally) and ended up with a better assembly.
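As a rough cross-check of the figures discussed here, the short Python sketch below converts the raw base and pair counts from the AllPathsReport and LibCoverage sections of the report into megabases and millions. The variable and function names are illustrative only and are not part of ALLPATHS-LG.

```python
# Values copied from the AllPathsReport and LibCoverage sections of the report.
total_scaffold_bp = 361_180_183   # total scaffold length, with gaps
total_contig_bp = 267_604_627     # total contig length
frag_pairs_used = 103_378_948     # valid fragment pairs assembled (n_pairs)
jump_pairs_used = 10_767_667      # valid Jump3kb pairs assembled (n_pairs)


def to_mb(bases: int) -> float:
    """Convert a base count to megabases (1 Mb = 1,000,000 bases)."""
    return bases / 1_000_000


scaffold_mb = to_mb(total_scaffold_bp)
contig_mb = to_mb(total_contig_bp)

print(f"total scaffold length: {scaffold_mb:.1f} Mb")
print(f"total contig length:   {contig_mb:.1f} Mb")
print(f"fragment pairs used:   {frag_pairs_used / 1e6:.1f} million")
print(f"jump pairs used:       {jump_pairs_used / 1e6:.1f} million")
```

Note that the n_pairs column in the LibCoverage table counts valid pairs assembled, not individual reads.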
Here is a summary:
------------------ FindErrors -> frag_reads_edit.fastb
279766696 total number of original fragment reads
101.0 mean length of original fragment reads in bases
37.4 % gc content of fragment reads
0.1 % of bases pre-corrected
392404533 estimated genome size in bases
40.0 % genome estimated to be repetitive (at K=25 scale)
54 estimated genome coverage by fragment reads
0.21 estimated standard deviation of sequencing bias (at K=25 scale)
88.4 % of bases confirmed in cycle 0
0.26 % of bases corrected in cycle 0
0.01 % of bases with conflicting corrections in cycle 0
88.8 % of bases confirmed in cycle 1
0.12 % of bases corrected in cycle 1
0.01 % of bases with conflicting corrections in cycle 1
------------------ CleanCorrectedReads -> frag_reads_corr.25mer.kspec
1.1 % of reads removed because of low frequency kmers
------------------ FillFragments -> filled_reads.fastb
93.3 % of fragment pairs that were filled
------------------ SamplePairedReadStats -> jump_reads_filt.outies
Paired Read Separation Stats:
Lib      OrigSep  NewSep  NewDev  3sigma%  %NonJumps  %ReadsAlgnd
Jump3kb  3798     2571    378     93       0          28
------------------ ErrorCorrectJump -> jump_reads_ec.fastb
34.19 % of jump reads pairs that are error corrected
------------------ SamplePairedReadDistributions -> jump_reads_ec.distribs
Libraries statistics tables:
Table 1: library names, number of pairs (N), original (L0) and new sizes (L)
--------------------------------------------------------------------------
id  library name          num pairs N  orig size L0      new size L
--- --------------------- ------------ ----------------- -----------------
0   Jump3kb               14514418     2773 +/- 378      2850 +/- 489
--------------------------------------------------------------------------
Table 2: fraction of reads in each length interval
---------------------------------------------------------------------------
id <L> L < 0 0-500 500-1k 1k-2k 2k-4k 4k-8k 8k-16k >16k
--- ----- ------- ------- ------- ------- ------- ------- ------- -------
0 2850 0.1% 0.4% 3.6% 95.4% 0.3%
---------------------------------------------------------------------------
Table 3: number of bridging links over a specific gap size
--------------------------------------------------------------------
id <L> <= 0 0 1k 2k 3k 4k 6k 8k 12k 16k
--- ----- ---- ----- ----- ----- ----- ----- ----- ----- ----- -----
0 2850 126 82 39 5
tot 126 82 39 5
--------------------------------------------------------------------
------------------ AllPathsReport -> assembly_stats.report
1000 contig minimum size for reporting
58959 number of contigs
163.2 number of contigs per Mb
7123 number of scaffolds
267604627 total contig length
361180183 total scaffold length, with gaps
7.6 N50 contig size in kb
171 N50 scaffold size in kb
198 N50 scaffold size in kb, with gaps
19.72 number of scaffolds per Mb
833 median size of gaps in scaffolds
141 median dev of gaps in scaffolds
22.15 % of bases in captured gaps
0.32 % of bases in negative gaps (after 5 devs)
13.04 %% of ambiguous bases
11.84 ambiguities per 10,000 bases
------------------ LibCoverage -> library_coverage.report
LibCoverage table:
LEGEND
n_reads: number of reads in input
%_used: % of reads assembled
scov: sequence coverage
n_pairs: number of valid pairs assembled
pcov: physical coverage
type  lib_name  lib_stats     n_reads      %_used  scov  n_pairs      pcov
frag  Fragment  -23 +/- 30    279,768,018  78.9    78.1  103,378,948  79.9
jump  Jump3kb   2571 +/- 378  145,526,672  31.9    16.4  10,767,667   126.4
------------------ Memory and CPU usage
64 available cpus
1009.9 GB of total available memory
2453.7 GB of available disk space
73.39 hours of total elapsed time
30.44 hours of total per-module elapsed time
445.14 hours of total per-module user time
14.62 effective parallelization factor
149.17 GB memory usage peak
--------------------------------------------------------------------------------
Thu Sep 18 09:37:30 2014 run on lycaeides-k0625, pid=3166 [Sep 12 2013 13:59:01 R47547 ]
AllPathsReport PRE=/labs/evolution/data/lycaeides/whole_genomes \
DATA=Lsierra/DATA RUN=RUN SUBDIR=assem15sept14 ASSEMBLY=final \
MM=True _MM_INTERVAL=10 _MM_SUMMARY=False \
_MM_OUT=/labs/evolution/data/lycaeides/whole_genomes/Lsierra/DA \
TA/RUN/ASSEMBLIES/assem15sept14/makeinfo/assembly_stats.report. \
mm.AllPathsReport
--------------------------------------------------------------------------------
Redirecting standard output to the following files:
/labs/evolution/data/lycaeides/whole_genomes/Lsierra/DATA/RUN/ASSEMBLIES/assem15sept14/assembly_stats.report
/labs/evolution/data/lycaeides/whole_genomes/Lsierra/DATA/RUN/ASSEMBLIES/assem15sept14/makeinfo/assembly_stats.report.out.AllPathsReport
/labs/evolution/data/lycaeides/whole_genomes/Lsierra/make_log/DATA/RUN/assem15sept14/2014-09-18T09:18:04/AllPathsReport.out
PERFSTAT: contig minimum size for reporting [ap_report_min_contig] = 1000
PERFSTAT: number of contigs [n_contigs] = 58959
PERFSTAT: number of contigs per Mb [contigs_per_Mb] = 163.2
PERFSTAT: number of scaffolds [n_scaffolds] = 7123
PERFSTAT: total contig length [contig_length] = 267604627
PERFSTAT: total scaffold length, with gaps [scaff_length_gap] = 361180183
PERFSTAT: N50 contig size in kb [N50_contig] = 7.6
PERFSTAT: N50 scaffold size in kb [N50_scaffold] = 171
PERFSTAT: N50 scaffold size in kb, with gaps [N50_scaff_gap] = 198
PERFSTAT: number of scaffolds per Mb [scaff_per_Mb] = 19.72
PERFSTAT: median size of gaps in scaffolds [median_gap] = 833
PERFSTAT: median dev of gaps in scaffolds [median_gap_dev] = 141
PERFSTAT: % of bases in captured gaps [frac_captured_gaps] = 22.15
PERFSTAT: % of bases in negative gaps (after 5 devs) [frac_negative_gaps] = 0.32
PERFSTAT: %% of ambiguous bases [amb_base_frac] = 13.04
PERFSTAT: ambiguities per 10,000 bases [ambiguity_frac] = 11.84
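To compare these statistics across assemblies (for example against the L. melissa run), the PERFSTAT lines can be collected into a dictionary keyed by the bracketed short name. The following is a minimal Python sketch that assumes every PERFSTAT line follows the "description [key] = value" layout seen above; parse_perfstat and PERFSTAT_RE are hypothetical names, not part of ALLPATHS-LG.

```python
import re

# Matches lines such as:
#   PERFSTAT: number of contigs [n_contigs] = 58959
PERFSTAT_RE = re.compile(
    r"^PERFSTAT:\s*(?P<desc>.+?)\s*\[(?P<key>\w+)\]\s*=\s*(?P<value>\S+)\s*$"
)


def parse_perfstat(lines):
    """Return {key: float value} for every PERFSTAT line in `lines`."""
    stats = {}
    for line in lines:
        match = PERFSTAT_RE.match(line.strip())
        if match:
            stats[match.group("key")] = float(match.group("value"))
    return stats


# A few lines copied from the report, plus one non-PERFSTAT line that is skipped.
sample = [
    "PERFSTAT: number of contigs [n_contigs] = 58959",
    "PERFSTAT: N50 scaffold size in kb, with gaps [N50_scaff_gap] = 198",
    "PERFSTAT: % of bases in captured gaps [frac_captured_gaps] = 22.15",
    "Redirecting standard output to the following files:",
]
stats = parse_perfstat(sample)
print(stats)
```

In practice the lines would be read from a saved copy of the AllPathsReport output, and two such dictionaries could then be compared key by key.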