Nanopore assembly of plasmid sequences: a twitter story
A nice Twitter story on plasmid assembly using single-molecule Nanopore reads
Stefano Campanaro @campanarostef; Maria Silvia Morlino @morlino_silvia
Department of Biology, University of Padova (Italy), June 2nd 2022
In May 2022, Maria Silvia Morlino (@morlino_silvia) made a first attempt in our lab at sequencing some plasmids on MinION and assembling them with Flye (Kolmogorov M et al., 2019). We realized that the assemblies were larger than expected. On June 1st 2022 I tweeted the following message to collect some information on the assembly of plasmids from long Nanopore reads:
“Did someone try to assemble #plasmid sequences from #Nanopore reads only? Any suggestion on the #assembly #software? With #Flye we obtain assemblies larger than expected
@nanopore” (https://twitter.com/campanarostef/status/1531878718586736641)
The message was viewed more than 14,000 times in a couple of days, and I received many useful suggestions on how to solve this issue. Since this seemed to be a very hot topic, I decided, just for fun, to collect all the information and summarize it in this document. I marked my comments in bold blue.
Answers (in order of appearance, limited to those posted by June 2nd 2022)
6 h: Callum @CallumJCParr
replying to @campanarostef and @nanopore
Suggested asking @gringene_bio for information.
6 h: David Eccles @gringene_bio
I have my own protocol for this using Canu for assembly, pre-filtering for the most accurate reads:
https://www.protocols.io/view/plasmid-sequence-analysis-from-long-reads-36wgq4n5yvk5/v7
This is a very detailed protocol covering all steps of the procedure, from DNA extraction to the final assembly.
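The read pre-filtering step of this protocol can be illustrated with a short Python sketch. It is not David Eccles' actual procedure (see the protocols.io link for that). It simply ranks FASTQ records by mean quality, computed in error-probability space and assuming Phred+33 encoding, and keeps the best fraction. The function names and the `keep_fraction` parameter are hypothetical.

```python
import math
from typing import Iterator, List, Tuple

Read = Tuple[str, str, str]  # (read name, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield (name, sequence, quality) from plain 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header[1:], seq, qual  # drop the leading '@'


def mean_quality(qual: str) -> float:
    """Mean Phred quality of a read, assuming Phred+33 encoding.

    Error probabilities are averaged (not raw Q values), so a few very good
    bases cannot hide many bad ones.
    """
    probs = [10 ** (-(ord(c) - 33) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def keep_most_accurate(reads: List[Read], keep_fraction: float = 0.5) -> List[Read]:
    """Hypothetical helper: keep the top `keep_fraction` of reads by mean quality."""
    ranked = sorted(reads, key=lambda r: mean_quality(r[2]), reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Usage: three reads with Q40 ('I'), Q2 ('#') and Q20 ('5') bases.
fastq = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "####", "@r3", "ACGT", "+", "5555"]
best = keep_most_accurate(list(parse_fastq(fastq)), keep_fraction=0.5)
print([name for name, _, _ in best])  # ['r1']
```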
8 h: Nick Vereecke @methenickname
replying to @rpetit3, @campanarostef and @nanopore
Manual trimming is in many/most cases still required for plasmid sequences. So automated pipelines are still rather difficult
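One common target of manual trimming is the redundant sequence at the ends of a circular contig. When a circular plasmid is reported as a linear contig, its end may repeat its beginning. The following minimal Python sketch assumes an error-free overlap and a hypothetical `min_overlap` threshold. Real Nanopore contigs usually carry errors at the junction, so in practice an alignment-based comparison would be needed.

```python
def trim_circular_overlap(contig: str, min_overlap: int = 20) -> str:
    """Remove the longest suffix that exactly equals a prefix of the contig.

    If a linearised circular contig ends with a copy of its own beginning,
    that copy is redundant and is trimmed. Overlaps shorter than
    `min_overlap` are ignored as probably coincidental, and overlaps longer
    than half the contig are not considered (conservative trimming).
    """
    n = len(contig)
    # Try the longest candidate overlap first.
    for k in range(n // 2, min_overlap - 1, -1):
        if contig[:k] == contig[n - k:]:
            return contig[: n - k]
    return contig


plasmid = "ACGTTGCAAGGCTTACCGGA"
print(trim_circular_overlap(plasmid + plasmid[:8], min_overlap=5) == plasmid)  # True
```

The search is quadratic in the worst case, which is acceptable for plasmid-sized contigs.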
14 h: Alessandro Garritano @Alegarritano
replying to @campanarostef and @nanopore
Tagged @marwan_majzoub for information: “check this out”
15 h: Sumeet Tiwari @skt_genomics
replying to @campanarostef and @nanopore
Maybe this will work
B-assembler: a circular bacterial genome assembler
A very interesting paper; I did not know this tool. https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-022-08577-7
20 h: Chris Fields @cjfields
replying to @campanarostef, @mike_schatz and @nanopore
We can vouch for this, it does help. Tho some low-freq ambiguities appeared possibly biologically relevant (eg seq depth-related ambiguity around possible viral/plasmid integration sites)
20 h: Raúl Llera @xrllera
replying to @campanarostef and @nanopore
Unicycler would be my first choice
20 h: Raúl Llera @xrllera
And do not overkill! Extremely high coverage could be problematic
21 h: Belén Prado @BelenPradoV
replying to @campanarostef and @nanopore
@liss_salinast
Suggested asking Liseth Salinas for advice.
21 h: Keith Robison @OmicsOmicsBlog
replying to @campanarostef and @nanopore
There have been some posts where with long Nanopore reads you will sometimes see replication intermediates that are multimers of the plasmid
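When a replication intermediate such as a dimer is assembled as a single contig, the contig is (approximately) the plasmid unit repeated several times. For the exact case, a toy Python check can use a classic string property: the smallest period of a string s is the first position, after 0, at which s reappears inside s + s. If that period is shorter than s, then s is an exact tandem repeat of its first `period` characters. Real assemblies contain sequencing errors, so this exact test is only a sketch. Comparing the contig length with the expected plasmid size (the hypothetical `copy_number_estimate` helper) is a more tolerant first check.

```python
from typing import Tuple


def repeat_unit(contig: str) -> Tuple[str, int]:
    """Return (unit, copies) if `contig` is an exact tandem repeat of a shorter unit.

    The smallest non-zero rotation that maps the string onto itself is its
    period; that period always divides the length. Non-repetitive contigs
    are returned unchanged with copies = 1.
    """
    n = len(contig)
    if n == 0:
        return contig, 0
    period = (contig + contig).find(contig, 1)
    if period < n:
        return contig[:period], n // period
    return contig, 1


def copy_number_estimate(contig_len: int, expected_len: int) -> float:
    """Ratio of assembled to expected plasmid length (about 2.0 suggests a dimer)."""
    return contig_len / expected_len


print(repeat_unit("ACGTTGCAAGG" * 2))  # ('ACGTTGCAAGG', 2)
```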
22 h: Francesco Emiliani @fe_emiliani
replying to @campanarostef and @nanopore
We solve that issue in our pipeline! We have a little tool called DupScoop and also turns out that the ends of the assemblies do not polish well so it helps to rotate it 50% and do a last round of polishing
https://github.com/mckennalab/Circuitseq
Another interesting tool I did not know: Circuit-seq is a pipeline to assemble and analyze plasmids from Nanopore long-read sequencing.
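The tip about rotating the assembly by 50% moves the arbitrary start/end junction of a circular contig to the middle of the sequence. A final polishing round can then correct the region that previously sat at the ends, presumably because reads are clipped where the circle was linearised. The following minimal Python sketch covers only the rotation step. Polishing itself would be run with a tool such as Medaka or Racon. The function name is hypothetical, and this is not Circuit-seq's code.

```python
def rotate_circular(contig: str, fraction: float = 0.5) -> str:
    """Rotate a circular contig so that position int(len * fraction) becomes the new start.

    The circle is unchanged; only the point where it is linearised moves,
    e.g. to the middle of the old sequence for fraction = 0.5.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    shift = int(len(contig) * fraction)
    return contig[shift:] + contig[:shift]


print(rotate_circular("AAAACCCC"))  # CCCCAAAA
```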
23 h: George Bouras @GB13Faithless
replying to @campanarostef and @nanopore
Trycycler
23 h: Connor Brown @env_biochem
replying to @campanarostef and @nanopore
canu has always worked well for us. Some parameter optimization may be required
23 h: Michael Schatz @mike_schatz
replying to @campanarostef and @nanopore
make sure to downsample to 50x coverage or so. Assemblers tend to get really confused if you have excess coverage
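Downsampling to a fixed depth only requires the expected plasmid size and the read lengths, since mean depth = total sequenced bases / genome size. Tools such as rasusa do this properly. The Python sketch below only illustrates the idea: it shuffles the reads and keeps them until the target number of bases (target depth × plasmid size) is reached. The 50x default follows Michael Schatz's suggestion. The function names, parameters and fixed random seed are illustrative assumptions.

```python
import random
from typing import List, Sequence


def coverage(read_lengths: Sequence[int], genome_size: int) -> float:
    """Mean depth = total sequenced bases / genome (plasmid) size."""
    return sum(read_lengths) / genome_size


def downsample(reads: List[str], genome_size: int, target_depth: float = 50.0,
               seed: int = 42) -> List[str]:
    """Randomly keep reads until about `target_depth` x `genome_size` bases are collected."""
    target_bases = target_depth * genome_size
    shuffled = reads[:]  # do not modify the caller's list
    random.Random(seed).shuffle(shuffled)
    kept, total = [], 0
    for read in shuffled:
        if total >= target_bases:
            break
        kept.append(read)
        total += len(read)
    return kept


# 1000 reads of 5 kb on a 5 kb plasmid = 1000x; reduce to 50x.
reads = ["A" * 5000] * 1000
print(coverage([len(r) for r in reads], 5000))  # 1000.0
print(len(downsample(reads, genome_size=5000)))  # 50
```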
23 h: Robert A. Petit III, PhD @rpetit3
replying to @campanarostef and @nanopore
I have Dragonflye (https://github.com/rpetit3/dragonflye) for Nanopore sequences. It allows assembly and polishing with multiple tools, with a similar experience to using Shovill.
Shovill (https://github.com/tseemann/shovill) is a pipeline which uses SPAdes at its core, but alters the steps before and after the primary assembly step to get similar results in less time.
The other tool, Dragonflye, is a pipeline for easily assembling Oxford Nanopore reads.
But if you have the time for the manual steps, I also suggest using Trycycler for a better overall assembly
Main steps of Dragonflye:
- Estimate genome size and read length from reads (unless --gsize provided) (kmc)
- Filter reads by length (default --minreadlength 1000) (Nanoq)
- Reduce FASTQ files to a sensible depth (default --depth 150) (rasusa)
- Remove adapters (requires --trim be given) (Porechop)
- Assemble with Flye, Miniasm, or Raven
- Polish assembly with Racon and/or Medaka
- Polish assembly with short reads via Polypolish and/or Pilon
- Remove contigs that are too short, too low coverage, or pure homopolymers
- Produce final FASTA with nicer names and parsable annotations
- Output parsable assembly statistics (assembly-scan)
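The contig clean-up step in the list above (removing contigs that are too short, too low coverage, or pure homopolymers) can be expressed as a simple filter. In this Python sketch, the caller supplies per-contig depth, for example from read mapping. The `min_len` and `min_depth` thresholds are hypothetical values, not Dragonflye's defaults.

```python
from typing import Dict


def is_homopolymer(seq: str) -> bool:
    """True if the sequence consists of a single repeated base (e.g. 'AAAA')."""
    return len(set(seq.upper())) == 1


def filter_contigs(contigs: Dict[str, str], depth: Dict[str, float],
                   min_len: int = 500, min_depth: float = 2.0) -> Dict[str, str]:
    """Keep contigs that are long enough, sufficiently covered and not pure homopolymers."""
    return {
        name: seq
        for name, seq in contigs.items()
        if len(seq) >= min_len
        and depth.get(name, 0.0) >= min_depth  # missing depth counts as uncovered
        and not is_homopolymer(seq)
    }
```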
June 1st: Rauf Salamzade @SalamMicrobes
Trycycler by Ryan Wick offers a great framework to use multiple long-read assemblers, including Flye, and then consolidate allowing for detection of false duplications which might be causing your larger than expected plasmid sizes
https://github.com/rrwick/Trycycler/wiki/Clustering-contigs
A very interesting procedure from Ryan Wick: the goal of this step is to cluster the contigs of your input assemblies into per-replicon groups. It also serves to exclude any spurious, incomplete or badly misassembled contigs.
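The clustering idea can be illustrated with a small, self-contained Python sketch. Contigs from several assemblies are compared by the Jaccard similarity of their canonical k-mer sets. That similarity is converted with the Mash distance formula, D = -ln(2J / (1 + J)) / k. The contigs are then grouped by complete-linkage agglomerative clustering with a distance threshold. Trycycler itself relies on Mash sketches and its own defaults. The exact k-mer sets, the `k` and `threshold` values, and the randomly generated test plasmids are simplifying assumptions for illustration only.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Set

_COMP = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> Set[str]:
    """Canonical k-mers: the lexicographic minimum of each k-mer and its reverse complement."""
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        kmers.add(min(kmer, kmer.translate(_COMP)[::-1]))
    return kmers


def mash_like_distance(a: Set[str], b: Set[str], k: int) -> float:
    """Mash distance formula applied to the exact (not sketched) Jaccard similarity."""
    union = len(a | b)
    jaccard = len(a & b) / union if union else 0.0
    if jaccard == 0.0:
        return 1.0
    return -math.log(2 * jaccard / (1 + jaccard)) / k


def cluster_contigs(contigs: Dict[str, str], k: int = 15,
                    threshold: float = 0.01) -> List[List[str]]:
    """Complete linkage: two clusters merge only if all their pairwise distances are <= threshold."""
    names = list(contigs)
    kmers = {n: canonical_kmers(contigs[n], k) for n in names}
    dist = {}
    for x, y in combinations(names, 2):
        dist[(x, y)] = dist[(y, x)] = mash_like_distance(kmers[x], kmers[y], k)

    clusters = [[n] for n in names]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Cluster-to-cluster distance is the largest pairwise contig distance.
            d = max(dist[(x, y)] for x in clusters[i] for y in clusters[j])
            if d <= threshold and (best is None or d < best[0]):
                best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best  # i < j, so deleting j leaves index i valid
        clusters[i].extend(clusters[j])
        del clusters[j]


# Toy data: one plasmid seen by three "assemblies" (different start, other strand) plus another plasmid.
rng = random.Random(7)
p1 = "".join(rng.choice("ACGT") for _ in range(3000))
p2 = "".join(rng.choice("ACGT") for _ in range(3000))
contigs = {
    "A_p1": p1,
    "B_p1": p1[1200:] + p1[:1200],
    "C_p1": p1.translate(_COMP)[::-1],
    "A_p2": p2,
}
groups = cluster_contigs(contigs)
print(groups)  # the three p1 contigs form one group; A_p2 stays alone
```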
June 1st: Adelme Bazin @axbazin
replying to @campanarostef and @nanopore
I've been using https://github.com/epi2me-labs/wf-clone-validation which uses multiple runs of canu+ trycycler, I've not seen larger than expected sizes with this approach but sometimes the assembled plasmids are missing bits
This repository contains a Nextflow workflow for the de novo assembly of plasmid sequences from Oxford Nanopore data.
June 1st: Christian Gallardo @MrGalloLA
Can second the comment on unicycler. For our HIV plasmids, which contain an inverted repeat at each end of insert (ie LTR), aggressive size filtering prior to assembly is critical (though this depends heavily on the library prep used)
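At its simplest, size filtering before assembly means discarding reads outside a length window. One possible reading of "aggressive" size filtering for plasmids assumes a library prep that cuts each plasmid molecule once, so that full-length reads are close to the plasmid size. Under that assumption, the filter keeps only reads within a tolerance of the expected size. The following sketch is hypothetical. The `tolerance` value and the window logic are assumptions, and, as the tweet notes, the right filter depends heavily on the library prep.

```python
from typing import List


def size_window_filter(reads: List[str], expected_len: int,
                       tolerance: float = 0.1) -> List[str]:
    """Keep reads whose length is within +/- `tolerance` of the expected plasmid length.

    Assumes one cut per plasmid molecule: partial reads and reads spanning two
    or more copies (concatemers) fall outside the window and are dropped.
    """
    lo = expected_len * (1 - tolerance)
    hi = expected_len * (1 + tolerance)
    return [r for r in reads if lo <= len(r) <= hi]
```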
June 1st: Jan Gawor @gaworj
replying to @campanarostef and @nanopore
You can also try Unicycler in ont-only mode. Another option is Raven. Flye assembler tends to produce concatemers for plasmids or mito genomes.
Final comments
What I understood from this story is that, when you perform an assembly using Nanopore reads, you must be very careful, in particular when you are dealing with plasmids or viruses. There seems to be agreement on some crucial aspects. The first is that excessive coverage can degrade the assembly quality, so it is better to downsample the reads to 50-150x coverage. Another important aspect is to clean the assembly very carefully with multiple tools, probably because many software tools are not optimized for plasmid assembly. Replication intermediates (multimers of the plasmid) and false duplications are also problematic. Finally, many colleagues suggested using Trycycler in combination with other tools.
I would really like to thank all the colleagues who helped with comments and suggestions to solve this Nanopore assembly issue.
Acknowledgments (to the tweeters I was able to identify)
- Adelme Bazin @axbazin https://scholar.google.it/citations?hl=en&user=OgXjLxsAAAAJ
- Alessandro Garritano @Alegarritano https://scholar.google.it/citations?hl=en&user=conXSckAAAAJ
- Belén Prado @BelenPradoV https://scholar.google.it/citations?hl=en&user=YrwLSOsAAAAJ
- Chris Fields @cjfields https://twitter.com/cjfields
- Christian Gallardo @MrGalloLA https://scholar.google.it/citations?hl=en&user=sl5K0CsAAAAJ
- Connor Brown @env_biochem https://scholar.google.it/citations?hl=en&user=-9C-SUMAAAAJ
- David Eccles @gringene_bio https://scholar.google.it/citations?hl=en&user=qDIvWKgAAAAJ
- Francesco Emiliani @fe_emiliani https://scholar.google.it/citations?hl=en&user=ja6145oAAAAJ
- George Bouras @GB13Faithless https://scholar.google.it/citations?hl=en&user=cjk3fbkAAAAJ
- Jan Gawor @gaworj https://twitter.com/gaworj
- Keith Robison @OmicsOmicsBlog http://omicsomics.blogspot.com/
- Michael Schatz @mike_schatz https://scholar.google.it/citations?hl=en&user=rcs6IKwAAAAJ
- Nick Vereecke @methenickname https://twitter.com/methenickname
- Sumeet Tiwari @skt_genomics https://twitter.com/skt_genomics
- Raúl Llera @xrllera https://scholar.google.it/citations?hl=en&user=jBis0t4AAAAJ
- Rauf Salamzade @SalamMicrobes https://scholar.google.it/citations?hl=en&user=OBPpZq4AAAAJ
- Robert A. Petit III, PhD @rpetit3 https://scholar.google.it/citations?hl=en&user=sBSRYTkAAAAJ
References
Emiliani FE, Hsu I, McKenna A. Circuit-seq: Circular reconstruction of cut in vitro transposed plasmids using Nanopore sequencing. bioRxiv. 2022. doi: 10.1101/2022.01.25.477550.
Huang F, Xiao L, Gao M, Vallely EJ, Dybvig K, Atkinson TP, Waites KB, Chong Z. B-assembler: a circular bacterial genome assembler. BMC Genomics. 2022 May 11;23(Suppl 4):361. doi: 10.1186/s12864-022-08577-7. PMID: 35546658; PMCID: PMC9092672.
Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 2019 May;37(5):540-546. doi: 10.1038/s41587-019-0072-8. Epub 2019 Apr 1. PMID: 30936562.
Kokot M, Dlugosz M, Deorowicz S. KMC 3: counting and manipulating k-mer statistics. Bioinformatics. 2017 Sep 1;33(17):2759-2761. doi: 10.1093/bioinformatics/btx304. PMID: 28472236.
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, Earl AM. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014 Nov 19;9(11):e112963. doi: 10.1371/journal.pone.0112963. PMID: 25409509; PMCID: PMC4237348.
Wick RR, Judd LM, Gorrie CL, Holt KE. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol. 2017 Jun 8;13(6):e1005595. doi: 10.1371/journal.pcbi.1005595. PMID: 28594827; PMCID: PMC5481147.
Wick RR, Holt KE. Polypolish: Short-read polishing of long-read bacterial genome assemblies. PLoS Comput Biol. 2022 Jan 24;18(1):e1009802. doi: 10.1371/journal.pcbi.1009802. PMID: 35073327; PMCID: PMC8812927.
https://www.protocols.io/view/plasmid-sequence-analysis-from-long-reads-36wgq4n5yvk5/v7
https://github.com/rpetit3/dragonflye
https://github.com/tseemann/shovill
https://github.com/rrwick/Trycycler/wiki/Clustering-contigs
https://github.com/epi2me-labs/wf-clone-validation
https://github.com/esteinig/nanoq
https://github.com/rrwick/Porechop
https://github.com/lh3/miniasm
https://github.com/lbcb-sci/raven
https://github.com/isovic/racon
https://github.com/nanoporetech/medaka
https://github.com/rpetit3/assembly-scan