Nanopore assembly of plasmid sequences: a Twitter story

A nice Twitter story on plasmid assembly using single-molecule Nanopore reads

Stefano Campanaro @campanarostef; Maria Silvia Morlino @morlino_silvia

Department of Biology, University of Padova (Italy), June 2nd 2022

In May 2022, Maria Silvia Morlino (@morlino_silvia) made a first attempt in our lab at sequencing some plasmids on a MinION and assembling them with Flye (Kolmogorov M et al., 2019). We realized that the assemblies were larger than expected. On June 1st 2022 I tweeted the following message to collect information on the assembly of plasmids from long Nanopore reads:

“Did someone try to assemble #plasmid sequences from #Nanopore reads only? Any suggestion on the #assembly #software? With #Flye we obtain assemblies larger than expected

@nanopore” (https://twitter.com/campanarostef/status/1531878718586736641)

The message was viewed more than 14,000 times in a couple of days, and I received many useful suggestions for solving this issue. Since this seems to be a very hot topic, I decided, just for fun, to collect all the information and summarize it in this document. I marked my comments in bold blue.


Answers (in order of appearance; limited to those posted by June 2nd 2022)

6 h: Callum @CallumJCParr

replying to @campanarostef and @nanopore

Suggests asking @gringene_bio for information.

6 h: David Eccles @gringene_bio

I have my own protocol for this using Canu for assembly, pre-filtering for the most accurate reads:

https://www.protocols.io/view/plasmid-sequence-analysis-from-long-reads-36wgq4n5yvk5/v7

This is a very detailed protocol to perform all the steps of the procedure starting from the DNA extraction up to the final assembly.

8 h: Nick Vereecke @methenickname

replying to @rpetit3, @campanarostef and @nanopore

Manual trimming is in many/most cases still required for plasmid sequences. So automated pipelines are still rather difficult

14 h: Alessandro Garritano @Alegarritano

replying to @campanarostef and @nanopore

Suggests asking @marwan_majzoub for information: "check this out"

15 h: Sumeet Tiwari @skt_genomics

replying to @campanarostef and @nanopore

May be this will work

B-assembler: a circular bacterial genome assembler

Very interesting paper; I did not know this tool: https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-022-08577-7

20 h: Chris Fields @cjfields

replying to @campanarostef, @mike_schatz and @nanopore

We can vouch for this, it does help. Tho some low-freq ambiguities appeared possibly biologically relevant (eg seq depth-related ambiguity around possible viral/plasmid integration sites

20 h: Raúl Llera @xrllera

replying to @campanarostef and @nanopore

Unicycler would be my first choice

20 h: Raúl Llera @xrllera

and do not overkill! extremely high coverage could be problematic

21 h: Belén Prado @BelenPradoV

replying to @campanarostef and @nanopore

@liss_salinast

Suggests asking Liseth Salinas for advice.

21 h: Keith Robison @OmicsOmicsBlog

replying to @campanarostef and @nanopore

There have been some posts where with long Nanopore reads you will sometimes see replication intermediates that are multimers of the plasmid
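A rough way to check whether a contig looks like such a multimer is to search for a dominant spacing between repeated k-mers. The sketch below is only an illustration under simplifying assumptions (error-free sequence, a single strand, head-to-tail copies); the function name `estimate_repeat_period`, the k-mer size and the threshold are hypothetical choices, and real assemblies would probably need an alignment-based check instead.

```python
from collections import Counter
import random


def estimate_repeat_period(seq, k=15, min_fraction=0.3):
    """Return the dominant distance between repeated k-mers, or None.

    In a head-to-tail multimer most k-mers occur again exactly one
    monomer length downstream, so one distance dominates.
    """
    n_kmers = len(seq) - k + 1
    if n_kmers <= 0:
        return None
    last_seen = {}          # k-mer -> position of its latest occurrence
    distances = Counter()   # spacing between consecutive occurrences
    for i in range(n_kmers):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            distances[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    if not distances:
        return None
    period, count = distances.most_common(1)[0]
    # A dimer gives a fraction just under 0.5, so the (assumed)
    # default threshold of 0.3 is deliberately permissive.
    return period if count / n_kmers >= min_fraction else None


# Toy data: a random 3 kb "plasmid" and an artificial dimer of it.
rng = random.Random(0)
monomer = "".join(rng.choice("ACGT") for _ in range(3000))
dimer = monomer + monomer

period = estimate_repeat_period(dimer)
print("dimer period:", period, "copies:", len(dimer) / period)
print("monomer period:", estimate_repeat_period(monomer))
```

Dividing the contig length by the reported period gives an estimate of the number of plasmid copies in the contig.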

22 h: Francesco Emiliani @fe_emiliani

replying to @campanarostef and @nanopore

We solve that issue in our pipeline! We have a little tool called DupScoop and also turns out that the ends of the assemblies do not polish well so it helps to rotate it 50% and do a last round of polishing

https://github.com/mckennalab/Circuitseq

Another interesting tool I did not know: CircuitSeq is a pipeline to assemble and analyze plasmids from Nanopore long-read sequencing.
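The rotation trick mentioned in the tweet can be illustrated with a few lines of Python. This is a minimal sketch and not CircuitSeq's own code; the function name `rotate_circular` is hypothetical. The idea is that, after rotating a circular sequence by half its length, the former contig ends sit in the middle of the sequence, where a further polishing round can cover them with reads on both sides.

```python
def rotate_circular(seq, fraction=0.5):
    """Rotate a circular sequence so that it starts `fraction` of the way in."""
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    offset = int(len(seq) * fraction)
    # The old end/start junction moves to the interior of the new sequence.
    return seq[offset:] + seq[:offset]


plasmid = "AAAACCCCGGGGTTTT"
rotated = rotate_circular(plasmid)
print(rotated)  # GGGGTTTTAAAACCCC
```

Rotating twice by 50% returns the original orientation when the length is even, so the start position can be restored after polishing if needed.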

23 h: George Bouras @GB13Faithless

replying to @campanarostef and @nanopore

Trycycler

23 h: Connor Brown @env_biochem

replying to @campanarostef and @nanopore

canu has always worked well for us. Some parameter optimization may be required

23 h: Michael Schatz @mike_schatz

replying to @campanarostef and @nanopore

make sure to downsample to 50x coverage or so. Assemblers tend to get really confused if you have excess coverage
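The arithmetic behind this advice is simple: the number of bases to keep is the expected replicon size multiplied by the target coverage. The following sketch shows one possible way to do the subsampling in plain Python; the function name `downsample_reads` and the in-memory list of reads are assumptions made for illustration, and dedicated tools such as rasusa are the more practical choice for real FASTQ files.

```python
import random


def downsample_reads(reads, genome_size, target_coverage=50, seed=42):
    """Randomly keep reads until their total length reaches the target depth.

    reads: list of (name, sequence) tuples
    genome_size: expected replicon size in bp (supplied by the user)
    """
    target_bases = genome_size * target_coverage
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)  # random order, reproducible
    kept, total = [], 0
    for name, seq in shuffled:
        if total >= target_bases:
            break
        kept.append((name, seq))
        total += len(seq)
    return kept


# Toy example: 1000 reads of 2 kb from a 5 kb plasmid, i.e. 400x coverage.
reads = [(f"read{i}", "A" * 2000) for i in range(1000)]
subset = downsample_reads(reads, genome_size=5000, target_coverage=50)
coverage = sum(len(seq) for _, seq in subset) / 5000
print(len(subset), "reads kept,", coverage, "x coverage")
```

Because the last read kept can overshoot the target, the final coverage may be slightly above the requested value when read lengths vary.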

23 h: Robert A. Petit III, PhD @rpetit3

replying to @campanarostef and @nanopore

I have Dragonflye (https://github.com/rpetit3/dragonflye) for Nanopore sequences. It allows assembly and polishing with multiple tools, with a similar experience to using Shovill.

Shovill (https://github.com/tseemann/shovill) is a pipeline which uses SPAdes at its core, but alters the steps before and after the primary assembly step to get similar results in less time.

The other tool, Dragonflye, is a pipeline for easily assembling Oxford Nanopore reads.

But if you have the time for the manual steps, I also suggest using Trycycler for a better overall assembly

Main steps of Dragonflye:

  1. Estimate genome size and read length from reads (unless --gsize provided) (kmc)

  2. Filter reads by length (default --minreadlength 1000) (Nanoq)

  3. Reduce FASTQ files to a sensible depth (default --depth 150) (rasusa)

  4. Remove adapters (requires --trim be given) (Porechop)

  5. Assemble with Flye, Miniasm, or Raven

  6. Polish assembly with Racon and/or Medaka

  7. Polish assembly with short reads via Polypolish and/or Pilon

  8. Remove contigs that are too short, too low coverage, or pure homopolymers

  9. Produce final FASTA with nicer names and parsable annotations

  10. Output parsable assembly statistics (assembly-scan)
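Step 8 of the list above (removing contigs that are too short, too low in coverage, or pure homopolymers) can be sketched as a simple filter. This is not Dragonflye's code: the function name `keep_contig` and the two thresholds are hypothetical values chosen only for the example.

```python
def keep_contig(seq, depth, min_length=500, min_depth=2.0):
    """Return True if a contig passes the three checks of step 8.

    min_length and min_depth are illustrative thresholds, not the
    defaults of any specific pipeline.
    """
    if len(seq) < min_length:
        return False              # too short
    if depth < min_depth:
        return False              # too low coverage
    if len(set(seq.upper())) <= 1:
        return False              # pure homopolymer (or empty)
    return True


# Toy assembly: contig name -> (sequence, mean read depth)
contigs = {
    "plasmid": ("ACGT" * 500, 40.0),
    "short": ("ACGT" * 10, 40.0),
    "lowcov": ("ACGT" * 500, 0.5),
    "polyA": ("A" * 2000, 40.0),
}
kept = [name for name, (seq, depth) in contigs.items() if keep_contig(seq, depth)]
print(kept)  # ['plasmid']
```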

June 1st: Rauf Salamzade @SalamMicrobes

Trycycler by Ryan Wick offers a great framework to use multiple long-read assemblers, including Flye, and then consolidate allowing for detection of false duplications which might be causing your larger than expected plasmid sizes

https://github.com/rrwick/Trycycler/wiki/Clustering-contigs

A very interesting procedure from Ryan Wick: the goal of this step is to cluster the contigs of your input assemblies into per-replicon groups. It also serves to exclude any spurious, incomplete or badly misassembled contigs.
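To give an idea of what clustering contigs into per-replicon groups means, here is a toy sketch based on shared k-mers. It is explicitly not Trycycler's implementation: the functions `kmer_set`, `jaccard_distance` and `cluster_contigs`, the k-mer size and the distance cutoff are all assumptions for illustration, and the sketch ignores reverse-complement orientation and k-mers that span the circular junction.

```python
from itertools import combinations
import random


def kmer_set(seq, k=11):
    """All k-mers of a linear sequence (single strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """1 minus the fraction of k-mers shared between two sets."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0


def cluster_contigs(contigs, k=11, max_distance=0.5):
    """Single-linkage clustering: contigs closer than max_distance are grouped."""
    names = list(contigs)
    sketches = {name: kmer_set(contigs[name], k) for name in names}
    parent = {name: name for name in names}  # union-find forest

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if jaccard_distance(sketches[a], sketches[b]) <= max_distance:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy input: two assemblers recover plasmid A (with different start
# positions), and only one of them recovers plasmid B.
rng = random.Random(1)
plasmid_a = "".join(rng.choice("ACGT") for _ in range(2000))
plasmid_b = "".join(rng.choice("ACGT") for _ in range(2000))
assemblies = {
    "flye_1": plasmid_a,
    "raven_1": plasmid_a[700:] + plasmid_a[:700],
    "flye_2": plasmid_b,
}
print(cluster_contigs(assemblies))
```

In this toy case the two versions of plasmid A end up in the same group even though they start at different positions, while the unrelated contig forms a cluster of its own.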

June 1st: Adelme Bazin @axbazin

replying to @campanarostef and @nanopore

I've been using https://github.com/epi2me-labs/wf-clone-validation which uses multiple runs of canu+ trycycler, I've not seen larger than expected sizes with this approach but sometimes the assembled plasmids are missing bits

This repository contains a nextflow workflow that can be used for de novo assembly of plasmid sequences from Oxford Nanopore data.

June 1st: Christian Gallardo @MrGalloLA

Can second the comment on unicycler. For our HIV plasmids, which contain an inverted repeat at each end of insert (ie LTR), aggressive size filtering prior to assembly is critical (though this depends heavily on the library prep used)

June 1st: Jan Gawor @gaworj

replying to @campanarostef and @nanopore

You can also try Unicycler in ont-only mode. Another option is Raven. Flye assembler tends to produce concatemers for plasmids or mito genomes.


Final comments

What I understood from this story is that you must be very careful when assembling Nanopore reads, particularly when dealing with plasmids or viruses. There seems to be agreement on some crucial aspects. The first is that excessive coverage can degrade assembly quality, so it is better to downsample the reads to 50-150x coverage. Another important aspect is to clean up the assembly very carefully with multiple tools, probably because many programs are not optimized for plasmid assembly. Replication intermediates, which are multimers of the plasmid, and false duplications are also problematic. Finally, many colleagues suggested using Trycycler in combination with other tools.

I would really like to thank all the colleagues who helped with comments and suggestions to solve this Nanopore assembly issue.


Acknowledgments (to the tweeters I was able to identify)

- Adelme Bazin @axbazin https://scholar.google.it/citations?hl=en&user=OgXjLxsAAAAJ

- Alessandro Garritano @Alegarritano https://scholar.google.it/citations?hl=en&user=conXSckAAAAJ

- Belén Prado @BelenPradoV https://scholar.google.it/citations?hl=en&user=YrwLSOsAAAAJ

- Chris Fields @cjfields https://twitter.com/cjfields

- Christian Gallardo @MrGalloLA https://scholar.google.it/citations?hl=en&user=sl5K0CsAAAAJ

- Connor Brown @env_biochem https://scholar.google.it/citations?hl=en&user=-9C-SUMAAAAJ

- David Eccles @gringene_bio https://scholar.google.it/citations?hl=en&user=qDIvWKgAAAAJ

- Francesco Emiliani @fe_emiliani https://scholar.google.it/citations?hl=en&user=ja6145oAAAAJ

- George Bouras @GB13Faithless https://scholar.google.it/citations?hl=en&user=cjk3fbkAAAAJ

- Jan Gawor @gaworj https://twitter.com/gaworj

- Keith Robison @OmicsOmicsBlog http://omicsomics.blogspot.com/

- Michael Schatz @mike_schatz https://scholar.google.it/citations?hl=en&user=rcs6IKwAAAAJ

- Nick Vereecke @methenickname https://twitter.com/methenickname

- Sumeet Tiwari @skt_genomics https://twitter.com/skt_genomics

- Raúl Llera @xrllera https://scholar.google.it/citations?hl=en&user=jBis0t4AAAAJ

- Rauf Salamzade @SalamMicrobes https://scholar.google.it/citations?hl=en&user=OBPpZq4AAAAJ

- Robert A. Petit III, PhD @rpetit3 https://scholar.google.it/citations?hl=en&user=sBSRYTkAAAAJ


References

Emiliani FE, Hsu I, McKenna A. Circuit-seq: Circular reconstruction of cut in vitro transposed plasmids using Nanopore sequencing. bioRxiv. 2022. doi: 10.1101/2022.01.25.477550.

Huang F, Xiao L, Gao M, Vallely EJ, Dybvig K, Atkinson TP, Waites KB, Chong Z. B-assembler: a circular bacterial genome assembler. BMC Genomics. 2022 May 11;23(Suppl 4):361. doi: 10.1186/s12864-022-08577-7. PMID: 35546658; PMCID: PMC9092672.

Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 2019 May;37(5):540-546. doi: 10.1038/s41587-019-0072-8. Epub 2019 Apr 1. PMID: 30936562.

Kokot M, Dlugosz M, Deorowicz S. KMC 3: counting and manipulating k-mer statistics. Bioinformatics. 2017 Sep 1;33(17):2759-2761. doi: 10.1093/bioinformatics/btx304. PMID: 28472236.

Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, Earl AM. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014 Nov 19;9(11):e112963. doi: 10.1371/journal.pone.0112963. PMID: 25409509; PMCID: PMC4237348.

Wick RR, Judd LM, Gorrie CL, Holt KE. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol. 2017 Jun 8;13(6):e1005595. doi: 10.1371/journal.pcbi.1005595. PMID: 28594827; PMCID: PMC5481147.

Wick RR, Holt KE. Polypolish: Short-read polishing of long-read bacterial genome assemblies. PLoS Comput Biol. 2022 Jan 24;18(1):e1009802. doi: 10.1371/journal.pcbi.1009802. PMID: 35073327; PMCID: PMC8812927.

https://www.protocols.io/view/plasmid-sequence-analysis-from-long-reads-36wgq4n5yvk5/v7

https://github.com/rpetit3/dragonflye

https://github.com/tseemann/shovill

https://github.com/rrwick/Trycycler/wiki/Clustering-contigs

https://github.com/epi2me-labs/wf-clone-validation

https://github.com/esteinig/nanoq

https://github.com/rrwick/Porechop

https://github.com/lh3/miniasm

https://github.com/lbcb-sci/raven

https://github.com/isovic/racon

https://github.com/nanoporetech/medaka

https://github.com/rpetit3/assembly-scan