Posters

Workflows_Community_v5.pdf

Building a Bioinformatics Community

Workflows Community of Practice, Wellcome Genome Campus

23-12-04 NextFlow NGSCheckMate.pdf

NGSCheckMate: A tool to verify sample identities

Simon Pearce, Principal Bioinformatician, Cancer Biomarker Centre, Cancer Research UK

While every precaution should be taken, it is almost inevitable that sample mislabelling will occur in large enough datasets. Mislabelling can occur at many stages: sample collection, DNA/RNA preparation, library construction, sequencer indexing, and bioinformatic processing. The issue is magnified when working in multi-omics or across institutions. Errors can lead to incorrect results, especially at low sample numbers, where a single mislabelling can obliterate statistical power.

Here I discuss the tool NGSCheckMate (not my own), which verifies whether samples match using a set of exonic SNPs, and works across RNA-seq, WES, (shallow) WGS, ChIP-seq etc. The tool is designed for human samples, although it would work for other species given a suitable set of SNPs, and can be used to verify replicates, tumour/normal pairing and longitudinal sampling.

I have implemented nf-core modules and subworkflows for NGSCheckMate, for both FASTQ and BAM modes, ready for community use.
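The idea of matching samples on a shared SNP panel can be illustrated with a minimal Python sketch. It is not NGSCheckMate's code: it simply correlates variant allele fractions (VAFs) at the same SNP positions in two samples and calls a match above a fixed cut-off. The function names, the toy read counts and the 0.8 threshold are all assumptions made for illustration; the real tool uses its own depth-dependent model rather than a single fixed threshold.

```python
from math import sqrt


def allele_fractions(counts):
    """Convert (ref_reads, alt_reads) pairs into VAFs; None where a SNP has no coverage."""
    return [alt / (ref + alt) if (ref + alt) > 0 else None for ref, alt in counts]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation, so no evidence of a match
    return cov / sqrt(var_x * var_y)


def vaf_correlation(counts_a, counts_b):
    """Correlate VAFs over the SNPs that are covered in both samples."""
    pairs = [
        (a, b)
        for a, b in zip(allele_fractions(counts_a), allele_fractions(counts_b))
        if a is not None and b is not None
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two SNPs covered in both samples")
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


def same_individual(counts_a, counts_b, threshold=0.8):
    """Hypothetical fixed cut-off; chosen for illustration only."""
    return vaf_correlation(counts_a, counts_b) >= threshold


# Toy (ref, alt) read counts at six SNP positions, invented for this example.
normal = [(30, 0), (14, 16), (0, 25), (20, 0), (11, 9), (0, 18)]
tumour = [(42, 1), (20, 22), (1, 30), (35, 0), (12, 15), (0, 27)]
other = [(0, 28), (25, 0), (15, 14), (0, 22), (30, 1), (13, 12)]

print(same_individual(normal, tumour))  # paired samples share genotypes
print(same_individual(normal, other))   # unrelated genotypes
```

Because germline genotypes cluster around allele fractions of 0, 0.5 and 1, samples from the same individual correlate strongly even across assay types, whereas unrelated samples do not. In practice, sequencing depth limits how reliable each VAF is, which a sketch like this ignores.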

Genomeassembly_Nextflow_Symposium_Dec_2023.pdf

Highlights on the Sanger Tree of Life Assembly pipeline in Nextflow DSL2

Ksenia Krasheninnikova, Senior Bioinformatician, Tree of Life, Wellcome Sanger Institute

On average, the Tree of Life Assembly team generates 25 genome assemblies per week. The current production pipeline is implemented as a collection of vr-runner and wr scripts that call software wrapped in Singularity images. The pipeline generates a genome assembly from PacBio HiFi data, purges haplotigs from it, scaffolds it using Hi-C data and identifies organelles. The main steps are supported by various utilities, such as generating a k-mer database and gathering assembly statistics. To facilitate open access and reusability of the pipeline, and to increase its scalability, work has been done to reimplement it in Nextflow DSL2 using the nf-core workflow template. The talk will give an overview of the pipeline structure, its current pre-release status and planned features.

24.11.2023_ascc_poster3.pdf

A pipeline for detecting cobionts and contaminants in genome assemblies

Eerik Aunin, Senior Bioinformatician, Tree of Life, Wellcome Sanger Institute

In the Tree of Life assembly team at the Sanger Institute, we are working on a Nextflow pipeline for taxonomic identification of sequences in assembled genomes. The pipeline is called Assembly Screen for Cobionts and Contaminants (ASCC). It contains Tiara, read mapping, BLAST and Kraken2 against the NCBI nt database, Diamond BLASTX against the NCBI nr and UniProt databases, k-mer counting and dimensionality reduction, CobiontID, the BUSCO-based BlobToolKit Snakemake pipeline, FCS-GX, FCS-adaptor, VecScreen, a PacBio barcode check and BLAST for detecting organellar sequences. All individual components of the pipeline are optional. The results of a run are collected as a BlobToolKit dataset and CSV tables. There is a fully functional prototype of the pipeline that consists of Python scripts tied together by a Nextflow master script. We are in the process of creating a newer version of the pipeline that follows the nf-core standards.

TreeVal-SymposiumPoster.pdf

TreeVal: Modernising manual curation

Damon-Lee B Pointon, Bioinformatician, Tree of Life, Wellcome Sanger Institute

In order for the Tree of Life Project (ToL) to scale up and generate reference-quality genomes for the ~70,000 species that inhabit the UK and Northern Ireland, ToL must undergo a systemic change in the processes and pipelines it has used over the past decade. TreeVal is the first step in this: it is a pipeline that aims to replace and improve upon the gEVAL genome browser which, since the Human Genome Project, has generated the data required for the manual curation of genome assemblies.

This improvement is achieved through the use of Nextflow DSL2, the nf-core standards and the JBrowse2 genome browser. The extensible nature of these technologies allows for a high degree of customisation, and the open-source nature of nf-core pipelines and JBrowse2 allows for wider community engagement and collaboration.

nf-scautoqc_poster.pdf

nf-scautoqc: Adapting the automatic single-cell RNAseq QC workflow into a Nextflow pipeline

Batuhan Cakir, Bioinformatician, Cellular Genetics, Wellcome Sanger Institute

Genome After Party.pdf

Genome After Party: Standardised analysis for automated publications

Priyanka Surana, Principal Bioinformatician, Tree of Life, Wellcome Sanger Institute