TeX code for a flow diagram of a next-generation sequencing data-analysis pipeline, including quality control and upload to public data servers.
Published version available here https://f1000research.com/articles/5-2644/v3
A compromise between basic and contrived styles.
\documentclass{standalone}

For setting page margins.

\usepackage{adjustbox}

Stops auto-hyphenation at line-breaks.

\usepackage[none]{hyphenat}

For making flow-charts.

\usepackage{tikz}
\usetikzlibrary{calc,trees,positioning,arrows,chains,shapes.geometric,
  decorations.pathreplacing,decorations.pathmorphing,shapes,
  matrix,shapes.symbols}

This defines the box properties. The default fill is a pale red (red!06); where the style is re-used for later sections, the fill is re-specified on the individual nodes.

\tikzset{>=stealth',
  punktchain/.style = {
    rectangle,
    rounded corners,
    fill = red!06,
    draw = black,
    text width = 12em,
    minimum height = 1em,
    text centered,
    on chain},
  every join/.style = {->},
  decoration = {brace},
  tuborg/.style = {decorate},
  tubnode/.style = {midway, left = 3pt, text width = 2cm},
}
\begin{document}
\trimbox{-.5cm -.5cm -.5cm -.5cm}{
\begin{tikzpicture}[node distance = .5cm, start chain = going below,]

Each \node line defines a box with a name (e.g. seqdat) and a label (e.g. Download via ftp). The join option connects the box to the previous box on the chain automatically, and side branches are opened with start branch = venstre.

\node[punktchain, join] (seqdat) {Download via ftp};
\node[punktchain, join] (qcseq) {Quality control on fastq data \textit{FastQC}, \textit{R}};
\begin{scope}[start branch = venstre, every join/.style = {<-, shorten <= 0.75pt}]
  \node[punktchain, on chain = going right, join = by {->}, text width=6em] (fasup) {Upload to NCBI SRA (1.0TB)};
\end{scope}
\begin{scope}[start branch = venstre, every join/.style = {<-, shorten <= 0.75pt}]
  \node[punktchain, on chain = going left, join = by {->}, text width=6em] (fasdis) {\small{Discard poor quality samples}};
\end{scope}
\node[punktchain, join] (cleanseq) {Remove adaptor \& low quality sequence \textit{Ea-Utils}};
\node[punktchain, join] (map) {Map reads to reference genome \textit{BWA},\textit{Stampy}};
\node[punktchain, join, fill = green!06] (readgrp) {\raggedright{Add read-group info \\ Remove duplicate reads \\ Build index file \\}\textit{Picard}};
\node[punktchain, join, fill = green!06] (remap2) {Re-map around indels\\ \textit{GATK}};
\node[punktchain, join, fill = green!06] (qcbam) {Bam file QC\\ \textit{GATK}, \textit{Picard}, \textit{R}};
\begin{scope}[start branch = venstre, every join/.style = {<-, shorten <= 0.75pt}]
  \node[punktchain, on chain = going right, join = by {->}, text width = 6em, fill = green!06] (bamup) {Upload to NCBI SRA (0.8TB)};
\end{scope}
\begin{scope}[start branch = venstre, every join/.style = {<-, shorten <= 0.75pt}]
  \node[punktchain, on chain = going left, join = by {->}, text width=6em, fill = green!06] (bamdis) {\small{Discard poor quality samples}};
\end{scope}
\node[punktchain, join, fill = green!06] (geno) {Variant discovery \& genotyping};

By now the diagram looks like this:

\node [punktchain, below left of = geno, node distance = 2cm, xshift = -1cm, yshift = -0.7cm, fill = blue!06] (hcall) {SNPs and indels:\\\textit{GATK Haplotype Caller}\\\raggedright{Genotype each bam\\Combine gvcfs\\Genotype combined gvcf\\}};
\node [punktchain, below right of = geno, node distance = 2cm, xshift = 1cm, yshift = -0.7cm, fill=blue!06] (gstrip) {Structural variants:\\ \textit{Genomestrip CNV pipeline} \raggedright{Pre-processing\\Discovery\\Genotyping\\}};
[node distance = 0.5cm, start chain = going below,]
\node [punktchain, below left of = gstrip, node distance = 2cm, xshift = -1cm, yshift = -.9cm, fill = blue!06] (genoqc) {Separate QC, annotation \& formatting \textit{GATK}, \textit{vcfTools}, \textit{SNPeff}, \textit{R}};
\begin{scope}[start branch = venstre, every join/.style = {<-, shorten <= 0.75pt}]
  \node[punktchain, on chain = going right, text width = 6em, fill = blue!05](genoup) {Upload to NCBI dbVar/dbSNP (13GB)};
\end{scope}
\begin{scope}[start branch = venstre, every join/.style = {<-, shorten <= 0.75pt}]
  \node[punktchain, on chain = going left, text width = 6em, fill = blue!05](genodis) {\small{Discard poor quality variants and samples}};
\end{scope}
% \node[punktchain, fill = blue!06] (combine) {Combine genotype data from \textit{HaplotypeCaller} \& \textit{Genomestrip}};
% \begin{scope}[start branch = venstre, every join/.style = {<-, shorten <= 0.75pt}]
% \node[punktchain, on chain = going right, join = by {->}, text width = 6em, fill = blue!06](comup) {Upload to Zenodo, Fly-var};\end{scope}

The north, south, east and west anchors specify which side of a box a connector meets.

\draw[->] (geno.south) |-+(0.5,-1em)-| (hcall.north);
\draw[->] (geno.south) |-+(0.5,-1em)-| (gstrip.north);
\draw[->] (gstrip.south) |-+(0,-1em)-| ([xshift=2pt]genoqc.north);
\draw[->] (hcall.south) |-+(0.5,-1em)-| ([xshift=-2pt]genoqc.north);
\draw[->] ([yshift=1pt]genoqc.west) -- (genodis.east);
\draw[->] ([yshift=-1pt]genoqc.west) -- (genodis.east);
\draw[->] ([yshift=1pt]genoqc.east) -- (genoup.west);
\draw[->] ([yshift=-1pt]genoqc.east) -- (genoup.west);
% \draw[->] ([xshift=-2pt]genoqc.south) -- (combine.north);
% \draw[->] ([xshift=2pt]genoqc.south) -- (combine.north);
\draw[tuborg, decoration={brace}] let \p1=(seqdat.north), \p2=(map.south) in ($(-5.5, \y2)$) -- ($(-5.5, \y1)$) node[tubnode] {\centering{Sequencing data \textit{*.fastq}}};
\draw[tuborg, decoration={brace}] let \p1=(readgrp.north), \p2=(geno.south) in ($(-5.5, \y2)$) -- ($(-5.5, \y1)$) node[tubnode] {Sequence alignment data \textit{*.bam}};
\draw[tuborg, decoration={brace}] let \p1=(gstrip.north), \p2=(genoqc.south) in ($(-5.5, \y2)$) -- ($(-5.5, \y1)$) node[tubnode] {Genotype data \textit{*.vcf}};
\end{tikzpicture}}
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End:

Structure and style adapted from Rasmus Pank Roulund http://www.texample.net/tikz/examples/assignment-structure/ and also from http://tex.stackexchange.com/questions/269900/tikz-flowchart-nodes-needed-refinement.
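
The mechanisms the diagram relies on (a chain with automatic joins, a side branch, a connector between node anchors, and a brace sized from node anchors) can be tried in isolation. The following is a minimal sketch and not part of the published figure: the style name box, the branch name side and the node names a, b, c and s are placeholders chosen for illustration.

```latex
\documentclass{standalone}
\usepackage{tikz}
\usetikzlibrary{calc,positioning,chains,decorations.pathreplacing}
\begin{document}
\begin{tikzpicture}[
    node distance = .5cm,
    start chain = going below,
    % "box" is a hypothetical style name standing in for punktchain.
    box/.style = {rectangle, rounded corners, draw, fill = red!06,
                  text width = 8em, text centered},
    every join/.style = {->}]
  % Main chain: "join" links each box to the previous box on the chain.
  \node[box, on chain]       (a) {Step A};
  \node[box, on chain, join] (b) {Step B};
  % Side branch (hypothetical name "side"): starts at b, runs to the right.
  \begin{scope}[start branch = side]
    \node[box, on chain = going right, join, text width = 4em] (s) {Side};
  \end{scope}
  % Back on the main chain: c is placed below b.
  \node[box, on chain, join] (c) {Step C};
  % Manual connector between anchors: down from s.south, then left to c.east.
  \draw[->] (s.south) |- (c.east);
  % Brace spanning a to c: \y1 and \y2 are the y-coordinates of the anchors.
  \draw[decorate, decoration = {brace}]
    let \p1 = (a.north), \p2 = (c.south) in
    (-2.5cm, \y2) -- (-2.5cm, \y1) node[midway, left = 3pt] {Steps};
\end{tikzpicture}
\end{document}
```

Compiled on its own, this should give three stacked boxes joined by arrows, one box branching to the right of Step B, and a brace on the left running from the top of Step A to the bottom of Step C. Unlike punktchain, the sketch keeps on chain out of the style and sets it on each node, so that the direction is given only once per node.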
Implemented in the online LaTeX editor Overleaf https://www.overleaf.com/
TeX code file available here https://zenodo.org/record/168582
25th March 2018
Print-out and my suggested modifications to an early version. Burning at top was accidental.