TeX code for a flow diagram of a next-generation sequencing data analysis, including quality control and upload to public data servers.
Published version available here https://f1000research.com/articles/5-2644/v3
A compromise between basic and contrived styles.
\documentclass{standalone}
For setting the page margins: provides \trimbox, used below to add a margin around the picture.
\usepackage{adjustbox}
Stops auto-hyphenation at line-breaks.
\usepackage[none]{hyphenat}
For making flow-charts.
\usepackage{tikz}
\usetikzlibrary{calc,trees,positioning,arrows,chains,shapes.geometric, decorations.pathreplacing,decorations.pathmorphing,shapes, matrix,shapes.symbols}
This defines the box style. The default fill is a pale red; nodes in later sections reuse the style and re-specify the fill.
\tikzset{
>=stealth', punktchain/.style = {
rectangle, rounded corners,
fill = red!06, draw = black,
text width = 12em, minimum height = 1em,
text centered, on chain},
every join/.style = {->},
decoration = {brace},
tuborg/.style = {decorate},
tubnode/.style = {midway, left = 3pt, text width = 2cm},}
\begin{document} \trimbox{-.5cm -.5cm -.5cm -.5cm}{
\begin{tikzpicture}
[node distance = .5cm, start chain = going below,]
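The box style above sets a default fill that individual nodes can override. As a minimal sketch of that behaviour, the stand-alone file below defines a cut-down style (the name demobox and the node names are hypothetical, and the chain option is left out for simplicity) and re-specifies the fill on the second node only.

```latex
\documentclass{standalone}
\usepackage{tikz}
\tikzset{
  % Hypothetical style, modelled on punktchain but without "on chain".
  demobox/.style = {
    rectangle, rounded corners,
    fill = red!06, draw = black,
    text width = 8em, text centered},
}
\begin{document}
\begin{tikzpicture}
  % Takes the default fill from the style (red!06).
  \node[demobox] (a) at (0,0) {Default fill};
  % An option given after the style name takes precedence,
  % so this box is filled with green!06 instead.
  \node[demobox, fill = green!06] (b) at (0,-1) {Overridden fill};
\end{tikzpicture}
\end{document}
```

Because options are applied in order, placing fill after the style name is what lets the same style serve the red, green and blue sections of the diagram.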
Each \node defines a box with a name (e.g. seqdat) and a label (Download via ftp); the join option connects it automatically to the previous box on the chain. Side branches are opened with start branch = venstre.
\node[punktchain, join] (seqdat) {Download via ftp};
\node[punktchain, join] (qcseq) {Quality control on fastq data \textit{FastQC}, \textit{R}};
\begin{scope}[start branch = venstre, every join/.style = {<-, shorten <= 0.75pt}]
\node[punktchain, on chain = going right, join = by {->}, text width=6em] (fasup) {Upload to NCBI SRA (1.0TB)}; \end{scope}
\begin{scope}[start branch = venstre, every join/.style = {<-, shorten <= 0.75pt}]
\node[punktchain, on chain = going left, join = by {->}, text width=6em] (fasdis) {\small{Discard poor quality samples}}; \end{scope}
\node[punktchain, join] (cleanseq) {Remove adaptor \& low quality sequence \textit{Ea-Utils}};
\node[punktchain, join] (map) {Map reads to reference genome \textit{BWA},\textit{Stampy}};
\node[punktchain, join, fill = green!06] (readgrp) {\raggedright{Add read-group info \\ Remove duplicate reads \\ Build index file \\}\textit{Picard}};
\node[punktchain, join, fill = green!06] (remap2) {Re-map around indels\\ \textit{GATK}};
\node[punktchain, join, fill = green!06] (qcbam) {Bam file QC\\ \textit{GATK}, \textit{Picard}, \textit{R}};
\begin{scope}[start branch = venstre, every join/.style = {<-, shorten <= 0.75pt}]
\node[punktchain, on chain = going right, join = by {->}, text width = 6em, fill = green!06] (bamup) {Upload to NCBI SRA (0.8TB)}; \end{scope}
\begin{scope}[start branch = venstre, every join/.style = {<-, shorten <= 0.75pt}]
\node[punktchain, on chain = going left, join = by {->}, text width=6em, fill = green!06] (bamdis) {\small{Discard poor quality samples}}; \end{scope}
\node[punktchain, join, fill = green!06] (geno) {Variant discovery \& genotyping};
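The chain-and-branch pattern used so far can be tried in isolation. The following is a minimal sketch, not the code behind the published figure: the style name demo, the branch name side and the node names are hypothetical, and on chain is written on each node rather than inside the style.

```latex
\documentclass{standalone}
\usepackage{tikz}
\usetikzlibrary{chains,positioning}
\begin{document}
\begin{tikzpicture}[
    node distance = .5cm,
    start chain = going below,
    every join/.style = {->},
    % Hypothetical box style for this sketch.
    demo/.style = {rectangle, draw, text width = 6em, text centered}]
  % Main chain: each node is placed below the previous one and
  % "join" draws an arrow from the previous node on the chain.
  \node[demo, on chain, join] (first)  {First step};
  \node[demo, on chain, join] (second) {Second step};
  % Side branch: forks from the most recent node on the main chain.
  \begin{scope}[start branch = side]
    \node[demo, on chain = going right, join] (aside) {Side step};
  \end{scope}
  % Outside the scope the main chain resumes, so this node is
  % placed below "second" and joined to it.
  \node[demo, on chain, join] (third) {Third step};
\end{tikzpicture}
\end{document}
```

Keeping each branch inside its own scope is what allows the main chain to carry on from the fork node once the scope closes.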
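By now the main chain runs from the download step to variant discovery, with an upload branch and a discard branch at each of the two QC steps.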
\node [punktchain, below left of = geno, node distance = 2cm, xshift = -1cm, yshift = -0.7cm, fill = blue!06] (hcall) {SNPs and indels:\\\textit{GATK Haplotype Caller}\\\raggedright{Genotype each bam\\Combine gvcfs\\Genotype combined gvcf\\}};
\node [punktchain, below right of = geno, node distance = 2cm, xshift = 1cm, yshift = -0.7cm, fill=blue!06] (gstrip) {Structural variants:\\ \textit{Genomestrip CNV pipeline} \raggedright{Pre-processing\\Discovery\\Genotyping\\}};
% [node distance = 0.5cm, start chain = going below,]
\node [punktchain, below left of = gstrip, node distance = 2cm, xshift = -1cm, yshift = -.9cm, fill = blue!06] (genoqc) {Separate QC, annotation \& formatting \textit{GATK}, \textit{vcfTools}, \textit{SNPeff}, \textit{R}};
\begin{scope}[start branch = venstre, every join/.style = {<-, shorten <= 0.75pt}]
\node[punktchain, on chain = going right, text width = 6em, fill = blue!05](genoup) {Upload to NCBI dbVar/dbSNP (13GB)};\end{scope}
\begin{scope}[start branch = venstre, every join/.style = {<-, shorten <= 0.75pt}]
\node[punktchain, on chain = going left, text width = 6em, fill = blue!05](genodis) {\small{Discard poor quality variants and samples}};\end{scope}
% \node[punktchain, fill = blue!06] (combine) {Combine genotype data from \textit{HaplotypeCaller} \& \textit{Genomestrip}};
% \begin{scope}[start branch = venstre, every join/.style = {<-, shorten <= 0.75pt}]
% \node[punktchain, on chain = going right, join = by {->}, text width = 6em, fill = blue!06](comup) {Upload to Zenodo, Fly-var};\end{scope}
The anchors north, south, east and west specify the side of the box at which a connector starts or ends.
\draw[->] (geno.south) |-+(0.5,-1em)-| (hcall.north);
\draw[->] (geno.south) |-+(0.5,-1em)-| (gstrip.north);
\draw[->] (gstrip.south) |-+(0,-1em)-| ([xshift=2pt]genoqc.north);
\draw[->] (hcall.south) |-+(0.5,-1em)-| ([xshift=-2pt]genoqc.north);
\draw[->] ([yshift=1pt]genoqc.west) -- (genodis.east);
\draw[->] ([yshift=-1pt]genoqc.west) -- (genodis.east);
\draw[->] ([yshift=1pt]genoqc.east) -- (genoup.west);
\draw[->] ([yshift=-1pt]genoqc.east) -- (genoup.west);
% \draw[->] ([xshift=-2pt]genoqc.south) -- (combine.north);
% \draw[->] ([xshift=2pt]genoqc.south) -- (combine.north);
\draw[tuborg, decoration={brace}] let \p1=(seqdat.north), \p2=(map.south) in ($(-5.5, \y2)$) -- ($(-5.5, \y1)$) node[tubnode] {\centering{Sequencing data \textit{*.fastq}}};
\draw[tuborg, decoration={brace}] let \p1=(readgrp.north), \p2=(geno.south) in ($(-5.5, \y2)$) -- ($(-5.5, \y1)$) node[tubnode] {Sequence alignment data \textit{*.bam}};
\draw[tuborg, decoration={brace}] let \p1=(gstrip.north), \p2=(genoqc.south) in($(-5.5, \y2)$) -- ($(-5.5, \y1)$) node[tubnode] {Genotype data \textit{*.vcf}};
\end{tikzpicture}}
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End:
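The connectors between the genotyping boxes combine node anchors with the right-angled path operators |- and -|. A minimal sketch of that idea follows; the node names and coordinates are hypothetical and chosen only for illustration.

```latex
\documentclass{standalone}
\usepackage{tikz}
\begin{document}
\begin{tikzpicture}
  % Hypothetical nodes placed at fixed coordinates.
  \node[draw] (parent) at (0,0)   {Parent};
  \node[draw] (lchild) at (-2,-2) {Left child};
  \node[draw] (rchild) at (2,-2)  {Right child};
  % Leave the bottom edge of the parent, drop by 1em, then run
  % horizontally and enter the top edge of each child.
  \draw[->] (parent.south) |- +(0,-1em) -| (lchild.north);
  \draw[->] (parent.south) |- +(0,-1em) -| (rchild.north);
\end{tikzpicture}
\end{document}
```

The |- operator draws the vertical segment first and then the horizontal one, while -| does the reverse, so the arrows bend at right angles instead of running diagonally between the boxes.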
Structure and style adapted from Rasmus Pank Roulund http://www.texample.net/tikz/examples/assignment-structure/ and also from http://tex.stackexchange.com/questions/269900/tikz-flowchart-nodes-needed-refinement.
Written in the online LaTeX editor Overleaf https://www.overleaf.com/
TeX code file available here https://zenodo.org/record/168582
25th March 2018
Print-out and my suggested modifications to an early version. Burning at top was accidental.