Overview
The Pipeline for Cancer Inference (PiCnIc) is our attempt to devise an effective pipeline for extracting ensemble-level cancer progression models from cross-sectional data. The pipeline is versatile, modular and customizable, and exploits state-of-the-art data processing and machine learning tools to:
  1. identify tumor subtypes and then, in each subtype:
  2. select (epi)genomic events driving the progression;
  3. identify groups of events that are likely to be observed as mutually exclusive;
  4. infer progression models from these groups and the data, and annotate them with the associated statistical confidence.
The pipeline was first described in our paper:
and is naturally implemented within TRONCO.

The main steps of PiCnIc


Motivation
All these steps are necessary to minimize the confounding effects of inter-tumor heterogeneity, which are likely to lead to wrong results when data is not appropriately pre-processed.

In each stage of PiCnIc, different techniques can be employed, alternatively or jointly, according to the specific research goals, input data, and cancer type. Prior knowledge can be easily accommodated into our pipeline, as can appropriate computational tools. The rationale is similar in spirit to the workflows implemented by consortia such as TCGA to analyze huge populations of cancer samples.

One of the main novelties of our approach is the exploitation of groups of exclusive alterations as a proxy to detect fitness-equivalent trajectories of cancer progression. This is made possible only by the hypothesis-testing features of our recently developed CAPRI algorithm, which uniquely addresses this crucial aspect of the ensemble-level progression inference problem.



Which tools

The tools that PiCnIc can exploit are of different natures, and we plan to extend TRONCO to interface with them as our case studies are developed. We are happy to receive suggestions about tools that you would like to use with this pipeline, and we welcome your contributions towards this effort.

The current version of TRONCO supports input/output for these tools:
  • Network Based Stratification (NBS), a method for stratification (clustering) of patients in a cancer cohort based on genome-scale somatic mutation measurements and a gene interaction network.
  • MUTEX, a method for the identification of sets of mutually exclusive gene alterations in a given set of genomic profiles by scanning the groups of genes with a common downstream effect on the signaling network.
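
To make the notion of mutually exclusive alterations concrete, here is a minimal Python sketch on a toy binary alteration matrix. It is not the MUTEX algorithm and does not use TRONCO; the sample names, event names, values, and the helper exclusivity_summary are all made up for illustration. It only shows what "observed as mutually exclusive" means for a candidate group of events: most samples carrying an event from the group carry exactly one of them.

```python
# Toy binary alteration matrix: one entry per sample, 1 = event observed.
# Sample names, event names, and values are hypothetical.
samples = {
    "s1": {"A": 1, "B": 0, "C": 0},
    "s2": {"A": 0, "B": 1, "C": 0},
    "s3": {"A": 0, "B": 0, "C": 1},
    "s4": {"A": 1, "B": 0, "C": 1},
    "s5": {"A": 0, "B": 0, "C": 0},
}


def exclusivity_summary(samples, group):
    """Return (coverage, exclusive_fraction) for a group of events.

    coverage: fraction of samples with at least one event of the group.
    exclusive_fraction: among those samples, the fraction with exactly one.
    """
    # Number of events from the group observed in each sample.
    counts = [sum(profile[g] for g in group) for profile in samples.values()]
    covered = [c for c in counts if c >= 1]
    coverage = len(covered) / len(counts)
    if not covered:
        return coverage, 0.0
    exclusive_fraction = sum(1 for c in covered if c == 1) / len(covered)
    return coverage, exclusive_fraction


coverage_ab, exclusive_ab = exclusivity_summary(samples, ["A", "B"])
coverage_ac, exclusive_ac = exclusivity_summary(samples, ["A", "C"])
print("A,B:", coverage_ab, exclusive_ab)
print("A,C:", coverage_ac, exclusive_ac)
```

In this toy data, A and B never co-occur, whereas A and C co-occur in one sample, so the pair A, B is the better candidate for an exclusivity group. A real analysis would also need a statistical test and, as in MUTEX, information from the signaling network.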