Piphillin is a free tool from Second Genome (http://secondgenome.com/) that infers metagenomes from 16S rRNA gene OTU/ASV count tables and the representative sequence of each OTU/ASV.
In short, Piphillin takes the representative 16S rRNA gene sequence of each OTU from the FASTA file and finds the closest matching 16S sequence in a database of complete genomes. It then compiles the genes from all of the genomes matched to the OTUs in a sample into an “inferred” metagenome for that sample.
This kind of analysis has some major caveats. First, the genes are predicted rather than observed: we infer genes that might be present, assuming each OTU was matched to the correct genome. Second, even if all of the predicted genes are actually present, we do not know whether they are expressed in that microbial community. Third, Piphillin was optimized for human microbiome studies: for environmental samples, Spearman’s rank correlation coefficient between the Piphillin-predicted metagenome and the corresponding shotgun metagenome is only around 0.25, compared to >0.75 for human microbiomes.
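The match-then-sum idea described above can be sketched in Python as a toy illustration. This is not Piphillin's implementation, and several assumptions are made purely for the example: sequences are treated as pre-aligned and of equal length, so identity is a simple position-by-position comparison; the similarity cutoff is an arbitrary parameter; and the function names (`percent_identity`, `nearest_genome`, `infer_metagenome`, `infer_all`), genome names, and gene labels are all hypothetical. Each OTU's read count is multiplied by the gene copies of its best-matching genome, and OTUs with no match above the cutoff are dropped.

```python
from __future__ import annotations

from collections import Counter


def percent_identity(a: str, b: str) -> float:
    """Fraction of identical positions; assumes pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity needs pre-aligned sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def nearest_genome(seq: str, genome_16s: dict[str, str], cutoff: float) -> str | None:
    """Return the genome whose 16S sequence best matches seq, or None if below cutoff."""
    best = max(genome_16s, key=lambda g: percent_identity(seq, genome_16s[g]))
    return best if percent_identity(seq, genome_16s[best]) >= cutoff else None


def infer_metagenome(otu_counts, otu_seqs, genome_16s, genome_genes, cutoff):
    """Sum the gene copies of each OTU's matched genome, weighted by the OTU's count."""
    profile = Counter()
    for otu, count in otu_counts.items():
        genome = nearest_genome(otu_seqs[otu], genome_16s, cutoff)
        if genome is None:
            continue  # OTUs with no close genome contribute no genes
        for gene, copies in genome_genes[genome].items():
            profile[gene] += count * copies
    return profile


def infer_all(counts_table, otu_seqs, genome_16s, genome_genes, cutoff):
    """Build one inferred metagenome per sample from an OTU/ASV count table."""
    return {
        sample: infer_metagenome(counts, otu_seqs, genome_16s, genome_genes, cutoff)
        for sample, counts in counts_table.items()
    }


# Hypothetical toy inputs: 10-bp "16S" sequences and made-up gene labels.
GENOME_16S = {"genomeA": "ACGTACGTAC", "genomeB": "TTGTACGAAC"}
GENOME_GENES = {
    "genomeA": {"gene1": 1, "gene2": 2},
    "genomeB": {"gene2": 1, "gene3": 1},
}
OTU_SEQS = {"OTU1": "ACGTACGTAC", "OTU2": "TTGTACGAAT", "OTU3": "GGGGGGGGGG"}
COUNTS = {
    "sample1": {"OTU1": 10, "OTU2": 5, "OTU3": 7},
    "sample2": {"OTU2": 3, "OTU3": 4},
}

profiles = infer_all(COUNTS, OTU_SEQS, GENOME_16S, GENOME_GENES, cutoff=0.85)
print(dict(profiles["sample1"]))  # {'gene1': 10, 'gene2': 25, 'gene3': 5}
```

In this toy run, OTU3 matches no genome closely enough, so its reads add nothing to either sample's profile, and every gene that does appear depends on the 16S-to-genome match being right, which is the first caveat in practice. A real tool may apply further steps, such as correcting for the number of 16S copies per genome, which this sketch leaves out.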
Piphillin workflow
Describe the workflow from 16S rRNA gene sequencing to functional community profile.
Explain the limitations of PICRUSt/Piphillin analysis.
Describe the KEGG databases.
Connect inferred or predicted functional community analysis with taxonomic community analysis.
Synthesize cultivation-dependent functions of characterized isolates with metagenomic functional capabilities.
Detailed instructions can be found in the PUMAA Manual, Activity 4.