Notes

July 27, 2020

The pick_open_reference_otus.py script performs open-reference OTU picking in QIIME1. This script has several options, including --min_otu_size, which is described as "The minimum otu size (in number of sequences) to retain the otu [default: 2]". For a long time, I thought this implied the absence of singletons in the resulting OTU table. Indeed, the script produces a biom file (otu_table_mc2_w_tax.biom) that does not contain singletons: running filter_otus_from_otu_table.py with -n 2 on it returns the same number of OTUs.

HOWEVER, thanks to a reviewer of a paper about the fecal microbiota of pet birds, I noticed something strange that I thought was worth sharing with everyone in the bioinformatics world. Basically, if you split the OTU table that results from the open-reference approach into two subsets of samples (say, half of the samples each, producing filtered OTU tables A and B) and run filter_otus_from_otu_table.py with -n 2 again on each subset, some OTUs will be discarded as singletons. The only possible explanation is that these OTUs have a single sequence within A, within B, or within both, but were not considered singletons in the original OTU table because their total count across all samples was at least two. I honestly never thought about this.
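
To make the effect concrete, here is a small sketch in plain Python. It does not call QIIME1 at all: the OTU names, sample names and counts are made up, and the helper functions subset_samples and filter_min_count are hypothetical stand-ins that only mimic a "minimum total count of 2" filter applied to a table, which is how I understand -n 2 to behave.

```python
# Toy OTU table with made-up counts: one row per OTU, one column per sample.
samples = ["S1", "S2", "S3", "S4"]
otu_table = {
    "OTU_1": [5, 3, 4, 6],
    "OTU_2": [1, 0, 1, 0],  # total of 2, one sequence in each half
    "OTU_3": [0, 2, 0, 0],  # total of 2, all in one sample
    "OTU_4": [1, 0, 0, 1],  # total of 2, one sequence in each half
}


def subset_samples(table, sample_ids, keep):
    """Keep only the columns whose sample ID is in `keep` (hypothetical helper)."""
    idx = [i for i, s in enumerate(sample_ids) if s in keep]
    return {otu: [counts[i] for i in idx] for otu, counts in table.items()}


def filter_min_count(table, min_count=2):
    """Keep OTUs whose total count across the table's samples is >= min_count."""
    return {otu: c for otu, c in table.items() if sum(c) >= min_count}


# Full table: every OTU has at least 2 sequences in total, so nothing is removed.
full = filter_min_count(otu_table)

# Split the samples into two halves and filter each half again.
table_a = subset_samples(full, samples, {"S1", "S2"})
table_b = subset_samples(full, samples, {"S3", "S4"})
kept_a = filter_min_count(table_a)
kept_b = filter_min_count(table_b)

print("full:", sorted(full))
print("A   :", sorted(kept_a))
print("B   :", sorted(kept_b))
```

In this toy table, OTU_2 and OTU_4 pass the filter on the full table but have a single sequence in each half, so they are dropped from both A and B. OTU_3 survives in A but disappears from B, where it has no sequences at all.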

QIIME1 is unfortunately no longer supported, but some people are still interested in this tool.