POBAM

Philosophy of Biology at the Mountains

Aleta Quinn

Species in the Time of Big Data

In recent decades, systematics has benefitted from the dramatic development of new tools for obtaining (Hillis 1996) and analyzing (Ronquist and Huelsenbeck 2003, Swofford 2002) genome-scale data. This development has not slowed; if anything, advances are accelerating, with powerful new tools for inferring population structure (Huelsenbeck et al. 2011), phylogenies (Liu et al. 2009), and genome sequences (Jennings 2016). With the availability of big data and sophisticated computational methods comes the tendency to adopt the latest techniques rapidly, too often without adequate critical analysis. The motivation for speed is strong: species and ecosystems are disappearing rapidly, and the legal framework for conservation is tightly tied to the recognition of species and subspecies. On the professional side, the job market and career benchmarks generate enormous pressure to use the latest methods and to produce positive results. In this paper I direct criticism and caution at phylogenetic inference, particularly as it is used to hypothesize species.

The plethora of available methods enables researchers to run multiple preliminary analyses and then pursue only those methods that show positive results. One danger is statistical: if 14 independent tests are each run at a significance threshold of 0.05, the probability that at least one yields a positive result in the absence of any real signal is 1 - 0.95^14, or roughly 51%. Another danger is that the researcher may not be adequately familiar with the assumptions of a model chosen in this fashion. Indeed, the rapid proliferation of methods has made it virtually impossible for any single researcher to keep up.
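The statistical worry about running 14 analyses at a 0.05 threshold can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the analyses are statistically independent and that no real signal is present, and the function names are hypothetical, chosen for this example.

```python
def prob_at_least_one_false_positive(n_tests: int, alpha: float) -> float:
    """Chance that at least one of n_tests independent tests is
    'significant' at threshold alpha when there is no real signal."""
    # Each test avoids a false positive with probability (1 - alpha);
    # all n_tests avoid one with probability (1 - alpha) ** n_tests.
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of false positives among n_tests tests."""
    return n_tests * alpha


if __name__ == "__main__":
    n, alpha = 14, 0.05  # figures taken from the text
    p_any = prob_at_least_one_false_positive(n, alpha)
    mean_fp = expected_false_positives(n, alpha)
    print(f"P(at least one false positive) = {p_any:.3f}")
    print(f"Expected number of false positives = {mean_fp:.2f}")
```

Under these assumptions the chance of at least one spurious positive is just over one half, while the expected number of spurious positives is 0.7. Analyses that reuse the same data are not independent, so the true figure for a given study may differ.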

End-users of taxonomy are frustrated with taxonomic instability and the plurality of sources of taxonomic authority. To some extent, these features reflect the necessary back and forth of open science. However, the back and forth must adhere to norms that enable the healthy functioning of science (Longino 1990). One worry is that the problems outlined above compromise the community's ability to take up constructive criticism. I show that researcher error, methodological flaws, and conceptual criticism have been conflated in recent debates about the appropriate uses of the multispecies coalescent (MSC).

I emphasize the dangers with respect to species delimitation in particular. At present, MSC-based methods applied to species delimitation rely on several assumptions known to be problematic. Recent simulation (Sukumaran and Knowles 2017) and empirical (Chan et al. 2017) work shows that one common MSC-based model is prone to overestimating the number of species. I show that these problems stem from the tacit assumption that the General Lineage Concept (de Queiroz 1999) is sufficient to delimit species.

In this paper I aim to analyze these worries via accounts of scientific pluralism (Longino 2002; Mitchell 2003; Tabery 2014). I anticipate that an integrative pluralist account best captures the way in which systematists operating in (more or less) distinct theoretical frameworks converge on hypotheses and criticisms about particular empirical claims.

My two normative aims are (1) to critique particular procedures for publishing claims about cryptic species diversity and (2) to specify what species delimitation requires beyond the General Lineage Concept. The empirical studies I critique are those that use Structurama for species delimitation (a purpose for which it was not designed) and those that infer diversity from mitochondrial barcoding (a purpose for which barcoding was not designed, and for which it has not been adequately tested empirically). My analysis of the General Lineage Concept will draw on Dupré’s (2015) process ontology.


Works Cited

Chan, K.O., A.M. Alexander, L.L. Grismer, Y.C. Su, J.L. Grismer, E.S.H. Quah, and R.M. Brown. 2017. Species delimitation with gene flow: A methodological comparison and population genomics approach to elucidate cryptic species boundaries in Malaysian Torrent Frogs. Molecular Ecology 26(20): 5435-5450.

Dupré, J. 2015. A process ontology for biology. Physiology News 33-34.

Hillis, D. 1996. Nucleic acids, IV: Sequencing and cloning. Pp. 321-381 in Molecular Systematics (D.M. Hillis, C. Moritz, and B.K. Mable, eds.), 2nd ed. Sinauer Associates, Sunderland, MA.

Ronquist, F., and J. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19(12): 1572-1574.

Huelsenbeck, J., P. Andolfatto, and E.T. Huelsenbeck. 2011. Structurama: Bayesian inference of population structure. Evolutionary Bioinformatics 7: S6761.

Jennings, W.B. 2016. Phylogenomic Data Acquisition: Principles and Practice. CRC Press, Boca Raton, FL.

Liu, L., L. Yu, L. Kubatko, D.K. Pearl, and S.V. Edwards. 2009. Coalescent methods for estimating phylogenetic trees. Molecular Phylogenetics and Evolution 53(1): 320-328.

Longino, H. 2002. The Fate of Knowledge. Princeton University Press, Princeton, NJ.

Mitchell, S. 2003. Biological Complexity and Integrative Pluralism. Cambridge University Press, Cambridge.

de Queiroz, K. 1999. The general lineage concept of species and the defining properties of the species category. Pp. 49-89 in Species: New Interdisciplinary Essays (R.A. Wilson, ed.). MIT Press, Cambridge, MA.

Sukumaran, J., and L.L. Knowles. 2017. Multispecies coalescent delimits structure, not species. PNAS 114(7): 1607-1612.

Swofford, D.L. 2002. PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods). Sinauer Associates, Sunderland, MA.

Tabery, J. 2014. Pluralism, social action, and the causal space of human behavior. Metascience 23: 443-459.