On "triangulation" in genome scans

Post date: Feb 7, 2014 3:52:55 PM

Cross-post from the Molecular Ecologist on Feb. 2, 2014

A major goal of evolutionary biology is to understand the genetic basis of adaptation to heterogeneous environments. Rapid advances in sequencing technology now make it possible to collect large amounts of sequence data (mostly in the form of single-nucleotide polymorphisms, or SNPs), giving us an unprecedented opportunity to address this question in non-model species at a genome scale.

A major challenge for genome scans is to determine whether patterns of genetic variation reflect selection or neutral processes such as genetic drift and demography.

In this post, I will introduce the concept of triangulation* in genome scans: the process of gathering more than one independent source of evidence for the inference of loci under selection. (Disclaimer: I’m thinking about long-lived, non-model organisms here, where recombinant inbred lines, gene knockouts, or complementation tests are not feasible.) Although recent reviews have highlighted the importance of integrating multiple types of data, analyses, and experiments to uncover the loci responsible for adaptation (Barrett and Hoekstra 2011, Scheinfeldt and Tishkoff 2013), relatively few studies have achieved this integration.

How can one plan a study such that genome-scan analyses can be considered independent?

First, let’s consider the two most common types of genome scans for SNPs in non-model organisms:

The FST outlier test: FST is a measure of genetic differentiation among populations. Outliers are loci whose allele frequencies differ more among populations than those of the rest of the genome, and which may therefore underlie adaptive differences among populations.
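
As a rough illustration of the idea, here is a minimal Python sketch. It assumes biallelic SNPs summarized as one allele frequency per population, uses Wright's simple variance-based FST, and flags loci above an empirical quantile of the genome-wide distribution. The function names, the locus labels, and the quantile cutoff are all hypothetical choices made for this example; dedicated outlier methods compare each locus against a neutral expectation rather than a raw quantile.

```python
from statistics import mean, pvariance


def wright_fst(freqs):
    """Wright's FST for one biallelic locus: var(p) / (p_bar * (1 - p_bar))."""
    p_bar = mean(freqs)
    if p_bar <= 0.0 or p_bar >= 1.0:
        return 0.0  # monomorphic locus: no differentiation to measure
    return pvariance(freqs) / (p_bar * (1.0 - p_bar))


def fst_outliers(loci, quantile=0.95):
    """Return (outlier names, per-locus FST) using an empirical quantile cutoff.

    `loci` maps a locus name to a list of per-population allele frequencies.
    The quantile cutoff is an assumption of this sketch, not a neutral model.
    """
    fst = {name: wright_fst(freqs) for name, freqs in loci.items()}
    ranked = sorted(fst.values())
    cutoff = ranked[int(quantile * (len(ranked) - 1))]
    outliers = [name for name, value in fst.items() if value > cutoff]
    return outliers, fst


# Hypothetical allele frequencies for four SNPs in three populations.
example_loci = {
    "snp_a": [0.50, 0.50, 0.50],
    "snp_b": [0.40, 0.50, 0.60],
    "snp_c": [0.45, 0.50, 0.55],
    "snp_d": [0.05, 0.50, 0.95],
}
outliers, fst_by_locus = fst_outliers(example_loci, quantile=0.75)
```

In this toy data set only the strongly differentiated locus exceeds the cutoff. With real data, the cutoff would flag a fixed fraction of loci whether or not any are under selection, which is one reason a neutral baseline matters.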

The Genetic-Environment Association (GEA): A measure of the correlation between allele frequencies (in populations or individuals) and an environmental axis, usually modeled with allele frequencies as the response variable and the environmental variable as the predictor.
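
A bare-bones version of a GEA can be sketched in Python as follows. It assumes one allele frequency per population and a single environmental value per population (for example, mean annual temperature), and it scores each locus by the Pearson correlation between the two, with the environment treated as the predictor. The function names and the data are hypothetical, and the sketch ignores shared population history, which dedicated GEA methods typically model.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one variable: correlation undefined, report 0
    return sxy / sqrt(sxx * syy)


def gea_scan(environment, loci):
    """Correlate each locus's population allele frequencies with the environment.

    `environment` holds one value per population (the predictor);
    `loci` maps a locus name to per-population allele frequencies (the response).
    """
    return {name: pearson_r(environment, freqs) for name, freqs in loci.items()}


# Hypothetical data: four populations along an environmental gradient.
environment = [1.0, 2.0, 3.0, 4.0]
example_loci = {
    "snp_clinal": [0.1, 0.3, 0.5, 0.7],
    "snp_flat": [0.5, 0.5, 0.5, 0.5],
}
associations = gea_scan(environment, example_loci)
```

Note that both this scan and an FST calculation start from the same population allele frequencies, which is worth keeping in mind when asking whether the two analyses are independent.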

Suppose individuals were sampled from heterogeneous environments across a landscape, and some SNPs were significant in both an FST outlier analysis and a GEA. Would we consider these SNPs to have two independent sources of evidence?
