NSF recommends our proposal to Test the Tests for funding

Post date: Jul 7, 2017 2:10:04 PM

Testing the tests: a predictive framework to guide genome scans for locally adapted traits

This research is in collaboration with Sam Yeaman at the University of Calgary and Matt Fitzpatrick at the University of Maryland.

Today, biologists can collect massive amounts of DNA sequence data from many species, including humans. This data has been used to analyze the genetic basis of species traits in thousands of diverse studies, with a particular focus on understanding traits that are adapted to the local environment. Current statistical methods for analyzing this data, known as genome scans, are limited because they are designed to detect only obvious patterns. However, mathematical models predict that more subtle, yet predictable, patterns will evolve for many traits that are common in nature. The genetic basis of these traits may not be detectable by widely used genome-scan methods. There are promising new approaches, however, that may be able to detect these more subtle patterns. This research project aims to “test the tests”: to evaluate genome-scan methods in a common framework against simulated data. Results will provide new insights into how to implement tests and summarize results so that researchers can more effectively study the genetic basis of species traits. Since genome scans have been widely applied in medicine, agriculture, and animal breeding, better application of these tests can lead to measurable improvements in human lives. To engage people at different levels of understanding in our research, we will develop a training and outreach program in Genomics, Evolution, Mathematical Modeling and Analysis (GEMMA).

We will develop a robust framework, grounded in quantitative genetic theory, to guide the creation of a novel set of simulated datasets spanning monogenic to highly polygenic architectures. In phase 1 of the project, we will ask how adding realism to the simulations affects the evolution of genetic architecture and the extent to which populations are adapted to their local environment. Then, we will examine the extent to which the results from univariate and multivariate genome-scan approaches (differentiation outlier tests, association tests, and haplotype-based tests) agree with one another and are accurate. More likely than not, no single method will be ideal for all architectures. Therefore, in phase 2 of the project, we will develop approaches for integrating signals from multiple tests to detect outliers in multivariate space, thereby leveraging the unique strengths of different methods. Because we study these evolutionary processes with a particular focus on the design, implementation, and interpretation of genome scans for polygenic traits, results from this research will allow more accurate characterization of the genetic variation responsible for locally adapted traits.
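
To make the idea of detecting outliers in multivariate space a little more concrete, here is a minimal sketch. It is an illustration only, not the method the project will develop: it assumes each locus has scores from just two tests, and it uses the squared Mahalanobis distance from the centroid as the combined signal, which is one common choice among many. The function names (`mahalanobis_2d`, `flag_outliers`), the threshold, and the scores themselves are hypothetical.

```python
from statistics import mean


def mahalanobis_2d(scores):
    """Squared Mahalanobis distance of each (x, y) pair from the centroid.

    `scores` holds one (x, y) tuple per locus, where x and y are
    hypothetical statistics from two different genome-scan tests.
    """
    n = len(scores)
    xs = [s[0] for s in scores]
    ys = [s[1] for s in scores]
    mx, my = mean(xs), mean(ys)

    # Entries of the 2x2 sample covariance matrix (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in scores) / (n - 1)

    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")

    # Apply the closed-form inverse of the 2x2 covariance matrix.
    distances = []
    for x, y in scores:
        dx, dy = x - mx, y - my
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def flag_outliers(scores, threshold):
    """Indices of loci whose squared distance exceeds `threshold`."""
    return [i for i, d2 in enumerate(mahalanobis_2d(scores)) if d2 > threshold]


# Made-up scores for nine loci; the last one is extreme on both tests.
scores = [
    (1.0, 1.2), (0.8, 0.9), (1.1, 1.0), (0.9, 1.1), (1.2, 1.3),
    (1.0, 0.8), (0.7, 1.0), (1.3, 1.1), (6.0, 5.5),
]
print(flag_outliers(scores, threshold=5.0))
```

Because the distance accounts for the correlation between the two statistics, a locus can stand out for being unusual in combination even when neither score alone is extreme. A real analysis would involve more than two tests, far more loci, and a principled way of choosing the threshold.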