Post date: Jan 27, 2020 9:56:21 AM
Tasks done:
- Learning more about Bayesian statistics:
https://www.youtube.com/watch?v=XepXtl9YKwc&list=PL6MQD2QFTurIrzvdRjC1knbofkNiuxryB&index=1
- Read "Discriminant analysis of principal components: a new method for the analysis of genetically structured populations"
Jombart et al. 2010
Summary:
This paper presents a new method to infer the genetic structure of a population. Most existing approaches use Bayesian clustering methods such as STRUCTURE and BAPS, which identify genetic clusters under an explicit population genetics model. Because they estimate a large number of parameters, they can require a lot of computational time on large datasets. PCA has been suggested as an alternative to Bayesian clustering algorithms. Its main assets: it can identify genetic structure in very large datasets in negligible computational time, and it makes no assumption about the underlying genetic model. However, PCA summarizes the total variability, which mixes within-group and between-group variability, whereas to assess the relationships between clusters the method should focus on between-group variability only. This is the rationale of Discriminant Analysis (DA). DA has serious limitations when applied to genetic data: the number of variables has to be smaller than the number of individuals, and the analysis is hampered by correlations between variables. It is thus hard to apply directly to genetic data. The method they present retains the assets of DA without these limitations: DAPC = Discriminant Analysis of Principal Components. DAPC first transforms the data with a PCA, which ensures that the variables passed to DA are perfectly uncorrelated and that there are fewer of them than individuals. K-means clustering is used to define the groups when they are not known in advance. The implementation is available in the adegenet package in R.
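To make the two-step idea concrete for myself, here is a minimal sketch in Python with numpy. It is not the adegenet implementation: the function name `dapc_sketch`, the simulated allele counts and the choice of 5 retained PCs are all my own assumptions for illustration. Groups are assumed to be known here, so the K-means step is left out.

```python
import numpy as np

def dapc_sketch(X, groups, n_pcs):
    """Hypothetical helper: PCA, then discriminant analysis on the PC scores."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)

    # Step 1: PCA via SVD of the centred data. The retained PC scores are
    # mutually uncorrelated, and n_pcs is chosen smaller than the number
    # of individuals.
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    pc_scores = U[:, :n_pcs] * s[:n_pcs]

    # Step 2: DA on the PC scores. Build within-group (W) and
    # between-group (B) scatter matrices.
    labels = np.unique(groups)
    W = np.zeros((n_pcs, n_pcs))
    B = np.zeros((n_pcs, n_pcs))
    for g in labels:
        Z = pc_scores[groups == g]
        m = Z.mean(axis=0)
        W += (Z - m).T @ (Z - m)
        # The grand mean of the PC scores is zero because the data were centred.
        B += len(Z) * np.outer(m, m)

    # Solve B a = lambda W a through a Cholesky factor of W, which keeps
    # the eigenproblem symmetric.
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    evals, evecs = np.linalg.eigh(L_inv @ B @ L_inv.T)
    n_da = min(len(labels) - 1, n_pcs)
    top = np.argsort(evals)[::-1][:n_da]
    axes = L_inv.T @ evecs[:, top]
    return pc_scores, pc_scores @ axes

# Simulated data (assumption): 2 groups, 30 individuals each, 200 biallelic
# loci coded as allele counts 0/1/2, so there are more variables than individuals.
rng = np.random.default_rng(0)
n_per_group, n_loci = 30, 200
freq_a = rng.uniform(0.1, 0.9, n_loci)
freq_b = np.clip(freq_a + rng.normal(0, 0.25, n_loci), 0.05, 0.95)
X = np.vstack([
    rng.binomial(2, freq_a, (n_per_group, n_loci)),
    rng.binomial(2, freq_b, (n_per_group, n_loci)),
])
groups = np.repeat([0, 1], n_per_group)

pc_scores, disc = dapc_sketch(X, groups, n_pcs=5)
print(disc.shape)  # one discriminant axis for two groups
```

Running DA directly on the 200 loci would fail here because the within-group scatter matrix would be singular (60 individuals, 200 variables); the PCA step is what makes the DA solvable.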