This section explains the biological methods used to derive the genetic results of my study. Here, I describe the populations from which I obtained genomic data and the steps taken to derive those results. As a reminder, one of the major knowledge gaps addressed in my study is that the current literature fails to include 'Mexican populations' when discussing allele frequency (how common an LP-associated variant is among all copies of that DNA position in a population).
It is important to understand that whole genome sequencing (WGS) is a method scientists use to read a person's DNA. This allows scientists to examine genetic differences across populations.
KEY TAKEAWAYS
Genomic data from Mexican Mayan speakers from the Bigham Genomic Lab and genomic data for MXL, IBS, and FIN were used for SNP comparative analyses to answer the question: What is the genetic prevalence of LP/LNP in Mexican populations?
Allele frequencies were either retrieved from Ensembl or computed with PLINK 1.9.
To fill the knowledge gap concerning LP/LNP frequency in Mexican populations, I used genomic data from four populations. These were Mexican Mayan speakers from Mexico and three populations from the 1000 Genomes Project: Mexican Ancestry Populations Living in Los Angeles (MXL), Iberian Population in Spain (IBS), and Finnish (FIN). The Mexican Mayan-speaking genomic data came from the Bigham Laboratory at UCLA.
These data were used to test whether Mexican Mayan speakers, who have higher Indigenous ancestry, carry the same SNP variants at known LP/LNP markers as MXL individuals, who have mixed Spanish European and Indigenous ancestry. In other words, allele frequencies in Mexican Mayan speakers and MXL were compared to identify whether the two populations show similar LP/LNP variants and to look for patterns of inheritance.
The Iberian and Finnish populations were also compared against the Mexican Mayan-speaking and MXL populations because European populations have historically had higher rates of LP. Comparing these groups helped examine whether MXL and Mexican Mayan populations have genetic patterns similar to each other and/or to European populations.
Mayan Speakers
48 Mayan-speaking individuals (Tzeltal, Tzotzil, and Ch’ol)
recruited in Palenque, Chiapas, Mexico
MXL
64 Mexican individuals living in Los Angeles, California (from the 1000 Genomes Project)
IBS
107 individuals from the Iberian Population in Spain (from the 1000 Genomes Project)
FIN
99 Finnish individuals (from the 1000 Genomes Project)
In this study, Mexican Mayan-speaking genomic data at base positions 136350481 - 136313666 (the known MCM6 gene boundaries) were compared to the same positions in MXL, IBS, and FIN. It is important to note that the Mexican Mayan-speaking data set the framework for which SNPs were studied within this range. This is because the Mexican Mayan-speaking dataset contains only SNP (variant) data, rather than whole-genome sequence like the 1000 Genomes Project data available through the Ensembl online database.
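The region filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the SNP data are stored as a PLINK `.bim` variant table (the text does not say which file format was used), the helper names `parse_bim_line` and `in_region` are hypothetical, and the example variants are invented. The position bounds are the ones given in the text, and chromosome 2 is where MCM6 is located.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    variant_id: str
    pos: int


# Region bounds as stated in the text, ordered so that start <= end.
REGION_START = 136313666
REGION_END = 136350481


def parse_bim_line(line: str) -> Variant:
    """Parse one .bim line: chromosome, variant ID, cM, bp position, allele 1, allele 2."""
    fields = line.split()
    return Variant(chrom=fields[0], variant_id=fields[1], pos=int(fields[3]))


def in_region(v: Variant, chrom: str = "2",
              start: int = REGION_START, end: int = REGION_END) -> bool:
    """True if the variant lies on the given chromosome within [start, end]."""
    return v.chrom == chrom and start <= v.pos <= end


# Invented example lines (hypothetical IDs and positions, not study data).
example_lines = [
    "2\tsnp_a\t0\t136313000\tA\tG",  # chromosome 2, just before the region
    "2\tsnp_b\t0\t136320000\tC\tT",  # chromosome 2, inside the region
    "3\tsnp_c\t0\t136320000\tC\tT",  # same position, different chromosome
]

kept = [v.variant_id for v in map(parse_bim_line, example_lines) if in_region(v)]
print(kept)
```

Because the same bounds are applied to every population, the filter guarantees that the comparison covers an identical stretch of DNA in all four groups.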
To compare the prevalence of MCM6 variants in Mexican populations, I first derived the SNPs, their rsIDs, and their allele frequencies from the Mexican Mayan-speaking genomic data within the base-position range known to contain the MCM6 gene. Each SNP, such as those at positions −13910, −13915, −14010, and −22018, has a unique identifier called an 'rsID.' Scientists use rsIDs to track and compare specific genetic variants across studies. For example, the SNP at position −13910 has the rsID rs4988235. The rsID itself stays the same, but the genomic coordinate reported for a SNP can vary with the Genome Reference Consortium Human Build, that is, the assembly version of the human genome that was used. hg38 is the most recent and most widely used build, while hg19, used in Tishkoff and colleagues (2007), is an earlier assembly version. In this study, I used hg38 and the same rsIDs described by Ma and colleagues (2007): −14010:G>C (rs145946881), −13915*G (rs41380347), −13910:C>T (rs4988235), −13907*G (rs41525747), and −22018*A (rs182549) (Ma et al., 2007).
After deriving all of the SNPs within the MCM6 gene boundary for the Mexican Mayan-speaking genomic data, I derived the same SNPs for MXL, IBS, and FIN. The 1000 Genomes Project, through Ensembl, already lists allele frequencies for specific SNPs; however, this is not the case for the Mexican Mayan-speaking population. To derive the allele frequencies for the SNPs in this population, I used the bioinformatics program PLINK 1.9, a command-line tool for analyzing WGS and other genomic data.
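To make the idea of an allele frequency concrete, the sketch below shows the underlying arithmetic: each person carries two copies of a SNP position, so the frequency of an allele is the number of copies of that allele divided by twice the number of genotyped individuals. This is an illustration only, not the PLINK command used in the study (which is not given here; PLINK 1.9's `--freq` option produces a comparable per-SNP report). The function name `allele_frequency` is hypothetical and the genotype counts are invented.

```python
def allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Frequency of the alternate allele from genotype counts at one SNP.

    n_hom_ref: individuals with two reference copies
    n_het:     individuals with one reference and one alternate copy
    n_hom_alt: individuals with two alternate copies
    """
    n_individuals = n_hom_ref + n_het + n_hom_alt
    if n_individuals == 0:
        raise ValueError("no genotyped individuals at this SNP")
    alt_copies = 2 * n_hom_alt + n_het  # homozygotes carry two copies, heterozygotes one
    return alt_copies / (2 * n_individuals)


# Invented counts for illustration only (not study data):
# 30 + 15 + 5 = 50 individuals, so 100 allele copies, 25 of them alternate.
example = allele_frequency(n_hom_ref=30, n_het=15, n_hom_alt=5)
print(example)
```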
After the SNP allele frequencies for the Mexican Mayan-speaking, MXL, IBS, and FIN populations were derived, the allele frequencies of all SNPs were compared, with a special focus on the following rsIDs/SNPs: −14010:G>C (rs145946881), −13915*G (rs41380347), −13910:C>T (rs4988235), −13907*G (rs41525747), and −22018*A (rs182549) (Ma et al., 2007).
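One way to lay out this comparison is a small table with one row per focal marker and one column per population. The sketch below is a hypothetical illustration: the marker-to-rsID mapping comes from the list above, but the function names (`build_comparison`, `format_table`), the population label "Maya", and every frequency value are placeholders I introduce for the example. They are not results from this study or values from Ensembl.

```python
# Focal markers and rsIDs as listed in the text (Ma et al., 2007).
FOCAL_MARKERS = {
    "-14010:G>C": "rs145946881",
    "-13915*G": "rs41380347",
    "-13910:C>T": "rs4988235",
    "-13907*G": "rs41525747",
    "-22018*A": "rs182549",
}

# "Maya" is a shorthand label assumed here for the Mexican Mayan-speaking sample.
POPULATIONS = ("Maya", "MXL", "IBS", "FIN")


def build_comparison(freqs_by_pop, markers=FOCAL_MARKERS, pops=POPULATIONS):
    """Return rows of (marker label, rsID, [frequency per population]).

    A frequency is None when a population has no entry for that rsID.
    """
    rows = []
    for label, rsid in markers.items():
        freqs = [freqs_by_pop.get(pop, {}).get(rsid) for pop in pops]
        rows.append((label, rsid, freqs))
    return rows


def format_table(rows, pops=POPULATIONS):
    """Render the rows as tab-separated text, writing NA for missing values."""
    lines = ["\t".join(("marker", "rsID") + tuple(pops))]
    for label, rsid, freqs in rows:
        cells = ["NA" if f is None else f"{f:.3f}" for f in freqs]
        lines.append("\t".join([label, rsid] + cells))
    return "\n".join(lines)


# PLACEHOLDER frequencies, chosen to be obviously artificial (not real data).
placeholder_freqs = {
    "Maya": {"rs4988235": 0.1},
    "MXL": {"rs4988235": 0.2},
    "IBS": {"rs4988235": 0.3},
    "FIN": {"rs4988235": 0.4},
}

rows = build_comparison(placeholder_freqs)
print(format_table(rows))
```

Keeping missing values as NA rather than zero matters here: a SNP that is absent from a dataset is not the same as a SNP whose variant allele was never observed.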