The Taino Genome Project

In the wake of the progress made by the 1000 Genomes Project, with the whole-genome sequences of dozens of individuals publicly available to download and analyze, it is now possible to use information about genetic variation in human populations to study the population processes and evolutionary history of people living in different environmental conditions, with different histories of migration, admixture, and selection for climate, food, resistance to infectious diseases, and other local adaptations. Characterizing human population diversity in the context of recent evolutionary history will lead to a deeper interpretation of the biomedically relevant DNA variation brought to light by major human genome reference projects such as 1000 Genomes and others that are quickly emerging today thanks to the concerted effort of the international scientific community.

Reconstructing evolutionary history has long been an essential objective of evolutionary biology, and some effort has already been devoted to reconstructing the ancestral genomes of species. More recently, whole-genome sequencing has enabled comparative analysis of genomic data and computational inference of the structural composition, function, age, and origins of different genomic sequences. However, reconstructing the genome of a population that has been extinct for several centuries through the admixture process is a relatively novel idea. Among the obvious candidates for such reconstruction is the genome of the Tainos, the pre-Columbian population of Puerto Rico. The Tainos were widely believed to be extinct until recent research showed that traces of Amerindian genomes are still present in the genomes of the island's current inhabitants (Martinez-Cruzado et al. 2001; 2005). These trace genomes can be recovered if the segments of Taino origin within the genomes of the modern population can be identified.

The origins of the native Puerto Ricans are linked to two distinct groups of peoples. The first, usually referred to as the Arcaicos, settled on the island as far back as 3,000-4,000 BC. Later, the population was complemented by a second wave of migrants, probably from the Orinoco basin in South America. Whether through competition, cooperation, or admixture between the two groups, the Taino culture was fully established by the 10th century and survived until the arrival of Columbus on the island in November of 1493. In the time that followed, the Tainos were decimated by violence, suicide, hunger, migration, and disease, especially smallpox, which killed half of their population in 1519. During centuries of Spanish dominance, the island received several waves of European settlers, mainly from the Mediterranean (Spain, the Canary Islands, France, and Italy), and many thousands of Sub-Saharan African slaves were brought to do forced labor. As the three racial groups interacted in Puerto Rico, their genomes became an admixture, an amalgamation of genomes from the three main sources: African, European, and Amerindian (Taino). While the average native ancestry of the nuclear genome across the island is estimated at between 16 and 23%, the maternal ancestry of this highly mixed population is 61.3% Amerindian (Burchard et al., 2005; Tang et al., 2007; Martinez-Cruzado et al., 2005). Even today, although the segments of the Taino genomes are highly fragmented and dispersed by generations of admixture and recombination, some individuals may possess up to 50% Amerindian ancestry (Martinez-Cruzado, unpublished data).

We propose a project in which the genomes of modern Puerto Ricans would be used to reconstruct the historic sequence and variation of the genomes of the pre-Columbian inhabitants of the island. By applying the admixture mapping approach commonly used for individual ancestry estimates, it is possible to identify the chromosomal fragments associated with each of the three origins. Once the Taino fragments are identified, they can be assembled into a reference sequence, ideally covering the length of the entire genome. Using this sequence, a detailed map of the differences between the Taino genome and the genomes of other populations can be constructed, including the locations and frequencies of indels, copy number polymorphisms, and population-specific SNPs. This information will prove crucial in the study of historic local adaptations over the six millennia of human habitation in the Caribbean, including adaptations to local conditions and diseases prevalent before the arrival of European settlers. Finally, the availability of these data will strengthen studies targeting diseases with high incidence in Puerto Ricans (asthma, eczema, and high blood pressure).
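
The fragment-identification and assembly steps described above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline. It assumes that a local-ancestry method has already assigned one ancestry label to each marker along each haplotype, and the label names ("AFR", "EUR", "AMR"), the function names, and the toy positions are all hypothetical. The sketch collapses per-marker labels into contiguous tracts, then merges the Amerindian tracts from several haplotypes to show how much of a chromosome they jointly cover.

```python
from itertools import groupby


def ancestry_tracts(positions, labels):
    """Collapse per-marker ancestry labels into (start, end, label) tracts."""
    tracts = []
    idx = 0
    for label, run in groupby(labels):
        n = len(list(run))
        # A tract spans from the first to the last marker of the run.
        tracts.append((positions[idx], positions[idx + n - 1], label))
        idx += n
    return tracts


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def native_coverage(haplotypes, target="AMR"):
    """Union of target-ancestry tracts across all haplotypes (hypothetical label)."""
    intervals = []
    for positions, labels in haplotypes:
        for start, end, label in ancestry_tracts(positions, labels):
            if label == target:
                intervals.append((start, end))
    return merge_intervals(intervals)


# Toy example: two haplotypes typed at the same six marker positions.
positions = [100, 200, 300, 400, 500, 600]
hap_a = (positions, ["EUR", "AMR", "AMR", "AFR", "AFR", "EUR"])
hap_b = (positions, ["AFR", "AFR", "AMR", "AMR", "EUR", "EUR"])
covered = native_coverage([hap_a, hap_b])
```

In this toy case the two haplotypes carry overlapping Amerindian tracts, so their union covers a longer stretch than either haplotype alone. That is the intuition behind pooling many modern genomes to approach coverage of the whole genome.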