Research

We sequenced and assembled the first draft genome of the plant Mesua ferrea

Mesua ferrea (Family: Calophyllaceae) is a tropical forest plant used for timber, biofuel, and traditional medicine. Colloquially, it is known as Nagkesar (Cobra saffron) and is the state flower of Tripura (India). You can read more about this in our paper "The genome sequence of Mesua ferrea and comparative demographic histories of forest trees".

Loss of the PLGRKT gene in Galliformes. Outline images of representative species from each of the five galliform families (i.e., Phasianidae, Odontophoridae, Numididae, Cracidae, and Megapodiidae) that have lost the PLGRKT gene are shown in the top row. Regions of the gene that are lost are demarcated by colored dotted lines. Exon 2 (shown by the brown dotted line) is missing in all galliform species, and exon 1 (shown by the red dotted line) is subsequently lost in Odontophoridae. In addition to the missing exon 1 and exon 2, exon 3 is truncated (shown by the orange dotted line) in Phasianidae and Numididae. A multiple sequence alignment of PLGRKT exon 3 and exon 4 amino acid sequences, depicting the exon 3 truncation in Phasianidae and Numididae species, is shown in the expanded view.

2. Gene Loss

Variation in gene content between species contributes to phenotypic changes. Such variation arises largely through gene duplication and gene loss events. We have demonstrated the presence of such gene loss events in birds and yeast. The loss of the PLGRKT gene in chicken and a few other birds was an interesting find in a vertebrate system. More details can be found in the paper "Evidence for the loss of plasminogen receptor KT gene in chicken".

Gene loss has also influenced human evolution and helped shape our species as it is today.

Bootstrapped PSMC results after masking of repeats in Populus trichocarpa. PSMC curves for Populus trichocarpa showing the robustness of the changes caused by masking of repeats. The masked (blue) and unmasked (orange-red) genomes show completely distinct trajectories, and unmasking only the LTR-Gypsy repeat class (pink) also produces a marked difference. The second y-axis (red) shows the coefficient of variation (CV) across the bootstraps for all the repeat classes. This indicates that the changes in Ne due to repeats are robust to bootstrap replication.

3. Computational biology methods evaluation

"We used the Populus trichocarpa genome (Pop_tri_v3) to show that masking of repeat regions leads to lower estimates of effective population size (Ne) in the distant past in contrast to an increase in Ne estimates in recent times. However, in human datasets, masking of repeats resulted in lower estimates of Ne at all time points. We demonstrate that repeats affect demographic inferences using diverse methods like PSMC, MSMC, SMC++, and the Stairway plot. Our genomic analysis revealed that the biases in Ne estimates were dependent on the repeat class type and its abundance in each atomic interval. Notably, we observed a weak, yet consistently significant negative correlation between the repeat abundance of an atomic interval and the Ne estimates for that interval, which potentially reflects the recombination rate variation within the genome.". More details can be found in the paper "Repetitive genomic regions and the inference of demographic history".

The evolutionary history of the COA1 gene in various clades. Different colours represent different events. Blue represents the functional COA1 gene. Pink represents COA1 located in an evolutionary breakpoint region (EBR). Green represents a duplication event of the COA1 gene, and red represents an independent loss of the COA1 gene.

4. Evolutionary history of the COA1/MITRAC15 gene

"Skeletal muscle fibers rely upon either oxidative phosphorylation or glycolytic pathway to achieve muscular contractions that power mechanical movements. Species with energy-intensive adaptive traits that require sudden bursts of energy have a greater dependency on fibers that use the glycolytic pathway. Glycolytic fibers have decreased reliance on OXPHOS and lower mitochondrial content compared to oxidative fibers. Hence, we hypothesized that adaptive gene loss might have occurred within the OXPHOS pathway in lineages that largely depend on glycolytic fibers. The protein encoded by the COA1/MITRAC15 gene with conserved orthologs found in budding yeast to humans promotes mitochondrial translation. We show that gene disrupting mutations have accumulated within the COA1/MITRAC15 gene in the cheetah, several species of galliforms, and rodents. The genomic region containing COA1/MITRAC15 is a well-established evolutionary breakpoint region in mammals. Careful inspection of genome assemblies of closely related species of rodents and marsupials suggests two independent COA1/MITRAC15 gene loss events co-occurring with chromosomal rearrangements. Besides recurrent gene loss events, we document changes in COA1/MITRAC15 exon structure in primates and felids. The detailed evolutionary history presented in this study reveals the intricate link between skeletal muscle fiber composition and dispensability of the chaperone-like role of the COA1/MITRAC15 gene.". More details can be found in the pre-print "Recurrent erosion of COA1/MITRAC15 demonstrates gene dispensability in oxidative phosphorylation".

Comparative demographic history of forest trees. PSMC-inferred Ne trajectories of 15 forest plant species with bootstrap support. Rectangles at the top show the periods of important predicted glaciation events. Betula pendula exhibits a highly discordant trajectory compared to all other species, whereas tropically distributed species share a common trend of decline during and after the Mid-Pleistocene glaciations. Some species, such as Faidherbia albida, were able to recover from these bottlenecks, which might reflect their adaptation to drier environments. However, most of the other plants did not recover.

5. Demographic history reconstruction

"We collate the genomic datasets of 14 additional forest tree species to compare the temporal dynamics of Effective Population Size (Ne) and find evidence of a substantial bottleneck in all tropical forest plants during Mid-Pleistocene glaciations..". More details can be found in the paper "The genome sequence of Mesua ferrea and comparative demographic histories of forest trees".

6. Population genetics of chicken (Gallus gallus)

Population structure analysis: (A) Geographical locations of the different BBC breeds used in this study are shown on the map using different shapes. The map was generated using the rworldmap, maps, and mapdata R packages. (B) Genome-wide principal component analysis reveals the genetic relationships among 34 BBC individuals. PC1 and PC2 explained 18.9% and 13% of the variance, respectively. (C) Population genetic structure and individual ancestry were estimated using NGSadmix for 34 BBCs from different breeds, based on the best K = 7.

"Black-bone chicken (BBC) meat is popular for its distinctive taste and texture. A complex chromosomal rearrangement at the fibromelanosis (Fm) locus on the 20th chromosome results in increased endothelin-3 (EDN3) gene expression and is responsible for melanin hyperpigmentation in BBC. We use public long-read sequencing data of the Silkie breed to resolve high-confidence haplotypes at the Fm locus spanning both Dup1 and Dup2 regions and establish that the Fm_2 scenario is correct of the three possible scenarios of the complex chromosomal rearrangement. The relationship between Chinese and Korean BBC breeds with Kadaknath native to India is underexplored. Our data from whole-genome re-sequencing establish that all BBC breeds, including Kadaknath, share the complex chromosomal rearrangement junctions at the fibromelanosis (Fm) locus. We also identify two Fm locus proximal regions (∼70 Kb and ∼300 Kb) with signatures of selection unique to Kadaknath. These regions harbor several genes with protein-coding changes, with the bactericidal/permeability-increasing-protein-like gene having two Kadaknath-specific changes within protein domains. Our results indicate that protein-coding changes in the bactericidal/permeability-increasing-protein-like gene hitchhiked with the Fm locus in Kadaknath due to close physical linkage. Identifying this Fm locus proximal selective sweep sheds light on the genetic distinctiveness of Kadaknath compared to other BBC.". More details can be found in the paper "Decoding the fibromelanosis locus complex chromosomal rearrangement of black-bone chicken: genetic differentiation, selective sweeps and protein-coding changes in Kadaknath chicken".

We analyze large genomic datasets to understand the patterns of evolution and the processes driving them, using computational approaches to obtain new insights.

How does the accumulation of genetic changes along the genome affect the phenotype? What is the role of changes in gene expression and the regulatory program in evolution? When during evolution did various novelties arise and diversify? The evolution of RNA modifications (such as splicing and editing) provides interesting new insights into the genotype-phenotype link. We are pursuing such questions using computational approaches, including generating new datasets.

In addition to this, we are focused on exploring new strategies and tools to analyze large-scale datasets (not necessarily genomic).