Research

I am a bioinformatics researcher. I use computational tools (programs and software) to provide solutions to biological questions. My main focus is genomics. I have worked on various facets of genomics: gene regulation (including regulation mediated by small RNAs such as miRNA), population genetics etc. Below are brief descriptions of the various projects that I have worked on (and am currently involved in).

ECHO (Epigenetic Characterization and Observation)

I am the Lead Investigator in the multi-million dollar DARPA funded project called ECHO.

Our lab is focussed on generating and analyzing single cell DNA methylation data from blood cells of individuals who have been exposed to various chemical, biological, radioactive, nuclear or explosive agents. We use single cell (single nucleus) DNA methylation profiling (snmC-seq2) on various sub-types of PBMCs to measure DNA methylation within individual cells. The SAFE (Single cell Analysis for Forensic Epigenetics) project, headed by our lab in collaboration with the Greenleaf Lab at Stanford, will also generate snATAC-seq data. Together, these high-resolution, high-coverage datasets will enable the discovery of epigenetic signatures that can be used as probes in field-deployable devices to assess the agent of exposure and its extent.

For more information about the ECHO project, please visit items here and here.

Image: DARPA-ECHO logo

PHITE (Precision High Intensity Training through Epigenetics)

I am also the Lead Investigator in another multi-million dollar Department of Defense (Office of Naval Research) funded project called PHITE.

This program aims to assess the effect of high and moderate intensity physical training on the epigenetic profile of individuals. We perform base-resolution DNA methylation assays (methylC-seq) on blood and muscle samples from individuals who have undertaken either of the two exercise regimens. Time-course sampling provides a unique insight into DNA methylation dynamics.

For more information about the PHITE project, please visit items here and here.

Image Courtesy: UAB

Center of Excellence in Stem Cell Genomics

I play a lead role in the activities of the Center of Excellence in Stem Cell Genomics at the Salk Institute. In 2014, the California Institute of Regenerative Medicine (CIRM) established (through a $40 million award) two nodal centers (at Stanford and Salk) to initiate collaborative research with stem cell researchers in California. Incidentally, the co-directors are Prof. Mike Snyder (my post-doc advisor) and Prof. Joe Ecker (my current boss)..!

We collaborate with four of the seven applicants selected for this program: (1) Prof. Gay Crooks at UCLA to identify and overcome transcriptome barriers to generating hematopoietic stem cells from pluripotent stem cells, (2) Prof. Guoping Fan at UCLA for genomic analysis of stem cell differentiation in human overgrowth syndrome, (3) Prof. Kelly Frazer at UCSD for a population-wide study of the functional genomics of drug-induced electrophysical phenotypes in human cardiomyocytes and (4) Prof. Benoit Bruneau at UCSF to study the epigenomics of human cardiac differentiation and congenital heart disease.

For more information about Salk Center of Excellence in Stem Cell Genomics, please visit Salk CESCG website here.

For more information about California's stem cell program, and other interesting information about stem cells and medical applications, please click here.

DNA Methylation and Transcriptome Profiling (part of ENCODE3)

At the Ecker lab, where I have been working since Oct 2012, we study the widespread effect of DNA methylation on gene regulation. DNA methylation is a specific chemical modification in which a methyl group is added at the 5th carbon of the normal "C" (cytosine) of DNA. It is a normal biological phenomenon mediated by enzymes and is stably inherited. It is also one of the major "epigenetic" modifications. For more info on epigenetics, please click here.

Below are three areas where I am involved in the effort to understand the various complexities of gene regulation mediated by DNA methylation and other factors:

a) Exploring the role of epigenetic factors in gene regulation dynamics during mouse development. This project is part of the ENCODE project (phase 3) funded by the NHGRI. Human and mouse closely resemble each other during early developmental stages, both morphologically and at the molecular level (fig 1). Several severe birth defects in humans [like orofacial clefts (which include clefts of the lip and/or palate) and congenital heart defects] manifest at these early stages of development and have a concurrent timeline in mouse. Since it is difficult to obtain tissue samples from human fetuses, mouse tissues serve as surrogates to study the various molecular trajectories involved during development. We, along with collaborators at UCSD and the Lawrence Berkeley National Laboratory, are looking at the various transcriptomic and epigenetic changes during development that in turn control the overall molecular aspects of gene regulation. We are building connections across these datasets that would eventually reveal the cornerstones that govern normal development from an embryo to an adult.

b) Studying the role of DNA methylation in gene expression and the long-term effects of neuropsychiatric disorders in a schizophrenia mouse model. This is research work in progress and I am not able to expand on this topic at this stage on a public webpage. Please contact me if you would like to learn more.

c) Elucidating the role of small RNA in embryonic stem cells and their derived cell types. This is research work in progress and I am not able to expand on this topic at this stage on a public webpage. Please contact me if you would like to learn more.

Figure 1

Image courtesy: Prof. Bing Ren, UCSD

Functional Annotation of the Human Genome (part of ENCODE2)

I was first exposed to the ENCODE project, or the ENCyclopedia Of Dna Elements, during my tenure as a postdoc at Michael Snyder's lab at Stanford University (and Yale). This is an international consortium project that aims to fully annotate (decode the functional aspects of) the human genome.

Various regions of the genome have specific roles: some serve as master templates for the synthesis of proteins (gene regions), some are regions where controlling proteins can bind (transcription factor binding sites), and some are tightly folded or chemically modified so as to permanently prevent genes from being turned on (methylation and chromatin modification).

As described in the central dogma, the first step in the synthesis of proteins (the actual functional molecules, like enzymes and antibodies) is the copying of information from DNA to RNA (called transcription). This is known as gene expression: a gene can be activated or repressed based on which transcription factors bind upstream of the transcription start site (TSS) of the gene. The figure on the right describes this process.

The members of the ENCODE project have conducted several high-throughput experiments in multiple tissues to study this intricate mechanism. Some groups focus on identifying the sites where the transcription factors bind to DNA, others perform experiments to quantify levels of RNA produced and so on.

The Snyder lab specializes in both experimental and computational analysis; my focus is the computational analysis of the data. The nice thing is that I get to see the cool things first..!!

As I mentioned above, gene regulation is mainly mediated by proteins called transcription factors (TFs). Most of these proteins have a defined role: some function as activators, some as repressors, some as insulators. They also have positional preferences: some bind close to a gene's transcription start site, or TSS (the promoter region), while some prefer to act from a distance (like 5000 or 10000 bases away from TSSs; these regions are called enhancers). Figure 1a is a pictorial representation of the process.

One specific interest of mine was the combinatorial interaction of transcription factors. Although it was known that TFs interact with each other, this was the first time that one got a chance to do a systematic analysis (thanks to the falling cost of sequencing and smart experiments).

Our contribution was threefold:

    • Identify transcription factors that form complexes thereby describing co-operativity and competition.

    • Study how the combinatorial interaction of transcription factors can result in regulation of gene expression.

    • Describe the interaction of transcription factors in the light of protein-protein interactions and in a network topology.

Figure 2a

Nature Reviews Genetics 5, 276-287

Effect of Variation in the Regulatory Region

As an off-shoot of the ENCODE project described above, we integrate the knowledge of regulatory regions and the distribution of polymorphisms within individuals.

Every individual genome is unique (as if you didn't know that..). But most of us are very similar. In fact, there is 99.1% similarity between our genome and that of a chimpanzee (so don't be amazed that some people show non-human behavior once in a while). Anyway, even modest differences in the genome sequence can have a pretty large effect on the individual, depending on where the variations are. Variations can be a single base change (an A changing to a T or a C changing to a G), a few bases inserted or deleted (e.g., an ATCAG gone missing or a CAGACC getting inserted) or even a few hundred bases missing. The first two kinds are small variants, known as Single Nucleotide Variants (SNVs, or SNPs) and insertions-deletions (indels) respectively, while the last is called a Structural Variant (SV) or a Copy Number Variant (CNV). More about variation can be read here.
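The kinds of variation described above can be told apart simply by comparing the reference and alternate alleles. Here is a minimal Python sketch of that idea. The function name `classify_variant` and the 50-base cutoff for calling something a structural variant are my assumptions for illustration (the text only says "a few hundred bases"), and alleles are written VCF-style with one shared leading base.

```python
def classify_variant(ref: str, alt: str, sv_min_len: int = 50) -> str:
    """Classify a variant from its reference and alternate alleles.

    sv_min_len is an assumed cutoff for illustration, not a value
    taken from any specific project.
    """
    if len(ref) == 1 and len(alt) == 1:
        # One base swapped for another, e.g. A -> T
        return "SNV" if ref != alt else "no change"
    size = abs(len(ref) - len(alt))
    if size >= sv_min_len:
        # Large gain or loss of sequence
        return "structural variant"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    # Same length but more than one base long
    return "multi-base substitution"


# Examples mirroring the text (the leading "G" is a hypothetical anchor base)
print(classify_variant("A", "T"))             # single base change
print(classify_variant("GATCAG", "G"))        # ATCAG gone missing
print(classify_variant("G", "GCAGACC"))       # CAGACC inserted
print(classify_variant("G" + "A" * 300, "G"))  # a few hundred bases missing
```

Real variant callers handle many more cases (inversions, translocations, duplications), so treat this only as a way to make the vocabulary concrete.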

Now, imagine if these variants were in a very critical region. You would suppose that they would have a detrimental effect. Yes, they do. We specifically look at variants in the regions where transcription factors can bind. Since the binding of TFs governs the on- or off-state of a gene (to a large extent), these sites are of great significance to gene regulation. See a pictorial representation in the right panel (Figure 2a).

In one of the more recent projects I am involved in, we annotate every possible variant in the human genome based on its presence or absence in a regulatory region.

The method offers the following advantages:

    • Real-time computation.

    • A scoring scheme for variants in terms of their potential to affect regulation.
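To give a feel for how a variant can be checked against regulatory regions quickly, here is a small Python sketch. It is not our actual method: the region coordinates and labels are made up, the function names `build_index` and `annotate` are hypothetical, and it assumes half-open, 0-based, non-overlapping regions. It only shows the lookup step (a binary search per variant); the scoring scheme is not sketched here.

```python
from bisect import bisect_right


def build_index(regions):
    """Group (chrom, start, end, label) regions by chromosome, sorted by start.

    Assumes half-open [start, end), 0-based, non-overlapping regions.
    """
    by_chrom = {}
    for chrom, start, end, label in regions:
        by_chrom.setdefault(chrom, []).append((start, end, label))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        # Keep the start positions separately for binary search
        index[chrom] = ([s for s, _, _ in intervals], intervals)
    return index


def annotate(index, chrom, pos):
    """Return the label of the regulatory region containing pos, or None."""
    starts, intervals = index.get(chrom, ([], []))
    i = bisect_right(starts, pos) - 1  # last region starting at or before pos
    if i >= 0:
        start, end, label = intervals[i]
        if start <= pos < end:
            return label
    return None


# Hypothetical regions, for illustration only
regions = [
    ("chr1", 100, 200, "TF_binding_site"),
    ("chr1", 500, 650, "enhancer"),
]
index = build_index(regions)
print(annotate(index, "chr1", 150))  # inside the first region
print(annotate(index, "chr1", 300))  # between regions
```

Because each lookup is a binary search over sorted start positions, the cost per variant stays small even with a large number of regions, which is the kind of property that makes on-the-fly annotation feasible.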

I joined Mike Snyder's lab when an interesting project was going on. Two other postdocs, Maya and Fabian, had been performing experiments to identify the binding sites of two key factors, NFkB and PolII. The project was designed to identify the effect of large structural variants, sometimes referred to as Copy Number Variants (CNVs). These include inversions, deletions, duplications etc., involving hundreds of nucleotides.

This project specifically looked at the binding sites of NFkB and PolII in ten different individuals and one chimpanzee. Close to 25% of the PolII sites and 7.5% of the NFkB sites exhibited significant binding differences among individuals, while the binding differences between the humans and the chimpanzee were about 32 percent for PolII. Many of these binding differences were due to large regions of the genome having gone missing or been translocated.

We could also find small variants in the binding sites that could result in the loss of binding of the transcription factor.

And what more? We performed experiments to measure the RNA produced from the genes (the result of transcription) and found that the variation that resulted in the loss of transcription factor binding in turn resulted in a lack of production of the transcript.

First personal multi-omic Project (Snyderome)

More recently, I was involved in a project where a volunteer in the lab (the PI, Michael Snyder) offered his own blood so we could perform a multi-omics study. This project aimed at analyzing multiple facets of an individual: his genome, proteome, transcriptome, metabolome and autoantibodiome. The project is all the more significant because it was the first time that anybody had attempted to study these characteristics over several months (14 months). During this period, he acquired two viral infections (common cold - HRV and RSV). Although the genomic content is static all through life, one's gene expression profile (and the resulting protein content) constantly changes (for example, after a meal, several genes get turned on and different proteins are made to metabolize the food), and so do one's antibodies during and after an infection. So the project was aimed at characterizing these dynamic shifts in the individual and also at revealing more basic biology (especially the effect of variations in both the genic and regulatory regions). A summary of the findings can be pictorially represented as a "Circos" plot, shown as Figure 3a.

A bonus of the project was that Mike was able to detect the onset of diabetes during the course of this study. Regular biochemical assays were performed at every sample collection. After the second infection, he had surprisingly high levels of blood glucose. This knowledge led him to alter his lifestyle considerably, and he was able to control the diabetes quite well.

As always, I was interested to see how the SNPs at the regulatory regions affect the transcription factor binding. We did find lots of examples, but two of them are shown in Figure 3b.

I am also involved in other projects which have similar objectives; I shall post more details later.

Figure 3a

Figure 3b (From our Science paper Vol. 328 no. 5975 pp. 232-235)

Figure 3c: Figure 3d:

(From our Cell paper 148, 1293-1307)

microRNAs and their Regulatory Potential

MicroRNAs (miRNAs or miRs) are small non-coding RNAs, i.e., they do not code for a protein. They are a recently discovered class of molecules that play a significant role in post-transcriptional gene regulation (i.e., they act on mRNAs that have already been transcribed). This process is generally termed RNA interference (or RNAi). In 2006, Andrew Fire and Craig Mello shared the Nobel Prize in Physiology or Medicine for their work on RNAi, which was published in 1998.

miRNAs are ~17-25 nt long when "mature". They are processed from longer strands of RNA which have a hairpin like structure. Figure 4a summarizes the long and complicated process which is mediated by several enzymes.

Once they are "exported" to the cytoplasm, they are free to interact with other mature transcripts (mRNAs) that are ready for protein synthesis (translation). The miRNAs generally bind to the 3' UTRs (UnTranslated Regions) of the mRNAs. They do so through incomplete complementarity to sequences in the mRNA. The end result is a reduced efficiency of the transcript in producing proteins. This could be due to degradation of the mRNA (because of decapping or deadenylation), proteolysis of the polypeptide chain, an initiation block or an elongation block (ribosome fall-off). I guess people are still working out the exact mechanism of miRNA-mediated RNAi. Anyway, suffice it to say that miRNAs can efficiently reduce protein production.

Two key points to note are

  • A particular miRNA can target several different mRNAs (multiplicity)

  • A particular transcript can be targeted by several different miRNAs (cooperativity)
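The two points above can be made concrete with a toy Python sketch. It looks for sites in a 3' UTR that are complementary to a short stretch of the miRNA. Matching only a 7-base "seed" (positions 2-8 of the miRNA) is a common simplification that I am assuming here, not something the real prediction tools stop at, and all sequences and names below (`mir_a`, `utr_1`, etc.) are made up for illustration.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str, seed_start: int = 1, seed_len: int = 7):
    """Return 0-based UTR positions that pair perfectly with the miRNA seed.

    The seed is assumed to be miRNA positions 2-8 (indices 1..7); the rest
    of the miRNA is ignored, which is what "incomplete" pairing means here.
    """
    seed = mirna[seed_start:seed_start + seed_len]
    site = reverse_complement(seed)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


# Made-up sequences, not real miRNAs or transcripts
mirnas = {
    "mir_a": "AACGGUUCCAAGGUUACGUAGC",
    "mir_b": "UUGCAUAGGCAUCCGAUUAGCA",
}
utrs = {
    "utr_1": "CCAUGAACCGUAAGGCUAUGCAUU",
    "utr_2": "GGGAACCGUCC",
}

# Build the many-to-many map: which miRNA hits which transcript
targets = {
    name: [u for u, seq in utrs.items() if seed_sites(mirna, seq)]
    for name, mirna in mirnas.items()
}
print(targets)
```

In this toy data, `mir_a` hits both UTRs (multiplicity), while `utr_1` carries sites for both miRNAs (cooperativity).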

More details on miRNAs can be found here.

When we first set out to explore the marvels of miRNA-mediated gene regulation, we stumbled on the fact that human microRNAs could actually target the genes of HIV. Logically, this is a very plausible scenario: the viral transcripts are naked in the cytoplasm, accessible to the entire repertoire of molecules that the human cell produces. miRNAs, being one such class of molecules, can find sites of incomplete complementarity, attach and do their magic: put the brakes on the proteins synthesized from the viral transcript..! This was a total hit. Using multiple computational tools designed to identify miRNA targets, we found that five human miRNAs (hsa-miR-29a, hsa-miR-29b, hsa-miR-149, hsa-miR-378 and hsa-miR-324-5p) could target the HIV RNA - nef, vpr and vpu. This discovery was followed by experimental validation and is currently under trials for clinical development. An interesting story based on this discovery was published in The Telegraph (Calcutta). You can read it here. A defective nef (Negative EFfector of replication) gene is associated with long-term non-progressors of AIDS. We infer that if human miRNAs can also reduce Nef levels, this could contribute to non-progression to AIDS, even if the individual is infected with HIV (Fig 4b).

This discovery also led to Indian and US patents.

FIGURE 4a: Biogenesis and Action of miRNAs

FIGURE 4b: Human miRNA can target HIV transcripts

(in part from our BBRC paper 337(4): 1214-8)

Intronic microRNAs and their Regulatory Potential

Why does nature produce intronic miRNAs? Do they have any specific advantage in gene regulation? These were the questions that I addressed for most of my Ph.D. research.

To introduce some terminology (jargon), we refer to the gene that harbors the miRNA in its intron as a "source gene" or a "host gene".

The key idea of the project is that:

a set of functionally related genes can be silenced in a phased manner by miRNAs released from the source gene, because the release of the miRNA is coordinated and synchronized with the expression of the source gene (Fig 5a).

Fig 5b is a wordle that summarizes my Ph.D work.

Some of the tools that we developed are here:

FIGURE 5a: Significance of intronic miRNAs

FIGURE 5b: Summary of my Ph.D. thesis

This page will be constantly updated.. do visit often .. Thanks..!