I am a bioinformatics researcher. I use computational tools (programs and software) to provide solutions to biological questions. My main focus is genomics. I have worked on various facets of genomics: gene regulation (including regulation mediated by small RNAs such as miRNAs), population genetics, etc. Below are brief descriptions of the various projects that I have worked on (and am currently involved in).
1. Epigenetics and Gene Expression: For the past several years I have been studying the role of epigenetic modifications, especially the effect of DNA methylation on gene regulation in human and mouse tissues. Using base-resolution bisulfite sequencing, we discovered several tissue-specific methylation signatures that are correlated with gene expression. In the NIH-Epigenome project, we correlated gene expression profiles with DNA methylation changes across major organs of four individuals, revealing novel modes of gene regulation. As part of the ENCODE-3 project, I took a lead role in the transcriptome analysis of mouse tissues sampled at various fetal developmental time points that correspond to the manifestation of human birth defects, to explore such mechanisms during early development. Using multifactorial analysis, I was able to discern gene expression patterns in time and space and link these to the epigenetic changes. We anticipate these studies will open new avenues to understanding the causes of human birth defects at the molecular level.
In another project called Epigenetic Characterization and Observation (ECHO), funded by DARPA, I focus on identifying DNA methylation-based biomarkers in the immune cell types of PBMCs that respond to exposure to various chemical and biological agents. We do so by profiling methylation levels within single cells of each cell type and identifying differentially methylated regions. As part of the Human Performance Alliance funded by the Wu-Tsai foundation, we performed single-nucleus methylation sequencing on three major skeletal muscle tissues of control and exercised male and female mice to identify differentially methylated sites and genes.
a. W Wang, M Hariharan (co-first author), W Ding,… Ecker JR (2025) – bioRxiv (second revision at Nature Genetics) Genetics and Environment Distinctively Shape the Human Immune Cell Epigenome
b. M Hariharan, S Patel, H Song,.. Ecker JR (2025) – bioRxiv DNA Methylation Dynamics of Dose-dependent Acute Exercise, Training Adaptation, and Detraining.
c. He Y, Hariharan M, .. Visel A, Pennachio L, Ren B, Ecker JR. (2020). Spatiotemporal DNA methylome dynamics of the developing mammalian fetus. Nature, 583(7818), 752-759.
d. Schultz MD, He Y, Whitaker JW, Hariharan M, .. Ren B, Sejnowski TJ, Wang W, Ecker JR. Human body epigenome maps reveal noncanonical DNA methylation variation. Nature. 2015 Jul 9;523(7559):212-6.
2. Mitochondrial Biology: I developed a new project to accurately assess mitochondrial heteroplasmy. Mitochondrial DNA (mtDNA) tends to accumulate somatic variations. The balance of wild-type versus mutant mtDNA within cells is a major factor underlying neurological and other disorders, cancers, differentiation and aging. Each cell contains hundreds to thousands of mtDNA molecules. We attach molecular tags to each molecule within single cells, amplify the tagged molecules using a PCR-free approach and perform full-length sequencing of mtDNA. Preliminary experiments have been completed successfully and we are in the later stages of optimizing the method. We envision that these studies will open new horizons in the prevention, diagnosis and treatment of the numerous mitochondria-related disorders that manifest especially during aging (such as Alzheimer’s disease).
3. The Human Gene Regulatory Code: For my post-doctoral research, as part of the NHGRI-funded ENCODE2 project, I analyzed ChIP-seq datasets of 119 transcription factors (TFs) in multiple cell lines, which revealed the combinatorial interaction of the TFs. This led to an in-depth understanding of various context-dependent influences of TFs on gene expression and served as the basis for developing a regulatory code. We used the Random Forest algorithm (an ensemble machine-learning method) to uncover these effects. The regulatory code was able to identify several novel cascades of gene regulation, including new master TFs and their hierarchical structure. Related to this, I also assessed the impact of copy number and SNVs at regulatory sites on gene expression.
a. Gerstein MB, Kundaje A, Hariharan M (co-first author), .. Batzoglou S, Sidow A, Farnham PJ, Myers RM, Weissman SM, Snyder M. Architecture of the human regulatory network derived from ENCODE data. Nature. 2012 Sep 6;489(7414):91-100.
b. Kasowski M, Grubert F, Heffelfinger C, Hariharan M, .. Weissman SM, Gerstein MB, Korbel JO, Snyder M. Variation in transcription factor binding among humans. Science. 2010 Apr 9;328(5975):232-5.
4. Personal Genomes for Precision Medicine: As part of the ENCODE2 project, we also developed a web-server, RegulomeDB, to annotate newly discovered variants and enable researchers to evaluate their effects, especially in the non-coding regions of the genome. It is being widely used in the community, including for Genome Wide Association Studies. This server enables the functional annotation of SNVs and larger variations discovered through whole genome sequencing. During my postdoctoral research, I also played a major role in the first multi-omics profiling of a human. This research provided a clear insight into the altered patterns of gene expression, metabolites, antibodies and other molecular components at high resolution in a longitudinal study. This study also included sample collection during periods of common viral infections.
a. Boyle AP, Hong EL, Hariharan M, .. Cherry JM, Snyder M. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012 Sep;22(9):1790-7.
b. Chen R, Mias GI, Li-Pook-Than J, Jiang L, Lam HY, Chen R, Miriami E, Karczewski KJ, Hariharan M, .. Altman RB, Butte AJ, Ashley EA, Gerstein M, Nadeau KC, Tang H, Snyder M. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell. 2012 Mar 16;148(6):1293-307.
5. Stem Cell Biology: I have worked on assessing the efficiency of reprogramming using various methods based on epigenetic and transcriptomic profiles. In collaboration with Dr. Shoukhrat Mitalipov, I evaluated the quality of human embryos generated using two methods, somatic cell nuclear transfer and polar body transfer, by comparing the transcriptome profiles of the cell types. These provide the basis for future implementation of these techniques for infertility treatment and also other stem cell-based treatments. I am also the team leader of the Salk Center of Excellence in Stem Cell Genomics funded by the California Institute of Regenerative Medicine (CIRM), where I collaborate with several stem cell researchers in California on specific stem cell projects.
a. Ma H, Morey R, O'Neil RC, He Y, Daughtry B, Schultz MD, Hariharan M, .. Ecker JR, Laurent LC, Mitalipov S. Abnormalities in human pluripotent cells due to reprogramming mechanisms. Nature. 2014 Jul 10;511(7508):177-83.
b. Ma H, O'Neil RC, Marti Gutierrez N, Hariharan M, .. Ecker JR, Mitalipov S. Functional Human Oocytes Generated by Transfer of Polar Body Genomes. Cell Stem Cell. 2017 Jan 5;20(1):112-119.
6. microRNA-mediated Gene Regulation in Humans: I have been working in the field of gene regulation for the past 14 years, and for much of that time I focused on the role of microRNAs (miRNAs) in post-transcriptional gene silencing. We were the first to discover human miRNA targets in HIV and H5N1 influenza virus (patents granted and filed, respectively). This miRNA is currently being tested as a biomarker based on its abundance in long-term non-progressors of AIDS. I have also pioneered the development of several tools for miRNA research, such as miRacle for target predictions, miRex for miRNA expression meta-analysis, a database of consensus miRNA targets (TargetmiR) and a web-server for analysis of the effect of SNPs on miRNA targets (dbSMR). These accomplishments were transformative to the field of miRNA biology. Based on my findings on the role of the intronic miRNA hsa-mir-25 in cell cycle regulation, we used mathematical modeling to improve an existing gene regulatory model of the human cell cycle pathway. This model now provides accurate simulations of the cell cycle as seen in cancers or other perturbations.
a. Gokhale S, Hariharan M, Brahmachari SK, Gadgil C. A simple method for incorporating dynamic effects of intronic miRNA mediated regulation. Mol Biosyst. 2012 Aug;8(8):2145-52.
c. Hariharan M, Scaria V, Brahmachari SK. dbSMR: a novel resource of genome-wide SNPs affecting microRNA mediated regulation. BMC Bioinformatics. 2009 Apr 16;10:108.
d. Hariharan M, Scaria V, Pillai B, Brahmachari SK. Targets for human encoded microRNAs in HIV genes. Biochem Biophys Res Commun. 2005 Dec 2;337(4):1214-8.
7. The Indian Genome Variation Project: Along with my Ph.D. research, I was also actively involved in the first pan-India genome variation analysis project. I was a key member of the data analysis team, and I developed an online sample tracking system called Nimbus. Annotating various SNPs across the discovery panel of samples was a major activity. I was also involved in the development of consent forms, questionnaires and sample management, apart from computational analysis and interpretation of the genotyping data.
a. Indian Genome Variation Consortium. (2005) The Indian Genome Variation database (IGVdb): a project overview. Hum Genet. 1-11.
I am the Lead Investigator in the multi-million dollar DARPA funded project called ECHO.
Our lab is focussed on generating and analyzing single cell DNA methylation data from blood cells of individuals who have been exposed to various chemical, biological, radioactive, nuclear or explosive agents. We employ single cell (single nucleus) resolution DNA methylation profiling (snmC-seq2) of various sub-types of PBMCs to measure DNA methylation within individual cells. The SAFE (Single cell Analysis for Forensic Epigenetics) project, headed by our lab in collaboration with the Greenleaf Lab at Stanford, will also generate snATAC-seq data. Together these high-resolution, high-coverage datasets will enable the discovery of epigenetic signatures that would be used as probes for field-deployable devices to assess the agent of exposure and its extent.
For more information about the ECHO project, please visit items here and here.
Image: DARPA-ECHO logo
I am also the Lead Investigator in another multi-million dollar Department of Defense (Office of Naval Research) funded project called PHITE.
This program aims to assess the effect of high and moderate physical training on the epigenetic profile of individuals. We perform base-resolution DNA methylation assay (methylC-seq) on blood and muscle samples from individuals who have undertaken either of the two exercise regimens. Time-course based sampling provides a unique insight into the DNA methylation dynamics.
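As a rough illustration of what "base-resolution" methylation dynamics means in practice: in bisulfite sequencing, the methylation level at a cytosine is usually summarized as the fraction of reads covering that site that still read as methylated. The sketch below (in Python) computes that fraction and its change relative to a baseline time point. The function names, the time-point labels and the read counts are all made up for illustration; this is not the PHITE analysis pipeline.

```python
def methylation_level(methylated_reads: int, total_reads: int) -> float:
    """Fraction of reads at one cytosine that support methylation."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= methylated_reads <= total_reads:
        raise ValueError("methylated_reads must lie between 0 and total_reads")
    return methylated_reads / total_reads


def methylation_change(
    counts_by_timepoint: dict[str, tuple[int, int]], baseline: str
) -> dict[str, float]:
    """Change in methylation level at one site relative to a baseline time point.

    counts_by_timepoint maps a time-point label to
    (methylated_reads, total_reads) at the same cytosine.
    """
    base = methylation_level(*counts_by_timepoint[baseline])
    return {
        timepoint: methylation_level(methylated, total) - base
        for timepoint, (methylated, total) in counts_by_timepoint.items()
    }


# Hypothetical read counts at a single site across three made-up time points.
site_counts = {"pre": (18, 20), "post": (12, 20), "recovery": (15, 20)}
changes = methylation_change(site_counts, baseline="pre")
for timepoint, delta in changes.items():
    print(f"{timepoint}: {delta:+.2f}")
```

A real analysis would work on millions of sites and would test for statistical significance rather than just subtracting fractions; the point here is only to show what quantity is being tracked over time.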
For more information about the PHITE project, please visit items here and here.
Image Courtesy: UAB
I play a lead role in the activities of the Center of Excellence in Stem Cell Genomics at the Salk Institute. In 2014, the California Institute of Regenerative Medicine (CIRM) established (through a $40 million award) two nodal centers (at Stanford and Salk) to initiate collaborative research with stem cell researchers in California. Incidentally, the co-directors are Prof. Mike Snyder (my post-doc advisor) and Prof. Joe Ecker (my current boss)..!
We collaborate with four of the seven applicants selected for this program: (1) Prof. Gay Crooks at UCLA to identify and overcome transcriptome barriers to generating hematopoietic stem cells from pluripotent stem cells, (2) Prof. Guoping Fan at UCLA for genomic analysis of stem cell differentiation in human overgrowth syndrome, (3) Prof. Kelly Frazer at UCSD for a population-wide study of the functional genomics of drug-induced electrophysical phenotypes in human cardiomyocytes and (4) Prof. Benoit Bruneau at UCSF to study the epigenomics of human cardiac differentiation and congenital heart disease.
For more information about the Salk Center of Excellence in Stem Cell Genomics, please visit the Salk CESCG website here.
For more information about California's stem cell program, along with plenty of interesting information about stem cells and their medical applications, please click here.
At the Ecker lab, where I have been working since Oct 2012, we study the widespread effect of DNA methylation on gene regulation. This is a specific chemical modification where a methyl group is added at the 5th carbon of the normal "C" (cytosine) of DNA. It is a normal biological phenomenon mediated by enzymes and is stably inherited. It is also one of the major "epigenetic" modifications. For more info on epigenetics, please click here.
Below are the areas where I am involved in the effort to understand the various complexities of gene regulation mediated by DNA methylation and other factors:
a) Exploring the role of epigenetic factors in gene regulation dynamics during mouse development - this project is part of the ENCODE (phase 3) funded by the NHGRI. Human and mouse have a close resemblance during early developmental stages – morphologically and at the molecular level (fig 1). Several severe birth defects in humans [like orofacial clefts (which includes clefts of the lip and/or palate) and congenital heart defects] manifest at these early stages of development and have a concurrent timeline in mouse. Since it is difficult to obtain tissue samples from human fetuses, mouse tissues serve as surrogates to study the various molecular trajectories involved during development. We, along with collaborators at UCSD and the Lawrence Berkeley National Laboratory, are looking at various transcriptomic and epigenetic changes during development that in turn control the overall molecular aspects of gene regulation. We are building connections using these datasets that would eventually reveal cornerstones that govern normal development of an embryo to an adult.
b) Studying the role of DNA methylation in gene expression and the long-term effects of neuropsychiatric disorders in a schizophrenia mouse model. This research is a work in progress and I am not able to expand on this topic at this stage on a public webpage. Please contact me if you would like to learn more.
c) Elucidating the role of small RNA in embryonic stem cells and their derived cell types. This research is a work in progress and I am not able to expand on this topic at this stage on a public webpage. Please contact me if you would like to learn more.
Figure 1
Image courtesy: Prof. Bing Ren, UCSD
I was first exposed to the ENCODE project, or the ENCyclopedia Of Dna Elements, during my tenure as a postdoc at Michael Snyder's lab at Stanford University (and Yale). This is an international consortium project that aims to fully annotate the Human Genome (that is, to decode its functional aspects).
Various regions of the genome have specific roles: serving as master templates for the synthesis of proteins (gene regions), regions where controlling proteins can bind (transcription factor binding sites), and regions that are tightly folded or chemically modified so as to permanently disable the turning on of genes (methylation and chromatin modification).
As described in the central dogma, the first step in the synthesis of proteins (actual functional molecules like enzymes and antibodies) is the copying of the information from DNA to RNA (called transcription). This is known as gene expression - a gene can be activated or repressed based on what transcription factors bind upstream to the transcription start site (TSS) of the gene. The figure on the right describes this process.
The members of the ENCODE project have conducted several high-throughput experiments in multiple tissues to study this intricate mechanism. Some groups focus on identifying the sites where the transcription factors bind to DNA, others perform experiments to quantify levels of RNA produced and so on.
The Snyder lab specializes in both experimental and computational analysis; my focus is the computational analysis of the data. The nice thing is that I get to see the cool things first..!!
As I mentioned above, gene regulation is mainly mediated by proteins called transcription factors (TFs). Most of these proteins have a defined role: some function as activators, some as repressors, some as insulators. They also have positional preferences: some bind close to a gene's transcription start site, or TSS (the promoter region), while some prefer to act from a distance (like 5000 or 10000 bases away from TSSs), at sites called enhancers. Figure 1a is a pictorial representation of the process.
One specific interest of mine is the combinatorial interaction of transcription factors. Although it is known that TFs interact with each other, this is the first time that one gets a chance to do a systematic analysis (thanks to the falling cost of sequencing and smart experiments).
Our contribution was threefold:
Identify transcription factors that form complexes thereby describing co-operativity and competition.
Study how the combinatorial interaction of transcription factors can result in regulation of gene expression.
Describe the interaction of transcription factors in the light of protein-protein interactions and in a network topology.
Figure 2a
Nature Reviews Genetics 5, 276-287
As an off-shoot of the ENCODE project described above, we integrate the knowledge of regulatory regions and the distribution of polymorphisms within individuals.
Every individual genome is unique (as if you didn't know that..). But we all share a very large similarity. In fact, there is 99.1% similarity between our genome and that of a chimpanzee (so don't be amazed when some people show more of a non-human behavior once in a while). Anyway, even modest differences in the genome sequence can have a pretty large effect on the individual, depending on where the variations are. Speaking of variations, they could be a single base change (an A changing to a T or a C changing to a G), a few bases inserted or deleted (e.g., an ATCAG gone missing or a CAGACC getting inserted) or even a few hundred bases missing. The first two kinds (SNPs and Insertions-Deletions) fall into a class of variations that are referred to as Single Nucleotide Variants or SNVs, while the latter is called a Structural Variant (SV) or a Copy Number Variant (CNV). More about variation can be read here.
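To make the three kinds of variation a little more concrete, here is a small Python sketch that sorts a variant into one of them by comparing the reference and alternate alleles, written the way variant files usually anchor insertions and deletions on the preceding base. The function name and the 50-base cutoff for calling something "structural" are assumptions made for this illustration (the text above only says "a few hundred bases"); different tools draw that line in different places.

```python
def classify_variant(ref: str, alt: str, sv_min_length: int = 50) -> str:
    """Classify a variant from its reference and alternate alleles.

    sv_min_length is an assumed cutoff (in bases) above which a length
    change is reported as a structural variant; it is illustrative only.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("reference and alternate alleles are identical")

    if len(ref) == 1 and len(alt) == 1:
        return "single base change"

    length_change = abs(len(ref) - len(alt))
    if length_change >= sv_min_length:
        return "structural variant"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "multi-base substitution"


# The examples from the paragraph above, anchored on one preceding base.
print(classify_variant("A", "T"))          # an A changing to a T
print(classify_variant("G", "GCAGACC"))    # CAGACC getting inserted
print(classify_variant("TATCAG", "T"))     # ATCAG gone missing
print(classify_variant("A" * 301, "A"))    # a few hundred bases missing
```

The anchoring bases ("G" and "T" in the insertion and deletion examples) are arbitrary placeholders, not taken from any real genomic position.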
Now, imagine if these variants were in a very critical region - you would suppose that they would have a detrimental effect. Yes, they do. We specifically look at variants in the regions where transcription factors can bind. Since the binding of the TFs governs the on- or off- states of a gene (to a large extent), these sites are of great significance to gene regulation. See a pictorial representation in the right panel (Figure 2a).
In one of the more recent projects I am involved in, we annotate every possible variant in the human genome based on its presence or absence in a regulatory region.
The method offers the following advantages:
Real-time computation.
A scoring scheme for variants in terms of their potential to affect regulation.
I joined Mike Snyder's lab when an interesting project was going on. Two other postdocs - Maya and Fabian - had been performing experiments to identify binding sites of two key transcription factors, NFkB and PolII. The project was designed to identify the effect of large structural variants, sometimes referred to as Copy Number Variants (CNVs). These include inversions, deletions, duplications etc., each involving hundreds of nucleotides.
This project specifically looked at the binding sites of NFkB and PolII in ten different individuals and one chimpanzee. Close to 25% of the PolII sites and 7.5% of the NFkB sites exhibited significant binding differences among individuals, while the binding differences between the humans and the chimpanzee were about 32 percent for PolII. Many of these binding differences were due to large regions of the genome gone missing or translocated.
We could also find small variants in the binding sites that could result in the loss of binding of the transcription factor.
And what more? We performed experiments to measure the RNA produced from the genes (the result of transcription) and found that the variation that resulted in the loss of transcription factor binding in turn resulted in the lack of production of the transcript.
More recently, I was involved in a project where a volunteer in the lab (the PI, Michael Snyder) offered his own blood so we could perform a multi-omics study. This project aimed at analyzing multiple facets of an individual - his genome, proteome, transcriptome, metabolome and autoantibodiome. This project is all the more significant because it was the first time that anybody had attempted to study these characteristics across several months (14 months). During this period, he acquired two viral infections (common cold - HRV and RSV). Although the genomic content is static all through life, one's gene expression profile (and resulting protein content) constantly changes (for example, after a meal, several genes get turned on and different proteins are made to metabolize the food), and so do your antibodies during and after an infection. So the project was aimed at characterizing these dynamic shifts in the individual and also at revealing more basic biology (especially on the effect of variations in both the genic and regulatory regions). A summary of the findings can be pictorially represented as a "Circos" plot, shown as Figure 3a.
A bonus of the project was that Mike was able to detect the onset of diabetes during the course of this study. Regular biochemical assays were performed during every sample collection. After the second infection, he had surprisingly high levels of blood glucose. With this knowledge he altered his lifestyle considerably and was able to control the diabetes quite well.
As always, I was interested to see how the SNPs at the regulatory regions affect the transcription factor binding. We did find lots of examples, but two of them are shown in Figure 3b.
I am also involved in other projects which have similar objectives; I shall post more details later.
Figure 3a
Figure 3b (From our Science paper Vol. 328 no. 5975 pp. 232-235)
Figure 3c: Figure 3d:
(From our Cell paper 148, 1293-1307)
MicroRNAs or miRNAs or miRs are small non-coding RNAs, i.e., they do not code for a protein. These are a recently discovered class of molecules that play a significant role in post-transcriptional gene regulation (i.e., they act on mRNAs that have already been transcribed). This process is generally termed RNA interference (or RNAi). Andrew Fire and Craig Mello shared the 2006 Nobel Prize for their work on RNAi, which was published in 1998.
miRNAs are ~17-25 nt long when "mature". They are processed from longer strands of RNA which have a hairpin like structure. Figure 4a summarizes the long and complicated process which is mediated by several enzymes.
Once they are "exported" to the cytoplasm, they are free to interact with the other mature transcripts (mRNA) that are ready for protein synthesis (translation). The miRNAs generally bind to the 3' UTRs (UnTranslated Regions) of the mRNAs. They do so by incomplete complementarity to the sequences in the mRNA. The end result is the reduced efficiency of the transcript to produce proteins. This could be due to degradation of the mRNA (because of decapping or deadenylation), proteolysis of the polypeptide chain, an initiation block or an elongation block (ribosome fall-off). I guess people are still working on finding out the exact mechanism of RNAi mediated by miRNAs. Anyway, suffice it to say that miRNAs can efficiently mediate a reduction in protein production.
Two key points to note are
A particular miRNA can target several different mRNAs (multiplicity)
A particular transcript can be targeted by several different miRNAs (cooperativity)
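To give a feel for how such binding sites can be searched for computationally, here is a minimal Python sketch of a "seed match" scan, one common simplification used in target prediction: take nucleotides 2-8 of the miRNA (the seed), and look for its reverse complement in the 3' UTR. Real prediction tools weigh much more than this (pairing beyond the seed, site accessibility, conservation), so treat it as a toy. The function names and both sequences below are made up for illustration and do not correspond to any real miRNA or gene.

```python
def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A-U and G-C pairs only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def find_seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions in the UTR that pair with the miRNA seed.

    The seed is taken as nucleotides 2-8 of the miRNA (a common convention,
    assumed here). DNA-style "T" in the UTR is read as "U".
    """
    seed = mirna.upper()[1:8]
    site = reverse_complement_rna(seed)
    utr = utr.upper().replace("T", "U")

    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


# Toy sequences, invented for this example.
toy_mirna = "UACGGAUCCAUGCAAGUCGAUU"
toy_utr = "CCGAUCCGUAAAGGGAUCCGUCC"
print(find_seed_sites(toy_mirna, toy_utr))
```

Running the same scan with one miRNA against many UTRs illustrates multiplicity, and running many miRNAs against one UTR illustrates cooperativity.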
More details on miRNAs can be found here.
When we first set out to explore the marvels of miRNA mediated gene regulation, we stumbled on the fact that human microRNAs could actually target the genes of HIV. Logically, this is a very plausible scenario - the viral transcripts are naked in the cytoplasm, accessible to the entire repertoire of molecules that the human cell produces. miRNAs, being among those molecules, can find sites of incomplete complementarity, attach and do their magic - put brakes on the proteins synthesized from the viral transcript..! This was a total hit - using multiple computational tools designed to identify miRNA targets, we found that five human miRNAs - hsa-miR-29a, hsa-miR-29b, hsa-miR-149, hsa-miR-378 and hsa-miR-324-5p - could target the HIV RNA - nef, vpr and vpu. This discovery was followed by experimental validation and is currently under trials for clinical development. An interesting story based on this discovery was published in The Telegraph (Calcutta). You can read it here. A defective nef (Negative EFfector of replication) gene is associated with long term non-progressors of AIDS. We infer that if the human miRNAs can also cause reduced Nef levels, this could contribute to non-progression to AIDS, even if the individual is infected with HIV (Fig 4b).
This discovery also led to Indian and US patents.
FIGURE 4a: Biogenesis and Action of miRNAs
FIGURE 4b: Human miRNA can target HIV transcripts
(in part from our BBRC paper 337(4): 1214-8)
Why does nature produce intronic miRNAs? Do they have any specific advantage in gene regulation? These were questions that I addressed for the most part of my Ph.D. research.
To introduce some terminology (jargon), we refer to the gene that harbors the miRNA in its intron as a "source gene" or a "host gene".
The key idea of the project is that:
a set of functionally related genes can be silenced in a phased manner by miRNAs released from the source gene; coordinated and synchronized regulation of miRNA release with expression of the source gene. (Fig 5a).
Fig 5b is a wordle that summarizes my Ph.D work.
Some of the tools that we developed are here:
FIGURE 5a: Significance of intronic miRNAs
FIGURE 5b: Summary of my Ph.D. thesis
This page will be constantly updated.. do visit often .. Thanks..!