PANDA

(Passing Attributes between Networks for Data Assimilation)

Implementations and Resources:

PANDA is available in multiple programming languages. Current versions, together with tutorials, are available through netZoo. A GPU-optimized version of the algorithm is also available through gpuZoo.


In addition, some "PANDA-ready" motif mappings are available in the Resources section of this webpage.

Method Papers:

Passing Attributes between Biological Networks to Refine Predicted Interactions

Regulatory network reconstruction is a fundamental problem in computational biology. There are significant limitations to such reconstruction using individual datasets, and increasingly people attempt to construct networks using multiple, independent datasets obtained from complementary sources, but methods for such integration are lacking. We developed PANDA (Passing Attributes between Networks for Data Assimilation), a message-passing model using multiple sources of information to predict regulatory relationships, and used it to integrate protein-protein interaction, gene expression, and sequence motif data to reconstruct genome-wide, condition-specific regulatory networks in yeast as a model. The resulting networks were not only more accurate than those produced using individual data sets and other existing methods, but they also captured information regarding specific biological mechanisms and pathways that were missed using other methodologies. PANDA is scalable to higher eukaryotes, applicable to specific tissue or cell type data and conceptually generalizable to include a variety of regulatory, interaction, expression, and other genome-scale data.
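
A minimal NumPy sketch of the message-passing loop is shown below. It is modelled on our reading of the published matrix formulation and of the netZooPy implementation, but it is simplified and is not the reference code: the function names are hypothetical, normalization is reduced to averaged row and column z-scores, and missing-value handling is omitted. A TF-by-gene motif prior (W), a TF-by-TF protein interaction matrix (P) and a gene-by-gene coexpression matrix (C) are reconciled by repeatedly computing "responsibility" (from P) and "availability" (from C) and blending them into W with update rate `alpha`.

```python
import numpy as np


def normalize(x):
    """Simplified PANDA-style normalization: mean of row- and column-wise z-scores."""
    def zscore(a, axis):
        sd = a.std(axis=axis, keepdims=True)
        sd[sd == 0] = 1.0  # assumption: constant rows/columns become 0 instead of NaN
        return (a - a.mean(axis=axis, keepdims=True)) / sd
    return (zscore(x, 0) + zscore(x, 1)) / np.sqrt(2)


def tanimoto(x, y):
    """Continuous Tanimoto similarity between each row of x and each column of y."""
    xy = x @ y
    row_sq = np.square(x).sum(axis=1, keepdims=True)  # shape (rows of x, 1)
    col_sq = np.square(y).sum(axis=0, keepdims=True)  # shape (1, columns of y)
    return xy / np.sqrt(row_sq + col_sq - np.abs(xy))


def reset_diagonal(m, num, alpha, step):
    """Overwrite the diagonal with a self-similarity that grows with each step."""
    off_diag = m.copy()
    np.fill_diagonal(off_diag, np.nan)
    np.fill_diagonal(m, np.nanstd(off_diag, axis=1) * num * np.exp(2 * alpha * step))
    return m


def panda_sketch(motif, ppi, expression, alpha=0.1, threshold=1e-3, max_iter=500):
    """Simplified PANDA loop; returns (TF x gene network, iterations used)."""
    n_tf, n_gene = motif.shape
    w = normalize(motif.astype(float))      # TF x gene prior (motif hits)
    p = normalize(ppi.astype(float))        # TF x TF cooperativity (PPI)
    c = normalize(np.corrcoef(expression))  # gene x gene coexpression; expression is genes x samples
    for step in range(max_iter):
        # Responsibility (from TF partners) and availability (from coexpressed genes)
        w_new = 0.5 * (tanimoto(p, w) + tanimoto(w, c))
        hamming = np.abs(w - w_new).mean()
        w = (1 - alpha) * w + alpha * w_new
        if hamming < threshold:
            return w, step + 1
        # Pass messages back: update the TF-TF and gene-gene networks from w
        p = (1 - alpha) * p + alpha * reset_diagonal(tanimoto(w, w.T), n_tf, alpha, step)
        c = (1 - alpha) * c + alpha * reset_diagonal(tanimoto(w.T, w), n_gene, alpha, step)
    return w, max_iter
```

In this sketch the growing diagonal is what drives convergence: as self-similarity dominates P and C, responsibility and availability approach the current W and the update step shrinks. For real analyses, use the netZoo packages listed above.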


High Performance Computing of Gene Regulatory Networks using a Message-Passing Model (also on the arXiv)

Gene regulatory network reconstruction is a fundamental problem in computational biology. We recently developed an algorithm, called PANDA (Passing Attributes Between Networks for Data Assimilation), that integrates multiple sources of 'omics data and estimates regulatory network models. This approach was initially implemented in the C++ programming language and has since been applied to a number of biological systems. In our current research we are beginning to expand the algorithm to incorporate larger and more diverse data sets, to reconstruct networks that contain increasing numbers of elements, and to build not only single network models but sets of networks. To accomplish these "Big Data" applications, it has become critical that we increase the computational efficiency of the PANDA implementation. In this paper we show how to recast PANDA's similarity equations as matrix operations. This allows us to implement a highly readable version of the algorithm using the MATLAB/Octave programming language. We find that the resulting M-code is much shorter (103 compared to 1128 lines) and more easily modifiable for potential future applications. The new implementation also runs significantly faster, with increasing efficiency as the network models increase in size. Tests comparing the C++ and M-code versions of PANDA demonstrate that this speed-up is on the order of 20-80 times for networks of similar dimensions to those we find in current biological applications.
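
The gain from recasting the similarity equations as matrix operations is easiest to see side by side. The sketch below uses Python/NumPy rather than the paper's MATLAB/Octave, and its function names are hypothetical. It computes the continuous Tanimoto-style similarity used in the PANDA update (as we understand its matrix form) twice: once with explicit loops over every output element and inner index, and once with a single matrix product plus two reductions.

```python
import numpy as np


def similarity_loops(x, y):
    """Reference version: one explicit loop per output element and per inner index."""
    n_rows, n_inner = x.shape
    n_cols = y.shape[1]
    out = np.empty((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            dot = sum(x[i, k] * y[k, j] for k in range(n_inner))
            sq_x = sum(x[i, k] ** 2 for k in range(n_inner))
            sq_y = sum(y[k, j] ** 2 for k in range(n_inner))
            out[i, j] = dot / np.sqrt(sq_x + sq_y - abs(dot))
    return out


def similarity_matrix(x, y):
    """Vectorized version: the same quantity from whole-matrix operations."""
    xy = x @ y                                      # all dot products at once
    sq_x = np.square(x).sum(axis=1, keepdims=True)  # column vector of squared row norms
    sq_y = np.square(y).sum(axis=0, keepdims=True)  # row vector of squared column norms
    return xy / np.sqrt(sq_x + sq_y - np.abs(xy))   # broadcasting fills the whole grid


rng = np.random.default_rng(1)
ppi = rng.normal(size=(4, 4))
motif = rng.normal(size=(4, 6))
print(np.allclose(similarity_loops(ppi, motif), similarity_matrix(ppi, motif)))  # True
```

The vectorized form hands the inner loops to optimized linear-algebra routines, which is the kind of restructuring that makes the shorter code also run faster.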


PyPanda: a Python Package for Gene Regulatory Network Reconstruction (also on the arXiv)

Summary: PANDA (Passing Attributes between Networks for Data Assimilation) is a gene regulatory network inference method that uses message-passing to integrate multiple sources of ‘omics data. PANDA was originally coded in C++. In this application note we describe PyPanda, the Python version of PANDA. PyPanda runs considerably faster than the C++ version and includes additional features for network analysis. Availability: The open source PyPanda Python package is freely available at https://github.com/davidvi/pypanda. Contact: mkuijjer at jimmy dot harvard dot edu or d.g.p.van ijzendoorn at lumc dot nl 


Estimating Gene Regulatory Networks with pandaR

PANDA (Passing Attributes between Networks for Data Assimilation) is a gene regulatory network inference method that begins with a model of transcription factor-target gene interactions and uses message passing to update the network model given available transcriptomic and protein-protein interaction data. PANDA is used to estimate networks for each experimental group and the network models are then compared between groups to explore transcriptional processes that distinguish the groups. We present pandaR (bioconductor.org/packages/pandaR), a Bioconductor package that implements PANDA and provides a framework for exploratory data analysis on gene regulatory networks.
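
Comparing networks between groups often starts from simple summaries. The sketch below assumes two TF-by-gene NumPy arrays of PANDA edge weights over the same TFs and genes (one per group). It computes edge-level differences, the change in each TF's out-degree and each gene's targeting (both as sums of edge weights), and lists the largest edge changes. The function name and these particular summaries are illustrative choices, not the pandaR API.

```python
import numpy as np


def compare_networks(net_a, net_b, tf_names, gene_names, top=3):
    """Summarize differences between two TF x gene networks (group A minus group B)."""
    diff = net_a - net_b                   # edge-level differences
    tf_degree_diff = diff.sum(axis=1)      # change in out-degree per TF
    gene_targeting_diff = diff.sum(axis=0) # change in targeting (in-degree) per gene
    # Indices of the largest absolute edge changes in the flattened matrix
    order = np.argsort(-np.abs(diff), axis=None)[:top]
    rows, cols = np.unravel_index(order, diff.shape)
    top_edges = [(tf_names[i], gene_names[j], float(diff[i, j])) for i, j in zip(rows, cols)]
    return {
        "tf_degree_diff": dict(zip(tf_names, tf_degree_diff)),
        "gene_targeting_diff": dict(zip(gene_names, gene_targeting_diff)),
        "top_edges": top_edges,
    }
```

Summed edge weights are only one way to condense a network; they make it easy to see which regulators gain or lose targeting between groups even when their own expression barely changes.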

Application Papers:

Understanding Tissue-specific Gene Regulation (also on the bioRxiv) (Resources)

Although all human tissues carry out common processes, tissues are distinguished by gene expression patterns, implying that distinct regulatory programs control tissue-specificity. In this study, we investigate gene expression and regulation across 38 tissues profiled in the Genotype-Tissue Expression project. We find that network edges (transcription factor to target gene connections) have higher tissue-specificity than network nodes (genes) and that regulating nodes (transcription factors) are less likely to be expressed in a tissue-specific manner as compared to their targets (genes). Gene set enrichment analysis of network targeting also indicates that regulation of tissue-specific function is largely independent of transcription factor expression. In addition, tissue-specific genes are not highly targeted in their corresponding tissue network. However, they assume bottleneck positions due to changes in transcription factor targeting and the influence of non-canonical regulatory interactions. These results suggest that tissue-specificity is driven by the creation of new regulatory paths, providing transcriptional control of tissue-specific processes.


Regulatory network changes between cell lines and their tissues of origin (also on the bioRxiv)

BACKGROUND: Cell lines are an indispensable tool in biomedical research and are often used as surrogates for tissues. Although there are recognized important cellular and transcriptomic differences between cell lines and tissues, a systematic overview of the differences between the regulatory processes of a cell line and those of its tissue of origin has not been conducted. The RNA-Seq data generated by the GTEx project is the first available data resource in which it is possible to perform a large-scale transcriptional and regulatory network analysis comparing cell lines with their tissues of origin. RESULTS: We compared 127 paired Epstein-Barr virus transformed lymphoblastoid cell lines (LCLs) and whole blood samples, and 244 paired primary fibroblast cell lines and skin samples. While gene expression analysis confirms that these cell lines carry the expression signatures of their primary tissues, albeit at reduced levels, network analysis indicates that expression changes are the cumulative result of many previously unreported alterations in transcription factor (TF) regulation. More specifically, cell cycle genes are over-expressed in cell lines compared to primary tissues, and this alteration in expression is a result of less repressive TF targeting. We confirmed these regulatory changes for four TFs, including SMAD5, using independent ChIP-seq data from ENCODE. CONCLUSIONS: Our results provide novel insights into the regulatory mechanisms controlling the expression differences between cell lines and tissues. The strong changes in TF regulation that we observe suggest that network changes, in addition to transcriptional levels, should be considered when using cell lines as models for tissues.


Differential Connectivity of Gene Regulatory Networks Distinguishes Corticosteroid Response in Asthma

BACKGROUND: Variations in drug response between individuals have prevented us from achieving high drug efficacy in treating many complex diseases, including asthma. Genetics plays an important role in accounting for such inter-individual variations in drug response. However, systematic approaches for addressing how genetic factors and their regulators determine variations in drug response in asthma treatment are lacking. METHODS: We used PANDA (Passing Attributes between Networks for Data Assimilation) to construct the gene regulatory networks associated with good responders and poor responders to inhaled corticosteroids based on a subset of 145 Caucasian asthmatic children who participated in the Childhood Asthma Management Program (CAMP). PANDA utilizes gene expression profiles and published relationships among genes, transcription factors (TFs), and proteins to construct the directed networks of TFs and genes. We assessed the differential connectivity between the gene regulatory network of good responders and that of poor responders. RESULTS: When compared to poor responders, the network of good responders has differential connectivity and distinct ontologies (e.g., pro-apoptosis enriched in the network of good responders and anti-apoptosis enriched in the network of poor responders). Many of the key hubs identified in conjunction with clinical response are also cellular response hubs. Functional validation demonstrated abrogation of differences in corticosteroid-treated cell viability following siRNA knockdown of two TFs and differential downstream expression between good responders and poor responders. CONCLUSIONS: We have identified and validated multiple transcription factors influencing asthma treatment response. Our results show that differential connectivity analysis can provide new insights into the heterogeneity of drug treatment effects.


Diet-induced weight loss leads to a switch in gene regulatory network control in the rectal mucosa

Background: Weight loss may decrease the risk of colorectal cancer in obese individuals, yet its effect in the colorectum is not well understood. We used integrative network modeling, Passing Attributes between Networks for Data Assimilation (PANDA), to estimate transcriptional regulatory network models from mRNA expression levels in rectal mucosa biopsies measured pre- and post-weight loss in 10 obese, pre-menopausal women. Results: We identified significantly greater regulatory targeting of glucose transport pathways in the post-weight loss regulatory network, including “regulation of glucose transport” (FDR = 0.02), “hexose transport” (FDR = 0.06), “glucose transport” (FDR = 0.06) and “monosaccharide transport” (FDR = 0.08). These findings were not evident by gene expression analysis alone. Network analysis also suggested a regulatory switch from NFKB1 to MAX control of MYC post-weight loss. Conclusions: These network-based results expand upon standard gene expression analysis by providing evidence for a potential mechanistic alteration caused by weight loss.


A Network Model for Angiogenesis in Ovarian Cancer

We recently identified two robust ovarian cancer subtypes, defined by the expression of genes involved in angiogenesis, with significant differences in clinical outcome. To identify potential regulatory mechanisms that distinguish the subtypes, we used PANDA, a method that uses an integrative approach to model information flow in gene regulatory networks. We find distinct differences between networks that are active in the angiogenic and non-angiogenic subtypes, largely defined by a set of key transcription factors that, although previously reported to play a role in angiogenesis, are not strongly differentially expressed between the subtypes. Our network analysis indicates that these factors activate (or repress) different genes in the two subtypes, resulting in differential expression of their network targets. Mechanisms mediating differences between subtypes include a previously unrecognized pro-angiogenic role for increased genome-wide DNA methylation and complex patterns of combinatorial regulation. The models we develop require a shift in our interpretation of the driving factors in biological networks away from the genes themselves and toward their interactions. The observed regulatory changes between subtypes suggest therapeutic interventions that may help in the treatment of ovarian cancer.


Sexually-Dimorphic Targeting of Functionally-Related Genes in COPD

There is growing evidence that many diseases develop, progress, and respond to therapy differently in men and women. This variability may manifest as a result of sex-specific structures in gene regulatory networks that influence how those networks operate. However, there are few methods to identify and characterize differences in network structure, slowing progress in understanding mechanisms driving sexual dimorphism. Here we apply an integrative network inference method, PANDA (Passing Attributes between Networks for Data Assimilation), to model sex-specific networks in blood and sputum samples from subjects with Chronic Obstructive Pulmonary Disease (COPD). We used a jack-knifing approach to build an ensemble of likely networks for each sex. By adapting statistical methods to compare these network ensembles, we were able to identify strong differential-targeting patterns associated with functionally related sets of genes, including those involved in mitochondrial function and energy metabolism. Network analysis also identified several potential sex- and disease-specific transcriptional regulators of these pathways. Network analysis yielded insight into potential mechanisms driving sexual dimorphism in COPD that were not evident from gene expression analysis alone. We believe our ensemble approach to network analysis provides a principled way to capture sex-specific regulatory relationships and could be applied to identify differences in gene regulatory patterns in a wide variety of diseases and contexts.
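
The general shape of a jack-knifed network ensemble can be sketched as follows: rebuild the network once per leave-one-sample-out subset, then compare two ensembles edge by edge. This is a generic illustration, not the statistics used in the paper. `build_network` is a hypothetical hook (in practice a PANDA run on each subset), and both the leave-one-out scheme and the score (difference in mean edge weight divided by the combined jackknife standard error) are our assumptions.

```python
import numpy as np


def jackknife_ensemble(expression, build_network):
    """One network per leave-one-sample-out subset.

    expression: genes x samples array.
    build_network: hypothetical callable mapping a genes x samples array to a network matrix.
    """
    n_samples = expression.shape[1]
    nets = []
    for s in range(n_samples):
        keep = np.ones(n_samples, dtype=bool)
        keep[s] = False  # drop sample s
        nets.append(build_network(expression[:, keep]))
    return np.stack(nets)  # shape: (replicates, *network shape)


def jackknife_se(ensemble):
    """Jackknife standard error of each edge weight across leave-one-out replicates."""
    n = ensemble.shape[0]
    dev = ensemble - ensemble.mean(axis=0)
    return np.sqrt((n - 1) / n * np.square(dev).sum(axis=0))


def differential_score(ens_a, ens_b):
    """Edge-wise difference in mean weight scaled by the combined jackknife SE."""
    se = np.sqrt(jackknife_se(ens_a) ** 2 + jackknife_se(ens_b) ** 2)
    return (ens_a.mean(axis=0) - ens_b.mean(axis=0)) / se
```

For a quick test, `np.corrcoef` can stand in for `build_network`, giving an ensemble of gene-gene coexpression matrices. Edges that never vary across replicates (such as the diagonal of a correlation matrix) have zero standard error and therefore produce NaN scores.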


Haploinsufficiency of Hedgehog interacting protein causes increased emphysema induced by cigarette smoke through network rewiring

The HHIP gene, encoding Hedgehog interacting protein, has been implicated in chronic obstructive pulmonary disease (COPD) by genome-wide association studies (GWAS), and our subsequent studies identified a functional upstream genetic variant that decreased HHIP transcription. However, little is known about how HHIP contributes to COPD pathogenesis. We exposed Hhip haploinsufficient mice (Hhip+/-) to cigarette smoke (CS) for 6 months to model the biological consequences caused by CS in human COPD risk-allele carriers at the HHIP locus. Gene expression profiling in murine lungs was followed by integrative network inference analysis using PANDA (Passing Attributes between Networks for Data Assimilation). We detected more severe airspace enlargement in Hhip+/- mice vs. wild-type littermates (Hhip+/+) exposed to CS. Gene expression profiling in murine lungs suggested enhanced lymphocyte activation pathways in CS-exposed Hhip+/- vs. Hhip+/+ mice, which was supported by increased numbers of lymphoid aggregates and enhanced activation of CD8+ T cells after CS exposure in the lungs of Hhip+/- mice compared to Hhip+/+ mice. Mechanistically, results from PANDA network analysis suggested a rewired and dampened Klf4 signaling network in Hhip+/- mice after CS exposure. In summary, HHIP haploinsufficiency exaggerated CS-induced airspace enlargement, which models CS-induced emphysema in human smokers carrying COPD risk alleles at the HHIP locus. Network modeling suggested rewired lymphocyte activation signaling circuits in the HHIP haploinsufficiency state.