Research

I am interested in Computational Systems Biology. I love to use graph algorithms and data mining techniques to solve the problems originated from Systems Biology and/or Social Network Analysis. During my PhD I worked with Dr. T. M. Murali on mining unstable communities from a collection of networks. My research summary is given below.

Graphs have long been used for modeling/representing the functional, regulatory, and physical interactions among different molecules (genes, proteins, or metabolites) in a cell. Similarly, interactions/relationships between persons or their online identities have been modeled predominantly via graphs. A major limitation of graphs is that they can only represent pairwise interactions (a.k.a., "edges" in a graph). However, interactions among more than two entities is quite common in social/biological networks. For example, three or more proteins often combine together (as a "protein complex") to accomplish certain functions of the cell. Similarly, group-interactions and relations among three or more persons are also very common in a society. Since graphs can only represent pairwise relationships/interactions they are ill-suited for modeling such "higher-order" interactions.

Hypergraphs are better alternatives in such scenarios because "edges" in a hypergraph (called hyperedges) can represent n-ary (n >= 1) relationships among different objects. A hypergraph cannot be converted into a graph without losing some information [1] which further corroborates the importance of hypergraphs. In fact, it has been shown that analyses on hypergraph-based representations of cellular interaction networks can yield very different results than the corresponding graph-based representations [2,3]. All these facts motivated me to work on hypergraph-based models of social/biological networks.

One of the primary hindrances against using hypergraph based models of networks is: only a few hypergraph datasets are available in the systems biology and social network databases and literature. In my PhD work, I attempted to fill this gap by developing methods to reconstruct hypergraphs from existing data. This requires inference of hyperedges from existing datasets. There can be many ways to formulate the notion of hyperedges to achieve this goal. I proposed a particular formulation, as I describe below.

Consider a set of nodes that induces highly varying subgraphs across a set of social/biological networks. I call such a set of nodes an Unstable community. I deem an Unstable Community (or, UC in short) as a type of "higher order relationship" among a group of objects and therefore propose to compute UCs from existing data as a way of inferring hyperedges. My motivation to deem an UC as a "higher order relationship" comes from the following observation.

Relationships, communications, or interactions between people in a society (as well as molecules in a cell) are not static; rather they vary with time, circumstances, social settings, etc. Therefore it is natural to expect some variation among their corresponding networks. Thus a set of nodes, which induces highly varying subgraphs (in a set of networks), may still be related (or, have been related) to each other. For instance, three characters from "Friends" TV show, namely, Chandler (C), Joey (J), and Kathy (K) show considerable variation in their relationship networks over time (Figure 1) and as such represents an UC.

Examples of hyperedge ({K, C, J}, in cyan background) and frequent subgraph ({R, M, P}, in light-red background) from a set of relationship networks.

Figure 1: Time-varying relationship network between six characters of the popular US TV Show "Friends." Here a set of three characters, namely, Chandler (C), Joey (J), and Kathy (K) represents an Unstable Community. Their relationship network (cyan background) varies over time as follows: C and J have long been best friends (G1). K started to date with J in episode 5 of season 4 (G2). Later, K and C fall in love with each other even while she continued to date J (G3). Then, K broke up with J (G4). When J learned about C’s love for K, he shunned C (G5). After a while, J forgave C while C continued to date with K (G6). Finally, C broke up with K (G7). In contrast, a group of five long-time friends, namely, Ross (R), Monica (M), Phoebe (P), Chandler (C), and Joey (J) induces the same subgraph (light red background) in all but one graph (G5) in these networks. These set of characters can easily be represented by the most frequent subgraph they induce. Such a group of people is often termed in the literature as a "frequent subgraph".

Note that, this is the first work of its kind. In particular, this problem is in sharp contrast to the popular problem of finding frequent (dense) subgraphs or stable communities from a set of networks where the goal is to find sets of nodes, each of which induces the same (dense) subgraph in many of the input graphs (Figure 1). Clearly, the connections among the members of such a set of nodes can be faithfully represented as a certain subgraph and therefore graph representation is enough for modeling the relationship among those nodes. However this is not true for our UCs, for which hyperedges are more suitable.

I came up with three mathematical formulations for UCs. I developed efficient algorithms to compute UCs using these formulations. Finally, I analyzed the UCs I got from different datasets and showed that our UCs gave novel and useful insight about the underlying biology/sociology.

References:

    1. Klamt, S. and Haus, U.U. and Theis, F. Hypergraphs and cellular networks. PLoS Computational Biology. Vol. 5. 2009.
    2. Zhou, W. and Nakhleh, L. Properties of metabolic graphs: biological organization or representation artifacts? BMC bioinformatics. Vol. 12. 2011.
    3. Murali, T. M., Ritz, A. Pathway analysis with signaling hypergraphs IEEE Transaction on Computational Biology & Bioinformatics (TCBB), 2015 (to appear).