Auto clustering in DAGS

Problem

In compiling deep learning graphs from frameworks such as Tensorflow, one of the first steps is clustering islands of nodes together based on certain criteria, so that those groups can be "compiled" for an accelerator. The computational graphs (in most cases) is a directed acyclic graph (DAG).

Some examples of constraining criteria are discussed wrt the 2 sample graphs shown below.

Certain nodes are not supported. Eg, X0 marked red
- Some nodes are just not supportable in the accelerator, or maybe too slow on them, or during development all nodes are not supported though eventually they might be
Nodes must know the value of inputs (rather than just shape). Eg, B0 in orange must know the values of the dotted inputs
- A typical example is "reshape" node that takes 2 inputs, one the tensor to reshape, and the other the output shape. The exact values contained in the "output shape" input must be known (not just its shape). This is not known till runtime. Hence the "output shape" must be fed by a constant node, or it must lie in a separate cluster from the node that is feeding it.

The 2 sample graphs are exactly same, except that in graph0 B0 must know the value of A4 (hence they must lie in different clusters), while in graph1 A4 and B0 are not constrained in which cluster they may lie

Solution 1: Using Tensorflow graph cycles

Tensorflow handles the clustering using GraphCycles. It creates a copy graph and exposes a public function ContractEdge that attempts to contract an edge and bring 2 nodes in the same cluster. It fails if a cycle is formed (thus breaking DAG-ness). Let us walk through some steps of the contraction for graph0

First create the GraphCycles graph, which has the same structure as the original graph. Add nodes A0B0 and A4B0 as shown. These extra nodes will enforce the criteria that A0B0 and A4B0 should not lie in the same cluster. The bold edges represent edges that will never contract. Incoming and outgoing edges from unsupported nodes like X0 will never contract (because they will never be part of any cluster). Also A0->A0B0, A0B0->B0 will never contract. Thus if at some point A0->B0 edge is contracted, A0->A0B0 and A0B0->B0 will form a loop, thus violating DAG-ness requirement

First an example of successful contraction: A0 and A2 are successfully contracted into {A0,A2}. With this contraction no loops were introduced, hence it is a valid contraction

Now an example of unsuccessful contraction request: If we tried to cluster A4 and B0, it would result in a loop between the new cluster {A4, B0} and the node A4B0. Thus this contraction will not be allowed

In the end for graph0, we get one cluster {A0, A1, A2}. It is up to the user to decide to call for the fusion for computationally independent nodes A3 and A4, but from the point of clustering they could safely lie in a cluster of their own. X0 and B0 stays separate.

In case of graph1, since A4 and B0 are not constrained in which cluster the can lie, we could get the following clusters: {A0, A1, A2}, {X0}, {A4, B0}, {A3} or {A0, A1, A2}, {X0}, {A4, A3}, {B0}.

Observations

Note that the clustering is not unique. Heuristics could be used to break ties. All that the algorithm guarantees is that a valid clustering will be produced. Optimality is up to the user to define, who would call for the contraction of nodes/edges in a certain order as he/she sees fit to get the desired clustering.
The starting GraphCycles graph is not unique. One could express the isolation of X0 by creating dummy nodes between A2 and X0, and X0 and A3, A4 much like the dummy nodes A0B0 and A4B0. Then we could request contraction between X0,A2, but that would not be allowed because the dummy nodes would form a cycle, and that cycle would never disappear, because we would never request contraction for the dummy nodes

Solution 2: A different view using longest paths

From the above example we can see that there are different constraints in play, but in the representation we can boil them down to a single construct. The crux is we want to be able to express "Do not put node i and node j in the same cluster". This constraint can easily cover the "unsupported node" constraint (in that case we apply it for all incoming and outgoing edges of the unsupported node. For other nodes like "reshape", we say only a certain input needs to be constrained (eg, for reshape keep the node generating the "output shape" input and the reshape node in separate clusters, but the other input and reshape could be in the same cluster)

The idea I present here is simply:

Create a graph that has same node/edge structure as original computational graph
Assign edges weight 0 if they don't have any placement constraint. Assign them weight 1 if you need the input node of the edge and the output node of the edge to be in different clusters
1. For unsupported nodes, all their input and output edges have weight 1
2. For "reshape" like nodes, whose certain inputs are constrained, mark only those input edges with weight=1
Run "Longest Path" algorithm from source nodes to each node. Nodes that get the same longest path number belong to the same cluster

Longest path problem for general graphs is NP-hard, but for DAGs its linear time.