Schedule‎ > ‎

An introduction to Centrality measures



Figure 1: This figure denotes the friendship network of Sudarshan's 500+ friends on facebook.

Looking at Fig. 1, we note two different structural features:

1) Two large components.

2) Also note the presence of several pendant vertices (leaves) apart from a few isolated vertices.


Some questions to ponder:


  1. If I need to recruit 10 people for my newly found organization, whom should I consider?
  2. If I am to pass on a message to three people in this network so that they in turn convey it to their friends and so on. Which three people should I select? 
  3. If I am to rank all my friends based on how "central" they are in this network, how would I go about? 
  4. If I were to nominate a leader for this team of 500, whom should I pick? 
  5. Any more questions that we could ask?

When and how to term a vertex important?


  • Is the vertex which has maximum degree obviously the most important vertex?
  • If the network denoted a road network in a city and if there was a vertex connecting (and hence bridging) the two big components, would you not term this new vertex as the most important one? "Degree isn't all that handy a yardstick". 
  • A vertex adjacent to 100 other vertices which are all pendant, is intuitively considered better ranked than a vertex which has 50 adjacent vertices each of degree 10. Isn't it? This is a question that will motivate us to study how Google crawls and ranks the pages on the WWW.

There is a clear call for formalism!
Unfortunately, there is not much formal definition of centrality indices, but the following two features:

  1. A vertex centrality is a real-valued function assigning to each vertex in a network some value. The higher the value, the more central the vertex is for the network (whatever the definition of centrality used).
  2. If two graphs G, H are isomorphic and p(v) denotes the mapping function from a node v in G to some node v' in H, then centrality(v) in G needs to be the same as centrality(p(v)) in H. In other words, we require that the centrality of a vertex is ONLY depending on the structure of the graph and no other contextual information.

Centrality Measures

We will start with four different centrality measures:

Degree centrality:

"An important node is involved in large number of interactions"

Formally speaking, given u
its degree centrality is defined as:



Exercises:

(i)  Stationary probabilities in the canonical random walk on a graph are proportional to the degree of the vertex.
(ii) Give an example of a pair of graphs with the same degree sequence but that are non-isomorphic.
(iii) In a random graph G(n,p), how many vertices have the same degree 'd' ?
(iv) Give an example of a situation where it makes sense to rank the vertices based on their degree.

For an interesting application of degree centrality, take a look at Jeong et al. [Jeong2001] discussing the correlation between lethality of proteins and its degree in the protein-protein interaction network.

Closeness centrality:


"An important node is typically “close” to, and can communicate quickly with, the other
nodes in the network.
"

Closeness centrality is based on geodesic distance and tries to quantify the intuitive notion of what one terms central or peripheral in a two dimensional region.


Exercise:
(i)   Why are ties less likely here as compared to degree centrality?
(ii)  Give a situation where you would like to use closeness centrality and not degree centrality.
(iii) Design a family of graphs in which the node with highest degree and the one with highest closeness centrality are arbitrarily far apart from each other.


Betweenness centrality:

"An important node will lie on a high proportion of paths between other nodes in the network."


Where,



Here,
denotes the fraction of shortest paths between s and t that contain vertex v and denotes total number of shortest paths from s to t. denotes the fraction of shortest paths from s to t that pass through v.


  • Betweenness centrality is less obvious to compute (try doing it manually for a random graph on 8 vertices)
  • Below is an interesting example of two regular graphs one which is symmetric and the other one isn't. Why?




Betweenness values of the vertices in the first graph happen to be 0.0555 uniformly, whereas the vertices in the second graph have the following betweenness values:

{0: 0.046296296296296287,

 1: 0.055555555555555552,
 2: 0.0625,
 3: 0.055555555555555539,
 4: 0.046296296296296287,
 5: 0.0625,
 6: 0.053240740740740734,
 7: 0.060185185185185168,
 8: 0.053240740740740734,
 9: 0.060185185185185168}

Exercises:


(i) Compute the degree, closeness and betweenness centrality of the following graphs:
    a)   A path of length 5
    b)   A cycle of length 6
    c)   A complete graph on 10 vertices
    d)   A star with 5 petals

(ii) Take a look at this easy to read the paper by Simsek et al. [Simsek2009] which is about an analysis of the skill development process in the tic-tac-toe game. Analysis involves the usage of betweenness centrality.

Bonacich's centrality (also called the Eigenvector centrality)


"An important node is connected to important neighbors."

We will be spending one full lecture (third lecture from now) talking about this centrality measure and a few other variations. A variation of this technique is what is popularly called the "google pageranking".

Eccentricity, Center, Median:


The set of maximum centrality vertices for some given centrality measure c(v) is given by:




Eccentricity of a vertex is defined as:


 

Radius of a graph is defined:


With the help of the radius, we define the center of a graph as:




Exercise: Show that :
, where c denotes the eccentricity based ranking as defined above.

Theorem 1:
For any tree, the center of a tree consists of at most two adjacent vertices.

Theorem 2:
Let G be a connected undirected graph. There exists a 2-vertex connected subgraph in G containing all vertices of C(G).


There are several algorithms to determine the centrality indices and one can look into [Jakob2005] for more details.


We will now discuss a very popular algorithm by Brandes [Brandes2001] to find the betweenness centrality of a graph.


Exercise*
:  A slight modification of Dijkstra's algorithm will give us all possible shortest paths from a source s to all the vertices of a connected graph. Can you try designing an algorithm for the undirected, unweigted graph?

Hint: Use the fact that vertex
v is on the shortest path from s to t iff d(s,t)=d(s,v)+d(v,t)

Brandes' algorithm to find betweenness:

Understanding the following results would facilitate us to understand Brande's algorithm better:


1)
d(s,t)=d(s,v)+d(v,t) iff v is in s shortest path from s to t.

2) d(s,v)=d(s,w)+1, where v is the predecessor of w in a shortest part from s to w


3)


4) Given that
v is a predecessor of w in a shortest path from s to w (we will denote this by pred(s,w) from now onwards):

    show that . What goes into the empty bracket?

5) show that the following holds:



Theorem:


The dependency
of a source vertex on any other vertex satisfies:


Proof: Use the above five results to prove this theorem.


Advanced centrality measures:

Subgraph centrality:

Subgraph centrality was introduced by [Estrada2005] to identify the important proteins in the yeast protein-protein interaction network. This centrality measure is based on the participation of each node in all subgraphs in a network. Smaller subgraphs are given more weight than larger ones.

If A is the adjacency matrix of a graph with vertex set, , subgraph centrality of node is defined as :


Where, is the i-th entry of .


The status index of Katz:



Comments