igraph manual

Updated April 2023. You'll need some basic knowledge of R first.


The igraph package has its own user manual, as well as its proper textbook (Kolaczyk and Csárdi 2014, Statistical analysis of network data with R. Springer). The pertaining pages in my textbook (Bruggeman 2008, pp. 114-131) are now obsolete, as well as the Graphical User Interface from that time.


Network data

Data handling is easiest in a program you feel comfortable with, separate from R, for example in an editor such as Notepad or an Excel spreadsheet (see below). However, you can create and edit data in R, explained below.

Write on each line an arc from a vertex, in the first column in a spreadsheet, to another vertex, in the second column, separated by a space, a tab, or by putting them in two different columns. You may use spaces within names and accents such as ö or ó, but it’s better to avoid these frills and to use anonymous ID numbers or simple abbreviations, e.g.


name1 name2

An arc in the opposite direction is to be written in the same columns on a new line

name2 name1


Go on with the remainder arcs, each on its own line. For an undirected network, one arc for each edge will do; when importing the data in R, you’ll tell the computer that your network is undirected (see below), which saves half of the typing effort. For a valued graph, put the values, or weights, in a third position (e.g. the third column in a spreadsheet),

name1 name2 5


Working session: open R, and set the working directory.

When doing this for the first time, install the igraph package (once) by typing:

install.packages("igraph")

When closing, with quit(), respond no if asked to save the workspace image.


Import network data

Remove remains from earlier working sessions,

rm(list=ls())

library(igraph) # activate this package for each session

Import data from working directory or internet

D <- read.table("Example.txt")

head(D)

Turn data frame into graph object,

g <- graph(t(D)) # or specify columns: graph(t(D[ ,1:2]))

When proper names are used for the nodes, it might sometimes be necessary to use:

g <- graph.data.frame(D)

If the nodes have no proper names but identity numbers, the next step is not essential now but will avoid catastrophe when examining subgraphs later on, so always do:

V(g)$name <- V(g)

The default is a directed network; for it to be undirected (edges only),

g <- graph(t(D), directed=F)

At data entry, some ties may have accidentally been typed multiple times; check this out:

is.simple(g)

g <- simplify(g) # remove multiple ties

This action removes loops too, unless you add: remove.loops = FALSE


Network images

A layout to plot a graph,

L <- layout.kamada.kawai(g) # for not too large graphs

L <- layout.fruchterman.reingold(g) # nice for large graphs

plot(g, layout=L)

If the resulting layout displeases you, run again the layout command above.

The plot command yields large white margins. To get rid of them, use par(mar=c(1,1,1,1)) before the plot command.

To plot a large graph,

plot(as.undirected(g), layout=L, vertex.label=NA, vertex.size=2)

When drawing the ties a bit curved, you can see if ties (of the same kind) are reciprocal,

E(g)$curved <- 0.2 # plot again

To save a graph image, menu: File > save as.

If you have tie weights, 'glue' them to the ties after creating a graph object g, from the place you keep them in your data file D, for example in the third column:  E(g)$weight <- D[ ,3] This goes well for arcs, but might fail for edges.

To turn multiple unweighted ties between pairs of nodes into single weighted ties,

E(g)$weight <- count.multiple(g)

g <- simplify(g) # Don’t do this before the previous lines!

E(g)$width <- 1 # default line thickness

Suppose you have a data frame called N with tie strength in the third column, you could do:

E(g)[weight > mean(N[,3])]$width <- 2 # strong ties

For a range of tie values too wide to draw, narrow them down:

E(g)$width <- sqrt(abs(N[ ,3]))

For available colors for ties and nodes, check out colors()

If using color, it must be done for all ties or nodes, not just for some, for example:

E(g)$color <- "black" # default color

E(g)[1:39]$color <- "red" # color for a certain type of ties numbered 1:39

plot(g, layout=L, edge.arrow.size=.2, edge.width=E(g)$weight, edge.color=E(g)$color)


Properties, or traits, of nodes, written in the columns of a file imported as “F”, can be assigned different colors or different shapes in the graph image. If the order of rows (i.e. nodes) in F is the same as the order of nodes in g (type head(V(g)) and head(F) to check), do the following, else use the plyr package, see 8 lines below.

V(g)$gender <- F[ ,2] # from 2nd column of a file "F"

V(g)[gender == 0]$shape <- "square" # e.g. men

V(g)[gender == 1]$shape <- "circle" # e.g. women

In the plot command, use the argument: vertex.shape=V(g)$shape

For colored nodes, see color under ties.

Example: V(g)[trait == 4]$color <- "red"  Put vertex.color=V(g)$color in the plot command.

In order not to have a colored line around the nodes, use the argument vertex.frame.color = "white"

If the order of the rows (i.e. nodes) in F differs from the order of the nodes in g, check that (or rename) the pertaining column in F is called “name” and has the names or identity numbers in common with the identities of the nodes in g; missing values and even missing names are allowed.

V <- as.matrix(V(g)$name); colnames(V) <- "name"

library(dplyr) # install this package first

V <- left_join(V, F, by="name") # or right_join, depending which data has priority; from here on, continue as above

The size of nodes, V(g)$size, can be made proportional to their centrality (below), analogous to the width of the ties (above); then in the plot, vertex.size=V(g)$size 

Also vertex labels can be resized, label.size = 2   # default is 1

Show or hide, respectively, the names or ID numbers of the nodes by using vertex.label=V(g)$name or vertex.label=NA

To play around with the positions of the nodes—only if your graph is small—try tkplot() Another nice package for network images is ggraph.


Adjacency matrix and edge list

A <- read.table("Net.txt") # import matrix

A <- as.matrix(A)

Check it out with table(A) If the adjacency matrix has weighted ties then degree, igraph's ecount and density (see below) will produce values much higher than you would expect! To prevent this from happening, binarize the matrix for those particular – but not for all – measures, with respect to some threshold value t,

A_save_copy <- A; A[A >= t] <- 1; A[A < t] <- 0;

diag(A) <- 0

g <- graph_from_adjacency_matrix(A)

If a weighted adjacency matrix is required, use weighted=TRUE in the above command.

To turn a graph g into an adjacency matrix,

A <- get.adjacency(g, sparse= FALSE)  # with tie weights: attr="weight"

To get an adjacency matrix from an incidence matrix C, with individuals in the rows and their affiliations in the columns,

A <- C %*% t(C); diag(A) <- 0

To turn an adjacency matrix into a two-column file (edge list),

E <- as_edgelist(g)

To save an adjacency matrix without R’s indices,

write.table(A, "filename.txt", col.names=FALSE, row.names=FALSE)

 

For the measures below, try them first on a couple of very small networks, of 2 to 5 nodes, in order to understand perfectly what is being computed. For some measures below, if you have a directed or a weighted graph, your results might be different than you think, which you probably won’t notice for a large network.

 

Descriptive network measures

The number of nodes,

vcount(g)   

The number of ties—which does not work for weighted adjacency matrices as input!

ecount(g)

Is the graph directed?

is.directed(g)

If the graph is not meant to be directed, use the argument as.undirected(g) in the pertaining functions, e.g. for density,

graph.density(g) # or graph.density(as.undirected(g))

Round a result with respect to data quality, e.g. to two numbers after the decimal dot,

round(some_result, 2)

Degree, possibly mode="in" or mode="out"

din <- degree(g, mode="in")

deg <- degree(g) # undirected, or in and out together

To order a computed result, here of degree, in descending order

V <- as.matrix(V(g)$name)

result <- data.frame(V,deg)

In case of an error message, first define: V(g)$name <- V(g)

colnames(result) <- c("id", "Degree")

Check head(result) if it is a numeric variable; if not, correct this with fix(result)

result2 <- result[order(-result[ ,2]), ]; head(result2)

Ego networks of a set of individuals, e.g. x and y, or V(g) for everybody:

ego(g, order = 1, nodes = c("x","y"), mode = c("all")) # or "in" or "out"

Reciprocity

reciprocity(g)

If the result is 0 or low, it might be due to the data rather than indicating that people don’t reciprocate.

Clustering measured as transitivity, averaged over the entire network; automatically undirected!

transitivity(g)

Clustering for each node individually,[1] i.e. local clustering, only undirected!

LC <- transitivity(g, "local")

Average path length–of existing paths only, it ignores all gaps between components!

average.path.length(g)

Notice that the average path length is computed over the existing paths, ignoring a.o. disconnected components that are “infinitely” far away from each other. This means that in a network of, say, only disconnected dyads, the average distance equals one!

Distribution of path lengths

L <- distance_table(g, directed=T)

x <- seq(from=1, to=length(L$res), by=1)

plot(L$res ~ x, xlab="path length", ylab="frequency", type="l", las = 1)

points(L$res ~ x)

Similarity: assortment (homophily) or influence?

Neighbors can be similar because of assortment or influence, which usually can’t be distinguished by a descriptive measure, unless you know for sure that the attribute in question was there before the network ties, e.g. gender (and even for this case there are exceptions). Modeling this thorny issue can be done in RSiena. Here, assortment is a descriptive and works for ordinal, interval and ratio level data. Put them in a vector, here called trait,

assortativity(g, trait)

If the direction of ties is to be used explicitly, see the official igraph manual. Unfortunately this function can’t handle non-available data, but if the number of missings is low, you can implement the average for them after checking their distribution. For categorical data with or without missing values, see Additional Materials.

Distributions, here of degree, if you wish  as.undirected(g)

plot(table(degree(g)), xlab="degree", ylab="frequency") # or hist(degree(g), xlab="degree", ylab="frequency")

Use a logarithmic scale for large networks of more than, say, 1000 nodes or so,

plot(degree.distribution(g, mode="in", cumulative=T), log="xy", xlab="indegree", ylab="cumulative frequency")

Generate random graphs, with or without skewed degree distribution. Suppose you have a graph g and want to have its randomized counterpart with the same size and density,

er <- sample_gnm(n = vcount(g), m = ecount(g), directed = TRUE, loops = FALSE)

Watch out that directed = FALSE or TRUE matches the original graph.

Another way to create a random graph is to first create a random adjacency matrix with, say, 30% ones and 70% zeros. Don't forget to subsequently set the diagonal to zero.

R <- matrix(sample(0:1, size = n^2, replace=TRUE, prob=c(.7, .3) ), nrow = n, ncol=n); diag(R) <- 0

Random directed graph with approximately power law degree distribution,

pg <- sample_pa(n = vcount(g), m = mean(degree(g))/2)

For community detection, which (in igraph) is on undirected graphs, a comparison is made with a randomized and undirected version of the graph with the same degree distribution. This can of course also be used for different purposes,

gd <- sample_degseq(degree(g), method = "simple.no.multiple")

Check the effect of a portion 0 ≤ p ≤ 1 of randomly missing ties,

S <- sample(1:ecount(g), p*ecount(g));

E <- as_edgelist(g); E <- E[-S,]; f <- graph(t(E))

The effect of randomly adding ties can be done by means of a sparse random (or power law) graph superimposed on the actual graph.


How is some network trait distributed over, say, 100 randomized versions of graph g? In the example, the trait is transitivity[2], 

trait <- vector()

for (i in 1:100) { er <- sample_gnm(vcount(g), ecount(g),

     directed = TRUE, loops = FALSE);

     trait[[i]] <- transitivity(er) }

hist(trait, xlim=c(min(trait),transitivity(g))); 

points(x=transitivity(g), y=0, pch=19)  # or: summary(trait)

Individual positions and social inequality

Power or status by means of Bonacich’ centrality, which for various reasons (computation of eigenvalues; lack of influence at longer distances) is simplified to distance two (Pinheiro et al 2014 PRL 112: 098702) , AT1 + b AT(AT1), where 0 ≤ b ≤ 1 tells how heavily paths of length two contribute to someone’s power, and arcs are in the direction of deference, e.g. citation or asking advice. If this is too slow for extremely large networks; use page rank, page.rank(g). If g is weighted, also use attr = "weight" in the following:

A <- get.adjacency(g, sparse=FALSE)  # with tie weights: attr="weight"

perhaps with argument names = F

I <- matrix(1, nrow = vcount(g), ncol = 1);

b <- .25

This b value is based on Pinheiro’s simulations, generally yields good predictions and makes possible comparisons across networks. If you want to stay close to Bonacich,

b = 1/max(abs(eigen(A)$values))

deference <- t(A) %*% I + b*t(A) %*% (t(A) %*% I)

If the arcs are in the same direction as the influence, e.g. orders to be followed,

influence <- A %*% I + b*A %*% (A %*% I)

For an undirected network, use the deference calculation. To rank-order the result, see degree above.

Brokerage by betweenness for paths no longer than 2 (see Burt 2010),

betw2 <- betweenness.estimate(g, cutoff=2, weights = NULL)

The same command can be executed with the additional argument directed=FALSE.

The result can be rounded and ordered (see degree, above),

betw2r <- round(betw2,2)

If you need betweenness for citation data, i.e. only on the outgoing arcs, ask me for the code.

Correlation of centrality, e.g. betw2, and something else, e.g., deference,

cor.test(betw2, deference, use="pairwise.complete.obs")

Notice that this is not the rank order correlation; then use method = "spearman"

Social groups

As a first step, look at the already separate components and their sizes.

Warning: when taking a separate component for further inspection, the node index is automatically reset, starting with 1, and mismatches the original numerical index unless you define it at the beginning. Therefore do the following, but don’t do it if the nodes already have been assigned proper names.

V(g)$name <- V(g)

comp <- decompose.graph(g); # the components

table(sapply(comp, vcount)) # or simply: clusters(g)$no

Each component is a community; to get the largest one[3]:

c <- clusters(g);

lc <- which(which.max(c$csize)==c$membership);

g1 <- induced_subgraph(g, lc) # largest component

or any component indexed i, where 1 ≤ qno.clusters(g),

gq <- induced_subgraph(g, c$membership==q)

or any subgroup of nodes with id numbers, say, in between x = i and y = j,

gij <- induced.subgraph(h, vids = c(x:y))

A fast algorithm (Louvain) for networks with positive undirected (possibly weighted) ties. Here it is for one (e.g. the largest) component,

sp <- cluster_louvain(as.undirected(g1)); max(sp$modularity)

If the communities are fuzzy, with nodes that fit other communities (almost) equally well, one run might yield a somewhat different result than a next run. This is not a defect of the software, but an acknowledgement of the fact that in society, communities and their boundaries can be fuzzy. You may decrease or increase the resolution to find, respectively, less or more communities with the argument resolution, which has default value 1. 

Plot the communities in separate colors,

L <- layout.fruchterman.reingold(g1);

plot(g1, vertex.size=4, edge.color="black", vertex.color=sp$membership, vertex.label=NA, layout=L)

Sometimes you get better colors if you write sp$membership+4 or another number.

To use the assigned community number as node attribute,

g1 <- set.vertex.attribute(g1, "com", value = sp$membership)

Order the communities and their members,

C <- cbind(V(g1)$name,V(g1)$com);

colnames(C) <- c("id","com");

C <- as.data.frame(C);

resultC <- C[order(C[ ,2]), ]; head(resultC)

length(unique(C[ ,2])) # number of communities

To make a subgraph of everyone in a given community q,

gq <- induced_subgraph(g1, V(g1)$com==q)

The distance from one given node x to all others in the same component gq,

distances(gq, to = x) # or mean(distance(…))

For negative ties and group overlaps, use cluster_spinglass() that, however, is relatively slow and less good at positive ties; see official igraph manual.

The concept of k-connectivity for social cohesion is intended for undirected graphs, or directed graphs where all arcs are in both directions, or can be sensibly interpreted as if going in both directions.[4]

cb <- cohesive.blocks(as.undirected(g))  # or take largest component

Because nodes can belong to multiple subgroups, some more than others, there is no simple way to “data.frame” these results to the nodes as we did for centrality measures. A graphical result when the graph is small:

if (interactive()) {plot(cb, g, layout = L, vertex.label=NA, vertex.size=4, edge.color="black")}

Calculate the deepest nested level of each node, which can be used to predict solidarity or commitment, and to plot,

V(g)$co <- max_cohesion(cb); # deepest nestedness of each node

max <- max(max_cohesion(cb)); # deepest nestedness among all nodes

library("RColorBrewer");  # check the colors in this package

colors <- brewer.pal(n = max, name = 'OrRd');

V(g)$color <- colors[V(g)$co]

plot(g, layout=L, edge.color="black", vertex.size=4, vertex.label=NA, vertex.color=V(g)$color)

More options are in the Additional Materials.

 

Additional Materials

Tie weights inferred from the presence of multiple ties between the same individuals

E(g)$weight <- count.multiple(g)


Turn a weighted adjacency matrix A into an edge list with tie weights in the third column

E <- as_edgelist(g);

a <- vector();   # empty vector for tie weights

k = 1;  for (i in 1:vcount(g)); 

            { for (j in 1:vcount(g));  

                 { if (A[i,j] != 0) { a[k] <- A[i,j];  k = k+1 } } };

Edges <- cbind(E, a); colnames(Edges) <- c("from","to","value");  

Edges <- as.data.frame(Edges)


Network images

For names next to, rather than on top of, the vertices, use the argument vertex.label.dist = . . . ; try values in the range .5 to 2 until the result looks pleasing.

For edge labels, e.g. a gossip that person x tells about person z to y, in, say, column 4, of the data file F, E(g)$gossip <- F[ ,4] and use the argument edge.label=E(g)$gossip For edge weights, edge.width=E(g)$width

 

Assortment

For a categorical trait with missing data, there is no igraph command. Recode the data to non-negative integers and missings to NA. The plan is to code ties between nodes with different traits -1, with the same trait 1, and to calculate assortativity as the average of these recoded ties. The graph must be transformed into a symmetric binary adjacency matrix A.

A <- as.matrix(as_adjacency_matrix(as.undirected(g))); 

# see above how to make A binary, if necessary

n <- vcount(g); m <- ecount(as.undirected(g));

P <- matrix(0, ncol=n, nrow=n);

# define the pertaining trait before continuing, e.g., a column from a file

for (i in 1:(n-1)) {

  for (j in (i+1):n) {

    P[i,j] = (trait[i] + trait[j])/(trait[i] - trait[j]) }};

P[0 < abs(P) & P < Inf] <- -1;

P[P == Inf] <- 1;

P[P = NaN] <- 1; # in case 0 is used as category value

P <- A*P;

Assortativity <- sum(P, na.rm = TRUE)/m

 

Community detection.

To see how dissimilar two partitionings of a given network are, or two subsequent network cross sectional measurements, the vi distance measure can be used,

compare(sp1, sp2, method="vi")/log2(vcount(g))

The vi distance (Meila 2007) is a mutual information measure, which says how much information you need to describe sp2 given sp1, and the other way around. The more information you need, the more distant, or dissimilar, the partitions are; if they are identical, in contrast, the distance is zero. Because the maximum distance equals log2(n), the max distance can be normalized to 1.

 

Social cohesion

To show vertices where you can break (parts of) one component gs apart by removing these nodes,

V(g)[articulation.points(g)]$color <- "red";

V(g)[articulation.points(g)]$label <- articulation.points(g);

plot(g, layout = L, vertex.size=4, edge.color="black", vertex.label=NA) # or vertex.label=V(g)$label

Some numerical results for cb as calculated above:

graph.cohesion(as.undirected(g))

cohesion(cb); blocks(cb); print(cb)

Computations for a large graph may take forever, so it’s best to first assess the components (see above), then the communities within the components (see above), and then compute social cohesion for a given community (or component if it’s not too large). For a large graph you can find out quickly if there are at least two independent paths from each node to each other node - a minimum level of social cohesion,

biconnected.components(g)$no

If the resulting number is larger than 1, the graph is partitioned into multiple 2-connected subgraphs, and then it’s not a 2-connected graph as a whole.

 

Algebraic connectivity (2nd smallest eigenvalue) as a measure of social cohesion

g <- as.undirected(g);

A <- as_adjacency_matrix(g, sparse=FALSE);

Q <- -A/rowSums(A); Q[is.nan(Q)] <- 0; diag(Q) <- 1;

sort(eigen(Q)$values,partial=2)[2]

The algebraic connectivity of Q might be a complex number if Q is asymmetric; then symmetrize before calculating it, as above, but realize that the result over-estimates cohesion by assuming perfect reciprocity.

 

Normalized entropy of tie weights (N. Eagle, M. Macy, R. Claxton 2010 Science 328, 1029). 

diversity(g)

Models

SIR, play around with different beta and gamma values,

smodel <- sir(as.undirected(g), beta = 1, gamma = 1, no.sim = 100); plot(smodel)

 

RSiena, see https://www.stats.ox.ac.uk/~snijders/siena/

 

Social influence towards consensus, if for everyone outdegree > 0

library(deSolve); # first install this package

A <- as_adjacency_matrix(g, sparse=FALSE);

Q <- -A/rowSums(A); diag(Q) <- 1;

# This is a row-normalized, but not symmetrized Laplacian

# initial opinions at t_0, for example random

x <- rnorm(vcount(g), mean = 0, sd = 2)

yini <- c(y = x); # x is vector of initial opinions

times <- seq(from = 0, to = 20, by = 0.1); # time points

derivs <- function(t, y, parms) {

with (as.list(y), {

  dy = - Q %*% y ;

  list( c(dy) ) } ) } ;

out <- ode(y = yini, times = times, func = derivs, parms = NULL)

# Look at tail(out) and organize the results:

Ve <- vector(); # st. dev. of opinions

for (i in 1:nrow(out)) {Ve[i] <- sd(out[i, 2:ncol(out)]) };

plot(Ve ~ times)

For the most influential nodes in the convergence to consensus, see power centrality.

Footnotes

[1] Local transitivity can also be done for specific nodes, for example “1” and “3” by using the argument vids=c(1,3).

[2] This script is for pedagogy, as transitivity of a random graph can be simply calculated.

[3] In the unlikely case there are two largest components exactly the same size, the algorithm picks the first.

[4] For the number of independent paths connecting two specific vertices, see the official igraph manual on graph.cohesion.