R-igraph

Updated November 2025. You'll need some basic knowledge of R.

The igraph package has its own user manual, which often mismatches the actual code. The treatment in my textbook (Bruggeman 2008, pp. 114-131) is obsolete, as well as the Graphical User Interface from that time.

Network data

Data handling is easiest in a program you feel comfortable with, separate from R, for example in an editor such as Notepad or an Excel spreadsheet (see below). However, you can create and edit data in R, explained below.

Write on each line an arc from a vertex, in the first column in a spreadsheet, to another vertex, in the second column, separated by a space, a tab, or by putting them in two different columns. You may use spaces within names and accents such as ö or ó, but it’s better to avoid these frills and to use anonymous ID numbers or simple abbreviations, e.g.,

name1 name2

An arc in the opposite direction is to be written in the same columns on a new line

name2 name1

Go on with the remainder arcs, each on its own line. For an undirected network, one arc for each edge will do; when importing the data in R, you’ll tell the computer that your network is undirected (see below), which saves half of the typing effort. For a valued graph, put the values, or weights, in a third position (e.g. the third column in a spreadsheet),

name1 name2 5

Working session: open R, and set the working directory.

When doing this for the first time, install the igraph package (once) by typing:

install.packages("igraph")

When closing, with quit(), respond no if asked to save the workspace image.

Import network data

Remove remains from earlier working sessions,

rm(list=ls())

library(igraph) # activate this package for each session

Import data from working directory or internet

D <- read.table("Example.txt")

head(D)

Turn data frame into graph object,

g <- make_graph(t(D)) # or specify columns in D, say 1 and 2:

E <- D[ ,1:2] # now do the previous step: g <- make_graph(t(E))

If there is a node called 0 (from, say, a Python file), recode all nodes by adding 1: E <- E + 1

When proper names are used for the nodes, you may sometimes have to use:

g <- graph_from_data_frame(D)

If the nodes have no proper names but identity numbers, the next step is not essential now but will avoid catastrophe when examining subgraphs later on, so always do:

V(g)$name <- V(g)

The default is a directed network; for it to be undirected,

g <- as_undirected(g)

At data entry, some ties may have accidentally been typed multiple times; check this out:

is_simple(g)

Remove multiple ties, but only if you have no tie weights; else, see below

g <- simplify(g) # This removes loops too, unless you add: remove.loops = FALSE

Network images

A layout to plot a graph,

L <- layout_with_kk(g) # for not too large graphs

L <- layout_with_fr(g) # for large graphs

plot(g, layout=L)

If the resulting layout displeases you, run again the layout command above.

The plot command yields large white margins. To get rid of them, use par(mar=c(1,1,1,1)) before the plot command.

To plot a large graph,

plot(as_undirected(g), layout=L, vertex.label=NA, vertex.size=2)

When drawing the ties a bit curved, you can see if ties (of the same kind) are reciprocal,

E(g)$curved <- 0.2 # plot again

To save a graph image, menu: File > save as.

If you have tie weights, 'glue' them to the ties after creating a graph object g, from the place you keep them in your data file D, for example in the third column: E(g)$weight <- D[ ,3] This goes well for arcs, but might fail for edges.

To turn multiple unweighted ties between pairs of nodes into single weighted ties,

E(g)$weight <- count_multiple(g)

g <- simplify(g) # Don’t do this before the previous lines!

E(g)$width <- 1 # default line thickness

Suppose you have a data frame called D with tie strength in the third column, you could do:

E(g)[weight > mean(D[,3])]$width <- 2 # strong ties

For a range of tie values too wide to draw, narrow them down, e.g., take the log and ignore tie sign:

E(g)$width <- log(abs(N[ ,3]))

For available colors for ties and nodes, check out colors()

If using color, it must be done for all ties or nodes, not just for some, for example:

E(g)$color <- "black" # default color

E(g)[20]$color <- "red" # color for a certain tie (or range of ties begin:end)

plot(g, layout=L, edge.arrow.size=.1, edge.width=E(g)$weight, edge.color=E(g)$color)

Properties, or traits, of nodes, written in the columns of a file imported as “F”, can be assigned different colors or different shapes in the graph image. If the order of rows (i.e. nodes) in F is the same as the order of nodes in g (type head(V(g)) and head(F) to check), do the following, else use the plyr package, see 8 lines below.

V(g)$gender <- F[ ,2] # from 2nd column of a file "F"

Alternatively,

g <- set_vertex_attr(g, "gender", value = F[ ,2])

To distinguish traits (e.g., gender) in the plot,

V(g)[gender == 0]$shape <- "circle" # e.g., women

V(g)[gender == 1]$shape <- "square" # e.g., men

and other shapes for other genders. In the plot command, use the argument: vertex.shape=V(g)$shape

For colored nodes, see color under ties.

Example: V(g)[trait == 4]$color <- "red" Put vertex.color=V(g)$color in the plot command.

In order not to have a colored line around the nodes, use the argument vertex.frame.color = "white"

If the order of the rows (i.e. nodes) in F differs from the order of the nodes in g, check that (or rename) the pertaining column in F is called “name” and has the names or identity numbers in common with the identities of the nodes in g; missing values and even missing names are allowed.

V <- as.matrix(V(g)$name); colnames(V) <- "name"

library(dplyr)

V <- left_join(V, F, by="name") # or right_join, depending which data has priority; from here on, continue as above

The size of nodes, V(g)$size, can be made proportional to their centrality (below), analogous to the width of the ties (above); then in the plot, vertex.size=V(g)$size

Also vertex labels can be resized, label.size = 2 # default is 1

Show or hide, respectively, the names or ID numbers of the nodes by using vertex.label=V(g)$name or vertex.label=NA

To play around with the positions of the nodes—only if your graph is small—try tkplot() Another nice package for network images is ggraph.

Adjacency matrix and edge list

The usual way to get an adjacency matrix is to (i) import an edge list (see above), (ii) turn it into a graph object, and (iii) turn the graph object into an adjacency matrix,

A <- as_adjacency_matrix(g, type = "both", sparse = FALSE)

If g is undirected undirected, leave out type. With tie weights: attr="weight" and perhaps with names = F.

If unsure about weights, check it out with table(A). If the adjacency matrix has weighted ties, then calculating degree and density on the basis of it will produce values much different from what you may expect! To prevent this from happening, binarize the matrix for these particular – but not for all – measures with respect to some threshold value t,

A_save_copy <- A; A[A >= t] <- 1; A[A < t] <- 0;

diag(A) <- 0

Save an adjacency matrix without R’s indices,

write.table(A, "File_name.txt", col.names=FALSE, row.names=FALSE)

When importing a matrix-like object directly into R, make sure that R will treat it as a matrix object,

A <- as.matrix(A)

Working in the opposite direction, from A to edge list, first turn the adjacency matrix into a graph object,

g <- graph_from_adjacency_matrix(A)

If a weighted adjacency matrix is required, use weighted=TRUE in the above command.

Turn the graph object into a two-column edge list,

E <- as_edgelist(g)

For tie weights in a third column, see Additional Materials below.

To get an adjacency matrix from an incidence matrix C, with individuals in the rows and their affiliations in the columns,

A <- C %*% t(C); diag(A) <- 0

For the measures below, try them first on a couple of very small networks, of 2 to 5 nodes, in order to understand perfectly what is being computed. For some measures below, if you have a directed or a weighted graph, your results might be different than you think, which you probably won’t notice for a large network.

Descriptive network measures

The number of nodes,

vcount(g)

The number of ties—that does not work for weighted adjacency matrices as input! If you have weighted ties, ecount may yield a number much higher or lower than you expect!

ecount(g)

Is the graph directed?

is_directed(g)

If the graph is not meant to be directed, use the argument as_undirected(g) in the pertaining functions, e.g. for density,

edge_density(g) # or edge_density(as_undirected(g))

Round a result with respect to data quality, e.g. to two numbers after the decimal dot,

round(some_result, 2)

Degree, possibly mode="in" or mode="out"

din <- degree(g, mode="in")

deg <- degree(g) # undirected, or in and out together

Plot as histogram:

hist(deg, xlab="Degree", ylab="Frequency", breaks=max(deg), xlim=c(min(deg)-1,max(deg)+3))

and adjust the breaks and xlim to the data at hand.

Plot as distribution, if you wish as_undirected(g)

plot(table(degree(g)), xlab="Degree", ylab="Frequency")

Use a logarithmic scale for large networks of more than, say, 1000 nodes or so,

plot(degree_distribution(g, mode="in", cumulative=T), log="xy", xlab="indegree", ylab="cumulative frequency")

To find the individuals with the highest degree, order the result in descending order:

V <- as.matrix(V(g)$name)

result <- data.frame(V,deg)

In case of an error message, first define: V(g)$name <- V(g)

colnames(result) <- c("id", "Degree")

Check head(result) if it is a numeric variable; if not, correct this with fix(result)

The people with the highest degree:

result2 <- result[order(-result[ ,2]), ]; head(result2)

Ego networks of a set of individuals, e.g. two individuals {x, y}, or V(g) for everybody:

ego(g, order = 1, nodes = c("x","y"), mode = c("all")) # or "in" or "out"

Reciprocity

reciprocity(g)

If the result is 0 or low, it might be due to the data rather than indicating that people don’t reciprocate.

Clustering measured as transitivity, averaged over the entire network; automatically undirected!

transitivity(g)

Clustering for each node individually,[1] i.e. local clustering; calculated as if graph is undirected!

LC <- transitivity(g, "local")

Average path length–of existing paths only, it ignores all gaps between components!

The current version of igraph (2.1.1) has a bug, hence the following does not work, but you can use distance_table (below) and calculate it on the basis of the frequencies of each distance listed in the table.

mean_distance(g, details=TRUE)

Notice that the average path length is computed over the existing paths, ignoring a.o. disconnected components that are “infinitely” far away from each other. This means that in a network of, say, only disconnected dyads, the average distance equals one!

Distribution of path lengths as histogram (above), which is nicest for relatively small graphs, or:

L <- distance_table(g, directed=T)

x <- seq(from=1, to=length(L$res), by=1)

plot(L$res ~ x, xlab="Path length", ylab="Frequency of occurrence", type="l", las = 1)

points(L$res ~ x)

Similarity: assortment (homophily) or influence?

Neighbors can be similar on a given trait (e.g., political opinion) because of assortment or influence, which usually can’t be distinguished by a descriptive measure, unless you know for sure that the attribute in question was there before the network ties, e.g. gender (and even for this case there are exceptions). Modeling this thorny issue can be done in RSiena. Here, assortment is a descriptive, not an explanation, and works for ordinal, interval and ratio level data. Put the trait (from a column in a data.frame) in a vector, here called "trait", and put the network in an edge list format, called E, and remove missing data (coded NA)

clean_trait <- na.omit(trait)

assortativity(g, clean_trait)

If the direction of ties is to be used explicitly, see the official igraph manual.

Generate random graphs, with or without skewed degree distribution. Suppose you have a graph g and want to have its randomized counterpart with the same size and density,

er <- sample_gnm(n = vcount(g), m = ecount(g), directed = TRUE, loops = FALSE)

directed = FALSE often does not work, so if you have to, make it undirected afterwards, and check if the density is what you expect.

Another way to create a random graph is to first create a random adjacency matrix with, say, 30% ones and 70% zeros. Don't forget to subsequently set the diagonal to zero.

R <- matrix(sample(0:1, size = n^2, replace=TRUE, prob=c(.7, .3) ), nrow = n, ncol=n); diag(R) <- 0

Random directed graph with approximately power law degree distribution; first decide how steep the line should descend in the exponent.out argument,

pg <- sample_fitness_pl(no.of.nodes = n, no.of.edges = m, exponent.out = 2.3)

For community detection, which (in igraph) is on undirected graphs, a comparison is made with a randomized and undirected version of the graph with the same degree distribution. This can of course also be used for different purposes,

gd <- sample_degseq(degree(as_undirected(g)), method = "vl")

The robustness of a network measure can be tested by adding or deleting a portion 0 ≤ p ≤ 1 of ties randomly. Delete ties:

S <- sample(1:ecount(g), p*ecount(g));

E <- as_edgelist(g); E <- E[-S,]; f <- make_graph(t(E))

The effect of randomly adding ties can be done by means of a sparse graph (given p) superimposed on the actual graph.

How is some network trait distributed over, say, 100 randomized versions of graph g? In the example, the trait is transitivity[2],

trait <- vector()

for (i in 1:100) { er <- sample_gnm(vcount(g), ecount(g),

directed = TRUE, loops = FALSE);

trait[[i]] <- transitivity(er) }

hist(trait, xlim=c(min(trait)-.01,transitivity(g)+0.05));

# adjust the numbers in xlim according to your data and taste

points(x=transitivity(g), y=0, pch=19) # or: summary(trait)

Individual positions and social inequality

Power or status by means of Bonacich’ centrality, which for various reasons (computation of eigenvalues; lack of influence at longer distances) is simplified to distance two,

cp = AT1 + b AT(AT1), where 0 ≤ b ≤ 1 tells how heavily paths of length two contribute to someone’s power, and arcs are in the direction of deference, e.g. citation or asking advice. If this is too slow for extremely large networks; use page rank, page.rank(g). If g is weighted, also use attr = "weight" in the following:

A <- as_adjacency_matrix(g, sparse=FALSE)

I <- matrix(1, nrow = vcount(g), ncol = 1);

b = 1/max(abs(eigen(A)$values));

deference <- t(A) %*% I + b*t(A) %*% (t(A) %*% I)

If the arcs are in the same direction as the influence, e.g. orders to be followed,

influence <- A %*% I + b*A %*% (A %*% I)

For an undirected network, use the deference calculation. To rank-order the result, see degree above.

Brokerage by betweenness for paths no longer than 2 (see Burt 2010),

betw2 <- betweenness(g, cutoff=2, weights = NULL)

The same command can be executed with the additional argument directed=FALSE.

The result can be rounded and ordered (see degree, above),

betw2r <- round(betw2,2)

If you need betweenness for citation data, i.e. only on the outgoing arcs, ask me for the code.

Correlation of centrality, e.g. betw2, and something else, e.g., deference,

cor.test(betw2, deference, use="pairwise.complete.obs")

Notice that this is not the rank order correlation; then use method = "spearman"

Social groups

As a first step, look at the already separate components and their sizes.

Warning: when taking a separate component for further inspection, the node index is automatically reset, starting with 1, and mismatches the original numerical index unless you define it at the beginning. Therefore do the following, but don’t do it if the nodes already have been assigned proper names.

V(g)$name <- V(g)

comp <- decompose(g); # the components

table(sapply(comp, vcount)) # or simply: components(g)$no

Each component is a community; to get the largest one[3]:

c <- components(g);

lc <- which(which.max(c$csize)==c$membership);

g1 <- induced_subgraph(g, lc) # largest component

A component indexed q, where 1 ≤ q ≤ no.clusters(g),

gq <- induced_subgraph(g, c$membership == q)

or any subgroup of nodes with id numbers, say, in between x and y,

gxy <- induced_subgraph(g, vids = c(x:y))

The distance from one given node x to all others in the same component gq,

distances(gq, to = x) # or mean(distance(…))

A fast algorithm (by my former student Vincent Traag) for community detection in a network component with positive, undirected, possibly weighted, ties is "Leiden", called after the place of creation.

sp <- cluster_leiden(as_undirected(g))

length(unique(sp)) # number of communities

modularity(g, membership(sp))

You may decrease or increase the resolution to find, respectively, less (and less dense), or more (and more dense) subcommunities with the argument resolution, which has default value 1. Define the resolution, r, and

sp2 <- cluster_leiden(as_undirected(g), resolution=r)

Plot the communities,

V(g)$com <- sp$membership

L <- layout_with_fr(g);

plot(g, vertex.size=10, edge.color="black", vertex.color=V(g)$com, vertex.label=NA, layout=L)

Sometimes you get better colors if you write sp$membership+4 or another number.

If you have directed ties and/or negative ties and/or want to delve into community overlaps, use the slower and somewhat less accurate cluster_spinglass() If you know in advance how many communities there are (e.g., two, say, Republicans and Democrats), you can set this number by the argument spins = 2

For further calculations where community is a control variable, you may want to order the communities and their members,

C <- cbind(V(g)$name,V(g)$com);

colnames(C) <- c("id","com");

C <- as.data.frame(C);

communities <- C[order(C[ ,1]), ]

To make a subgraph of everyone in a given (sub)community q,

V(g)$com <- sp$membership # now choose q

gq <- induced_subgraph(g, V(g)$com == q)

The social cohesion of a given graph g as its algebraic connectivity [weighing both redundancy and proximity]

A <- as_adjacency_matrix(as_undirected(g), sparse=FALSE);

Q <- -A/rowSums(A); Q[is.nan(Q)] <- 0; diag(Q) <- 1;

sort(eigen(Q)$values,partial=2)[2]

The algebraic connectivity of Q might be a complex number if Q is directed (indicated by +0i in the outcome); then, symmetrize the ties before calculating, but realize that the result over-estimates cohesion by assuming perfect reciprocity. Alternatively, select the bi-directed ties and discard the non-reciprocated ties, and proceed from there.

One can also calculate separately the two main components of algebraic connectivity: the mean distance (see above) and the minimum number of node-independent paths,

cohesion(as_undirected(g))

Additional Materials

Tie weights inferred from the presence of multiple ties between the same individuals

E(g)$weight <- count_multiple(g)

Turn a weighted adjacency matrix A into an edge list with tie weights in the third column

E <- as_edgelist(g)

a <- vector() # empty vector for tie weights

k = 1

for (i in 1:vcount(g))

{ for (j in 1:vcount(g))

{ if (A[i,j] != 0) { a[k] <- A[i,j]; k = k+1 } } }

Edges <- cbind(E, a); colnames(Edges) <- c("from","to","value")

Edges <- as.data.frame(Edges)

Network images

For names next to, rather than on top of, the vertices, use the argument vertex.label.dist = . . . ; try values in the range .5 to 2 until the result looks pleasing.

For edge labels, e.g. a gossip that person x tells about person z to y, in, say, column 4, of the data file F, E(g)$gossip <- F[ ,4] and use the argument edge.label=E(g)$gossip For edge weights, edge.width=E(g)$width

Community detection.

To see how dissimilar two partitionings of a given network are, or two subsequent network cross sectional measurements, the vi distance measure can be used. This makes sense when using the spinglass algorithm.

compare(sp1, sp2, method="vi")/log2(vcount(g))

The vi distance (Meila 2007) is a mutual information measure, which says how much information you need to describe sp2 given sp1, and the other way around. The more information you need, the more distant, or dissimilar, the partitions are; if they are identical, in contrast, the distance is zero. Because the maximum distance equals log2(n), the max distance can be normalized to 1.

Normalized entropy of tie weights (N. Eagle, M. Macy, R. Claxton 2010 Science 328: 1029).

diversity(g)

Models

SIR, play around with different beta and gamma values, or program it by yourself (not too difficult),

smodel <- sir(as_undirected(g), beta = 1, gamma = 1, no.sim = 100); plot(smodel)

RSiena, see https://www.stats.ox.ac.uk/~snijders/siena/

Social influence towards consensus, if for everyone outdegree > 0

library(deSolve); # first install this package

A <- as_adjacency_matrix(g, sparse=FALSE);

Q <- -A/rowSums(A); diag(Q) <- 1;

# This is a row-normalized, but not symmetrized Laplacian

# initial opinions at t_0, for example random

x <- rnorm(vcount(g), mean = 0, sd = 2)

yini <- c(y = x); # x is vector of initial opinions

times <- seq(from = 0, to = 20, by = 0.1); # time points

derivs <- function(t, y, parms) {

with (as.list(y), {

dy = - Q %*% y ;

list( c(dy) ) } ) } ;

out <- ode(y = yini, times = times, func = derivs, parms = NULL)

# Look at tail(out) and organize the results:

Ve <- vector(); # st. dev. of opinions

for (i in 1:nrow(out)) {Ve[i] <- sd(out[i, 2:ncol(out)]) };

plot(Ve ~ times)

For the most influential nodes in the convergence to consensus, see power centrality.

Footnotes

[1] Local transitivity can also be calculated for specific nodes, for example “1” and “3” by using the argument vids=c(1,3).

[2] This script is for pedagogy, as transitivity of a random graph can be simply calculated.

[3] In the unlikely case there are two largest components exactly the same size, the algorithm picks the first.

[4] For the number of independent paths connecting two specific vertices, see the official igraph manual on graph.cohesion.

Page updated

Google Sites

Report abuse