Updated November 2024. You'll need some basic knowledge of R.
The igraph package has its own user manual, which often mismatches the actual code. The treatment in my textbook (Bruggeman 2008, pp. 114-131) is obsolete, as well as the Graphical User Interface from that time.
Network data
Data handling is easiest in a program you feel comfortable with, separate from R, for example in an editor such as Notepad or an Excel spreadsheet (see below). However, you can create and edit data in R, explained below.
Write on each line an arc from a vertex, in the first column in a spreadsheet, to another vertex, in the second column, separated by a space, a tab, or by putting them in two different columns. You may use spaces within names and accents such as ö or ó, but it’s better to avoid these frills and to use anonymous ID numbers or simple abbreviations, e.g.,
name1 name2
An arc in the opposite direction is to be written in the same columns on a new line
name2 name1
Go on with the remainder arcs, each on its own line. For an undirected network, one arc for each edge will do; when importing the data in R, you’ll tell the computer that your network is undirected (see below), which saves half of the typing effort. For a valued graph, put the values, or weights, in a third position (e.g. the third column in a spreadsheet),
name1 name2 5
Working session: open R, and set the working directory.
When doing this for the first time, install the igraph package (once) by typing:
install.packages("igraph")
When closing, with quit(), respond no if asked to save the workspace image.
Import network data
Remove remains from earlier working sessions,
rm(list=ls())
library(igraph) # activate this package for each session
Import data from working directory or internet
D <- read.table("Example.txt")
head(D)
Turn data frame into graph object,
g <- make_graph(t(D)) # or specify columns in D, say 1 and 2:
E <- D[ ,1:2] # now do the previous step: g <- make_graph(t(E))
If there is a node called 0 (from, say, a Python file), recode all nodes by adding 1: E <- E + 1
When proper names are used for the nodes, you may sometimes have to use:
g <- graph_from_data_frame(D)
If the nodes have no proper names but identity numbers, the next step is not essential now but will avoid catastrophe when examining subgraphs later on, so always do:
V(g)$name <- V(g)
The default is a directed network; for it to be undirected,
g <- as_undirected(g)
At data entry, some ties may have accidentally been typed multiple times; check this out:
is_simple(g)
g <- simplify(g) # remove multiple ties
This action removes loops too, unless you add: remove.loops = FALSE
Network images
A layout to plot a graph,
L <- layout_with_kk(g) # for not too large graphs
L <- layout_with_fr(g) # for large graphs
plot(g, layout=L)
If the resulting layout displeases you, run again the layout command above.
The plot command yields large white margins. To get rid of them, use par(mar=c(1,1,1,1)) before the plot command.
To plot a large graph,
plot(as_undirected(g), layout=L, vertex.label=NA, vertex.size=2)
When drawing the ties a bit curved, you can see if ties (of the same kind) are reciprocal,
E(g)$curved <- 0.2 # plot again
To save a graph image, menu: File > save as.
If you have tie weights, 'glue' them to the ties after creating a graph object g, from the place you keep them in your data file D, for example in the third column: E(g)$weight <- D[ ,3] This goes well for arcs, but might fail for edges.
To turn multiple unweighted ties between pairs of nodes into single weighted ties,
E(g)$weight <- count_multiple(g)
g <- simplify(g) # Don’t do this before the previous lines!
E(g)$width <- 1 # default line thickness
Suppose you have a data frame called D with tie strength in the third column, you could do:
E(g)[weight > mean(D[,3])]$width <- 2 # strong ties
For a range of tie values too wide to draw, narrow them down, e.g., take the log and ignore tie sign:
E(g)$width <- log(abs(N[ ,3]))
For available colors for ties and nodes, check out colors()
If using color, it must be done for all ties or nodes, not just for some, for example:
E(g)$color <- "black" # default color
E(g)[20]$color <- "red" # color for a certain tie (or range of ties begin:end)
plot(g, layout=L, edge.arrow.size=.1, edge.width=E(g)$weight, edge.color=E(g)$color)
Properties, or traits, of nodes, written in the columns of a file imported as “F”, can be assigned different colors or different shapes in the graph image. If the order of rows (i.e. nodes) in F is the same as the order of nodes in g (type head(V(g)) and head(F) to check), do the following, else use the plyr package, see 8 lines below.
V(g)$gender <- F[ ,2] # from 2nd column of a file "F"
Alternatively,
g <- set_vertex_attr(g, "gender", value = F[ ,2])
To distinguish traits (e.g., gender) in the plot,
V(g)[gender == 0]$shape <- "circle" # e.g., women
V(g)[gender == 1]$shape <- "square" # e.g., men
and other shapes for other genders. In the plot command, use the argument: vertex.shape=V(g)$shape
For colored nodes, see color under ties.
Example: V(g)[trait == 4]$color <- "red" Put vertex.color=V(g)$color in the plot command.
In order not to have a colored line around the nodes, use the argument vertex.frame.color = "white"
If the order of the rows (i.e. nodes) in F differs from the order of the nodes in g, check that (or rename) the pertaining column in F is called “name” and has the names or identity numbers in common with the identities of the nodes in g; missing values and even missing names are allowed.
V <- as.matrix(V(g)$name); colnames(V) <- "name"
library(dplyr)
V <- left_join(V, F, by="name") # or right_join, depending which data has priority; from here on, continue as above
The size of nodes, V(g)$size, can be made proportional to their centrality (below), analogous to the width of the ties (above); then in the plot, vertex.size=V(g)$size
Also vertex labels can be resized, label.size = 2 # default is 1
Show or hide, respectively, the names or ID numbers of the nodes by using vertex.label=V(g)$name or vertex.label=NA
To play around with the positions of the nodes—only if your graph is small—try tkplot() Another nice package for network images is ggraph.
Adjacency matrix and edge list
The usual way to get an adjacency matrix is to (i) import an edge list (see above), (ii) turn it into a graph object, and (iii) turn the graph object into an adjacency matrix,
A <- as_adjacency_matrix(g, type = "both", sparse = FALSE)
If g is undirected undirected, leave out type. With tie weights: attr="weight" and perhaps with names = F.
If unsure about weights, check it out with table(A). If the adjacency matrix has weighted ties, then calculating degree and density on the basis of it will produce values much different from what you may expect! To prevent this from happening, binarize the matrix for these particular – but not for all – measures with respect to some threshold value t,
A_save_copy <- A; A[A >= t] <- 1; A[A < t] <- 0;
diag(A) <- 0
Save an adjacency matrix without R’s indices,
write.table(A, "File_name.txt", col.names=FALSE, row.names=FALSE)
When importing a matrix-like object directly into R, make sure that R will treat it as a matrix object,
A <- as.matrix(A)
Working in the opposite direction, from A to edge list, first turn the adjacency matrix into a graph object,
g <- graph_from_adjacency_matrix(A)
If a weighted adjacency matrix is required, use weighted=TRUE in the above command.
Turn the graph object into a two-column edge list,
E <- as_edgelist(g)
For tie weights in a third column, see Additional Materials below.
To get an adjacency matrix from an incidence matrix C, with individuals in the rows and their affiliations in the columns,
A <- C %*% t(C); diag(A) <- 0
For the measures below, try them first on a couple of very small networks, of 2 to 5 nodes, in order to understand perfectly what is being computed. For some measures below, if you have a directed or a weighted graph, your results might be different than you think, which you probably won’t notice for a large network.
Descriptive network measures
The number of nodes,
vcount(g)
The number of ties—that does not work for weighted adjacency matrices as input! If you have weighted ties, ecount may yield a number much higher or lower than you expect!
ecount(g)
Is the graph directed?
is_directed(g)
If the graph is not meant to be directed, use the argument as_undirected(g) in the pertaining functions, e.g. for density,
edge_density(g) # or edge_density(as_undirected(g))
Round a result with respect to data quality, e.g. to two numbers after the decimal dot,
round(some_result, 2)
Degree, possibly mode="in" or mode="out"
din <- degree(g, mode="in")
deg <- degree(g) # undirected, or in and out together
Plot as histogram:
hist(deg, xlab="Degree", ylab="Frequency", breaks=max(deg), xlim=c(min(deg)-1,max(deg)+3))
and adjust the breaks and xlim to the data at hand.
Plot as distribution, if you wish as_undirected(g)
plot(table(degree(g)), xlab="Degree", ylab="Frequency")
Use a logarithmic scale for large networks of more than, say, 1000 nodes or so,
plot(degree_distribution(g, mode="in", cumulative=T), log="xy", xlab="indegree", ylab="cumulative frequency")
To find the individuals with the highest degree, order the result in descending order:
V <- as.matrix(V(g)$name)
result <- data.frame(V,deg)
In case of an error message, first define: V(g)$name <- V(g)
colnames(result) <- c("id", "Degree")
Check head(result) if it is a numeric variable; if not, correct this with fix(result)
The people with the highest degree:
result2 <- result[order(-result[ ,2]), ]; head(result2)
Ego networks of a set of individuals, e.g. two individuals {x, y}, or V(g) for everybody:
ego(g, order = 1, nodes = c("x","y"), mode = c("all")) # or "in" or "out"
Reciprocity
reciprocity(g)
If the result is 0 or low, it might be due to the data rather than indicating that people don’t reciprocate.
Clustering measured as transitivity, averaged over the entire network; automatically undirected!
transitivity(g)
Clustering for each node individually,[1] i.e. local clustering; calculated as if graph is undirected!
LC <- transitivity(g, "local")
Average path length–of existing paths only, it ignores all gaps between components!
The current version of igraph (2.1.1) has a bug, hence the following does not work, but you can use distance_table (below) and calculate it on the basis of the frequencies of each distance listed in the table.
mean_distance(g, details=TRUE)
Notice that the average path length is computed over the existing paths, ignoring a.o. disconnected components that are “infinitely” far away from each other. This means that in a network of, say, only disconnected dyads, the average distance equals one!
Distribution of path lengths as histogram (above), which is nicest for relatively small graphs, or:
L <- distance_table(g, directed=T)
x <- seq(from=1, to=length(L$res), by=1)
plot(L$res ~ x, xlab="Path length", ylab="Frequency of occurrence", type="l", las = 1)
points(L$res ~ x)
Similarity: assortment (homophily) or influence?
Neighbors can be similar on a given trait (e.g., political opinion) because of assortment or influence, which usually can’t be distinguished by a descriptive measure, unless you know for sure that the attribute in question was there before the network ties, e.g. gender (and even for this case there are exceptions). Modeling this thorny issue can be done in RSiena. Here, assortment is a descriptive, not an explanation, and works for ordinal, interval and ratio level data. Put the trait (from a column in a data.frame) in a vector, here called "trait", and put the network in an edge list format, called E, and remove missing data (coded NA)
clean_trait <- na.omit(trait)
assortativity(g, clean_trait)
If the direction of ties is to be used explicitly, see the official igraph manual.
Generate random graphs, with or without skewed degree distribution. Suppose you have a graph g and want to have its randomized counterpart with the same size and density,
er <- sample_gnm(n = vcount(g), m = ecount(g), directed = TRUE, loops = FALSE)
directed = TRUE often does not work, so if you have to, make it undirected afterwards, and check if the density is what you expect (it doubles w.r.t. the directed graph).
Another way to create a random graph is to first create a random adjacency matrix with, say, 30% ones and 70% zeros. Don't forget to subsequently set the diagonal to zero.
R <- matrix(sample(0:1, size = n^2, replace=TRUE, prob=c(.7, .3) ), nrow = n, ncol=n); diag(R) <- 0
Random directed graph with approximately power law degree distribution; first decide how steep the line should descend in the exponent.out argument,
pg <- sample_fitness_pl(no.of.nodes = n, no.of.edges = m, exponent.out = 2.3)
For community detection, which (in igraph) is on undirected graphs, a comparison is made with a randomized and undirected version of the graph with the same degree distribution. This can of course also be used for different purposes,
gd <- sample_degseq(degree(as_undirected(g)), method = "vl")
The robustness of a network measure can be tested by adding or deleting a portion 0 ≤ p ≤ 1 of ties randomly. Delete ties:
S <- sample(1:ecount(g), p*ecount(g));
E <- as_edgelist(g); E <- E[-S,]; f <- make_graph(t(E))
The effect of randomly adding ties can be done by means of a sparse graph (given p) superimposed on the actual graph.
How is some network trait distributed over, say, 100 randomized versions of graph g? In the example, the trait is transitivity[2],
trait <- vector()
for (i in 1:100) { er <- sample_gnm(vcount(g), ecount(g),
directed = TRUE, loops = FALSE);
trait[[i]] <- transitivity(er) }
hist(trait, xlim=c(min(trait)-.01,transitivity(g)+0.05));
# adjust the numbers in xlim according to your data and taste
points(x=transitivity(g), y=0, pch=19) # or: summary(trait)
Individual positions and social inequality
Power or status by means of Bonacich’ centrality, which for various reasons (computation of eigenvalues; lack of influence at longer distances) is simplified to distance two,
cp = AT1 + b AT(AT1), where 0 ≤ b ≤ 1 tells how heavily paths of length two contribute to someone’s power, and arcs are in the direction of deference, e.g. citation or asking advice. If this is too slow for extremely large networks; use page rank, page.rank(g). If g is weighted, also use attr = "weight" in the following:
A <- as_adjacency_matrix(g, sparse=FALSE)
I <- matrix(1, nrow = vcount(g), ncol = 1);
b = 1/max(abs(eigen(A)$values));
deference <- t(A) %*% I + b*t(A) %*% (t(A) %*% I)
If the arcs are in the same direction as the influence, e.g. orders to be followed,
influence <- A %*% I + b*A %*% (A %*% I)
For an undirected network, use the deference calculation. To rank-order the result, see degree above.
Brokerage by betweenness for paths no longer than 2 (see Burt 2010),
betw2 <- betweenness(g, cutoff=2, weights = NULL)
The same command can be executed with the additional argument directed=FALSE.
The result can be rounded and ordered (see degree, above),
betw2r <- round(betw2,2)
If you need betweenness for citation data, i.e. only on the outgoing arcs, ask me for the code.
Correlation of centrality, e.g. betw2, and something else, e.g., deference,
cor.test(betw2, deference, use="pairwise.complete.obs")
Notice that this is not the rank order correlation; then use method = "spearman"
Social groups
As a first step, look at the already separate components and their sizes.
Warning: when taking a separate component for further inspection, the node index is automatically reset, starting with 1, and mismatches the original numerical index unless you define it at the beginning. Therefore do the following, but don’t do it if the nodes already have been assigned proper names.
V(g)$name <- V(g)
comp <- decompose.graph(g); # the components
table(sapply(comp, vcount)) # or simply: components(g)$no
Each component is a community; to get the largest one[3]:
c <- components(g);
lc <- which(which.max(c$csize)==c$membership);
g1 <- induced_subgraph(g, lc) # largest component
A component indexed q, where 1 ≤ q ≤ no.clusters(g),
gq <- induced_subgraph(g, c$membership == q)
or any subgroup of nodes with id numbers, say, in between x and y,
gxy <- induced_subgraph(g, vids = c(x:y))
The distance from one given node x to all others in the same component gq,
distances(gq, to = x) # or mean(distance(…))
A fast algorithm (by my former student Vincent Traag) for community detection in a network component with positive, undirected, possibly weighted, ties is "Leiden", called after the place of creation.
sp <- cluster_leiden(as_undirected(g))
length(unique(sp)) # number of communities
modularity(g, membership(sp))
You may decrease or increase the resolution to find, respectively, less (and less dense), or more (and more dense) subcommunities with the argument resolution, which has default value 1. Define the resolution, r, and
sp2 <- cluster_leiden(as_undirected(g), resolution=r)
Plot the communities,
V(g)$com <- sp$membership
L <- layout_with_fr(g);
plot(g, vertex.size=4, edge.color="black", vertex.color=V(g)$com, vertex.label=NA, layout=L)
Sometimes you get better colors if you write sp$membership+4 or another number.
If you have directed ties and/or negative ties and/or want to delve into community overlaps, use the slower and somewhat less accurate cluster_spinglass() If you know in advance how many communities there are (e.g., two, say, Republicans and Democrats), you can set this number by the argument spins = 2
For further calculations where community is a control variable, you may want to order the communities and their members,
C <- cbind(V(g)$name,V(g)$com);
colnames(C) <- c("id","com");
C <- as.data.frame(C);
resultC <- C[order(C[ ,2]), ] # check: sometimes, this yields more rows than the original!!
To make a subgraph of everyone in a given (sub)community q,
gq <- induced_subgraph(g, V(g)$com==q)
The social cohesion of a given subgraph gq as its algebraic connectivity [weighing both redundancy and proximity] is
Ln <- laplacian_matrix(gq, normalized=TRUE, sparse = FALSE);
sort(eigen(Ln)$values,partial=2)[2]
Alternatively, one can proceed through the row-normalized adjacency matrix:
A <- as_adjacency_matrix(as_undirected(gq), sparse=FALSE);
Q <- -A/rowSums(A); Q[is.nan(Q)] <- 0; diag(Q) <- 1;
sort(eigen(Q)$values,partial=2)[2]
Ln is not the same as Q but their eigenvalues are the same.
The algebraic connectivity of Q might be a complex number if Q is directed (indicated by +0i in the outcome); then, symmetrize the ties before calculating, but realize that the result over-estimates cohesion by assuming perfect reciprocity. Alternatively, select the bi-directed ties and discard the non-reciprocated ties.
One can also calculate separately the components of algebraic cohesion: the mean distance (see above) and the number of node-independent paths,
cohesion(as_undirected(g))
Additional Materials
Tie weights inferred from the presence of multiple ties between the same individuals
E(g)$weight <- count.multiple(g)
Turn a weighted adjacency matrix A into an edge list with tie weights in the third column
E <- as_edgelist(g);
a <- vector(); # empty vector for tie weights
k = 1;
for (i in 1:vcount(g));
{ for (j in 1:vcount(g));
{ if (A[i,j] != 0) { a[k] <- A[i,j]; k = k+1 } } };
Edges <- cbind(E, a); colnames(Edges) <- c("from","to","value");
Edges <- as.data.frame(Edges)
Network images
For names next to, rather than on top of, the vertices, use the argument vertex.label.dist = . . . ; try values in the range .5 to 2 until the result looks pleasing.
For edge labels, e.g. a gossip that person x tells about person z to y, in, say, column 4, of the data file F, E(g)$gossip <- F[ ,4] and use the argument edge.label=E(g)$gossip For edge weights, edge.width=E(g)$width
Community detection.
To see how dissimilar two partitionings of a given network are, or two subsequent network cross sectional measurements, the vi distance measure can be used. This makes sense when using the spinglass algorithm.
compare(sp1, sp2, method="vi")/log2(vcount(g))
The vi distance (Meila 2007) is a mutual information measure, which says how much information you need to describe sp2 given sp1, and the other way around. The more information you need, the more distant, or dissimilar, the partitions are; if they are identical, in contrast, the distance is zero. Because the maximum distance equals log2(n), the max distance can be normalized to 1.
Normalized entropy of tie weights (N. Eagle, M. Macy, R. Claxton 2010 Science 328: 1029).
diversity(g)
Models
SIR, play around with different beta and gamma values, or program it by yourself (not too difficult),
smodel <- sir(as_undirected(g), beta = 1, gamma = 1, no.sim = 100); plot(smodel)
RSiena, see https://www.stats.ox.ac.uk/~snijders/siena/
Social influence towards consensus, if for everyone outdegree > 0
library(deSolve); # first install this package
A <- as_adjacency_matrix(g, sparse=FALSE);
Q <- -A/rowSums(A); diag(Q) <- 1;
# This is a row-normalized, but not symmetrized Laplacian
# initial opinions at t_0, for example random
x <- rnorm(vcount(g), mean = 0, sd = 2)
yini <- c(y = x); # x is vector of initial opinions
times <- seq(from = 0, to = 20, by = 0.1); # time points
derivs <- function(t, y, parms) {
with (as.list(y), {
dy = - Q %*% y ;
list( c(dy) ) } ) } ;
out <- ode(y = yini, times = times, func = derivs, parms = NULL)
# Look at tail(out) and organize the results:
Ve <- vector(); # st. dev. of opinions
for (i in 1:nrow(out)) {Ve[i] <- sd(out[i, 2:ncol(out)]) };
plot(Ve ~ times)
For the most influential nodes in the convergence to consensus, see power centrality.
Footnotes
[1] Local transitivity can also be calculated for specific nodes, for example “1” and “3” by using the argument vids=c(1,3).
[2] This script is for pedagogy, as transitivity of a random graph can be simply calculated.
[3] In the unlikely case there are two largest components exactly the same size, the algorithm picks the first.
[4] For the number of independent paths connecting two specific vertices, see the official igraph manual on graph.cohesion.