Repeating subgroup analysis on multiple datasets

Post date: Jan 27, 2013 5:5:48 PM

[For UCINET 6.458 or later]

Suppose you need to find subgroups in 200 separate networks. The networks involve different nodes, so they can't be stacked into one UCINET dataset. Instead, you have 200 separate datasets. The question is how to run 200 separate analyses conveniently.

One way is to use the clustering capabilities in Matrix Algebra -- UCINET's command-line capability. This only works if you like either the Factions method (optimizing Q modularity) or Newman's community detection (NCD) algorithm, because those are the only two clustering algorithms available in the command line interface. To run it on a single dataset, such as UCINET's CampNet, you would type something like this:

-->campnet-sym = symmet(campnet maximum)

-->clusters2 = ncd(campnet-sym 2)

-->clusters3 = ncd(campnet-sym 3)

-->clusters4 = ncd(campnet-sym 4)

The first command just symmetrizes the data. The NCD algorithm is not really designed for directed graphs. The FACTIONS routine does better with directed graphs, so you could just do this:

-->clusters2 = factions(campnet 2)

-->clusters3 = factions(campnet 3)

-->clusters4 = factions(campnet 4)

However, in general, if reciprocity is low you should probably consider symmetrizing or running blockmodeling instead.
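To make the symmetrizing step concrete, here is a minimal Python sketch of what the maximum option is assumed to do: cells (i,j) and (j,i) are both set to the larger of the two values, so a tie is kept if either node names the other. The function name symmetrize_max and the toy matrix are illustrative only; this is not UCINET's own code.

```python
def symmetrize_max(matrix):
    """Return a symmetric copy where cell (i, j) = max(x[i][j], x[j][i])."""
    n = len(matrix)
    return [[max(matrix[i][j], matrix[j][i]) for j in range(n)] for i in range(n)]


# Toy directed network (hypothetical data): 0 -> 1, 1 -> 2, 2 -> 1
directed = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]

symmetric = symmetrize_max(directed)
for row in symmetric:
    print(row)
```

In the toy data the one-way tie from node 0 to node 1 becomes a two-way tie, which is why low-reciprocity data can look much denser after symmetrizing this way.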

As you can see, both FACTIONS and NCD require you to specify the number of clusters you want, so what most people do is run them for a range of cluster counts. The output of any particular run is a cluster id variable that gives the cluster that each node belongs to. For example, for the campnet dataset, running NCD with 3 clusters gets you this:

-->clusters3 = ncd(campnet-sym 3)

Q modularity scores saved as dataset CAMPNET-SYM-3q

-->dsp campnet-sym-3q

Q modularity scores

-----

1 Q 0.550

-->dsp clusters3

1 HOLLY 2

2 BRAZEY 3

3 CAROL 1

4 PAM 1

5 PAT 1

6 JENNIE 1

7 PAULINE 1

8 ANN 1

9 MICHAEL 2

10 BILL 2

11 LEE 3

12 DON 2

13 JOHN 3

14 HARRY 2

15 GERY 3

16 STEVE 3

17 BERT 3

18 RUSS 3

The clustering corresponds to the colors in the picture below.
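If you want to see the cluster id variable as lists of members, a small script outside UCINET can regroup it. The Python sketch below uses the node/cluster pairs from the campnet listing above, pasted in as text. It assumes each line has exactly three whitespace-separated fields (row number, label, cluster id), so labels containing spaces would need different handling. The function name group_by_cluster is hypothetical.

```python
from collections import defaultdict

# Cluster ids copied from the "dsp clusters3" listing for campnet
listing = """\
1 HOLLY 2
2 BRAZEY 3
3 CAROL 1
4 PAM 1
5 PAT 1
6 JENNIE 1
7 PAULINE 1
8 ANN 1
9 MICHAEL 2
10 BILL 2
11 LEE 3
12 DON 2
13 JOHN 3
14 HARRY 2
15 GERY 3
16 STEVE 3
17 BERT 3
18 RUSS 3
"""


def group_by_cluster(text):
    """Map each cluster id to the node labels assigned to it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        _, label, cluster = parts
        groups[int(cluster)].append(label)
    return dict(groups)


groups = group_by_cluster(listing)
for cluster_id in sorted(groups):
    print(cluster_id, ", ".join(groups[cluster_id]))
```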

BATCH METHOD

One way to automate this process is to put the commands in a text file, and then use the batch command to execute the commands in the file. For example, create a text file called runclustering.txt that looks like this:

campnet-sym = symmet(campnet maximum)

camp2 = ncd(campnet-sym 2)

camp3 = ncd(campnet-sym 3)

camp4 = ncd(campnet-sym 4)

campq = joinasrows(campnet-sym-2q campnet-sym-3q campnet-sym-4q)

karate-sym = symmet(zacke maximum)

karate2 = ncd(karate-sym 2)

karate3 = ncd(karate-sym 3)

karate4 = ncd(karate-sym 4)

karateq = joinasrows(karate-sym-2q karate-sym-3q karate-sym-4q)

Then, in Matrix Algebra, type:

-->run runclustering.txt

This will execute all of the commands inside the text file, storing all the results on disk in the current folder.
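With 200 datasets, typing the command file by hand is tedious, so you may prefer to generate it. The Python sketch below writes one symmetrize command and three NCD commands per dataset, using only the symmet and ncd syntax shown above. The dataset names net1 to net200, the "-sym" and "-id" suffixes, and the function name build_commands are all assumptions made for illustration; substitute your own names.

```python
def build_commands(datasets, cluster_counts=(2, 3, 4)):
    """Return Matrix Algebra command lines for every dataset and cluster count."""
    lines = []
    for name in datasets:
        sym = f"{name}-sym"
        # Symmetrize first, since NCD is not really designed for directed data
        lines.append(f"{sym} = symmet({name} maximum)")
        for k in cluster_counts:
            lines.append(f"{name}-id{k} = ncd({sym} {k})")
    return lines


# Hypothetical dataset names: net1, net2, ..., net200
datasets = [f"net{i}" for i in range(1, 201)]
commands = build_commands(datasets)

# Write the command file into the current folder
with open("runclustering.txt", "w") as handle:
    handle.write("\n".join(commands) + "\n")
```

The output names carry a "-id" separator so that, for example, the 2-cluster result for net1 is not confused with a dataset called net12.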

FORFILES METHOD

If you have named all of your datasets with a common prefix, such as net1, net2, net3, etc., you can use the forfiles command instead of creating a batch file. With forfiles, you can have the program run the same analyses on all data files with the same prefix. For example, suppose you create a folder called Smoking Study under your UCINET Data folder, and you fill it with files net1, net2, etc. In Matrix Algebra you would then type something like this:

-->cd "C:\Users\sborgatti\Documents\UCINET data\Smoking Study\"

-->forfiles net* $$-id2 = ncd($$ 2)

-->forfiles net* $$-id3 = ncd($$ 3)

-->forfiles net* $$-id4 = ncd($$ 4)

The first FORFILES command says 'for each dataset you find whose name starts with net, run an ncd analysis and call the result the file name plus "-id2"'. So if the folder contains datasets net1, net2, net3 and net4, this one forfiles command is the equivalent of typing:

-->net1-id2 = ncd(net1 2)

-->net2-id2 = ncd(net2 2)

-->net3-id2 = ncd(net3 2)

-->net4-id2 = ncd(net4 2)
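The substitution described above can be mimicked in a few lines of Python, which is a handy way to preview what a forfiles command will expand to before running it. This is only a sketch of the behavior as described here (each "$$" replaced by a matching dataset name); it is not UCINET's implementation, and the function name expand_forfiles and the extra dataset "other" are made up for the example.

```python
def expand_forfiles(dataset_names, prefix, template):
    """Replace every $$ in the template with each dataset name that starts with prefix."""
    return [
        template.replace("$$", name)
        for name in dataset_names
        if name.startswith(prefix)
    ]


# Hypothetical folder contents; "other" does not match the net prefix
names = ["net1", "net2", "net3", "net4", "other"]
expanded = expand_forfiles(names, "net", "$$-id2 = ncd($$ 2)")

for command in expanded:
    print("-->" + command)
```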
