Project

Topic: Visualization and analysis of large-scale networks using two data sets (one from Group A and one from Group B, see below) posted at the Stanford Large Network Dataset Collection.

The project report should be structured as follows:

1. Introduction [Minimum word count: 300]

In this section, describe the two networks assigned to you. The text must answer two questions: what are these networks about, and what do the nodes and edges represent? You are free to add more text.

2. Data visualization [Screenshots not permitted]

Use visualization software to produce clean images of the data. Save each image and add the saved file to the report (screenshots are not allowed). Add a paragraph of text describing some of the features of the graphs.
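Before a data set can be drawn, it usually has to be loaded into the chosen tool. The following is a minimal sketch, assuming the data set is a plain-text edge list with comment lines starting with "#" followed by one whitespace-separated node pair per line; check the description of your own data set, since the format may differ. The function names, the sample data, and the "Source"/"Target" column headers are illustrative assumptions, not requirements of any particular visualization package.

```python
import csv
import io

def read_edge_list(lines):
    """Parse whitespace-separated node pairs, skipping blank and '#' comment lines."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, target = line.split()[:2]
        edges.append((source, target))
    return edges

def write_edge_csv(edges, out):
    """Write edges as a two-column CSV with an assumed Source,Target header row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)

# Tiny made-up sample standing in for a downloaded data set file.
sample = """# Hypothetical sample, not real data
# FromNodeId\tToNodeId
0\t1
0\t2
1\t2
2\t3
"""

edges = read_edge_list(sample.splitlines())
nodes = {node for edge in edges for node in edge}

# An in-memory buffer is used here; pass an open file object to save to disk.
buffer = io.StringIO()
write_edge_csv(edges, buffer)

print(len(nodes), "nodes,", len(edges), "edges")
```

The node and edge counts computed this way are also useful in the Introduction when describing what the nodes and edges are.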

3. Analysis [Minimum word count: 300]

Explain the following concepts: random graphs, scale-free (power-law) graphs, and one model of growth of scale-free graphs. Plot the degree distribution. Does the distribution follow a power law? Fit a power-law curve and, from the fitted curve, find the power-law exponent. If you are using a software package for analysis, you may wish to include other metrics that the package can calculate in a separate table. If you do so, you need to explain this data. At the end, try to explain why the two data sets, which represent very different things, obey the same law. Feel free to add your own thoughts and other findings. End the paper with a bibliography. Make sure to cite ALL sources.
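The analysis steps can be tried out on synthetic data before running them on the assigned networks. The sketch below is one possible approach, not a required method: it grows a graph by preferential attachment (in the style of the Barabási–Albert model, one well-known growth model for scale-free graphs), computes the degree distribution, and estimates the exponent. Note that the estimate uses the approximate maximum-likelihood formula for discrete data from Clauset, Shalizi and Newman (2009) rather than a least-squares line on a log-log plot. The function names, graph size, seed, and the cutoff k_min = 10 are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter

def preferential_attachment_edges(n, m, seed=0):
    """Grow a graph to n nodes; each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears once per incident edge, so a uniform draw from
    # this list is a degree-proportional draw.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges

def degree_distribution(edges):
    """Return (degree per node, fraction of nodes having each degree)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    total = len(degree)
    return degree, {k: c / total for k, c in sorted(counts.items())}

def fit_power_law_exponent(degrees, k_min):
    """Approximate maximum-likelihood exponent for the tail k >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum, len(tail)

# Illustrative parameters only (assumptions, not part of the assignment).
edges = preferential_attachment_edges(n=5000, m=3, seed=0)
degree, dist = degree_distribution(edges)
alpha, tail_size = fit_power_law_exponent(degree.values(), k_min=10)

print("estimated exponent:", round(alpha, 2), "from", tail_size, "nodes")
```

For the Barabási–Albert model the theoretical exponent is 3, so the estimate on this synthetic graph should land in that neighbourhood. To plot the distribution, the keys and values of `dist` can be drawn on logarithmic axes with any plotting package; for real data, the choice of k_min affects the fitted exponent and should be reported.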

Group A

[Select the data set whose number equals your class serial number modulo 19]

Social networks

Networks with ground-truth communities

Communication networks

Group B

[Select the data set whose number equals your class serial number modulo 17]

Collaboration networks

Web graphs

Product co-purchasing networks

Citation networks
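As a worked example of the selection rule for the two groups, the sketch below computes both remainders for a made-up class serial number. The function name and the serial number 45 are hypothetical. The code only computes the remainders; how a remainder of 0 maps to a data set is not stated here and should be confirmed with the instructor.

```python
def dataset_numbers(serial_number):
    """Return the remainders used to pick the Group A and Group B data sets."""
    return serial_number % 19, serial_number % 17

# 45 is a made-up serial number used only for illustration.
group_a, group_b = dataset_numbers(45)
print(group_a, group_b)  # prints "7 11"
```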