The stackoverflow-answers network is a hypergraph in which nodes are questions and hyperedges are sets of questions answered by users on Stack Overflow, as collected by Nate Veldt, Austin R. Benson, and Jon Kleinberg. Nodes are labeled by the tags used in the questions, and a node often has multiple labels. Some summary statistics of the dataset are:

  • number of nodes: 15,211,989

  • number of hyperedges: 1,103,243

  • mean / median hyperedge size: 23.7 / 5

  • maximum hyperedge size: 61,315

  • number of node classes: 56,502
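
The statistics above can be recomputed from a list of hyperedges. The following is a minimal Python sketch, not the authors' code. It assumes one hyperedge per line, written as comma-separated integer node IDs; this layout is an assumption and should be checked against the actual data files. The function names parse_hyperedges and summarize are hypothetical, and the toy input is made up for illustration.

```python
from statistics import mean, median


def parse_hyperedges(lines):
    """Parse hyperedges from text lines.

    ASSUMPTION: each non-empty line is one hyperedge, given as
    comma-separated integer node IDs. Adjust to the real file layout.
    """
    edges = []
    for line in lines:
        line = line.strip()
        if line:
            edges.append({int(tok) for tok in line.split(",")})
    return edges


def summarize(hyperedges):
    """Return node count, hyperedge count, and hyperedge size statistics."""
    if not hyperedges:
        raise ValueError("need at least one hyperedge")
    sizes = [len(e) for e in hyperedges]
    # Only nodes that appear in at least one hyperedge are counted.
    nodes = set().union(*hyperedges)
    return {
        "num_nodes": len(nodes),
        "num_hyperedges": len(hyperedges),
        "mean_size": mean(sizes),
        "median_size": median(sizes),
        "max_size": max(sizes),
    }


# Toy input (made up): three users answering overlapping questions.
toy = parse_hyperedges(["1,2,3", "2,4", "1,2,3,4,5"])
stats = summarize(toy)
print(stats)
```

For the full dataset, the same functions could be applied to an open file handle instead of the toy list, since parse_hyperedges only iterates over lines. A large gap between mean and median size, as in the statistics listed above, indicates that a small number of very large hyperedges dominate the mean.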

Data files:

If you use this data, please cite the following paper:

  • Minimizing Localized Ratio Cut Objectives in Hypergraphs.
    Nate Veldt, Austin R. Benson, and Jon Kleinberg.
    Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2020.