Data Mining Datasets

Labeled Unweighted Transactional Graph Datasets

Labeled Weighted Transactional Graph Datasets

[weight is a function of label + synthetic weight]
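As an illustration of what "weight is a function of label" means, the sketch below assumes each edge is a (u, v, label) triple. It draws one synthetic weight per distinct label, so every edge carrying that label, in any graph of the dataset, gets the same weight. The function names, the uniform range [1, 10], and the seed are hypothetical choices. This is not the procedure used to build these datasets.

```python
import random

def make_label_weights(labels, low=1.0, high=10.0, seed=0):
    """Hypothetical: draw one synthetic weight per distinct edge label."""
    rng = random.Random(seed)
    # sorted() makes the label-to-weight assignment reproducible for a given seed.
    return {label: round(rng.uniform(low, high), 2) for label in sorted(set(labels))}

def weight_edges(edges, label_weights):
    """Attach a weight to each (u, v, label) edge; the weight depends only on the label."""
    return [(u, v, label, label_weights[label]) for u, v, label in edges]

# Two graphs of a transactional dataset share one label-to-weight table.
graph_a = [(0, 1, "x"), (1, 2, "y")]
graph_b = [(0, 1, "y"), (1, 2, "x"), (2, 3, "x")]
table = make_label_weights(label for _, _, label in graph_a + graph_b)
weighted_a = weight_edges(graph_a, table)
weighted_b = weight_edges(graph_b, table)
```

Because the table is built once and shared, every "x" edge in both graphs carries `table["x"]`.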

If you use these datasets, please cite us:

BibTeX:

@inproceedings{islam2018wfsm,
  title={WFSM-MaxPWS: An Efficient Approach for Mining Weighted Frequent Subgraphs from Edge-Weighted Graph Databases},
  author={Islam, Md Ashraful and Ahmed, Chowdhury Farhan and Leung, Carson K and Hoi, Calvin SH},
  booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining},
  pages={664--676},
  year={2018},
  organization={Springer}
}

Labeled Weighted Transactional Graph Datasets

[weight is not a function of label + synthetic weight]

  1. Compound_Graph
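By contrast, when weight is not a function of label, two edges with the same label may carry different weights. Here is a minimal sketch, assuming (u, v, label) edge triples and independently drawn uniform synthetic weights. Both are hypothetical choices, because the actual generation procedure is not described here.

```python
import random

def weight_edges_independently(edges, low=1.0, high=10.0, seed=0):
    """Hypothetical: give every (u, v, label) edge its own synthetic weight.

    The weight is drawn per edge, not per label, so equal labels do not
    imply equal weights.
    """
    rng = random.Random(seed)
    return [(u, v, label, round(rng.uniform(low, high), 2)) for u, v, label in edges]

edges = [(0, 1, "x"), (1, 2, "x"), (2, 3, "y")]
weighted = weight_edges_independently(edges)
```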

Weighted Call Graph

[weight is not a function of label + real weight]

About

  • Original Source
  • Each node represents a function in the call graph
  • Each directed edge (u, v) represents a call from function u to function v

Dataset Conversion Method

  • Each node is labeled with the most frequent opcode in its function, from the set ['mov', 'call', 'lea', 'jmp', 'push', 'add', 'xor', 'cmp', 'int3', 'nop', 'pushl', 'dec', 'sub', 'insl', 'inc', 'jz', 'jnz', 'je', 'jne', 'ja', 'jna', 'js', 'jns', 'jl', 'jnl', 'jg', 'jng']
  • Each edge weight is the average of the total opcode counts of its two endpoint nodes
  • Edges are unlabeled
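The conversion steps above can be sketched as follows. This is a hypothetical reconstruction, not the original conversion script. It assumes that each function is given as a list of its opcodes and that only opcodes in the listed set are counted. It also assumes label ties go to the opcode seen first, and it reads "total opcode counts" as the number of listed opcodes in the function.

```python
from collections import Counter

# Opcode set listed in the conversion method above.
OPCODES = {'mov', 'call', 'lea', 'jmp', 'push', 'add', 'xor', 'cmp', 'int3',
           'nop', 'pushl', 'dec', 'sub', 'insl', 'inc', 'jz', 'jnz', 'je',
           'jne', 'ja', 'jna', 'js', 'jns', 'jl', 'jnl', 'jg', 'jng'}

def convert_call_graph(functions, calls):
    """Hypothetical sketch of the call graph conversion.

    functions: dict mapping function name -> list of opcodes in its body
    calls: iterable of (caller, callee) pairs, one per directed edge
    Returns (node_labels, weighted_edges).
    """
    # Count only opcodes from the listed set.
    counts = {f: Counter(op for op in ops if op in OPCODES)
              for f, ops in functions.items()}
    # Node label: most frequent opcode; most_common keeps first-seen order on ties.
    # A function with no listed opcodes gets None (an assumption).
    labels = {f: (c.most_common(1)[0][0] if c else None) for f, c in counts.items()}
    totals = {f: sum(c.values()) for f, c in counts.items()}
    # Edge weight: average of the endpoints' total opcode counts; edges stay unlabeled.
    edges = [(u, v, (totals[u] + totals[v]) / 2) for u, v in calls]
    return labels, edges

functions = {
    "main": ["push", "mov", "call", "mov", "ret"],  # "ret" is not in OPCODES
    "helper": ["xor", "xor", "add"],
}
labels, edges = convert_call_graph(functions, [("main", "helper")])
print(labels)  # {'main': 'mov', 'helper': 'xor'}
print(edges)   # [('main', 'helper', 3.5)]
```

Here `main` has 4 listed opcodes and `helper` has 3, so the call edge gets weight (4 + 3) / 2 = 3.5.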

  • Number of graphs : 546
  • Mean number of nodes : 648.1
  • Mean degree : 3.3
  • Median degree : 2.7
  • Maximum degree : 10.1
  • Number of isolated nodes : 130812
  • Mean number of isolated nodes per graph : 239.6
  • Number of self loops : 0

  • Number of graphs : 815
  • Mean number of nodes : 871.5
  • Mean degree : 3.6
  • Median degree : 3.7
  • Maximum degree : 34.4
  • Number of isolated nodes : 231990
  • Mean number of isolated nodes per graph : 284.7
  • Number of self loops : 0
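Statistics like those above could be computed with a sketch such as the one below. The degree convention is an assumption. Degree is taken as in-degree plus out-degree. The mean, median, and maximum degree are read as per-graph values averaged over all graphs, which would explain why the maximum degree is fractional. The mean number of isolated nodes is the total divided by the number of graphs, for example 130812 / 546 ≈ 239.6 and 231990 / 815 ≈ 284.7. The function name and graph representation are hypothetical.

```python
from statistics import mean, median

def dataset_stats(graphs):
    """Hypothetical sketch. graphs: list of (nodes, edges), edges are directed (u, v).

    Assumptions: degree = in-degree + out-degree; per-graph mean, median,
    and max degree are averaged over all graphs; graphs are non-empty.
    """
    per_mean, per_median, per_max = [], [], []
    isolated = self_loops = 0
    for nodes, edges in graphs:
        deg = {n: 0 for n in nodes}
        for u, v in edges:
            deg[u] += 1  # out-degree of u
            deg[v] += 1  # in-degree of v
            if u == v:
                self_loops += 1
        degrees = list(deg.values())
        per_mean.append(mean(degrees))
        per_median.append(median(degrees))
        per_max.append(max(degrees))
        isolated += sum(1 for d in degrees if d == 0)
    n = len(graphs)
    return {
        "graphs": n,
        "mean_nodes": mean([len(nodes) for nodes, _ in graphs]),
        "mean_degree": mean(per_mean),
        "median_degree": mean(per_median),
        "max_degree": mean(per_max),
        "isolated_nodes": isolated,
        "mean_isolated_nodes": isolated / n,
        "self_loops": self_loops,
    }

graphs = [([0, 1, 2, 3], [(0, 1), (1, 2)]), ([0, 1], [(0, 0)])]
stats = dataset_stats(graphs)
```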

Weighted Sequential Datasets

1. SIGN (weight is a function of item)

2. LEVIATHAN (weight is a function of item)

3. FIFA (weight is a function of item)

4. Synthetic Dataset (from the SPMF website, with positively and negatively skewed weight distributions)
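"Weight is a function of item" means each item has one fixed weight, so it contributes the same weight wherever it appears. The sketch below makes several assumptions:

  • Sequences are lists of itemsets.
  • A sequence's weight is the mean weight of its items. This is one common convention, assumed here.
  • For the synthetic case, per-item weights are drawn from Beta(2, 5) (positively skewed) or Beta(5, 2) (negatively skewed).

These distributions and function names are illustrative assumptions, not the ones behind the SPMF data.

```python
import random

def skewed_item_weights(items, skew="positive", seed=0):
    """Hypothetical: draw one synthetic weight in [0, 1] per item.

    Beta(2, 5) is positively (right) skewed: most items get low weights.
    Beta(5, 2) is negatively (left) skewed: most items get high weights.
    """
    a, b = (2, 5) if skew == "positive" else (5, 2)
    rng = random.Random(seed)
    return {item: rng.betavariate(a, b) for item in items}

def sequence_weight(sequence, item_weight):
    """Assumed convention: a sequence's weight is the mean weight of its items.

    sequence is a list of itemsets, e.g. [["a", "b"], ["c"]]. Since weight is
    a function of the item, an item contributes the same weight wherever it occurs.
    """
    items = [item for itemset in sequence for item in itemset]
    return sum(item_weight[item] for item in items) / len(items)

weights = {"a": 0.2, "b": 0.8, "c": 0.5}
print(sequence_weight([["a", "b"], ["c"]], weights))  # 0.5
```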