Data Mining Datasets
Labeled Unweighted Transactional Graph Datasets
- Seven bio- and chemo-informatics datasets and two social network datasets. (download) (original source)
- AIDS Antiviral Screening Data. (download) (raw data)
- Cancer datasets. (original source)
Labeled Weighted Transactional Graph Datasets
[weight is a function of label + synthetic weight]
- Cancer Dataset: MCF-7. (normal weight distribution) (negative exponential weight distribution)
- Cancer Dataset: P388. (normal weight distribution) (negative exponential weight distribution)
- Cancer Dataset: Yeast. (normal weight distribution) (negative exponential weight distribution)
If you use these datasets, please cite us:
BibTeX:
@inproceedings{islam2018wfsm,
title={WFSM-MaxPWS: An Efficient Approach for Mining Weighted Frequent Subgraphs from Edge-Weighted Graph Databases},
author={Islam, Md Ashraful and Ahmed, Chowdhury Farhan and Leung, Carson K and Hoi, Calvin SH},
booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining},
pages={664--676},
year={2018},
organization={Springer}
}
Labeled Weighted Transactional Graph Datasets
[weight is not a function of label + synthetic weight]
- Compound_Graph
- Normal Distribution (download) (distribution curve)
- Positively Skewed Normal Distribution (download) (distribution curve)
- Negatively Skewed Normal Distribution (download) (distribution curve)
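The bracketed notes on this page distinguish weights that are a function of the label from weights that are not. The sketch below illustrates one possible reading of that distinction; it is not the procedure used to build these datasets. The function names, the `(u, v, label)` edge tuples, and the distribution parameters are all hypothetical.

```python
import random

def label_based_weights(edges, draw, rng):
    """Weight is a function of label: draw once per distinct label,
    then reuse that weight for every edge carrying the label."""
    weight_of_label = {}
    weights = []
    for _u, _v, label in edges:
        if label not in weight_of_label:
            weight_of_label[label] = draw(rng)
        weights.append(weight_of_label[label])
    return weights

def independent_weights(edges, draw, rng):
    """Weight is not a function of label: draw once per edge."""
    return [draw(rng) for _ in edges]

# Hypothetical samplers; the real parameters are not stated on this page.
normal = lambda rng: rng.gauss(0.5, 0.15)
neg_exponential = lambda rng: rng.expovariate(2.0)

# Toy graph: edges are (source, target, label) tuples (assumed layout).
edges = [(0, 1, "single"), (1, 2, "double"), (2, 3, "single")]
rng = random.Random(42)  # fixed seed for reproducibility

w_label = label_based_weights(edges, normal, rng)
w_indep = independent_weights(edges, neg_exponential, rng)
```

Under the first scheme, edges 0 and 2 share the label "single" and so always receive the same weight. Under the second, each edge gets its own draw regardless of label.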
Weighted Call Graph
[weight is not a function of label + real weight]
About
- Original Source
- Each node represents a function in the call graph
- Each directed edge (u, v) represents a call from u to v
Dataset Conversion Method
- Each node label is the most frequent opcode in the function, chosen from: ['mov', 'call', 'lea', 'jmp', 'push', 'add', 'xor', 'cmp', 'int3', 'nop', 'pushl', 'dec', 'sub', 'insl', 'inc', 'jz', 'jnz', 'je', 'jne', 'ja', 'jna', 'js', 'jns', 'jl', 'jnl', 'jg', 'jng']
- Each edge weight is the average of its two endpoint nodes' total opcode calls
- Edges are unlabeled
- Number of graphs : 546
- Mean number of nodes : 648.1
- Mean degree : 3.3
- Median degree : 2.7
- Maximum degree : 10.1
- Number of isolated nodes : 130812
- Mean number of isolated nodes per graph : 239.6
- Number of self loops : 0
- Number of graphs : 815
- Mean number of nodes : 871.5
- Mean degree : 3.6
- Median degree : 3.7
- Maximum degree : 34.4
- Number of isolated nodes : 231990
- Mean number of isolated nodes per graph : 284.7
- Number of self loops : 0
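The conversion rules listed under Dataset Conversion Method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' conversion script: the function name `convert_call_graph`, the input layout, and the choice to count only opcodes from the listed set toward a node's total are all hypothetical.

```python
from collections import Counter

# Opcode set listed in the conversion method above.
TRACKED_OPCODES = {
    'mov', 'call', 'lea', 'jmp', 'push', 'add', 'xor', 'cmp', 'int3',
    'nop', 'pushl', 'dec', 'sub', 'insl', 'inc', 'jz', 'jnz', 'je',
    'jne', 'ja', 'jna', 'js', 'jns', 'jl', 'jnl', 'jg', 'jng',
}

def convert_call_graph(function_opcodes, calls):
    """Build node labels and edge weights for one call graph.

    function_opcodes: dict of function name -> list of opcode mnemonics
    calls: iterable of (caller, callee) pairs
    Returns (node_labels, weighted_edges).
    """
    node_labels = {}
    opcode_totals = {}
    for name, opcodes in function_opcodes.items():
        # Assumption: only opcodes from the listed set are counted.
        counts = Counter(op for op in opcodes if op in TRACKED_OPCODES)
        # Label = most frequent tracked opcode (None if there are none).
        node_labels[name] = counts.most_common(1)[0][0] if counts else None
        opcode_totals[name] = sum(counts.values())
    # Edge weight = average of the two endpoints' opcode totals.
    weighted_edges = [
        (u, v, (opcode_totals[u] + opcode_totals[v]) / 2)
        for u, v in calls
    ]
    return node_labels, weighted_edges

# Toy input: two functions and one call between them.
function_opcodes = {
    "main": ["push", "mov", "mov", "call"],
    "helper": ["mov", "add", "add", "add", "ret", "jmp"],
}
calls = [("main", "helper")]

node_labels, weighted_edges = convert_call_graph(function_opcodes, calls)
```

In the toy input, "main" is labeled 'mov' and "helper" is labeled 'add'. The single edge has weight (4 + 5) / 2 = 4.5, because 'ret' is outside the listed set and is not counted.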
Weighted Sequential Datasets
1. SIGN (weight is a function of item)
2. LEVIATHAN (weight is a function of item)
3. FIFA (weight is a function of item)
4. Synthetic Dataset (from the SPMF website; positively and negatively skewed weight distributions)