TorMalTraffic2019 Dataset

These are the datasets used in my research, "Characterization and Classification of Malware Traffic over the Tor Network." See the conference paper for details on how the datasets were generated.


TorTraffic2019

The dataset contains the following traffic:

  • Web - http, https

  • Mail - gmail, uplb

  • Chat - hangouts, messenger, utox

  • Audio - soundcloud, streamsquid

  • Video - tedtalks, youtube

  • FTP - mmnt, rebex, wftserver

  • VoIP - messenger, mumble, utox

  • P2P - qBittorrent, deluge

  • Whonix - TLS


TorMal2019

The dataset contains the following traffic:

  • Web - http, https

  • Malware - dexter, kazy, locky, parite, wannacry


License

These datasets are publicly available to researchers. If you use them, please cite our research paper, which describes in detail how the datasets were generated:


Marie Betel B. de Robles, Joseph Anthony C. Hermocilla, and Jaderick P. Pabico. Characterization and classification of malware traffic over the tor network. In Proceedings of the 20th Philippine Computing Science Congress (PCSC 2020), ISSN: 1908-1146, pages 78--87, Philippines, 2020. Computing Society of the Philippines.


@inproceedings{derobles-pcsc2020-characterization,
  author    = {de Robles, Marie Betel B. and Hermocilla, Joseph Anthony C. and Pabico, Jaderick P.},
  title     = {Characterization and Classification of Malware Traffic over the {Tor} Network},
  booktitle = {Proceedings of the 20th Philippine Computing Science Congress (PCSC 2020)},
  year      = 2020,
  issn      = {1908-1146},
  month     = mar,
  publisher = {Computing Society of the Philippines},
  address   = {Philippines},
  pages     = {78--87}
}


Download

Download and verify the file (Linux):

  1. Choose and download a dataset along with its SHA-256 and MD5 checksum files:

TorTraffic2019 (6.76 GB) [SHA-256] [MD5]

TorMal2019 (575.3 MB) [SHA-256] [MD5]

  2. Open a terminal and change to the directory containing the downloaded files:

cd <path_to_download_directory>

  3. Run the following commands:

md5sum --check <dataset_filename>.tar.xz.md5

sha256sum --check <dataset_filename>.tar.xz.sha256

  4. If both commands report OK, the checksums match. If either command reports FAILED, the downloaded dataset file is corrupted; download it again to get a valid copy.
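The two `--check` commands above can be wrapped in a small script that gives a single pass/fail result. This is a minimal sketch, not part of the original instructions: the function name `verify_dataset` is hypothetical, and it assumes GNU coreutils `md5sum` and `sha256sum`, with both checksum files in the standard `<hash>  <filename>` format and placed next to the archive.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): verify a dataset archive against both checksum files.
# Assumes <name>.tar.xz, <name>.tar.xz.md5 and <name>.tar.xz.sha256 are in the
# current directory and that the checksum files use the coreutils "--check" format.

verify_dataset() {
  local name="$1"   # archive name without the .tar.xz suffix
  local status=0

  # Each --check prints "<file>: OK" or "<file>: FAILED" and exits nonzero on a mismatch.
  md5sum --check "${name}.tar.xz.md5" || status=1
  sha256sum --check "${name}.tar.xz.sha256" || status=1

  if [ "$status" -eq 0 ]; then
    echo "${name}: both checksums match"
  else
    echo "${name}: checksum mismatch; download the file again" >&2
  fi
  return "$status"
}

# When executed directly (not sourced) with an argument, verify that dataset.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
  verify_dataset "$1"
fi
```

For example, running `verify_dataset TorMal2019` from the download directory checks `TorMal2019.tar.xz`, assuming that is the archive's actual file name. A nonzero exit status means at least one of the two checksums did not match.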