This dataset provides a curated collection of 20 code-level metrics extracted from Android malware applications, each labeled with its corresponding malware family. It comprises 29,162 malicious apps categorized into 202 malware families. The metrics capture structural and behavioral properties of the malicious apps and are grouped into four categories: complexity, dimensional, object-oriented, and Android-oriented metrics.
These metrics were extracted by decompiling each malicious application into Smali code and analyzing it with a modified static analysis tool. This dataset enables researchers to study malware family detection, characterization, and evolution with a compact and efficient feature set.
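As a rough illustration of working with a per-app metric table like this one, the sketch below loads labeled rows and computes per-family sizes and average metric values. It assumes a CSV layout with one row per app and a `family` label column; that layout, the column names `metric_a` and `metric_b`, and the sample values are all hypothetical placeholders, not the actual schema or contents of the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical layout: one row per app, a "family" label column plus numeric
# metric columns. Names and values are invented placeholders, not real data.
SAMPLE_CSV = """family,metric_a,metric_b
FamilyX,10,3
FamilyX,12,4
FamilyY,40,1
"""


def load_metrics(handle, label_column="family"):
    """Return (features, labels): float-valued metric dicts and family names."""
    features, labels = [], []
    for row in csv.DictReader(handle):
        labels.append(row.pop(label_column))
        features.append({name: float(value) for name, value in row.items()})
    return features, labels


def family_centroids(features, labels):
    """Average each metric within each family to get a simple family profile."""
    counts = Counter(labels)
    sums = {}
    for feats, family in zip(features, labels):
        acc = sums.setdefault(family, {})
        for name, value in feats.items():
            acc[name] = acc.get(name, 0.0) + value
    return {
        family: {name: total / counts[family] for name, total in acc.items()}
        for family, acc in sums.items()
    }


features, labels = load_metrics(io.StringIO(SAMPLE_CSV))
family_sizes = Counter(labels)          # number of apps per family
centroids = family_centroids(features, labels)
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle, and the label column name adjusted to whatever the distributed file actually uses.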
Further details can be found in our paper “Lightweight, Effective Detection and Characterization of Mobile Malware Families” [PDF], IEEE Transactions on Computers, 2022.
If you use this dataset in a project or publication, please cite our paper:
@article{DroidMalVet,
  author={Elish, Karim and Elish, Mahmoud and Almohri, Hussain},
  journal={IEEE Transactions on Computers},
  title={Lightweight, Effective Detection and Characterization of Mobile Malware Families},
  year={2022},
  volume={71},
  number={11},
  pages={2982--2995}
}
We are happy to share our dataset. Please email us at kelish@floridapoly.edu, stating your identity and research scope. We will then send you a link from which you can download the dataset.
Do not share the data with anyone other than your co-authors on the project. Other researchers are welcome to request the dataset from us directly.
If you are in academia, contact us from your institutional email address and provide a link to a webpage hosted on your university's domain that shows your name and affiliation.
If you are in industry, email us from your company email account and introduce yourself and your company. Please attach a justification letter on official letterhead that clearly states why you are requesting this dataset.
Please note that emails that do not follow these conditions may be ignored. We will keep a public list of the organizations that have requested this dataset at the bottom of this page.