Mystique: Evolving Android Malware for Auditing Anti-Malware Tools

Benchmark Comparison

On this page, we compare Mystique with DroidBench. DroidBench (https://github.com/secure-software-engineering/DroidBench) is a data set of crafted apps that leak private data. It contains 120 malware samples of 13 types, and the static analysis tools FlowDroid, IccTA and DroidSafe use these samples to evaluate their tools. Table 1 describes the malicious behaviors in these samples using the concepts of our feature model. The columns of Table 1 are:

- Type. The type of the malware sample.
- Name. The name of the app.
- Source. The source contained in the sample.
- Flow. Whether there is a flow from a source to a sink: "T" means such a flow exists, "F" means it does not.
- Sink. The sink contained in the sample.
- Trigger. How the malicious behavior is triggered.
- Evasion. The evasion features employed in the sample.
- Privacy Leakage?. Whether the sample is a valid malware sample, i.e., whether it actually leaks private data.
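One way to make the columns of Table 1 concrete is to treat each row as a record. The following is a minimal sketch under our own assumptions: the class name `BenchmarkSample`, its field names and the helper `parse_flow` are hypothetical and are not part of DroidBench or Mystique.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BenchmarkSample:
    """One row of Table 1 (hypothetical field names chosen for illustration)."""
    malware_type: str        # Type: the type of the malware sample
    name: str                # Name: the app name
    sources: List[str]       # Source: e.g. "TELEPHONY::IMEI"
    has_flow: bool           # Flow: True for "T", False for "F"
    sinks: List[str]         # Sink: sinks contained in the sample
    triggers: List[str]      # Trigger: how the malicious behavior is triggered
    evasions: List[str] = field(default_factory=list)  # Evasion features
    privacy_leakage: Optional[bool] = None  # Privacy Leakage?: None if not yet judged


def parse_flow(cell: str) -> bool:
    """Convert the "T"/"F" notation of the Flow column into a bool."""
    if cell not in ("T", "F"):
        raise ValueError(f"unexpected Flow value: {cell!r}")
    return cell == "T"
```

For example, `BenchmarkSample("ExampleType", "ExampleApp", ["TELEPHONY::IMEI"], parse_flow("T"), ["HYPOTHETICAL::SINK"], ["MAIN::STARTUP"])` describes an invented sample. The type, app and sink names are placeholders, not real DroidBench entries.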

DroidBench uses only three of the sources in our feature model (i.e., TELEPHONY::IMEI, TELEPHONY::SIM_SERIAL and LOCATION::REAL_TIME_LOCATION). Its authors also treat some app credentials as sources, for example, the username and password of specific apps. However, data from GUI elements cannot easily be determined to be private, so we do not include it among the sources of the feature model.

Our feature model has nine types of sinks, while DroidBench has five. Unlike DroidBench, Mystique does not include LOG, FILE and OTHERS as sinks, because these only expose sensitive information locally.

DroidBench has only two types of triggers, i.e., MAIN::STARTUP and GUI triggers. The description of trigger features in Section 4.1 of the paper explains why we do not consider GUI triggers. Besides MAIN::STARTUP, we include three other types of triggers (80 in total).

Our paper categorizes all evasion techniques as data-based evasion, control-based evasion or transformation. The evasion techniques in DroidBench mainly target the implementation level; for example, they make heavy use of string operations and collection operations. Unlike DroidBench, Mystique focuses more on evasion at the behavior level. For example, it provides five ways to transmit data, such as through a File, SharedPreferences or SQLite.
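The source and sink differences above amount to two filters that can be applied when mapping a DroidBench sample onto our feature model. The following is a minimal sketch under stated assumptions. The constant names and helper functions are hypothetical. The only feature names taken from the text are the three shared sources and the local-only sinks LOG, FILE and OTHERS. Names prefixed with `HYPOTHETICAL::` are placeholders.

```python
from typing import Iterable, List

# Sinks that only expose data on the device itself; Mystique leaves them out.
LOCAL_ONLY_SINKS = frozenset({"LOG", "FILE", "OTHERS"})

# The three DroidBench sources that also appear in Mystique's feature model.
SHARED_SOURCES = frozenset({
    "TELEPHONY::IMEI",
    "TELEPHONY::SIM_SERIAL",
    "LOCATION::REAL_TIME_LOCATION",
})


def mystique_sinks(sinks: Iterable[str]) -> List[str]:
    """Keep only sinks that are not local-only, preserving their order."""
    return [s for s in sinks if s not in LOCAL_ONLY_SINKS]


def mystique_sources(sources: Iterable[str]) -> List[str]:
    """Keep only sources the feature model covers (drops e.g. GUI credentials)."""
    return [s for s in sources if s in SHARED_SOURCES]
```

For instance, `mystique_sinks(["LOG", "HYPOTHETICAL::NETWORK"])` keeps only the placeholder network sink. `mystique_sources` drops a hypothetical GUI-entered password source.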

The Mystique benchmark is built on the feature model and the ontology system. Its malware samples combine attack features and evasion features in different ways. Therefore, the benchmark can be used to explore how well anti-malware tools (AMTs) detect different features, to determine the significance of attack features, and to measure the effect of evasion features.

One interesting finding is that decomposing and analyzing DroidBench helps to validate the correctness of our feature model. According to the feature model in our paper, source, sink, trigger and configuration (i.e., permission) are mandatory features: a malware sample has to include all of them to carry out an attack. Some samples in DroidBench, marked in red in Table 1, are not genuine privacy leaks (i.e., they are not actually malicious apps). In each of these apps, either there is no trigger or there is no link from source to sink. Therefore, Mystique understands and constructs malicious behavior in the same way as DroidBench.
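The validity rule above can be written as a small check that reports why a sample fails. This is a minimal sketch, not Mystique's implementation. We assume each sample is given as a dictionary from feature name to the values present, plus a flag for whether a source-to-sink flow exists. The function `check_sample` and the sink name `HYPOTHETICAL::SINK` are hypothetical. `android.permission.READ_PHONE_STATE` is the real Android permission needed to read the IMEI.

```python
from typing import Dict, List, Sequence, Tuple

# Mandatory features of the feature model; "configuration" stands for the
# permissions the app must hold to carry out the attack.
MANDATORY_FEATURES = ("source", "sink", "trigger", "configuration")


def check_sample(features: Dict[str, Sequence[str]],
                 has_flow: bool) -> Tuple[bool, List[str]]:
    """Return (is_valid, reasons) for one sample (hypothetical helper).

    A sample counts as a real privacy leak only if every mandatory feature
    is present and a flow links a source to a sink.
    """
    reasons = [f"missing {name}" for name in MANDATORY_FEATURES
               if not features.get(name)]
    if not has_flow:
        reasons.append("no flow from source to sink")
    return (not reasons, reasons)
```

A sample whose trigger list is empty yields `(False, ["missing trigger"])`. This matches one of the two ways the samples marked in red fail to be real leaks.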

Table 1. Comparison between the Mystique benchmark and DroidBench.
