In this webpage, we begin with a motivation example to show how our work is inspired, and then present an overview of Mystique to give readers a direct and explicit illustration.
Figure 1. A motivation example to illustrate the generation of new variant of DroidKungFu
To illustrate the process of the construction, we take DroidKungFu in Genome as an example. DroidKungFu belongs to malware of privacy leakage, which steals sensitive information (e.g., IMEI code, phone number). Figure 1 shows the feature model for malicious behaviors contained in DroidKungFu. The feature model consists of necessary features for launching attacks as well as their relationship in between. For example, DroidKungFu obtains IMEI code and phone number, which belong to the feature Source. Feature Source has 4 optional sub-features, among which feature IMEI and PHN_NUM require permission feature READ_PHN_STE that relates to permission.
After obtaining the feature model, we encode the attack features (malicious behaviors) as chromosome in gene. Specifically, each bit of the chromosome represents the existence of an attack feature —
1 for existent and 0 for non-existent. The crossover operation and mutation operations are performed on the chromosome during the evolution. In step 2 of Fig. 1, we mutate features IMEI and PHN_NUM to feature LOCATION . According to feature dependency, the original permission feature READ_PHN_STE for IMEI is not required any more, while feature LOCATION acquires permission feature ACC_LOCATION . After that, the gene of a new malware variant is produced. In step 3 of Fig. 1, the code of each feature in gene will be assembled. Details on assembling triggers and manifest file are in Section 5. In addition, to audit the capabilities of AMTs, we also employ obfuscation to evade the detection, then one malware variant is created and can be used for auditing AMTs. Owing to the gene of the generated malware variant, we can evaluate what feature combinations can evade what defence strategies.
Figure 2. Multi-objective guided malware generation
As shown in Figure 3, Mystique takes a malware feature model as the input and generates various malware variants. During the process, each malware is encoded as a DNA sequence based on their values for AF and EF in the feature model. The malware evolution is an iterative process in accordance with the principle of survival of the fittest, where we randomly choose an initial population of malware satisfying the input feature model, and select better malware in the each generation afterwards based on the fitness function, i.e., the malware measurement function in our case. The five steps in each iteration of malware evolution are listed as follows.
Step 1: Feature Selection. Give the previous generation n of malware, step 1 is to produce new malware by applying crossover and mutation operations on malware from generation n. Considering the three objectives, the Indicated-Based Evolutionary Algorithm (IBEA) is used to choose the fittest ones in each iteration.
Step 2: Construction of Malicious Behaviors. This step constructs the logic of malicious behaviors, according to the selected features in step 1. To bridge the gap between feature model and code, we present an intermediate language, BDL, to model the logic of malicious behaviors.
Step 3: Code Assembly. To make the malware run on a real device, Mystique conducts to validate the assembled code and package it into a deployable app. It includes the setup of triggers that act as entry points of malicious behaviors, the configuration of manifest file and malware packaging.
Step 4: Evasion Application. After evasion features are selected, the corresponding evasion techniques are applied. Note that the evasion is based on the source code, without changing the malicious intent or behaviors of the constructed malware.
Step 5: Objective Evaluation. In this step, we calculate the objective functions and choose the fittest for the next iteration. We also need to check whether the evolution is finished due to convergence of EA or reaching the upper limit of iteration times.