Mystique: Evolving Android Malware for Auditing Anti-Malware Tools

Introduction

A Story About Android Malware

Android malware has increased at a stunning rate over the last five years. In 2010, we witnessed the start of the industrial age of mobile malware; in that year, Geinimi became one of the first malware samples found to attack the Android platform and use the infected phone as part of a mobile botnet. Since then, attackers have created a growing number of sophisticated mobile malware samples, owing to the prevalence of Android phones. On the defence side, traditional approaches relying on textual or hexadecimal signatures are incapable of detecting new malware and variants of existing malware. To keep up with the trend, the research community has proposed machine-learning-based approaches (e.g., Drebin, DroidSIFT, DNADroid, RevealDroid) and information-flow-analysis-based approaches (e.g., FlowDroid, DroidSafe, IccTA) to detect obfuscated malware and its variants. Recently, new attacks (transformation attacks and collusion attacks) have been revealed, and they defeat the existing detection approaches. Thus, the familiar arms race between malware and anti-malware is, unsurprisingly, also observed on the Android platform.

Recently, Android malware has exhibited a variety of attack behaviors, including leaking user privacy, escalating privileges without permission, incurring financial charges without the user's knowledge, and abusing application functionality. In real malware, these attack behaviors may coexist in order to increase the damage and the success probability of the attack. Even worse, evasion techniques (e.g., multiple-level obfuscation) are further applied to the code or the deployment package to evade the scanning of Anti-Malware Tools (AMTs). Hence, combinations of different attack behaviors and evasion techniques do exist in real-world Android malware, which makes it hard to understand why AMTs fail in detection.

Then, how are defenders working?

It cannot be denied that anti-malware tools (AMTs) are getting more advanced. However, surviving malware keeps evolving and is becoming increasingly sophisticated. Generally speaking, the development of AMTs lags behind the advance of new malware, since new malware variants (similar malware generated by software obfuscation or configuration techniques) and zero-day vulnerabilities (weaknesses that allow an attacker to exploit the target system) keep emerging every day. As a result, AMTs today have three basic weaknesses: 1) current AMTs are designed to address only one or a few types of attacks; 2) AMTs can be easily defeated by advanced techniques such as obfuscation and dynamic loading; 3) some of them lack a comprehensive understanding of Android malware, so false positives and false negatives are common in detection.

Are you going to propose an all-encompassing approach to address all the problems? Not exactly, but partially yes

In this work, we aim to explore the essential composition of Android malware and the capabilities of current anti-malware tools, in order to identify their strengths and weaknesses in the face of different Android malware. We then propose countermeasures to resist the malware. The main challenge in auditing AMTs is the lack of evaluation criteria and of a well-labeled benchmark, without which we cannot systematically evaluate the strengths and weaknesses of AMTs. To the best of our knowledge, existing AMT evaluation is based on benchmarks like GENOME and DREBIN, which classify malware by family only. GENOME and DREBIN are not suitable for auditing AMTs for three reasons: 1) the malware samples are old and well recorded in the malware repositories of AV tools; 2) there is no comprehensive coverage of different attack and evasion techniques; 3) there is no index (or label) for different malware attributes (such as attack behavior, obfuscation techniques, anti-debugging techniques, etc.), apart from (inaccurate) family names. So far, only one existing work discusses the resilience of different defence strategies (DSs) against obfuscation techniques.

In this work, we propose to separate different attack behaviors and evasion techniques into basic reusable features, so as to summarize the attack features and evasion features of malware on Android. Here, an attack feature (AF) is the malicious behavior of a certain attack, which links to the implementation of the functional requirements (intention) of the malware. An evasion feature (EF) is an ability of the malware to evade the scanning of AMTs, covering a variety of code complication and transformation techniques that do not change the functional requirements of the malware. In this way, we can develop a meta model for Android malware by modularizing the various attack and evasion features into building blocks. This meta model allows us to generate malware variants that cover different AFs and EFs, and to evaluate how different DSs react to each individual feature as well as to their combinations.
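As a rough illustration of what such a meta model might look like as a data structure, the sketch below represents features as labelled building blocks and checks whether a chosen combination respects the model's constraints. It is not the Mystique implementation: the class names, the placeholder feature labels (`AF_a`, `EF_x`, ...), and the two constraint kinds ("requires" and "excludes", as commonly used in feature modelling) are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Feature:
    name: str
    kind: str  # "AF" (attack feature) or "EF" (evasion feature)


@dataclass
class FeatureModel:
    """Hypothetical feature model: a feature catalogue plus pairwise constraints."""
    features: dict = field(default_factory=dict)
    requires: list = field(default_factory=list)  # (a, b): selecting a needs b
    excludes: list = field(default_factory=list)  # (a, b): a and b cannot coexist

    def add(self, name: str, kind: str) -> None:
        self.features[name] = Feature(name, kind)

    def is_valid(self, selection: set) -> bool:
        # Every selected name must be a known feature.
        if any(name not in self.features for name in selection):
            return False
        # "requires": if a is selected, b must be selected as well.
        if any(a in selection and b not in selection for a, b in self.requires):
            return False
        # "excludes": a and b must not both be selected.
        if any(a in selection and b in selection for a, b in self.excludes):
            return False
        return True


# Placeholder features and constraints, invented purely for illustration.
model = FeatureModel()
for name, kind in [("AF_a", "AF"), ("AF_b", "AF"), ("EF_x", "EF"), ("EF_y", "EF")]:
    model.add(name, kind)
model.requires.append(("EF_y", "EF_x"))
model.excludes.append(("AF_b", "EF_y"))

print(model.is_valid({"AF_a", "EF_x"}))          # constraints satisfied
print(model.is_valid({"AF_a", "EF_y"}))          # EF_y requires EF_x
print(model.is_valid({"AF_b", "EF_x", "EF_y"}))  # AF_b excludes EF_y
```

Treating a variant as nothing more than a set of feature names keeps the constraint check independent of how each feature is eventually implemented.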

Technically, we start by detecting and analyzing the malicious code across similar yet different malware variants of an Android malware sample. We then modularize the malicious code into different AFs. Once the features are modularized, we adopt the concept of Software Product Line Engineering (SPLE) and build a feature-oriented architecture as the malware meta model, which captures the different features inside malware and the constraints among them. With the meta model, we apply a multi-objective evolutionary algorithm to generate variants of malware samples, mimicking the evolution of Android malware. To guide the evolution towards better malware, we define a fitness function for selecting the next generation that maximizes the number of attack behaviors while minimizing both the number of evasion techniques needed and the expected detection rate. To generate malware automatically, we propose an intermediate behavior modelling language, the Behavior Description Language (BDL), to perform model-driven malware generation by gluing together the code snippets of different features. In this way, we can generate malware from simple to complex, from plain malware to highly evasive samples.
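The selection step described above can be sketched with Pareto dominance, the comparison that multi-objective evolutionary algorithms commonly use when objectives compete. The text does not say which MOEA is used or how each objective is computed, so the `Scores` tuple, its field names, and the numbers below are hypothetical; only the direction of each objective (maximize attack behaviors, minimize evasion techniques and expected detection rate) is taken from the description.

```python
from typing import NamedTuple


class Scores(NamedTuple):
    attack_count: int      # number of attack behaviors (to maximize)
    evasion_count: int     # number of evasion techniques needed (to minimize)
    detection_rate: float  # expected detection rate in [0, 1] (to minimize)


def as_minimization(s: Scores) -> tuple:
    # Negate the maximized objective so that every component is "lower is better".
    return (-s.attack_count, s.evasion_count, s.detection_rate)


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    ka, kb = as_minimization(a), as_minimization(b)
    no_worse = all(x <= y for x, y in zip(ka, kb))
    strictly_better = any(x < y for x, y in zip(ka, kb))
    return no_worse and strictly_better


def pareto_front(population: list) -> list:
    # Keep the candidates that no other candidate dominates.
    return [p for p in population if not any(dominates(q, p) for q in population)]


# Made-up scores for four candidate feature selections.
a = Scores(3, 2, 0.40)
b = Scores(3, 1, 0.40)  # same attacks and detection as a, fewer evasion techniques
c = Scores(1, 0, 0.10)  # fewer attacks, but needs no evasion and is detected least
d = Scores(2, 3, 0.50)  # worse than b on all three objectives

print(pareto_front([a, b, c, d]))  # b and c survive; a and d are dominated by b
```

Because `b` and `c` trade off different objectives, neither dominates the other, which is why a multi-objective search keeps a set of candidates rather than a single best one.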

Finally, we implement the proposed malware generation process in a tool named Mystique. Mystique automatically generates over 10,000 malware samples by selecting different combinations of AFs and EFs. The generated malware is used to audit the capabilities of 57 state-of-the-art anti-virus tools and 9 academic solutions in terms of detection ratio. With the experimental results, we test four commonsense hypotheses about AMTs. We also conduct an experiment to audit the online vetting capability of app stores. In most cases, our malware passes the online vetting process.

To sum up, the contributions of our work are fourfold:

We characterize Android malware in terms of attack features and evasion features and present them in a meta model. Unlike malware ontology analysis, which focuses on requirements, we consider and maintain the traceability between the features and their corresponding code in Mystique.
Based on the meta model, we develop an SPL architecture to generate new Android malware with a Multi-objective Evolutionary Algorithm (MOEA). Our approach is implemented in an automated framework named Mystique. For malware generation, we propose a BDL to support model-driven code assembly and to verify constraints during code assembly.
We survey and evaluate the state-of-the-art AMTs using our generated malware. The experiments show that the existing AMTs are quite weak at detecting the new malware, with a detection ratio of less than 30% on average. We propose countermeasures to detect the malware generated by Mystique.
We have generated over 10,000 samples of Android malware by combining different attack and evasion features. They can serve as a benchmark to assess the detection capabilities of AMTs, as they cover different representative combinations of features that are missing from current malware benchmarks.

What are the difficulties of your work?

To generate sound and workable malware, we face the following technical challenges:

C1: Construction of the feature model. Malware contains a variety of features, which are used in different ways. It is difficult to identify and extract the representative features of malware.
C2: Malware measurement. With the feature model (i.e., the meta model of malware), we still need goals to guide automated malware generation. Thus, we define three objectives for measuring malware: aggressiveness, evasiveness and detectability. Since these objectives compete with each other (e.g., highly offensive malware is generally more detectable), we use an MOEA to select features from the feature model.
C3: Automated code generation for malware. A valid malicious app conducts certain malicious behaviors, with no errors at compile time or at runtime. There is still a big gap between the feature model and a corresponding workable implementation. Hence, we mitigate the challenge of gluing the code of features into valid malware via model-driven code generation.
C4: Validation of generated malware. As the malware is generated according to the feature model, we want to confirm its maliciousness, that is, whether the malicious behaviors can be triggered and carried out on real Android devices. To address this challenge, we conduct the following experiments. For privacy-leak attacks, we set up a dummy server or device to receive the sensitive information sent from the malware.
