We validate the generated malware in three respects.
Validity of Single Feature. For each of the 266 AFs, we handcraft a blank Android program and generate malware containing that single AF. We then execute the malware to verify whether it can successfully steal the information. For each of the 14 EFs, we separately apply it to complicate a basic information flow between a source and a sink. These experiments confirm that each AF can leak information and each EF can complicate the information flow.
Proof of Program Synthesis. We ensure that the flows of privacy leakage in the malware are logically sound. In detail, we verify the three phases of the automated malware generation: p1, feature selection; p2, transformation from features to the BDL model; and p3, transformation from the BDL model to code.
• Proof of p1. In the process of feature selection, we select appropriate candidate features that conform to the constraints in the feature model. This guarantees that the selected features are necessary and sufficient to construct malware.
• Proof of p2. For privacy leakage, BDL models define trigger features and behavior features, but the corresponding required permission features are missing from them.
• Proof of p3. The constraints imposed by the Android runtime environment must be satisfied. For example, time-consuming operations in Android apps cannot be executed in the main thread, so we have to create a child thread to execute them. This phase makes the flows that BDL represents abstractly valid in a real app. During code assembly, we use scripts to make sure that all the implementation constraints are satisfied.
To sum up, for the given features, this step ensures that all the requirement and implementation constraints are satisfied.
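The requirement-constraint check can be illustrated with a minimal sketch. It is not Mystique's implementation: the `REQUIRES` table and the function names are hypothetical, and attributing the `PERMISSION::READ_PHONE_STATE` dependency to each telephony source individually is an assumption based on the running example's feature list.

```python
from typing import Dict, Set

# Hypothetical requirement constraints: feature -> features it depends on.
# The per-feature mapping below is an assumption made for illustration.
REQUIRES: Dict[str, Set[str]] = {
    "TELEPHONY::IMEI": {"PERMISSION::READ_PHONE_STATE"},
    "TELEPHONY::PHONE_NUMBER": {"PERMISSION::READ_PHONE_STATE"},
}


def missing_dependencies(selected: Set[str]) -> Dict[str, Set[str]]:
    """Map each selected feature to the required features that are not selected."""
    missing: Dict[str, Set[str]] = {}
    for feature in sorted(selected):
        absent = REQUIRES.get(feature, set()) - selected
        if absent:
            missing[feature] = absent
    return missing


def close_under_requirements(selected: Set[str]) -> Set[str]:
    """Add every (transitively) required feature to the selection."""
    closed = set(selected)
    worklist = list(selected)
    while worklist:
        feature = worklist.pop()
        for dependency in REQUIRES.get(feature, set()):
            if dependency not in closed:
                closed.add(dependency)
                worklist.append(dependency)
    return closed
```

A selection passes the check when `missing_dependencies` returns an empty mapping. `close_under_requirements` shows one way to supply the omitted dependencies before the check is run.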
Malware App Validation. The last step is to validate the final malware app by testing whether it can leak private information. To this end, we set the target URL and phone number to point to our honeypot, which receives the leaked information. We use the running example to illustrate the validation process: we generate a malicious app using the features of the malware in the running example. The selected features are as follows:
Selected Features for Malware App Validation
Triggers - MAIN::STARTUP
Sources - TELEPHONY::IMEI, TELEPHONY::PHONE_NUMBER
Sinks - HTTP::APACHE_POST
(dependencies) PERMISSION::READ_PHONE_STATE
Mystique constructs one flow of privacy leakage from the selected features, presented below in BDL form.
Flow of malicious behaviors
ACTIVITY::POINTCUT_ONCREATE::SOURCE(TELEPHONY::IMEI, TELEPHONY::PHONE_NUMBER)→ACTIVITY::POINTCUT_ONCREATE::SINK(LOCAL_VARIABLE,HTTP::APACHE_POST)
We set the target URL to our honeypot website, which hosts a PHP page that stores the messages received from the generated malware. The server-side script that collects the stolen information is as follows:
PHP script in the honeypot
<?php
// Append each received request to a local log file.
$output = "info.txt";
file_put_contents($output, "==============".date('l jS \of F Y h:i:s A')."==============\n", FILE_APPEND | LOCK_EX);
// Log the server/request metadata, then the submitted parameters.
file_put_contents($output, print_r($_SERVER, true), FILE_APPEND | LOCK_EX);
file_put_contents($output, print_r($_REQUEST, true), FILE_APPEND | LOCK_EX);
Since there are 30 types of sources in the feature model, we use Mystique to generate 30 malicious apps accordingly, each of which contains one kind of source. For simplicity, we construct one flow per app that satisfies the constraints defined in the feature model for privacy leakage, by selecting one satisfiable trigger and sink and setting up the required permissions. We execute the apps on a physical Android device. Our honeypot successfully collects all the sensitive information sent by these malicious apps.