Summary: Working on the Android security team to develop a new method for detecting Android malware in the Play Store.
Unknown Malware Cluster Detection Using Machine Learning: Developed a workflow to sample potentially malicious Android APK clusters for manual review.
Weak Signal Quality Evaluation: Evaluated the quality of the weak signals, which come from the results of different APK malware detection mechanisms (e.g., static/dynamic analysis, APK permissions, rules and signatures, and machine learning), based on their historical performance.
Weak Signal Selection and Aggregation: Selected the weak signals to include based on their evaluated quality and heuristics. Aggregated the signals at the APK and cluster levels.
Sample Positive and Negative Clusters: Tested and finalized the criteria for sampling positive and negative clusters. The broader the coverage of the criteria, the harder the task the model faces.
Perform Data Analysis and Feature Engineering: Evaluated the adjusted mutual information (AMI) and AMI interactions of features, and visualized sample distributions using internal machine learning tools, to determine which weak signals are the most important.
Train Models: Trained several machine learning models, including linear and tree-based models. The best test accuracy was 98%, with an average precision of 0.983.
Evaluate the Final Model: Conducted delayed evaluation and real-world evaluation on all the clusters formed in a preset testing time window. In the delayed evaluation, the best accuracy was 98.8%, with an average precision of 0.939. In the real-world evaluation, 2 of the 20 sampled clusters contained potential unweaponized malware APKs. The gap between the delayed evaluation and the real-world evaluation may be caused by immature clusters.
Query Design: Created MySQL queries to serve requests from the frontend, including monthly sales data, order counts by category, employee activity data, etc.
Develop Excel Data Importing Template: Developed and maintained automated Excel templates, written in VBA, that the customer service and research departments use to import order details into the database.