PermPress: Machine Learning-Based Pipeline to Evaluate Permissions in App Privacy Policies

Muhammad Sajidur Rahman, Pirouz Naghavi, Blas Kojusner, Sadia Afroz*, Byron Williams, Sara Rampazzi, Vincent Bindschaedler

Department of Computer & Information Science & Engineering, University of Florida, Gainesville, Florida, USA

*Avast Software Inc., CA, USA



Abstract: Privacy laws and app stores (e.g., Google Play Store) require mobile apps to provide transparent privacy policies that disclose sensitive actions and data collection, such as access to the phonebook, camera, storage, GPS, and microphone. However, many mobile apps do not accurately disclose data access that requires sensitive ('dangerous') permissions. Analyzing discrepancies between apps' permissions and privacy policies therefore helps identify compliance issues on which privacy regulators and marketplace operators can act. In this paper, we propose PermPress, an automated machine-learning system that evaluates an Android app's permission-completeness, i.e., whether its privacy policy matches its dangerous permissions. PermPress combines machine learning techniques with human annotation of privacy policies to establish whether app policies contain permission-relevant information. PermPress leverages MPP-270, an annotated policy corpus, to establish a gold-standard dataset of permission-completeness. This corpus shows that only 31% of apps disclose all of their dangerous permissions in their privacy policies. By leveraging the annotated dataset and machine learning techniques, PermPress achieves an AUC score of 0.92 in predicting the permission-completeness of apps. A large-scale evaluation of 164,156 Android apps shows that, on average, 7% of apps fail to disclose more than half of their declared dangerous permissions in their privacy policies, whereas 60% of apps fail to disclose at least one data collection practice tied to a dangerous permission. Our investigation uncovers the non-transparent state of app privacy policies and highlights the need to standardize the process of checking app privacy policies for compliance and completeness.

(Note: The MPP-270 annotated corpus and the largest app policy corpus of 164k apps are available for download.)

Fig. 1. Experimental setup used to investigate the permission-completeness of app privacy policies. PermPress uses the outcomes of tasks 1 and 2 to accurately predict the completeness label of an app (task 3), based on its privacy policy and manifest file.
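The notion of permission-completeness defined in the abstract can be illustrated with a minimal Python sketch. It assumes that the set of permissions disclosed in a policy has already been obtained (for example, from annotation or a classifier); it does not reproduce the PermPress pipeline or its models. The permission list is a small illustrative subset of Android's dangerous permissions, and the function names are hypothetical.

```python
# Illustrative subset of Android "dangerous" permissions (assumption:
# the paper's actual permission list is not reproduced here).
DANGEROUS_PERMISSIONS = frozenset({
    "android.permission.CAMERA",
    "android.permission.READ_CONTACTS",
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.RECORD_AUDIO",
    "android.permission.READ_EXTERNAL_STORAGE",
})


def undisclosed_permissions(manifest_permissions, disclosed_permissions):
    """Return declared dangerous permissions that the policy does not disclose."""
    declared = set(manifest_permissions) & DANGEROUS_PERMISSIONS
    return declared - set(disclosed_permissions)


def is_permission_complete(manifest_permissions, disclosed_permissions):
    """An app is treated as permission-complete if nothing is left undisclosed."""
    return not undisclosed_permissions(manifest_permissions, disclosed_permissions)


def undisclosed_fraction(manifest_permissions, disclosed_permissions):
    """Fraction of declared dangerous permissions missing from the policy."""
    declared = set(manifest_permissions) & DANGEROUS_PERMISSIONS
    if not declared:
        return 0.0  # no dangerous permissions declared, so nothing to disclose
    missing = undisclosed_permissions(manifest_permissions, disclosed_permissions)
    return len(missing) / len(declared)


# Hypothetical app: three dangerous permissions declared, one disclosed.
manifest = [
    "android.permission.CAMERA",
    "android.permission.INTERNET",  # not a dangerous permission, so ignored
    "android.permission.READ_CONTACTS",
    "android.permission.ACCESS_FINE_LOCATION",
]
disclosed = {"android.permission.CAMERA"}

complete = is_permission_complete(manifest, disclosed)
fraction = undisclosed_fraction(manifest, disclosed)
```

Under these assumptions, the hypothetical app is not permission-complete and leaves two of its three declared dangerous permissions undisclosed, which would place it among the apps that fail to disclose more than half of them.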