From 164,156 apps, 270 were sampled, and two of the authors systematically annotated their privacy policies to create a gold-standard policy corpus. The goal of this annotated corpus is to establish whether privacy policies contain meaningful, relevant information about apps that use dangerous permissions to collect sensitive data. The annotated corpus yields the following observations:
Dangerous permissions are almost never mentioned by name in the app policies.
Privacy policies rarely disclose permissions by their semantic meaning or by the sensitive data/actions the permissions represent in terms of Android API pragmatics.
PERSISTENTID and LOCATION are the most frequently requested permissions, and they are disclosed more clearly in the app policies than other permissions.
Apps also request CALENDAR, CONTACTS, and SMS permissions, yet these are harder to find in privacy policies than other permissions.
🔥🔥 The corpus is available for download ⏬
🛎️ Don't forget to cite our work if you use the dataset:
@ARTICLE{9861610,
author={Rahman, Muhammad Sajidur and Naghavi, Pirouz and Kojusner, Blas and Afroz, Sadia and Williams, Byron and Rampazzi, Sara and Bindschaedler, Vincent},
journal={IEEE Access},
title={PermPress: Machine Learning-Based Pipeline to Evaluate Permissions in App Privacy Policies},
year={2022},
volume={10},
pages={89248-89269},
doi={10.1109/ACCESS.2022.3199882}
}
This table contains annotated permission-policy snippets from the MPP-270 corpus. In general, a permission-policy snippet explicitly or implicitly describes sensitive data access or collection, a user interaction, or an app feature that the app could achieve only by invoking one or more particular dangerous permissions. Each row in the table represents an app and the permission-policy snippets that correspond to the dangerous permissions declared in its manifest file. '0' means the app did not declare the dangerous permission, and 'NOT_FOUND' means the human annotators did not find any permission-policy snippet for a declared permission.
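The cell conventions above ('0' for an undeclared permission, 'NOT_FOUND' for a declared permission with no matching snippet) are enough to compute how often each permission is disclosed. The following is a minimal sketch rather than part of the released tooling: the row layout, the `app` identifier column, the `disclosure_rates` helper, and the sample rows with placeholder snippet text are all assumptions made for illustration, so adjust them to the actual column names in the corpus file.

```python
from collections import defaultdict

# Hypothetical rows: one dict per app, one key per dangerous permission.
# Cell values follow the conventions described above; "<snippet>" stands in
# for real annotated policy text and is NOT taken from the corpus.
ROWS = [
    {"app": "app_a", "LOCATION": "<snippet>", "CALENDAR": "NOT_FOUND", "SMS": "0"},
    {"app": "app_b", "LOCATION": "<snippet>", "CALENDAR": "0", "SMS": "NOT_FOUND"},
    {"app": "app_c", "LOCATION": "NOT_FOUND", "CALENDAR": "0", "SMS": "<snippet>"},
]


def disclosure_rates(rows, id_column="app"):
    """Return {permission: (declared, disclosed, rate)} for each permission.

    declared  = apps whose cell is anything other than '0'
    disclosed = declared apps whose cell is not 'NOT_FOUND'
    """
    declared = defaultdict(int)
    disclosed = defaultdict(int)
    for row in rows:
        for permission, cell in row.items():
            if permission == id_column:
                continue
            value = str(cell).strip()
            if value == "0":
                continue  # the app did not declare this permission
            declared[permission] += 1
            if value != "NOT_FOUND":
                disclosed[permission] += 1  # annotators found a snippet
    return {
        p: (declared[p], disclosed[p], disclosed[p] / declared[p])
        for p in declared
    }


if __name__ == "__main__":
    for permission, (n_declared, n_disclosed, rate) in disclosure_rates(ROWS).items():
        print(f"{permission}: {n_disclosed}/{n_declared} disclosed ({rate:.0%})")
```

If the table is distributed as a CSV file, rows in this shape can be obtained with `csv.DictReader` from the Python standard library. Permissions that no app declares are omitted from the result, which avoids dividing by zero.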
Fig. 6. Distribution of dangerous permissions and annotations in the MPP-270 policy corpus. Data collection related to the PERSISTENTID and LOCATION permissions was mentioned in 97% and 92% of the app policies, respectively. By contrast, CALENDAR was found least often (3 out of 28 apps). Data access and collection related to the CONTACTS and SMS permissions were found in less than 50% of the apps' policies.