Two datasets are used in our experiments:
1. The MalScan dataset, used to reproduce their experimental results.
https://github.com/malscan-android/MalScan/tree/master/sha256
[Quoted from MalScan: https://github.com/malscan-android/MalScan ]
All the datasets are derived from a growing collection, AndroZoo (https://androzoo.uni.lu/), which currently contains over nine million different APKs, each of which has been (or will soon be) analysed by several different AntiVirus products in VirusTotal (https://www.virustotal.com/) to know which applications are detected as malware. The datasets used in our experiments can be obtained from AndroZoo through the given sha256 (https://androzoo.uni.lu/api_doc).
2. Dataset in "Understanding android obfuscation techniques: A large-scale investigation in the wild"
Three malware families: name encryption, obfuscation and reflection.
Thanks to the authors for sharing.
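
As noted above, the APKs can be fetched from AndroZoo using the published sha256 values. The following is a minimal Python sketch of that step, not the authors' own tooling. It assumes the hashes are stored one per line in a text file, that you have requested an API key from the AndroZoo maintainers, and that the download endpoint and its `apikey` and `sha256` parameters match the AndroZoo API documentation (https://androzoo.uni.lu/api_doc); please check them there before use. The helper names `read_hashes`, `build_url` and `download_apk` are hypothetical.

```python
import re
import urllib.parse
import urllib.request
from pathlib import Path

# Assumed endpoint; verify against https://androzoo.uni.lu/api_doc
ANDROZOO_DOWNLOAD = "https://androzoo.uni.lu/api/download"
SHA256_RE = re.compile(r"[0-9a-fA-F]{64}")


def read_hashes(path):
    """Return the well-formed SHA-256 strings in a file (one per line assumed)."""
    hashes = []
    for line in Path(path).read_text().splitlines():
        candidate = line.strip()
        # Skip blank lines and anything that is not a 64-digit hex string.
        if SHA256_RE.fullmatch(candidate):
            hashes.append(candidate)
    return hashes


def build_url(api_key, sha256):
    """Build the download URL for one APK."""
    query = urllib.parse.urlencode({"apikey": api_key, "sha256": sha256})
    return f"{ANDROZOO_DOWNLOAD}?{query}"


def download_apk(api_key, sha256, out_dir="apks"):
    """Download one APK into out_dir, skipping files that already exist."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / f"{sha256}.apk"
    if not target.exists():
        urllib.request.urlretrieve(build_url(api_key, sha256), str(target))
    return target
```

A typical use would be to call `read_hashes` on one of the hash lists and then `download_apk` for each entry, keeping the request rate within whatever limits AndroZoo sets for your key.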