ATVHunter

Our database structure

Our framework of ATVHunter

ATVHunter takes an Android app as input, identifies the third-party libraries (TPLs) used in the app, and reports the vulnerable in-app TPLs together with their specific versions and a comprehensive vulnerability description. ATVHunter is implemented in 2,000+ lines of Python code.

We employ Soot, a commonly used reverse-engineering tool, to decompile the Android apps.

We use Androguard to obtain the class dependency relations, from which we derive the independent TPL candidates.

We then employ Soot to generate the control-flow graphs (CFGs). We also modified the source code of Soot so that we can get the opcode sequence in each basic block of a CFG.

We use a fuzzy hash method (ssdeep) to generate the code features and employ the edit distance algorithm to find the in-app TPLs. Our approach can pinpoint the specific TPL versions. We maintain a library database containing more than 3 million TPL files, and we construct a vulnerable TPL database that includes 224 security bugs from open-source Java software on GitHub and 1,180 CVEs from 910 Android TPLs in NVD.
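The matching step above can be illustrated with a minimal sketch. It is not ATVHunter's actual code: the fingerprint strings are short placeholders rather than real ssdeep output, and the names `similarity`, `best_match`, and the `0.85` threshold are assumptions chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalise the edit distance into a score in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def best_match(candidate: str, database: dict, threshold: float = 0.85):
    """Return (library_version, score) for the closest fingerprint,
    or (None, score) when nothing reaches the hypothetical threshold."""
    best_name, best_score = None, 0.0
    for name, fingerprint in database.items():
        score = similarity(candidate, fingerprint)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score
```

Because different versions of the same library tend to have close fingerprints, picking the highest-scoring entry rather than the first one above the threshold is what allows a comparison like this to separate neighbouring versions.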

The main purpose of some steps in the method design

We delete the host app code to improve performance: the host app itself does not need to be compared with the TPLs in the database.

We use only the class dependency graph; our module decoupling method does not depend on the package tree.
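As a rough illustration of decoupling by class dependencies alone, the sketch below groups classes into connected components of the dependency graph and treats each group as one TPL candidate. This is a simplification and an assumption about the general idea, not the tool's actual algorithm; the function name `split_candidates` and the edge format are hypothetical.

```python
from collections import defaultdict


def split_candidates(classes, dependencies):
    """Group classes into candidates using only dependency edges.

    classes: iterable of class names.
    dependencies: iterable of (source_class, target_class) pairs.
    Edges that mention a class outside `classes` are ignored.
    """
    known = set(classes)
    adjacency = defaultdict(set)
    for src, dst in dependencies:
        if src in known and dst in known:
            # Treat the relation as undirected for grouping purposes.
            adjacency[src].add(dst)
            adjacency[dst].add(src)

    seen = set()
    components = []
    for start in sorted(known):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components
```

Package names never enter the computation, so renaming or flattening packages leaves the grouping unchanged as long as the dependency edges are preserved.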

We extract features in two phases to identify versions precisely.

We extract the CFG, which is a stable semantic feature and can defend against some code obfuscation. In addition, we use the opcodes of each basic block of each CFG as supplementary features and apply a fuzzy hash method to generate the code feature. Fuzzy hashing reduces the impact that code obfuscation has on the code signatures.
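The motivation for a fuzzy rather than an exact hash can be seen in a small sketch. It does not use ssdeep: as a stand-in, it compares opcode sequences with Python's standard `difflib.SequenceMatcher`, and the opcode lists are invented examples. A single inserted `nop` changes a SHA-256 digest completely, while a similarity-based comparison still rates the two sequences as close.

```python
import hashlib
from difflib import SequenceMatcher


def exact_digest(opcodes):
    """Cryptographic digest of an opcode sequence: any change alters it."""
    joined = " ".join(opcodes).encode("utf-8")
    return hashlib.sha256(joined).hexdigest()


def sequence_similarity(a, b):
    """Similarity in [0, 1] between two opcode sequences
    (a stand-in for comparing fuzzy hashes, not ssdeep itself)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Hypothetical opcode sequences for one basic block, before and
# after an obfuscator inserts a junk instruction.
original = ["const/4", "invoke-virtual", "move-result",
            "if-eqz", "return-void"]
obfuscated = ["const/4", "nop", "invoke-virtual", "move-result",
              "if-eqz", "return-void"]

digests_equal = exact_digest(original) == exact_digest(obfuscated)
score = sequence_similarity(original, obfuscated)
print(f"exact digests equal: {digests_equal}, similarity: {score:.2f}")
```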

More details can be seen in the figure above.