Survey Content

Survey

User-friendliness is an essential factor in evaluating a tool. We compare the usability of the six available tools (i.e., LibID, LibPecker, ORLIS, LibRadar, LibD, and LibScout) from three aspects: 1) the installation and setup process, 2) the usage steps, and 3) the result presentation.

To assess them objectively, we designed a questionnaire and recruited participants to rate these tools on the three aspects.

Participant Recruitment. We recruited 20 participants from industrial companies and universities via word of mouth, including developers at IT companies, post-docs, and Ph.D. students. To minimize interference caused by a lack of professional experience, all recruited participants have over three years of experience in Android app development, and they come from different countries, including Singapore, Germany, China, and India. In addition, none of them had installed or used these tools before. Each participant received a $50 coupon as compensation for their time.

Experiment Procedure. We provided links to the source code of each tool, together with instruction files that guide participants through installation and usage. Since some tools take apks (and TPLs) as input, we also provided a repository containing sample apps and sample TPLs, in case participants did not know where to download the input data, which could otherwise hinder the process of using the tools. In real-world use this is even harder, since users must find and collect these input datasets themselves, especially for tools that require a pre-built ground-truth database. We asked the participants to install and use the six tools one by one and rate each tool on the three aspects above; the specific rating criteria are shown in TABLE 1. All participants carried out the experiments independently, without discussing with each other, and were encouraged to write comments about each tool. After they finished the tasks, we also interviewed them about their user experience and kept detailed records.

TABLE 1. Rating options for each item in the questionnaire

Results

Figure 1 shows the results of the questionnaire. For each rating item, we take the average of the star ratings from all participants. According to Figure 1, LibRadar receives the most stars overall, while ORLIS receives the lowest score.
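The aggregation above is a simple per-item mean across participants. A minimal sketch of the computation — the ratings below are illustrative placeholders, not the actual survey data:

```python
# Average the star ratings per tool and per rating item across participants,
# then combine the per-item averages into an overall score per tool.
# All rating values here are illustrative, not the survey's real data.
from statistics import mean

# ratings[tool][item] -> list of 1-5 star ratings, one per participant
ratings = {
    "LibRadar": {"installation": [3, 4, 3], "usage": [5, 4, 4], "output": [4, 5, 4]},
    "ORLIS":    {"installation": [2, 3, 2], "usage": [3, 2, 2], "output": [1, 2, 1]},
}

# Per-item average: mean of all participants' stars for that item
averages = {
    tool: {item: mean(stars) for item, stars in items.items()}
    for tool, items in ratings.items()
}

# Overall score per tool: mean of the three per-item averages
overall = {tool: mean(avgs.values()) for tool, avgs in averages.items()}
```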

Installation.

As for the installation process, LibPecker gets the highest score (4 stars) since it needs only one command, while LibID is regarded as the most complicated one because it requires participants to install the Gurobi Optimizer and register the license by themselves.

Some users commented that:

"When I first installed LibID, I spent about 2 hours. Installing the necessary dependency (e.g., Gurobi Optimizer) takes most of my time since the instruction is confusing."

"The design of the website of Gurobi Optimizer is terrible. It is difficult to follow the instructions because some of them are scattered."

Both LibRadar and LibD receive 3 stars because they only require a basic Java and Python runtime environment before users can run them.

The installation of ORLIS and LibScout is moderately rated (2.5 stars). Both require users to download the Android SDK; in addition, ORLIS requires users to download several dependencies and TPLs before it can run.

Usage.

As for usage, LibRadar and LibD get the highest scores, while LibID is regarded as the least satisfactory tool. Participants said that execution efficiency weighs heavily when they use the tools:

"For LibID, LibPecker and ORLIS, I have to wait for several minutes for some apps, sometimes LibID even needs more than 30 minutes to get the results. That's too long for me. While the execution time of LibRadar and LibScout is more acceptable."

"LibID usually gets crash when detecting some TPLs. For me, it does work when processing the TPL named AppLovin, while crashes when processing the TPL named Dropbox."

In fact, such crashes of LibID occur because these TPLs are too large. LibID first loads all the TPLs and apps into memory, and its detection strategy consumes substantial computing resources, especially memory and CPU. When users input many large TPLs, the tool crashes once it exceeds the available memory.

Another factor that may affect the rating is that LibID, ORLIS, and LibPecker report an error when processing some ``.aar'' files, so users must modify them manually to proceed with the detection process.

Output.

According to the results, participants found the detection results of LibID, LibRadar, and LibScout much easier to understand; ORLIS gets the lowest score since it only provides the matching relations of class names, without indicating which app or TPL each class belongs to, which confused participants.

"The results of LibID, LibRadar and LibScout are easy to understand. All of the results are represented in ``.json'' format, I can quickly find the in-app TPLs, the similarity value and other meta information."

The result provided by LibD is the MD5 of each TPL, so users must identify the detected TPLs by mapping the hashes against the database file provided by LibD:

"I can understand the result of LibD, but not very direct and clear. Some information that I really interested in is missing, e.g., the similarity value and library name."

Conclusion

LibRadar receives the highest score from participants, mainly due to its simple usage and user-friendly output format, and is regarded as the easiest tool to use. ORLIS needs the most improvement, especially in its result presentation.

Participant demographics (from the questionnaire): nationality — Chinese: 9, German: 3, Indian: 4, Singaporean: 3, Thai: 1. Other collected items: institution, education background, gender, development experience, and Android-related development experience.

Questionnaire rating items (per tool — LibID, LibScout, LibPecker, ORLIS, LibRadar, LibD): Installation, Usage, Output.