This page presents the evaluation of SafeSCA's source and binary code embedding models and reports their performance under various thresholds π1 and π2.
Figure 1. Precision, recall, and F1 scores of the source embedding model with π½ = 100 and π½ = 40.
Figure 2. Precision, recall, and F1 scores of the binary embedding model.
Fig. 1 presents the precision, recall, and F1 scores of source code embedding with π1 ranging from 0.01 to 1.0 under two settings of π½ (i.e., 40 and 100). When π1 is 1.0, all source functions are selected, i.e., SafeSCA degenerates into Centris (DPCNN). Given a fixed π½, precision decreases and recall rises as π1 increases. When π½ is relatively small, precision drops notably as π1 grows (the blue dotted curve); when π½ is relatively large, precision is insensitive to π1, but recall is unsatisfactory for small π1 (the solid orange curve). Compared with the precision and recall curves, the F1 score is relatively insensitive to π1. Table 3 shows the performance of the source embedding model alone when π1 = 2% and π½ = 40. Compared with the full SafeSCA, it fails to identify functions whose source code is unavailable in the repository, resulting in a low recall, and its cost roughly triples because it also embeds functions with representative names.
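For concreteness, the sketch below shows how the precision, recall, and F1 values behind Fig. 1 can be computed from a predicted match set against the ground truth; `run_safesca` and `gold_matches` are hypothetical placeholders for the evaluation driver and the labeled dataset, which are not part of this page.

```python
def prf1(predicted, gold):
    """Precision, recall, and F1 between a predicted match set and the
    ground truth, as plotted in Fig. 1 and Fig. 2."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical sweep mirroring Fig. 1: vary pi_1 with the second
# parameter fixed (e.g., 40); pi_1 = 1.0 degenerates to Centris (DPCNN).
# for pi1 in (0.01, 0.02, 0.05, 0.1, 0.5, 1.0):
#     predicted = run_safesca(selection_ratio=pi1)  # assumed driver, not shown here
#     print(pi1, prf1(predicted, gold_matches))
```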
Fig. 2 presents the precision, recall, and F1 scores of binary code embedding with different π2; precision decreases significantly as π2 increases. To avoid high false-positive rates, we use a small π2 = 0.02. Table 3 shows the performance when only binary embedding is used. Besides, we also note that recent work has proposed advanced binary code similarity analysis techniques (e.g., jTrans [69]). Since the framework proposed in this paper makes no assumptions about the underlying binary analysis technique, we envision that SafeSCA can be further improved by replacing the underlying binary embedding model with the latest research outputs.
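As a minimal sketch of this threshold-based matching, the Python below treats π2 as a cosine-distance cutoff (the exact metric is an assumption, since it is not spelled out here) and takes the embedding model as a parameter, which is what makes swapping in a jTrans-style backbone straightforward; all names are illustrative.

```python
import numpy as np

def match_binary_functions(embed, query_funcs, tpl_names, tpl_vecs, pi2=0.02):
    """Match query binary functions against a TPL embedding index.

    `embed` is any binary embedding model mapping a function to a
    vector; passing it in is what allows the backbone to be replaced
    (e.g., by a jTrans-style model) without touching the pipeline.
    A pair matches when its cosine distance falls below pi2; a small
    pi2 (0.02 here) keeps the false-positive rate low.
    """
    # Normalize the TPL vectors once so cosine similarity is a dot product.
    tpl_norm = tpl_vecs / np.linalg.norm(tpl_vecs, axis=1, keepdims=True)
    matches = []
    for func in query_funcs:
        v = embed(func)
        v = v / np.linalg.norm(v)
        dists = 1.0 - tpl_norm @ v        # cosine distance to every indexed function
        j = int(np.argmin(dists))
        if dists[j] < pi2:                # raising pi2 admits more (noisier) matches
            matches.append((func, tpl_names[j]))
    return matches
```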