The basic intuition behind the score calculation is to quantify the API-specific knowledge a model has to predict or concretize when generating a fuzz driver for a given API. We divide this knowledge into the following four parts, and the score is the sum of the scores of these parts:
1) the number of unique project APIs used in the driver
2) the number of unique common API usage patterns used in the driver
3) the number of unique identifiers (excluding naive literal values such as 0 and NULL) outside the common API usage code
4) the count of branches and loops excluding the common API usage code
For 2), the common API usage patterns represent the usage of standard library APIs such as memcpy and mkstemp. Our design choice is that common API usages should be counted, but with a lower score. They are necessary parts of the driver, but concretizing them is easier than concretizing API-specific usages, because the models have learned many of these standard library API usages during training. Therefore, we count them at the granularity of the usage pattern rather than per API call; for example, malloc + memcpy + free used together to duplicate a buffer counts as a single pattern.
For 3), identifiers include project-specific macros, project global variables, member field names of project-defined structs, and literal values such as integers or strings that are neither 0 nor NULL.
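The counting rule for part 3) can be sketched in C as follows. This is only an illustrative sketch, not the tooling used to compute the scores: the helper names is_naive_literal and count_unique_identifiers are hypothetical, the input is assumed to be a list of identifier and literal tokens that were already extracted from the driver with the common API usage code removed, and only the exact spellings 0, NULL, and '\0' are treated as naive.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a token is "naive" when it is spelled 0, NULL, or '\0'.
 * Other spellings of zero (for example 0x0) are NOT recognized by this sketch. */
static bool is_naive_literal(const char *tok) {
    return strcmp(tok, "0") == 0 ||
           strcmp(tok, "NULL") == 0 ||
           strcmp(tok, "'\\0'") == 0;
}

/* Count the distinct non-naive tokens in toks[0..n-1].
 * Assumption: toks holds only identifiers and literals found outside the
 * common API usage code. Quadratic scan, which is fine for small drivers. */
static size_t count_unique_identifiers(const char *const *toks, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_naive_literal(toks[i])) {
            continue;               /* 0, NULL, '\0' never add to the score */
        }
        bool seen = false;
        for (size_t j = 0; j < i; j++) {
            if (strcmp(toks[i], toks[j]) == 0) {
                seen = true;        /* repeated token: count it only once */
                break;
            }
        }
        if (!seen) {
            count++;
        }
    }
    return count;
}
```

Under these assumptions, a token list that contains the same literal twice contributes one point for it, and tokens spelled 0 or NULL contribute nothing.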
The following is a minimized fuzz driver for the API stun_is_command_message_full_check_str from the coturn project:
extern int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
    // Define variables for the function
    int must_check_fingerprint;
    int fingerprint_present;
    // Call the function to check the message
    stun_is_command_message_full_check_str(Data, Size, must_check_fingerprint, &fingerprint_present);
    return 0;
}
Its score is 1 (1+0+0+0):
1) has score 1 for the used API stun_is_command_message_full_check_str
2) is 0 since no standard library APIs are involved
3) is 0 since no project defined identifiers are used
4) is 0 since no branches or loops are involved
1 is the lowest score an effective fuzz driver can have; it means that no additional API-specific usage has to be concretized to call the target API.
The following is a minimized fuzz driver for the API GetINCHIKeyFromINCHI from the inchi project:
extern int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
    char *szINCHISource;
    inchi_InputINCHI inpInChI;
    szINCHISource = malloc(sizeof(char) * (Size + 1));
    memcpy(szINCHISource, Data, Size);
    szINCHISource[Size] = '\0'; // InChI string must be null-terminated
    // Buffer lengths taken from InChI API reference, located at
    // https://www.inchi-trust.org/download/104/InChI_API_Reference.pdf, page 24
    char szINCHIKey[28], szXtra1[65], szXtra2[65];
    GetINCHIKeyFromINCHI(szINCHISource, 0, 0, szINCHIKey, szXtra1, szXtra2);
    free(szINCHISource);
    return 0;
}
Its score is 5 (1+1+3+0):
1) has score 1 for the used API GetINCHIKeyFromINCHI
2) is 1 since there is one common usage pattern for duplicating a buffer's content using malloc + memcpy + free
3) is 3 since three distinct integer literals are used including 1, 28, and 65 ('\0' is not counted since it is equivalent to NULL)
4) is 0 since no branches or loops are involved
The following is a minimized fuzz driver for the API pj_stun_msg_decode from the pjsip project:
extern int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
    if (Size == 0) return 0;
    pj_pool_t *pool;
    pj_stun_msg *msg;
    pj_caching_pool caching_pool;
    char *buffer;
    // Create a caching pool
    pj_caching_pool_init(&caching_pool, NULL, 0);
    // Create a memory pool
    pool = pj_pool_create(&caching_pool.factory, NULL, 1024, 1024, NULL);
    // Make sure the input data is null-terminated
    buffer = (char *)malloc(Size+1);
    memcpy(buffer, Data, Size);
    // Call the function to be fuzzed
    pj_stun_msg_decode(pool, (pj_uint8_t*)buffer, Size, PJ_STUN_CHECK_PACKET, &msg, NULL, NULL);
    // Clean up
    pj_pool_release(pool);
    free(buffer);
    return 0;
}
Its score is 10 (4+1+4+1):
1) has score 4 for the used APIs pj_caching_pool_init, pj_pool_create, pj_stun_msg_decode, and pj_pool_release
2) is 1 since there is one common usage pattern for duplicating a buffer's content using malloc + memcpy + free
3) is 4 since two integer literals (1, 1024), one project defined macro (PJ_STUN_CHECK_PACKET), and one member field name (factory, used as caching_pool.factory) are involved
4) is 1 since there is one branch inside the driver
The following is a minimized fuzz driver for the API dns_master_loadbuffer from the bind9 project:
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    isc_buffer_t buf;
    isc_result_t result;
    isc_mem_t *mctx = NULL;
    dns_name_t top, origin;
    isc_mem_create(&mctx);
    isc_buffer_constinit(&buf, data, size);
    isc_buffer_add(&buf, size);
    dns_name_init(&top, NULL);
    dns_name_init(&origin, NULL);
    dns_name_clone(dns_rootname, &top);
    dns_name_clone(dns_rootname, &origin);
    dns_rdatacallbacks_t callbacks;
    dns_rdatacallbacks_init(&callbacks);
    dns_db_t *db = NULL;
    result = dns_db_create(mctx, "rbt", dns_rootname, dns_dbtype_zone,
                           dns_rdataclass_in, 0, NULL, &db);
    if (result != ISC_R_SUCCESS) {
        return 0;
    }
    result = dns_db_beginload(db, &callbacks);
    if (result == ISC_R_SUCCESS) {
        dns_master_loadbuffer(&buf, &top, &origin,
                              dns_rdataclass_in, DNS_MASTER_ZONE, &callbacks,
                              mctx);
        dns_db_endload(db, &callbacks);
    }
    dns_db_detach(&db);
    isc_mem_destroy(&mctx);
    return 0;
}
Its score is 20 (12+0+6+2):
1) has score 12 for the used APIs isc_mem_create, isc_buffer_constinit, isc_buffer_add, dns_name_init, dns_name_clone, dns_rdatacallbacks_init, dns_db_create, dns_db_beginload, dns_master_loadbuffer, dns_db_endload, dns_db_detach, and isc_mem_destroy
2) is 0 since no standard library APIs are involved
3) is 6 since one string literal ("rbt"), two project defined macros (ISC_R_SUCCESS, DNS_MASTER_ZONE), and three project defined global variables (dns_rootname, dns_dbtype_zone, and dns_rdataclass_in) are involved
4) is 2 since there are two branches inside the driver
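The four worked examples can be re-checked mechanically. The following is a minimal sketch that assumes each counted item contributes exactly one point, as in the examples above; the type and function names (driver_counts, worked_example, driver_score) are hypothetical and are not part of any released tool, and the per-part counts are copied by hand from the score breakdowns rather than extracted from the driver source.

```c
#include <stddef.h>

/* Hypothetical record of the four per-part counts for one driver. */
struct driver_counts {
    unsigned project_apis;     /* 1) unique project APIs used             */
    unsigned common_patterns;  /* 2) unique common API usage patterns     */
    unsigned identifiers;      /* 3) unique non-naive identifiers         */
    unsigned branches_loops;   /* 4) branches and loops                   */
};

/* Hypothetical pairing of a target API with its hand-copied counts. */
struct worked_example {
    const char *target_api;
    struct driver_counts counts;
};

/* The score is the plain sum of the four parts (assumed weight: 1 each). */
static unsigned driver_score(const struct driver_counts *c) {
    return c->project_apis + c->common_patterns +
           c->identifiers + c->branches_loops;
}

/* Counts taken from the score breakdowns of the four drivers above. */
static const struct worked_example worked_examples[] = {
    { "stun_is_command_message_full_check_str", { 1,  0, 0, 0 } },
    { "GetINCHIKeyFromINCHI",                   { 1,  1, 3, 0 } },
    { "pj_stun_msg_decode",                     { 4,  1, 4, 1 } },
    { "dns_master_loadbuffer",                  { 12, 0, 6, 2 } },
};
```

Calling driver_score on each entry of worked_examples reproduces the totals stated above: 1, 5, 10, and 20.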