Leveraging Semantic Relations in Code and Data to Enhance Taint Analysis of Embedded Systems

1. Abstract

2. Evaluation

2.1 646 Known Vulnerabilities based on the Dataset

2.2 All Detail Source Extraction Result in the Dataset

2.3 245 0-day Vulnerabilities Discovered by Lara

3. Appendix

3.1 Case study-A: Non-hidden Data

3.2 Case study-B: Hidden Data

3.3 Case study-C: Multi-Layer Wrapper Function

4. Data and Source Code

1. Abstract

IoT devices have significantly impacted our daily lives, and detecting vulnerabilities in embedded systems early on is critical for ensuring their security. Among the existing vulnerability detection techniques for embedded systems, static taint analysis has been proven effective in detecting severe vulnerabilities, such as command injection vulnerabilities, which can cause remote code execution. Nevertheless, static taint analysis is faced with the problem of identifying sources comprehensively and accurately.

This paper presents Lara, a novel static taint analysis technique to detect vulnerabilities in embedded systems. The design of Lara is inspired by an observation that pertains to semantic relations within and between the code and data of embedded software: user input entries can be categorized as URIs or keys (data), and identifying their handling code (code) and relations can help systematically and comprehensively identify the sources for taint analysis. Transforming the observation into a practical methodology poses challenges. To address these challenges, Lara employs a combination of pattern-based static analysis and large language model (LLM)-aided analysis, aiming to replicate and enhance how human experts would utilize the findings during analysis. The pattern-based static analysis simulates human experience, while the LLM-aided analysis captures the way human experts perceive code semantics. We implemented Lara and evaluated it on 203 IoT devices from 21 vendors. In general, Lara detects 556 and 602 more vulnerabilities than SaTC and Karonte while reducing false positives by 57.0% and 54.3%. Meanwhile, with more sources and sinks from Lara, EmTaint can detect 245 more vulnerabilities. To date, Lara has found 245 0-day vulnerabilities in 26 devices, all of which were confirmed or fixed with 162 CVE IDs assigned.

2. Evaluation

Firmware Dataset

203 firmware samples collected from 21 vendors, covering 10 different device types.

Known Vulnerabilities

646 known vulnerabilities collected from CVE, based on the firmware dataset.

2.1 646 Known Vulnerabilities based on the Dataset

N-day.xlsx

2.2 All Detail Source Extraction Result in the Dataset

source.xlsx

This file includes the device and type for each entry.

2.3 245 0-day Vulnerabilities Discovered by Lara

Upcoming update

3. Appendix

3.1 Case study-A: Non-hidden Data

There are two main scenarios in which SaTC fails to detect vulnerabilities caused by non-hidden data. First, the backend may extract user input using a combination of the URI and key as the identifier, rather than the key alone. For vulnerability CVE-A in the motivation, the key SubnetMask is identified in the frontend file SetVirtualServerSettings.xml. However, in the backend program prog.cgi, websGetVarString extracts user input based on the combined keyword of URI and key. Consequently, SaTC fails to recognize this keyword and cannot detect CVE-A. Second, incomplete predefined rules prevent the extraction of some non-hidden keys. For another vulnerability, CVE-2022-45997, the URI setPortMirror and the key portMirrorMirroredPorts that lead to the vulnerability can both be found in the frontend file portMirror.js. However, SaTC fails to detect this vulnerability. Through manual confirmation, we verified that SaTC does not extract the corresponding keywords from portMirror.js, which further demonstrates the incompleteness of the keyword matching rules used by SaTC.
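The first failure mode can be illustrated with a small sketch. It is a toy model, not SaTC's or Lara's implementation: the URI name, the "/" separator, and the backend string are assumptions made for illustration, since the actual combined format used by prog.cgi is not given here.

```python
# Toy illustration only. The URI name, the "/" separator and the backend
# string are hypothetical; only SubnetMask comes from the case study.

def key_only_match(frontend_keys, backend_strings):
    """Return frontend keys that appear verbatim among the backend strings."""
    return {k for k in frontend_keys if k in backend_strings}


def combined_match(uris, frontend_keys, backend_strings, sep="/"):
    """Return (uri, key) pairs whose combined form is a backend string."""
    return {
        (u, k)
        for u in uris
        for k in frontend_keys
        if f"{u}{sep}{k}" in backend_strings
    }


uris = {"SetVirtualServerSettings"}                         # hypothetical URI
frontend_keys = {"SubnetMask"}
backend_strings = {"SetVirtualServerSettings/SubnetMask"}   # hypothetical

missed = key_only_match(frontend_keys, backend_strings)
found = combined_match(uris, frontend_keys, backend_strings)
print("key-only match:", missed)
print("combined match:", found)
```

Under these assumptions the key-only matcher finds nothing, while the combined matcher recovers the (URI, key) pair, which mirrors why a key-only rule would miss the source in this scenario.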

3.2 Case study-B: Hidden Data

This case study shows the vulnerability CVE-2023-23270, caused by a hidden URI and hidden keys and detected by Lara. The main process of discovering this vulnerability was as follows:

❶ The URI modifyDNSForward and the key DNSDomainName were extracted from the frontend file DNSForward.html.
❷ In the backend program httpd, the URI modifyDNSForward was used to extract the registration function websDefineAction.
❸ In the backend program httpd, the key DNSDomainName was used to extract the key handling function websGetVar.
❹ In the backend program httpd, the registration function websDefineAction was used to extract the URI setDebugCfg and its corresponding handling function formSetDebugCfg.
❺ In the function formSetDebugCfg, the key handling function websGetVar was used to extract the keys enable, level and module.
❻ In the function formSetDebugCfg, taint analysis was performed on the user inputs represented by the keys enable, level and module, and the command injection vulnerability was discovered.

When we manually examined the code, we found that the URI and keys that caused the vulnerability were not present in the frontend file. Therefore, SaTC was unable to detect this vulnerability, while Lara was able to extract the hidden URI and keys through the backend program's handling logic and successfully discover this vulnerability.
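Steps ❶ to ❺ can be sketched over a toy list of backend call sites. This is a minimal illustration, not Lara's implementation: the callers initWeb and formModifyDNSForward, the simplified argument tuples, and the (URI, handler) argument order for the registration function are assumptions.

```python
from collections import namedtuple

Call = namedtuple("Call", "caller callee args")

# Hypothetical call sites. initWeb, formModifyDNSForward and the argument
# layouts are assumptions; the other identifiers come from the case study.
CALLS = [
    Call("initWeb", "websDefineAction", ("modifyDNSForward", "formModifyDNSForward")),
    Call("initWeb", "websDefineAction", ("setDebugCfg", "formSetDebugCfg")),
    Call("formModifyDNSForward", "websGetVar", ("DNSDomainName",)),
    Call("formSetDebugCfg", "websGetVar", ("enable",)),
    Call("formSetDebugCfg", "websGetVar", ("level",)),
    Call("formSetDebugCfg", "websGetVar", ("module",)),
]


def callees_using(calls, literal):
    """Functions that receive the given string literal as an argument."""
    return {c.callee for c in calls if literal in c.args}


def recover_sources(calls, frontend_uris, frontend_keys):
    """Map every registered URI to the keys read inside its handler."""
    # Steps 2 and 3: known URIs and keys reveal the registration function
    # and the key handling function.
    registrars = set().union(*(callees_using(calls, u) for u in frontend_uris))
    key_getters = set().union(*(callees_using(calls, k) for k in frontend_keys))
    # Step 4: every call to a registration function yields a URI and handler
    # (assumes args are ordered as (uri, handler)).
    handlers = {c.args[0]: c.args[1] for c in calls if c.callee in registrars}
    # Step 5: inside each handler, calls to a key handling function yield keys.
    return {
        uri: sorted(
            c.args[0]
            for c in calls
            if c.caller == handler and c.callee in key_getters
        )
        for uri, handler in handlers.items()
    }


frontend_uris = {"modifyDNSForward"}   # step 1: found in DNSForward.html
frontend_keys = {"DNSDomainName"}
sources = recover_sources(CALLS, frontend_uris, frontend_keys)
hidden_uris = set(sources) - frontend_uris
print(sources)
print("hidden URIs:", hidden_uris)
```

In this toy model, setDebugCfg and its keys enable, level and module never appear in the frontend inputs, yet they are recovered from the backend call sites and could then be handed to taint analysis as sources (step ❻, not modeled here).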

3.3 Case study-C: Multi-Layer Wrapper Function

This case study shows the vulnerability CVE-2022-29328, caused by the 3-layer wrapper function checkValidUpgrade and detected by Lara. Lara identified wrappers of dangerous functions in the shared library libhnap.so. When the function splite_cookie copies the string pointed to by a1 into the buffer v4, it does not check the length of the string, resulting in a potential buffer overflow vulnerability. Lara found that a buffer overflow occurs when the second argument to the function checkValidUpgrade is controllable. In the program web_cgi.cgi, which is linked to the library libhnap.so, the function main first reads the value of the parameter cookie from the HTTP request and then passes it to checkValidUpgrade for processing. If we provide a very long value for the parameter cookie, the buffer overflow vulnerability can be triggered.
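One way to picture multi-layer wrapper identification is as a fixed-point propagation of dangerous parameters up the call graph. The sketch below is an illustration under stated assumptions, not Lara's algorithm: the use of strcpy as the underlying dangerous function, the intermediate function hypothetical_mid, and all argument mappings are invented for the example.

```python
# Dangerous function -> index of the parameter that must not be
# attacker-controlled. strcpy is an assumption; the case study does not
# name the copy routine.
SINKS = {"strcpy": 1}  # strcpy(dst, src): src is index 1

# caller -> list of (callee, {callee_param_index: caller_param_index}).
# All mappings and hypothetical_mid are assumptions for illustration.
FLOWS = {
    "splite_cookie": [("strcpy", {1: 0})],                 # a1 is copied into v4
    "hypothetical_mid": [("splite_cookie", {0: 0})],
    "checkValidUpgrade": [("hypothetical_mid", {0: 1})],   # second argument
}


def find_wrappers(sinks, flows, max_passes=10):
    """Map each wrapper to {dangerous_param_index: layer first reached at}."""
    tainted = {name: {idx: 0} for name, idx in sinks.items()}
    for _ in range(max_passes):
        changed = False
        for caller, sites in flows.items():
            for callee, mapping in sites:
                # Copy the items so recursive callers do not mutate
                # the dictionary while it is being iterated.
                for callee_idx, layer in list(tainted.get(callee, {}).items()):
                    if callee_idx not in mapping:
                        continue
                    caller_idx = mapping[callee_idx]
                    entry = tainted.setdefault(caller, {})
                    if caller_idx not in entry:
                        entry[caller_idx] = layer + 1
                        changed = True
        if not changed:
            break
    return {f: params for f, params in tainted.items() if f not in sinks}


wrappers = find_wrappers(SINKS, FLOWS)
print(wrappers)
```

With this hypothetical call graph, checkValidUpgrade is reported as a wrapper whose second argument (index 1) reaches the unchecked copy three layers down, so any caller that passes user input in that position, such as main in web_cgi.cgi passing the cookie value, becomes a candidate for taint analysis.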

4. Data and Source Code

Firmware Dataset

Lara_firmware

Known Vulnerability dataset

For the detailed sources and sinks of the known vulnerability dataset, contact me and state your purpose.

Source Code

Upcoming update
