Prior approaches suffer from issues with:
Reliability
Soundness (e.g., constant propagation only for global variables, missing other variables such as stack variables and other data types such as pointers and structs)
Overhead
Therefore, they are impractical.
A user wants to create a specialized version of the scaled-down wc (Listing 1) that supports only line count (Listing 2) or only character count (Listing 3).
Commented lines (in green) represent the unnecessary code that should be removed.
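The motivating scenario can be sketched in a few lines. This is an illustrative toy, not a reproduction of Listings 1 to 3: the names `Config`, `parse_args`, `wc`, and `wc_lines_only` are hypothetical, and Python is used only for brevity. It shows a general tool whose flags guard two features, next to a hand-specialized variant in which the unneeded feature and the flag parsing have been removed.

```python
from dataclasses import dataclass


@dataclass
class Config:
    count_lines: bool = False
    count_chars: bool = False


def parse_args(argv):
    """Configuration logic: turn command-line flags into settings."""
    cfg = Config()
    for arg in argv:
        if arg == "-l":
            cfg.count_lines = True
        elif arg == "-c":
            cfg.count_chars = True
    return cfg


def wc(argv, text):
    """General tool: configuration logic followed by main logic."""
    cfg = parse_args(argv)
    # Main logic: both features are present, each guarded by a setting.
    lines = chars = 0
    for ch in text:
        if cfg.count_chars:
            chars += 1
        if cfg.count_lines and ch == "\n":
            lines += 1
    return {"lines": lines, "chars": chars}


def wc_lines_only(text):
    """Hand-specialized for the -l flag: no flag parsing, no character count."""
    lines = 0
    for ch in text:
        if ch == "\n":
            lines += 1
    return {"lines": lines, "chars": 0}
```

For the supported functionality the two versions agree: `wc(["-l"], text)` and `wc_lines_only(text)` return the same result, while the specialized version no longer contains the character-counting code.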
Challenges:
How to identify relevant code and unnecessary code?
How to perform effective simplification?
How to conduct sound analysis?
How to avoid analysis overhead?
Program debloating aims to enhance the performance and reduce the attack surface of bloated applications. Several techniques have recently been proposed to debloat programs by removing (or securing) dead code. These approaches either rely on unsound strategies or make overly strict assumptions about the program, so they are hard to apply in practice. In this paper, we address these limitations by leveraging symbolic execution and partial evaluation to generate specialized applications. Our approach relies on a simple observation: applications comprise configuration logic and main logic. The configuration specifies which functionalities in the main logic should be executed. Symbolic execution supplies the partial evaluator with the set of variables that will become constants. Partial evaluation then propagates these constants, simplifies the application, and finally generates the specialized application. Our evaluation over different sets of real-world applications demonstrates that LMCAS can successfully remove all unwanted features. LMCAS achieves a 25% reduction in binary size and can reduce the number of functions in the specialized applications by 40%. LMCAS also enhances the security of specialized applications: it reduces the attack surface for code-reuse attacks by removing 51.7% of the total gadgets and eliminates 80% of known CVE vulnerabilities. LMCAS runs up to 1500x and 1.2x faster than the state-of-the-art debloating techniques CHISEL and OCCAM, respectively. Further, we demonstrate the scalability and robustness of LMCAS on three large applications: tcpdump, readelf, and objdump. LMCAS is thus a practical approach for debloating real-world programs.
LMCAS Debloating Pipeline
Neck Identification: the neck, the point where configuration logic ends and main logic begins, lets us optimize our analysis. Accordingly, we devise an algorithm that identifies the neck automatically.
Symbolic Execution: a counterpart to the partial evaluator. It executes the original application on a set of supplied inputs that represent the functionality the specialized application will support. Symbolic execution provides 1) the set of global and local variables with their corresponding values and 2) a record of the visited functions and their basic blocks.
Constant Conversion: converts the variables captured in the previous step to constants, using their captured values.
Multi-Stage Simplification: we run several compiler-assisted and customized LLVM passes to optimize the program and remove unnecessary functionality.
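The last two stages can be illustrated with a toy partial evaluator. This is a minimal sketch under stated assumptions, not the LMCAS implementation: LMCAS works on LLVM IR with values captured by KLEE, whereas this example rewrites Python source with the standard `ast` module, and the captured values are supplied by hand. The names `Specializer`, `specialize`, and `SOURCE` are hypothetical. It also assumes the captured variables are never reassigned after the point where their values were recorded.

```python
import ast


class Specializer(ast.NodeTransformer):
    """Toy partial evaluator: substitute known variables, then prune branches."""

    def __init__(self, known):
        self.known = known  # variable name -> captured constant value

    def visit_Name(self, node):
        # Constant conversion: a read of a captured variable becomes a literal.
        if isinstance(node.ctx, ast.Load) and node.id in self.known:
            return ast.copy_location(ast.Constant(value=self.known[node.id]), node)
        return node

    def visit_If(self, node):
        # Simplification: once the test is a literal, keep only the live branch.
        self.generic_visit(node)
        if isinstance(node.test, ast.Constant):
            live = node.body if node.test.value else node.orelse
            return live or [ast.Pass()]  # keep enclosing blocks non-empty
        return node


def specialize(source, known):
    """Return the source text with the known variables folded in."""
    tree = Specializer(known).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9 or later


# Main logic of a toy wc; count_lines and count_chars stand in for
# settings that the configuration logic would have established.
SOURCE = r'''
def count(text):
    lines = chars = 0
    for ch in text:
        if count_chars:
            chars += 1
        if count_lines:
            if ch == "\n":
                lines += 1
    return lines, chars
'''

if __name__ == "__main__":
    print(specialize(SOURCE, {"count_lines": True, "count_chars": False}))
```

With `count_lines` fixed to `True` and `count_chars` to `False`, the character-counting branch disappears from the output and the line-counting guard is dropped, leaving only the code needed for the supported functionality.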
The neck splits the app into:
Configuration Logic:
Main Logic:
RQ1: What is the debloating performance of LMCAS with respect to quality of the specialized apps and analysis time?
RQ2: How scalable and robust is LMCAS in debloating large applications?
RQ3: Can LMCAS reduce the attack surface?
Dataset:
We used 3 datasets in our evaluation, as shown in the table on the right.
Implementation:
LLVM 6.0 to implement the optimization passes
KLEE 2.1 for the symbolic execution
tcpdump (version 4.10.0; 77.5k LOC) analyzes network packets. We link it against its accompanying libpcap library (version 1.10.0; 44.6k LOC).
GNU readelf from GNU Binutils (version 2.33; 78.3k LOC and 964.4k of library code) displays information about ELF object files.
GNU objdump from GNU Binutils (version 2.33; 78.3k LOC and 964.4k of library code) displays information about object files.
Implementation (will be released soon)