LEOPARD -- An attack surface identification tool

Introduction

LEOPARD is a generic, lightweight and extensible framework to identify attack surfaces at the function level through program metrics.

It does not require any user inputs or any prior knowledge about known vulnerabilities, and does not need to compile the target application and thus supports partial code.

In particular, LEOPARD works in two steps by combining two sets of systematically derived metrics : complexity metrics and vulnerability metrics.

First, it uses complexity metrics to group the functions into a set of bins.
Second, it leverages vulnerability metrics to rank the functions in each bin, and identifies the top ones as potentially vulnerable.

Experimental results on 11 real-life projects have demonstrated that LEOPARD can cover 74.0% of vulnerabilities by identifying 20% of functions as vulnerable. Based on the identified vulnerable functions in current stable release of MJS, XED, radare2, FFmepg and PHP, we discovered 22 new bugs, among which 8 are zero-day vulnerabilities.

Ground Truth Sample -- Libav 11.8

The list of CVEs reported by July 2018 to version 11.8 of Libav is available here.

For all 14 CVEs, we succeeded to collect patches for 6 of them : CVE-2017-16803, CVE-2017-9051, CVE-2016-9821, CVE-2016-9820, CVE-2016-9819, CVE-2016-8675.

You can download the patches from here.

CVEs Discovered with the Help of LEOPARD

PHP 5.6.30 (6 CVEs) : CVE-2017-9224, CVE-2017-9225, CVE-2017-9226, CVE-2017-9227, CVE-2017-9228, CVE-2017-9229

FFmpeg 3.1.3 (1 CVE) : CVE-2018-4394

radare2 2.7.0 (1 CVE) : CVE-2018-14017

Cumulative Plot for the Distribution of Vulnerable and Overall Functions

In the following figures, x-axis is the complexity score, y axis is the cumulative portion.

From the plots we can see, the cumulative line of overall functions increases sharply in the beginning where bins are with smaller complexity functions. But the increase of vulnerable functions happens with bins with relatively larger complexity scores. Sparsity of bins with larger complexity-values benefits the selection of most vulnerable functions. However, there is a large set of functions falling into the bin with low complexity scores. To distinguish these functions, our vulnerability metrics help to identify the potential vulnerable functions in the bins with smaller complexity-values by selecting the functions with high vulnerability scores. Hence the complexity metrics and vulnerability metrics play complementary roles to find the potential vulnerable functions.

For vulnerable functions, they are spread around different bins with a relatively high vulnerability values. This shows that binning and ranking strategy can indeed help to identify the vulnerable functions with different complexity.

BIND

Binutils

FFmpeg

FreeType

Libav

LibTIFF

libxslt

Linux

OpenSSL

SQLite

Wireshark

p-values for each vulnerability metrics with statistics from 11 benchmark

For each vulnerability metrics, we calculate its average value separately of overall functions and vulnerable functions, and show the ratio in the following table. The average value for vulnerable functions are always larger, which indicates they can lift the ranking of vulnerable functions. We also present the average value of all 11 projects, which ranges from 1.5 to 7.3.

For each vulnerability metrics, as there are 11 projects, we get two lists of 11 values with average values. We use p-value to show one sequence is statistically larger than the other one. p-value is a standard measure used to quantify the idea of statistical significance of evidence, specifically with p<0.05. Obviously, the p-value for each vulnerability metrics is less than 0.05.

Initial Candidate Vulnerability Metrics Examples

We build our vulnerability metrics by extracting the common semantic abstraction from concrete vulnerability cases. We started from a comprehensive list with the intention of the completeness, and examples in initial candidate list are given below. In the end, we chose a subset of them based on the accuracy of the experimental results and reported in our paper.

Also, we find vulnerability metrics for too specific vulnerabilities types work no so well in our task. Once obtaining vulnerability type information about the ground truth, we can also test how B&R works on such task to identify specific vulnerabilities.

Metrics inspired by off-by-one vulnerabilities:

#of constants involved in loop predicates

# of constants involved in control predicates

Metrics inspired by array related buffer overread/overwrite:

# of array variables
# of array variables involved in loop predicates
of array variables involved in control predicates
# of array Variable

others sensitive operations:

# of shift operation
# of bit Operation
# variables involved in shift operations
# variables involved in bit operations

Complementary Materials for RQ1: Rationality of Binning before Ranking

BIND

Binutils

FFmpeg

FreeType

Libav

LibTIFF

libxslt

Linux

OpenSSL

SQLite

Wireshark

Complementary Materials for RQ2: Effectiveness of Binning-and-Ranking

Average recall when identifying different percentage of functions as vulnerable

Best performance are highlighted with bold font and records missing in the manuscript are highlighted with red color.

LEOPARD vs Random Forest

BIND

Binutils

FFmpeg

FreeType

Libav

LibTIFF

libxslt

Linux

OpenSSL

SQLite

Wireshark

LEOPARD vs Gradient Boosting

BIND

Binutils

FFmpeg

FreeType

Libav

LibTIFF

libxslt

Linux

OpenSSL

SQLite

Wireshark

LEOPARD vs Naive Bayes

BIND

Binutils

FFmpeg

FreeType

Libav

LibTIFF

libxslt

Linux

OpenSSL

SQLite

Wireshark

LEOPARD vs SVC

BIND

Binutils

FFmpeg

FreeType

Libav

LibTIFF

libxslt

Linux

OpenSSL

SQLite

Wireshark

Complementary Materials for RQ4: Scalability of LEOPARD

Complementary Materials for RQ5: Application of LEOPARD

Some queries used to calculate complexity and vulnerability metrics

Cyclomatic metrics

To get CFG edge information:

queryNodeIndex("functionId:(id1 id2 idN) AND isCFGNode:True").as("x").outE("FLOWS_TO").id.as("y").select{it.functionId}{it}.groupBy{it[0]}{it[1]}{it.size()}.cap

To get CFG Node information:

queryNodeIndex("functionId:(id1 id2 idN) AND isCFGNode:True").functionId.groupCount().cap

Number of loops

To get loop statements:

queryNodeIndex("functionId:(id1 id2 idN) AND type:(ForStatement WhileStatement DoStatement)").groupBy{it.functionId}{it.id}.cap

Number of nested loops/Max nesting loops

To get loop statements:

queryNodeIndex("functionId:(id1 id2 idN) AND type:(ForStatement WhileStatement DoStatement)").groupBy{it.functionId}{it.id}.cap

To get loop statements' AST children nodes:

idListToNodes([id1, id2, idN]).as("x").astNodes().as("y").select{it.id}{it.id}.groupBy{it[0]}{it[1]}.cap

Parameter variables

queryNodeIndex("functionId:(id1 id2 idN) AND type:Parameter").as("x").children().as("y").select{it.id}{it}

Number of pointer arithmetic & variables involved in pointer arithmetic

To get number of struct pointer member access operations(like a->b):

queryNodeIndex("functionId:(id1 id2 idN) AND type:PtrMemberAccess").functionId.groupCount().cap

To get variables in struct pointer member access operations:

queryNodeIndex("functionId:(id1 id2 idN) AND type:PtrMemberAccess").out("USE").groupBy{it.functionId}{it.code}.cap

To get address operations(like &a) and their variables involved:

queryNodeIndex("functionId:(id1 id2 idN) AND type:UnaryOp").as("op").children().filter{it.type=="UnaryOperator" && it.code=="&"}.back("op").astNodes().filter{it.type=="Identifier"}.code.as("sys").select

To get number of pointer dereference operations(like *a):

queryNodeIndex("functionId:(id1 id2 idN) AND type:UnaryOperator").filter{it.code=="*"}.functionId.groupCount().cap

To get variables in pointer dereference operations:

queryNodeIndex("functionId:(id1 id2 idN) AND type:UnaryOp").as("here").children().filter{it.type=="UnaryOperator" && it.code=="*"}.back("here").out("USE").groupBy{it.functionId}{it.code}.cap

To get relation expressions and their variables involved(to filter ones include pointer variables):

queryNodeIndex("functionId:(id1 id2 idN) AND type:(RelationalExpression EqualityExpression)").as("x").children().filter{it.type=="Identifier"}.code.as("y").select

To get additive expressions and their variables involved(to filter ones include pointer variables):

queryNodeIndex("functionId:(id1 id2 idN) AND type:AdditiveExpression").as("x").children().filter{it.type=="Identifier"}.code.as("y").select

To get increment/Decrement operations(to filter ones include pointer variables):

queryNodeIndex("functionId:(id1 id2 idN) AND type:(UnaryExpression IncDecOp)").groupBy{it.functionId}{it.code}.cap

Number of nested control structures/Max control structures

To get control structures statements:

OR(queryNodeIndex("functionId:(id1 id2 idN) AND type:Condition").in("IS_AST_PARENT"),queryNodeIndex("functionId:%s AND type:ForStatement")).dedup().groupBy{it.functionId}{it.id}.cap

To get control structures statements' AST children nodes:

idListToNodes([id1, id2, idN]).as("x").astNodes().as("y").select{it.id}{it.id}.groupBy{it[0]}{it[1]}.cap

Max control-dependent control structures

To get control structures nodes' ids:

getNodesWithType("Condition").id

To get CDG information to know whether the control structures are control dependent:

queryNodeIndex("functionId:(id1 id2 idN) AND isCFGNode:True").as("x").outE("CONTROLS").as("y").select{it}{it}

Max data-dependent control structures:

To get control structures nodes' ids:

getNodesWithType("Condition").id

To get DDG information to know whether the control structures are Involved with the same var:

queryNodeIndex("functionId:(id1 id2 idN) AND isCFGNode:True").as("x").outE("REACHES").as("y").select{it}{it}

To get var declare statements information to identify those pointer vars that share the same memory:

queryNodeIndex("functionId:(id1 id2 idN) AND type:IdentifierDeclStatement").as("x").uses().code.as("y").select{it}{it} & queryNodeIndex("functionId:(id1 id2 idN) AND type:IdentifierDeclStatement").as("x").defines().code.as("y").select

If statements without else

queryNodeIndex("functionId:(id1 id2 idN) AND type:Condition").parents().filter{it.type=="IfStatement"}.as("x").ithChildren('0').as("y").select{it}{it}

Variables involved in control predicates

To get control structures nodes' ids:

getNodesWithType("Condition").id

To get DDG information to know which var are used by the control structures:

queryNodeIndex("functionId:(id1 id2 idN) AND isCFGNode:True").as("x").outE("REACHES").as("y").select{it}{it}