M3 - Automated Taint Tracking for Accurate Detection of Security Weaknesses
Description
Use existing code to track a susceptible data value across a program.
Courses Where This Module Is Integrated
Software Quality Assurance (Auburn University, Spring 2023, Fall 2023)
Mobile Security (Tuskegee University, Fall 2023)
Activities
Pre-lab Content Dissemination
The previous module explained why bugs and security vulnerabilities must be discovered proactively. A wide range of tools exists for this purpose, but finding a bug or a vulnerability accurately requires tracking how data flows through a program. One such approach is taint tracking, where we designate a data value as the taint and then follow that taint across the program. In this workshop, we will develop a program that uses existing code provided by the PI (Akond Rahman) to track a taint, i.e., a data value.
In-class Hands-on Experience
Check out the code in `calc.py` and `analysis.py`. The code is located here: https://github.com/paser-group/ALAMOSE-PASER/tree/ALAMOSE/workshops/workshop3-taint
Read `calc.py` and trace by hand how simpleCalculator() works
Write the flow of execution for `calc.py`
Read `analysis.py` to understand how the parse tree and the relevant components of `calc.py` are extracted
Code Understanding: `calc.py` has a main block that calls a function named `simpleCalculator`. The function receives three parameters, v1, v2, and operation, and computes a result from v1 and v2 according to the operator passed in operation.
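# calc.py
def simpleCalculator(v1, v2, operation):
    # Apply the arithmetic operator named by `operation` to v1 and v2.
    res = 0  # returned unchanged if the operator is not recognized
    if operation == '+':
        res = v1 + v2
    elif operation == '-':
        res = v1 - v2
    elif operation == '*':
        res = v1 * v2
    elif operation == '/':
        res = v1 / v2
    elif operation == '%':
        res = v1 % v2
    return res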
Next, the main block, which is the entry point of the program, calls `simpleCalculator`, stores the returned result, and prints the values.
# calc.py
if __name__ == '__main__':
    val1, val2, op = 1000, 1, '+'
    data = simpleCalculator(val1, val2, op)
    print('Value#1:{} \nValue#2:{} \nOperation:{} \nResult:{}'.format(val1, val2, op, data))
A block guarded by `if __name__=='__main__':` runs only when the file is executed directly, so it serves as the entry point of that Python program. Next, we look at `analysis.py`. In this program, we have to complete the `trackTaint` function, which tracks the given value. The main block defines two variables, data2track and input_program, and then calls a function named `checkFlow`.
# analysis.py
if __name__ == '__main__':
    input_program = 'calc.py'
    data2track = 1000
    checkFlow(data2track, input_program)
The checkFlow function collects all the variables, function assignments, function calls, and the full execution tree by calling other functions implemented in this file. The program relies heavily on Python's `ast` module, which helps Python applications process trees of the Python abstract syntax grammar.
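As a minimal sketch of how `ast` exposes assignments and calls, the example below parses a two-line snippet modeled on the main block of `calc.py`. This is not the workshop's helper code: the `SOURCE` string and the function name `list_assignments_and_calls` are assumptions made for illustration, and only simple name and constant patterns are handled.

```python
import ast

# Hypothetical stand-in for the main block of calc.py.
SOURCE = """
val1, val2, op = 1000, 1, '+'
data = simpleCalculator(val1, val2, op)
"""

def list_assignments_and_calls(source):
    """Return (assignments, calls) found in the given source text."""
    tree = ast.parse(source)
    assignments, calls = [], []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Assign):
            continue
        target, value = node.targets[0], node.value
        if isinstance(target, ast.Tuple) and isinstance(value, ast.Tuple):
            # Tuple unpacking: pair each name with its constant by position.
            for name_node, value_node in zip(target.elts, value.elts):
                if isinstance(name_node, ast.Name) and isinstance(value_node, ast.Constant):
                    assignments.append((name_node.id, value_node.value))
        elif isinstance(target, ast.Name) and isinstance(value, ast.Call):
            # Record (assigned name, called function, argument names).
            func = value.func.id if isinstance(value.func, ast.Name) else None
            args = [arg.id for arg in value.args if isinstance(arg, ast.Name)]
            calls.append((target.id, func, args))
    return assignments, calls

assignments, calls = list_assignments_and_calls(SOURCE)
print(assignments)
print(calls)
```

The first list pairs each variable with the constant assigned to it, and the second records which variables are passed to which function. These are the two kinds of facts a taint tracker needs in order to follow a value from an assignment into a call.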
# analysis.py
import ast
import os

import pandas as pd

def checkFlow(data, code):
    full_tree = None
    if os.path.exists(code):
        # Parse the input program into an abstract syntax tree
        with open(code) as source_file:
            full_tree = ast.parse(source_file.read())
        # First let us obtain the variables in forms of expressions
        fullVarList = getVariables(full_tree, 'VAR_ASSIGNMENT')
        # Next let us get function invocations by looking into function calls
        call_list = getFunctionAssignments(full_tree)
        # Now let us look into the body of the function and see if the parameter is used
        funcDefList, funcvarList = getFunctionDefinitions(code)
        # For the workshop please use fullVarList, call_list, funcDefList, funcvarList
        # Then print a path like the following:
        # 1000->val1->v1->res
        var_df = pd.DataFrame(fullVarList, columns=['LHS', 'RHS', 'TYPE'])
        call_df = pd.DataFrame(call_list, columns=['LHS', 'FUNC_NAME', 'ARG_NAME', 'TYPE'])
        func_def_df = pd.DataFrame(funcDefList, columns=['FUNC_NAME', 'ARG_NAME', 'TYPE'])
        func_var_df = pd.DataFrame(funcvarList, columns=['LHS', 'RHS', 'TYPE'])
        info_df_list = [var_df, call_df, func_def_df, func_var_df]
        trackTaint(data, info_df_list)
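The comments in checkFlow ask for a printed path such as 1000->val1->v1->res. The sketch below shows one possible way to build that path. It is not the reference solution. It works on plain row lists, and it assumes each row follows the column order used for the DataFrames above, that call rows and definition rows appear in positional argument order, and that each function is called only once. The function name `track_taint_sketch`, the sample rows, and every TYPE label other than 'VAR_ASSIGNMENT' are hypothetical.

```python
def track_taint_sketch(data, var_rows, call_rows, func_def_rows, func_var_rows):
    """Return a path such as '1000->val1->v1->res' for the tainted value."""
    path = [str(data)]

    # Step 1: find a top-level variable that is assigned the tainted value.
    sources = [lhs for lhs, rhs, _ in var_rows if rhs == data]
    if not sources:
        return '->'.join(path)
    var_name = sources[0]
    path.append(var_name)

    # Step 2: group call arguments by function, keeping positional order.
    args_by_func = {}
    for _, func_name, arg_name, _ in call_rows:
        args_by_func.setdefault(func_name, []).append(arg_name)

    for func_name, args in args_by_func.items():
        if var_name not in args:
            continue
        # Step 3: map the argument position onto the function's parameters.
        params = [arg for name, arg, _ in func_def_rows if name == func_name]
        position = args.index(var_name)
        if position >= len(params):
            continue
        param = params[position]
        path.append(param)
        # Step 4: find a variable in the function body computed from the parameter.
        derived = [lhs for lhs, rhs, _ in func_var_rows if rhs == param]
        if derived:
            path.append(derived[0])
        break
    return '->'.join(path)

# Hypothetical rows shaped like the DataFrame columns; TYPE labels are placeholders.
var_rows = [('val1', 1000, 'VAR_ASSIGNMENT'), ('val2', 1, 'VAR_ASSIGNMENT'),
            ('op', '+', 'VAR_ASSIGNMENT')]
call_rows = [('data', 'simpleCalculator', 'val1', 'FUNC_CALL'),
             ('data', 'simpleCalculator', 'val2', 'FUNC_CALL'),
             ('data', 'simpleCalculator', 'op', 'FUNC_CALL')]
func_def_rows = [('simpleCalculator', 'v1', 'FUNC_PARAM'),
                 ('simpleCalculator', 'v2', 'FUNC_PARAM'),
                 ('simpleCalculator', 'operation', 'FUNC_PARAM')]
func_var_rows = [('res', 0, 'FUNC_VAR'), ('res', 'v1', 'FUNC_VAR'),
                 ('res', 'v2', 'FUNC_VAR')]

result = track_taint_sketch(1000, var_rows, call_rows, func_def_rows, func_var_rows)
print(result)
```

When starting from the DataFrames in `info_df_list`, `df.values.tolist()` yields rows in the same column order, so the same logic can be applied to them. Check the rows that the real helpers return before relying on the ordering assumptions above.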
To keep `analysis.py` modular and readable, the work is split across several helper functions:
getVariables gets all the variables and their types
getFunctionAssignments gets all the function calls and assignments
getFunctionDefinitions gets all the function definitions and their variables
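To illustrate the kind of information a helper such as getFunctionDefinitions collects, here is a minimal sketch that lists each function's parameters and the names read on the right-hand side of assignments in its body. It is an illustration only and may differ from the workshop's implementation: the name `list_function_definitions`, the shortened `SOURCE` string, and the tuple layouts are assumptions.

```python
import ast

# Hypothetical, shortened version of simpleCalculator from calc.py.
SOURCE = """
def simpleCalculator(v1, v2, operation):
    res = 0
    if operation == '+':
        res = v1 + v2
    return res
"""

def list_function_definitions(source):
    """Return (definitions, body_vars) for every function in the source text."""
    tree = ast.parse(source)
    definitions, body_vars = [], []
    for node in ast.walk(tree):
        if not isinstance(node, ast.FunctionDef):
            continue
        # One (function name, parameter name) pair per positional parameter.
        for arg in node.args.args:
            definitions.append((node.name, arg.arg))
        # One (assigned name, used name) pair per name read in an assignment.
        for inner in ast.walk(node):
            if isinstance(inner, ast.Assign) and isinstance(inner.targets[0], ast.Name):
                lhs = inner.targets[0].id
                for used in ast.walk(inner.value):
                    if isinstance(used, ast.Name):
                        body_vars.append((lhs, used.id))
    return definitions, body_vars

definitions, body_vars = list_function_definitions(SOURCE)
print(definitions)
print(body_vars)
```

The second list shows that res is computed from v1 and v2, which is the link a taint tracker follows from a parameter to a variable inside the function.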
For the full version of the code, go to https://github.com/paser-group/ALAMOSE-PASER/blob/ALAMOSE/workshops/workshop3-taint/analysis.py