M3 - Automated Taint Tracking for Accurate Detection of Security Weaknesses

Description

Use existing code to track a susceptible data value across a program.

Courses Where This Module Is Integrated

Activities 

Pre-lab Content Dissemination 

The previous module explained why bugs and security vulnerabilities must be discovered proactively. A wide range of tools exists for this purpose, but to find a bug or a vulnerability accurately we need to track how data flows through a program. One approach is taint tracking, where we designate a data value as the taint and then follow that taint across the program. In this workshop, we will develop a program that uses existing code provided by the PI (Akond Rahman) to track a taint, i.e., a data value.

In-class Hands-on Experience 


Code Understanding: The calc.py program has a main entry point that calls a function named "simpleCalculator." The function receives three parameters, v1, v2, and operation, and applies the operator given in operation to v1 and v2.

# calc.py

def simpleCalculator(v1, v2, operation):
    # Apply the requested operator to v1 and v2.
    # An unrecognized operator leaves the result at 0.
    res = 0
    if operation == '+':
        res = v1 + v2
    elif operation == '-':
        res = v1 - v2
    elif operation == '*':
        res = v1 * v2
    elif operation == '/':
        res = v1 / v2
    elif operation == '%':
        res = v1 % v2
    return res

Next, we call "simpleCalculator" from the program's entry point, store the returned result, and print the inputs together with the result.

# calc.py

if __name__ == '__main__':
    val1, val2, op = 1000, 1, '+'
    data = simpleCalculator(val1, val2, op)
    print('Value#1:{} \nValue#2:{} \nOperation:{} \nResult:{}'.format(val1, val2, op, data))

When a Python file contains the line "if __name__=='__main__':", the code under that line runs only when the file is executed directly, so it serves as the entry point of the program. Next, we look at the analysis.py program. In this program, we have to complete the "trackTaint" function, which tracks the given value. The entry point of analysis.py defines a variable called data2track and another variable called input_program, and then calls a function called "checkFlow".

# analysis.py

if __name__ == '__main__':
    input_program = 'calc.py'
    data2track = 1000
    checkFlow(data2track, input_program)


The checkFlow function collects the variable assignments, function assignments, function calls, and the full abstract syntax tree of the input program by calling other functions implemented in the same code file. The program relies heavily on Python's built-in "ast" module, which helps Python applications process trees of the Python abstract syntax grammar.
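To make the role of the "ast" module more concrete, the following is a minimal, self-contained sketch of how a syntax tree can be walked to collect assignments of constant values. It is not the workshop's getVariables function; the names list_constant_assignments and SAMPLE_SOURCE are hypothetical and exist only for this illustration. It assumes Python 3.8 or later, where literals are represented as ast.Constant nodes.

```python
import ast

# Hypothetical sample input, modeled on the assignments in the calculator program.
SAMPLE_SOURCE = """
val1, val2, op = 1000, 1, '+'
total = 5
"""


def list_constant_assignments(source):
    """Return (name, value) pairs for assignments of literal constants."""
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            # Plain assignment such as: total = 5
            if isinstance(target, ast.Name) and isinstance(node.value, ast.Constant):
                pairs.append((target.id, node.value.value))
            # Tuple unpacking such as: a, b = 1, 2
            elif isinstance(target, ast.Tuple) and isinstance(node.value, ast.Tuple):
                for name_node, value_node in zip(target.elts, node.value.elts):
                    if isinstance(name_node, ast.Name) and isinstance(value_node, ast.Constant):
                        pairs.append((name_node.id, value_node.value))
    return pairs


if __name__ == '__main__':
    print(list_constant_assignments(SAMPLE_SOURCE))
```

Pairs of this kind (a variable name and the value assigned to it) are the sort of information a taint tracker needs in order to find where the tracked value first enters the program.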

# analysis.py

import ast
import os

import pandas as pd


def checkFlow(data, code):
    full_tree = None
    if os.path.exists(code):
        # Parse the input program into an abstract syntax tree
        with open(code) as source_file:
            full_tree = ast.parse(source_file.read())
        # First let us obtain the variables in forms of expressions
        fullVarList = getVariables(full_tree, 'VAR_ASSIGNMENT')
        # Next let us get function invocations by looking into function calls
        call_list = getFunctionAssignments(full_tree)
        # Now let us look into the body of the function and see if the parameter is used
        funcDefList, funcvarList = getFunctionDefinitions(code)
        # For the workshop please use fullVarList, call_list, funcDefList, funcvarList
        # Then print a path like the following:
        # 1000->val1->v1->res
        var_df = pd.DataFrame(fullVarList, columns=['LHS', 'RHS', 'TYPE'])
        call_df = pd.DataFrame(call_list, columns=['LHS', 'FUNC_NAME', 'ARG_NAME', 'TYPE'])
        func_def_df = pd.DataFrame(funcDefList, columns=['FUNC_NAME', 'ARG_NAME', 'TYPE'])
        func_var_df = pd.DataFrame(funcvarList, columns=['LHS', 'RHS', 'TYPE'])

        info_df_list = [var_df, call_df, func_def_df, func_var_df]
        trackTaint(data, info_df_list)
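The path "1000->val1->v1->res" can be derived by joining the four tables step by step. The sketch below illustrates one possible way to do this. It is not the workshop's trackTaint implementation, and it rests on several assumptions: plain tuples stand in for the DataFrame rows, each call argument and each function parameter appears as its own row in positional order, and the TYPE labels other than 'VAR_ASSIGNMENT' are invented. The function name sketch_track_taint and all sample rows are hypothetical.

```python
# Hypothetical rows mirroring the DataFrame columns built in checkFlow.
# The real helper functions may produce different row shapes and TYPE labels.
VAR_ROWS = [  # columns: LHS, RHS, TYPE
    ("val1", 1000, "VAR_ASSIGNMENT"),
    ("val2", 1, "VAR_ASSIGNMENT"),
    ("op", "+", "VAR_ASSIGNMENT"),
]
CALL_ROWS = [  # columns: LHS, FUNC_NAME, ARG_NAME, TYPE
    ("data", "simpleCalculator", "val1", "FUNC_CALL"),
    ("data", "simpleCalculator", "val2", "FUNC_CALL"),
    ("data", "simpleCalculator", "op", "FUNC_CALL"),
]
FUNC_DEF_ROWS = [  # columns: FUNC_NAME, ARG_NAME, TYPE
    ("simpleCalculator", "v1", "FUNC_DEF"),
    ("simpleCalculator", "v2", "FUNC_DEF"),
    ("simpleCalculator", "operation", "FUNC_DEF"),
]
FUNC_VAR_ROWS = [  # columns: LHS, RHS, TYPE
    ("res", "v1", "FUNC_VAR"),
    ("res", "v2", "FUNC_VAR"),
]


def sketch_track_taint(taint, var_rows, call_rows, func_def_rows, func_var_rows):
    """Return a '->'-joined path showing where the tainted value flows."""
    path = [str(taint)]

    # Step 1: find a variable that is assigned the tainted value directly.
    holders = [lhs for lhs, rhs, _ in var_rows if rhs == taint]
    if not holders:
        return path[0]
    var_name = holders[0]
    path.append(var_name)

    # Step 2: for each called function, map the argument position of the
    # tainted variable to the parameter at the same position.
    func_names = list(dict.fromkeys(func for _, func, _, _ in call_rows))
    for func_name in func_names:
        args = [arg for _, func, arg, _ in call_rows if func == func_name]
        params = [param for func, param, _ in func_def_rows if func == func_name]
        if var_name not in args or args.index(var_name) >= len(params):
            continue
        param = params[args.index(var_name)]
        path.append(param)

        # Step 3: add variables in the function body that use the parameter.
        for lhs, rhs, _ in func_var_rows:
            if rhs == param and lhs not in path:
                path.append(lhs)

    return "->".join(path)


if __name__ == '__main__':
    print(sketch_track_taint(1000, VAR_ROWS, CALL_ROWS, FUNC_DEF_ROWS, FUNC_VAR_ROWS))
```

With the sample rows above, tracking 1000 yields the path 1000->val1->v1->res. A DataFrame-based version would follow the same three steps, filtering var_df, call_df, func_def_df, and func_var_df instead of iterating over tuples.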


In analysis.py, we implemented additional helper functions, such as getVariables, getFunctionAssignments, and getFunctionDefinitions, to make the program more modular and easier to read.

For the full version of the code, go to https://github.com/paser-group/ALAMOSE-PASER/blob/ALAMOSE/workshops/workshop3-taint/analysis.py