Valgrind Utility

If you are unaware of the memory needs of a job, you might want to reserve an entire node to do what is called memory benchmarking (getting a sense of how an application behaves under different constraints). For many kinds of simulation, there is a direct correlation between the sample size and memory usage (for example, memory allocation could be linear in the size of a 1D grid). If time is an issue, the section "Forecasting Memory Resource Requirement" below may be an alternative. Below are the steps to profile the memory usage of a particular process (taking memory snapshots at certain intervals in time); a complete worked example follows the list:

  • Reserve an exclusive node:
    qsub -I -l nodes=1:ppn=<x>:<keyword>,mem=<m>gb
    Here, the values of ppn and mem depend on the type of node selected by keyword. Visit Server & Storage for details.
  • Identify the executable you want to profile (say exec) and run it under Valgrind's massif tool:
    valgrind --tool=massif --pages-as-heap=yes --time-unit=B exec <exec input arguments if any>
    This writes a file named massif.out.<pid>, where <pid> is the process ID of the run.
    Note: use --time-unit=B for jobs that have a short lifetime. Time is then measured in bytes allocated, which gives finer-grained snapshots over short runs.
  • Once the aforementioned process finishes, check total memory usage using:
    ms_print massif.out.<pid> | more
    which produces output like the following:
    [Image: sample ms_print graph ("Valgrind Snapshot")]

    Peak memory usage can be read off the vertical scale of the graph; allocate 5-10% more than this estimate when submitting the job (#PBS -l mem=<m>gb).
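
As a concrete illustration, a full profiling session might look like the following (the node keyword typeA, the core count, the program name sim, and the PID shown are placeholders, not site-specific values):

    qsub -I -l nodes=1:ppn=8:typeA,mem=16gb
    valgrind --tool=massif --pages-as-heap=yes --time-unit=B ./sim input.dat
    ms_print massif.out.12345 | more

If the peak on the vertical scale reads, say, 3.7 GB, then with 5-10% head-room the job script would request #PBS -l mem=4gb.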

Forecasting Memory Resource Requirement (linear interpolation)

Memory-estimation runs can be time-consuming for cases with enormous sample sizes. In such cases a little "fortune telling" helps wherever linear interpolation thrives; even in chaotic, non-linear cases, choosing an appropriate basis for the fit can help. The memory consumption of a variable 1D grid at various sample sizes follows:

 Size   Memory (MB)
   5    255.9
  10    258.7
  15    263.1
  20    266.8
  25    269.7

[Image: estimating memory usage on the cluster]
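
The five points above are nearly collinear: an ordinary least-squares fit (computed here from the table for illustration; it is not part of the tool's output) gives approximately

    memory(size) ≈ 0.714 × size + 252.1 MB

so the memory at an untested size can be read straight off this line.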

  • Reserve a node exclusively for memory benchmarking:
    qsub -I -l nodes=1:ppn=1,mem=<specify total job memory here>gb

  • Run valgrind for different sample sizes, and tabulate the peak memory values:
    valgrind --tool=massif --pages-as-heap=yes --time-unit=B exec <exec input arguments if any>
  • Change the input sizes, benchmark for the various sample sizes, and store the values in a comma-separated file, values.csv (no blank entries!):
    5,255.9
    10,258.7
    15,263.1
    20,266.8
    25,269.7
    30,est
    35,est
    40,est
    As you can see, the entries 30, 35, and 40 carry ",est", which marks them for memory estimation (unknown values for which the memory is to be computed).

  • The estimate tool generates a line graph (written to VALGRIND.png) and prints the memory estimates to the screen (a sketch of the underlying fit appears after this list):
    estimate values.csv
  • To display the graph (works on login node only), use:
    display VALGRIND.png
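
The estimate tool is site-specific, but for linear data its core is just a least-squares fit. The following awk sketch reproduces that fit from values.csv (an illustration of the idea only, not the actual tool): the first pass accumulates sums over the measured rows, and the second pass predicts the rows marked "est":

    awk -F, 'NR==FNR { if ($2 != "est") { sx+=$1; sy+=$2; sxy+=$1*$2; sxx+=$1*$1; n++ }; next }
             FNR==1  { a = (n*sxy - sx*sy) / (n*sxx - sx*sx); b = (sy - a*sx) / n }
             $2 == "est" { printf "%s,%.1f\n", $1, a*$1 + b }' values.csv values.csv

On the sample data above, this prints predictions within a fraction of an MB of the tool's output below.
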
Here are some values from such a run; for the last three, the tool's predictions are compared with the actual memory usage:

 Size   Actual (MB)   Predicted (MB)
   5    255.9
  10    258.7
  15    263.1
  20    266.8
  25    269.7
  30    273.5         273.3
  35    276.5         276.6
  40    281.4         280.0
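
The predictions agree with the measured usage to within about 0.5% here; the 5-10% head-room rule from the previous section should still be applied on top of the predicted value when requesting memory.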
