Valgrind Utility
Memory Benchmarking
If you are unaware of the memory needs of a job, you may want to reserve an entire node to do what is called memory benchmarking (getting a sense of how an application behaves under different constraints). For many kinds of simulations there is a direct correlation between sample size and memory use (e.g., memory allocation may grow linearly with the size of a 1D grid; a toy workload illustrating this appears after the steps below). If time is an issue, the section "Forecasting Memory Resource Requirement" below may be an alternative. Below are the steps for profiling the memory usage of a particular process (taking memory snapshots at certain intervals in time):
Reserve a node for exclusive use:
srun -N 1 -n <x> --mem=<m>gb --pty bash
Here, the values of "n" and "mem" depend on the type of node, which is defined by its keyword. Visit Server & Storage for details.
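For example, to request a hypothetical 16-core node with 64 GB of memory (the numbers here are placeholders; substitute the values for your node type):
srun -N 1 -n 16 --mem=64gb --pty bash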
Identify the executable that you will run memory profiling on (say, exec):
valgrind --tool=massif --pages-as-heap=yes --time-unit=B exec <exec input arguments if any>
This returns massif.out.<pid>, where <pid> is the process ID of the profiled run.
**Use --time-unit=B for jobs that have a short lifetime. The difference: time is measured in bytes allocated/deallocated rather than instructions executed, which gives higher granularity in measurement at finer time intervals.
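For example, for a hypothetical solver binary mysim that reads an input file (both names are placeholders):
valgrind --tool=massif --pages-as-heap=yes --time-unit=B ./mysim input.dat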
Once the aforementioned process finishes, check total memory usage using:
ms_print massif.out.<pid> | more
This prints an ASCII graph of memory usage over time. One can deduce peak memory usage from the vertical scale, and allocate 5-10% more than the estimate when submitting the job (#PBS -l mem=<m>gb).
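To try these steps end to end, here is a minimal toy workload whose memory grows linearly with the size of a 1D grid (grid.py is a made-up name; if you profile an interpreted script like this under massif, the interpreter adds a roughly constant offset, so it is the growth with n across runs, not the absolute value, that matters):

# grid.py -- hypothetical toy workload: memory grows linearly with grid size n
import sys

n = int(sys.argv[1])          # number of grid points (the sample size)
grid = [0.0] * n              # 1D grid: allocation is proportional to n
for i in range(n):            # touch every element so the pages are committed
    grid[i] = i * 0.5
print("allocated a 1D grid of", n, "points")

A profiling run might then look like:
valgrind --tool=massif --pages-as-heap=yes --time-unit=B python3 grid.py 1000000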
Forecasting Memory Resource Requirement (linear interpolation)
Memory estimation runs can be time-consuming for cases with enormous sample sizes. A little "fortune telling" can help in cases where linear interpolation thrives; even in chaotic and non-linear cases, the choice of an appropriate basis for the fit can be of some help. An example with a 1D grid of varying sample size and its memory consumption follows:
Estimating memory usage on the cluster
Reserve a node exclusively for memory benchmarking:
srun -N 1 -n 1 --mem=<specify total job memory here>gb --pty bash
Run valgrind for different sample sizes, and tabulate the memory values:
valgrind --tool=massif --pages-as-heap=yes --time-unit=B exec <exec input arguments if any>
Change the input sizes, benchmark each one, and store the values in a file, values.csv, in comma-separated format (no blank entries!). The first column is the sample size and the second is the measured memory:
5,255.9
10,258.7
15,263.1
20,266.8
25,269.7
30,est
35,est
40,est
As you can see, the entries 30, 35, and 40 carry ",est", which marks them for memory estimation (unknown values for which the memory is to be computed).
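As a back-of-the-envelope check of the linear-interpolation idea (the tool itself presumably fits over all measured points): the measured values rise by roughly (269.7 - 255.9) / (25 - 5) = 0.69 per unit of sample size, so the entry for 30 should come out near 269.7 + 5 × 0.69 ≈ 273.2.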
The estimate tool generates a line graph (written to VALGRIND.png) and prints the memory estimates to the screen:
estimate values.csv
To display the graph (works on login node only), use:
display VALGRIND.png
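Since estimate is a site-specific tool, the following is only a minimal sketch of what such a linear-interpolation script might do, assuming Python with numpy and matplotlib is available; it is an illustration, not the actual implementation:

# estimate_sketch.py -- illustrative only; NOT the cluster's actual "estimate" tool
import csv
import numpy as np
import matplotlib
matplotlib.use("Agg")                      # render without a display
import matplotlib.pyplot as plt

known_x, known_y, unknown_x = [], [], []
with open("values.csv") as f:
    for size, mem in csv.reader(f):        # two columns, no blank entries
        if mem.strip() == "est":
            unknown_x.append(float(size))  # sample sizes still to be estimated
        else:
            known_x.append(float(size))
            known_y.append(float(mem))

# least-squares fit of a line m(N) = a*N + b through the measured points
a, b = np.polyfit(known_x, known_y, 1)
for x in unknown_x:
    print(f"sample size {x:g}: estimated memory {a * x + b:.1f}")

plt.plot(known_x, known_y, "o", label="measured")
plt.plot(unknown_x, [a * x + b for x in unknown_x], "x", label="estimated")
plt.xlabel("sample size")
plt.ylabel("memory")
plt.legend()
plt.savefig("VALGRIND.png")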
Here are some values, the last three of which are compared against actual memory usage: