[1] https://web.stanford.edu/class/archive/cs/cs107/cs107.1166/guide_callgrind.html
[2] http://valgrind.org/docs/manual/valgrind_manual.pdf
[3] https://web.stanford.edu/class/archive/cs/cs107/cs107.1166/guide_valgrind.html
1. Counters
Ir: I cache reads (instructions executed)
I1mr: I1 cache read misses (instruction wasn't in I1 cache but was in L2)
I2mr: L2 cache instruction read misses (instruction wasn't in I1 or L2 cache, had to be fetched from memory)
Dr: D cache reads (memory reads)
D1mr: D1 cache read misses (data location not in D1 cache, but in L2)
D2mr: L2 cache data read misses (location not in D1 or L2)
Dw: D cache writes (memory writes)
D1mw: D1 cache write misses (location not in D1 cache, but in L2)
D2mw: L2 cache data write misses (location not in D1 or L2)
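A raw miss count is easier to judge as a fraction of the matching access count. The sketch below turns the counters above into miss rates. The `miss_rates` helper and the sample numbers are made up for illustration; they are not output from a real run.

```python
def miss_rates(c):
    """Turn raw counters into miss rates (fractions between 0 and 1)."""
    def rate(misses, accesses):
        # Guard against a run with no accesses of this kind.
        return misses / accesses if accesses else 0.0

    data_accesses = c["Dr"] + c["Dw"]
    all_accesses = c["Ir"] + data_accesses
    return {
        # Instruction fetches that missed the I1 cache.
        "I1": rate(c["I1mr"], c["Ir"]),
        # Data reads and writes that missed the D1 cache.
        "D1": rate(c["D1mr"] + c["D1mw"], data_accesses),
        # Accesses of any kind that missed L2 and went to memory.
        "L2": rate(c["I2mr"] + c["D2mr"] + c["D2mw"], all_accesses),
    }

# Hypothetical counter values, chosen only to make the arithmetic easy.
sample = {
    "Ir": 1_000_000, "I1mr": 1_000, "I2mr": 100,
    "Dr": 400_000, "D1mr": 8_000, "D2mr": 2_000,
    "Dw": 100_000, "D1mw": 2_000, "D2mw": 500,
}

rates = miss_rates(sample)
for name, value in rates.items():
    print(f"{name} miss rate: {value:.2%}")
```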
2. Relative price
An L1 miss typically costs around 5-10 cycles.
An L2 miss can cost as much as 100-200 cycles.
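These penalties can be combined with the counters into a rough cycle estimate. The sketch below assumes one cycle per instruction, 10 cycles per L1 miss and 100 cycles per L2 miss. The penalties are picked from the ranges above, and the one-cycle-per-instruction figure is an assumption, not something Callgrind measures. `estimate_cycles` and the sample counters are hypothetical.

```python
def estimate_cycles(c, l1_penalty=10, l2_penalty=100):
    """Rough cycle estimate: instructions plus weighted cache misses.

    Assumes 1 cycle per instruction; the penalties are assumed values
    taken from the 5-10 and 100-200 cycle ranges.
    """
    l1_misses = c["I1mr"] + c["D1mr"] + c["D1mw"]
    l2_misses = c["I2mr"] + c["D2mr"] + c["D2mw"]
    return c["Ir"] + l1_penalty * l1_misses + l2_penalty * l2_misses

# Hypothetical counter values, not taken from a real profile.
sample = {
    "Ir": 1_000_000, "I1mr": 1_000, "I2mr": 100,
    "Dr": 400_000, "D1mr": 8_000, "D2mr": 2_000,
    "Dw": 100_000, "D1mw": 2_000, "D2mw": 500,
}

cycles = estimate_cycles(sample)
print(cycles)  # 1_000_000 + 10 * 11_000 + 100 * 2_600 = 1370000
```

With these numbers, misses add 370,000 estimated cycles to 1,000,000 instructions, so a small miss count can still account for a large share of the estimated cost.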
3. Tips
1. Callgrind measures only the code that is actually executed, so make diverse, representative runs that exercise all relevant code paths.
2. Callgrind records the count of instructions, not the actual time spent in a function. Costs associated with I/O won't show up in the profile.
3. Terminology: if routine A calls routine B:
- Routine A is the caller
- Routine B is the callee
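The caller/callee relationship can be seen in a small Python sketch that records each call as a (caller, callee) pair. This is only an analogy for the call edges a profiler tracks, not how Callgrind itself works. `sys.setprofile` is the standard library hook; `record_call`, `routine_a` and `routine_b` are names made up for the example.

```python
import sys

calls = []  # (caller, callee) pairs, in the order the calls happen

def record_call(frame, event, arg):
    # "call" fires when a Python function is entered; frame.f_back is
    # the frame that made the call, i.e. the caller.
    if event == "call" and frame.f_back is not None:
        calls.append((frame.f_back.f_code.co_name, frame.f_code.co_name))

def routine_b():
    return 42

def routine_a():
    # routine_a is the caller here, routine_b is the callee.
    return routine_b()

sys.setprofile(record_call)
routine_a()
sys.setprofile(None)  # stop recording

# calls now includes the pair ("routine_a", "routine_b")
for caller, callee in calls:
    print(f"{caller} -> {callee}")
```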