HPCToolkit is an integrated suite of tools for measurement and analysis of program performance on computers ranging from multicore desktop systems to the nation's largest supercomputers.
This website lists the available features and gives examples of how to use HPCToolkit.
The HPCToolkit wiki page describes the suite's tools for tracing, profiling, and analyzing parallel programs.
Perf is a profiler tool for Linux 2.6+ based systems that abstracts away CPU hardware differences in Linux performance measurements and presents a simple command-line interface. Perf is based on the perf_events interface exported by recent versions of the Linux kernel. This wiki page explains in detail how to use perf for profiling.
GDB, the GNU Project debugger, allows you to see what is going on "inside" another program while it executes, or what another program was doing at the moment it crashed. This website provides links to software downloads and usage documentation.
Valgrind is an instrumentation framework for building dynamic analysis tools. There are Valgrind tools that can automatically detect many memory management and threading bugs, and profile programs in detail. One can also use Valgrind to build new tools. This website is the central resource for learning about Valgrind.
Performance Application Programming Interface (PAPI) provides the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors. In addition, PAPI provides access to a collection of components that expose performance measurement opportunities across the hardware and software stack.
Memcheck is a memory error detector that is part of Valgrind. This website documents how to use the tool in detail.
Intel® Software Development Emulator (Intel® SDE) is built on the Pin dynamic binary instrumentation system and the XED encoder/decoder. Intel SDE helps developers gain familiarity with upcoming instruction set extensions.