Profiling Chromium and Blink

There are a few ways to profile Chromium and Blink.  Here are some of the tools that work well for diagnosing performance problems.
See also the Deep memory profiler.

Built-In Tools

For JavaScript issues, the built-in profiler works very well.  To use it, open the Chrome Dev Tools (right-click, Inspect Element) and select the 'Profiles' tab.

For a broader understanding of Chromium speed and bottlenecks, as well as how posted tasks and threads interact in aggregate, there is a cross-platform, task-level profiler built in.  Profiler results can be seen in about:profiler (or equivalently chrome://profiler).  For more details, see http://www.chromium.org/developers/threaded-task-tracking.

See chrome://tracing for timelines showing TRACE_EVENT activity across all the different threads.  It was originally used for GPU performance, so you will probably need to add TRACE_EVENT calls to the features you're interested in outside of compositing and rendering.  (This page was named about://gpu through M14.)

C++

For native C++ code the tools depend on the OS.
Note that basic printf debugging and using a general debugger (such as gdb) may be sufficient for some purposes. However, more specialized tools are available.

Linux

See LinuxProfiling for an alternative discussion.

gperftools

The gperftools project, from which we get TCMalloc, also includes a very nice profiler: Google CPU Profiler.
  • Add profiling=1 release_extra_cflags=-fno-omit-frame-pointer disable_pie=1 to your GYP_DEFINES then rerun gyp (build/gyp_chromium) and rebuild. This will ensure that even your Release builds will have the necessary flags to be able to capture callstacks. Assuming you already have a GYP_DEFINES line in your configuration file, adding the following line after it should work:
GYP_DEFINES+=" profiling=1 release_extra_cflags=-fno-omit-frame-pointer disable_pie=1"
  • Chromium will accept three new command line flags:
    • --profiling-file
      This is a template for where the pprof data gets generated. It can contain the substrings {pid} and {count}, which will be replaced by the pid of the process and the count of the profile run within a process. Chromium also adds a {type} substring replacement for the process type. The default is chrome-profile-{type}-{pid}.
    • --profiling-at-start
      When specified, profiling will be enabled from the very beginning of the program. An optional string argument can be given to specify a process type to profile (e.g., zygote, renderer). This also works for the renderer (Blink) via --profiling-at-start=renderer, provided you also add --no-sandbox so that the samples can be written out to disk.
    • --profiling-flush
      This causes the profile data to be flushed periodically. Typically this isn't needed as the data is flushed at exit, but sometimes there are bugs that prevent the atexit handlers from getting run. This is more often the case when --single-process is specified. It's also useful when profiling the renderer, as the renderer is more likely to be terminated quickly.
  • The profile data is written to the current directory by default, so to avoid making a mess of the source tree, it's safest to run Chromium from outside the source directory, for example from the directory that contains src:
src/out/Release/chrome

  • You can turn profiling on and off:
    • ...from within the program by using the chrome/common/Profiling class;
    • ...interactively via a new menu item (checkbox): Menu > Tools > Profiling Enabled.
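  • By default, only the browser process (Chromium) is profiled. To profile the renderer (Blink), either pass --profiling-at-start=renderer together with --no-sandbox as described above, or run in single-process mode via: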
src/out/Release/chrome --single-process
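
As a rough illustration of the flags described above, the following Python sketch shows how the --profiling-file template might expand and how a profiling command line could be assembled. The function names are hypothetical helpers invented for this example, not part of Chromium, and the substitution logic is only an assumption based on the placeholder descriptions above.

```python
def expand_profile_template(template, process_type, pid, count):
    """Illustrative expansion of the {type}, {pid} and {count} placeholders.

    Assumption: plain substring replacement, as the flag description suggests.
    """
    return (template.replace("{type}", process_type)
                    .replace("{pid}", str(pid))
                    .replace("{count}", str(count)))


def profiling_command(chrome_path, profiling_file=None, at_start=None,
                      flush=False, no_sandbox=False):
    """Build an argument list using the three profiling flags (hypothetical helper).

    at_start may be True (bare flag) or a process type such as "renderer".
    """
    cmd = [chrome_path]
    if profiling_file is not None:
        cmd.append("--profiling-file=" + profiling_file)
    if at_start is True:
        cmd.append("--profiling-at-start")
    elif at_start:
        cmd.append("--profiling-at-start=" + at_start)
    if flush:
        cmd.append("--profiling-flush")
    if no_sandbox:
        # Needed so a sandboxed renderer can write its samples to disk.
        cmd.append("--no-sandbox")
    return cmd


if __name__ == "__main__":
    print(expand_profile_template("chrome-profile-{type}-{pid}", "browser", 1234, 0))
    print(" ".join(profiling_command("src/out/Release/chrome",
                                     at_start="renderer", flush=True,
                                     no_sandbox=True)))
```

The resulting list can be handed to subprocess.call() from a directory outside the source tree, so that the profile files do not end up inside src.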

pprof

Use pprof to analyze the results:

pprof src/out/Release/chrome chrome-profile-browser-NNN

Some tips:
  • pprof defaults to interactive use; you can use flags to just output a file.
  • You should be able to increase (or decrease) the sampling frequency (defaults to 100 Hz = every 10 milliseconds) via the CPUPROFILE_FREQUENCY environment variable, but 

For nice viewing, output in DOT format and view with one of these programs: XDot (packaged in Ubuntu), ZGRViewer.
pprof --dot src/out/Release/chrome chrome-profile-browser-NNN > NNN.dot
(If you use --focus, the output will also contain an "After ..." line, so trim off the first two lines before viewing.)
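
Trimming those leading lines can be automated. The sketch below is one possible approach, assuming the graph itself begins with a line starting with the DOT keyword "digraph"; strip_dot_preamble is a hypothetical helper name, and the sample text in the example is made up for illustration.

```python
def strip_dot_preamble(text):
    """Drop any lines that precede the first line starting with 'digraph'.

    Assumption: the DOT graph starts with the 'digraph' keyword. If no such
    line is found, the text is returned unchanged.
    """
    lines = text.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.lstrip().startswith("digraph"):
            return "".join(lines[index:])
    return text


if __name__ == "__main__":
    # Made-up sample: two preamble lines followed by a tiny graph.
    sample = '(preamble line)\nAfter ...\ndigraph "g" {\n}\n'
    print(strip_dot_preamble(sample), end="")
```

This avoids hard-coding the number of lines to remove, which differs depending on whether --focus is used.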

You can also pipe directly to xdot if you don't want a temporary file:
pprof --dot src/out/Release/chrome chrome-profile-browser-NNN | xdot -
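
When reading sample counts in the pprof output, it can help to translate them into approximate CPU time using the sampling frequency mentioned in the tips above (100 Hz by default). The following is a minimal sketch of that arithmetic; the function names are hypothetical, and the result is only a statistical estimate, since each sample stands in for roughly one sampling interval.

```python
def sampling_interval_ms(frequency_hz=100):
    """Time between samples in milliseconds (10 ms at the default 100 Hz)."""
    return 1000.0 / frequency_hz


def samples_to_seconds(samples, frequency_hz=100):
    """Approximate CPU time represented by a number of samples."""
    return samples / float(frequency_hz)


if __name__ == "__main__":
    print(sampling_interval_ms())      # interval at the default frequency
    print(samples_to_seconds(250))     # 250 samples at 100 Hz
```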

test_shell

Previously you could control when the profiler starts and stops from within test_shell (from revision 41218 until revision 132841). This can help a lot when trying to isolate a certain action without polluting the profile with a lot of startup/shutdown code.  To do this:
  • Compile test_shell with profiling enabled (same as above)
  • Run test_shell without the CPUPROFILE environment variable set and pass in the --profiler flag (for example, "out/Release/test_shell --profiler").
  • Start the profile by running "(new chromium.Profiler).start()" and stop it by running "(new chromium.Profiler).stop()".  This can be done from JavaScript inside the page or by typing "javascript:(new chromium.Profiler).start()" into the URL bar and hitting enter.
The profile will be saved in a file called "chrome-profile" in the working directory.  Currently, stopping and restarting the profiler overwrites the previously stored data.

perf

You can also use the standard Linux perf tool:
  • ps aux | grep chromium to find a particular browser/renderer/gpu process
  • perf record -f -g -p <pid> to capture that process
  • perf report for the profile output
  • perf annotate "<fully qualified function name>" for assembly language and (very approximate?) per-instruction cycle counts
By default this saves "perf.data" in the current working directory, which can be renamed. perf report may be able to run on older data, but perf annotate will be inaccurate if you've since rebuilt the executable.
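
Picking the right process out of the ps output can be scripted. The sketch below assumes that Chromium child processes carry a --type=<name> switch on their command line (for example --type=renderer) and that the browser process has none; it also assumes the usual 11-column "ps aux" layout. The helper names and the sample line are hypothetical.

```python
import re

_TYPE_RE = re.compile(r"--type=([\w-]+)")


def classify_ps_line(line):
    """Return (pid, process_type) for one 'ps aux' line, or None.

    Assumption: columns are USER PID %CPU %MEM VSZ RSS TTY STAT START TIME
    COMMAND, and a missing --type switch means the browser process.
    """
    fields = line.split(None, 10)
    if len(fields) < 11 or not fields[1].isdigit():
        return None  # header line or malformed input
    match = _TYPE_RE.search(fields[10])
    return int(fields[1]), (match.group(1) if match else "browser")


def perf_record_command(pid):
    """Argument list for the 'perf record' invocation shown above."""
    return ["perf", "record", "-f", "-g", "-p", str(pid)]


if __name__ == "__main__":
    # Made-up sample line for illustration.
    sample = ("user 4242 1.0 2.0 1000 2000 ? Sl 10:00 0:01 "
              "/opt/chromium/chrome --type=renderer --lang=en-US")
    pid, kind = classify_ps_line(sample)
    print(kind, " ".join(perf_record_command(pid)))
```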

Chrome OS

Profiling for Chrome OS is very similar to Linux, with a couple of key differences:

  • The default directory for the profile output is that of the chrome executable, which is read-only. So you'll either need to run chrome from elsewhere (an sshfs'd mount of your development directory) or specify the --profiling-file option.
  • You need to run the pprof program on your desktop, so it needs access not only to the profile output and the chrome executable but also to the .so's that chrome depends on. To address this, you can sshfs your chromeos machine's root folder back to your desktop and provide a search path to it when you run pprof. So:
    • On your desktop
      • Build chrome with profiling=1, inside chroot:
        declare -x EXTRA_BUILD_ARGS="profiling=1"
      • Push it to the device
      • nkostylev: I was unable to generate a pprof graph with meaningful function names for a Chrome build made inside the chroot, even though the executable contained all symbols. Using a Chrome build made outside of the chroot works fine:
        • Outside of chroot, update GYP_DEFINES, include sysroot=<path to chroot> i.e.
          GYP_DEFINES="chromeos=1 target_arch=ia32 profiling=1 sysroot=/usr/local/home/username/chromiumos/chroot/build/x86-mario"
        • Re-run gclient runhooks, build Chrome Release build
          gclient runhooks
          make BUILDTYPE=Release out/Release/chrome -j 15
        • Inside chroot, set up the environment
          declare -x BOARD=x86-mario
          declare -x BUILDTYPE=Release
          declare -x CHROME_ORIGIN=LOCAL_BINARY
          declare -x FEATURES="-usersandbox"
          declare -x USE="-build_tests"
          declare -x BUILD_OUT=out
        • Push chromeos-chrome to the device
    • On your chromeos machine
      • Modify /sbin/session_manager_setup.sh to pass --profiling-file=/tmp/chrome-profile-{pid} as one of the arguments to chrome.
      • Enable / disable profiling, or specify --profiling-at-start or call Profiling::Start() / Profiling::Stop() from your code
    • On your desktop
      • mkdir /tmp/c
      • sshfs chronos@{chronos-ip}:/ /tmp/c
      • pprof --lib_prefix=/tmp/c/lib,/tmp/c/usr/lib,/tmp/c/opt/google/chrome/chromeos {chrome executable} /tmp/c/tmp/chrome-profile-{pid}
        • Note: There should be a way to use the debug symbols from your build root instead of the libraries from the device itself, but it's left as an exercise for the interested student.
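
The desktop-side pprof invocation above can be assembled programmatically. The following sketch builds the same argument list from the sshfs mount point; pprof_command is a hypothetical helper, and the library directories are simply the ones listed in the step above.

```python
import posixpath

# Library directories on the device, relative to its root, as used above.
DEVICE_LIB_DIRS = ("lib", "usr/lib", "opt/google/chrome/chromeos")


def pprof_command(mount_point, chrome_executable, pid,
                  lib_dirs=DEVICE_LIB_DIRS):
    """Build the pprof argument list for a profile on an sshfs-mounted device.

    Assumption: the device was started with
    --profiling-file=/tmp/chrome-profile-{pid}.
    """
    prefix = ",".join(posixpath.join(mount_point, d) for d in lib_dirs)
    profile = posixpath.join(mount_point, "tmp", "chrome-profile-%d" % pid)
    return ["pprof", "--lib_prefix=" + prefix, chrome_executable, profile]


if __name__ == "__main__":
    print(" ".join(pprof_command("/tmp/c", "out/Release/chrome", 1234)))
```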

OS X

DTrace and the pre-packaged "CPU Sampler" tool in Xcode work well.  Shark and the command-line sample tool also work, though both will spend an exceedingly long time processing symbols if you are running Leopard (10.5).  Anecdotally, this is much faster in Snow Leopard (10.6).

Windows

SyzyProf is a made-to-order, license-free, instrumenting hierarchical performance profiler that works well with Chrome. The aim with SyzyProf is to allow comprehensive profiling of Chrome code, including profiling over tasks or IPC, as well as integrated profiling over JavaScript in V8 and C++. SyzyProf is implemented as a 20% labor of love by a small group of Chrome developers, and we're looking for more >= 20% help.

I've heard that Purify has a profiler but have no experience with this personally.

AMD CodeAnalyst is a free profiler that can run inside Visual Studio. It captures frequency counts for functions in every process on the computer, and can optionally capture call-stack information, %CPU, and memory usage statistics. Even with the Frame Pointer Omission optimization turned off (build\internal\release_defaults.gypi; under 'VCCLCompilerTool' set 'OmitFramePointers':'false'?), the call stack capture can contain a lot of bad information, but at least the most-frequent-caller seems accurate in practice.

Intel's VTune 9.1 does work in the Sampling mode (using the hardware performance counters), but call graphs are unavailable in Windows 7/64.  Note also that drilling down into the results for chrome.dll is extremely slow (on the order of many minutes) and may appear hung.  It does work (I suggest coffee or foosball).  VTune has been essentially supplanted by Intel® VTune™ Amplifier XE, which is an entirely new code base and interface, AFAIK.

Very Sleepy (http://www.codersnotes.com/sleepy) is a light-weight standalone profiler that seems to work pretty well for casual use and offers a decent set of features.

Android

As on Linux, perf is the recommended tool for profiling native code on Android. First, make sure you have built the browser with the set of GYP_DEFINES described above. Then, use the following wrapper script to launch the browser and follow the instructions:

$ tools/perf/record_android_profile.py --browser=android-chromium-testshell --profiler=perf
Press enter to start profiling...
>> Starting profiler perf
Press enter or CTRL+C to stop

The script will automatically pull the profiling data from the device and print out instructions for viewing it. Note that several files will be generated: one for the browser process and one for each renderer process.

To view the profile, run:
tools/telemetry/bin/prebuilt/android/perfhost report --symfs /tmp/tmpjySSsF -n -i /tmp/tmpjySSsF/perf.browser0
To view the profile, run:
tools/telemetry/bin/prebuilt/android/perfhost report --symfs /tmp/tmpjySSsF -n -i /tmp/tmpjySSsF/perf.renderer0
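
Since one file is produced per process, the report commands can also be generated in one go. The sketch below lists every file whose name starts with "perf." in the output directory and builds the corresponding perfhost command, mirroring the instructions printed above; report_commands is a hypothetical helper, and the "perf." naming is an assumption based on the file names shown.

```python
import os

PERFHOST = "tools/telemetry/bin/prebuilt/android/perfhost"


def report_commands(output_dir, perfhost=PERFHOST):
    """Return one 'perfhost report' argument list per perf.* file found.

    Assumption: profile files are named perf.<process><index>, as in the
    perf.browser0 and perf.renderer0 examples above.
    """
    commands = []
    for name in sorted(os.listdir(output_dir)):
        if name.startswith("perf."):
            commands.append([perfhost, "report", "--symfs", output_dir,
                             "-n", "-i", os.path.join(output_dir, name)])
    return commands


if __name__ == "__main__":
    import sys
    for command in report_commands(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(" ".join(command))
```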

GPU profiling

Both NVIDIA PerfHUD and Microsoft PIX are freely available. They may not run without minor changes to how the graphics contexts are set up; check with the chrome-gpu team for current details.

The OpenGL Profiler for OS X allows real-time inspection of the top GL performance bottlenecks, as well as call traces.  To use it with Chrome/Mac, you must pass --disable-gpu-sandbox on the command line.  Some people have had more luck attaching it to the GPU process after the fact than launching Chrome from within the Profiler; YMMV.

GPUView is a Windows tool that utilizes ETW (Event Tracing for Windows) for visualizing low-level GPU, driver and kernel interactions in a time-based viewer.  It's available as part of the Microsoft Windows Performance Toolkit, in  %ProgramFiles%\Microsoft Windows Performance Toolkit\GPUView.  There's a README.TXT in there with basic instructions, or see http://graphics.stanford.edu/~mdfisher/GPUView.html.  N.B.:  There's a known bug which causes GPUView to crash when visualizing traces captured on machines with more than 8 cores.  On an HP Z600, disabling hyperthreading in the BIOS is enough to work around this issue.