Purify is a commercial runtime analysis tool that detects memory errors, such as reading freed memory, writing past array bounds, reading uninitialized memory, or leaking memory. On Linux, a similar free package is called Valgrind. The Chromium project uses both Purify and Valgrind extensively to detect memory errors early in development rather than letting them linger in code for long periods of time. This page describes how Purify is used in the Chromium project. It assumes that you have access to a Purify license and have some basic Purify knowledge.

Getting Started
Build for Purify

The general rule of thumb for Purify is that you'd like your code to be (1) smaller, but (2) without a lot of optimizations. The reason for (1) is that the more code you have, the slower Purify runs and the longer it takes to instrument your code. The reason for (2) is that if optimizations are maxed out, its error detection isn't as accurate, and the stack traces it reports for errors may not be complete. There are a lot more specifics that go into this rule, so we have a special set of build configuration files that we use to do builds when running under Purify. I highly recommend this approach as it will give you the least amount of grief, even though it requires a completely separate compilation. If you use a Debug build, instrumentation and execution will be dramatically slower. If you use straight Release mode, you'll get odd behavior.

Use the build configuration pulldown in Visual Studio and select "Purify" instead of "Debug" or "Release" and then build the project you're interested in. The output will show up in its own "Purify" directory.

General Usage Tips
Purify on the buildbots

Since running an executable through Purify is significantly slower than normal execution, we split the various unit tests across (currently) five builders: Modules (net_unittests.exe and base_unittests.exe), Unit (unit_tests.exe), Webkit (test_shell_tests.exe), Layout, and UI (ui_tests.exe).

These bots run Purify from the command line using a set of scripts in tools/purify. The main driver script, chrome_tests.py, knows specifically about each of the unit tests that the buildbots run, so if you want to add support for another test, start there. This script can also be run on your workstation (use --help for instructions). It's sometimes simpler to run this way than in the Purify user interface, since you replicate exactly what the bots are doing.

Not every test within each unit test executable is run. The scripts look at a file called <module>/data/purify/<exe name>.gtest.txt (e.g. base/data/purify/base_unittests.exe.gtest.txt) for a list of tests that they should skip.

For each module, there is a set of expected errors. Expected errors are errors which are known, but are supposed to be fixed. Every expected error should have a corresponding bug in the issue database. When a new error is detected that's not in the expected list, the bot turns red. When one of the expected errors doesn't show up, the bot turns orange (indicating an unexpected fix).

In addition, there are some errors which are known, but completely filtered. The presence or absence of these errors doesn't change the bot status. Further, filtered errors don't even appear in the output of the bot.

Skipped Tests

Every skipped test that's in the gtest filter file should have a corresponding bug number explaining why it's skipped. Some are skipped because they cause Purify to crash. Others are skipped because they simply run too slowly. Still others are skipped because the test isn't interesting to run in Purify.
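As a rough illustration of how a skip list could be applied, the sketch below builds the skip-file path described above and turns its contents into a negative gtest filter. The helper names are hypothetical and the file format (one test name per line, with '#' comment lines) is an assumption; the real scripts in tools/purify may do this differently.

```python
def skip_file_path(module, exe_name):
    """Path of the skip list, following <module>/data/purify/<exe name>.gtest.txt."""
    return f"{module}/data/purify/{exe_name}.gtest.txt"


def build_gtest_filter(skip_file_text):
    """Turn a skip list into a negative --gtest_filter argument.

    Assumption: one test name per line; blank lines and lines starting
    with '#' (e.g. a bug reference) are ignored.
    """
    skipped = []
    for line in skip_file_text.splitlines():
        name = line.strip()
        if name and not name.startswith("#"):
            skipped.append(name)
    if not skipped:
        return ""
    # gtest runs everything except the patterns listed after the '-'.
    return "--gtest_filter=-" + ":".join(skipped)
```

Keeping the bug reference as a comment next to each entry would make it easy to see why a test is skipped and when it can be re-enabled.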
It's sometimes useful to disable a test temporarily in order to get the bot green while a fix is being worked on. In general, we should work to minimize the number of tests that we're skipping, since each skipped test reduces the test coverage we get.

Expected Errors

The Purify scripts look in a global directory, tools/purify/data, and a per-module directory, <module>/data/purify (e.g. base/data/purify or chrome/test/data/purify), for some plain text files that contain a list of known expected and flakey errors. The filenames are of the form <exe name>_<CODE>.txt and <exe name>_<CODE>_flakey.txt. CODE is a three-letter error code that Purify uses to identify the particular type of error (e.g. MLK - memory leak, UMR - uninitialized memory read).

The expected file is the list of errors which happen every run. When an error in an expected file doesn't appear, it turns the bot orange (an unexpected pass). The flakey file is for errors that don't happen reliably, so they don't turn the bot orange when an error in the list isn't present.

The lists of errors are simply a collection of normalized stack traces. These are similar to the stack traces that appear in the Purify UI, but they've had a number of things removed from them: line numbers, system dll stack frames, address information, etc. This allows the stacks to be mostly unchanging from build to build and machine to machine. One confusing thing is that the errors that appear in the buildbot output are slightly different from these stacks. They retain line numbers and some other info to make them more useful to engineers investigating the error.

So how do you get a normalized stack to add to the expected error file? On the waterfall, there's a link on the Purify test bubbles called "download". This provides a link to the various output data files from that particular Purify run. You can browse all of the data from all of the different tests at http://build.chromium.org/buildbot/purify/.
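The way the expected and flakey lists drive the bot color can be sketched roughly as follows. This is only an illustration of the rules described above, not the actual code in tools/purify: the function names are hypothetical, stacks are treated as plain strings, and letting red take precedence over orange is an assumption.

```python
def expected_file_names(exe_name, code):
    """File names for one error type, e.g. code='MLK' or 'UMR'."""
    return f"{exe_name}_{code}.txt", f"{exe_name}_{code}_flakey.txt"


def classify_run(observed, expected, flakey):
    """Compare one run's normalized stacks with the expected and flakey lists.

    All three arguments are sets of normalized stack-trace strings.
    Returns (status, new_errors, missing_expected).
    """
    # Anything in neither list is a new error.
    new_errors = observed - (expected | flakey)
    # Flakey errors are allowed to be absent, so only the expected list counts.
    missing_expected = expected - observed
    if new_errors:
        status = "red"      # assumption: red wins when both conditions hold
    elif missing_expected:
        status = "orange"   # an expected error did not show up
    else:
        status = "green"
    return status, new_errors, missing_expected
```

For example, a run that reports only errors from the expected list, and all of them, stays green; dropping one of them turns the bot orange, while dropping an error from the flakey list changes nothing.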
In each output directory, you'll find a number of files. You'll find a file called <exe name>.txt, which is simply the raw output from Purify, with nothing omitted, grouped, or normalized. This is sometimes useful when you're tracking down a particular error since it will have more detailed information than the normalized output. There's a file called Summary.txt which is roughly equivalent to what gets printed to the buildbot output.

If the bot turned red or orange, then there will also be error-specific files. These will be of the form <exe name>_<CODE>_<TYPE>.txt (e.g. base_unittests.exe_MLK_NEW.txt). TYPE will be one of NEW, BASELINE, FIXED or GROUPS. NEW is the list of errors that are not in any of the expected files. BASELINE is the list of errors that are in the expected files and happened this run. FIXED is the list of errors that are in the expected files, but did not happen this run. GROUPS is an attempted summary of which tests had how many errors. These error-specific files have the fully normalized errors, so they're suitable for copying and pasting into the expected error files (make sure there's an extra newline after each error in the file).

Filtered Errors

As described in the Getting Started section, Purify has a global filter file that can be used to completely filter errors from its output (even its raw output file). Unfortunately, this filter file is a binary file, so it's only possible to edit it from within Purify. Also unfortunately, you can't open up an arbitrary filter file to edit or to use from within Purify (that's why you had to copy the file from svn to your local directory). Here is the procedure for editing this file:
Since it's binary, people can't easily review your change. Given this, be very careful with edits. It's shockingly easy to inadvertently create a filter that will filter out all errors.

Remember that this file is only for errors that we never intend to fix. Once a filter is added, you'll never see those errors again, so if an error does need to be fixed, it's too easy to lose track of it.

Also, even if an error really is something we don't intend to fix, be very careful that you're not filtering out more code than you'd like. For example, let's say that there was a known benign error in a large method. Even if that error is something we never intend to fix, if you add a filter, you're now preventing us from detecting any other errors (of that type) in that entire method.

Notice the phrase "be careful" was used many times in this section. Believe it.

When the bot turns red

Given these tools, what should you do when a Purify bot turns red?

First, do exactly what you'd do with any other error: identify the change that caused the error and revert it.

Assuming that's not an option, the next steps to consider are modifying the expected errors file or the skipped test file and filing a bug so that the underlying problem gets fixed.

If an error is benign but fixable, then fix the error; don't add it to the filter list. For example, let's say that the leak is just in the unit test itself, not in the underlying code. Why do this work? The reason is that the filters are imprecise. Stacks change, so the errors stop being filtered and the bots turn red again. Each error that exists adds a certain maintenance cost to the Purify bots as we deal with these regressions. It's worth a lot for us to keep the bots green with a minimum of false positives.

If you're really, really sure that the error is something that's unfixable and benign, then consider modifying the filter file.

Fixing Errors

So the bots have identified a real error in your code. Now what?
The easiest thing to do is to just look at the error. It's often apparent simply by inspection what the error is. The nice thing about this is that you don't even need Purify installed, so you can just use the output from the bot to identify the error and fix it. Unfortunately, this is only really a viable approach if the error is really obvious.

If you're not 100% sure what the actual cause of the error is, then you'll need to install Purify (see Getting Started above). The reason for this is that otherwise you can't verify the fix. If you're not sure what the cause is, then unless you can reproduce the error locally, fix it and verify that your fix worked, you're just guessing. Since the Purify buildbots take a long time to cycle, you'll leave the tree red for a long time, which is not acceptable.

If you're a Linux developer and don't have access to Purify, you can try Valgrind. Purify and Valgrind have a large amount of overlap in the errors that they detect, so it's possible that you can reproduce it with Valgrind in order to track down your error.

Troubleshooting

Common Issues

For almost any symptom, try these solutions first.
Other Issues

Tons of UMR errors from STL

Microsoft's implementation of the STL actually has a bunch of uninitialized memory reads. As far as we've been able to tell, these UMRs are benign. In fact, Purify now filters out many of them by default. If for some reason these errors aren't being filtered out, then somehow your filter file has gotten messed up. Follow the instructions in Getting Started to copy the common filter file.

My app is crashing

First, make sure you can catch the crash in Visual Studio so you can see where it's crashing. Disable breakpad if necessary (--disable-breakpad). If that doesn't work, try attaching with Visual Studio prior to the crash. Once you have a crash location, see the next tip.

My app is crashing in an unexpected location

Often, this is actually a real (but latent) bug that's exposed by running in Purify. Purify instrumentation winds up changing how objects are laid out in memory and how memory is initialized. A common bug that fits this case is an uninitialized data structure. For whatever reason, in your normal build, the memory winds up being zero filled. In a Purify run it winds up being filled by garbage (or maybe just different garbage) that winds up triggering a bug in your code.

If you're sure that the crash in question is bogus, it's possible that it's a Purify bug. See Report a bug below.

Purify is crashing during instrumentation

If you have a reproducible crasher, save the exe and its associated pdb file and ask someone else to try to reproduce it, then see Report a bug below.

I swear, it's not a UMR...

Sometimes Purify will wrongly complain about an uninitialized memory read. Now, most of the time it is dead right, but occasionally it lacks sufficient information to know. For example, Purify seems unable to recognize when a buffer is filled by the kernel in response to an asynchronous read request triggered by a call to WinHttpReadData. (This may actually be a problem for all overlapped IO, but that remains unconfirmed.)
If the kernel legitimately writes 0xCD, then Purify will complain when you read that byte. The problem is that Purify cannot tell that the 0xCD value in the buffer was produced by the kernel, as 0xCD is the value that allocated memory is initialized to when running under Purify. (Purify has a similar problem with shared memory, but in that case, it recognizes the shared memory and simply does no analysis on it, either for leaks or UMRs.)

Be careful with this conclusion, however. We've had cases where someone was sure that it was a Purify bug that actually turned out to be a legit UMR. If the original conclusion had been kept and the error had been filtered out, the bug likely would have lingered and caused problems for a long time.

The UMR shows up on buildbot, but not on my machine.

Some classes of UMRs can be flaky due to alignment padding / compiler initialization issues. An example we ran into recently was someone using a static array of bytes (not null terminated) and assigning it to a std::string. The assign function in this case wound up implicitly converting the const char* array to a std::string, which called strlen, which then should have always done an unbounded read followed by a copy. However, it didn't always repro. The reason was that the size of the static array wasn't a multiple of 4 bytes, so it was padded out to the nearest 4-byte boundary. The padding bytes were often zeros (in debug builds, always), but occasionally not. Purify didn't detect reading into the padding as a UMR, but did detect reading past it. So when the padding was all non-zero, you had a UMR in the strlen.

My app is running too slowly

Running apps under Purify will be significantly slower than normal. Read the general usage tips above for ideas on how to speed things up; in particular, make sure you're not using a Debug build. Other than that, be patient. It takes time.
But my app is timing out because it's running too slowly

Unfortunately, there's a limit to how fast you're going to be able to make your app run under Purify. Given that, you should look into whether your app has a configurable timeout and see if you can disable or lengthen it for your Purify run.

Report a bug to IBM / Rational

If you have a reproducible bug, let Erik Kay or Darin Fisher know about it. They can report it to IBM and often get it fixed in the next patch release.