Overview

Try-job-based Approach

The try-job approach identifies the exact culprit CL, with concrete evidence that it caused the failure, by running recipes that perform actual builds on the revisions in the regression range. Because try jobs must compile and/or run tests on multiple revisions in the regression range of a build failure, they take time by nature, but they produce concrete results.

For compile failures, try jobs are triggered once a build completes, and they run against the first red build: in a series of red builds, the culprit was most likely introduced in the earliest one. For test failures, a try job is triggered for the earliest build in a series of failures of newly failing tests, even if that build is already red for other reasons.

Compile Flow

As soon as a build failure on the main waterfall is detected, Findit begins its analysis. The regression range is narrowed down to the first red build since the last known green build, and a try job is triggered to run on the commits in that range. Because compile is an expensive task that must be repeated on multiple revisions, Findit uses different strategies, depending on what information is available, to check as few revisions as possible.
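The narrowing step above can be sketched as follows. This is a minimal illustration, not Findit's actual code: the function name `find_regression_range` and the `(build_number, passed)` input shape are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def find_regression_range(
    builds: List[Tuple[int, bool]],
) -> Optional[Tuple[Optional[int], int]]:
    """Return (last_green, first_red) for the trailing run of red builds.

    `builds` is ordered oldest to newest as (build_number, passed).
    Returns None when the latest build is green (nothing to analyze).
    `last_green` is None when no green build exists in the history given.
    """
    if not builds or builds[-1][1]:
        return None
    first_red = builds[-1][0]
    last_green = None
    # Walk backwards until the most recent green build is found.
    for number, passed in reversed(builds):
        if passed:
            last_green = number
            break
        first_red = number
    return last_green, first_red
```

The commits between `last_green` and `first_red` would then form the range handed to the try job.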

Bisect

Bisecting the revisions is one of the strategies Findit can use to find the culprit. Bisect is eligible only when compile targets (extracted from ninja output) are passed, and it is one of the optimizations that lets Findit test only a subset of the revisions to find the culprit.
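A minimal sketch of bisection over a regression range is shown below. It is a generic binary search, not Findit's implementation, and it assumes that once the culprit lands every later revision also fails. The `passes` callback is a hypothetical stand-in for compiling a revision in a try job.

```python
from typing import Callable, List, Optional


def bisect_culprit(
    revisions: List[str], passes: Callable[[str], bool]
) -> Optional[str]:
    """Return the first failing revision.

    `revisions` is ordered oldest to newest, contains only revisions after
    the last known good one, and its final entry is known to fail.
    """
    if not revisions:
        return None
    low, high = 0, len(revisions) - 1  # invariant: revisions[high] fails
    while low < high:
        mid = (low + high) // 2
        if passes(revisions[mid]):
            low = mid + 1  # culprit is after mid
        else:
            high = mid  # culprit is mid or earlier
    return revisions[low]
```

For a range of n revisions this tests roughly log2(n) of them instead of all n, which is the saving that matters when each test is a full compile.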

Heuristic Guidance

Since heuristic analysis results are available quickly and are often useful, suspected CLs from heuristic analysis can be used to guide the try job. In the best case, only 2 revisions need to be tested, and the try job simply confirms the heuristic result. If the heuristic result is incorrect, the try job continues with a linear search or bisects the revision range to find the culprit CL.
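The best case can be illustrated with a short sketch. It assumes that the 2 revisions tested are the suspect and the revision immediately before it; the function name `confirm_suspect` and the `passes` callback are hypothetical and only serve the example.

```python
from typing import Callable, List, Optional


def confirm_suspect(
    revisions: List[str], suspect: str, passes: Callable[[str], bool]
) -> Optional[str]:
    """Return `suspect` if it is confirmed as the culprit, else None.

    `revisions` is ordered oldest to newest. The suspect is confirmed when
    the revision just before it passes and the suspect itself fails.
    """
    index = revisions.index(suspect)
    if index == 0:
        return None  # no earlier revision in the range to compare against
    parent = revisions[index - 1]
    if passes(parent) and not passes(suspect):
        return suspect
    return None  # caller falls back to linear search or bisect
```

When this returns None, the remaining range would be searched with one of the fallback strategies described above.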

Test Flow

Because tests can be flaky, an additional step is required to avoid running meaningless try jobs. For steps that run on swarming, tasks are triggered first to rerun the tests repeatedly (currently 10x) to determine flakiness. If all iterations in the swarming rerun fail as expected, the failure is reliable and a try job can safely be triggered. For non-swarming steps, determining whether the failures are reliable is expensive, so try jobs are skipped for those steps as well.

If the swarming task determines that the tests are flaky, no try job is triggered, to avoid false positives. Findit only determines whether those tests are flaky; it does not currently support identifying which CL introduced or triggered the flakiness. As with compile, when try jobs are triggered, heuristic guidance is used to reduce the number of revisions that need to be tested.

Heuristic-based Analysis

Using heuristics drawn from manual triage experience, Findit uses Git blame to correlate a CL with errors in the log.

For a buildbot failure, Findit goes through this simplified main analysis flow:

0. Input of an analysis request:

From builder_alerts: URL of the failed build cycle, and failed step names.
From UI page: URL of the failed build cycle.

1. Detect failed steps in the given buildbot failure, and from which build cycles they started to fail.

This is to discover the failed steps and to determine the regression ranges.

2. Extract signals from error messages:

Significant signals for a failure include file names and line numbers, test names, function names, etc. These are what a build sheriff usually checks during manual triage.

As there are various failure types and output formats, different signal extractors are implemented for different steps: compile, sizes, Android Instrumentation Test, check_perms, etc. A general extractor is also implemented as the fallback, although it might sometimes extract false signals.

3. Pull change logs from Gitiles:

Based on regression range, Findit pulls change logs for each CL in the range.

4. Detect DEPS rolls:

Changes to DEPS might include dependency rolls, such as v8, pdfium, or skia. A DEPS roll can therefore lead to compile or test failures, so the CLs within the dependency roll need to be checked too.

5. Heuristics-based Analysis:

Based on heuristics (i.e. the file-based heuristics below) and Git blame, Findit correlates extracted signals with CLs in the regression ranges (including CLs in a dependency roll) to infer a suspect.
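Steps 2 and 5 above can be sketched together as follows. This is a simplified illustration rather than Findit's extractors or scoring: the regular expression, the file extensions it recognizes, the function names, and the "count touched files" score are all assumptions, and Git blame is not modeled here.

```python
import re
from typing import Dict, List, Set

# Hypothetical pattern for "path/to/file.cc:LINE" in compiler output.
_FILE_LINE = re.compile(r"([\w./-]+\.(?:cc|cpp|c|h|mm|py)):(\d+)")


def extract_file_signals(log: str) -> Dict[str, Set[int]]:
    """Map each file mentioned in the log to the line numbers cited."""
    signals: Dict[str, Set[int]] = {}
    for path, line in _FILE_LINE.findall(log):
        # Drop leading "../" segments so paths are relative to the source root.
        while path.startswith("../"):
            path = path[3:]
        signals.setdefault(path, set()).add(int(line))
    return signals


def rank_suspects(
    signals: Dict[str, Set[int]], change_logs: Dict[str, List[str]]
) -> List[str]:
    """Rank revisions by how many signal files each one touched.

    `change_logs` maps a revision to the files it changed. Revisions that
    touched none of the signal files are left out.
    """
    scores = {}
    for revision, touched in change_logs.items():
        score = sum(1 for f in touched if f in signals)
        if score:
            scores[revision] = score
    return sorted(scores, key=lambda r: scores[r], reverse=True)
```

For example, a log line that cites `base/files/file_util.cc` would make a CL that touched that file a suspect, while a CL that only touched unrelated files would not be ranked.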

AutoRevert for compile failures

When a culprit is identified, Findit now automatically creates and commits a revert. Notifications are also posted to Sheriff-o-Matic. If a revert has already been created by a sheriff, Findit instead posts a confirmation to the code-review of the culprit.