To manually identify the culprit CL for a Chrome crash, ClusterFuzz testcase, or UMA sampling profiler performance regression, a sheriff has to read the crash stack traces, go over the complete list of CLs in the regression range, and narrow it down by estimation, grep, git blame, etc.
Predator automates the triage of crashes and performance changes. It recommends suspected CLs along with justifications and correlations that are hard to establish in manual triage. Stability sheriffs use Predator's results to assign bug owners and components.
Sharu Jiang (katesonia@)
Currently, Predator is based on Git blame and heuristics from manual triage experience.
Once a regression is passed over to Predator, it will go through this simplified main analysis flow:
0. Input:
1. Extract stack traces from the raw data:
A Chrome crash can include more than one stack. For crashes found by sanitizer tools (ASAN, MSAN, etc.), some stacks are much more important than others. E.g., for MSAN the importance decreases from the creation stack, to the storage stack, and then to the crashing stack, while for ASAN the crashing stack is the most important one.
A performance change can also happen over multiple call stacks. In the case of a reported performance change, all relevant call stacks from the profiler are aggregated to form a subtree contained within the overall call tree for a process. The root of this subtree is usually the focal point from which the regression or improvement originates.
When Predator extracts the stack traces of interest, it records the file path and line number of each frame, as well as the frame's index in the stack.
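As a rough illustration of the per-frame data described above, here is a minimal sketch. The frame text format, the `StackFrame` type, and `parse_stack` are assumptions made for this example (the format is loosely modelled on symbolized sanitizer output); they are not Predator's actual parser.

```python
import re
from typing import List, NamedTuple


class StackFrame(NamedTuple):
    """Hypothetical record of what is kept for each frame."""
    index: int        # position of the frame in its stack (0 = top)
    function: str
    file_path: str
    line_number: int


# Assumed frame format, e.g.:
#   "#0 0x7f3a in blink::Node::remove third_party/blink/node.cc:42:7"
FRAME_RE = re.compile(
    r"^\s*#(?P<index>\d+)\s+0x[0-9a-fA-F]+\s+in\s+(?P<function>.+?)\s+"
    r"(?P<path>[^\s:]+):(?P<line>\d+)(?::\d+)?\s*$"
)


def parse_stack(text: str) -> List[StackFrame]:
    """Return one StackFrame per line that matches the assumed format."""
    frames = []
    for raw in text.splitlines():
        match = FRAME_RE.match(raw)
        if match is None:
            continue  # skip headers, blank lines, unsymbolized frames
        frames.append(StackFrame(
            index=int(match.group("index")),
            function=match.group("function"),
            file_path=match.group("path"),
            line_number=int(match.group("line")),
        ))
    return frames
```

Lines that do not match the assumed format are skipped rather than treated as errors, since raw crash data typically mixes frames with other text.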
2. Detect dependency regression ranges:
Quite often, Chrome regressions happen in dependencies like v8, pdfium, skia, etc.
Given a Chrome regression range, Predator detects the corresponding regression range of each dependency, so that CLs in dependencies are also checked.
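One way to picture this step: compare the revision each dependency is pinned to at the two ends of the Chrome regression range, and keep the dependencies whose pinned revision changed. The sketch below assumes the pinned revisions have already been read into plain dictionaries; `DependencyRoll` and `detect_dependency_rolls` are hypothetical names, and how Predator actually obtains the pins is not shown.

```python
from typing import Dict, NamedTuple, Optional


class DependencyRoll(NamedTuple):
    """Hypothetical record of one dependency's regression range."""
    path: str
    old_revision: Optional[str]  # None if the dependency was added
    new_revision: Optional[str]  # None if the dependency was removed


def detect_dependency_rolls(
    old_deps: Dict[str, str],
    new_deps: Dict[str, str],
) -> Dict[str, DependencyRoll]:
    """Return dependencies whose pinned revision differs between the two
    ends of the regression range, keyed by dependency path."""
    rolls = {}
    for path in sorted(set(old_deps) | set(new_deps)):
        old_rev = old_deps.get(path)
        new_rev = new_deps.get(path)
        if old_rev != new_rev:
            rolls[path] = DependencyRoll(path, old_rev, new_rev)
    return rolls
```

Each returned `(old_revision, new_revision)` pair is then the regression range to examine inside that dependency's own repository.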
3. Pull change logs from Gitiles:
Predator checks the extracted stack traces to determine which dependencies' change logs are needed, and pulls those change logs from the corresponding Gitiles repositories.
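A minimal sketch of this selection, assuming stack frame file paths and dependency paths share a common root (e.g. `src/...`). The helper names are hypothetical, the longest-prefix rule is an assumption about how a file maps to a dependency, and the URL follows the general Gitiles `+log/<old>..<new>` form rather than anything confirmed by this document; no network request is made here.

```python
from typing import Iterable, Optional, Set


def dependency_for_file(file_path: str,
                        dep_paths: Iterable[str]) -> Optional[str]:
    """Map a file to the dependency with the longest matching path prefix."""
    best = None
    for dep in dep_paths:
        prefix = dep.rstrip("/") + "/"
        if file_path.startswith(prefix):
            if best is None or len(dep) > len(best):
                best = dep
    return best


def dependencies_to_pull(frame_paths: Iterable[str],
                         rolled_deps: Iterable[str]) -> Set[str]:
    """Keep only the rolled dependencies that appear in the stack traces."""
    rolled = list(rolled_deps)
    touched = set()
    for path in frame_paths:
        dep = dependency_for_file(path, rolled)
        if dep is not None:
            touched.add(dep)
    return touched


def gitiles_log_url(repo_url: str, old_rev: str, new_rev: str) -> str:
    """Build a change-log URL for a revision range (assumed URL shape)."""
    return f"{repo_url.rstrip('/')}/+log/{old_rev}..{new_rev}?format=JSON"
```

Restricting the pull to dependencies that actually show up in the stacks keeps the number of change logs small, since a regression range may roll many dependencies that are unrelated to the crash.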
4. Heuristics-based Analysis:
The heuristics for regression analysis are quite complicated, but the important ones are listed below: