Intro
Sheriffs
The tree sheriff is responsible for keeping an overall eye on the buildbots, closing and opening the tree if nobody else does, tracking down people responsible for breakage, and backing out broken changes. Every committer is empowered and encouraged to do any of those things when needed (and every developer can do most of them), but the sheriff has overall responsibility in case somebody else is away or not paying attention.
Monitoring
You should receive gatekeeper emails whenever the tree is closed automatically. Please act on them. You should also monitor the Main waterfall and the For Your Information (FYI) waterfall as necessary. Never close the tree because of a failure on an FYI build slave. The failures_only=true part of the URL hides the green slaves to make the waterfall more readable.
Everyone Helps (and how to get a replacement if you're out)
Everyone has been signed up to be a sheriff. If you're new, and therefore should be added to the team, submit a code review to add yourself or ping Ian. You can find your time as sheriff in the "Upcoming Sheriffs" list at the end of this document. If you need to change the schedule (you're out sick or on vacation), it's your responsibility to find a replacement for your time slot.
How to swap
Once you have someone lined up, you need to add them to the actual rotation so that the buildbot and other tools display the proper people as sheriffs. To do this:
1) Find the "meeting" on your calendar for when you are sheriff. Click on it so you can edit the details. (Make sure you click the event on your calendar and not the event on the Sheriff Calendar.)
2) On the upper right, where Calendar indicates your response, choose "No" for "Are you coming?". Below that, where Guests are listed, click "Add Guest" and "invite" your replacement.
3) Finally, hit "Save" at the top.
4) Have your replacement repeat this process on their calendar for the days you are taking.
5) Edit the "Upcoming Sheriffs" list at the bottom of this page.
Troopers
Troopers know more about maintaining the buildbot masters and slaves themselves. There's no trooper schedule, but they're the people to look for when the bots need an OS update, a machine goes offline, checkouts are failing repeatedly, and so on.
Troopers: nsylvain, pamg, tc (tony), mmentovai, maruel, thestig (leiz)
Arborist Guidelines (Care of the Tree)
The overriding rule is "use your best judgment", but here are some general guidelines.
Regular watering
- If a layout test becomes flaky while you're sheriff, please make sure it gets into the right test list file so the tree goes green again as soon as possible. When the tests are red, we miss additional regressions.
- To close the tree, visit http://chromium-status.appspot.com/ .
Problems and solutions
- A test went red: Tree maybe closed
  - If the cause is obvious (the FooShouldWork test broke, and someone just checked in changes to foo_utils.cc), the tree can stay open. Pester the person who checked in to fix their bug. If they can't fix it within 30-60 minutes, ask them to back out the change.
  - If the cause isn't obvious, close the tree. Ask everyone on the blamelist to help track it down and fix it or back it out.
- A test occasionally goes red: Tree open
  - This is usually the sign of a flaky test. Similar to the above point, if the change is obvious (FooShouldWork occasionally breaks, and someone just checked in changes to foo_utils.cc) you can pester the person who checked in to fix the flakiness.
  - If the change isn't obvious, keep the tree open but disable the test and file a bug. See below for details.
- Layout tests went red or got new regressions: Tree maybe closed
  - Layout tests are just like other kinds of tests, except that sometimes we file and mark their new failures rather than fixing them right away. See below for details.
- One category of bot fails to build or has a swarm of test failures: Tree closed
  - If all the debug, release, Vista, XP, etc. builds go red, close the tree until they're fixed and have cycled once.
- One bot went red: Tree open
  - If only one buildbot is having problems (can't update, can't compile, exploding in some other way), the tree can stay open while it's fixed. We have reasonable redundant coverage now.
- An update failed: Tree maybe closed
  - Try again. If it keeps failing or gives a worrisome error, contact a trooper.
- "extract build" fails: Tree open
  - Try again. If it does not work the second time, contact a trooper.
- "download build" fails: Tree open
  - Sometimes this fails because a tester tried to download a build just as the builder was trying to upload a new one. Try again. If it does not work the second time, contact a trooper.
- Small insects crawling on stems and leaves seem to be eating sap: Tree infested
  - The tree probably has aphids. Release ladybugs nearby to eat them.
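Leaving the aphids aside, the guidelines above amount to a lookup from problem to tree state. The sketch below is only an illustration of that reading: the problem keys, the TreeState enum and the recommended_state function are hypothetical names, not part of the buildbot or any sheriff tooling, and "use your best judgment" still overrides the table.

```python
from enum import Enum


class TreeState(Enum):
    OPEN = "open"
    MAYBE_CLOSED = "maybe closed"
    CLOSED = "closed"


# Hypothetical problem keys; the states mirror the guideline headings above.
GUIDELINES = {
    "test_red": TreeState.MAYBE_CLOSED,
    "test_flaky": TreeState.OPEN,
    "layout_tests_red": TreeState.MAYBE_CLOSED,
    "bot_category_red": TreeState.CLOSED,
    "single_bot_red": TreeState.OPEN,
    "update_failed": TreeState.MAYBE_CLOSED,
    "extract_build_failed": TreeState.OPEN,
    "download_build_failed": TreeState.OPEN,
}


def recommended_state(problem, cause_obvious=False):
    """Return the tree state the guidelines suggest for a problem.

    For a red test, the guidelines resolve "maybe closed" by whether the
    cause is obvious: stay open if it is, close the tree if it is not.
    Every other entry is returned exactly as listed in the table.
    """
    state = GUIDELINES[problem]
    if problem == "test_red":
        return TreeState.OPEN if cause_obvious else TreeState.CLOSED
    return state


print(recommended_state("test_red", cause_obvious=True).value)
print(recommended_state("bot_category_red").value)
```

The remaining "maybe closed" entries are left unresolved on purpose, since the guidelines leave them to the sheriff.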
Backing out changes, the easy way
In a temporary directory, run
See this thread for more information.
Backing out changes, the slightly harder way
You can back out any change given the Subversion revision number of the change to be backed out. First, make sure your working copy is up to date:
gclient sync
Automatic reverting
To revert a CL, run revert REV. For example, to back out change 1234, run:
revert 1234
Then commit your change by running gcl commit revertREV:
gcl commit revert1234
(possibly with --force if the tree is closed)
Manual reverting
Now, reverse-merge the change to be backed out. For each file modified in the change, run svn merge -c -REV FILE. Note that there's a minus sign before the revision number, indicating “reverse merge.” For example, to back out change 1234, which modified foo.cc and foo.h, run:
svn merge -c -1234 foo.cc
svn merge -c -1234 foo.h
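When a change touches many files, the per-file commands can be generated rather than typed. The following is a minimal sketch, assuming you already know the revision number and the list of modified files. The function name reverse_merge_commands is hypothetical, and the code only builds the argument lists without running svn.

```python
def reverse_merge_commands(revision, files):
    """Build one `svn merge -c -REV FILE` argument list per file.

    The minus sign in front of the revision number is what makes each
    command a reverse merge.
    """
    if revision <= 0:
        raise ValueError("revision must be a positive integer")
    return [["svn", "merge", "-c", f"-{revision}", path] for path in files]


for cmd in reverse_merge_commands(1234, ["foo.cc", "foo.h"]):
    print(" ".join(cmd))
# prints:
# svn merge -c -1234 foo.cc
# svn merge -c -1234 foo.h
```

Each argument list could be handed to subprocess.run from inside the working copy, but review the printed commands first.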
To catch all the files in a single command, you can instead start at the root of the tree and run:
svn merge -c -1234 .
Once you've reverse-merged the changes, you can use gcl to create a changelist to back out r1234. When backing something out to fix a red-hot closed tree, it’s acceptable to commit the change as a TBR (To Be Reviewed).
Disabling tests
- Head over to www.crbug.com and file a bug. Make sure to include sample output from the flaky test.
- Find the flaky test and prefix DISABLED_ to the test case name:
- TEST(FooTest, FooShouldWork) becomes TEST(FooTest, DISABLED_FooShouldWork)
- Add a comment above the test with a link to the bug on www.crbug.com
- In the change description add the line "BUG=xyz" where xyz is the bug number
- For an excellent example take a look at the following:
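Separately from that example, the rename and comment steps above can be sketched in Python. This is an illustration only, not an existing Chromium tool: the helper name disable_test and the bug URL are hypothetical. Handling TEST_F is an assumption beyond the text above, which only mentions TEST.

```python
import re

# Matches declarations such as TEST(FooTest, FooShouldWork) or TEST_F(...).
_TEST_DECL = re.compile(r"\b(TEST(?:_F)?)\(\s*(\w+)\s*,\s*(\w+)\s*\)")


def disable_test(source, fixture, name, bug_url):
    """Prefix DISABLED_ to one test case and add a comment linking the bug.

    Declarations that do not match the given fixture and name, including
    ones that are already disabled, are returned unchanged.
    """
    def rewrite(match):
        macro, found_fixture, found_name = match.groups()
        if found_fixture != fixture or found_name != name:
            return match.group(0)
        return (f"// Disabled, see {bug_url}\n"
                f"{macro}({found_fixture}, DISABLED_{found_name})")

    return _TEST_DECL.sub(rewrite, source)


before = "TEST(FooTest, FooShouldWork) {\n}\n"
print(disable_test(before, "FooTest", "FooShouldWork", "http://www.crbug.com/xyz"))
```

The change description still needs its "BUG=xyz" line; the helper only touches the source text.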
Layout tests (run_webkit_tests)
- If a change makes layout tests start failing, it should be fixed promptly or backed out, just like any other failing test.
- If a change fixes a bunch of tests but makes others start failing, it can stay in, but you should mark those tests as failing so they don't keep the tree red.
- File a bug saying "The following layout tests started failing in r1234".
- Edit webkit\tools\layout_tests\test_expectations.txt. Add the failing tests. Instructions for this action are at the top of the text file.
- You should run run_webkit_tests --lint-test-files before checking in changes to test_expectations. IMPORTANT: The script fails if you don't have the layout tests checked out, so make sure your .gclient file doesn't exclude them.
- If you notice any flaky tests, file bugs for them and add them with an appropriate comment to the test list.
Suppressing other tests
If you suppress anything new, please file a bug.
- valgrind: Suppression files for unit tests are in chrome/test/data/valgrind. For example, see ui_tests.gtest.txt and ui_tests.gtest_mac.txt. The suppression file for the valgrind tests is tools/valgrind/memcheck/suppressions.txt. More information on valgrind and chromium is here: http://www.chromium.org/developers/how-tos/using-valgrind .
PMO shadow-a-sheriff tasks
- Monitor tests + performance aggressively. Make sure that the sheriffs are closing the trees quickly for test/performance regressions. We should monitor the distributed reliability tests and the page cycler performance tests primarily.
- Monitor checkins a few times a day for bug fixes and other important features that need to be integrated into the release branch, and update the spreadsheet accordingly. When in doubt, follow up with the developer.
- At the end of each week debrief your successor.
LKGR stopped updating
Schedule
Authoritative
The authoritative list is on the Google Calendar. The calendar widget is somewhat broken, so I've removed it. The easiest thing is to sign in to Google Calendar and, where it says "Other calendars - Add a friend's calendar", add google.com_r6oah4kurfoe0i3kee16kaitq0@group.calendar.google.com. The calendar will now show up for you, and you can click an event to see who is sheriff for that day (the guest list). Yes, it would be nice if it showed the people in the event title, but then there's the issue of the event title and the guest list getting out of sync -- no easy answer. See "How to swap" above if you need to swap with someone.
Upcoming sheriffs (not authoritative)
This list is meant to be a quick reference, but it is not kept up to date and is not authoritative. The authoritative list is the calendar above. The list below represents what people were scheduled for, and may not be accurate if people have "traded" slots. Please see "How to swap" above if you need to swap with someone.
- 2009-07-09 to 2009-07-10: ericroman huanr
- 2009-07-11 to 2009-07-14: stoyan mhm* (mhm is an external contributor, taking 5P ET ->.)
- 2009-07-15 to 2009-07-16: agl sverrir
- 2009-07-17 to 2009-07-20: jabdelmalek maruel
- 2009-07-21 to 2009-07-22: tommi estade
- 2009-07-23 to 2009-07-24: yuzo joshia
- 2009-07-27 to 2009-07-28: robertshield ajwong
- 2009-07-29 to 2009-07-30: avi, idanan
- 2009-07-31 to 2009-08-03: thomasvl aa
- 2009-08-04 to 2009-08-05: jar hamaji
- 2009-08-06 to 2009-08-07: timsteele pinkerton
- 2009-08-10 to 2009-08-11: sgk deanm
- 2009-08-12 to 2009-08-13: cmp sidchat
- 2009-08-14 to 2009-08-17: maruel ericroman
- 2009-08-18 to 2009-08-19: cevans hbono
- 2009-08-20 to 2009-08-21: brg brettw
- 2009-08-24 to 2009-08-25: ncarter yutak
- 2009-08-26 to 2009-08-27: rvargas wtc
- 2009-08-28 to 2009-08-31: ukai skylined
- 2009-09-01 to 2009-09-02: dglazkov rohitrao
- 2009-09-03 to 2009-09-04: yusukes asargent
- 2009-09-07 to 2009-09-08: tigerf arv
- 2009-09-09 to 2009-09-10: rahulk mmentovai
- 2009-09-11 to 2009-09-14: shess mhm* (mhm is an external contributor, taking 5P ET ->.)
- 2009-09-15 to 2009-09-16: tyoshino agl
- 2009-09-17 to 2009-09-18: dkegel mbelshe
- 2009-09-21 to 2009-09-22: stuartmorgan cira
- 2009-09-23 to 2009-09-24: jungshik davemoore
- 2009-09-25 to 2009-09-28: senorblanco thestig
- 2009-09-29 to 2009-09-30: jrg huanr
- 2009-10-01 to 2009-10-02: erg cpu
- 2009-10-05 to 2009-10-06: pkasting mattm
- 2009-10-07 to 2009-10-08: zork jhawkins
- 2009-10-09 to 2009-10-12: mmoss michaeln
- 2009-10-13 to 2009-10-14: xji finnur
- 2009-10-15 to 2009-10-16: munjal & TBD
- 2009-10-19 to 2009-10-20: tkent awalker
- 2009-10-21 to 2009-10-22: dumi nsylvain
- 2009-10-23 to 2009-10-26: ojan glen
- 2009-10-27 to 2009-10-28: japhet markus
- 2009-10-29 to 2009-10-30: mirandac sidchat
- 2009-11-02 to 2009-11-03: rafaelw slightlyoff
- 2009-11-04 to 2009-11-05: beng dpranke
- 2009-11-06 to 2009-11-09: willchan iyengar
- 2009-11-10 to 2009-11-11: timsteele darin
- 2009-11-12 to 2009-11-13: levin scherkus
- 2009-11-16 to 2009-11-17: idana erikkay
- 2009-11-18 to 2009-11-19: lzheng tc
- 2009-11-20 to 2009-11-23: jcampan mpcomplete
- 2009-11-24 to 2009-11-25: playmobil jparent
- 2009-11-26 to 2009-11-27: jhaas jennb