Update Sheriff docs under chromium/src/docs
* Updated Trunk Sheriffing, Branch Sheriffing, Perf Regression Sheriffing and Perf Bot Sheriffing
* Add links to internal documentation under go/chrome-sheriffing (consolidated internal docs location)

Change-Id: I074a8e84d8b9d1535f5999eb5f2ff0e055713f71
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3068176
Auto-Submit: Eric Foo <efoo@chromium.org>
Commit-Queue: Dirk Pranke <dpranke@google.com>
Reviewed-by: Dirk Pranke <dpranke@google.com>
Cr-Commit-Position: refs/heads/master@{#908006}

committed by
Chromium LUCI CQ

parent
da311bbf29
commit
da089b50f9
docs
@ -1,145 +1,18 @@
|
||||
# Chromium Branch Sheriffing
|
||||
|
||||
This document describes how to be a Chromium *branch* sheriff and how sheriffing
|
||||
on a branch differs from sheriffing on trunk. For trunk sheriffing guidance, see
|
||||
[//docs/sheriff.md][sheriff-md].
|
||||
|
||||
[TOC]
|
||||
|
||||
## Philosophy
|
||||
The Chrome release branch sheriff provides coverage for release branches
|
||||
(stable and beta) under Pacific timezone shifts.
|
||||
|
||||
The goals of a branch sheriff are quite similar to those of a trunk sheriff.
|
||||
Branch sheriffs need to ensure that:
|
||||
|
||||
1. **Compile failures get fixed**, because compile failures on branches block
|
||||
all tests (both automated and manual) and consequently reduce our confidence
|
||||
in the quality of what we're shipping, possibly to the point of blocking
|
||||
releases.
|
||||
2. **Consistent test failures get repaired**, because they similarly reduce
|
||||
our confidence in the quality of our code.
|
||||
1. **Compile failures get fixed**, because compile failures on branches block
|
||||
all tests (both automated and manual) and consequently reduce our confidence
|
||||
in the quality of what we're shipping, possibly to the point of blocking
|
||||
releases.
|
||||
2. **Consistent test failures get repaired**, because they similarly reduce our
|
||||
confidence in the quality of our code.
|
||||
|
||||
**Communication** is important for sheriffs in general, but it's particularly
|
||||
important for branch sheriffs. Over the course of your shift, you may need to
|
||||
coordinate with trunk sheriffs, troopers, release TPMs, and others -- don't
|
||||
hesitate to do so, particularly if you have questions.
|
||||
|
||||
Points of contact (i.e. platform-specific sheriffs) can be found
|
||||
[here](http://goto.google.com/chrome-branch-sheriffing#points-of-contact).
|
||||
|
||||
## Processes
|
||||
|
||||
In general, you'll want to follow the same processes outlined in
|
||||
[//docs/sheriff.md][sheriff-md]. There are some differences, though.
|
||||
|
||||
### Checkout

You'll need to ensure that your checkout is configured to check out the branch
heads. You can do so by running

```
src $ gclient sync --with_branch_heads
```

> This only needs to be done once, though running it more than once won't hurt.

You may also need to run:

```
src $ git fetch
```

Once you've done that, you'll be able to check out branches:

```
src $ git checkout branch-heads/$BRANCH_NUMBER # e.g. branch-heads/4044 for M81
src $ gclient sync
```

To determine the appropriate branch number, you can either use
[chromiumdash](#chromiumdash) or check [milestone.json][milestone-json]
directly.
|
||||
|
||||
### Findit
|
||||
|
||||
As FindIt is not available on branches, one way to try to find culprits is using
|
||||
`git bisect` locally, then uploading changes to a Gerrit CL and running the needed
|
||||
trybots to check. This is especially useful when the errors are not reproducible
|
||||
on your local builds or you don't have the required hardware to build the failed
|
||||
tests.
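
As a minimal sketch of the local `git bisect` step described above, the following script builds a throwaway repository and lets `git bisect run` find the commit that introduced a failure. Everything about the repository is an assumption made for illustration: the file names, the commit messages, and the `grep` check that stands in for a real build or test command. It does not touch a chromium/src checkout.

```shell
#!/bin/sh
# Toy repository standing in for a branch checkout (hypothetical, not chromium/src).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "sheriff@example.com"
git config user.name "Branch Sheriff"

# Create five commits; the fourth one introduces the "failure".
for i in 1 2 3 4 5; do
  if [ "$i" -ge 4 ]; then echo broken > value.txt; else echo ok > value.txt; fi
  echo "$i" > counter.txt
  git add value.txt counter.txt
  git commit -q -m "commit $i"
  if [ "$i" -eq 1 ]; then good=$(git rev-parse HEAD); fi      # known-good revision
  if [ "$i" -eq 4 ]; then expected=$(git rev-parse HEAD); fi  # what bisect should find
done

# Mark HEAD as bad and the first commit as good, then let git drive the search.
# The command passed to `git bisect run` must exit 0 on good revisions and
# non-zero on bad ones; here a grep stands in for a real test.
git bisect start HEAD "$good" >/dev/null
git bisect run sh -c 'grep -q "^ok$" value.txt' >/dev/null

# After a successful run, refs/bisect/bad points at the first bad commit.
culprit=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $(git log -1 --format=%s "$culprit")"
```

On a real branch the check command would be whatever build or test reproduces the failure; when it cannot be reproduced locally, each candidate revision can instead be judged by hand with `git bisect good` or `git bisect bad` after the trybot results come back.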
|
||||
|
||||
### Flaky tests
|
||||
|
||||
Flaky tests that are disabled on trunk should also be disabled on any branches
|
||||
with frequent failures of that test. If a trunk CL lands with no change other
|
||||
than to disable one or more tests ([example](https://crrev.com/c/2507299)) and
|
||||
it has an associated bug and the release manager is cc'd on the bug, you can and
|
||||
should cherrypick it to the affected branch without requesting merge approval.
|
||||
|
||||
On the other hand, if you believe that a flake was introduced by a cherry-pick
|
||||
to the branch in question and is not flaky on trunk, you will need to create a
|
||||
new CL to disable it only on the branch and go through the usual merge request
|
||||
process.
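
The first case above (a trunk change that only disables a test) amounts to a plain cherry-pick onto the branch. The sketch below shows the git mechanics in a throwaway repository. The branch name `release-branch`, the file `test_config.txt`, and the commit messages are hypothetical stand-ins, and the upload and review steps that chromium/src requires are not shown.

```shell
#!/bin/sh
# Toy repository (hypothetical): "trunk" is the default branch and
# "release-branch" stands in for branch-heads/$BRANCH_NUMBER.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "sheriff@example.com"
git config user.name "Branch Sheriff"

echo "test enabled" > test_config.txt
git add test_config.txt
git commit -q -m "base"
git branch release-branch

# A trunk change that does nothing except disable the flaky test.
echo "test disabled" > test_config.txt
git commit -q -am "Disable flaky FooTest.Bar"
disable_commit=$(git rev-parse HEAD)

# Cherry-pick it onto the branch. -x records the original commit hash in
# the new commit message, which keeps the link back to the trunk change.
git checkout -q release-branch
git cherry-pick -x "$disable_commit" >/dev/null

branch_state=$(cat test_config.txt)
picked_msg=$(git log -1 --format=%B)
echo "$picked_msg"
```

The second case (a flake that exists only on the branch) has no trunk commit to pick, which is why it needs a new CL and the usual merge request.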
|
||||
|
||||
Note: there is little value in merging changes to the stable release
|
||||
branch when the next milestone's stable release is less than a week away
|
||||
(since there are usually no planned stable respins at that point).
|
||||
You can find release dates on [chromiumdash][chromiumdash-schedule].
|
||||
|
||||
### Landing changes
|
||||
|
||||
When you need to land a change to a branch, you'll need to go through [the same
|
||||
merge approval process](./process/merge_request.md) as other cherry-picks (see
|
||||
exception for flaky tests above). You should feel free to ping the relevant
|
||||
release TPM as listed on [chromiumdash][chromiumdash-schedule].
|
||||
|
||||
## Tools
|
||||
|
||||
### Sheriff-o-Matic
|
||||
|
||||
Use the [branch SoM console][sheriff-o-matic] rather than the main chromium
|
||||
console.
|
||||
|
||||
### Consoles
|
||||
|
||||
Use the [beta][main-beta] and [stable][main-stable] branch consoles rather than
|
||||
the main console. A new console is created for each milestone. They are named
|
||||
"Chromium M## Console" and can be found under the
|
||||
[Chromium Project](https://ci.chromium.org/p/chromium).
|
||||
|
||||
### Monorail issues (crbug)
|
||||
|
||||
Refer to and use the
|
||||
[Sheriff-Chrome-Release label](https://bugs.chromium.org/p/chromium/issues/list?q=label%3ASheriff-Chrome-Release)
|
||||
to find and tag issues that are of importance to Branch sheriffs.
|
||||
|
||||
### Chromiumdash
|
||||
|
||||
[chromiumdash][chromiumdash] can help you determine the branch number for a
|
||||
particular milestone or channel, along with a host of other useful information:
|
||||
|
||||
* [Branches][chromiumdash-branches] lists the branches for each milestone.
|
||||
* [Releases][chromiumdash-releases] lists the builds currently shipping to
|
||||
each channel, which can help map from channel to milestone or to branch.
|
||||
* [Schedule][chromiumdash-schedule] lists the relevant dates for each
|
||||
milestone and includes the release TPMs responsible for each milestone by
|
||||
platform.
|
||||
|
||||
### Rotation
|
||||
|
||||
The current branch sheriff is listed [here][rotation-home]. The configuration
|
||||
and source of truth for the schedule lives [here][rotation-config]. To swap,
|
||||
simply send a CL changing the schedule at the bottom of the file.
|
||||
You can also use [Oncall Swapper](https://oncallswapper.corp.google.com/)
|
||||
to find the swap and submit the CL for you.
|
||||
|
||||
[chromiumdash]: https://chromiumdash.appspot.com
|
||||
[chromiumdash-branches]: https://chromiumdash.appspot.com/branches
|
||||
[chromiumdash-releases]: https://chromiumdash.appspot.com/releases
|
||||
[chromiumdash-schedule]: https://chromiumdash.appspot.com/schedule
|
||||
[main-beta]: https://ci.chromium.org/p/chromium/g/main-m81/console
|
||||
[main-stable]: https://ci.chromium.org/p/chromium/g/main-m80/console
|
||||
[milestone-json]: https://goto.google.com/chrome-milestone-json
|
||||
[rotation-home]: https://goto.google.com/chrome-branch-sheriff-amer-west
|
||||
[rotation-config]: https://goto.google.com/chrome-branch-sheriff-amer-west-config
|
||||
[sheriff-md]: /docs/sheriff.md
|
||||
[sheriff-o-matic]: https://sheriff-o-matic.appspot.com/chrome_browser_release
|
||||
For more information on Chromium Branch Sheriffs, including How Tos, Swapping
|
||||
Shifts and rotation updates, please see [Chromium
|
||||
Branch Sheriffing](http://goto.google.com/chrome-branch-sheriffing)
|
||||
|
@ -1,9 +1,5 @@
|
||||
# Chromium Sheriffing
|
||||
|
||||
Author: ellyjones@
|
||||
|
||||
## Sheriffing Philosophy
|
||||
|
||||
Sheriffs have one overarching role: to ensure that the Chromium build
|
||||
infrastructure is doing its job of helping developers deliver good software.
|
||||
Every other sheriff responsibility flows from that one. In priority order,
|
||||
@ -29,6 +25,9 @@ necessary authority to fulfill them. In particular, you have the authority to:
|
||||
|
||||
TBRs were removed in Q1 2021.
|
||||
|
||||
For more information on Chromium Trunk Sheriffs, including How Tos, Swapping
|
||||
Shifts and rotation updates, please see [Chromium Trunk Sheriffing](http://goto.google.com/chrome-trunk-sheriffing)
|
||||
|
||||
## How to be a Sheriff
|
||||
|
||||
To be a sheriff, you must be both a Chromium committer and a Google employee.
|
||||
|
@ -1,5 +1,7 @@
|
||||
# How to access and navigate test logs
|
||||
|
||||
**Important**: When making changes to this document, also update duplicate files under the [internal docs](http://goto.google.com/perf-bot-health-sheriffs).
|
||||
|
||||
When trying to understand a failure, it can be useful to inspect the test logs where the failure occurred.
|
||||
|
||||
[TOC]
|
||||
|
@ -1,5 +1,7 @@
|
||||
# How to address a new alert with the same root cause as an existing alert
|
||||
|
||||
**Important**: When making changes to this document, also update duplicate files under the [internal docs](http://goto.google.com/perf-bot-health-sheriffs).
|
||||
|
||||
It's common when large problems arise for multiple alerts to fire due to the same underlying problem. Sheriff-o-matic does its best to automatically group these problems into a single alert, but sometimes it's unable to and we have to group the alerts together manually. This is important because it helps future sheriffs see at a glance the number of distinct problems.
|
||||
|
||||
Unfortunately, there's no way to distinguish these duplicate alerts from new alerts without knowing the contents of those other alerts. If you're unsure about two particular alerts, don't hesitate to ask for help [on chat](https://hangouts.google.com/group/2GmiXjz55R2ixTXi1).
|
||||
|
@ -1,5 +1,7 @@
|
||||
# How to disable a failing test/story on the perf waterfall
|
||||
|
||||
**Important**: When making changes to this document, also update duplicate files under the [internal docs](http://goto.google.com/perf-bot-health-sheriffs).
|
||||
|
||||
To disable a failing test/story, the first step is to figure
|
||||
out if the failing thing is gtest or Telemetry, then you can
|
||||
follow the below directions to disable the failing test/story.
|
||||
|
@ -1,5 +1,7 @@
|
||||
# How to follow up on an alert
|
||||
|
||||
**Important**: When making changes to this document, also update duplicate files under the [internal docs](http://goto.google.com/perf-bot-health-sheriffs).
|
||||
|
||||
[TOC]
|
||||
|
||||
Skim the bug to understand where the last sheriff left things and where you should pick up.
|
||||
|
@ -1,5 +1,7 @@
|
||||
# How to handle an alert for a new problem
|
||||
|
||||
**Important**: When making changes to this document, also update duplicate files under the [internal docs](http://goto.google.com/perf-bot-health-sheriffs).
|
||||
|
||||
**Warning: this is the hardest part of being a sheriff.**
|
||||
|
||||
Each bug may take 10 minutes to an hour to address, but there are usually a manageable number of new bugs per shift (5 on a good shift, 15 on a bad one).
|
||||
|
@ -1,5 +1,7 @@
|
||||
# How to launch a functional bisect and interpret its results
|
||||
|
||||
**Important**: When making changes to this document, also update duplicate files under the [internal docs](http://goto.google.com/perf-bot-health-sheriffs).
|
||||
|
||||
A functional bisect determines the revision at which a particular benchmark or story started failing more often. It does this by doing a binary search between a known good and known bad revision, running the test multiple times at each potential revision until it narrows down the culprit to a single revision.
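
The binary search described above can be sketched as follows. This is an illustration under stated assumptions, not the bisect tool's actual implementation: `count_failures` is a hypothetical stand-in for "run the story `RUNS` times at a revision and count failures", the revision numbers are made up, and classifying a revision as bad when more than half of its runs fail is an arbitrary threshold chosen for the example.

```shell
#!/bin/sh
# Simulated data (assumption): revisions before TRUE_CULPRIT fail 1 run in 10,
# revisions at or after it fail 6 runs in 10.
RUNS=10
TRUE_CULPRIT=1012

# Hypothetical stand-in for running the test RUNS times at revision $1.
count_failures() {
  if [ "$1" -ge "$TRUE_CULPRIT" ]; then echo 6; else echo 1; fi
}

good=1000   # known good revision
bad=1016    # known bad revision
steps=0

# Invariant: $good is classified good and $bad is classified bad.
# Halve the range until the two are adjacent.
while [ $((bad - good)) -gt 1 ]; do
  mid=$(( (good + bad) / 2 ))
  failures=$(count_failures "$mid")
  if [ $((failures * 2)) -gt "$RUNS" ]; then
    bad=$mid    # fails more often than not: culprit is at or before mid
  else
    good=$mid   # mostly passing: culprit is after mid
  fi
  steps=$((steps + 1))
done

culprit=$bad
echo "first bad revision: $culprit (found in $steps steps)"
```

Running each revision several times matters because a single pass or fail at a flaky revision can send the search down the wrong half of the range.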
|
||||
|
||||
[TOC]
|
||||
|
@ -1,5 +1,7 @@
|
||||
# How to snooze an alert
|
||||
|
||||
**Important**: When making changes to this document, also update duplicate files under the [internal docs](http://goto.google.com/perf-bot-health-sheriffs).
|
||||
|
||||
After addressing an alert, the next step is to snooze it.
|
||||
|
||||
Snoozing an alert hides the alert, moving it to a collapsed section at the bottom of the "Consistent alerts" section until the specified time has expired. This acts as a signal to yourself and other sheriffs that no further action is necessary until the alert becomes unsnoozed.
|
||||
|
@ -1,18 +1,14 @@
|
||||
# Perf bot health sheriff rotation
|
||||
# Perf Bot Health Sheriff
|
||||
|
||||
## Warning
|
||||
The goal of the perf bot health sheriff rotation is to ensure that the benchmarks running on our perf waterfall continue to produce data and catch regressions quickly. This is also known as "keeping the bots green" and is primarily achieved by triaging incoming alerts. Note that a different rotation [Perf Regressions Sheriffs](../perf_regression_sheriffing.md) is focused on performance.
|
||||
|
||||
**Note that Sheriff-O-Matic currently doesn't work for the perf waterfall
|
||||
[crbug.com/984159](https://crbug.com/984159).
|
||||
Please use [Milo chrome.perf
|
||||
console](https://ci.chromium.org/p/chrome/g/chrome.perf/console) instead.**
|
||||
|
||||
## Goal
|
||||
|
||||
The goal of the perf bot health sheriff rotation is to ensure that the benchmarks running on our perf waterfall continue to produce data and catch regressions quickly. This is also known as "keeping the bots green" and is primarily achieved by triaging incoming alerts.
|
||||
For more information on Perf Bot Health Sheriffing, who's on rotation, how to handle specific
|
||||
tasks, and swap shifts, please see [Perf Bot Health
|
||||
Sheriffs](http://goto.google.com/perf-bot-health-sheriffs)
|
||||
|
||||
## Quick links
|
||||
|
||||
* [Perf Bot Health Sheriffing Overview and How-To](http://goto.google.com/perf-bot-health-sheriffs)
|
||||
* [How to determine what story is failing](https://chromium.googlesource.com/chromium/src/+/main/docs/speed/bot_health_sheriffing/what_test_is_failing.md)
|
||||
* [How to disable a story](https://chromium.googlesource.com/chromium/src/+/main/docs/speed/bot_health_sheriffing/how_to_disable_a_story.md)
|
||||
* [How to launch a functional bisect](https://chromium.googlesource.com/chromium/src/+/main/docs/speed/bot_health_sheriffing/how_to_launch_a_functional_bisect.md)
|
||||
@ -21,102 +17,3 @@ The goal of the perf bot health sheriff rotation is to ensure that the benchmark
|
||||
* [How to handle a new problem](https://chromium.googlesource.com/chromium/src/+/main/docs/speed/bot_health_sheriffing/how_to_handle_a_new_problem.md)
|
||||
* [How to follow up on an alert](https://chromium.googlesource.com/chromium/src/+/main/docs/speed/bot_health_sheriffing/how_to_follow_up_on_an_alert.md)
|
||||
* [How to address duplicate alerts](https://chromium.googlesource.com/chromium/src/+/main/docs/speed/bot_health_sheriffing/how_to_address_duplicate_alerts.md)
|
||||
* [Glossary](https://chromium.googlesource.com/chromium/src/+/main/docs/speed/bot_health_sheriffing/glossary.md)
|
||||
|
||||
[TOC]
|
||||
|
||||
## Vocabulary
|
||||
|
||||
Definitions of various bot health related vocabulary can be found in our [glossary](https://chromium.googlesource.com/chromium/src/+/main/docs/speed/bot_health_sheriffing/glossary.md).
|
||||
|
||||
## High-level responsibilities
|
||||
|
||||
The sheriff's role is to work through the list of failures, fixing the easiest ones and routing the rest to the correct owners. This mostly requires filing bugs, disabling benchmarks and stories, launching bisects, and reverting any CLs that are obviously responsible for breakages.
|
||||
|
||||
Additionally, the sheriff should watch the [catapult
|
||||
roll](https://autoroll.skia.org/r/catapult-autoroll), which should
|
||||
automatically TBR the sheriff. If the catapult roll fails, the sheriff should
|
||||
investigate and revert suspect changelists.
|
||||
|
||||
Near the end of their shift, sheriffs should also inspect [this dashboard](https://dashboards.corp.google.com/_e3cbeb60_d250_4e67_8795_56cd9af8a303) for the time covered during their shift, and do a first-pass analysis of any anomalies (e.g. jobs taking 6 hours when they normally take 1.5).
|
||||
|
||||
The sheriff should *not* feel responsible for investigating hard problems. The volume of incoming alerts makes this infeasible. Instead, they should delegate deep investigations to the right owners. As a rule of thumb, a trained sheriff should expect to spend 10-20 minutes per alert and should never be spending more than an hour per alert.
|
||||
|
||||
## Workflow
|
||||
|
||||
~~Incoming failures are shown in [Sheriff-o-matic](https://sheriff-o-matic.appspot.com/chromium.perf), which acts as a task management system for bot health sheriffs. Failures are divided into three groups on the dashboard:~~
|
||||
|
||||
* ~~**Infra failures** show general infrastructure problems that are affecting benchmarks. Besides surfacing in Sheriff-o-matic, we also need to check for down bots in the lame duck pool. Please file a ticket for any bots you see in [this list](https://chrome-swarming.appspot.com/botlist?c=id&c=os&c=task&c=status&c=os&c=task&c=status&c=pool&f=status%3Adead&f=pool%3Achrome.tests.perf&l=100&q=pool%3Achrome.tests.perf&s=id%3Aasc) or [this list for webview](https://chrome-swarming.appspot.com/botlist?c=id&c=os&c=task&c=status&c=os&c=task&c=status&c=pool&f=status%3Adead&f=pool%3Achrome.tests.perf-webview&l=100&q=pool%3Achrome.tests.perf&s=id%3Aasc) as they will not show up in Sheriff-o-matic.~~
|
||||
|
||||
* ~~**Consistent failures** show benchmarks that have been failing for a while.~~
|
||||
|
||||
* ~~**New failures** show benchmarks that have recently started failing.~~
|
||||
|
||||
~~Of these three groups, the sheriff should only be concerned with **infra failures** and **consistent failures.** New failures are too likely to be one-off flakes to warrant investigation.~~
|
||||
|
||||
~~The high-level workflow is to start at the top of the list of failures and address one alert at a time. The alerts are ordered roughly in order of their impact.~~
|
||||
|
||||
~~As the sheriff addresses alerts, the number of alerts will generally decrease as problems with the same cause get grouped together and failures get fixed. Addressed alerts will also move to the bottom of the list. Ideally, Sheriff-o-matic should reflect the work you've done so that a new sheriff could potentially take over at any time and pick up at the top of the list.~~
|
||||
|
||||
**Note that Sheriff-O-Matic currently doesn't work for the perf waterfall
|
||||
[crbug.com/984159](https://crbug.com/984159).
|
||||
Please use [Milo chrome.perf
|
||||
console](https://ci.chromium.org/p/chrome/g/chrome.perf/console) instead.**
|
||||
|
||||
## How to address each alert
|
||||
|
||||
Alerts can be addressed by answering the following questions:
|
||||
|
||||
### Has a previous sheriff already addressed this alert?
|
||||
|
||||
This category of alert should have a bug already linked with it. This link can be found next to the alert.
|
||||
|
||||

|
||||
|
||||
Instructions can be found [here](https://chromium.googlesource.com/chromium/src/+/main/docs/speed/bot_health_sheriffing/how_to_follow_up_on_an_alert.md) on how to follow up with an existing alert.
|
||||
|
||||
### Is this a new alert caused by the same root cause as an already-triaged alert?
|
||||
|
||||
This category of alert won't have a bug linked with it yet. However, a bug *does* exist for the issue: it may be linked to another alert, but can otherwise be found [here](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=label:Performance-Sheriff-BotHealth&sort=pri&colspec=ID%20Pri%20M%20Stars%20ReleaseBlock%20Component%20Status%20Owner%20Summary%20OS%20Modified) under the Performance-Sheriff-BotHealth label in monorail. For example:
|
||||
|
||||

|
||||
|
||||
and
|
||||
|
||||

|
||||
|
||||
are both in the list of current alerts but represent the same failure.
|
||||
|
||||
It can sometimes be tricky to differentiate between these alerts and ones caused by completely new problems, but sheriffs can always treat an alert as new and merge it with another later.
|
||||
|
||||
Instructions can be found [here](https://chromium.googlesource.com/chromium/src/+/main/docs/speed/bot_health_sheriffing/how_to_address_duplicate_alerts.md) on how to handle a duplicate alert.
|
||||
|
||||
### Is this a new alert caused by a new problem?
|
||||
|
||||
This category of alert doesn't yet have a bug associated with it. It's the most common category and requires the most expertise to handle.
|
||||
|
||||
Instructions can be found [here](https://chromium.googlesource.com/chromium/src/+/main/docs/speed/bot_health_sheriffing/how_to_handle_a_new_problem.md) on how to handle an alert for a new problem.
|
||||
|
||||
## After your shift is over
|
||||
|
||||
Your only responsibility after your shift concludes is to follow up with any bugs that would no longer appear on the dashboard (i.e. the failure has stopped) but still need correct routing.
|
||||
|
||||
For example, if you disabled a story and snoozed an alert during your shift, you should ensure that the bug is assigned to the benchmark's owner before relinquishing responsibility for the bug.
|
||||
|
||||
## Frequently asked questions
|
||||
|
||||
### Why do the benchmarks break so often?
|
||||
|
||||
The bots run Chrome benchmarks that are complicated integration tests of Chrome. Developers frequently submit code that breaks some part of Chrome and one of our integration tests (hopefully) tests that bit of code, resulting in a broken benchmark. In some sense, frequent breakages indicate that the benchmarks are working.
|
||||
|
||||
Many breakages probably *aren't* good signs, though. If you have ideas on how to reduce the number of breakages or the work required to handle a breakage, submit your idea to the Chrome benchmarking group!
|
||||
|
||||
### Do I have to use Sheriff-o-matic?
|
||||
|
||||
Yes! Sheriff-o-matic allows us to smoothly hand off responsibility between sheriffs and allows us to standardize sheriffing.
|
||||
|
||||
If you find a problem with Sheriff-o-matic or have a feature request, file a bug [here](https://bugs.chromium.org/p/chromium/issues/entry?template=Build%20Infrastructure&components=Infra%3ESheriffing%3ESheriffOMatic&labels=Pri-2,Infra-DX&cc=seanmccullough@chromium.org,martiniss@chromium.org,zhangtiff@chromium.org&comment=Problem+with+Sheriff-o-Matic). The team is usually very responsive and, because of their work, the tool is getting better every day.
|
||||
|
||||
### How can I tell if I've done a good job?
|
||||
|
||||
It can be hard to tell. Generally, a good goal is to try and have fewer alerts when your shift ends than when it began. Sometimes that isn't possible, though.
|
||||
|
@ -1,5 +1,7 @@
|
||||
# How to determine what story is failing
|
||||
|
||||
**Important**: When making changes to this document, also update duplicate files under the [internal docs](http://goto.google.com/perf-bot-health-sheriffs).
|
||||
|
||||
The first step in addressing a test failure is to identify what stories are failing.
|
||||
|
||||
The easiest way to identify these is to use the [Flakiness dashboard](https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=blink_perf.layout), which is a high-level dashboard showing test passes and failures. (Sheriff-o-matic tries to automatically identify the failing stories, but is often incorrect and therefore can't be trusted.) Open up the flakiness dashboard and select the benchmark and platform in question (pulled from the SOM alert) from the "Test type" and "Builder" dropdowns. You should see a view like this:
|
||||
|
@ -1,4 +1,4 @@
|
||||
# Perf Regression Sheriffing (go/perfregression-sheriff)
|
||||
# Perf Regression Sheriffing
|
||||
|
||||
The perf regression sheriff tracks performance regressions in Chrome's
|
||||
continuous integration tests. Note that a [different
|
||||
@ -6,95 +6,12 @@ rotation](perf_bot_sheriffing.md) has been created to ensure the builds and
|
||||
tests stay green, so the perf regression sheriff role is now entirely focused
|
||||
on performance.
|
||||
|
||||
**[Rotation calendar](https://calendar.google.com/calendar/embed?src=google.com_2fpmo740pd1unrui9d7cgpbg2k%40group.calendar.google.com)**
|
||||
Key responsibilities include:
|
||||
|
||||
## Key Responsibilities
|
||||
* Addressing bugs that need attention
|
||||
* Follow up on Performance Regressions
|
||||
* Give Feedback on our Infrastructure
|
||||
|
||||
* [Address bugs needing attention](#Address-bugs-needing-attention)
|
||||
|
||||
* [Follow up on Performance Regressions](#Follow-up-on-Performance-Regressions)
|
||||
|
||||
* [Give Feedback on our Infrastructure](#Give-Feedback-on-our-Infrastructure)
|
||||
|
||||
## Address bugs needing attention
|
||||
|
||||
NOTE: Ensure that you're signed into Monorail.
|
||||
|
||||
Use [this Monorail query](https://bugs.chromium.org/p/chromium/issues/list?sort=modified&q=label%3AChromeperf-Sheriff-NeedsAttention%2CChromeperf-Auto-NeedsAttention%20-has%3Aowner&can=2)
|
||||
to find automatically triaged issues which need attention.
|
||||
|
||||
NOTE: If the list of issues that need attention is empty, please jump ahead to
|
||||
[Follow up on Performance Regressions](#Follow-up-on-Performance-Regressions).
|
||||
|
||||
Issues in the list will include automatically filed and bisected regressions
|
||||
that are supported by the Chromium Perf Sheriff rotation. For each of the
|
||||
issues:
|
||||
|
||||
1. Determine the cause of the failure:
|
||||
|
||||
* If it's Pinpoint failing to find a culprit, consider re-running the
|
||||
failing Pinpoint job.
|
||||
|
||||
* If it's the Chromeperf Dashboard failing to start a Pinpoint bisection,
|
||||
consider running a bisection from the grouped alerts. The issue
|
||||
description should have a link to the group of anomalies associated with
|
||||
the issue.
|
||||
|
||||
* If this was a manual escalation (e.g. a suspected culprit author put the
|
||||
`Chromeperf-Sheriff-NeedsAttention` label to seek help) use the tools at
|
||||
your disposal, like:
|
||||
|
||||
* Retry the most recent Pinpoint job, potentially changing the parameters.
|
||||
|
||||
* Inspect the results of the Pinpoint job associated with the issues and
|
||||
decide whether this could be noise.
|
||||
|
||||
* In cases where it's unclear what should be done next, escalate the issue
|
||||
to the Chrome Speed Tooling team by adding the `Speed>Bisection` component
|
||||
and leaving the issue `Untriaged` or `Unconfirmed`.
|
||||
|
||||
2. Remove the `Chromeperf-Sheriff-NeedsAttention` or
|
||||
`Chromeperf-Auto-NeedsAttention` label once you've acted on an issue.
|
||||
|
||||
**For alerts related to `resource_sizes`:** Refer to
|
||||
[apk_size_regressions.md](apk_size_regressions.md).
|
||||
|
||||
## Follow up on Performance Regressions
|
||||
|
||||
Please spend any spare time driving down bugs from the [regression
|
||||
backlog](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=Performance%3DSheriff+Type%3ABug+modified-before%3Atoday-6&sort=-modified).
|
||||
Treat these bugs as you would your own -- investigate the regressions, find out
|
||||
what the next step should be, and then move the bug along. Some possible next steps
|
||||
and questions to answer are:
|
||||
|
||||
* Should the bug be closed?
|
||||
* Are there questions that need to be answered?
|
||||
* Are there people that should be added to the CC list?
|
||||
* Is the correct owner assigned?
|
||||
|
||||
When a bug does need to be pinged, rather than adding a generic "ping", it's
|
||||
much more effective to include the username and action item.
|
||||
|
||||
You should aim to end your shift with an empty backlog, but it's important to
|
||||
still advance each bug in a meaningful way.
|
||||
|
||||
After your shift, please try to follow up on the bugs you filed weekly. Kick off
|
||||
new bisects if the previous ones failed, and if the bisect picks a likely
|
||||
culprit follow up to ensure the CL author addresses the problem. If you are
|
||||
certain that a specific CL caused a performance regression, and the author does
|
||||
not have an immediate plan to address the problem, please revert the CL.
|
||||
|
||||
## Give Feedback on our Infrastructure
|
||||
|
||||
Perf regression sheriffs have their eyes on the perf dashboard and bisects
|
||||
more than anyone else, and their feedback is invaluable for making sure these
|
||||
tools are accurate and improving them. Please file bugs and feature requests
|
||||
as you see them:
|
||||
|
||||
* **Perf Dashboard**: Please use the red "Report Issue" link in the navbar.
|
||||
* **Pinpoint**: If Pinpoint is identifying the wrong CL as culprit
|
||||
or missing a clear culprit, or not reproducing what appears to be a clear
|
||||
regression, please file an issue in crbug with the `Speed>Bisection`
|
||||
component.
|
||||
* **Noisy Tests**: Please file a bug in crbug with component `Speed>Benchmarks`
|
||||
and [cc the owner](http://go/perf-owners).
|
||||
For more information on these responsibilities, how to swap shifts and more,
|
||||
please see [Perf Regression
|
||||
Sheriffs](http://goto.google.com/chrome-perf-regression-sheriffing)
|
||||
|