
Adds documentation around the usage of the Slow expectation for GPU tests that use a heartbeat mechanism. Bug: 1354797 Change-Id: I7682d22aee258c09e2da9036403b23b4d097ea9d Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4355585 Commit-Queue: Yuly Novikov <ynovikov@chromium.org> Reviewed-by: Yuly Novikov <ynovikov@chromium.org> Auto-Submit: Brian Sheedy <bsheedy@chromium.org> Cr-Commit-Position: refs/heads/main@{#1120559}
299 lines
12 KiB
Markdown
299 lines
12 KiB
Markdown
# GPU Expectation Files
|
|
|
|
This file goes over details of the expectation files which are critical for
|
|
ensuring that GPU tests only run where they should and that flakes are
|
|
suppressed to avoid red bots.
|
|
|
|
[TOC]
|
|
|
|
## Overview
|
|
|
|
The GPU Telemetry-based integration tests (tests that use the
|
|
`telemetry_gpu_integration_test` target)
|
|
[utilize expectation files](gpu_expectations) in order to define when certain
|
|
tests should not be run or are expected to fail. The core expectation format is
|
|
defined by [typ](typ_expectations), although there are some Chromium-specific
|
|
extensions as well. Each expectation consists of the following fields, separated
|
|
by a space:
|
|
|
|
1. An optional bug identifier. While optional, it is heavily encouraged that GPU
|
|
expectations have this field filled.
|
|
1. A set of tags that the expectation applies to. This is technically optional,
|
|
as omitting tags will cause the expectation to be applied everywhere, but
|
|
there are very few, if any, instances where tags will not be specified for
|
|
GPU expectations.
|
|
1. The name of the test that the expectation applies to. A single wildcard (`*`)
|
|
character is allowed at the end of the string, but use of a wildcard anywhere
|
|
but the end of the string is an error.
|
|
1. A set of expected results for the test. This technically supports multiple
|
|
values, but for GPU purposes, it will always be a single value.
|
|
|
|
Additionally, comments are supported, which begin with `#`.
|
|
|
|
Thus, a sample expectation entry might look like:
|
|
|
|
```
|
|
# Flakes regularly but infrequently.
|
|
crbug.com/1234 [ win amd ] foo/test [ RetryOnFailure ]
|
|
```
|
|
|
|
[gpu_expectations]: https://chromium.googlesource.com/chromium/src/+/main/content/test/gpu/gpu_tests/test_expectations
|
|
[typ_expectations]: https://chromium.googlesource.com/catapult.git/+/main/third_party/typ/typ/expectations_parser.py
|
|
|
|
## Core Format
|
|
|
|
The following are further details on each of the parts of an expectation that
|
|
are part of the core expectation file format.
|
|
|
|
### Bug Identifier
|
|
|
|
An optional string(s) pointing to the bug(s) tracking the reason why the
|
|
expectation exists. For GPU uses, this is usually a single bug, but multiple
|
|
space-separated strings are supported.
|
|
|
|
The format of the string is enforced by [these](bug_regexes) regular
|
|
expressions, so CLs that introduce malformed bugs will not be submittable.
|
|
|
|
[bug_regexes]: https://chromium.googlesource.com/chromium/src/+/e26d89a52627f8910b79a95668dfa48e5fe8fa06/content/test/gpu/gpu_tests/test_expectations_unittest.py#66
|
|
|
|
### Tags
|
|
|
|
One or more tags are used to specify which configuration(s) an expectation
|
|
applies to. For GPU tests, this is often things such as the OS, the GPU vendor,
|
|
or the specific GPU model.
|
|
|
|
Tag sets are defined at the top of the expectation file using `# tags:`
|
|
comments. Each comment defines a different set of mutually exclusive tags, e.g.
|
|
all of the OS tags are in a single set. An expectation is only allowed to use
|
|
one tag from each set, but can use tags from an arbitrary number of sets. For
|
|
example, `[ win win10 ]` would be invalid since both are OS tags, but
|
|
`[ win amd release ]` would be valid since there is one tag each from the OS,
|
|
GPU, and browser type tag sets.
|
|
|
|
Additionally, tags used for expectations with the same test must be unambiguous
|
|
so that the same test cannot have multiple expectations applied to it at once.
|
|
Take the following expectations as an example:
|
|
|
|
```
|
|
[ mac intel ] foo/test [ Failure ]
|
|
[ mac debug ] foo/test [ RetryOnFailure ]
|
|
```
|
|
|
|
These expectations would be considered to be conflicting since `[ mac intel ]`
|
|
does not make any distinctions about the browser type, and `[ mac debug ]` does
|
|
not make any distinctions about the GPU type. As written, `foo/test` running
|
|
on a configuration that produced the `mac`, `intel`, and `debug` tags would try
|
|
to use both expectations.
|
|
|
|
This can be fixed by adding a tag from the same tag set but with a different
|
|
value so that the configurations are no longer ambiguous.
|
|
`[ mac intel release ]` would work since a configuration cannot be both
|
|
`release` and `debug` at the same time. Similarly, `[ mac amd debug ]` would
|
|
work since a configuration cannot be both `intel` and `amd` at the same time.
|
|
|
|
Such conflicts will be caught and reported by presubmit tests, so you should not
|
|
have to worry about accidentally landing bad expectations, but you will need to
|
|
fix any found conflicts before you can submit your CL.
|
|
|
|
#### Adding/Modifying Tags
|
|
|
|
Actually updating the test harness to generate new tags is out of scope for this
|
|
documentation. However, if a new tag needs to be added to an expectation file
|
|
or an existing one modified (e.g. renamed), it is important to note that the
|
|
tag header should not be manually modified in the expectation file itself.
|
|
|
|
Instead, modify the header in [validate_tag_consistency.py] and run
|
|
`validate_tag_consistency.py apply` to apply the new header to all expectation
|
|
files. This ensures that all files remain in sync.
|
|
|
|
Tag consistency is checked as part of presubmit, so it will be apparent if you
|
|
accidentally modify the tag header in a file directly.
|
|
|
|
[validate_tag_consistency.py]: https://chromium.googlesource.com/chromium/src/+/main/content/test/gpu/validate_tag_consistency.py
|
|
|
|
### Test Name
|
|
|
|
A single string with either a test name or part of a test name suffixed with a
|
|
wildcard character. Note that the test name is just the test case as reported
|
|
by the test harness, not the fully qualified name that is sometimes reported in
|
|
places such as the "Test Results" tab on bots.
|
|
|
|
As an example,
|
|
`gpu_tests.webgl1_conformance_integration_test.WebGL1ConformanceIntegrationTest.WebglExtension_EXT_blend_minmax`
|
|
is a fully qualified name, while `WebglExtension_EXT_blend_minmax` is what would
|
|
actually be used in the expectation file for the `webgl1_conformance` suite.
|
|
|
|
### Expected Results
|
|
|
|
Usually one, but potentially multiple, results that are expected on the
|
|
configuration that the expectation is for. Like tags, expected results are
|
|
defined at the top of each expectation file and have the same caveat about
|
|
addition/modification with the helper script. However, unlike tags, there is
|
|
only one set of values which are not expected to be added to/changed on any
|
|
sort of regular basis. The following expected results are used by GPU tests:
|
|
|
|
#### Skip
|
|
|
|
Skips the test entirely. The benefit of this is that no time is wasted on a bad
|
|
test. However, it also means that it is impossible to check if the test is still
|
|
failing or not by just looking at historical results. This is problematic for
|
|
humans, but even more problematic for scripts we have to automatically remove
|
|
expectations that are no longer needed.
|
|
|
|
As such, it is heavily discouraged to add new Skip expectations except under the
|
|
following circumstances:
|
|
|
|
1. The test is invalid on a configuration for some reason, e.g. a feature is not
|
|
and will not be supported on a certain OS, and so should never be run. These
|
|
sorts of expectations are expected to be permanent.
|
|
1. The act of running the test is significantly detrimental to other tests, e.g.
|
|
running the test kills the test device. These are expected to be temporary,
|
|
so the root cause should be fixed relatively quickly.
|
|
|
|
If presubmit thinks you are adding new Skip expectations, it will warn you, but
|
|
the warning can be ignored if the addition falls into one of the above
|
|
categories or it is a false positive, such as due to modifying tags on an
|
|
existing expectation.
|
|
|
|
#### Failure
|
|
|
|
Lets the test run normally, but hides the fact that it failed during result
|
|
reporting. This is the preferred way to suppress frequent failures on bots, as
|
|
it keeps the bots green while still reporting results that can be used later.
|
|
|
|
#### RetryOnFailure
|
|
|
|
Allows the test to be retried up to two additional times before being marked as
|
|
failing, as by default GPU tests do not retry on failure. This is preferred if
|
|
the test fails occasionally, but not enough to warrant marking it as failing
|
|
consistently.
|
|
|
|
#### Slow
|
|
|
|
Only has an effect in a subset of test suites. Currently, those are suites that
|
|
use a heartbeat mechanism instead of a fixed timeout:
|
|
|
|
* `webgpu_cts`
|
|
* `webgl1_conformance`
|
|
* `webgl2_conformance`
|
|
|
|
Since these tests use a relatively short timeout that gets refreshed as long as
|
|
the test does not hang, they are more susceptible to timeouts if the test does a
|
|
lot of work or other parallel tests are using a large number of resources. In
|
|
these cases, the `Slow` expectation can be used to increase the heartbeat
|
|
timeout for a test, reducing the chance that one of these timeouts is hit.
|
|
|
|
If the reported failure for a test is along the lines of "Timed out waiting for
|
|
websocket message", prefer to use a `Slow` expectation first over a `Failure` or
|
|
`RetryOnFailure` one.
|
|
|
|
## Extensions
|
|
|
|
In addition to the normal expectation functionality, Chromium has several
|
|
extensions to the expectation file format.
|
|
|
|
### Unexpected Pass Finder Annotations
|
|
|
|
Chromium has several unexpected pass finder scripts (sometimes called stale
|
|
expectation removers) to automatically reclaim test coverage by modifying
|
|
expectation files. These mostly work as intended, but can occasionally make
|
|
changes that don't align with what we actually want. Thus, there are several
|
|
annotations that can be inserted into expectation files to adjust the behavior
|
|
of these scripts.
|
|
|
|
#### Disable
|
|
|
|
There are several annotations that can be used to prevent the scripts from
|
|
automatically removing expectations. All of these start with `finder:disable`
|
|
with some suffix.
|
|
|
|
`finder:disable-general` prevents the expectation from being removed under any
|
|
circumstances.
|
|
|
|
`finder:disable-stale` prevents the expectation from being removed if it is
|
|
still applicable to at least one bot, but all queried results point to the
|
|
expectation no longer being needed. This is most likely to be used for
|
|
expectations for very infrequent flakes, where the flake might not occur within
|
|
the data range that we query.
|
|
|
|
`finder:disable-unused` prevents the expectation from being removed if it is
|
|
found to not be used on any bots, i.e. the specified configuration does not
|
|
appear to actually be tested. This is most likely to be used for expectations
|
|
for failures reported by third parties with their own testing configurations.
|
|
|
|
`finder:disable-narrowing` prevents the expectation from having its scope
|
|
automatically narrowed to only apply to configurations that are found to need
|
|
it. This is most likely to be used for expectations that are intentionally
|
|
broad to prevent failures that aren't planned on being fixed.
|
|
|
|
All of these annotations can either be used inline for a single expectation:
|
|
|
|
```
|
|
[ mac intel ] foo/test [ Failure ] # finder:disable-general
|
|
```
|
|
|
|
or with their `finder:enable` equivalent for blocks:
|
|
|
|
```
|
|
# finder:disable-general
|
|
[ mac intel ] foo/test [ Failure ]
|
|
[ mac intel ] bart/test [ Failure ]
|
|
# finder:enable-general
|
|
```
|
|
|
|
Nested blocks are not allowed. The `finder:disable` annotations can be followed
|
|
with a description of why the disable is necessary, which will be output by the
|
|
script when it encounters a case where one of the disabled expectations would
|
|
have been removed if the annotation was not present:
|
|
|
|
```
|
|
# finder:disable-stale Very low flake rate
|
|
[ mac intel ] foo/test [ Failure ]
|
|
[ mac intel ] bar/test [ Failure ]
|
|
# finder:enable-stale
|
|
```
|
|
|
|
#### Group Start/End
|
|
|
|
There may be cases where groups of expectations should only be removed together,
|
|
e.g. if a flake affects a large number of tests but the chance of any individual
|
|
test hitting the flake is low. In these cases, the expectations can be grouped
|
|
together so one is only removed if all of them are being removed.
|
|
|
|
```
|
|
# finder:group-start Some group description or name
|
|
[ mac intel ] foo/test [ Failure ]
|
|
[ mac intel ] bar/test [ Failure ]
|
|
# finder:group-end
|
|
```
|
|
|
|
The group name/description is required and is used to uniquely identify each
|
|
group. This means that groups with the same name string in different parts of
|
|
the file will be treated as the same group, as if they were all in a single
|
|
group block together.
|
|
|
|
```
|
|
# finder:group-start group_name
|
|
[ mac ] foo/test [ Failure ]
|
|
[ mac ] bar/test [ Failure ]
|
|
# finder:group-end
|
|
|
|
...
|
|
|
|
# finder:group-start group_name
|
|
[ android ] foo/test [ Failure ]
|
|
[ android ] bar/test [ Failure ]
|
|
# finder:group-end
|
|
```
|
|
|
|
is equivalent to
|
|
|
|
```
|
|
# finder:group-start group_name
|
|
[ mac ] foo/test [ Failure ]
|
|
[ mac ] bar/test [ Failure ]
|
|
[ android ] foo/test [ Failure ]
|
|
[ android ] bar/test [ Failure ]
|
|
# finder:group-end
|
|
```
|