
# Expectation Files
A number of test suites in Chromium use expectation files to handle test failures in order to have more granular control compared to the usual approach of entirely disabling failing tests. This documentation goes into the general usage of expectation files, while suite-specific details are handled in other files.
[TOC]
Currently, the test suites that use expectation files can be broadly categorized as Blink tests and GPU tests. Blink-specific documentation can be found here, while GPU-specific documentation can be found here.
## Design
The full design for the format can be found here if the overview in this documentation is not sufficient.
## Code
The parser implementation used by Chromium can be found here. This handles the parsing of the text files into Python objects usable by Chromium's test harnesses.
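As a rough illustration of what parsing a single expectation line involves, here is a minimal Python sketch. The regular expression, function name, and returned structure are invented for this example and do not reflect the real parser's API:

```python
import re

# Minimal sketch: recognizes 'bug_identifier [ tags ] test_name [ results ]'
# with the bug identifiers and tags both optional.
EXPECTATION_RE = re.compile(
    r'^(?P<bugs>(?:crbug\.com/\d+\s+)*)'  # zero or more bug identifiers
    r'(?:\[ (?P<tags>[^\]]+) \] )?'       # optional [ tags ]
    r'(?P<test>\S+)'                      # test name (may end in '*')
    r' \[ (?P<results>[^\]]+) \]$')       # [ expected results ]

def parse_expectation(line):
    match = EXPECTATION_RE.match(line.strip())
    if not match:
        raise ValueError('Not a valid expectation: %r' % line)
    return {
        'bugs': match.group('bugs').split(),
        # Tags are case-insensitive, so normalize them to lowercase.
        'tags': set((match.group('tags') or '').lower().split()),
        'test': match.group('test'),
        # Expected results are case-sensitive; keep them as written.
        'results': set(match.group('results').split()),
    }

parsed = parse_expectation('crbug.com/1234 [ win debug ] foo.html [ Failure ]')
print(sorted(parsed['tags']), parsed['results'])  # ['debug', 'win'] {'Failure'}
```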
## Syntax
An expectation file can be broadly broken up into two sections: the header and test expectations.
### Header
The header consists of specially formatted comments that define what tags and expected results are usable in expectations later in the file. All header content must be before any expectation content. Failure to do so will result in the parser raising errors. An example header is:
```
# tags: [ linux ubuntu jammy
#         mac mac10 mac11 mac12 mac13
#         win win7 win10 ]
# tags: [ release debug ]
# results: [ Failure Skip Slow ]
```
Specifically, the header consists of one or more tag sets and exactly one expected result set.
#### Tag Sets
Each tag set begins with a `# tags:` comment followed by a space-separated list of tags between `[ ]`. Order does not matter to the parser, and tags are case-insensitive. Tag sets can span multiple lines as long as each line starts with `#` and all tags are within the brackets.
Each tag set contains all the tags that can be used in expectations for a particular aspect of a test configuration. In the example header, the first tag set contains values for operating systems, while the second tag set contains values for browser build type. Grouping tags into different sets instead of having a monolithic set with all possible tag values is necessary for detecting conflicting expectations (explained later in the conflicts section).
One important note about tag sets is that unless a test harness is implementing custom conflict detection logic, all tags within a set should be mutually exclusive, i.e. only one tag from each tag set should be produced when running a test. Failure to do so can result in conflict detection false negatives, the specifics of which are explained in the conflict section.
#### Expected Result Set
The expected result set begins with a `# results:` comment followed by a space-separated list of expected results between `[ ]`. Order does not matter to the parser, but expected results are case-sensitive. Additionally, only values known to the parser can be used. The expected results can span multiple lines as long as each line starts with `#` and all values are within the brackets.
The expected result set contains all the expected results that can be used in expectations. The specifics of how each expected result affects test behavior can differ slightly between test suites, but in general the results behave as follows:
- `Pass` - The default expected result for all tests. Let the test run, and expect it to run without issue.
- `Failure` - Let the test run, but treat failures as a pass.
- `Crash` - Let the test run, but treat test failures due to crashes as a pass.
- `Timeout` - Let the test run, but treat test failures due to timeouts as a pass.
- `Skip` - Do not run the test.
- `RetryOnFailure` - Re-enable automatic retries of a test if a suite has them disabled by default.
- `Slow` - Indicate that the test is expected to take longer than normal, usually as a signal to increase timeouts.
### Expectations
After the header, the rest of the file consists of test expectations which specify what non-standard test behavior is expected on specific test machine configurations. An expectation is a single line in the following format:
```
bug_identifier [ tags ] test_name [ expected_results ]
```
As an example, the following would be an expectation specifying that the `foo.html` test is expected to fail on Windows machines with Debug browsers:

```
crbug.com/1234 [ win debug ] foo.html [ Failure ]
```
The bug identifier and tags are both optional and can be omitted. Not specifying any tags means that the expectation applies to the test regardless of where it is run. When omitting tags, the brackets are also omitted. Additionally, multiple bug identifiers are allowed as long as they are space-separated. The parser looks for certain prefixes, e.g. `crbug.com/`, to determine what is considered a bug. This allows the parser to properly disambiguate one or more bug identifiers from the test name in the event that an expectation does not have any tags.
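For instance, the following illustrative expectation (hypothetical bug numbers) has two bug identifiers and no tags; the `crbug.com/` prefixes are what allow the parser to tell that `foo.html` is the test name rather than a third bug:

```
crbug.com/1234 crbug.com/5678 foo.html [ Failure ]
```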
Multiple expected results are allowed and are space-separated like tags. As an example, `[ Failure Crash ]` would specify that the test is expected to either fail or crash.
Additionally, the test name is allowed to have up to one wildcard at the very end to match any tests that begin with the specified name. As an example, the following would be an expectation specifying that any test starting with `foo` is expected to fail on Windows machines with Debug browsers:

```
crbug.com/1234 [ win debug ] foo* [ Failure ]
```
#### Priority
When using wildcards, it is possible for multiple expectations to apply to a test at runtime. For example, given the following:
```
[ win ] foo* [ Slow ]
[ win ] foo/bar* [ Failure ]
[ win ] foo/bar/specific_test.html [ Skip ]
```
`foo/bar/specific_test.html` running on a Windows machine would have three applicable expectations. In these cases, the most specific (i.e. the longest-named) expectation will be used.
The order in which expectations are defined is not considered when determining priority.
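In sketch form, selection could look something like the following. This is a simplified illustration with invented names (`Expectation`, `applicable_expectation`), not the actual harness code:

```python
from dataclasses import dataclass

@dataclass
class Expectation:
    tags: set
    name: str     # may end in a single trailing '*' wildcard
    results: set

def matches(exp, test_name, config_tags):
    # An expectation applies if its tags are a subset of the
    # configuration's tags and its name matches the test.
    if not exp.tags <= config_tags:
        return False
    if exp.name.endswith('*'):
        return test_name.startswith(exp.name[:-1])
    return test_name == exp.name

def applicable_expectation(expectations, test_name, config_tags):
    candidates = [e for e in expectations if matches(e, test_name, config_tags)]
    # The most specific (longest-named) expectation wins; file order
    # is irrelevant.
    return max(candidates, key=lambda e: len(e.name), default=None)

exps = [
    Expectation({'win'}, 'foo*', {'Slow'}),
    Expectation({'win'}, 'foo/bar*', {'Failure'}),
    Expectation({'win'}, 'foo/bar/specific_test.html', {'Skip'}),
]
best = applicable_expectation(exps, 'foo/bar/specific_test.html', {'win', 'debug'})
print(best.results)  # {'Skip'}
```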
## Conflicts
When more than one expectation exists for a test, it is possible that there will be a conflict, where a test run on a particular test machine could have more than one expectation apply to it. Whether these conflicts are treated as errors, and how they are resolved, are both configurable via the annotations described in the Annotations section below.
### Detection
Two expectations for the same test conflict with each other if they do not use different tags from at least one shared tag set. As an example, look at the following expectations:
```
# Group 1
[ win ] foo.html [ Failure ]
[ mac ] foo.html [ Skip ]

# Group 2
[ win ] bar.html [ Failure ]
[ debug ] bar.html [ Skip ]

# Group 3
[ linux ] foo.html [ Failure ]
[ linux debug ] foo.html [ Skip ]
```
Group 1 would not result in a conflict since both `win` and `mac` are from the same tag set and are different values. Thus, the parser would be able to determine that at most one expectation will apply when running a test.
Group 2 would result in a conflict since there are no tag sets that both expectations use, and thus there could be a test configuration that causes both expectations to apply. In this case, a configuration that produces both the `win` and `debug` tags is possible. This conflict could be resolved by adding a browser type tag to the first expectation or an operating system tag to the second expectation.
Group 3 would result in a conflict since there is a tag set that both expectations use (operating system), but the exact tag is the same. Thus, a test running on Linux with a Debug browser would have both expectations apply. This conflict could be resolved by changing the first expectation to use `[ linux release ]`.
It is important to be aware of the following when it comes to conflicts:
- The expectation file has no knowledge of which tag combinations are actually
  possible in the real world, only what is theoretically possible given the
  defined tag sets. A real-world example of this would be the use of the Metal
  API, which is Mac-specific. While a human would be able to reason that
  `[ metal ]` implies `[ mac metal ]`, the latter is necessary for the conflict
  detection to work properly.
- If tag sets include non-mutually-exclusive values and the test suite has not
  implemented custom conflict checking logic, there can be false negatives when
  checking for conflicts. For example, if `win` and `win10` were both in the OS
  tag set, `[ win ] foo.html [ Failure ]` and `[ win10 ] foo.html [ Skip ]`
  would not be found to conflict even though they can in the real world due to
  `win10` being a more specific version of `win`.
- Expectations that use wildcards can result in conflict detection false
  negatives. Conflict detection is only run on expectations with identical test
  names. Thus, while `[ win ] foo* [ Failure ]` and `[ debug ] foo* [ Skip ]`
  would be found to conflict since the test name is `foo*` in both cases,
  `[ win ] f* [ Failure ]` and `[ debug ] foo* [ Skip ]` would not be found to
  conflict.
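To make the detection rule concrete, here is a rough Python sketch of the logic, under the assumption that tags within a set are mutually exclusive (function and variable names are invented for illustration, not taken from the real implementation):

```python
def conflict(tags_a, tags_b, tag_sets):
    """Two expectations for the same test conflict unless they use
    different tags from at least one shared tag set."""
    for tag_set in tag_sets:
        used_a = tags_a & tag_set
        used_b = tags_b & tag_set
        # If both expectations use this tag set and share no tag, then
        # (assuming tags within a set are mutually exclusive) no single
        # configuration can satisfy both, so there is no conflict.
        if used_a and used_b and not (used_a & used_b):
            return False
    return True

OS = {'win', 'mac', 'linux'}
BUILD = {'release', 'debug'}

print(conflict({'win'}, {'mac'}, [OS, BUILD]))               # False (Group 1)
print(conflict({'win'}, {'debug'}, [OS, BUILD]))             # True  (Group 2)
print(conflict({'linux'}, {'linux', 'debug'}, [OS, BUILD]))  # True  (Group 3)
```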
### Annotations
By default, conflicts result in a parsing error. However, expectation files support several annotations to affect how conflicts are handled.
`# conflicts_allowed: true` stops conflicts from being treated as parsing errors. Instead, conflicts will be handled gracefully depending on the conflict resolution setting, the default of which is to take the union of expected results.
`# conflict_resolution:` specifies how conflicts will be handled when they are allowed. Supported values are `union` (the default) and `override`. `union` causes all conflicting expectations to be merged together. For example, the following:

```
[ win ] foo.html [ Failure ]
[ debug ] foo.html [ Slow ]
```

would be equivalent to `[ win debug ] foo.html [ Failure Slow ]` when running on a Windows machine with a Debug browser.
`override` uses whatever expectation was parsed last. Using the above example, a Windows machine with a Debug browser would end up using the `[ debug ] foo.html [ Slow ]` expectation.
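As a sketch, assuming parsed expectation objects that carry a `results` set (hypothetical names, not the real implementation), the two resolution modes amount to something like this:

```python
def resolve(conflicting_expectations, mode='union'):
    # 'conflicting_expectations' is in file order; each has a
    # 'results' set such as {'Failure'} or {'Slow'}.
    if mode == 'override':
        # The last-parsed expectation wins outright.
        return conflicting_expectations[-1].results
    # 'union' (the default): merge the expected results of every
    # applicable expectation.
    merged = set()
    for exp in conflicting_expectations:
        merged |= exp.results
    return merged

class Exp:
    def __init__(self, results):
        self.results = results

parsed = [Exp({'Failure'}), Exp({'Slow'})]  # [ win ] first, then [ debug ]
print(resolve(parsed))               # {'Failure', 'Slow'}
print(resolve(parsed, 'override'))   # {'Slow'}
```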