Port GPU documentation to Markdown
This ports the following wiki pages into markdown:

* https://www.chromium.org/developers/testing/gpu-testing
* https://www.chromium.org/developers/testing/gpu-testing/gpu-bot-details
* https://www.chromium.org/developers/how-tos/gpu-wrangling
* https://www.chromium.org/developers/how-tos/debugging-gpu-related-code

and updates *some* of the old outdated content.

Bug: 813153
Change-Id: Ic5f1b58659bbdb691343785cb18c50f4d55c177f
Reviewed-on: https://chromium-review.googlesource.com/987233
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#547060}
This commit is contained in:

235  docs/gpu/debugging_gpu_related_code.md  (new file)

@@ -0,0 +1,235 @@
# Debugging GPU related code

Chromium's GPU system is multi-process, which can make debugging it rather
difficult. See [GPU Command Buffer] for some of the nitty gritty. These are
just a few notes to help with debugging.

[TOC]

<!-- TODO(kainino): update link if the page moves -->
[GPU Command Buffer]: https://sites.google.com/a/chromium.org/dev/developers/design-documents/gpu-command-buffer
## Renderer Process Code

### `--enable-gpu-client-logging`

If you are trying to track down a bug in a GPU client process (compositing,
WebGL, Skia/Ganesh, Aura), then in a debug build you can use the
`--enable-gpu-client-logging` flag, which will show every GL call sent to the
GPU service process. (From the point of view of a GPU client, it's calling
OpenGL ES functions - but the real driver calls are made in the GPU process.)

```
[4782:4782:1219/141706:INFO:gles2_implementation.cc(1026)] [.WebGLRenderingContext] glUseProgram(3)
[4782:4782:1219/141706:INFO:gles2_implementation_impl_autogen.h(401)] [.WebGLRenderingContext] glGenBuffers(1, 0x7fffc9e1269c)
[4782:4782:1219/141706:INFO:gles2_implementation_impl_autogen.h(416)] 0: 1
[4782:4782:1219/141706:INFO:gles2_implementation_impl_autogen.h(23)] [.WebGLRenderingContext] glBindBuffer(GL_ARRAY_BUFFER, 1)
[4782:4782:1219/141706:INFO:gles2_implementation.cc(1313)] [.WebGLRenderingContext] glBufferData(GL_ARRAY_BUFFER, 36, 0x7fd268580120, GL_STATIC_DRAW)
[4782:4782:1219/141706:INFO:gles2_implementation.cc(2480)] [.WebGLRenderingContext] glEnableVertexAttribArray(0)
[4782:4782:1219/141706:INFO:gles2_implementation.cc(1140)] [.WebGLRenderingContext] glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0)
[4782:4782:1219/141706:INFO:gles2_implementation_impl_autogen.h(135)] [.WebGLRenderingContext] glClear(16640)
[4782:4782:1219/141706:INFO:gles2_implementation.cc(2490)] [.WebGLRenderingContext] glDrawArrays(GL_TRIANGLES, 0, 3)
```
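If you need to post-process this output, the lines are regular enough to parse mechanically. The following Python sketch is purely illustrative (it is not part of the Chromium tree, and the header format may vary between platforms and Chromium versions):

```python
import re

# Matches lines like:
# [4782:4782:1219/141706:INFO:gles2_implementation.cc(1026)] [.WebGLRenderingContext] glUseProgram(3)
LOG_RE = re.compile(
    r"\[(?P<pid>\d+):(?P<tid>\d+):[^\]]*\]\s+"  # pid:tid:timestamp:severity:file header
    r"(?:\[(?P<context>[^\]]*)\]\s+)?"          # optional context label, e.g. .WebGLRenderingContext
    r"(?P<call>gl\w+)\((?P<args>.*)\)$"         # the GL call and its argument list
)

def parse_gl_log_line(line):
    """Return (context, call, args) for a GL client log line, or None."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    return m.group("context"), m.group("call"), m.group("args")
```

This makes it easy to, say, count calls per context or diff two runs.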
### Checking about:gpu

The GPU process logs many errors and warnings. You can see these by navigating
to `about:gpu`. Logs appear at the bottom of the page. You can also see them
on standard output if Chromium is run from the command line on Linux/Mac.
On Windows, you need debugging tools (like VS, WinDbg, etc.) to connect to the
debug output stream.

**Note:** If `about:gpu` is telling you that your GPU is disabled and
hardware acceleration is unavailable, it might be a problem with your GPU being
unsupported. To override this and turn on hardware acceleration anyway, you can
use the `--ignore-gpu-blacklist` command line option when starting Chromium.
### Breaking on GL Error

In <code>[gles2_implementation.h]</code>, there is some code like this:

```cpp
// Set to 1 to have the client fail when a GL error is generated.
// This helps find bugs in the renderer since the debugger stops on the error.
#if DCHECK_IS_ON()
#if 0
#define GL_CLIENT_FAIL_GL_ERRORS
#endif
#endif
```

Change that `#if 0` to `#if 1`, build a debug build, then run in a debugger.
The debugger will break when any renderer code sees a GL error, and you should
be able to examine the call stack to find the issue.

[gles2_implementation.h]: https://chromium.googlesource.com/chromium/src/+/master/gpu/command_buffer/client/gles2_implementation.h
### Labeling your calls

The output of all of the errors, warnings, and debug logs is prefixed. You can
set this prefix by calling `glPushGroupMarkerEXT`, `glPopGroupMarkerEXT`, and
`glInsertEventMarkerEXT`. `glPushGroupMarkerEXT` appends a string to the end of
the current log prefix (think namespaces in C++). `glPopGroupMarkerEXT` pops off
the last string appended. `glInsertEventMarkerEXT` sets a suffix for the
current string. Example:

```cpp
glPushGroupMarkerEXT(0, "Foo");        // -> log prefix = "Foo"
glInsertEventMarkerEXT(0, "This");     // -> log prefix = "Foo.This"
glInsertEventMarkerEXT(0, "That");     // -> log prefix = "Foo.That"
glPushGroupMarkerEXT(0, "Bar");        // -> log prefix = "Foo.Bar"
glInsertEventMarkerEXT(0, "Orange");   // -> log prefix = "Foo.Bar.Orange"
glInsertEventMarkerEXT(0, "Banana");   // -> log prefix = "Foo.Bar.Banana"
glPopGroupMarkerEXT();                 // -> log prefix = "Foo.That"
```
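The marker semantics above can be modeled with a small stack: each pushed group carries its own event-marker suffix, which is why popping "Bar" restores "Foo.That" rather than plain "Foo". This Python sketch is illustrative only (the real implementation is C++ inside the GL client library), but it reproduces the prefixes from the example:

```python
class LogPrefix:
    """Toy model of the group/event marker semantics for GPU log prefixes."""

    def __init__(self):
        # Each stack entry is [group_name, event_marker_for_that_level].
        self.stack = [["", ""]]  # base level: no group, no event

    def push_group(self, name):          # glPushGroupMarkerEXT
        self.stack.append([name, ""])

    def pop_group(self):                 # glPopGroupMarkerEXT
        if len(self.stack) > 1:
            self.stack.pop()

    def insert_event(self, name):        # glInsertEventMarkerEXT
        self.stack[-1][1] = name         # replaces the current suffix

    def prefix(self):
        names = [group for group, _ in self.stack if group]
        event = self.stack[-1][1]
        return ".".join(names + ([event] if event else []))
```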
### Making a reduced test case

You can often make a simple OpenGL-ES-2.0-only C++ reduced test case that is
relatively quick to compile and test, by adding tests to the `gl_tests` target.
Those tests live in `src/gpu/command_buffer/tests` and are made part of the
build in `src/gpu/gpu.gyp`. Build with `ninja -C out/Debug gl_tests`. All the
same command line options listed on this page will work with the `gl_tests`,
plus `--gtest_filter=NameOfTest` to run a specific test. Note that the
`gl_tests` are not multi-process, so they probably won't help with race
conditions, but they do go through most of the same code and are much easier
to debug.
### Debugging the renderer process

Given that Chrome starts many renderer processes, I find it easier to either
have a remote webpage I can access, or make one locally and serve it with a
simple local server like `python -m SimpleHTTPServer`.

On Linux this works for me:

* `out/Debug/chromium --no-sandbox --renderer-cmd-prefix="xterm -e gdb
  --args" http://localhost:8000/page-to-repro.html`

On OSX this works for me:

* `out/Debug/Chromium.app/Contents/MacOSX/Chromium --no-sandbox
  --renderer-cmd-prefix="xterm -e gdb --args"
  http://localhost:8000/page-to-repro.html`

On Windows I use `--renderer-startup-dialog` and then connect to the listed
process.

Note 1: On Linux and OSX I use `cgdb` instead of `gdb`.

Note 2: GDB can take minutes to index symbols. To save time, you can precache
that computation by running `build/gdb-add-index out/Debug/chrome`.
## GPU Process Code

### `--enable-gpu-service-logging`

In a debug build, this will print all actual calls into the GL driver.

```
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kEnableVertexAttribArray
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(905)] glEnableVertexAttribArray(0)
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kVertexAttribPointer
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(1573)] glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0)
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kClear
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(746)] glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(840)] glDepthMask(GL_TRUE)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(900)] glEnable(GL_DEPTH_TEST)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(1371)] glStencilMaskSeparate(GL_FRONT, 4294967295)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(1371)] glStencilMaskSeparate(GL_BACK, 4294967295)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(860)] glDisable(GL_STENCIL_TEST)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(860)] glDisable(GL_CULL_FACE)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(860)] glDisable(GL_SCISSOR_TEST)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(900)] glEnable(GL_BLEND)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(721)] glClear(16640)
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kDrawArrays
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(870)] glDrawArrays(GL_TRIANGLES, 0, 3)
```
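Bitmask arguments such as `glClear(16640)` are printed numerically; 16640 is 0x4100, i.e. `GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT`. A small hypothetical helper (not part of Chromium) for decoding such masks, with the bit values taken from the standard GLES2 `gl2.h`:

```python
# Standard GLES2 clear-mask bit values (from gl2.h).
GL_BITS = {
    0x00000100: "GL_DEPTH_BUFFER_BIT",
    0x00000400: "GL_STENCIL_BUFFER_BIT",
    0x00004000: "GL_COLOR_BUFFER_BIT",
}

def decode_mask(value):
    """Expand a numeric GL clear bitmask from the logs into symbolic names."""
    names = [name for bit, name in sorted(GL_BITS.items()) if value & bit]
    leftover = value & ~sum(GL_BITS)  # any bits we don't know about
    if leftover:
        names.append(hex(leftover))
    return " | ".join(names) if names else "0"
```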
Note that GL calls into the driver are not currently prefixed (TODO?), but you
can tell from the logged commands which command, from which context, caused
the subsequent GL calls to be made.

Also note that client resource IDs are virtual IDs, so calls into the real GL
driver will not match (though some commands print the mapping). Examples:

```
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kBindTexture
[5497:5497:1219/142413:INFO:gles2_cmd_decoder.cc(837)] [.WebGLRenderingContext] glBindTexture: client_id = 2, service_id = 10
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(662)] glBindTexture(GL_TEXTURE_2D, 10)
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [0052064A367F0000]cmd: kBindBuffer
[5497:5497:1219/142413:INFO:gles2_cmd_decoder.cc(837)] [0052064A367F0000] glBindBuffer: client_id = 2, service_id = 6
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(637)] glBindBuffer(GL_ARRAY_BUFFER, 6)
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kBindFramebuffer
[5497:5497:1219/142413:INFO:gles2_cmd_decoder.cc(837)] [.WebGLRenderingContext] glBindFramebuffer: client_id = 1, service_id = 3
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(652)] glBindFramebufferEXT(GL_FRAMEBUFFER, 3)
```

In other words, renderer process code uses the client IDs, whereas the GPU
process uses the service IDs. This is useful for matching up calls if you're
dumping both client and service GL logs.
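A toy model of this virtual-ID scheme, for illustration only (the real bookkeeping lives in the command decoder in the GPU process): the client picks its own IDs, while the service records whatever ID the driver hands back, so the two log streams show different numbers for the same object.

```python
class ResourceIdMap:
    """Toy model of client (virtual) vs. service (driver) resource IDs."""

    def __init__(self, first_service_id=10):
        # The driver hands out IDs from its own sequence, so a client ID of
        # 2 can map to service ID 10, as in the kBindTexture log lines above.
        self._map = {}
        self._next_service_id = first_service_id

    def create(self, client_id):
        """Record the driver ID allocated for a client-chosen ID."""
        service_id = self._next_service_id
        self._next_service_id += 1
        self._map[client_id] = service_id
        return service_id

    def service_id(self, client_id):
        return self._map[client_id]

ids = ResourceIdMap()
ids.create(2)  # client texture id 2 gets a driver-side id of its own
```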
### `--enable-gpu-debugging`

In any build, this will call `glGetError` after each command.

### `--enable-gpu-command-logging`

This will print the name of each GPU command before it is executed.

```
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kBindBuffer
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kBufferData
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: SetToken
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kEnableVertexAttribArray
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kVertexAttribPointer
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kClear
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kDrawArrays
```
### Debugging in the GPU Process

Given the multi-process nature of Chromium, it can be hard to debug both
sides. Turning on all the logging and having a small test case is useful. One
minor suggestion: if you have some idea where the bug is happening, a call to
some obscure GL function like `glHint()` can give you a place to catch a
command being processed in the GPU process (put a breakpoint on
`gpu::gles2::GLES2DecoderImpl::HandleHint`). Once there, you can follow the
commands after that. All of them go through
`gpu::gles2::GLES2DecoderImpl::DoCommand`.

To actually debug the GPU process:

On Linux this works for me:

* `out/Debug/chromium --no-sandbox --gpu-launcher="xterm -e gdb --args"
  http://localhost:8000/page-to-repro.html`

On OSX this works for me:

* `out/Debug/Chromium.app/Contents/MacOSX/Chromium --no-sandbox
  --gpu-launcher="xterm -e gdb --args"
  http://localhost:8000/page-to-repro.html`

On Windows I use `--gpu-startup-dialog` and then connect to the listed process.
### `GPU PARSE ERROR`

If you see this message in `about:gpu` or your console, you didn't cause it
directly (by calling `glLoseContextCHROMIUM`), and it's something other than 5,
then there's likely a bug. Please file an issue at <http://crbug.com/new>.
## Debugging Performance

If you have something to add here, please add it. Most perf debugging is done
using `about:tracing` (see [Trace Event Profiling] for details). Otherwise,
be aware that, since the system is multi-process, calling:

```
start = GetTime()
DoSomething()
glFinish()
end = GetTime()
printf("elapsedTime = %f\n", end - start)
```

**will not** give you meaningful results.
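The reason is that the client-side GL calls only enqueue commands; the real work happens asynchronously in another process. The following toy Python model (illustrative only, not Chromium code) shows the shape of the problem: issuing commands returns almost immediately, and only an explicit "finish" waits for the service side, so the start/finish bracket measures the whole pipeline rather than `DoSomething` alone.

```python
import queue
import threading
import time

class ToyCommandBuffer:
    """Toy model of the client/service split: client calls only enqueue."""

    def __init__(self):
        self.q = queue.Queue()
        self.done = threading.Event()
        threading.Thread(target=self._service, daemon=True).start()

    def _service(self):
        # The "GPU process": drains commands and does the real work.
        while True:
            cmd = self.q.get()
            if cmd == "finish":
                self.done.set()
            else:
                time.sleep(0.05)  # pretend the driver does real work

    def issue(self, cmd):
        self.q.put(cmd)  # returns immediately, like a GL client call

    def finish(self):
        # Like glFinish: block until all previously issued work drains.
        self.done.clear()
        self.q.put("finish")
        self.done.wait()

cb = ToyCommandBuffer()
t0 = time.monotonic()
for _ in range(4):
    cb.issue("draw")
issued = time.monotonic() - t0    # tiny: commands were only enqueued
cb.finish()
finished = time.monotonic() - t0  # includes the service's actual work
```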
[Trace Event Profiling]: https://sites.google.com/a/chromium.org/dev/developers/how-tos/trace-event-profiling-tool
571  docs/gpu/gpu_testing.md  (new file)

@@ -0,0 +1,571 @@
# GPU Testing

This set of pages documents the setup and operation of the GPU bots and try
servers, which verify the correctness of Chrome's graphically accelerated
rendering pipeline.

[TOC]
## Overview

The GPU bots run a different set of tests than the majority of the Chromium
test machines. The GPU bots specifically focus on tests which exercise the
graphics processor, and whose results are likely to vary between graphics card
vendors.

Most of the tests on the GPU bots are run via the [Telemetry framework].
Telemetry was originally conceived as a performance testing framework, but has
proven valuable for correctness testing as well. Telemetry directs the browser
to perform various operations, like page navigation and test execution, from
external scripts written in Python. The GPU bots launch the full Chromium
browser via Telemetry for the majority of the tests. Using the full browser to
execute tests, rather than smaller test harnesses, has yielded several
advantages: testing what is shipped, improved reliability, and improved
performance.

[Telemetry framework]: https://github.com/catapult-project/catapult/tree/master/telemetry
A subset of the tests, called "pixel tests", grab screen snapshots of the web
page in order to validate Chromium's rendering architecture end-to-end. Where
necessary, GPU-specific results are maintained for these tests. Some of these
tests verify just a few pixels, using handwritten code, in order to use the
same validation for all brands of GPUs.

The GPU bots use the Chrome infrastructure team's [recipe framework], and
specifically the [`chromium`][recipes/chromium] and
[`chromium_trybot`][recipes/chromium_trybot] recipes, to describe what tests to
execute. Compared to the legacy master-side buildbot scripts, recipes make it
easy to add new steps to the bots, change the bots' configuration, and run the
tests locally in the same way that they are run on the bots. Additionally, the
`chromium` and `chromium_trybot` recipes make it possible to send try jobs which
add new steps to the bots. This single capability is a huge step forward from
the previous configuration where new steps were added blindly, and could cause
failures on the tryservers. For more details about the configuration of the
bots, see the [GPU bot details].

[recipe framework]: https://chromium.googlesource.com/external/github.com/luci/recipes-py/+/master/doc/user_guide.md
[recipes/chromium]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium.py
[recipes/chromium_trybot]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium_trybot.py
[GPU bot details]: gpu_testing_bot_details.md
The physical hardware for the GPU bots lives in the Swarming pool\*. The
Swarming infrastructure ([new docs][new-testing-infra], [older but currently
more complete docs][isolated-testing-infra]) provides many benefits:

* Increased parallelism for the tests; all steps for a given tryjob or
  waterfall build run in parallel.
* Simpler scaling: just add more hardware in order to get more capacity. No
  manual configuration or distribution of hardware needed.
* Easier to run certain tests only on certain operating systems or types of
  GPUs.
* Easier to add new operating systems or types of GPUs.
* Clearer description of the binary and data dependencies of the tests. If
  they run successfully locally, they'll run successfully on the bots.

(\* All but a few one-off GPU bots are in the swarming pool. The exceptions to
the rule are described in the [GPU bot details].)
The bots on the [chromium.gpu.fyi] waterfall are configured to always test
top-of-tree ANGLE. This setup is done with a few lines of code in the
[tools/build workspace]; search the code for "angle".

These aspects of the bots are described in more detail below, and in linked
pages. There is a [presentation][bots-presentation] which gives a brief
overview of this documentation and links back to various portions.

<!-- XXX: broken link -->
[new-testing-infra]: https://github.com/luci/luci-py/wiki
[isolated-testing-infra]: https://www.chromium.org/developers/testing/isolated-testing/infrastructure
[chromium.gpu]: https://build.chromium.org/p/chromium.gpu/console
[chromium.gpu.fyi]: https://build.chromium.org/p/chromium.gpu.fyi/console
[tools/build workspace]: https://code.google.com/p/chromium/codesearch#chromium/build/scripts/slave/recipe_modules/chromium_tests/chromium_gpu_fyi.py
[bots-presentation]: https://docs.google.com/presentation/d/1BC6T7pndSqPFnituR7ceG7fMY7WaGqYHhx5i9ECa8EI/edit?usp=sharing
## Fleet Status

Please see the [GPU Pixel Wrangling instructions] for links to dashboards
showing the status of various bots in the GPU fleet.

[GPU Pixel Wrangling instructions]: pixel_wrangling.md#Fleet-Status
## Using the GPU Bots

Most Chromium developers interact with the GPU bots in two ways:

1. Observing the bots on the waterfalls.
2. Sending try jobs to them.

The GPU bots are grouped on the [chromium.gpu] and [chromium.gpu.fyi]
waterfalls. Their current status can be easily observed there.

To send try jobs, you must first upload your CL to the codereview server.
Then, either click the "CQ dry run" link, or run from the command line:

```sh
git cl try
```

This sends your job to the default set of try servers.
The GPU tests are part of the default set for Chromium CLs, and are run as part
of the following tryservers' jobs:

* [linux_chromium_rel_ng] on the [tryserver.chromium.linux] waterfall
* [mac_chromium_rel_ng] on the [tryserver.chromium.mac] waterfall
* [win_chromium_rel_ng] on the [tryserver.chromium.win] waterfall

[linux_chromium_rel_ng]: http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_rel_ng?numbuilds=100
[mac_chromium_rel_ng]: http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng?numbuilds=100
[win_chromium_rel_ng]: http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng?numbuilds=100
[tryserver.chromium.linux]: http://build.chromium.org/p/tryserver.chromium.linux/waterfall?numbuilds=100
[tryserver.chromium.mac]: http://build.chromium.org/p/tryserver.chromium.mac/waterfall?numbuilds=100
[tryserver.chromium.win]: http://build.chromium.org/p/tryserver.chromium.win/waterfall?numbuilds=100
Scan down through the steps looking for the text "GPU"; that identifies those
tests run on the GPU bots. For each test the "trigger" step can be ignored; the
step further down for the test of the same name contains the results.
It's usually not necessary to explicitly send try jobs just for verifying GPU
tests. If you want to, you must invoke "git cl try" separately for each
tryserver master you want to reference, for example:

```sh
git cl try -b linux_chromium_rel_ng
git cl try -b mac_chromium_rel_ng
git cl try -b win_chromium_rel_ng
```

Alternatively, the Gerrit UI can be used to send a patch set to these try
servers.
Three optional tryservers are also available which run additional tests. As of
this writing, they run longer tests that can't be run against all Chromium CLs
due to limited hardware capacity. They are included automatically for code
changes to certain sub-directories.

* [linux_optional_gpu_tests_rel] on the [tryserver.chromium.linux] waterfall
* [mac_optional_gpu_tests_rel] on the [tryserver.chromium.mac] waterfall
* [win_optional_gpu_tests_rel] on the [tryserver.chromium.win] waterfall

[linux_optional_gpu_tests_rel]: https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_optional_gpu_tests_rel?numbuilds=200
[mac_optional_gpu_tests_rel]: https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel?numbuilds=200
[win_optional_gpu_tests_rel]: https://build.chromium.org/p/tryserver.chromium.win/builders/win_optional_gpu_tests_rel?numbuilds=200
Tryservers for the [ANGLE project] are also present on the
[tryserver.chromium.angle] waterfall. These are invoked from the Gerrit user
interface. They are configured similarly to the tryservers for regular Chromium
patches, and run the same tests that are run on the [chromium.gpu.fyi]
waterfall, in the same way (e.g., against ToT ANGLE).

If you find it necessary to try patches against other sub-repositories than
Chromium (`src/`) and ANGLE (`src/third_party/angle/`), please
[file a bug](http://crbug.com/new) with component Internals\>GPU\>Testing.

[ANGLE project]: https://chromium.googlesource.com/angle/angle/+/master/README.md
[tryserver.chromium.angle]: https://build.chromium.org/p/tryserver.chromium.angle/waterfall
[file a bug]: http://crbug.com/new
## Running the GPU Tests Locally

All of the GPU tests running on the bots can be run locally from a Chromium
build. Many of the tests are simple executables:

* `angle_unittests`
* `content_gl_tests`
* `gl_tests`
* `gl_unittests`
* `tab_capture_end2end_tests`

Some run only on the chromium.gpu.fyi waterfall, either because there isn't
enough machine capacity at the moment, or because they're closed-source tests
which aren't allowed to run on the regular Chromium waterfalls:

* `angle_deqp_gles2_tests`
* `angle_deqp_gles3_tests`
* `angle_end2end_tests`
* `audio_unittests`
The remaining GPU tests are run via Telemetry. In order to run them, just
build the `chrome` target and then invoke
`src/content/test/gpu/run_gpu_integration_test.py` with the appropriate
argument. The tests this script can invoke are in
`src/content/test/gpu/gpu_tests/`. For example:

* `run_gpu_integration_test.py context_lost --browser=release`
* `run_gpu_integration_test.py pixel --browser=release`
* `run_gpu_integration_test.py webgl_conformance --browser=release --webgl-conformance-version=1.0.2`
* `run_gpu_integration_test.py maps --browser=release`
* `run_gpu_integration_test.py screenshot_sync --browser=release`
* `run_gpu_integration_test.py trace_test --browser=release`

**Note:** If you are on Linux and see this test harness exit immediately with
`**Non zero exit code**`, it's probably because of some incompatible Python
packages being installed. Please uninstall the `python-egenix-mxdatetime` and
`python-logilab-common` packages in this case; see
[Issue 716241](http://crbug.com/716241).
You can also run a subset of tests with this harness:

* `run_gpu_integration_test.py webgl_conformance --browser=release
  --test-filter=conformance_attribs`
Figuring out the exact command line that was used to invoke the test on the
bots can be a little tricky. The bots all\* run their tests via Swarming and
isolates, meaning that the invocation of a step like `[trigger]
webgl_conformance_tests on NVIDIA GPU...` will look like:

* `python -u
  'E:\b\build\slave\Win7_Release__NVIDIA_\build\src\tools\swarming_client\swarming.py'
  trigger --swarming https://chromium-swarm.appspot.com
  --isolate-server https://isolateserver.appspot.com
  --priority 25 --shards 1 --task-name 'webgl_conformance_tests on NVIDIA GPU...'`

You can figure out the additional command line arguments that were passed to
each test on the bots by examining the trigger step and searching for the
argument separator (<code> -- </code>). For a recent invocation of
`webgl_conformance_tests`, this looked like:
* `webgl_conformance --show-stdout '--browser=release' -v
  '--extra-browser-args=--enable-logging=stderr --js-flags=--expose-gc'
  '--isolated-script-test-output=${ISOLATED_OUTDIR}/output.json'`

You can leave off the `--isolated-script-test-output` argument, which leaves a
full command line of:

* `run_gpu_integration_test.py
  webgl_conformance --show-stdout '--browser=release' -v
  '--extra-browser-args=--enable-logging=stderr --js-flags=--expose-gc'`
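Mechanically, recovering those test arguments is just a matter of taking everything after the first standalone `--` token in the trigger command. A hypothetical helper (not part of the Chromium tooling):

```python
def extract_test_args(trigger_cmd):
    """Return the tokens after the first standalone '--' separator.

    Flags like '--priority' are single tokens, so they don't match; only a
    bare '--' token splits the swarming arguments from the test arguments.
    """
    parts = trigger_cmd.split()
    if "--" not in parts:
        return []
    return parts[parts.index("--") + 1:]
```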
The Maps test requires you to authenticate to cloud storage in order to access
the Web Page Replay archive containing the test. See [Cloud Storage
Credentials] for documentation on setting this up.

[Cloud Storage Credentials]: gpu_testing_bot_details.md#Cloud-storage-credentials
Pixel tests use reference images from cloud storage. The bots pass the
`--upload-refimg-to-cloud-storage` argument, but to run locally you need to
pass the `--download-refimg-from-cloud-storage` argument instead, as well as
the other arguments the bots use, such as `--refimg-cloud-storage-bucket` and
`--os-type`.

Sample command line for Android:

* `run_gpu_integration_test.py pixel --show-stdout --browser=android-chromium
  -v --passthrough --extra-browser-args='--enable-logging=stderr
  --js-flags=--expose-gc' --refimg-cloud-storage-bucket
  chromium-gpu-archive/reference-images --os-type android
  --download-refimg-from-cloud-storage`
<!-- XXX: update this section; these isolates don't exist anymore -->
You can find the isolates for the various tests in
[src/chrome/](http://src.chromium.org/viewvc/chrome/trunk/src/chrome/):

* [angle_unittests.isolate](https://chromium.googlesource.com/chromium/src/+/master/chrome/angle_unittests.isolate)
* [content_gl_tests.isolate](https://chromium.googlesource.com/chromium/src/+/master/content/content_gl_tests.isolate)
* [gl_tests.isolate](https://chromium.googlesource.com/chromium/src/+/master/chrome/gl_tests.isolate)
* [gles2_conform_test.isolate](https://chromium.googlesource.com/chromium/src/+/master/chrome/gles2_conform_test.isolate)
* [tab_capture_end2end_tests.isolate](https://chromium.googlesource.com/chromium/src/+/master/chrome/tab_capture_end2end_tests.isolate)
* [telemetry_gpu_test.isolate](https://chromium.googlesource.com/chromium/src/+/master/chrome/telemetry_gpu_test.isolate)

The isolates contain the full or partial command line for invoking the target.
The complete command line for any test can be deduced from the contents of the
isolate plus the stdio output from the test's run on the bot.
Note that for the GN build, the isolates are simply described by build targets,
and [gn_isolate_map.pyl] describes the mapping between isolate name and build
target, as well as the command line used to invoke the isolate. Once all
platforms have switched to GN, the .isolate files will be obsolete and will be
removed.

(\* A few of the one-off GPU configurations on the chromium.gpu.fyi waterfall
run their tests locally rather than via swarming, in order to decrease the
number of physical machines needed.)

[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
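A `.pyl` file is a plain Python literal, so it can be loaded with the standard library's `ast.literal_eval`. The fragment below is made up for illustration (see the real `gn_isolate_map.pyl` for actual entries and fields):

```python
import ast

# A hypothetical fragment in the .pyl format: a dict literal mapping
# isolate names to their build target and launcher type.
PYL_TEXT = """
{
  "gl_tests": {
    "label": "//gpu:gl_tests",
    "type": "console_test_launcher",
  },
}
"""

def load_pyl(text):
    """Parse a .pyl (Python literal) string into Python data."""
    return ast.literal_eval(text)

isolate_map = load_pyl(PYL_TEXT)
```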
## Running Binaries from the Bots Locally

Any binary run remotely on a bot can also be run locally, assuming the local
machine loosely matches the architecture and OS of the bot.

The easiest way to do this is to find the ID of the swarming task and use
"swarming.py reproduce" to re-run it:

* `./src/tools/swarming_client/swarming.py reproduce -S https://chromium-swarm.appspot.com [task ID]`

The task ID can be found in the stdio for the "trigger" step for the test. For
example, look at a recent build from the [Mac Release (Intel)] bot, and
look at the `gl_unittests` step. You will see something like:

[Mac Release (Intel)]: https://ci.chromium.org/buildbot/chromium.gpu/Mac%20Release%20%28Intel%29/

```
Triggered task: gl_unittests on Intel GPU on Mac/Mac-10.12.6/[TRUNCATED_ISOLATE_HASH]/Mac Release (Intel)/83664
To collect results, use:
  swarming.py collect -S https://chromium-swarm.appspot.com --json /var/folders/[PATH_TO_TEMP_FILE].json
Or visit:
  https://chromium-swarm.appspot.com/user/task/[TASK_ID]
```
There is a difference between the isolate's hash and Swarming's task ID. Make
sure you use the task ID and not the isolate's hash.
As of this writing, there seems to be a
[bug](https://github.com/luci/luci-py/issues/250)
when attempting to re-run the Telemetry based GPU tests in this way. For the
time being, this can be worked around by instead downloading the contents of
the isolate. To do so, look more deeply into the trigger step's log:

* <code>python -u
  /b/build/slave/Mac_10_10_Release__Intel_/build/src/tools/swarming_client/swarming.py
  trigger [...more args...] --tag data:[ISOLATE_HASH] [...more args...]
  [ISOLATE_HASH] -- **[...TEST_ARGS...]**</code>
As of this writing, the isolate hash appears twice in the command line. To
|
||||
download the isolate's contents into directory `foo` (note, this is in the
|
||||
"Help" section associated with the page for the isolate's task, but I'm not
|
||||
sure whether that's accessible only to Google employees or all members of the
|
||||
chromium.org organization):
|
||||
|
||||
* `python isolateserver.py download -I https://isolateserver.appspot.com
|
||||
--namespace default-gzip -s [ISOLATE_HASH] --target foo`
|
||||
|
||||
`isolateserver.py` will tell you the approximate command line to use. You
|
||||
should concatenate the `TEST_ARGS` highlighted in red above with
|
||||
`isolateserver.py`'s recommendation. The `ISOLATED_OUTDIR` variable can be
|
||||
safely replaced with `/tmp`.
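Assembling the final command line is just string manipulation. A sketch, where
the recommended command and test arguments are placeholders rather than real
output from `isolateserver.py`, and Swarming's `${ISOLATED_OUTDIR}` placeholder
syntax is assumed:

```python
def compose_command(recommended_cmd, test_args, tmp_dir='/tmp'):
    """Concatenate isolateserver.py's recommended command line with the
    TEST_ARGS from the trigger step, substituting ISOLATED_OUTDIR."""
    full = recommended_cmd + ' ' + ' '.join(test_args)
    return full.replace('${ISOLATED_OUTDIR}', tmp_dir)

# Placeholder values, purely for illustration.
full_cmd = compose_command('run_test --out-dir=${ISOLATED_OUTDIR}',
                           ['--browser=release', '--show-stdout'])
print(full_cmd)  # -> run_test --out-dir=/tmp --browser=release --show-stdout
```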

Note that `isolateserver.py` downloads a large number of files (everything
needed to run the test) and may take a while. There is a way to use
`run_isolated.py` to achieve the same result, but as of this writing, there
were problems doing so, so this procedure is not documented at this time.

Before attempting to download an isolate, you must ensure you have permission
to access the isolate server. Full instructions can be [found
here][isolate-server-credentials]. For most cases, you can simply run:

* `./src/tools/swarming_client/auth.py login
  --service=https://isolateserver.appspot.com`

The above link requires that you log in with your @google.com credentials. It's
not known at the present time whether this works with @chromium.org accounts.
Email kbr@ if you try this and find it doesn't work.

[isolate-server-credentials]: gpu_testing_bot_details.md#Isolate-server-credentials

## Running Locally Built Binaries on the GPU Bots

See the [Swarming documentation] for instructions on how to upload your
binaries to the isolate server and trigger execution on Swarming.

[Swarming documentation]: https://www.chromium.org/developers/testing/isolated-testing/for-swes#TOC-Run-a-test-built-locally-on-Swarming

## Adding New Tests to the GPU Bots

The goal of the GPU bots is to avoid regressions in Chrome's rendering stack.
To that end, let's add as many tests as possible that will help catch
regressions in the product. If you see a crazy bug in Chrome's rendering which
would be easy to catch with a pixel test running in Chrome and hard to catch in
any of the other test harnesses, please, invest the time to add a test!

There are a couple of different ways to add new tests to the bots:

1. Adding a new test to one of the existing harnesses.
2. Adding an entire new test step to the bots.

### Adding a new test to one of the existing test harnesses

Adding new tests to the GTest-based harnesses is straightforward and
essentially requires no explanation.

As of this writing it isn't as easy as desired to add a new test to one of the
Telemetry based harnesses. See [Issue 352807](http://crbug.com/352807). Let's
collectively work to address that issue. It would be great to reduce the number
of steps on the GPU bots, or at least to avoid significantly increasing the
number of steps on the bots. The WebGL conformance tests should probably remain
a separate step, but some of the smaller Telemetry based tests
(`context_lost_tests`, `memory_test`, etc.) should probably be combined into a
single step.

If you are adding a new test to one of the existing tests (e.g., `pixel_test`),
all you need to do is make sure that your new test runs correctly via isolates.
See the documentation from the GPU bot details on [adding new isolated
tests][new-isolates] for the `GYP_DEFINES` and authentication needed to upload
isolates to the isolate server. Most likely the new test will be Telemetry
based, and included in the `telemetry_gpu_test_run` isolate. You can then
invoke it via:

* `./src/tools/swarming_client/run_isolated.py -s [HASH]
  -I https://isolateserver.appspot.com -- [TEST_NAME] [TEST_ARGUMENTS]`

[new-isolates]: gpu_testing_bot_details.md#Adding-a-new-isolated-test-to-the-bots

## Adding new steps to the GPU Bots

The tests that are run by the GPU bots are described by a couple of JSON files
in the Chromium workspace:

* [`chromium.gpu.json`](https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.json)
* [`chromium.gpu.fyi.json`](https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.fyi.json)

These files are autogenerated by the following script:

* [`generate_buildbot_json.py`](https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/generate_buildbot_json.py)

This script is completely self-contained and should hopefully be
self-explanatory. The JSON files are parsed by the chromium and chromium_trybot
recipes, and describe two types of tests:

* GTests: those which use the Googletest and Chromium's `base/test/launcher/`
  frameworks.
* Telemetry based tests: those which are built on the Telemetry framework and
  launch the entire browser.
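For a quick look at which steps a given bot runs, the JSON can be inspected
directly. A sketch, assuming top-level keys are builder names whose values hold
`gtest_tests` and `isolated_scripts` lists; check the real autogenerated files
for the current schema:

```python
import json

def steps_for_bot(buildbot_json_text, bot_name):
    """List the test steps configured for one bot in chromium.gpu.json.

    Assumes the schema sketched in the sample below; consult the real
    autogenerated files for the authoritative format.
    """
    config = json.loads(buildbot_json_text)
    bot = config.get(bot_name, {})
    gtests = [t['test'] for t in bot.get('gtest_tests', [])]
    scripts = [t['name'] for t in bot.get('isolated_scripts', [])]
    return gtests + scripts

# Hand-written sample in the assumed shape, not copied from the real file.
sample = json.dumps({
    'Win10 Release (NVIDIA)': {
        'gtest_tests': [{'test': 'gl_tests'}, {'test': 'angle_unittests'}],
        'isolated_scripts': [{'name': 'webgl_conformance_tests'}],
    }
})
print(steps_for_bot(sample, 'Win10 Release (NVIDIA)'))
# -> ['gl_tests', 'angle_unittests', 'webgl_conformance_tests']
```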

A prerequisite of adding a new test to the bots is that that test [run via
isolates][new-isolates]. Once that is done, modify `generate_buildbot_json.py`
to add the test to the appropriate set of bots. Be careful when adding large
new test steps to all of the bots, because the GPU bots are a limited resource
and do not currently have the capacity to absorb large new test suites. It is
safer to get new tests running on the chromium.gpu.fyi waterfall first, and
expand from there to the chromium.gpu waterfall (which will also make them run
against every Chromium CL by virtue of the `linux_chromium_rel_ng`,
`mac_chromium_rel_ng` and `win_chromium_rel_ng` tryservers' mirroring of the
bots on this waterfall – so be careful!).

Tryjobs which add new test steps to the chromium.gpu.json file will run those
new steps during the tryjob, which helps ensure that the new test won't break
once it starts running on the waterfall.

Tryjobs which modify chromium.gpu.fyi.json can be sent to the
`win_optional_gpu_tests_rel`, `mac_optional_gpu_tests_rel` and
`linux_optional_gpu_tests_rel` tryservers to help ensure that they won't
break the FYI bots.

## Updating and Adding New Pixel Tests to the GPU Bots

Adding new pixel tests which require reference images is a slightly more
complex process than adding other kinds of tests which can validate their own
correctness. There are a few reasons for this.

* Reference image based pixel tests require different golden images for
  different combinations of operating system, GPU, driver version, OS
  version, and occasionally other variables.
* The reference images must be generated by the main waterfall. The try
  servers are not allowed to produce new reference images, only consume them.
  The reason for this is that a patch sent to the try servers might cause an
  incorrect reference image to be generated. For this reason, the main
  waterfall bots upload reference images to cloud storage, and the try
  servers download them and verify their results against them.
* The try servers will fail if they run a pixel test requiring a reference
  image that doesn't exist in cloud storage. This is deliberate, but needs
  more thought; see [Issue 349262](http://crbug.com/349262).
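The first bullet above is the crux: each golden image is keyed by the full
device configuration. A hypothetical sketch of such a key; the real naming
scheme lives in the pixel test harness and may differ:

```python
def reference_image_key(test_name, os_name, os_version, gpu_vendor_id,
                        gpu_device_id, driver_version):
    """Build a per-configuration key for a golden reference image.

    Illustrative only: shows why one logical pixel test needs many
    reference images, one per (OS, OS version, GPU, driver) combination.
    """
    return '%s_%s-%s_%04x-%04x_%s' % (
        test_name, os_name, os_version,
        gpu_vendor_id, gpu_device_id, driver_version)

key = reference_image_key('Pixel_Canvas2DRedBox', 'win', '10',
                          0x10de, 0x1cb3, '23.21.13.8816')
print(key)  # -> Pixel_Canvas2DRedBox_win-10_10de-1cb3_23.21.13.8816
```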

If a reference image based pixel test's result is going to change because of a
change in ANGLE or Blink (for example), updating the reference images is a
slightly tricky process. Here's how to do it:

* Mark the pixel test as failing in the [pixel tests]' [test expectations]
* Commit the change to ANGLE, Blink, etc. which will change the test's
  results
* Note that without the failure expectation, this commit would turn some bots
  red; a Blink change will turn the GPU bots on the chromium.webkit waterfall
  red, and an ANGLE change will turn the chromium.gpu.fyi bots red
* Wait for Blink/ANGLE/etc. to roll
* Commit a change incrementing the revision number associated with the test
  in the [test pages]
* Commit a second change removing the failure expectation, once all of the
  bots on the main waterfall have generated new reference images. This change
  should go through the commit queue cleanly.
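The revision-bump step can be pictured with a toy model of the page set. The
real entries in pixel_test_pages.py have more fields, and the exact field name
may differ; `revision` here is an assumption for illustration:

```python
import collections

# Toy stand-in for the real page entries in pixel_test_pages.py.
PixelTestPage = collections.namedtuple('PixelTestPage',
                                       ['url', 'name', 'revision'])

def bump_revision(pages, name):
    """Return a new page list with one test's revision incremented, which
    prompts the waterfall bots to generate fresh reference images."""
    return [p._replace(revision=p.revision + 1) if p.name == name else p
            for p in pages]

pages = [PixelTestPage('pixel_canvas2d.html', 'Pixel_Canvas2DRedBox', 7)]
pages = bump_revision(pages, 'Pixel_Canvas2DRedBox')
print(pages[0].revision)  # -> 8
```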

[pixel tests]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/pixel_test_pages.py
[test expectations]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/pixel_expectations.py
[test pages]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/pixel_test_pages.py

When adding a brand new pixel test that uses a reference image, the steps are
similar, but simpler:

* Mark the test as failing in the same commit which introduces the new test
* Wait for the reference images to be produced by all of the GPU bots on the
  waterfalls (see the [cloud storage bucket])
* Commit a change un-marking the test as failing

When making a Chromium-side change which changes the pixel tests' results:

* In your CL, both mark the pixel test as failing in the pixel test's test
  expectations and increment the test's version number in the page set (see
  above)
* After your CL lands, land another CL removing the failure expectations. If
  this second CL goes through the commit queue cleanly, you know reference
  images were generated properly.

In general, when adding a new pixel test, it's better to spot check a few
pixels in the rendered image rather than using a reference image per platform.
The [GPU rasterization test] is a good example of a recently added test which
performs such spot checks.
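The spot-check approach can be sketched as follows: sample a handful of
coordinates and compare them against expected colors with a tolerance, instead
of diffing a full per-platform golden image. Illustrative only; the real checks
live in the GPU rasterization test's harness:

```python
def spot_check(get_pixel, expected_pixels, tolerance=2):
    """Verify a few sampled pixels instead of a full reference image.

    get_pixel(x, y) returns an (r, g, b) tuple; expected_pixels maps
    (x, y) -> expected (r, g, b). Small per-channel differences are
    allowed, to absorb GPU/driver rounding variation across platforms.
    """
    for (x, y), expected in expected_pixels.items():
        actual = get_pixel(x, y)
        if any(abs(a - e) > tolerance for a, e in zip(actual, expected)):
            return False
    return True

# A solid red 'rendering' with slight per-device noise at two sample points.
image = {(10, 10): (254, 1, 0), (100, 50): (255, 0, 2)}
ok = spot_check(lambda x, y: image[(x, y)],
                {(10, 10): (255, 0, 0), (100, 50): (255, 0, 0)})
print(ok)  # -> True
```

With `tolerance=0` the same samples would fail, which is exactly the kind of
cross-GPU noise the tolerance is there to absorb.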

[cloud storage bucket]: https://console.developers.google.com/storage/chromium-gpu-archive/reference-images
<!-- XXX: old link -->
[GPU rasterization test]: http://src.chromium.org/viewvc/chrome/trunk/src/content/test/gpu/gpu_tests/gpu_rasterization.py

## Stamping out Flakiness

It's critically important to aggressively investigate and eliminate the root
cause of any flakiness seen on the GPU bots. The bots have been known to run
reliably for days at a time, and any flaky failures that are tolerated on the
bots translate directly into instability of the browser experienced by
customers. Critical bugs in subsystems like WebGL, affecting high-profile
products like Google Maps, have escaped notice in the past because the bots
were unreliable. After much re-work, the GPU bots are now among the most
reliable automated test machines in the Chromium project. Let's keep them that
way.

Flakiness affecting the GPU tests can come in from highly unexpected sources.
Here are some examples:

* Intermittent pixel_test failures on Linux where the captured pixels were
  black, caused by the Display Power Management System (DPMS) kicking in.
  Disabled the X server's built-in screen saver on the GPU bots in response.
* GNOME dbus-related deadlocks causing intermittent timeouts ([Issue
  309093](http://crbug.com/309093) and related bugs).
* Windows Audio system changes causing intermittent assertion failures in the
  browser ([Issue 310838](http://crbug.com/310838)).
* Enabling assertion failures in the C++ standard library on Linux causing
  random assertion failures ([Issue 328249](http://crbug.com/328249)).
* V8 bugs causing random crashes of the Maps pixel test (V8 issues
  [3022](https://code.google.com/p/v8/issues/detail?id=3022),
  [3174](https://code.google.com/p/v8/issues/detail?id=3174)).
* TLS changes causing random browser process crashes ([Issue
  264406](http://crbug.com/264406)).
* Isolated test execution flakiness caused by failures to reliably clean up
  temporary directories ([Issue 340415](http://crbug.com/340415)).
* The Telemetry-based WebGL conformance suite caught a bug in the memory
  allocator on Android not caught by any other bot ([Issue
  347919](http://crbug.com/347919)).
* context_lost test failures caused by the compositor's retry logic ([Issue
  356453](http://crbug.com/356453)).
* Multiple bugs in Chromium's support for lost contexts causing flakiness of
  the context_lost tests ([Issue 365904](http://crbug.com/365904)).
* Maps test timeouts caused by Content Security Policy changes in Blink
  ([Issue 395914](http://crbug.com/395914)).
* Weak pointer assertion failures in various webgl\_conformance\_tests caused
  by changes to the media pipeline ([Issue 399417](http://crbug.com/399417)).
* A change to a default WebSocket timeout in Telemetry causing intermittent
  failures to run all WebGL conformance tests on the Mac bots ([Issue
  403981](http://crbug.com/403981)).
* Chrome leaking suspended sub-processes on Windows, apparently a preexisting
  race condition that suddenly showed up ([Issue
  424024](http://crbug.com/424024)).
* Changes to Chrome's cross-context synchronization primitives causing the
  wrong tiles to be rendered ([Issue 584381](http://crbug.com/584381)).
* A bug in V8's handling of array literals causing flaky failures of
  texture-related WebGL 2.0 tests ([Issue 606021](http://crbug.com/606021)).
* Assertion failures in sync point management related to lost contexts that
  exposed a real correctness bug ([Issue 606112](http://crbug.com/606112)).
* A bug in glibc's `sem_post`/`sem_wait` primitives breaking V8's parallel
  garbage collection ([Issue 609249](http://crbug.com/609249)).

If you notice flaky test failures either on the GPU waterfalls or try servers,
please file bugs right away with the component Internals>GPU>Testing and
include links to the failing builds and copies of the logs, since the logs
expire after a few days. [GPU pixel wranglers] should give the highest priority
to eliminating flakiness on the tree.

[GPU pixel wranglers]: pixel_wrangling.md
539
docs/gpu/gpu_testing_bot_details.md
Normal file
@ -0,0 +1,539 @@

# GPU Bot Details

This page describes in detail how the GPU bots are set up, which files affect
their configuration, and how to both modify their behavior and add new bots.

[TOC]

## Overview of the GPU bots' setup

Chromium's GPU bots, compared to the majority of the project's test machines,
are physical pieces of hardware. When end users run the Chrome browser, they
are almost surely running it on a physical piece of hardware with a real
graphics processor. There are some portions of the code base which simply can
not be exercised by running the browser in a virtual machine, or on a software
implementation of the underlying graphics libraries. The GPU bots were
developed and deployed in order to cover these code paths, and avoid
regressions that are otherwise inevitable in a project the size of the Chromium
browser.

The GPU bots are utilized on the [chromium.gpu] and [chromium.gpu.fyi]
waterfalls, and various tryservers, as described in [Using the GPU Bots].

[chromium.gpu]: https://build.chromium.org/p/chromium.gpu/console
[chromium.gpu.fyi]: https://build.chromium.org/p/chromium.gpu.fyi/console
[Using the GPU Bots]: gpu_testing.md#Using-the-GPU-Bots

The vast majority of the hardware for the bots lives in the Chrome-GPU Swarming
pool. The waterfall bots are simply virtual machines which spawn Swarming tasks
with the appropriate tags to get them to run on the desired GPU and operating
system type. So, for example, the [Win10 Release (NVIDIA)] bot is actually a
virtual machine which spawns all of its jobs with the Swarming parameters:

[Win10 Release (NVIDIA)]: https://ci.chromium.org/buildbot/chromium.gpu/Win10%20Release%20%28NVIDIA%29/?limit=200

```json
{
  "gpu": "10de:1cb3-23.21.13.8816",
  "os": "Windows-10",
  "pool": "Chrome-GPU"
}
```

Since the GPUs in the Swarming pool are mostly homogeneous, this is sufficient
to target the pool of Windows 10-like NVIDIA machines. (There are a few Windows
7-like NVIDIA bots in the pool, which necessitates the OS specifier.)
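Swarming's dimension matching can be approximated as subset filtering: a task
runs on any bot whose dimensions include every requested key/value pair. A
sketch, with illustrative bot names and single-valued dimensions (real bots
advertise multiple values per key, e.g. several GPU strings at different
precisions):

```python
def matching_bots(bots, requested):
    """Return the names of bots whose dimensions satisfy every
    requested dimension, mimicking how a Swarming task is routed."""
    return [name for name, dims in bots.items()
            if all(dims.get(k) == v for k, v in requested.items())]

bots = {
    'gpu-bot-1': {'gpu': '10de:1cb3-23.21.13.8816', 'os': 'Windows-10',
                  'pool': 'Chrome-GPU'},
    'gpu-bot-2': {'gpu': '10de:1cb3-23.21.13.8816',
                  'os': 'Windows-2008ServerR2-SP1', 'pool': 'Chrome-GPU'},
}
print(matching_bots(bots, {'gpu': '10de:1cb3-23.21.13.8816',
                           'os': 'Windows-10', 'pool': 'Chrome-GPU'}))
# -> ['gpu-bot-1']
```

Dropping the `os` dimension from the request would match both bots, which is
why the OS specifier above is needed to separate the Windows 7-like machines.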

Details about the bots can be found on [chromium-swarm.appspot.com] and by
using `src/tools/swarming_client/swarming.py`, for example `swarming.py bots`.
If you are authenticated with @google.com credentials you will be able to make
queries of the bots and see, for example, which GPUs are available.

[chromium-swarm.appspot.com]: https://chromium-swarm.appspot.com/

The waterfall bots run tests on a single GPU type in order to make it easier to
see regressions or flakiness that affect only a certain type of GPU.

The tryservers like `win_chromium_rel_ng` which include GPU tests, on the other
hand, run tests on more than one GPU type. As of this writing, the Windows
tryservers run tests on NVIDIA and AMD GPUs; the Mac tryservers run tests on
Intel and NVIDIA GPUs. The way these tryservers' tests are specified is simply
by *mirroring* how one or more waterfall bots work. This is an inherent
property of the [`chromium_trybot` recipe][chromium_trybot.py], which was
designed to eliminate differences in behavior between the tryservers and
waterfall bots. Since the tryservers mirror waterfall bots, if the waterfall
bot is working, the tryserver must almost inherently be working as well.

[chromium_trybot.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium_trybot.py

There are a few one-off GPU configurations on the waterfall where the tests are
run locally on physical hardware, rather than via Swarming. A few examples are:

<!-- XXX: update this list -->
* [Mac Pro Release (AMD)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Mac%20Pro%20Release%20%28AMD%29/)
* [Mac Pro Debug (AMD)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Mac%20Pro%20Debug%20%28AMD%29/)
* [Linux Release (Intel HD 630)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Linux%20Release%20%28Intel%20HD%20630%29/)
* [Linux Release (AMD R7 240)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Linux%20Release%20%28AMD%20R7%20240%29/)

There are a couple of reasons to continue to support running tests on a
specific machine: it might be too expensive to deploy the required multiple
copies of said hardware, or the configuration might not be reliable enough to
begin scaling it up.

## Adding a new isolated test to the bots

Adding a new test step to the bots requires that the test run via an isolate.
Isolates describe both the binary and data dependencies of an executable, and
are the underpinning of how the Swarming system works. See the [LUCI wiki] for
background on Isolates and Swarming.

<!-- XXX: broken link -->
[LUCI wiki]: https://github.com/luci/luci-py/wiki

### Adding a new isolate

1. Define your target using the `template("test")` template in
   [`src/testing/test.gni`][testing/test.gni]. See `test("gl_tests")` in
   [`src/gpu/BUILD.gn`][gpu/BUILD.gn] for an example. For a more complex
   example which invokes a series of scripts which finally launches the
   browser, see [`src/chrome/telemetry_gpu_test.isolate`][telemetry_gpu_test.isolate].
2. Add an entry to [`src/testing/buildbot/gn_isolate_map.pyl`][gn_isolate_map.pyl] that refers to
   your target. Find a similar target to yours in order to determine the
   `type`. The type is referenced in [`src/tools/mb/mb_config.pyl`][mb_config.pyl].
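Since `.pyl` files are Python literals, a new entry can be sanity-checked with
`ast.literal_eval` before uploading. The entry below is illustrative, not a
real target; compare against neighboring entries in the real file for the
correct `type` and fields:

```python
import ast

# A hypothetical gn_isolate_map.pyl fragment, written by hand for this sketch.
new_entry_text = """
{
  'my_new_gl_tests': {
    'label': '//gpu:my_new_gl_tests',
    'type': 'console_test_launcher',
  },
}
"""

entry = ast.literal_eval(new_entry_text)
# literal_eval rejects anything that isn't a plain Python literal, so a
# syntax slip in the edited .pyl fails loudly here instead of on the bots.
assert set(entry['my_new_gl_tests']) >= {'label', 'type'}
print(entry['my_new_gl_tests']['type'])  # -> console_test_launcher
```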

[testing/test.gni]: https://chromium.googlesource.com/chromium/src/+/master/testing/test.gni
[gpu/BUILD.gn]: https://chromium.googlesource.com/chromium/src/+/master/gpu/BUILD.gn
<!-- XXX: broken link -->
[telemetry_gpu_test.isolate]: https://chromium.googlesource.com/chromium/src/+/master/chrome/telemetry_gpu_test.isolate
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
[mb_config.pyl]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb_config.pyl

At this point you can build and upload your isolate to the isolate server.

See [Isolated Testing for SWEs] for the most up-to-date instructions. The
instructions below are a copy which shows how to run an isolate that's been
uploaded to the isolate server on your local machine rather than on Swarming.

[Isolated Testing for SWEs]: https://www.chromium.org/developers/testing/isolated-testing/for-swes

If `cd`'d into `src/`:

1. `./tools/mb/mb.py isolate //out/Release [target name]`
   * For example: `./tools/mb/mb.py isolate //out/Release angle_end2end_tests`
1. `python tools/swarming_client/isolate.py batcharchive -I https://isolateserver.appspot.com out/Release/[target name].isolated.gen.json`
   * For example: `python tools/swarming_client/isolate.py batcharchive -I https://isolateserver.appspot.com out/Release/angle_end2end_tests.isolated.gen.json`
1. This will write a hash to stdout. You can run it via:
   `python tools/swarming_client/run_isolated.py -I https://isolateserver.appspot.com -s [HASH] -- [any additional args for the isolate]`

See the section below on [isolate server credentials](#Isolate-server-credentials).

### Adding your new isolate to the tests that are run on the bots

See [Adding new steps to the GPU bots] for details on this process.

[Adding new steps to the GPU bots]: gpu_testing.md#Adding-new-steps-to-the-GPU-Bots

## Relevant files that control the operation of the GPU bots

In the [tools/build] workspace:

* [masters/master.chromium.gpu] and [masters/master.chromium.gpu.fyi]:
    * builders.pyl in these two directories defines the bots that show up on
      the waterfall. If you are adding a new bot, you need to add it to
      builders.pyl and use go/bug-a-trooper to request a restart of either
      master.chromium.gpu or master.chromium.gpu.fyi.
    * Only changes under masters/ require a waterfall restart. All other
      changes – for example, to scripts/slave/ in this workspace, or the
      Chromium workspace – do not require a master restart (and go live the
      minute they are committed).
* `scripts/slave/recipe_modules/chromium_tests/`:
    * <code>[chromium_gpu.py]</code> and
      <code>[chromium_gpu_fyi.py]</code> define the following for
      each builder and tester:
        * How the workspace is checked out (e.g., this is where top-of-tree
          ANGLE is specified)
        * The build configuration (e.g., this is where 32-bit vs. 64-bit is
          specified)
        * Various gclient defines (like compiling in the hardware-accelerated
          video codecs, and enabling compilation of certain tests, like the
          dEQP tests, that can't be built on all of the Chromium builders)
        * Note that the GN configuration of the bots is also controlled by
          <code>[mb_config.pyl]</code> in the Chromium workspace; see below.
    * <code>[trybots.py]</code> defines how try bots *mirror* one or more
      waterfall bots.
        * The concept of try bots mirroring waterfall bots ensures there are
          no differences in behavior between the waterfall bots and the try
          bots. This helps ensure that a CL will not pass the commit queue
          and then break on the waterfall.
        * This file defines the behavior of the following GPU-related try
          bots:
            * `linux_chromium_rel_ng`, `mac_chromium_rel_ng`, and
              `win_chromium_rel_ng`, which run against every Chromium CL, and
              which mirror the behavior of bots on the chromium.gpu
              waterfall.
            * The ANGLE try bots, which run against ANGLE CLs, and mirror the
              behavior of the chromium.gpu.fyi waterfall (including using
              top-of-tree ANGLE, and running additional tests not run by the
              regular Chromium try bots)
            * The optional GPU try servers `linux_optional_gpu_tests_rel`,
              `mac_optional_gpu_tests_rel` and
              `win_optional_gpu_tests_rel`, which are triggered manually and
              run some tests which can't be run on the regular Chromium try
              servers mainly due to lack of hardware capacity.

[tools/build]: https://chromium.googlesource.com/chromium/tools/build/
[masters/master.chromium.gpu]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.chromium.gpu/
[masters/master.chromium.gpu.fyi]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.chromium.gpu.fyi/
[chromium_gpu.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_gpu.py
[chromium_gpu_fyi.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_gpu_fyi.py
[trybots.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/trybots.py

In the [chromium/src] workspace:

* [src/testing/buildbot]:
    * <code>[chromium.gpu.json]</code> and
      <code>[chromium.gpu.fyi.json]</code> define which steps are run on
      which bots. These files are autogenerated. Don't modify them directly!
    * <code>[gn_isolate_map.pyl]</code> defines all of the isolates' behavior
      in the GN build.
* [`src/tools/mb/mb_config.pyl`][mb_config.pyl]
    * Defines the GN arguments for all of the bots.
* [`src/content/test/gpu/generate_buildbot_json.py`][generate_buildbot_json.py]
    * The generator script for `chromium.gpu.json` and
      `chromium.gpu.fyi.json`. It defines on which GPUs various tests run.
    * It's completely self-contained and should hopefully be fairly
      comprehensible.
    * When modifying this script, don't forget to also run it, to regenerate
      the JSON files.
    * See [Adding new steps to the GPU bots] for more details.

[chromium/src]: https://chromium.googlesource.com/chromium/src/
[src/testing/buildbot]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot
[chromium.gpu.json]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.json
[chromium.gpu.fyi.json]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.fyi.json
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
[mb_config.pyl]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb_config.pyl
[generate_buildbot_json.py]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/generate_buildbot_json.py

In the [infradata/config] workspace (Google internal only, sorry):

* [configs/chromium-swarm/bots.cfg]
    * Defines a `Chrome-GPU` Swarming pool which contains most of the
      specialized hardware: as of this writing, the Windows and Linux NVIDIA
      bots, the Windows AMD bots, and the MacBook Pros with NVIDIA and AMD
      GPUs. New GPU hardware should be added to this pool.

[infradata/config]: https://chrome-internal.googlesource.com/infradata/config
[configs/chromium-swarm/bots.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/bots.cfg

## Walkthroughs of various maintenance scenarios

This section describes various common scenarios that might arise when
maintaining the GPU bots, and how they'd be addressed.

### How to add a new test or an entire new step to the bots

This is described in [Adding new tests to the GPU bots].

[Adding new tests to the GPU bots]: gpu_testing.md#Adding-New-Tests-to-the-GPU-Bots
|
||||
|
||||
### How to add a new bot
|
||||
|
||||
The first decision point when adding a new GPU bot is whether it is a one-off
|
||||
piece of hardware, or one which is expected to be scaled up at some point. If
|
||||
it's a one-off piece of hardware, it can be added to the chromium.gpu.fyi
|
||||
waterfall as a non-swarmed test machine. If it's expected to be scaled up at
|
||||
some point, the hardware should be added to the swarming pool. These two
|
||||
scenarios are described in more detail below.
|
||||
|
||||
#### How to add a new, non-swarmed, physical bot to the chromium.gpu.fyi waterfall

1. Work with the Chrome Infrastructure Labs team to get the hardware deployed
   so it can talk to the chromium.gpu.fyi master.
1. Create a CL in the build workspace which:
    1. Adds the new machine to
       [`masters/master.chromium.gpu.fyi/builders.pyl`][master.chromium.gpu.fyi/builders.pyl].
    1. Adds the new machine to
       [`scripts/slave/recipe_modules/chromium_tests/chromium_gpu_fyi.py`][chromium_gpu_fyi.py].
       Set the `enable_swarming` property to `False`.
    1. Retrains recipe expectations
       (`scripts/slave/recipes.py --use-bootstrap test train`) and adds the
       newly created JSON file(s) corresponding to the new machines to your CL.
1. Create a CL in the Chromium workspace to:
    1. Add the new machine to
       [`src/content/test/gpu/generate_buildbot_json.py`][generate_buildbot_json.py].
       Make sure to set the `swarming` property to `False`.
    1. If the machine runs GN, add a description to
       [`src/tools/mb/mb_config.pyl`][mb_config.pyl].
1. Once the build workspace CL lands, use go/bug-a-trooper (or contact kbr@)
   to schedule a restart of the chromium.gpu.fyi waterfall. This is only
   necessary when modifying files under the masters/ directory. A reboot of
   the machine may be needed once the waterfall has been restarted in order to
   make it connect properly.
1. The CLs from (2) and (3) can land in either order, though it is preferable
   to land the Chromium-side CL first so that the machine knows what tests to
   run the first time it boots up.
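
As an illustration only, a non-swarmed one-off machine's entry in
`generate_buildbot_json.py` has roughly this shape; the machine name and the
exact set of fields here are hypothetical, not copied from the real script:

```python
# Hypothetical sketch of a waterfall entry for a one-off, non-swarmed test
# machine; the real generate_buildbot_json.py defines many more fields.
waterfall = {
    'Mac Experimental Release (CoolNewGPUType)': {
        'swarming': False,  # one-off physical machine, not in the Swarming pool
        'os_type': 'mac',
    },
}
```

The key point is the `swarming` flag: the machine executes its tests serially
on the physical hardware instead of fanning them out to the pool.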

[master.chromium.gpu.fyi/builders.pyl]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.chromium.gpu.fyi/builders.pyl

#### How to add a new swarmed bot to the chromium.gpu.fyi waterfall

When deploying a new GPU configuration, it should be added to the
chromium.gpu.fyi waterfall first. The chromium.gpu waterfall should be reserved
for those GPUs which are tested on the commit queue. (Some of the bots violate
this rule – namely, the Debug bots – though we should strive to eliminate these
differences.) Once the new configuration is ready to be fully deployed on
tryservers, bots can be added to the chromium.gpu waterfall, and the tryservers
changed to mirror them.

In order to add Release and Debug waterfall bots for a new configuration,
experience has shown that at least 4 physical machines are needed in the
swarming pool. The reason is that the tests all run in parallel on the Swarming
cluster, so the load induced on the swarming bots is higher than it would be
for a non-swarmed bot that executes its tests serially.

With these prerequisites, these are the steps to add a new swarmed bot.
(Actually, a pair of bots -- Release and Debug.)

1. Work with the Chrome Infrastructure Labs team to get the (minimum 4)
   physical machines added to the Swarming pool. Use
   [chromium-swarm.appspot.com] or `src/tools/swarming_client/swarming.py bots`
   to determine the PCI IDs of the GPUs in the bots. (These instructions will
   need to be updated for Android bots which don't have PCI buses.)
1. Make sure to add these new machines to the Chrome-GPU Swarming pool by
   creating a CL against [`configs/chromium-swarm/bots.cfg`][bots.cfg] in
   the [infradata/config] workspace.
1. File a Chrome Infrastructure Labs ticket requesting 2 virtual machines for
   the testers. These need to match the OS of the physical machines and
   builders because of limitations in the scripts which transfer builds from
   the builder to the tester; see [this feature
   request](http://crbug.com/581953). For example, if you're adding a "Windows
   7 CoolNewGPUType" tester, you'll need 2 Windows VMs.
1. Once the VMs are ready, create a CL in the build workspace which:
    1. Adds the new VMs as the Release and Debug bots in
       [`master.chromium.gpu.fyi/builders.pyl`][master.chromium.gpu.fyi/builders.pyl].
    1. Adds the new VMs to [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py]. Make
       sure to set the `enable_swarming` and `serialize_tests` properties to
       `True`. Double-check the `parent_buildername` property for each. It
       must match the Release/Debug flavor of the builder.
    1. Retrains recipe expectations
       (`scripts/slave/recipes.py --use-bootstrap test train`) and adds the
       newly created JSON file(s) corresponding to the new machines to your CL.
1. Create a CL in the Chromium workspace which:
    1. Adds the new machine to
       `src/content/test/gpu/generate_buildbot_json.py`.
        1. The swarming dimensions are crucial. These must match the GPU and
           OS type of the physical hardware in the Swarming pool. This is what
           causes the VMs to spawn their tests on the correct hardware. Make
           sure to use the Chrome-GPU pool, and that the new machines were
           specifically added to that pool.
        1. Make sure to set the `swarming` property to `True` for both the
           Release and Debug bots.
        1. Make triply sure that there are no collisions between the new
           hardware you're adding and hardware already in the Swarming pool.
           For example, it used to be the case that all of the Windows NVIDIA
           bots ran the same OS version. Later, the Windows 8 flavor bots were
           added. In order to avoid accidentally running tests on Windows 8
           when Windows 7 was intended, the OS in the swarming dimensions of
           the Win7 bots had to be changed from `win` to
           `Windows-2008ServerR2-SP1` (the Win7-like flavor running in our
           data center). Similarly, the Win8 bots had to have a very precise
           OS description (`Windows-2012ServerR2-SP0`).
    1. If the machine runs GN, adds a description to
       [`src/tools/mb/mb_config.pyl`][mb_config.pyl].
1. Once the tools/build CL lands, use go/bug-a-trooper (or contact kbr@) to
   schedule a restart of the chromium.gpu.fyi waterfall. This is only
   necessary when modifying files under the masters/ directory. A reboot of
   the VMs may be needed once the waterfall has been restarted in order to
   make them connect properly.
1. The CLs from (3) and (4) can land in either order, though it is preferable
   to land the Chromium-side CL first so that the machine knows what tests to
   run the first time it boots up.
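
To make the dimension-matching concrete, here is a sketch of swarming
dimensions for a hypothetical Win7 NVIDIA tester. The GPU PCI ID and OS string
must exactly match the hardware actually in the pool; the values below are
examples for illustration, not authoritative:

```python
# Hypothetical swarming dimensions for a Win7 NVIDIA tester. A task with
# these dimensions will only be scheduled on machines whose bot dimensions
# match every key/value pair.
swarming_dimensions = {
    'gpu': '10de:104a',                # vendor:device PCI ID (NVIDIA example)
    'os': 'Windows-2008ServerR2-SP1',  # the Win7-like datacenter flavor
    'pool': 'Chrome-GPU',              # machines must be in this pool
}
```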

[bots.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/bots.cfg
[infradata/config]: https://chrome-internal.googlesource.com/infradata/config/

#### How to start running tests on a new GPU type on an existing try bot

Let's say that you want to cause the `win_chromium_rel_ng` try bot to run tests
on CoolNewGPUType in addition to the types it currently runs (as of this
writing, NVIDIA and AMD). To do this:

1. Make sure there is enough hardware capacity. Unfortunately, tools to report
   utilization of the Swarming pool are still being developed, but a
   back-of-the-envelope estimate is that you will need a minimum of 30
   machines in the Swarming pool to run the current set of GPU tests on the
   tryservers. We estimate that 90 machines will be needed in order to
   additionally run the WebGL 2.0 conformance tests. Plan for the larger
   capacity, as it's desired to run the larger test suite on as many
   configurations as possible.
2. Deploy Release and Debug testers on the chromium.gpu waterfall, following
   the instructions for the chromium.gpu.fyi waterfall above. You will also
   need to temporarily add suppressions to
   [`tests/masters_recipes_test.py`][tests/masters_recipes_test.py] for these
   new testers since they aren't yet covered by try bots and are going on a
   non-FYI waterfall. Make sure these run green for a day or two before
   proceeding.
3. Create a CL in the tools/build workspace, adding the new Release tester
   to `win_chromium_rel_ng`'s `bot_ids` list
   in `scripts/slave/recipe_modules/chromium_tests/trybots.py`. Rerun
   `scripts/slave/recipes.py --use-bootstrap test train`.
4. Once the CL in (3) lands, the commit queue will **immediately** start
   running tests on the CoolNewGPUType configuration. Be vigilant and make
   sure that tryjobs are green. If they are red for any reason, revert the CL
   and figure out offline what went wrong.
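
For step 3, the shape of a `bot_ids` list with a newly mirrored tester looks
roughly like the following; this is a sketch, not the real `trybots.py`, and
the builder and tester names are hypothetical:

```python
# Sketch of a try bot's bot_ids list after mirroring an additional
# waterfall tester; each entry names a builder/tester pair whose tests the
# try bot will also run.
bot_ids = [
    {'mastername': 'chromium.gpu',
     'buildername': 'GPU Win Builder',
     'tester': 'Win7 Release (NVIDIA)'},
    # Newly appended entry; the try bot now also runs this config's tests:
    {'mastername': 'chromium.gpu',
     'buildername': 'GPU Win Builder',
     'tester': 'Win7 Release (CoolNewGPUType)'},
]
```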

[tests/masters_recipes_test.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/tests/masters_recipes_test.py

#### How to add a new optional try bot

The "optional" GPU try bots are a concession to the reality that there are some
long-running GPU test suites that simply cannot run against every Chromium CL.
They run some additional tests that are usually run only on the
chromium.gpu.fyi waterfall. Some of these tests, like the WebGL 2.0 conformance
suite, are intended to be run on the normal try bots once hardware capacity is
available. Some are not intended to ever run on the normal try bots.

The optional try bots are a little different because they mirror waterfall bots
that don't actually exist. The waterfall bots' specifications exist only to
tell the optional try bots which tests to run.

Let's say that you intend to add a new such optional try bot on Windows. Call
it `win_new_optional_tests_rel` for example. Now, if you wanted to just add
this GPU type to the existing `win_optional_gpu_tests_rel` try bot, you'd
just follow the instructions above
([How to start running tests on a new GPU type on an existing try bot](#How-to-start-running-tests-on-a-new-GPU-type-on-an-existing-try-bot)).
The steps below describe how to spin up an entire new optional try bot.

1. Make sure that you have some swarming capacity for the new GPU type. Since
   it's not running against all Chromium CLs you don't need the recommended 30
   minimum bots, though ~10 would be good.
1. Create a CL in the Chromium workspace:
    1. Add your new bot (for example, "Optional Win7 Release
       (CoolNewGPUType)") to the chromium.gpu.fyi waterfall in
       [generate_buildbot_json.py]. (Note, this is a bad example: the
       "optional" bots have special semantics in this script. You'd probably
       want to define some new category of bot if you didn't intend to add
       this to `win_optional_gpu_tests_rel`.)
    1. Re-run the script to regenerate the JSON files.
1. Land the above CL.
1. Create a CL in the tools/build workspace:
    1. Modify `masters/master.tryserver.chromium.win`'s [master.cfg] and
       [slaves.cfg] to add the new tryserver. Follow the pattern for the
       existing `win_optional_gpu_tests_rel` tryserver. Namely, add the new
       entry to master.cfg, and add the new tryserver to the
       `optional_builders` list in `slaves.cfg`.
    1. Modify [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] to add the new
       "Optional Win7 Release (CoolNewGPUType)" entry.
    1. Modify [`trybots.py`][trybots.py] to add
       the new `win_new_optional_tests_rel` try bot, mirroring "Optional
       Win7 Release (CoolNewGPUType)".
1. Land the above CL and request an off-hours restart of the
   tryserver.chromium.win waterfall.
1. Now you can send CLs to the new bot with:
   `git cl try -m tryserver.chromium.win -b win_new_optional_tests_rel`

[master.cfg]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.tryserver.chromium.win/master.cfg
[slaves.cfg]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.tryserver.chromium.win/slaves.cfg

#### How to test and deploy a driver update

Let's say that you want to roll out an update to the graphics drivers on one of
the configurations like the Win7 NVIDIA bots. The responsible way to do this is
to run the new driver on one of the waterfalls for a day or two to make sure
the tests are running reliably green before rolling out the driver update
everywhere. To do this:

1. Work with the Chrome Infrastructure Labs team to deploy a single,
   non-swarmed, physical machine on the chromium.gpu.fyi waterfall running the
   new driver. The OS and GPU should exactly match the configuration you
   intend to upgrade. See
   [How to add a new, non-swarmed, physical bot to the chromium.gpu.fyi waterfall](#How-to-add-a-new_non-swarmed_physical-bot-to-the-chromium_gpu_fyi-waterfall).
2. Hopefully, the new machine will pass the pixel tests. If it doesn't, then
   unfortunately, it'll be necessary to follow the instructions on
   [updating the pixel tests] to temporarily suppress the failures on this
   particular configuration. Keep the time window for these test suppressions
   as narrow as possible.
3. Watch the new machine for a day or two to make sure it's stable.
4. When it is, ask the Chrome Infrastructure Labs team to roll out the driver
   update across all of the similarly configured bots in the swarming pool.
5. If necessary, update pixel test expectations and remove the suppressions
   added above.
6. Prepare and land a CL removing the temporary machine from the
   chromium.gpu.fyi waterfall. Request a waterfall restart.
7. File a ticket with the Chrome Infrastructure Labs team to reclaim the
   temporary machine.

Note that with recent improvements to Swarming, in particular [this
RFE](https://github.com/luci/luci-py/issues/253) and others, these steps are no
longer strictly necessary – it's possible to target Swarming jobs at a
particular driver version. If
[`generate_buildbot_json.py`][generate_buildbot_json.py] were improved to be
more specific about the driver version on the various bots, then the machines
with the new drivers could simply be added to the Swarming pool, and this
process could be a lot simpler. Patches welcome. :)

[updating the pixel tests]: https://www.chromium.org/developers/testing/gpu-testing/#TOC-Updating-and-Adding-New-Pixel-Tests-to-the-GPU-Bots

## Credentials for various servers

Working with the GPU bots requires credentials to various services: the isolate
server, the swarming server, and cloud storage.

### Isolate server credentials

To upload and download isolates you must first authenticate to the isolate
server. From a Chromium checkout, run:

* `./src/tools/swarming_client/auth.py login
  --service=https://isolateserver.appspot.com`

This will open a web browser to complete the authentication flow. A @google.com
email address is required in order to properly authenticate.

To test your authentication, find a hash for a recent isolate. Consult the
instructions on [Running Binaries from the Bots Locally] to find a random hash
from a target like `gl_tests`. Then run the following:
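
The download command itself did not survive the port of this page. A sketch of
what it looks like, assuming the swarming client's `isolateserver.py download`
flags (`-I` for the server, `-f` for the hash and destination file); `<hash>`
is a placeholder for the hash you found:

```
./src/tools/swarming_client/isolateserver.py download \
    -I https://isolateserver.appspot.com \
    -f <hash> delete_me
```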

[Running Binaries from the Bots Locally]: https://www.chromium.org/developers/testing/gpu-testing#TOC-Running-Binaries-from-the-Bots-Locally

If authentication succeeded, this will silently download a file called
`delete_me` into the current working directory. If it failed, the script will
report multiple authentication errors. In this case, use the following command
to log out and then try again:

* `./src/tools/swarming_client/auth.py logout
  --service=https://isolateserver.appspot.com`

### Swarming server credentials

The swarming server uses the same `auth.py` script as the isolate server. You
will need to authenticate if you want to manually download the results of
previous swarming jobs, trigger your own jobs, or run `swarming.py reproduce`
to re-run a remote job on your local workstation. Follow the instructions
above, replacing the service with `https://chromium-swarm.appspot.com`.
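
For example, to authenticate and then re-run a remote task locally; the task
ID is a placeholder, and the `-S` server flag is assumed from the swarming
client rather than taken from this page:

```
./src/tools/swarming_client/auth.py login \
    --service=https://chromium-swarm.appspot.com
./src/tools/swarming_client/swarming.py reproduce \
    -S https://chromium-swarm.appspot.com <task-id>
```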

### Cloud storage credentials

Authentication to Google Cloud Storage is needed for a couple of reasons:
uploading pixel test results to the cloud, and potentially uploading and
downloading builds as well, at least in Debug mode. Use the copy of gsutil in
`depot_tools/third_party/gsutil/gsutil`, and follow the [Google Cloud Storage
instructions] to authenticate. You must use your @google.com email address and
be a member of the Chrome GPU team in order to receive read-write access to the
appropriate cloud storage buckets. Roughly:

1. Run `gsutil config`
2. Copy/paste the URL into your browser
3. Log in with your @google.com account
4. Allow the app to access the information it requests
5. Copy-paste the resulting key back into your Terminal
6. Press "enter" when prompted for a project-id (i.e., leave it empty)
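
The steps above can be sketched as a shell session; the gsutil path comes from
this section and the bucket name is the chromium-gpu-archive bucket mentioned
in this document, but treat the exact invocations as assumptions:

```
depot_tools/third_party/gsutil/gsutil config
# ...complete the browser-based OAuth flow described above, then verify:
depot_tools/third_party/gsutil/gsutil ls gs://chromium-gpu-archive
```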

At this point you should be able to write to the cloud storage bucket.

Navigate to
<https://console.developers.google.com/storage/chromium-gpu-archive> to view
the contents of the cloud storage bucket.

[Google Cloud Storage instructions]: https://developers.google.com/storage/docs/gsutil
BIN
docs/gpu/images/wrangler.png
Normal file
Binary file not shown.
298
docs/gpu/pixel_wrangling.md
Normal file
@ -0,0 +1,298 @@

# GPU Bots & Pixel Wrangling

![](images/wrangler.png)

(December 2017: presentation on GPU bots and pixel wrangling: see [slides].)

GPU Pixel Wrangling is the process of keeping various GPU bots green. On the
GPU bots, tests run on physical hardware with real GPUs, not in VMs like the
majority of the bots on the Chromium waterfall.

[slides]: https://docs.google.com/presentation/d/1sZjyNe2apUhwr5sinRfPs7eTzH-3zO0VQ-Cj-8DlEDQ/edit?usp=sharing

[TOC]

## Fleet Status

The following links (sorry, Google employees only) show the status of various
GPU bots in the fleet.

Primary configurations:

* [Windows 10 Quadro P400 Pool](http://shortn/_dmtaFfY2Jq)
* [Windows 10 Intel HD 630 Pool](http://shortn/_QsoGIGIFYd)
* [Linux Quadro P400 Pool](http://shortn/_fNgNs1uROQ)
* [Linux Intel HD 630 Pool](http://shortn/_dqEGjCGMHT)
* [Mac AMD Retina 10.12.6 GPU Pool](http://shortn/_BcrVmfRoSo)
* [Mac Mini Chrome Pool](http://shortn/_Ru8NESapPM)
* [Android Nexus 5X Chrome Pool](http://shortn/_G3j7AVmuNR)

Secondary configurations:

* [Windows 7 Quadro P400 Pool](http://shortn/_cuxSKC15UX)
* [Windows AMD R7 240 GPU Pool](http://shortn/_XET7RTMHQm)
* [Mac NVIDIA Retina 10.12.6 GPU Pool](http://shortn/_jQWG7W71Ek)

## GPU Bots' Waterfalls

The waterfalls work much like any other; see the [Tour of the Chromium Buildbot
Waterfall] for a more detailed explanation of how this is laid out. We have
more subtle configurations because the GPU matters, not just the OS and release
vs. debug. Hence we have Windows Nvidia Release bots, Mac Intel Debug bots, and
so on. The waterfalls we're interested in are:

* [Chromium GPU]
    * Various operating systems, configurations, GPUs, etc.
* [Chromium GPU FYI]
    * These bots run less-standard configurations like Windows with AMD GPUs,
      Linux with Intel GPUs, etc.
    * These bots build with top of tree ANGLE rather than the `DEPS` version.
    * The [ANGLE tryservers] help ensure that these bots stay green. However,
      it is possible that due to ANGLE changes these bots may be red while
      the chromium.gpu bots are green.
    * The [ANGLE Wrangler] is on-call to help resolve ANGLE-related breakage
      on this waterfall.
    * To determine if a different ANGLE revision was used between two builds,
      compare the `got_angle_revision` buildbot property on the GPU builders
      or `parent_got_angle_revision` on the testers. This revision can be
      used to do a `git log` in the `third_party/angle` repository.
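
For example, with the two revisions taken from those buildbot properties
(placeholders below), the comparison looks like:

```
cd src/third_party/angle
git log --oneline <old_got_angle_revision>..<new_got_angle_revision>
```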

<!-- TODO(kainino): update link when the page is migrated -->
[Tour of the Chromium Buildbot Waterfall]: http://www.chromium.org/developers/testing/chromium-build-infrastructure/tour-of-the-chromium-buildbot
[Chromium GPU]: https://ci.chromium.org/p/chromium/g/chromium.gpu/console?reload=120
[Chromium GPU FYI]: https://ci.chromium.org/p/chromium/g/chromium.gpu.fyi/console?reload=120
[ANGLE tryservers]: https://build.chromium.org/p/tryserver.chromium.angle/waterfall
<!-- TODO(kainino): update link when the page is migrated -->
[ANGLE Wrangler]: https://sites.google.com/a/chromium.org/dev/developers/how-tos/angle-wrangling

## Test Suites

The bots run several test suites. The majority of them have been migrated to
the Telemetry harness, and are run within the full browser, in order to better
test the code that is actually shipped. As of this writing, the tests included:

* Tests using the Telemetry harness:
    * The WebGL conformance tests: `webgl_conformance_integration_test.py`
    * A Google Maps test: `maps_integration_test.py`
    * Context loss tests: `context_lost_integration_test.py`
    * Depth capture tests: `depth_capture_integration_test.py`
    * GPU process launch tests: `gpu_process_integration_test.py`
    * Hardware acceleration validation tests:
      `hardware_accelerated_feature_integration_test.py`
    * Pixel tests validating the end-to-end rendering pipeline:
      `pixel_integration_test.py`
    * Stress tests of the screenshot functionality that other tests use:
      `screenshot_sync_integration_test.py`
* `angle_unittests`: see `src/gpu/gpu.gyp`
* drawElements tests (on the chromium.gpu.fyi waterfall): see
  `src/third_party/angle/src/tests/BUILD.gn`
* `gles2_conform_test` (requires internal sources): see
  `src/gpu/gles2_conform_support/gles2_conform_test.gyp`
* `gl_tests`: see `src/gpu/BUILD.gn`
* `gl_unittests`: see `src/ui/gl/BUILD.gn`

And more. See `src/content/test/gpu/generate_buildbot_json.py` for the
complete description of bots and tests.

Additionally, the Release bots run:

* `tab_capture_end2end_tests`: see
  `src/chrome/browser/extensions/api/tab_capture/tab_capture_apitest.cc` and
  `src/chrome/browser/extensions/api/cast_streaming/cast_streaming_apitest.cc`

### More Details

More details about the bots' setup can be found on the [GPU Testing] page.

[GPU Testing]: https://sites.google.com/a/chromium.org/dev/developers/testing/gpu-testing

## Wrangling

### Prerequisites

1. Ideally a wrangler should be a Chromium committer. If you're on the GPU
   pixel wrangling rotation, there will be an email notifying you of the
   upcoming shift, and a calendar appointment.
    * If you aren't a committer, don't panic. It's still best for everyone on
      the team to become acquainted with the procedures of maintaining the
      GPU bots.
    * In this case you'll upload CLs to Gerrit to perform reverts (optionally
      using the new "Revert" button in the UI), and might consider using
      `TBR=` to speed through trivial and urgent CLs. In general, try to send
      all CLs through the commit queue.
    * Contact bajones, kainino, kbr, vmiura, zmo, or another member of the
      Chrome GPU team who's already a committer for help landing patches or
      reverts during your shift.
2. Apply for [access to the bots].

[access to the bots]: https://sites.google.com/a/google.com/chrome-infrastructure/golo/remote-access?pli=1

### How to Keep the Bots Green

1. Watch for redness on the tree.
    1. [Sheriff-O-Matic now has support for the chromium.gpu.fyi waterfall]!
    1. The chromium.gpu bots are covered under Sheriff-O-Matic's [Chromium
       tab]. As pixel wrangler, ignore any non-GPU test failures in this tab.
    1. The bots are expected to be green all the time. Flakiness on these bots
       is neither expected nor acceptable.
    1. If a bot goes consistently red, it's necessary to figure out whether a
       recent CL caused it, or whether it's a problem with the bot or
       infrastructure.
    1. If it looks like a problem with the bot (deep problems like failing to
       check out the sources, the isolate server failing, etc.) notify the
       Chromium troopers and file a P1 bug with labels: Infra\>Labs,
       Infra\>Troopers and Internals\>GPU\>Testing. See the general [tree
       sheriffing page] for more details.
    1. Otherwise, examine the builds just before and after the redness was
       introduced. Look at the revisions in the builds before and after the
       failure was introduced.
    1. **File a bug** capturing the regression range and excerpts of any
       associated logs. Regressions should be marked P1. CC engineers who you
       think may be able to help triage the issue. Keep in mind that the logs
       on the bots expire after a few days, so make sure to add copies of
       relevant logs to the bug report.
    1. Use the `Hotlist=PixelWrangler` label to mark bugs that require the
       pixel wrangler's attention, so it's easy to find relevant bugs when
       handing off shifts.
    1. Study the regression range carefully. Use drover to revert any CLs
       which break the chromium.gpu bots. Use your judgment about
       chromium.gpu.fyi, since not all bots are covered by trybots. In the
       revert message, provide a clear description of what broke, links to
       failing builds, and excerpts of the failure logs, because the build
       logs expire after a few days.
1. Make sure the bots are running jobs.
    1. Keep an eye on the console views of the various bots.
    1. Make sure the bots are all actively processing jobs. If they go offline
       for a long period of time, the "summary bubble" at the top may still be
       green, but the column in the console view will be gray.
    1. Email the Chromium troopers if you find a bot that's not processing
       jobs.
1. Make sure the GPU try servers are in good health.
    1. The GPU try servers are no longer distinct bots on a separate
       waterfall, but instead run as part of the regular tryjobs on the
       Chromium waterfalls. The GPU tests run as part of the following
       tryservers' jobs:
        1. <code>[linux_chromium_rel_ng]</code> on the [luci.chromium.try]
           waterfall
        <!-- TODO(kainino): update link to luci.chromium.try -->
        1. <code>[mac_chromium_rel_ng]</code> on the [tryserver.chromium.mac]
           waterfall
        <!-- TODO(kainino): update link to luci.chromium.try -->
        1. <code>[win7_chromium_rel_ng]</code> on the [tryserver.chromium.win]
           waterfall
    1. The best tool to use to quickly find flakiness on the tryservers is the
       new [Chromium Try Flakes] tool. Look for the names of GPU tests (like
       maps_pixel_test) as well as the test machines (e.g.
       mac_chromium_rel_ng). If you see a flaky test, file a bug like [this
       one](http://crbug.com/444430). Also look for compile flakes that may
       indicate that a bot needs to be clobbered. Contact the Chromium
       sheriffs or troopers if so.
    1. Glance at these trybots from time to time and see if any GPU tests are
       failing frequently. **Note** that test failures are **expected** on
       these bots: individuals' patches may fail to apply, fail to compile, or
       break various tests. Look specifically for patterns in the failures. It
       isn't necessary to spend a lot of time investigating each individual
       failure. (Use the "Show: 200" link at the bottom of the page to see
       more history.)
    1. If the same set of tests is failing repeatedly, look at the individual
       runs. Examine the swarming results and see whether they're all running
       on the same machine. (This is the "Bot assigned to task" when clicking
       any of the test's shards in the build logs.) If they are, something
       might be wrong with the hardware. Use the [Swarming Server Stats] tool
       to drill down into the specific builder.
    1. If you see the same test failing in a flaky manner across multiple
       machines and multiple CLs, it's crucial to investigate why it's
       happening. [crbug.com/395914](http://crbug.com/395914) was one example
       of an innocent-looking Blink change which made it through the commit
       queue and introduced widespread flakiness in a range of GPU tests. The
       failures were also most visible on the try servers as opposed to the
       main waterfalls.
1. Check if any pixel test failures are actual failures or need to be
   rebaselined.
    1. For a given build failing the pixel tests, click the "stdio" link of
       the "pixel" step.
    1. The output will contain a link of the form
       <http://chromium-browser-gpu-tests.commondatastorage.googleapis.com/view_test_results.html?242523_Linux_Release_Intel__telemetry>
    1. Visit the link to see whether the generated or reference images look
       incorrect.
    1. All of the reference images for all of the bots are stored in cloud
       storage under [chromium-gpu-archive/reference-images]. They are indexed
       by version number, OS, GPU vendor, GPU device, and whether or not
       antialiasing is enabled in that configuration. You can download the
       reference images individually to examine them in detail.
1. Rebaseline pixel test reference images if necessary.
    1. Follow the [instructions on the GPU testing page].
    1. Alternatively, if absolutely necessary, you can use the [Chrome
       Internal GPU Pixel Wrangling Instructions] to delete just the broken
       reference images for a particular configuration.
1. Update Telemetry-based test expectations if necessary.
    1. Most of the GPU tests are run inside a full Chromium browser, launched
       by Telemetry, rather than a Gtest harness. The tests and their
       expectations are contained in [src/content/test/gpu/gpu_tests/]. See
       for example <code>[webgl_conformance_expectations.py]</code>,
       <code>[gpu_process_expectations.py]</code> and
       <code>[pixel_expectations.py]</code>.
    1. See the header of the file for a list of modifiers to specify a bot
       configuration. It is possible to specify OS (down to a specific
       version, say, Windows 7 or Mountain Lion), GPU vendor
       (NVIDIA/AMD/Intel), and a specific GPU device.
    1. The key is to maintain the highest coverage: if you have to disable a
       test, disable it only on the specific configurations it's failing. Note
       that it is not possible to discern between Debug and Release
       configurations.
    1. Mark tests failing or skipped, which will suppress flaky failures, only
       as a last resort. It is only really necessary to suppress failures that
       are showing up on the GPU tryservers, since failing tests no longer
       close the Chromium tree.
    1. Please read the section on [stamping out flakiness] for motivation on
       how important it is to eliminate flakiness rather than hiding it.
1. For the remaining Gtest-style tests, use the [`DISABLED_`
   modifier][gtest-DISABLED] to suppress any failures if necessary.
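
To illustrate the style of an expectations entry, here is a minimal stand-in;
the real files subclass a Telemetry expectations class, and the test name,
configuration tags, and bug number below are invented for illustration:

```python
# Minimal stand-in for the expectations mechanism used by the GPU tests;
# not the real Telemetry base class.
class ExampleExpectations(object):
    def __init__(self):
        self.suppressions = []

    def Fail(self, test, conditions=None, bug=None):
        # Record a test expected to fail on the given configurations only.
        self.suppressions.append((test, conditions or [], bug))

expectations = ExampleExpectations()
# Suppress one test on Mac NVIDIA only, preserving coverage elsewhere.
expectations.Fail('conformance/glsl/misc/shader-struct-scope.html',
                  ['mac', 'nvidia'], bug=123456)
```

The narrower the condition list, the more configurations keep running the
test, which is the coverage goal described above.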
|
||||
|
[Sheriff-O-Matic now has support for the chromium.gpu.fyi waterfall]: https://sheriff-o-matic.appspot.com/chromium.gpu.fyi
[Chromium tab]: https://sheriff-o-matic.appspot.com/chromium
[tree sheriffing page]: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs
[linux_chromium_rel_ng]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_rel_ng
[luci.chromium.try]: https://ci.chromium.org/p/chromium/g/luci.chromium.try/builders
[mac_chromium_rel_ng]: https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/
[tryserver.chromium.mac]: https://ci.chromium.org/p/chromium/g/tryserver.chromium.mac/builders
[win7_chromium_rel_ng]: https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/
[tryserver.chromium.win]: https://ci.chromium.org/p/chromium/g/tryserver.chromium.win/builders
[Chromium Try Flakes]: http://chromium-try-flakes.appspot.com/
<!-- TODO(kainino): link doesn't work, but is still included from chromium-swarm homepage so not removing it now -->
[Swarming Server Stats]: https://chromium-swarm.appspot.com/stats
[chromium-gpu-archive/reference-images]: https://console.developers.google.com/storage/chromium-gpu-archive/reference-images
[instructions on the GPU testing page]: https://sites.google.com/a/chromium.org/dev/developers/testing/gpu-testing#TOC-Updating-and-Adding-New-Pixel-Tests-to-the-GPU-Bots
[Chrome Internal GPU Pixel Wrangling Instructions]: https://sites.google.com/a/google.com/client3d/documents/chrome-internal-gpu-pixel-wrangling-instructions
[src/content/test/gpu/gpu_tests/]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/
[webgl_conformance_expectations.py]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/webgl_conformance_expectations.py
[gpu_process_expectations.py]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/gpu_process_expectations.py
[pixel_expectations.py]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/pixel_expectations.py
[stamping out flakiness]: gpu_testing.md#Stamping-out-Flakiness
[gtest-DISABLED]: https://github.com/google/googletest/blob/master/googletest/docs/AdvancedGuide.md#temporarily-disabling-tests
### When Bots Misbehave (SSHing into a bot)

1. See the [Chrome Internal GPU Pixel Wrangling Instructions] for information
   on ssh'ing in to the GPU bots.

[Chrome Internal GPU Pixel Wrangling Instructions]: https://sites.google.com/a/google.com/client3d/documents/chrome-internal-gpu-pixel-wrangling-instructions
### Reproducing WebGL conformance test failures locally

1. From the buildbot build output page, click on the failed shard to get to
   the swarming task page. Scroll to the bottom of the left panel for a
   command to run the task locally. This will automatically download the
   build and any other inputs needed.
2. Alternatively, to run the test on a local build, pass the arguments
   `--browser=exact --browser-executable=/path/to/binary` to
   `content/test/gpu/run_gpu_integration_test.py`.
   Also see the [telemetry documentation].

[telemetry documentation]: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/docs/run_benchmarks_locally.md
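As an illustration of the local-build option, an invocation might look like
the following. This is a hypothetical sketch: the suite name
(`webgl_conformance`) and the binary path are placeholder assumptions —
substitute the suite you are reproducing and your own build output.

```shell
# Run from the Chromium src/ directory; paths and suite name are placeholders.
python content/test/gpu/run_gpu_integration_test.py webgl_conformance \
    --browser=exact \
    --browser-executable=out/Release/chrome
```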
## Extending the GPU Pixel Wrangling Rotation

See the [Chrome Internal GPU Pixel Wrangling Instructions] for information on
extending the rotation.

[Chrome Internal GPU Pixel Wrangling Instructions]: https://sites.google.com/a/google.com/client3d/documents/chrome-internal-gpu-pixel-wrangling-instructions