
Port GPU documentation to Markdown

This ports the following wiki pages into markdown:
https://www.chromium.org/developers/testing/gpu-testing
https://www.chromium.org/developers/testing/gpu-testing/gpu-bot-details
https://www.chromium.org/developers/how-tos/gpu-wrangling
https://www.chromium.org/developers/how-tos/debugging-gpu-related-code

and updates *some* of the old outdated content.

Bug: 813153
Change-Id: Ic5f1b58659bbdb691343785cb18c50f4d55c177f
Reviewed-on: https://chromium-review.googlesource.com/987233
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#547060}
Author: Kai Ninomiya
Date: 2018-03-30 01:30:56 +00:00
Committed by: Commit Bot

@@ -0,0 +1,235 @@
# Debugging GPU related code
Chromium's GPU system is multi-process, which can make debugging it rather
difficult. See [GPU Command Buffer] for some of the nitty gritty. These are just
a few notes to help with debugging.
[TOC]
<!-- TODO(kainino): update link if the page moves -->
[GPU Command Buffer]: https://sites.google.com/a/chromium.org/dev/developers/design-documents/gpu-command-buffer
## Renderer Process Code
### `--enable-gpu-client-logging`
If you are trying to track down a bug in a GPU client process (compositing,
WebGL, Skia/Ganesh, Aura), then in a debug build you can use the
`--enable-gpu-client-logging` flag, which will show every GL call sent to the
GPU service process. (From the point of view of a GPU client, it's calling
OpenGL ES functions - but the real driver calls are made in the GPU process.)
```
[4782:4782:1219/141706:INFO:gles2_implementation.cc(1026)] [.WebGLRenderingContext] glUseProgram(3)
[4782:4782:1219/141706:INFO:gles2_implementation_impl_autogen.h(401)] [.WebGLRenderingContext] glGenBuffers(1, 0x7fffc9e1269c)
[4782:4782:1219/141706:INFO:gles2_implementation_impl_autogen.h(416)] 0: 1
[4782:4782:1219/141706:INFO:gles2_implementation_impl_autogen.h(23)] [.WebGLRenderingContext] glBindBuffer(GL_ARRAY_BUFFER, 1)
[4782:4782:1219/141706:INFO:gles2_implementation.cc(1313)] [.WebGLRenderingContext] glBufferData(GL_ARRAY_BUFFER, 36, 0x7fd268580120, GL_STATIC_DRAW)
[4782:4782:1219/141706:INFO:gles2_implementation.cc(2480)] [.WebGLRenderingContext] glEnableVertexAttribArray(0)
[4782:4782:1219/141706:INFO:gles2_implementation.cc(1140)] [.WebGLRenderingContext] glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0)
[4782:4782:1219/141706:INFO:gles2_implementation_impl_autogen.h(135)] [.WebGLRenderingContext] glClear(16640)
[4782:4782:1219/141706:INFO:gles2_implementation.cc(2490)] [.WebGLRenderingContext] glDrawArrays(GL_TRIANGLES, 0, 3)
```
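For reference, one way to launch with this flag and capture the client-side log
is sketched below; the build directory, binary name, and page URL are examples,
not requirements.
```sh
# Debug build assumed. --enable-logging=stderr keeps the log on the terminal so
# it can be captured together with the GL client calls.
out/Debug/chromium --no-sandbox --enable-gpu-client-logging \
    --enable-logging=stderr http://localhost:8000/page-to-repro.html \
    2>&1 | tee gpu_client_log.txt
```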
### Checking about:gpu
The GPU process logs many errors and warnings. You can see these by navigating
to `about:gpu`. Logs appear at the bottom of the page. You can also see them
on standard output if Chromium is run from the command line on Linux/Mac.
On Windows, you need debugging tools (like VS, WinDbg, etc.) to connect to the
debug output stream.
**Note:** If `about:gpu` is telling you that your GPU is disabled and
hardware acceleration is unavailable, it might be a problem with your GPU being
unsupported. To override this and turn on hardware acceleration anyway, you can
use the `--ignore-gpu-blacklist` command line option when starting Chromium.
### Breaking on GL Error
In <code>[gles2_implementation.h]</code>, there is some code like this:
```cpp
// Set to 1 to have the client fail when a GL error is generated.
// This helps find bugs in the renderer since the debugger stops on the error.
#if DCHECK_IS_ON()
#if 0
#define GL_CLIENT_FAIL_GL_ERRORS
#endif
#endif
```
Change that `#if 0` to `#if 1`, build a debug build, then run in a debugger.
The debugger will break when any renderer code sees a GL error, and you should
be able to examine the call stack to find the issue.
[gles2_implementation.h]: https://chromium.googlesource.com/chromium/src/+/master/gpu/command_buffer/client/gles2_implementation.h
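After flipping the `#if`, one possible way to get the renderer under a debugger
is sketched below (paths and the repro URL are placeholders; the
renderer-debugging section later on this page explains the
`--renderer-cmd-prefix` technique in more detail).
```sh
# Rebuild with GL_CLIENT_FAIL_GL_ERRORS enabled, then run the renderer under
# gdb so it stops where the GL error is generated.
ninja -C out/Debug chrome
out/Debug/chromium --no-sandbox \
    --renderer-cmd-prefix="xterm -e gdb --args" \
    http://localhost:8000/page-to-repro.html
```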
### Labeling your calls
The output of all of the errors, warnings and debug logs are prefixed. You can
set this prefix by calling `glPushGroupMarkerEXT`, `glPopGroupMarkerEXT` and
`glInsertEventMarkerEXT`. `glPushGroupMarkerEXT` appends a string to the end of
the current log prefix (think namespace in C++). `glPopGroupMarkerEXT` pops off
the last string appended. `glInsertEventMarkerEXT` sets a suffix for the
current string. Example:
```cpp
glPushGroupMarkerEXT(0, "Foo"); // -> log prefix = "Foo"
glInsertEventMarkerEXT(0, "This"); // -> log prefix = "Foo.This"
glInsertEventMarkerEXT(0, "That"); // -> log prefix = "Foo.That"
glPushGroupMarkerEXT(0, "Bar"); // -> log prefix = "Foo.Bar"
glInsertEventMarkerEXT(0, "Orange"); // -> log prefix = "Foo.Bar.Orange"
glInsertEventMarkerEXT(0, "Banana"); // -> log prefix = "Foo.Bar.Banana"
glPopGroupMarkerEXT(); // -> log prefix = "Foo.That"
```
### Making a reduced test case
You can often make a simple OpenGL-ES-2.0-only C++ reduced test case that is
relatively quick to compile and test, by adding tests to the `gl_tests` target.
Those tests exist in `src/gpu/command_buffer/tests` and are made part of the
build in `src/gpu/BUILD.gn`. Build with `ninja -C out/Debug gl_tests`. All the
same command line options listed on this page will work with the `gl_tests`,
plus `--gtest_filter=NameOfTest` to run a specific test. Note the `gl_tests`
are not multi-process, so they probably won't help with race conditions, but
they do go through most of the same code and are much easier to debug.
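For example, after adding a new test to that directory, a typical local
iteration loop looks like the following (the test name `MyReducedTest` is a
placeholder for whatever you add):
```sh
# Build just the gl_tests target, then run a single test. The GPU-related
# flags described on this page also apply to gl_tests.
ninja -C out/Debug gl_tests
out/Debug/gl_tests --gtest_filter=*MyReducedTest* --enable-gpu-service-logging
```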
### Debugging the renderer process
Given that Chrome starts many renderer processes, I find it easier to either
have a remote web page I can access, or to make one locally and serve it with a
local server like `python -m SimpleHTTPServer`. Then:
On Linux this works for me:
* `out/Debug/chromium --no-sandbox --renderer-cmd-prefix="xterm -e gdb
--args" http://localhost:8000/page-to-repro.html`
On OSX this works for me:
* `out/Debug/Chromium.app/Contents/MacOSX/Chromium --no-sandbox
--renderer-cmd-prefix="xterm -e gdb --args"
http://localhost:8000/page-to-repro.html`
On Windows I use `--renderer-startup-dialog` and then connect to the listed process.
Note 1: On Linux and OSX I use `cgdb` instead of `gdb`.
Note 2: GDB can take minutes to index symbols. To save time, you can precache
that computation by running `build/gdb-add-index out/Debug/chrome`.
## GPU Process Code
### `--enable-gpu-service-logging`
In a debug build, this will print all actual calls into the GL driver.
```
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kEnableVertexAttribArray
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(905)] glEnableVertexAttribArray(0)
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kVertexAttribPointer
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(1573)] glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0)
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kClear
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(746)] glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(840)] glDepthMask(GL_TRUE)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(900)] glEnable(GL_DEPTH_TEST)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(1371)] glStencilMaskSeparate(GL_FRONT, 4294967295)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(1371)] glStencilMaskSeparate(GL_BACK, 4294967295)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(860)] glDisable(GL_STENCIL_TEST)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(860)] glDisable(GL_CULL_FACE)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(860)] glDisable(GL_SCISSOR_TEST)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(900)] glEnable(GL_BLEND)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(721)] glClear(16640)
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kDrawArrays
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(870)] glDrawArrays(GL_TRIANGLES, 0, 3)
```
Note that GL calls into the driver are not currently prefixed (TODO?). However,
from the commands logged you can tell which command, from which context, caused
the GL calls that follow.
Also note that client resource IDs are virtual IDs, so calls into the real GL
driver will not match (though some commands print the mapping). Examples:
```
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kBindTexture
[5497:5497:1219/142413:INFO:gles2_cmd_decoder.cc(837)] [.WebGLRenderingContext] glBindTexture: client_id = 2, service_id = 10
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(662)] glBindTexture(GL_TEXTURE_2D, 10)
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [0052064A367F0000]cmd: kBindBuffer
[5497:5497:1219/142413:INFO:gles2_cmd_decoder.cc(837)] [0052064A367F0000] glBindBuffer: client_id = 2, service_id = 6
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(637)] glBindBuffer(GL_ARRAY_BUFFER, 6)
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kBindFramebuffer
[5497:5497:1219/142413:INFO:gles2_cmd_decoder.cc(837)] [.WebGLRenderingContext] glBindFramebuffer: client_id = 1, service_id = 3
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(652)] glBindFramebufferEXT(GL_FRAMEBUFFER, 3)
```
and so on, so you can see that renderer process code uses the client IDs,
whereas the GPU process uses the service IDs. This is useful for matching up
calls if you're dumping both client and service GL logs.
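To capture both logs in one place for this kind of ID matching, something like
the following works (a sketch; the build path and URL are placeholders):
```sh
# Log both the GL calls issued by the client and the calls the GPU process
# makes into the driver, and keep a copy on disk for cross-referencing IDs.
out/Debug/chromium --no-sandbox \
    --enable-gpu-client-logging --enable-gpu-service-logging \
    --enable-logging=stderr \
    http://localhost:8000/page-to-repro.html 2>&1 | tee gpu_log.txt
```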
### `--enable-gpu-debugging`
In any build, this will call `glGetError` after each command.
### `--enable-gpu-command-logging`
This will print the name of each GPU command before it is executed.
```
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kBindBuffer
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kBufferData
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: SetToken
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kEnableVertexAttribArray
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kVertexAttribPointer
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kClear
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kDrawArrays
```
### Debugging in the GPU Process
Given the multi-processness of chromium it can be hard to debug both sides.
Turing on all the logging and having a small test case is useful. One minor
suggestion, if you have some idea where the bug is happening a call to some
obscure gl function like `glHint()` can give you a place to catch a command
being processed in the GPU process (put a break point on
`gpu::gles2::GLES2DecoderImpl::HandleHint`. Once in you can follow the commands
after that. All of them go through `gpu::gles2::GLES2DecoderImpl::DoCommand`.
To actually debug the GPU process:
On Linux this works for me:
* `out/Debug/chromium --no-sandbox --gpu-launcher="xterm -e gdb --args"
http://localhost:8000/page-to-repro.html`
On OSX this works for me:
* `out/Debug/Chromium.app/Contents/MacOSX/Chromium --no-sandbox
--gpu-launcher="xterm -e gdb --args"
http://localhost:8000/page-to-repro.html`
On Windows I use `--gpu-startup-dialog` and then connect to the listed process.
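To combine the `glHint()` trick described above with `--gpu-launcher`, you can
pre-set the breakpoint via a gdb script so the launcher prefix stays simple (a
sketch; quoting inside the launcher prefix can be fragile, so this uses
`gdb -x`):
```sh
# Put the breakpoint commands in a gdb script, then launch the GPU process
# under gdb with that script. Execution stops when the injected glHint()
# command reaches the decoder.
cat > /tmp/gpu.gdb <<'EOF'
break gpu::gles2::GLES2DecoderImpl::HandleHint
run
EOF
out/Debug/chromium --no-sandbox \
    --gpu-launcher="xterm -e gdb -x /tmp/gpu.gdb --args" \
    http://localhost:8000/page-to-repro.html
```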
### `GPU PARSE ERROR`
If you see this message in `about:gpu` or your console, you didn't cause it
directly (by calling `glLoseContextCHROMIUM`), and it's something other than 5,
then there's likely a bug. Please file an issue at <http://crbug.com/new>.
## Debugging Performance
If you have something to add here please add it. Most perf debugging is done
using `about:tracing` (see [Trace Event Profiling] for details). Otherwise,
be aware that, since the system is multi-process, calling:
```
start = GetTime()
DoSomething()
glFinish()
end = GetTime()
printf("elapsedTime = %f\n", end - start);
```
**will not** give you meaningful results.
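If you want a quick trace without driving `about:tracing` interactively, Chrome
can also record one at startup from the command line; a sketch (these flags
exist as of this writing but may change):
```sh
# Record roughly 5 seconds of trace data starting at browser launch and write
# it to a file that can then be loaded into about:tracing.
out/Release/chrome --trace-startup --trace-startup-duration=5 \
    --trace-startup-file=/tmp/gpu-trace.json \
    http://localhost:8000/page-to-repro.html
```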
[Trace Event Profiling]: https://sites.google.com/a/chromium.org/dev/developers/how-tos/trace-event-profiling-tool

docs/gpu/gpu_testing.md Normal file

@@ -0,0 +1,571 @@
# GPU Testing
This set of pages documents the setup and operation of the GPU bots and try
servers, which verify the correctness of Chrome's graphically accelerated
rendering pipeline.
[TOC]
## Overview
The GPU bots run a different set of tests than the majority of the Chromium
test machines. The GPU bots specifically focus on tests which exercise the
graphics processor, and whose results are likely to vary between graphics card
vendors.
Most of the tests on the GPU bots are run via the [Telemetry framework].
Telemetry was originally conceived as a performance testing framework, but has
proven valuable for correctness testing as well. Telemetry directs the browser
to perform various operations, like page navigation and test execution, from
external scripts written in Python. The GPU bots launch the full Chromium
browser via Telemetry for the majority of the tests. Using the full browser to
execute tests, rather than smaller test harnesses, has yielded several
advantages: testing what is shipped, improved reliability, and improved
performance.
[Telemetry framework]: https://github.com/catapult-project/catapult/tree/master/telemetry
A subset of the tests, called "pixel tests", grab screen snapshots of the web
page in order to validate Chromium's rendering architecture end-to-end. Where
necessary, GPU-specific results are maintained for these tests. Some of these
tests verify just a few pixels, using handwritten code, in order to use the
same validation for all brands of GPUs.
The GPU bots use the Chrome infrastructure team's [recipe framework], and
specifically the [`chromium`][recipes/chromium] and
[`chromium_trybot`][recipes/chromium_trybot] recipes, to describe what tests to
execute. Compared to the legacy master-side buildbot scripts, recipes make it
easy to add new steps to the bots, change the bots' configuration, and run the
tests locally in the same way that they are run on the bots. Additionally, the
`chromium` and `chromium_trybot` recipes make it possible to send try jobs which
add new steps to the bots. This single capability is a huge step forward from
the previous configuration where new steps were added blindly, and could cause
failures on the tryservers. For more details about the configuration of the
bots, see the [GPU bot details].
[recipe framework]: https://chromium.googlesource.com/external/github.com/luci/recipes-py/+/master/doc/user_guide.md
[recipes/chromium]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium.py
[recipes/chromium_trybot]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium_trybot.py
[GPU bot details]: gpu_testing_bot_details.md
The physical hardware for the GPU bots lives in the Swarming pool\*. The
Swarming infrastructure ([new docs][new-testing-infra], [older but currently
more complete docs][isolated-testing-infra]) provides many benefits:
* Increased parallelism for the tests; all steps for a given tryjob or
waterfall build run in parallel.
* Simpler scaling: just add more hardware in order to get more capacity. No
manual configuration or distribution of hardware needed.
* Easier to run certain tests only on certain operating systems or types of
GPUs.
* Easier to add new operating systems or types of GPUs.
* Clearer description of the binary and data dependencies of the tests. If
they run successfully locally, they'll run successfully on the bots.
(\* All but a few one-off GPU bots are in the swarming pool. The exceptions to
the rule are described in the [GPU bot details].)
The bots on the [chromium.gpu.fyi] waterfall are configured to always test
top-of-tree ANGLE. This setup is done with a few lines of code in the
[tools/build workspace]; search the code for "angle".
These aspects of the bots are described in more detail below, and in linked
pages. There is a [presentation][bots-presentation] which gives a brief
overview of this documentation and links back to various portions.
<!-- XXX: broken link -->
[new-testing-infra]: https://github.com/luci/luci-py/wiki
[isolated-testing-infra]: https://www.chromium.org/developers/testing/isolated-testing/infrastructure
[chromium.gpu]: https://build.chromium.org/p/chromium.gpu/console
[chromium.gpu.fyi]: https://build.chromium.org/p/chromium.gpu.fyi/console
[tools/build workspace]: https://code.google.com/p/chromium/codesearch#chromium/build/scripts/slave/recipe_modules/chromium_tests/chromium_gpu_fyi.py
[bots-presentation]: https://docs.google.com/presentation/d/1BC6T7pndSqPFnituR7ceG7fMY7WaGqYHhx5i9ECa8EI/edit?usp=sharing
## Fleet Status
Please see the [GPU Pixel Wrangling instructions] for links to dashboards
showing the status of various bots in the GPU fleet.
[GPU Pixel Wrangling instructions]: pixel_wrangling.md#Fleet-Status
## Using the GPU Bots
Most Chromium developers interact with the GPU bots in two ways:
1. Observing the bots on the waterfalls.
2. Sending try jobs to them.
The GPU bots are grouped on the [chromium.gpu] and [chromium.gpu.fyi]
waterfalls. Their current status can be easily observed there.
To send try jobs, you must first upload your CL to the codereview server. Then,
either clicking the "CQ dry run" link or running from the command line:
```sh
git cl try
```
Sends your job to the default set of try servers.
The GPU tests are part of the default set for Chromium CLs, and are run as part
of the following tryservers' jobs:
* [linux_chromium_rel_ng] on the [tryserver.chromium.linux] waterfall
* [mac_chromium_rel_ng] on the [tryserver.chromium.mac] waterfall
* [win_chromium_rel_ng] on the [tryserver.chromium.win] waterfall
[linux_chromium_rel_ng]: http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_rel_ng?numbuilds=100
[mac_chromium_rel_ng]: http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng?numbuilds=100
[win_chromium_rel_ng]: http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng?numbuilds=100
[tryserver.chromium.linux]: http://build.chromium.org/p/tryserver.chromium.linux/waterfall?numbuilds=100
[tryserver.chromium.mac]: http://build.chromium.org/p/tryserver.chromium.mac/waterfall?numbuilds=100
[tryserver.chromium.win]: http://build.chromium.org/p/tryserver.chromium.win/waterfall?numbuilds=100
Scan down through the steps looking for the text "GPU"; that identifies those
tests run on the GPU bots. For each test the "trigger" step can be ignored; the
step further down for the test of the same name contains the results.
It's usually not necessary to explicitly send try jobs just for verifying GPU
tests. If you want to, you must invoke "git cl try" separately for each
tryserver master you want to reference, for example:
```sh
git cl try -b linux_chromium_rel_ng
git cl try -b mac_chromium_rel_ng
git cl try -b win_chromium_rel_ng
```
Alternatively, the Gerrit UI can be used to send a patch set to these try
servers.
Three optional tryservers are also available which run additional tests. As of
this writing, they run longer tests that can't be run against all Chromium CLs
due to lack of hardware capacity. They are automatically included as tryservers
for code changes to certain sub-directories. An example of triggering them
manually is shown after the list below.
* [linux_optional_gpu_tests_rel] on the [tryserver.chromium.linux] waterfall
* [mac_optional_gpu_tests_rel] on the [tryserver.chromium.mac] waterfall
* [win_optional_gpu_tests_rel] on the [tryserver.chromium.win] waterfall
[linux_optional_gpu_tests_rel]: https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_optional_gpu_tests_rel?numbuilds=200
[mac_optional_gpu_tests_rel]: https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel?numbuilds=200
[win_optional_gpu_tests_rel]: https://build.chromium.org/p/tryserver.chromium.win/builders/win_optional_gpu_tests_rel?numbuilds=200
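To trigger one of these optional tryservers explicitly, the same `git cl try -b`
mechanism shown above applies; for example:
```sh
git cl try -b linux_optional_gpu_tests_rel
git cl try -b mac_optional_gpu_tests_rel
git cl try -b win_optional_gpu_tests_rel
```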
Tryservers for the [ANGLE project] are also present on the
[tryserver.chromium.angle] waterfall. These are invoked from the Gerrit user
interface. They are configured similarly to the tryservers for regular Chromium
patches, and run the same tests that are run on the [chromium.gpu.fyi]
waterfall, in the same way (e.g., against ToT ANGLE).
If you find it necessary to try patches against other sub-repositories than
Chromium (`src/`) and ANGLE (`src/third_party/angle/`), please
[file a bug](http://crbug.com/new) with component Internals\>GPU\>Testing.
[ANGLE project]: https://chromium.googlesource.com/angle/angle/+/master/README.md
[tryserver.chromium.angle]: https://build.chromium.org/p/tryserver.chromium.angle/waterfall
[file a bug]: http://crbug.com/new
## Running the GPU Tests Locally
All of the GPU tests running on the bots can be run locally from a Chromium
build. Many of the tests are simple executables:
* `angle_unittests`
* `content_gl_tests`
* `gl_tests`
* `gl_unittests`
* `tab_capture_end2end_tests`
Some run only on the chromium.gpu.fyi waterfall, either because there isn't
enough machine capacity at the moment, or because they're closed-source tests
which aren't allowed to run on the regular Chromium waterfalls:
* `angle_deqp_gles2_tests`
* `angle_deqp_gles3_tests`
* `angle_end2end_tests`
* `audio_unittests`
The remaining GPU tests are run via Telemetry. In order to run them, just
build the `chrome` target and then
invoke `src/content/test/gpu/run_gpu_integration_test.py` with the appropriate
argument. The tests this script can invoke are
in `src/content/test/gpu/gpu_tests/`. For example:
* `run_gpu_integration_test.py context_lost --browser=release`
* `run_gpu_integration_test.py pixel --browser=release`
* `run_gpu_integration_test.py webgl_conformance --browser=release --webgl-conformance-version=1.0.2`
* `run_gpu_integration_test.py maps --browser=release`
* `run_gpu_integration_test.py screenshot_sync --browser=release`
* `run_gpu_integration_test.py trace_test --browser=release`
**Note:** If you are on Linux and see this test harness exit immediately with
`**Non zero exit code**`, it's probably because of some incompatible Python
packages being installed. Please uninstall the `python-egenix-mxdatetime` and
`python-logilab-common` packages in this case; see
[Issue 716241](http://crbug.com/716241).
You can also run a subset of tests with this harness:
* `run_gpu_integration_test.py webgl_conformance --browser=release
--test-filter=conformance_attribs`
Figuring out the exact command line that was used to invoke the test on the
bots can be a little tricky. The bots all\* run their tests via Swarming and
isolates, meaning that the invocation of a step like `[trigger]
webgl_conformance_tests on NVIDIA GPU...` will look like:
* `python -u
'E:\b\build\slave\Win7_Release__NVIDIA_\build\src\tools\swarming_client\swarming.py'
trigger --swarming https://chromium-swarm.appspot.com
--isolate-server https://isolateserver.appspot.com
--priority 25 --shards 1 --task-name 'webgl_conformance_tests on NVIDIA GPU...'`
You can figure out the additional command line arguments that were passed to
each test on the bots by examining the trigger step and searching for the
argument separator (<code> -- </code>). For a recent invocation of
`webgl_conformance_tests`, this looked like:
* `webgl_conformance --show-stdout '--browser=release' -v
'--extra-browser-args=--enable-logging=stderr --js-flags=--expose-gc'
'--isolated-script-test-output=${ISOLATED_OUTDIR}/output.json'`
You can leave off the `--isolated-script-test-output` argument, so this would
leave a full command line of:
* `run_gpu_integration_test.py
webgl_conformance --show-stdout '--browser=release' -v
'--extra-browser-args=--enable-logging=stderr --js-flags=--expose-gc'`
The Maps test requires you to authenticate to cloud storage in order to access
the Web Page Replay archive containing the test. See [Cloud Storage Credentials]
for documentation on setting this up.
[Cloud Storage Credentials]: gpu_testing_bot_details.md#Cloud-storage-credentials
Pixel tests use reference images from cloud storage. The bots pass the
`--upload-refimg-to-cloud-storage` argument, but to run locally you need to
pass the `--download-refimg-from-cloud-storage` argument instead, as well as
the other arguments the bot uses, like `--refimg-cloud-storage-bucket` and
`--os-type`. A sample desktop command line is shown after the Android example
below.
Sample command line for Android:
* `run_gpu_integration_test.py pixel --show-stdout --browser=android-chromium
-v --passthrough --extra-browser-args='--enable-logging=stderr
--js-flags=--expose-gc' --refimg-cloud-storage-bucket
chromium-gpu-archive/reference-images --os-type android
--download-refimg-from-cloud-storage`
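A corresponding sample command line for a desktop release build might look like
the following (a sketch; adjust `--os-type` to `win`, `mac`, or `linux` to match
your platform):
```sh
run_gpu_integration_test.py pixel --show-stdout --browser=release -v \
    --extra-browser-args='--enable-logging=stderr --js-flags=--expose-gc' \
    --refimg-cloud-storage-bucket chromium-gpu-archive/reference-images \
    --os-type linux --download-refimg-from-cloud-storage
```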
<!-- XXX: update this section; these isolates don't exist anymore -->
You can find the isolates for the various tests in
[src/chrome/](http://src.chromium.org/viewvc/chrome/trunk/src/chrome/):
* [angle_unittests.isolate](https://chromium.googlesource.com/chromium/src/+/master/chrome/angle_unittests.isolate)
* [content_gl_tests.isolate](https://chromium.googlesource.com/chromium/src/+/master/content/content_gl_tests.isolate)
* [gl_tests.isolate](https://chromium.googlesource.com/chromium/src/+/master/chrome/gl_tests.isolate)
* [gles2_conform_test.isolate](https://chromium.googlesource.com/chromium/src/+/master/chrome/gles2_conform_test.isolate)
* [tab_capture_end2end_tests.isolate](https://chromium.googlesource.com/chromium/src/+/master/chrome/tab_capture_end2end_tests.isolate)
* [telemetry_gpu_test.isolate](https://chromium.googlesource.com/chromium/src/+/master/chrome/telemetry_gpu_test.isolate)
The isolates contain the full or partial command line for invoking the target.
The complete command line for any test can be deduced from the contents of the
isolate plus the stdio output from the test's run on the bot.
Note that for the GN build, the isolates are simply described by build targets,
and [gn_isolate_map.pyl] describes the mapping between isolate name and build
target, as well as the command line used to invoke the isolate. Once all
platforms have switched to GN, the .isolate files will be obsolete and be
removed.
(\* A few of the one-off GPU configurations on the chromium.gpu.fyi waterfall
run their tests locally rather than via swarming, in order to decrease the
number of physical machines needed.)
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
## Running Binaries from the Bots Locally
Any binary run remotely on a bot can also be run locally, assuming the local
machine loosely matches the architecture and OS of the bot.
The easiest way to do this is to find the ID of the swarming task and use
"swarming.py reproduce" to re-run it:
* `./src/tools/swarming_client/swarming.py reproduce -S https://chromium-swarm.appspot.com [task ID]`
The task ID can be found in the stdio for the "trigger" step for the test. For
example, look at a recent build from the [Mac Release (Intel)] bot, and
look at the `gl_unittests` step. You will see something like:
[Mac Release (Intel)]: https://ci.chromium.org/buildbot/chromium.gpu/Mac%20Release%20%28Intel%29/
```
Triggered task: gl_unittests on Intel GPU on Mac/Mac-10.12.6/[TRUNCATED_ISOLATE_HASH]/Mac Release (Intel)/83664
To collect results, use:
swarming.py collect -S https://chromium-swarm.appspot.com --json /var/folders/[PATH_TO_TEMP_FILE].json
Or visit:
https://chromium-swarm.appspot.com/user/task/[TASK_ID]
```
There is a difference between the isolate's hash and Swarming's task ID. Make
sure you use the task ID and not the isolate's hash.
As of this writing, there seems to be a
[bug](https://github.com/luci/luci-py/issues/250)
when attempting to re-run the Telemetry based GPU tests in this way. For the
time being, this can be worked around by instead downloading the contents of
the isolate. To do so, look more deeply into the trigger step's log:
* <code>python -u
/b/build/slave/Mac_10_10_Release__Intel_/build/src/tools/swarming_client/swarming.py
trigger [...more args...] --tag data:[ISOLATE_HASH] [...more args...]
[ISOLATE_HASH] -- **[...TEST_ARGS...]**</code>
As of this writing, the isolate hash appears twice in the command line. To
download the isolate's contents into directory `foo` (note, this is in the
"Help" section associated with the page for the isolate's task, but I'm not
sure whether that's accessible only to Google employees or all members of the
chromium.org organization):
* `python isolateserver.py download -I https://isolateserver.appspot.com
--namespace default-gzip -s [ISOLATE_HASH] --target foo`
`isolateserver.py` will tell you the approximate command line to use. You
should concatenate the `TEST_ARGS` highlighted above with
`isolateserver.py`'s recommendation. The `ISOLATED_OUTDIR` variable can be
safely replaced with `/tmp`.
Note that `isolateserver.py` downloads a large number of files (everything
needed to run the test) and may take a while. There is a way to use
`run_isolated.py` to achieve the same result, but as of this writing, there
were problems doing so, so this procedure is not documented at this time.
Before attempting to download an isolate, you must ensure you have permission
to access the isolate server. Full instructions can be [found
here][isolate-server-credentials]. For most cases, you can simply run:
* `./src/tools/swarming_client/auth.py login
--service=https://isolateserver.appspot.com`
The above link requires that you log in with your @google.com credentials. It's
not known at the present time whether this works with @chromium.org accounts.
Email kbr@ if you try this and find it doesn't work.
[isolate-server-credentials]: gpu_testing_bot_details.md#Isolate-server-credentials
## Running Locally Built Binaries on the GPU Bots
See the [Swarming documentation] for instructions on how to upload your binaries to the isolate server and trigger execution on Swarming.
[Swarming documentation]: https://www.chromium.org/developers/testing/isolated-testing/for-swes#TOC-Run-a-test-built-locally-on-Swarming
## Adding New Tests to the GPU Bots
The goal of the GPU bots is to avoid regressions in Chrome's rendering stack.
To that end, let's add as many tests as possible that will help catch
regressions in the product. If you see a crazy bug in Chrome's rendering which
would be easy to catch with a pixel test running in Chrome and hard to catch in
any of the other test harnesses, please, invest the time to add a test!
There are a couple of different ways to add new tests to the bots:
1. Adding a new test to one of the existing harnesses.
2. Adding an entire new test step to the bots.
### Adding a new test to one of the existing test harnesses
Adding new tests to the GTest-based harnesses is straightforward and
essentially requires no explanation.
As of this writing it isn't as easy as desired to add a new test to one of the
Telemetry based harnesses. See [Issue 352807](http://crbug.com/352807). Let's
collectively work to address that issue. It would be great to reduce the number
of steps on the GPU bots, or at least to avoid significantly increasing the
number of steps on the bots. The WebGL conformance tests should probably remain
a separate step, but some of the smaller Telemetry based tests
(`context_lost_tests`, `memory_test`, etc.) should probably be combined into a
single step.
If you are adding a new test to one of the existing tests (e.g., `pixel_test`),
all you need to do is make sure that your new test runs correctly via isolates.
See the documentation from the GPU bot details on [adding new isolated
tests][new-isolates] for the `GYP_DEFINES` and authentication needed to upload
isolates to the isolate server. Most likely the new test will be Telemetry
based, and included in the `telemetry_gpu_test_run` isolate. You can then
invoke it via:
* `./src/tools/swarming_client/run_isolated.py -s [HASH]
-I https://isolateserver.appspot.com -- [TEST_NAME] [TEST_ARGUMENTS]`
[new-isolates]: gpu_testing_bot_details.md#Adding-a-new-isolated-test-to-the-bots
## Adding new steps to the GPU Bots
The tests that are run by the GPU bots are described by a couple of JSON files
in the Chromium workspace:
* [`chromium.gpu.json`](https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.json)
* [`chromium.gpu.fyi.json`](https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.fyi.json)
These files are autogenerated by the following script:
* [`generate_buildbot_json.py`](https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/generate_buildbot_json.py)
This script is completely self-contained and should hopefully be
self-explanatory. The JSON files are parsed by the chromium and chromium_trybot
recipes, and describe two types of tests:
* GTests: those which use the Googletest and Chromium's `base/test/launcher/`
frameworks.
* Telemetry based tests: those which are built on the Telemetry framework and
launch the entire browser.
A prerequisite of adding a new test to the bots is that the test [run via
isolates][new-isolates]. Once that is done, modify `generate_buildbot_json.py` to add the
test to the appropriate set of bots. Be careful when adding large new test
steps to all of the bots, because the GPU bots are a limited resource and do
not currently have the capacity to absorb large new test suites. It is safer to
get new tests running on the chromium.gpu.fyi waterfall first, and expand from
there to the chromium.gpu waterfall (which will also make them run against
every Chromium CL by virtue of the `linux_chromium_rel_ng`,
`mac_chromium_rel_ng` and `win_chromium_rel_ng` tryservers' mirroring of the
bots on this waterfall, so be careful!).
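Remember that `chromium.gpu.json` and `chromium.gpu.fyi.json` are generated
files; after editing `generate_buildbot_json.py`, regenerate them and commit
both together. A minimal sketch, assuming the script can be run from its own
directory with no arguments:
```sh
cd src/content/test/gpu
python generate_buildbot_json.py
# The regenerated files land in src/testing/buildbot/.
git add generate_buildbot_json.py ../../../testing/buildbot/chromium.gpu*.json
```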
Tryjobs which add new test steps to the chromium.gpu.json file will run those
new steps during the tryjob, which helps ensure that the new test won't break
once it starts running on the waterfall.
Tryjobs which modify chromium.gpu.fyi.json can be sent to the
`win_optional_gpu_tests_rel`, `mac_optional_gpu_tests_rel` and
`linux_optional_gpu_tests_rel` tryservers to help ensure that they won't
break the FYI bots.
## Updating and Adding New Pixel Tests to the GPU Bots
Adding new pixel tests which require reference images is a slightly more
complex process than adding other kinds of tests which can validate their own
correctness. There are a few reasons for this.
* Reference image based pixel tests require different golden images for
different combinations of operating system, GPU, driver version, OS
version, and occasionally other variables.
* The reference images must be generated by the main waterfall. The try
servers are not allowed to produce new reference images, only consume them.
The reason for this is that a patch sent to the try servers might cause an
incorrect reference image to be generated. For this reason, the main
waterfall bots upload reference images to cloud storage, and the try
servers download them and verify their results against them.
* The try servers will fail if they run a pixel test requiring a reference
image that doesn't exist in cloud storage. This is deliberate, but needs
more thought; see [Issue 349262](http://crbug.com/349262).
If a reference image based pixel test's result is going to change because of a
change in ANGLE or Blink (for example), updating the reference images is a
slightly tricky process. Here's how to do it:
* Mark the pixel test as failing in the [pixel tests]' [test expectations]
* Commit the change to ANGLE, Blink, etc. which will change the test's
results
* Note that without the failure expectation, this commit would turn some bots
red; a Blink change will turn the GPU bots on the chromium.webkit waterfall
red, and an ANGLE change will turn the chromium.gpu.fyi bots red
* Wait for Blink/ANGLE/etc. to roll
* Commit a change incrementing the revision number associated with the test
in the [test pages]
* Commit a second change removing the failure expectation, once all of the
bots on the main waterfall have generated new reference images. This change
should go through the commit queue cleanly.
[pixel tests]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/pixel_test_pages.py
[test expectations]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/pixel_expectations.py
[test pages]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/pixel_test_pages.py
When adding a brand new pixel test that uses a reference image, the steps are
similar, but simpler:
* Mark the test as failing in the same commit which introduces the new test
* Wait for the reference images to be produced by all of the GPU bots on the
waterfalls (see [chromium-gpu-archive/reference-images])
* Commit a change un-marking the test as failing
When making a Chromium-side change which changes the pixel tests' results:
* In your CL, both mark the pixel test as failing in the pixel test's test
expectations and increment the test's version number in the page set (see
above)
* After your CL lands, land another CL removing the failure expectations. If
this second CL goes through the commit queue cleanly, you know reference
images were generated properly.
In general, when adding a new pixel test, it's better to spot check a few
pixels in the rendered image rather than using a reference image per platform.
The [GPU rasterization test] is a good example of a recently added test which
performs such spot checks.
[chromium-gpu-archive/reference-images]: https://console.developers.google.com/storage/chromium-gpu-archive/reference-images
<!-- XXX: old link -->
[GPU rasterization test]: http://src.chromium.org/viewvc/chrome/trunk/src/content/test/gpu/gpu_tests/gpu_rasterization.py
## Stamping out Flakiness
It's critically important to aggressively investigate and eliminate the root
cause of any flakiness seen on the GPU bots. The bots have been known to run
reliably for days at a time, and any flaky failures that are tolerated on the
bots translate directly into instability of the browser experienced by
customers. Critical bugs in subsystems like WebGL, affecting high-profile
products like Google Maps, have escaped notice in the past because the bots
were unreliable. After much re-work, the GPU bots are now among the most
reliable automated test machines in the Chromium project. Let's keep them that
way.
Flakiness affecting the GPU tests can come in from highly unexpected sources.
Here are some examples:
* Intermittent pixel_test failures on Linux where the captured pixels were
black, caused by the Display Power Management System (DPMS) kicking in.
Disabled the X server's built-in screen saver on the GPU bots in response.
* GNOME dbus-related deadlocks causing intermittent timeouts ([Issue
309093](http://crbug.com/309093) and related bugs).
* Windows Audio system changes causing intermittent assertion failures in the
browser ([Issue 310838](http://crbug.com/310838)).
* Enabling assertion failures in the C++ standard library on Linux causing
random assertion failures ([Issue 328249](http://crbug.com/328249)).
* V8 bugs causing random crashes of the Maps pixel test (V8 issues
[3022](https://code.google.com/p/v8/issues/detail?id=3022),
[3174](https://code.google.com/p/v8/issues/detail?id=3174)).
* TLS changes causing random browser process crashes ([Issue
264406](http://crbug.com/264406)).
* Isolated test execution flakiness caused by failures to reliably clean up
temporary directories ([Issue 340415](http://crbug.com/340415)).
* The Telemetry-based WebGL conformance suite caught a bug in the memory
allocator on Android not caught by any other bot ([Issue
347919](http://crbug.com/347919)).
* context_lost test failures caused by the compositor's retry logic ([Issue
356453](http://crbug.com/356453)).
* Multiple bugs in Chromium's support for lost contexts causing flakiness of
the context_lost tests ([Issue 365904](http://crbug.com/365904)).
* Maps test timeouts caused by Content Security Policy changes in Blink
([Issue 395914](http://crbug.com/395914)).
* Weak pointer assertion failures in various webgl\_conformance\_tests caused
by changes to the media pipeline ([Issue 399417](http://crbug.com/399417)).
* A change to a default WebSocket timeout in Telemetry causing intermittent
failures to run all WebGL conformance tests on the Mac bots ([Issue
403981](http://crbug.com/403981)).
* Chrome leaking suspended sub-processes on Windows, apparently a preexisting
race condition that suddenly showed up ([Issue
424024](http://crbug.com/424024)).
* Changes to Chrome's cross-context synchronization primitives causing the
wrong tiles to be rendered ([Issue 584381](http://crbug.com/584381)).
* A bug in V8's handling of array literals causing flaky failures of
texture-related WebGL 2.0 tests ([Issue 606021](http://crbug.com/606021)).
* Assertion failures in sync point management related to lost contexts that
exposed a real correctness bug ([Issue 606112](http://crbug.com/606112)).
* A bug in glibc's `sem_post`/`sem_wait` primitives breaking V8's parallel
garbage collection ([Issue 609249](http://crbug.com/609249)).
If you notice flaky test failures either on the GPU waterfalls or try servers,
please file bugs right away with the component Internals>GPU>Testing and
include links to the failing builds and copies of the logs, since the logs
expire after a few days. [GPU pixel wranglers] should give the highest priority
to eliminating flakiness on the tree.
[GPU pixel wranglers]: pixel_wrangling.md

@@ -0,0 +1,539 @@
# GPU Bot Details
This page describes in detail how the GPU bots are set up, which files affect
their configuration, and how to both modify their behavior and add new bots.
[TOC]
## Overview of the GPU bots' setup
Chromium's GPU bots, compared to the majority of the project's test machines,
are physical pieces of hardware. When end users run the Chrome browser, they
are almost surely running it on a physical piece of hardware with a real
graphics processor. There are some portions of the code base which simply
cannot be exercised by running the browser in a virtual machine, or on a software
implementation of the underlying graphics libraries. The GPU bots were
developed and deployed in order to cover these code paths, and avoid
regressions that are otherwise inevitable in a project the size of the Chromium
browser.
The GPU bots are utilized on the [chromium.gpu] and [chromium.gpu.fyi]
waterfalls, and various tryservers, as described in [Using the GPU Bots].
[chromium.gpu]: https://build.chromium.org/p/chromium.gpu/console
[chromium.gpu.fyi]: https://build.chromium.org/p/chromium.gpu.fyi/console
[Using the GPU Bots]: gpu_testing.md#Using-the-GPU-Bots
The vast majority of the hardware for the bots lives in the Chrome-GPU Swarming
pool. The waterfall bots are simply virtual machines which spawn Swarming tasks
with the appropriate tags to get them to run on the desired GPU and operating
system type. So, for example, the [Win10 Release (NVIDIA)] bot is actually a
virtual machine which spawns all of its jobs with the Swarming parameters:
[Win10 Release (NVIDIA)]: https://ci.chromium.org/buildbot/chromium.gpu/Win10%20Release%20%28NVIDIA%29/?limit=200
```json
{
"gpu": "10de:1cb3-23.21.13.8816",
"os": "Windows-10",
"pool": "Chrome-GPU"
}
```
Since the GPUs in the Swarming pool are mostly homogeneous, this is sufficient
to target the pool of Windows 10-like NVIDIA machines. (There are a few Windows
7-like NVIDIA bots in the pool, which necessitates the OS specifier.)
Details about the bots can be found on [chromium-swarm.appspot.com] and by
using `src/tools/swarming_client/swarming.py`, for example `swarming.py bots`.
If you are authenticated with @google.com credentials you will be able to make
queries of the bots and see, for example, which GPUs are available.
[chromium-swarm.appspot.com]: https://chromium-swarm.appspot.com/
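For example, to list the machines matching the dimensions above (a sketch; it
assumes `swarming.py bots` accepts `--dimension` filters and requires
appropriate credentials):
```sh
./src/tools/swarming_client/swarming.py bots \
    -S https://chromium-swarm.appspot.com \
    --dimension pool Chrome-GPU \
    --dimension os Windows-10 \
    --dimension gpu 10de:1cb3-23.21.13.8816
```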
The waterfall bots run tests on a single GPU type in order to make it easier to
see regressions or flakiness that affect only a certain type of GPU.
The tryservers like `win_chromium_rel_ng` which include GPU tests, on the other
hand, run tests on more than one GPU type. As of this writing, the Windows
tryservers ran tests on NVIDIA and AMD GPUs; the Mac tryservers ran tests on
Intel and NVIDIA GPUs. The way these tryservers' tests are specified is simply
by *mirroring* how one or more waterfall bots work. This is an inherent
property of the [`chromium_trybot` recipe][chromium_trybot.py], which was designed to eliminate
differences in behavior between the tryservers and waterfall bots. Since the
tryservers mirror waterfall bots, if the waterfall bot is working, the
tryserver must almost inherently be working as well.
[chromium_trybot.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium_trybot.py
There are a few one-off GPU configurations on the waterfall where the tests are
run locally on physical hardware, rather than via Swarming. A few examples are:
<!-- XXX: update this list -->
* [Mac Pro Release (AMD)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Mac%20Pro%20Release%20%28AMD%29/)
* [Mac Pro Debug (AMD)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Mac%20Pro%20Debug%20%28AMD%29/)
* [Linux Release (Intel HD 630)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Linux%20Release%20%28Intel%20HD%20630%29/)
* [Linux Release (AMD R7 240)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Linux%20Release%20%28AMD%20R7%20240%29/)
There are a couple of reasons to continue to support running tests on a
specific machine: it might be too expensive to deploy the required multiple
copies of said hardware, or the configuration might not be reliable enough to
begin scaling it up.
## Adding a new isolated test to the bots
Adding a new test step to the bots requires that the test run via an isolate.
Isolates describe both the binary and data dependencies of an executable, and
are the underpinning of how the Swarming system works. See the [LUCI wiki] for
background on Isolates and Swarming.
<!-- XXX: broken link -->
[LUCI wiki]: https://github.com/luci/luci-py/wiki
### Adding a new isolate
1. Define your target using the `template("test")` template in
[`src/testing/test.gni`][testing/test.gni]. See `test("gl_tests")` in
[`src/gpu/BUILD.gn`][gpu/BUILD.gn] for an example. For a more complex
example which invokes a series of scripts which finally launches the
browser, see [`src/chrome/telemetry_gpu_test.isolate`][telemetry_gpu_test.isolate].
2. Add an entry to [`src/testing/buildbot/gn_isolate_map.pyl`][gn_isolate_map.pyl] that refers to
your target. Find a similar target to yours in order to determine the
`type`. The type is referenced in [`src/tools/mb/mb_config.pyl`][mb_config.pyl].
[testing/test.gni]: https://chromium.googlesource.com/chromium/src/+/master/testing/test.gni
[gpu/BUILD.gn]: https://chromium.googlesource.com/chromium/src/+/master/gpu/BUILD.gn
<!-- XXX: broken link -->
[telemetry_gpu_test.isolate]: https://chromium.googlesource.com/chromium/src/+/master/chrome/telemetry_gpu_test.isolate
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
[mb_config.pyl]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb_config.pyl
At this point you can build and upload your isolate to the isolate server.
See [Isolated Testing for SWEs] for the most up-to-date instructions. These
instructions are a copy which show how to run an isolate that's been uploaded
to the isolate server on your local machine rather than on Swarming.
[Isolated Testing for SWEs]: https://www.chromium.org/developers/testing/isolated-testing/for-swes
If `cd`'d into `src/`:
1. `./tools/mb/mb.py isolate //out/Release [target name]`
* For example: `./tools/mb/mb.py isolate //out/Release angle_end2end_tests`
1. `python tools/swarming_client/isolate.py batcharchive -I https://isolateserver.appspot.com out/Release/[target name].isolated.gen.json`
* For example: `python tools/swarming_client/isolate.py batcharchive -I https://isolateserver.appspot.com out/Release/angle_end2end_tests.isolated.gen.json`
1. This will write a hash to stdout. You can run it via:
`python tools/swarming_client/run_isolated.py -I https://isolateserver.appspot.com -s [HASH] -- [any additional args for the isolate]`
See the section below on [isolate server credentials](#Isolate-server-credentials).
### Adding your new isolate to the tests that are run on the bots
See [Adding new steps to the GPU bots] for details on this process.
[Adding new steps to the GPU bots]: gpu_testing.md#Adding-new-steps-to-the-GPU-Bots
## Relevant files that control the operation of the GPU bots
In the [tools/build] workspace:
* [masters/master.chromium.gpu] and [masters/master.chromium.gpu.fyi]:
* builders.pyl in these two directories defines the bots that show up on
the waterfall. If you are adding a new bot, you need to add it to
builders.pyl and use go/bug-a-trooper to request a restart of either
master.chromium.gpu or master.chromium.gpu.fyi.
* Only changes under masters/ require a waterfall restart. All other
changes, for example to scripts/slave/ in this workspace or to the
Chromium workspace, do not require a master restart (and go live the
minute they are committed).
* `scripts/slave/recipe_modules/chromium_tests/`:
* <code>[chromium_gpu.py]</code> and
<code>[chromium_gpu_fyi.py]</code> define the following for
each builder and tester:
* How the workspace is checked out (e.g., this is where top-of-tree
ANGLE is specified)
* The build configuration (e.g., this is where 32-bit vs. 64-bit is
specified)
* Various gclient defines (like compiling in the hardware-accelerated
video codecs, and enabling compilation of certain tests, like the
dEQP tests, that can't be built on all of the Chromium builders)
* Note that the GN configuration of the bots is also controlled by
<code>[mb_config.pyl]</code> in the Chromium workspace; see below.
* <code>[trybots.py]</code> defines how try bots *mirror* one or more
waterfall bots.
* The concept of try bots mirroring waterfall bots ensures there are
no differences in behavior between the waterfall bots and the try
bots. This helps ensure that a CL will not pass the commit queue
and then break on the waterfall.
* This file defines the behavior of the following GPU-related try
bots:
* `linux_chromium_rel_ng`, `mac_chromium_rel_ng`, and
`win_chromium_rel_ng`, which run against every Chromium CL, and
which mirror the behavior of bots on the chromium.gpu
waterfall.
* The ANGLE try bots, which run against ANGLE CLs, and mirror the
behavior of the chromium.gpu.fyi waterfall (including using
top-of-tree ANGLE, and running additional tests not run by the
regular Chromium try bots)
* The optional GPU try servers `linux_optional_gpu_tests_rel`,
`mac_optional_gpu_tests_rel` and
`win_optional_gpu_tests_rel`, which are triggered manually and
run some tests which can't be run on the regular Chromium try
servers mainly due to lack of hardware capacity.
[tools/build]: https://chromium.googlesource.com/chromium/tools/build/
[masters/master.chromium.gpu]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.chromium.gpu/
[masters/master.chromium.gpu.fyi]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.chromium.gpu.fyi/
[chromium_gpu.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_gpu.py
[chromium_gpu_fyi.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_gpu_fyi.py
[trybots.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/trybots.py
In the [chromium/src] workspace:
* [src/testing/buildbot]:
* <code>[chromium.gpu.json]</code> and
<code>[chromium.gpu.fyi.json]</code> define which steps are run on
which bots. These files are autogenerated. Don't modify them directly!
* <code>[gn_isolate_map.pyl]</code> defines all of the isolates' behavior in the GN
build.
* [`src/tools/mb/mb_config.pyl`][mb_config.pyl]
* Defines the GN arguments for all of the bots.
* [`src/content/test/gpu/generate_buildbot_json.py`][generate_buildbot_json.py]
* The generator script for `chromium.gpu.json` and
`chromium.gpu.fyi.json`. It defines on which GPUs various tests run.
* It's completely self-contained and should hopefully be fairly
comprehensible.
* When modifying this script, don't forget to also run it, to regenerate
the JSON files.
* See [Adding new steps to the GPU bots] for more details.
[chromium/src]: https://chromium.googlesource.com/chromium/src/
[src/testing/buildbot]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot
[chromium.gpu.json]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.json
[chromium.gpu.fyi.json]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.fyi.json
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
[mb_config.pyl]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb_config.pyl
[generate_buildbot_json.py]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/generate_buildbot_json.py
In the [infradata/config] workspace (Google internal only, sorry):
* [configs/chromium-swarm/bots.cfg]
* Defines a `Chrome-GPU` Swarming pool which contains most of the
specialized hardware: as of this writing, the Windows and Linux NVIDIA
bots, the Windows AMD bots, and the MacBook Pros with NVIDIA and AMD
GPUs. New GPU hardware should be added to this pool.
[infradata/config]: https://chrome-internal.googlesource.com/infradata/config
[configs/chromium-swarm/bots.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/bots.cfg
## Walkthroughs of various maintenance scenarios
This section describes various common scenarios that might arise when
maintaining the GPU bots, and how they'd be addressed.
### How to add a new test or an entire new step to the bots
This is described in [Adding new tests to the GPU bots].
[Adding new tests to the GPU bots]: gpu_testing.md#Adding-New-Tests-to-the-GPU-Bots
### How to add a new bot
The first decision point when adding a new GPU bot is whether it is a one-off
piece of hardware, or one which is expected to be scaled up at some point. If
it's a one-off piece of hardware, it can be added to the chromium.gpu.fyi
waterfall as a non-swarmed test machine. If it's expected to be scaled up at
some point, the hardware should be added to the swarming pool. These two
scenarios are described in more detail below.
#### How to add a new, non-swarmed, physical bot to the chromium.gpu.fyi waterfall
1. Work with the Chrome Infrastructure Labs team to get the hardware deployed
so it can talk to the chromium.gpu.fyi master.
1. Create a CL in the build workspace which:
1. Add the new machine to
[`masters/master.chromium.gpu.fyi/builders.pyl`][master.chromium.gpu.fyi/builders.pyl].
1. Add the new machine to
[`scripts/slave/recipe_modules/chromium_tests/chromium_gpu_fyi.py`][chromium_gpu_fyi.py].
Set the `enable_swarming` property to `False`.
1. Retrain recipe expectations
(`scripts/slave/recipes.py --use-bootstrap test train`) and add the
newly created JSON file(s) corresponding to the new machines to your CL.
1. Create a CL in the Chromium workspace to:
1. Add the new machine to
[`src/content/test/gpu/generate_buildbot_json.py`][generate_buildbot_json.py].
Make sure to set the `swarming` property to `False`.
1. If the machine runs GN, add a description to
[`src/tools/mb/mb_config.pyl`][mb_config.pyl].
1. Once the build workspace CL lands, use go/bug-a-trooper (or contact kbr@)
to schedule a restart of the chromium.gpu.fyi waterfall. This is only
necessary when modifying files under the masters/ directory. A reboot of
the machine may be needed once the waterfall has been restarted in order to
make it connect properly.
1. The CLs from (2) and (3) can land in either order, though it is preferable
to land the Chromium-side CL first so that the machine knows what tests to
run the first time it boots up.
[master.chromium.gpu.fyi/builders.pyl]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.chromium.gpu.fyi/builders.pyl
#### How to add a new swarmed bot to the chromium.gpu.fyi waterfall
When deploying a new GPU configuration, it should be added to the
chromium.gpu.fyi waterfall first. The chromium.gpu waterfall should be reserved
for those GPUs which are tested on the commit queue. (Some of the bots violate
this rule, namely the Debug bots, though we should strive to eliminate these
differences.) Once the new configuration is ready to be fully deployed on
tryservers, bots can be added to the chromium.gpu waterfall, and the tryservers
changed to mirror them.
In order to add Release and Debug waterfall bots for a new configuration,
experience has shown that at least 4 physical machines are needed in the
swarming pool. The reason is that the tests all run in parallel on the Swarming
cluster, so the load induced on the swarming bots is higher than it would be
for a non-swarmed bot that executes its tests serially.
With these prerequisites, these are the steps to add a new swarmed bot.
(Actually, a pair of bots -- Release and Debug.)
1. Work with the Chrome Infrastructure Labs team to get the (minimum 4)
physical machines added to the Swarming pool. Use
[chromium-swarm.appspot.com] or `src/tools/swarming_client/swarming.py bots`
to determine the PCI IDs of the GPUs in the bots. (These instructions will
need to be updated for Android bots which don't have PCI buses.)
1. Make sure to add these new machines to the Chrome-GPU Swarming pool by
creating a CL against [`configs/chromium-swarm/bots.cfg`][bots.cfg] in
the [infradata/config] workspace.
1. File a Chrome Infrastructure Labs ticket requesting 2 virtual machines for
the testers. These need to match the OS of the physical machines and
builders because of limitations in the scripts which transfer builds from
the builder to the tester; see [this feature
request](http://crbug.com/581953). For example, if you're adding a "Windows
7 CoolNewGPUType" tester, you'll need 2 Windows VMs.
1. Once the VMs are ready, create a CL in the build workspace which:
1. Adds the new VMs as the Release and Debug bots in
[`master.chromium.gpu.fyi/builders.pyl`][master.chromium.gpu.fyi/builders.pyl].
1. Adds the new VMs to [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py]. Make
sure to set the `enable_swarming` and `serialize_tests` properties to
`True`. Double-check the `parent_buildername` property for each. It
must match the Release/Debug flavor of the builder.
    1. Retrains recipe expectations
       (`scripts/slave/recipes.py --use-bootstrap test train`) and adds the
       newly created JSON file(s) corresponding to the new machines to your CL.
1. Create a CL in the Chromium workspace which:
1. Adds the new machine to
`src/content/test/gpu/generate_buildbot_json.py`.
        1. The swarming dimensions are crucial. These must match the GPU and
           OS type of the physical hardware in the Swarming pool. This is what
           causes the VMs to spawn their tests on the correct hardware. Make
           sure to use the Chrome-GPU pool, and that the new machines were
           specifically added to that pool. (A sketch of such an entry appears
           after this list.)
1. Make sure to set the `swarming` property to `True` for both the
Release and Debug bots.
1. Make triply sure that there are no collisions between the new
hardware you're adding and hardware already in the Swarming pool.
For example, it used to be the case that all of the Windows NVIDIA
bots ran the same OS version. Later, the Windows 8 flavor bots were
added. In order to avoid accidentally running tests on Windows 8
when Windows 7 was intended, the OS in the swarming dimensions of
the Win7 bots had to be changed from `win` to
`Windows-2008ServerR2-SP1` (the Win7-like flavor running in our
data center). Similarly, the Win8 bots had to have a very precise
OS description (`Windows-2012ServerR2-SP0`).
1. If the machine runs GN, adds a description to
[`src/tools/mb/mb_config.pyl`][mb_config.pyl].
1. Once the tools/build CL lands, use go/bug-a-trooper (or contact kbr@) to
schedule a restart of the chromium.gpu.fyi waterfall. This is only
necessary when modifying files under the masters/ directory. A reboot of
the VMs may be needed once the waterfall has been restarted in order to
make them connect properly.
1. The CLs from (3) and (4) can land in either order, though it is preferable
to land the Chromium-side CL first so that the machine knows what tests to
run the first time it boots up.
[bots.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/bots.cfg
[infradata/config]: https://chrome-internal.googlesource.com/infradata/config/
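To make the dimension-matching concrete, here is a hedged sketch of the kind of
tester entry that ends up in `generate_buildbot_json.py`. The key names, PCI
ID, and OS string below are placeholders; take the real values from the
machines enrolled in the Chrome-GPU pool and from the existing entries in the
script:

```python
# Hypothetical tester entry in src/content/test/gpu/generate_buildbot_json.py.
'Win7 Release (CoolNewGPUType)': {
  'swarming': True,
  'swarming_dimensions': [
    {
      'gpu': 'dead:beef',                # vendor:device PCI ID of the new GPU
      'os': 'Windows-2008ServerR2-SP1',  # precise OS string, not just 'win'
      'pool': 'Chrome-GPU',              # machines explicitly added to this pool
    },
  ],
},
```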
#### How to start running tests on a new GPU type on an existing try bot
Let's say that you want to cause the `win_chromium_rel_ng` try bot to run tests
on CoolNewGPUType in addition to the types it currently runs (as of this
writing, NVIDIA and AMD). To do this:
1. Make sure there is enough hardware capacity. Unfortunately, tools to report
utilization of the Swarming pool are still being developed, but a
back-of-the-envelope estimate is that you will need a minimum of 30
machines in the Swarming pool to run the current set of GPU tests on the
tryservers. We estimate that 90 machines will be needed in order to
additionally run the WebGL 2.0 conformance tests. Plan for the larger
capacity, as it's desired to run the larger test suite on as many
configurations as possible.
2. Deploy Release and Debug testers on the chromium.gpu waterfall, following
the instructions for the chromium.gpu.fyi waterfall above. You will also
need to temporarily add suppressions to
[`tests/masters_recipes_test.py`][tests/masters_recipes_test.py] for these
new testers since they aren't yet covered by try bots and are going on a
non-FYI waterfall. Make sure these run green for a day or two before
proceeding.
3. Create a CL in the tools/build workspace, adding the new Release tester
   to `win_chromium_rel_ng`'s `bot_ids` list
   in `scripts/slave/recipe_modules/chromium_tests/trybots.py` (roughly as
   sketched below). Rerun
   `scripts/slave/recipes.py --use-bootstrap test train`.
4. Once the CL in (3) lands, the commit queue will **immediately** start
running tests on the CoolNewGPUType configuration. Be vigilant and make
sure that tryjobs are green. If they are red for any reason, revert the CL
and figure out offline what went wrong.
[tests/masters_recipes_test.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/tests/masters_recipes_test.py
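For step 3, the change amounts to appending one more mirrored tester to the try
bot's `bot_ids` list. A hedged sketch of the shape of that entry (the exact
structure of `trybots.py` evolves over time, so follow the existing entries in
the file):

```python
# Hypothetical entry appended to win_chromium_rel_ng's 'bot_ids' list in
# scripts/slave/recipe_modules/chromium_tests/trybots.py; field names follow
# the existing entries in that file.
{
  'mastername': 'chromium.gpu',
  'buildername': 'GPU Win Builder',
  'tester': 'Win7 Release (CoolNewGPUType)',
},
```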
#### How to add a new optional try bot
The "optional" GPU try bots are a concession to the reality that there are some
long-running GPU test suites that simply can not run against every Chromium CL.
They run some additional tests that are usually run only on the
chromium.gpu.fyi waterfall. Some of these tests, like the WebGL 2.0 conformance
suite, are intended to be run on the normal try bots once hardware capacity is
available. Some are not intended to ever run on the normal try bots.
The optional try bots are a little different because they mirror waterfall bots
that don't actually exist. The waterfall bots' specifications exist only to
tell the optional try bots which tests to run.
Let's say that you intend to add a new such optional try bot on Windows; call
it `win_new_optional_tests_rel`, for example. If you only wanted to add this
GPU type to the existing `win_optional_gpu_tests_rel` try bot, you'd simply
follow the instructions above
([How to start running tests on a new GPU type on an existing try bot](#How-to-start-running-tests-on-a-new-GPU-type-on-an-existing-try-bot)).
The steps below describe how to spin up an entire new optional try bot.
1. Make sure that you have some swarming capacity for the new GPU type. Since
it's not running against all Chromium CLs you don't need the recommended 30
minimum bots, though ~10 would be good.
1. Create a CL in the Chromium workspace:
1. Add your new bot (for example, "Optional Win7 Release
(CoolNewGPUType)") to the chromium.gpu.fyi waterfall in
[generate_buildbot_json.py]. (Note, this is a bad example: the
"optional" bots have special semantics in this script. You'd probably
want to define some new category of bot if you didn't intend to add
this to `win_optional_gpu_tests_rel`.)
1. Re-run the script to regenerate the JSON files.
1. Land the above CL.
1. Create a CL in the tools/build workspace:
1. Modify `masters/master.tryserver.chromium.win`'s [master.cfg] and
[slaves.cfg] to add the new tryserver. Follow the pattern for the
existing `win_optional_gpu_tests_rel` tryserver. Namely, add the new
entry to master.cfg, and add the new tryserver to the
`optional_builders` list in `slaves.cfg`.
1. Modify [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] to add the new
"Optional Win7 Release (CoolNewGPUType)" entry.
1. Modify [`trybots.py`][trybots.py] to add
the new `win_new_optional_tests_rel` try bot, mirroring "Optional
Win7 Release (CoolNewGPUType)".
1. Land the above CL and request an off-hours restart of the
tryserver.chromium.win waterfall.
1. Now you can send CLs to the new bot with:
`git cl try -m tryserver.chromium.win -b win_new_optional_tests_rel`
[master.cfg]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.tryserver.chromium.win/master.cfg
[slaves.cfg]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.tryserver.chromium.win/slaves.cfg
#### How to test and deploy a driver update
Let's say that you want to roll out an update to the graphics drivers on one of
the configurations like the Win7 NVIDIA bots. The responsible way to do this is
to run the new driver on one of the waterfalls for a day or two to make sure
the tests are running reliably green before rolling out the driver update
everywhere. To do this:
1. Work with the Chrome Infrastructure Labs team to deploy a single,
non-swarmed, physical machine on the chromium.gpu.fyi waterfall running the
new driver. The OS and GPU should exactly match the configuration you
intend to upgrade. See
[How to add a new, non-swarmed, physical bot to the chromium.gpu.fyi waterfall](#How-to-add-a-new_non-swarmed_physical-bot-to-the-chromium_gpu_fyi-waterfall).
2. Hopefully, the new machine will pass the pixel tests. If it doesn't, then
unfortunately, it'll be necessary to follow the instructions on
[updating the pixel tests] to temporarily suppress the failures on this
particular configuration. Keep the time window for these test suppressions
as narrow as possible.
3. Watch the new machine for a day or two to make sure it's stable.
4. When it is, ask the Chrome Infrastructure Labs team to roll out the driver
update across all of the similarly configured bots in the swarming pool.
5. If necessary, update pixel test expectations and remove the suppressions
added above.
6. Prepare and land a CL removing the temporary machine from the
chromium.gpu.fyi waterfall. Request a waterfall restart.
7. File a ticket with the Chrome Infrastructure Labs team to reclaim the
temporary machine.
Note that with recent improvements to Swarming, in particular [this
RFE](https://github.com/luci/luci-py/issues/253) and others, these steps are no
longer strictly necessary: it's possible to target Swarming jobs at a
particular driver version. If
[`generate_buildbot_json.py`][generate_buildbot_json.py] were improved to be
more specific about the driver version on the various bots, then the machines
with the new drivers could simply be added to the Swarming pool, and this
process could be a lot simpler. Patches welcome. :)
[updating the pixel tests]: https://www.chromium.org/developers/testing/gpu-testing/#TOC-Updating-and-Adding-New-Pixel-Tests-to-the-GPU-Bots
## Credentials for various servers
Working with the GPU bots requires credentials to various services: the isolate
server, the swarming server, and cloud storage.
### Isolate server credentials
To upload and download isolates you must first authenticate to the isolate
server. From a Chromium checkout, run:
* `./src/tools/swarming_client/auth.py login
--service=https://isolateserver.appspot.com`
This will open a web browser to complete the authentication flow. A @google.com
email address is required in order to properly authenticate.
To test your authentication, find a hash for a recent isolate. Consult the
instructions on [Running Binaries from the Bots Locally] to find a random hash
from a target like `gl_tests`. Then try downloading that isolate from the server.
[Running Binaries from the Bots Locally]: https://www.chromium.org/developers/testing/gpu-testing#TOC-Running-Binaries-from-the-Bots-Locally
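A minimal sketch of that download, assuming the standard `isolateserver.py
download` interface (`-I` selects the server, `-f` takes a file hash and a
destination path); check `isolateserver.py download --help` for the exact
flags:

```sh
# <hash-you-found-above> is the isolate hash located in the previous step.
./src/tools/swarming_client/isolateserver.py download \
    -I https://isolateserver.appspot.com \
    -f <hash-you-found-above> delete_me
```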
If authentication succeeded, this will silently download a file called
`delete_me` into the current working directory. If it failed, the script will
report multiple authentication errors. In this case, use the following command
to log out and then try again:
* `./src/tools/swarming_client/auth.py logout
--service=https://isolateserver.appspot.com`
### Swarming server credentials
The swarming server uses the same `auth.py` script as the isolate server. You
will need to authenticate if you want to manually download the results of
previous swarming jobs, trigger your own jobs, or run `swarming.py reproduce`
to re-run a remote job on your local workstation. Follow the instructions
above, replacing the service with `https://chromium-swarm.appspot.com`.
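For example, a sketch of authenticating and then re-running a remote task
locally; the task ID comes from the task's page on the Swarming server, and the
exact flags are per `swarming.py reproduce --help`:

```sh
# Log in to the swarming server (same flow as for the isolate server).
./src/tools/swarming_client/auth.py login \
    --service=https://chromium-swarm.appspot.com

# Re-run a remote task on your local workstation.
./src/tools/swarming_client/swarming.py reproduce \
    -S https://chromium-swarm.appspot.com <task-id>
```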
### Cloud storage credentials
Authentication to Google Cloud Storage is needed for a couple of reasons:
uploading pixel test results to the cloud, and potentially uploading and
downloading builds as well, at least in Debug mode. Use the copy of gsutil in
`depot_tools/third_party/gsutil/gsutil`, and follow the [Google Cloud Storage
instructions] to authenticate. You must use your @google.com email address and
be a member of the Chrome GPU team in order to receive read-write access to the
appropriate cloud storage buckets. Roughly:
1. Run `gsutil config`
2. Copy/paste the URL into your browser
3. Log in with your @google.com account
4. Allow the app to access the information it requests
5. Copy-paste the resulting key back into your Terminal
6. Press "enter" when prompted for a project-id (i.e., leave it empty)
At this point you should be able to write to the cloud storage bucket.
Navigate to
<https://console.developers.google.com/storage/chromium-gpu-archive> to view
the contents of the cloud storage bucket.
[Google Cloud Storage instructions]: https://developers.google.com/storage/docs/gsutil
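A quick way to sanity-check the credentials from the command line, using the
same copy of gsutil and the bucket mentioned above:

```sh
# Lists the pixel test reference images; requires the access described above.
depot_tools/third_party/gsutil/gsutil ls gs://chromium-gpu-archive/reference-images
```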

298
docs/gpu/pixel_wrangling.md Normal file

@ -0,0 +1,298 @@
# GPU Bots & Pixel Wrangling
![](images/wrangler.png)
(December 2017: presentation on GPU bots and pixel wrangling: see [slides].)
GPU Pixel Wrangling is the process of keeping various GPU bots green. On the
GPU bots, tests run on physical hardware with real GPUs, not in VMs like the
majority of the bots on the Chromium waterfall.
[slides]: https://docs.google.com/presentation/d/1sZjyNe2apUhwr5sinRfPs7eTzH-3zO0VQ-Cj-8DlEDQ/edit?usp=sharing
[TOC]
## Fleet Status
The following links (sorry, Google employees only) show the status of various
GPU bots in the fleet.
Primary configurations:
* [Windows 10 Quadro P400 Pool](http://shortn/_dmtaFfY2Jq)
* [Windows 10 Intel HD 630 Pool](http://shortn/_QsoGIGIFYd)
* [Linux Quadro P400 Pool](http://shortn/_fNgNs1uROQ)
* [Linux Intel HD 630 Pool](http://shortn/_dqEGjCGMHT)
* [Mac AMD Retina 10.12.6 GPU Pool](http://shortn/_BcrVmfRoSo)
* [Mac Mini Chrome Pool](http://shortn/_Ru8NESapPM)
* [Android Nexus 5X Chrome Pool](http://shortn/_G3j7AVmuNR)
Secondary configurations:
* [Windows 7 Quadro P400 Pool](http://shortn/_cuxSKC15UX)
* [Windows AMD R7 240 GPU Pool](http://shortn/_XET7RTMHQm)
* [Mac NVIDIA Retina 10.12.6 GPU Pool](http://shortn/_jQWG7W71Ek)
## GPU Bots' Waterfalls
The waterfalls work much like any other; see the [Tour of the Chromium Buildbot
Waterfall] for a more detailed explanation of how this is laid out. We have
more subtle configurations because the GPU matters, not just the OS and release
v. debug. Hence we have Windows Nvidia Release bots, Mac Intel Debug bots, and
so on. The waterfalls we're interested in are:
* [Chromium GPU]
* Various operating systems, configurations, GPUs, etc.
* [Chromium GPU FYI]
* These bots run less-standard configurations like Windows with AMD GPUs,
Linux with Intel GPUs, etc.
* These bots build with top of tree ANGLE rather than the `DEPS` version.
* The [ANGLE tryservers] help ensure that these bots stay green. However,
it is possible that due to ANGLE changes these bots may be red while
the chromium.gpu bots are green.
* The [ANGLE Wrangler] is on-call to help resolve ANGLE-related breakage
      on this waterfall.
    * To determine if a different ANGLE revision was used between two builds,
      compare the `got_angle_revision` buildbot property on the GPU builders
      or `parent_got_angle_revision` on the testers. This revision can be
      used to do a `git log` in the `third_party/angle` repository, as
      sketched below.
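For example, a sketch of turning those two revisions into a change log (the
revision placeholders come from the buildbot properties mentioned above):

```sh
# Run inside src/third_party/angle; substitute the two builds' revisions.
git log --oneline <angle-revision-of-last-good-build>..<angle-revision-of-first-bad-build>
```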
<!-- TODO(kainino): update link when the page is migrated -->
[Tour of the Chromium Buildbot Waterfall]: http://www.chromium.org/developers/testing/chromium-build-infrastructure/tour-of-the-chromium-buildbot
[Chromium GPU]: https://ci.chromium.org/p/chromium/g/chromium.gpu/console?reload=120
[Chromium GPU FYI]: https://ci.chromium.org/p/chromium/g/chromium.gpu.fyi/console?reload=120
[ANGLE tryservers]: https://build.chromium.org/p/tryserver.chromium.angle/waterfall
<!-- TODO(kainino): update link when the page is migrated -->
[ANGLE Wrangler]: https://sites.google.com/a/chromium.org/dev/developers/how-tos/angle-wrangling
## Test Suites
The bots run several test suites. The majority of them have been migrated to
the Telemetry harness, and are run within the full browser, in order to better
test the code that is actually shipped. As of this writing, the tests included:
* Tests using the Telemetry harness:
* The WebGL conformance tests: `webgl_conformance_integration_test.py`
* A Google Maps test: `maps_integration_test.py`
* Context loss tests: `context_lost_integration_test.py`
* Depth capture tests: `depth_capture_integration_test.py`
* GPU process launch tests: `gpu_process_integration_test.py`
* Hardware acceleration validation tests:
`hardware_accelerated_feature_integration_test.py`
* Pixel tests validating the end-to-end rendering pipeline:
`pixel_integration_test.py`
* Stress tests of the screenshot functionality other tests use:
`screenshot_sync_integration_test.py`
* `angle_unittests`: see `src/gpu/gpu.gyp`
* drawElements tests (on the chromium.gpu.fyi waterfall): see
`src/third_party/angle/src/tests/BUILD.gn`
* `gles2_conform_test` (requires internal sources): see
`src/gpu/gles2_conform_support/gles2_conform_test.gyp`
* `gl_tests`: see `src/gpu/BUILD.gn`
* `gl_unittests`: see `src/ui/gl/BUILD.gn`
And more. See `src/content/test/gpu/generate_buildbot_json.py` for the
complete description of bots and tests.
Additionally, the Release bots run:
* `tab_capture_end2end_tests`: see
`src/chrome/browser/extensions/api/tab_capture/tab_capture_apitest.cc` and
`src/chrome/browser/extensions/api/cast_streaming/cast_streaming_apitest.cc`
### More Details
More details about the bots' setup can be found on the [GPU Testing] page.
[GPU Testing]: https://sites.google.com/a/chromium.org/dev/developers/testing/gpu-testing
## Wrangling
### Prerequisites
1. Ideally a wrangler should be a Chromium committer. If you're on the GPU
pixel wrangling rotation, there will be an email notifying you of the upcoming
shift, and a calendar appointment.
* If you aren't a committer, don't panic. It's still best for everyone on
the team to become acquainted with the procedures of maintaining the
GPU bots.
* In this case you'll upload CLs to Gerrit to perform reverts (optionally
using the new "Revert" button in the UI), and might consider using
`TBR=` to speed through trivial and urgent CLs. In general, try to send
all CLs through the commit queue.
* Contact bajones, kainino, kbr, vmiura, zmo, or another member of the
Chrome GPU team who's already a committer for help landing patches or
reverts during your shift.
2. Apply for [access to the bots].
[access to the bots]: https://sites.google.com/a/google.com/chrome-infrastructure/golo/remote-access?pli=1
### How to Keep the Bots Green
1. Watch for redness on the tree.
1. [Sheriff-O-Matic now has support for the chromium.gpu.fyi waterfall]!
1. The chromium.gpu bots are covered under Sheriff-O-Matic's [Chromium
tab]. As pixel wrangler, ignore any non-GPU test failures in this tab.
1. The bots are expected to be green all the time. Flakiness on these bots
is neither expected nor acceptable.
1. If a bot goes consistently red, it's necessary to figure out whether a
recent CL caused it, or whether it's a problem with the bot or
infrastructure.
1. If it looks like a problem with the bot (deep problems like failing to
check out the sources, the isolate server failing, etc.) notify the
Chromium troopers and file a P1 bug with labels: Infra\>Labs,
Infra\>Troopers and Internals\>GPU\>Testing. See the general [tree
sheriffing page] for more details.
1. Otherwise, examine the builds just before and after the redness was
introduced. Look at the revisions in the builds before and after the
failure was introduced.
1. **File a bug** capturing the regression range and excerpts of any
associated logs. Regressions should be marked P1. CC engineers who you
think may be able to help triage the issue. Keep in mind that the logs
on the bots expire after a few days, so make sure to add copies of
relevant logs to the bug report.
1. Use the `Hotlist=PixelWrangler` label to mark bugs that require the
pixel wrangler's attention, so it's easy to find relevant bugs when
handing off shifts.
1. Study the regression range carefully. Use drover to revert any CLs
which break the chromium.gpu bots. Use your judgment about
chromium.gpu.fyi, since not all bots are covered by trybots. In the
revert message, provide a clear description of what broke, links to
failing builds, and excerpts of the failure logs, because the build
logs expire after a few days.
1. Make sure the bots are running jobs.
1. Keep an eye on the console views of the various bots.
1. Make sure the bots are all actively processing jobs. If they go offline
for a long period of time, the "summary bubble" at the top may still be
green, but the column in the console view will be gray.
1. Email the Chromium troopers if you find a bot that's not processing
jobs.
1. Make sure the GPU try servers are in good health.
1. The GPU try servers are no longer distinct bots on a separate
waterfall, but instead run as part of the regular tryjobs on the
Chromium waterfalls. The GPU tests run as part of the following
tryservers' jobs:
1. <code>[linux_chromium_rel_ng]</code> on the [luci.chromium.try]
waterfall
<!-- TODO(kainino): update link to luci.chromium.try -->
1. <code>[mac_chromium_rel_ng]</code> on the [tryserver.chromium.mac]
waterfall
<!-- TODO(kainino): update link to luci.chromium.try -->
1. <code>[win7_chromium_rel_ng]</code> on the [tryserver.chromium.win]
waterfall
1. The best tool to use to quickly find flakiness on the tryservers is the
new [Chromium Try Flakes] tool. Look for the names of GPU tests (like
maps_pixel_test) as well as the test machines (e.g.
mac_chromium_rel_ng). If you see a flaky test, file a bug like [this
one](http://crbug.com/444430). Also look for compile flakes that may
indicate that a bot needs to be clobbered. Contact the Chromium
sheriffs or troopers if so.
1. Glance at these trybots from time to time and see if any GPU tests are
failing frequently. **Note** that test failures are **expected** on
these bots: individuals' patches may fail to apply, fail to compile, or
break various tests. Look specifically for patterns in the failures. It
isn't necessary to spend a lot of time investigating each individual
failure. (Use the "Show: 200" link at the bottom of the page to see
more history.)
1. If the same set of tests are failing repeatedly, look at the individual
runs. Examine the swarming results and see whether they're all running
on the same machine. (This is the "Bot assigned to task" when clicking
any of the test's shards in the build logs.) If they are, something
might be wrong with the hardware. Use the [Swarming Server Stats] tool
to drill down into the specific builder.
1. If you see the same test failing in a flaky manner across multiple
machines and multiple CLs, it's crucial to investigate why it's
happening. [crbug.com/395914](http://crbug.com/395914) was one example
of an innocent-looking Blink change which made it through the commit
queue and introduced widespread flakiness in a range of GPU tests. The
failures were also most visible on the try servers as opposed to the
main waterfalls.
1. Check if any pixel test failures are actual failures or need to be
rebaselined.
1. For a given build failing the pixel tests, click the "stdio" link of
the "pixel" step.
1. The output will contain a link of the form
<http://chromium-browser-gpu-tests.commondatastorage.googleapis.com/view_test_results.html?242523_Linux_Release_Intel__telemetry>
1. Visit the link to see whether the generated or reference images look
incorrect.
1. All of the reference images for all of the bots are stored in cloud
storage under [chromium-gpu-archive/reference-images]. They are indexed
by version number, OS, GPU vendor, GPU device, and whether or not
antialiasing is enabled in that configuration. You can download the
reference images individually to examine them in detail.
1. Rebaseline pixel test reference images if necessary.
1. Follow the [instructions on the GPU testing page].
1. Alternatively, if absolutely necessary, you can use the [Chrome
Internal GPU Pixel Wrangling Instructions] to delete just the broken
reference images for a particular configuration.
1. Update Telemetry-based test expectations if necessary.
1. Most of the GPU tests are run inside a full Chromium browser, launched
by Telemetry, rather than a Gtest harness. The tests and their
expectations are contained in [src/content/test/gpu/gpu_tests/] . See
for example <code>[webgl_conformance_expectations.py]</code>,
<code>[gpu_process_expectations.py]</code> and
<code>[pixel_expectations.py]</code>.
    1. See the header of the file for a list of modifiers to specify a bot
       configuration. It is possible to specify OS (down to a specific
       version, say, Windows 7 or Mountain Lion), GPU vendor
       (NVIDIA/AMD/Intel), and a specific GPU device. (A sketch of an entry
       appears after this list.)
    1. The key is to maintain the highest coverage: if you have to disable a
       test, disable it only on the specific configurations on which it's
       failing. Note that it is not possible to discern between Debug and
       Release configurations.
1. Mark tests failing or skipped, which will suppress flaky failures, only
as a last resort. It is only really necessary to suppress failures that
are showing up on the GPU tryservers, since failing tests no longer
close the Chromium tree.
1. Please read the section on [stamping out flakiness] for motivation on
how important it is to eliminate flakiness rather than hiding it.
1. For the remaining Gtest-style tests, use the [`DISABLED_`
modifier][gtest-DISABLED] to suppress any failures if necessary.
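A hedged sketch of what an expectation entry in one of these files looks like;
the test name, modifiers, and bug number below are placeholders, so copy the
exact form from the neighboring entries in the file you're editing:

```python
# Fail this test only on Windows 7 machines with NVIDIA GPUs, and link the
# tracking bug. (Hypothetical test name and bug number.)
self.Fail('WebglConformance_conformance_misc_some_flaky_test',
          ['win7', 'nvidia'], bug=123456)
```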
[Sheriff-O-Matic now has support for the chromium.gpu.fyi waterfall]: https://sheriff-o-matic.appspot.com/chromium.gpu.fyi
[Chromium tab]: https://sheriff-o-matic.appspot.com/chromium
[tree sheriffing page]: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs
[linux_chromium_rel_ng]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_rel_ng
[luci.chromium.try]: https://ci.chromium.org/p/chromium/g/luci.chromium.try/builders
[mac_chromium_rel_ng]: https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/
[tryserver.chromium.mac]: https://ci.chromium.org/p/chromium/g/tryserver.chromium.mac/builders
[win7_chromium_rel_ng]: https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/
[tryserver.chromium.win]: https://ci.chromium.org/p/chromium/g/tryserver.chromium.win/builders
[Chromium Try Flakes]: http://chromium-try-flakes.appspot.com/
<!-- TODO(kainino): link doesn't work, but is still included from chromium-swarm homepage so not removing it now -->
[Swarming Server Stats]: https://chromium-swarm.appspot.com/stats
[chromium-gpu-archive/reference-images]: https://console.developers.google.com/storage/chromium-gpu-archive/reference-images
[instructions on the GPU testing page]: https://sites.google.com/a/chromium.org/dev/developers/testing/gpu-testing#TOC-Updating-and-Adding-New-Pixel-Tests-to-the-GPU-Bots
[Chrome Internal GPU Pixel Wrangling Instructions]: https://sites.google.com/a/google.com/client3d/documents/chrome-internal-gpu-pixel-wrangling-instructions
[src/content/test/gpu/gpu_tests/]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/
[webgl_conformance_expectations.py]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/webgl_conformance_expectations.py
[gpu_process_expectations.py]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/gpu_process_expectations.py
[pixel_expectations.py]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/pixel_expectations.py
[stamping out flakiness]: gpu_testing.md#Stamping-out-Flakiness
[gtest-DISABLED]: https://github.com/google/googletest/blob/master/googletest/docs/AdvancedGuide.md#temporarily-disabling-tests
### When Bots Misbehave (SSHing into a bot)
1. See the [Chrome Internal GPU Pixel Wrangling Instructions] for information
on ssh'ing in to the GPU bots.
[Chrome Internal GPU Pixel Wrangling Instructions]: https://sites.google.com/a/google.com/client3d/documents/chrome-internal-gpu-pixel-wrangling-instructions
### Reproducing WebGL conformance test failures locally
1. From the buildbot build output page, click on the failed shard to get to
the swarming task page. Scroll to the bottom of the left panel for a
command to run the task locally. This will automatically download the build
and any other inputs needed.
2. Alternatively, to run the test on a local build, pass the arguments
   `--browser=exact --browser-executable=/path/to/binary` to
   `content/test/gpu/run_gpu_integration_test.py`, roughly as sketched below.
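For example, a sketch of reproducing a WebGL conformance failure against a
local Release build (the suite name and browser path are placeholders for
whatever you are debugging):

```sh
# Run from the src/ directory of a Chromium checkout.
python content/test/gpu/run_gpu_integration_test.py webgl_conformance \
    --browser=exact \
    --browser-executable=/path/to/out/Release/chrome
```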
Also see the [telemetry documentation].
[telemetry documentation]: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/docs/run_benchmarks_locally.md
## Extending the GPU Pixel Wrangling Rotation
See the [Chrome Internal GPU Pixel Wrangling Instructions] for information on extending the rotation.
[Chrome Internal GPU Pixel Wrangling Instructions]: https://sites.google.com/a/google.com/client3d/documents/chrome-internal-gpu-pixel-wrangling-instructions