
Port GPU documentation to Markdown

This ports the following wiki pages into markdown:
https://www.chromium.org/developers/testing/gpu-testing
https://www.chromium.org/developers/testing/gpu-testing/gpu-bot-details
https://www.chromium.org/developers/how-tos/gpu-wrangling
https://www.chromium.org/developers/how-tos/debugging-gpu-related-code

and updates *some* of the old outdated content.

Bug: 813153
Change-Id: Ic5f1b58659bbdb691343785cb18c50f4d55c177f
Reviewed-on: https://chromium-review.googlesource.com/987233
Reviewed-by: Kenneth Russell <kbr@chromium.org>
Commit-Queue: Kenneth Russell <kbr@chromium.org>
Cr-Commit-Position: refs/heads/master@{#547060}
Author: Kai Ninomiya
Date: 2018-03-30 01:30:56 +00:00
Committed by: Commit Bot

@@ -0,0 +1,235 @@
# Debugging GPU related code
Chromium's GPU system is multi-process, which can make debugging it rather
difficult. See [GPU Command Buffer] for some of the nitty gritty. These are just
a few notes to help with debugging.
[TOC]
<!-- TODO(kainino): update link if the page moves -->
[GPU Command Buffer]: https://sites.google.com/a/chromium.org/dev/developers/design-documents/gpu-command-buffer
## Renderer Process Code
### `--enable-gpu-client-logging`
If you are trying to track down a bug in a GPU client process (compositing,
WebGL, Skia/Ganesh, Aura), then in a debug build you can use the
`--enable-gpu-client-logging` flag, which will show every GL call sent to the
GPU service process. (From the point of view of a GPU client, it's calling
OpenGL ES functions - but the real driver calls are made in the GPU process.)
```
[4782:4782:1219/141706:INFO:gles2_implementation.cc(1026)] [.WebGLRenderingContext] glUseProgram(3)
[4782:4782:1219/141706:INFO:gles2_implementation_impl_autogen.h(401)] [.WebGLRenderingContext] glGenBuffers(1, 0x7fffc9e1269c)
[4782:4782:1219/141706:INFO:gles2_implementation_impl_autogen.h(416)] 0: 1
[4782:4782:1219/141706:INFO:gles2_implementation_impl_autogen.h(23)] [.WebGLRenderingContext] glBindBuffer(GL_ARRAY_BUFFER, 1)
[4782:4782:1219/141706:INFO:gles2_implementation.cc(1313)] [.WebGLRenderingContext] glBufferData(GL_ARRAY_BUFFER, 36, 0x7fd268580120, GL_STATIC_DRAW)
[4782:4782:1219/141706:INFO:gles2_implementation.cc(2480)] [.WebGLRenderingContext] glEnableVertexAttribArray(0)
[4782:4782:1219/141706:INFO:gles2_implementation.cc(1140)] [.WebGLRenderingContext] glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0)
[4782:4782:1219/141706:INFO:gles2_implementation_impl_autogen.h(135)] [.WebGLRenderingContext] glClear(16640)
[4782:4782:1219/141706:INFO:gles2_implementation.cc(2490)] [.WebGLRenderingContext] glDrawArrays(GL_TRIANGLES, 0, 3)
```
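For reference, one way to launch with this flag and capture the client-side log
is sketched below; the build directory, binary name, and page URL are examples,
not requirements.
```sh
# Debug build assumed. --enable-logging=stderr keeps the log on the terminal so
# it can be captured together with the GL client calls.
out/Debug/chromium --no-sandbox --enable-gpu-client-logging \
    --enable-logging=stderr http://localhost:8000/page-to-repro.html \
    2>&1 | tee gpu_client_log.txt
```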
### Checking about:gpu
The GPU process logs many errors and warnings. You can see these by navigating
to `about:gpu`. Logs appear at the bottom of the page. You can also see them
on standard output if Chromium is run from the command line on Linux/Mac.
On Windows, you need debugging tools (like VS, WinDbg, etc.) to connect to the
debug output stream.
**Note:** If `about:gpu` is telling you that your GPU is disabled and
hardware acceleration is unavailable, it might be a problem with your GPU being
unsupported. To override this and turn on hardware acceleration anyway, you can
use the `--ignore-gpu-blacklist` command line option when starting Chromium.
### Breaking on GL Error
In <code>[gles2_implementation.h]</code>, there is some code like this:
```cpp
// Set to 1 to have the client fail when a GL error is generated.
// This helps find bugs in the renderer since the debugger stops on the error.
#if DCHECK_IS_ON()
#if 0
#define GL_CLIENT_FAIL_GL_ERRORS
#endif
#endif
```
Change that `#if 0` to `#if 1`, build a debug build, then run in a debugger.
The debugger will break when any renderer code sees a GL error, and you should
be able to examine the call stack to find the issue.
[gles2_implementation.h]: https://chromium.googlesource.com/chromium/src/+/master/gpu/command_buffer/client/gles2_implementation.h
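After flipping the `#if`, one possible way to get the renderer under a debugger
is sketched below (paths and the repro URL are placeholders; the
renderer-debugging section later on this page explains the
`--renderer-cmd-prefix` technique in more detail).
```sh
# Rebuild with GL_CLIENT_FAIL_GL_ERRORS enabled, then run the renderer under
# gdb so it stops where the GL error is generated.
ninja -C out/Debug chrome
out/Debug/chromium --no-sandbox \
    --renderer-cmd-prefix="xterm -e gdb --args" \
    http://localhost:8000/page-to-repro.html
```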
### Labeling your calls
The output of all of the errors, warnings and debug logs are prefixed. You can
set this prefix by calling `glPushGroupMarkerEXT`, `glPopGroupMarkerEXT` and
`glInsertEventMarkerEXT`. `glPushGroupMarkerEXT` appends a string to the end of
the current log prefix (think namespace in C++). `glPopGroupMarkerEXT` pops off
the last string appended. `glInsertEventMarkerEXT` sets a suffix for the
current string. Example:
```cpp
glPushGroupMarkerEXT(0, "Foo"); // -> log prefix = "Foo"
glInsertEventMarkerEXT(0, "This"); // -> log prefix = "Foo.This"
glInsertEventMarkerEXT(0, "That"); // -> log prefix = "Foo.That"
glPushGroupMarkerEXT(0, "Bar"); // -> log prefix = "Foo.Bar"
glInsertEventMarkerEXT(0, "Orange"); // -> log prefix = "Foo.Bar.Orange"
glInsertEventMarkerEXT(0, "Banana"); // -> log prefix = "Foo.Bar.Banana"
glPopGroupMarkerEXT(); // -> log prefix = "Foo.That"
```
### Making a reduced test case
You can often make a simple OpenGL-ES-2.0-only C++ reduced test case that is
relatively quick to compile and test, by adding tests to the `gl_tests` target.
Those tests exist in `src/gpu/command_buffer/tests` and are made part of the
build in `src/gpu/BUILD.gn`. Build with `ninja -C out/Debug gl_tests`. All the
same command line options listed on this page will work with the `gl_tests`,
plus `--gtest_filter=NameOfTest` to run a specific test. Note the `gl_tests`
are not multi-process, so they probably won't help with race conditions, but
they do go through most of the same code and are much easier to debug.
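For example, after adding a new test to that directory, a typical local
iteration loop looks like the following (the test name `MyReducedTest` is a
placeholder for whatever you add):
```sh
# Build just the gl_tests target, then run a single test. The GPU-related
# flags described on this page also apply to gl_tests.
ninja -C out/Debug gl_tests
out/Debug/gl_tests --gtest_filter=*MyReducedTest* --enable-gpu-service-logging
```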
### Debugging the renderer process
Given that Chrome starts many renderer processes, I find it easier to either
have a remote web page I can access, or to make one locally and serve it with a
local server like `python -m SimpleHTTPServer`. Then:
On Linux this works for me:
* `out/Debug/chromium --no-sandbox --renderer-cmd-prefix="xterm -e gdb
--args" http://localhost:8000/page-to-repro.html`
On OSX this works for me:
* `out/Debug/Chromium.app/Contents/MacOSX/Chromium --no-sandbox
--renderer-cmd-prefix="xterm -e gdb --args"
http://localhost:8000/page-to-repro.html`
On Windows I use `--renderer-startup-dialog` and then connect to the listed process.
Note 1: On Linux and OSX I use `cgdb` instead of `gdb`.
Note 2: GDB can take minutes to index symbols. To save time, you can precache
that computation by running `build/gdb-add-index out/Debug/chrome`.
## GPU Process Code
### `--enable-gpu-service-logging`
In a debug build, this will print all actual calls into the GL driver.
```
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kEnableVertexAttribArray
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(905)] glEnableVertexAttribArray(0)
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kVertexAttribPointer
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(1573)] glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0)
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kClear
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(746)] glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(840)] glDepthMask(GL_TRUE)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(900)] glEnable(GL_DEPTH_TEST)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(1371)] glStencilMaskSeparate(GL_FRONT, 4294967295)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(1371)] glStencilMaskSeparate(GL_BACK, 4294967295)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(860)] glDisable(GL_STENCIL_TEST)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(860)] glDisable(GL_CULL_FACE)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(860)] glDisable(GL_SCISSOR_TEST)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(900)] glEnable(GL_BLEND)
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(721)] glClear(16640)
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kDrawArrays
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(870)] glDrawArrays(GL_TRIANGLES, 0, 3)
```
Note that GL calls into the driver are not currently prefixed (TODO?). However,
from the commands logged you can tell which command, from which context, caused
the GL calls that follow.
Also note that client resource IDs are virtual IDs, so calls into the real GL
driver will not match (though some commands print the mapping). Examples:
```
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kBindTexture
[5497:5497:1219/142413:INFO:gles2_cmd_decoder.cc(837)] [.WebGLRenderingContext] glBindTexture: client_id = 2, service_id = 10
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(662)] glBindTexture(GL_TEXTURE_2D, 10)
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [0052064A367F0000]cmd: kBindBuffer
[5497:5497:1219/142413:INFO:gles2_cmd_decoder.cc(837)] [0052064A367F0000] glBindBuffer: client_id = 2, service_id = 6
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(637)] glBindBuffer(GL_ARRAY_BUFFER, 6)
[5497:5497:1219/142413:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kBindFramebuffer
[5497:5497:1219/142413:INFO:gles2_cmd_decoder.cc(837)] [.WebGLRenderingContext] glBindFramebuffer: client_id = 1, service_id = 3
[5497:5497:1219/142413:INFO:gl_bindings_autogen_gl.cc(652)] glBindFramebufferEXT(GL_FRAMEBUFFER, 3)
```
and so on, so you can see that renderer process code uses the client IDs,
whereas the GPU process uses the service IDs. This is useful for matching up
calls if you're dumping both client and service GL logs.
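To capture both logs in one place for this kind of ID matching, something like
the following works (a sketch; the build path and URL are placeholders):
```sh
# Log both the GL calls issued by the client and the calls the GPU process
# makes into the driver, and keep a copy on disk for cross-referencing IDs.
out/Debug/chromium --no-sandbox \
    --enable-gpu-client-logging --enable-gpu-service-logging \
    --enable-logging=stderr \
    http://localhost:8000/page-to-repro.html 2>&1 | tee gpu_log.txt
```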
### `--enable-gpu-debugging`
In any build, this will call `glGetError` after each command.
### `--enable-gpu-command-logging`
This will print the name of each GPU command before it is executed.
```
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kBindBuffer
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kBufferData
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: SetToken
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kEnableVertexAttribArray
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kVertexAttribPointer
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kClear
[5234:5234:1219/052139:ERROR:gles2_cmd_decoder.cc(3301)] [.WebGLRenderingContext]cmd: kDrawArrays
```
### Debugging in the GPU Process
Given the multi-processness of chromium it can be hard to debug both sides.
Turing on all the logging and having a small test case is useful. One minor
suggestion, if you have some idea where the bug is happening a call to some
obscure gl function like `glHint()` can give you a place to catch a command
being processed in the GPU process (put a break point on
`gpu::gles2::GLES2DecoderImpl::HandleHint`. Once in you can follow the commands
after that. All of them go through `gpu::gles2::GLES2DecoderImpl::DoCommand`.
To actually debug the GPU process:
On Linux this works for me:
* `out/Debug/chromium --no-sandbox --gpu-launcher="xterm -e gdb --args"
http://localhost:8000/page-to-repro.html`
On OSX this works for me:
* `out/Debug/Chromium.app/Contents/MacOSX/Chromium --no-sandbox
--gpu-launcher="xterm -e gdb --args"
http://localhost:8000/page-to-repro.html`
On Windows I use `--gpu-startup-dialog` and then connect to the listed process.
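To combine the `glHint()` trick described above with `--gpu-launcher`, you can
pre-set the breakpoint via a gdb script so the launcher prefix stays simple (a
sketch; quoting inside the launcher prefix can be fragile, so this uses
`gdb -x`):
```sh
# Put the breakpoint commands in a gdb script, then launch the GPU process
# under gdb with that script. Execution stops when the injected glHint()
# command reaches the decoder.
cat > /tmp/gpu.gdb <<'EOF'
break gpu::gles2::GLES2DecoderImpl::HandleHint
run
EOF
out/Debug/chromium --no-sandbox \
    --gpu-launcher="xterm -e gdb -x /tmp/gpu.gdb --args" \
    http://localhost:8000/page-to-repro.html
```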
### `GPU PARSE ERROR`
If you see this message in `about:gpu` or your console, you didn't cause it
directly (by calling `glLoseContextCHROMIUM`), and it's something other than 5,
then there's likely a bug. Please file an issue at <http://crbug.com/new>.
## Debugging Performance
If you have something to add here please add it. Most perf debugging is done
using `about:tracing` (see [Trace Event Profiling] for details). Otherwise,
be aware that, since the system is multi-process, calling:
```
start = GetTime()
DoSomething()
glFinish()
end = GetTime()
printf("elapsedTime = %f\n", end - start);
```
**will not** give you meaningful results.
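If you want a quick trace without driving `about:tracing` interactively, Chrome
can also record one at startup from the command line; a sketch (these flags
exist as of this writing but may change):
```sh
# Record roughly 5 seconds of trace data starting at browser launch and write
# it to a file that can then be loaded into about:tracing.
out/Release/chrome --trace-startup --trace-startup-duration=5 \
    --trace-startup-file=/tmp/gpu-trace.json \
    http://localhost:8000/page-to-repro.html
```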
[Trace Event Profiling]: https://sites.google.com/a/chromium.org/dev/developers/how-tos/trace-event-profiling-tool

docs/gpu/gpu_testing.md Normal file

@@ -0,0 +1,571 @@
# GPU Testing
This set of pages documents the setup and operation of the GPU bots and try
servers, which verify the correctness of Chrome's graphically accelerated
rendering pipeline.
[TOC]
## Overview
The GPU bots run a different set of tests than the majority of the Chromium
test machines. The GPU bots specifically focus on tests which exercise the
graphics processor, and whose results are likely to vary between graphics card
vendors.
Most of the tests on the GPU bots are run via the [Telemetry framework].
Telemetry was originally conceived as a performance testing framework, but has
proven valuable for correctness testing as well. Telemetry directs the browser
to perform various operations, like page navigation and test execution, from
external scripts written in Python. The GPU bots launch the full Chromium
browser via Telemetry for the majority of the tests. Using the full browser to
execute tests, rather than smaller test harnesses, has yielded several
advantages: testing what is shipped, improved reliability, and improved
performance.
[Telemetry framework]: https://github.com/catapult-project/catapult/tree/master/telemetry
A subset of the tests, called "pixel tests", grab screen snapshots of the web
page in order to validate Chromium's rendering architecture end-to-end. Where
necessary, GPU-specific results are maintained for these tests. Some of these
tests verify just a few pixels, using handwritten code, in order to use the
same validation for all brands of GPUs.
The GPU bots use the Chrome infrastructure team's [recipe framework], and
specifically the [`chromium`][recipes/chromium] and
[`chromium_trybot`][recipes/chromium_trybot] recipes, to describe what tests to
execute. Compared to the legacy master-side buildbot scripts, recipes make it
easy to add new steps to the bots, change the bots' configuration, and run the
tests locally in the same way that they are run on the bots. Additionally, the
`chromium` and `chromium_trybot` recipes make it possible to send try jobs which
add new steps to the bots. This single capability is a huge step forward from
the previous configuration where new steps were added blindly, and could cause
failures on the tryservers. For more details about the configuration of the
bots, see the [GPU bot details].
[recipe framework]: https://chromium.googlesource.com/external/github.com/luci/recipes-py/+/master/doc/user_guide.md
[recipes/chromium]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium.py
[recipes/chromium_trybot]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium_trybot.py
[GPU bot details]: gpu_testing_bot_details.md
The physical hardware for the GPU bots lives in the Swarming pool\*. The
Swarming infrastructure ([new docs][new-testing-infra], [older but currently
more complete docs][isolated-testing-infra]) provides many benefits:
* Increased parallelism for the tests; all steps for a given tryjob or
waterfall build run in parallel.
* Simpler scaling: just add more hardware in order to get more capacity. No
manual configuration or distribution of hardware needed.
* Easier to run certain tests only on certain operating systems or types of
GPUs.
* Easier to add new operating systems or types of GPUs.
* Clearer description of the binary and data dependencies of the tests. If
they run successfully locally, they'll run successfully on the bots.
(\* All but a few one-off GPU bots are in the swarming pool. The exceptions to
the rule are described in the [GPU bot details].)
The bots on the [chromium.gpu.fyi] waterfall are configured to always test
top-of-tree ANGLE. This setup is done with a few lines of code in the
[tools/build workspace]; search the code for "angle".
These aspects of the bots are described in more detail below, and in linked
pages. There is a [presentation][bots-presentation] which gives a brief
overview of this documentation and links back to various portions.
<!-- XXX: broken link -->
[new-testing-infra]: https://github.com/luci/luci-py/wiki
[isolated-testing-infra]: https://www.chromium.org/developers/testing/isolated-testing/infrastructure
[chromium.gpu]: https://build.chromium.org/p/chromium.gpu/console
[chromium.gpu.fyi]: https://build.chromium.org/p/chromium.gpu.fyi/console
[tools/build workspace]: https://code.google.com/p/chromium/codesearch#chromium/build/scripts/slave/recipe_modules/chromium_tests/chromium_gpu_fyi.py
[bots-presentation]: https://docs.google.com/presentation/d/1BC6T7pndSqPFnituR7ceG7fMY7WaGqYHhx5i9ECa8EI/edit?usp=sharing
## Fleet Status
Please see the [GPU Pixel Wrangling instructions] for links to dashboards
showing the status of various bots in the GPU fleet.
[GPU Pixel Wrangling instructions]: pixel_wrangling.md#Fleet-Status
## Using the GPU Bots
Most Chromium developers interact with the GPU bots in two ways:
1. Observing the bots on the waterfalls.
2. Sending try jobs to them.
The GPU bots are grouped on the [chromium.gpu] and [chromium.gpu.fyi]
waterfalls. Their current status can be easily observed there.
To send try jobs, you must first upload your CL to the codereview server. Then,
either clicking the "CQ dry run" link or running from the command line:
```sh
git cl try
```
Sends your job to the default set of try servers.
The GPU tests are part of the default set for Chromium CLs, and are run as part
of the following tryservers' jobs:
* [linux_chromium_rel_ng] on the [tryserver.chromium.linux] waterfall
* [mac_chromium_rel_ng] on the [tryserver.chromium.mac] waterfall
* [win_chromium_rel_ng] on the [tryserver.chromium.win] waterfall
[linux_chromium_rel_ng]: http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_rel_ng?numbuilds=100
[mac_chromium_rel_ng]: http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng?numbuilds=100
[win_chromium_rel_ng]: http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng?numbuilds=100
[tryserver.chromium.linux]: http://build.chromium.org/p/tryserver.chromium.linux/waterfall?numbuilds=100
[tryserver.chromium.mac]: http://build.chromium.org/p/tryserver.chromium.mac/waterfall?numbuilds=100
[tryserver.chromium.win]: http://build.chromium.org/p/tryserver.chromium.win/waterfall?numbuilds=100
Scan down through the steps looking for the text "GPU"; that identifies those
tests run on the GPU bots. For each test the "trigger" step can be ignored; the
step further down for the test of the same name contains the results.
It's usually not necessary to explicitly send try jobs just for verifying GPU
tests. If you want to, you must invoke "git cl try" separately for each
tryserver master you want to reference, for example:
```sh
git cl try -b linux_chromium_rel_ng
git cl try -b mac_chromium_rel_ng
git cl try -b win_chromium_rel_ng
```
Alternatively, the Gerrit UI can be used to send a patch set to these try
servers.
Three optional tryservers are also available which run additional tests. As of
this writing, they run longer tests that can't be run against all Chromium CLs
due to lack of hardware capacity. They are automatically included as tryservers
for code changes to certain sub-directories. An example of triggering them
manually is shown after the list below.
* [linux_optional_gpu_tests_rel] on the [tryserver.chromium.linux] waterfall
* [mac_optional_gpu_tests_rel] on the [tryserver.chromium.mac] waterfall
* [win_optional_gpu_tests_rel] on the [tryserver.chromium.win] waterfall
[linux_optional_gpu_tests_rel]: https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_optional_gpu_tests_rel?numbuilds=200
[mac_optional_gpu_tests_rel]: https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_optional_gpu_tests_rel?numbuilds=200
[win_optional_gpu_tests_rel]: https://build.chromium.org/p/tryserver.chromium.win/builders/win_optional_gpu_tests_rel?numbuilds=200
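To trigger one of these optional tryservers explicitly, the same `git cl try -b`
mechanism shown above applies; for example:
```sh
git cl try -b linux_optional_gpu_tests_rel
git cl try -b mac_optional_gpu_tests_rel
git cl try -b win_optional_gpu_tests_rel
```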
Tryservers for the [ANGLE project] are also present on the
[tryserver.chromium.angle] waterfall. These are invoked from the Gerrit user
interface. They are configured similarly to the tryservers for regular Chromium
patches, and run the same tests that are run on the [chromium.gpu.fyi]
waterfall, in the same way (e.g., against ToT ANGLE).
If you find it necessary to try patches against other sub-repositories than
Chromium (`src/`) and ANGLE (`src/third_party/angle/`), please
[file a bug](http://crbug.com/new) with component Internals\>GPU\>Testing.
[ANGLE project]: https://chromium.googlesource.com/angle/angle/+/master/README.md
[tryserver.chromium.angle]: https://build.chromium.org/p/tryserver.chromium.angle/waterfall
[file a bug]: http://crbug.com/new
## Running the GPU Tests Locally
All of the GPU tests running on the bots can be run locally from a Chromium
build. Many of the tests are simple executables:
* `angle_unittests`
* `content_gl_tests`
* `gl_tests`
* `gl_unittests`
* `tab_capture_end2end_tests`
Some run only on the chromium.gpu.fyi waterfall, either because there isn't
enough machine capacity at the moment, or because they're closed-source tests
which aren't allowed to run on the regular Chromium waterfalls:
* `angle_deqp_gles2_tests`
* `angle_deqp_gles3_tests`
* `angle_end2end_tests`
* `audio_unittests`
The remaining GPU tests are run via Telemetry. In order to run them, just
build the `chrome` target and then
invoke `src/content/test/gpu/run_gpu_integration_test.py` with the appropriate
argument. The tests this script can invoke are
in `src/content/test/gpu/gpu_tests/`. For example:
* `run_gpu_integration_test.py context_lost --browser=release`
* `run_gpu_integration_test.py pixel --browser=release`
* `run_gpu_integration_test.py webgl_conformance --browser=release --webgl-conformance-version=1.0.2`
* `run_gpu_integration_test.py maps --browser=release`
* `run_gpu_integration_test.py screenshot_sync --browser=release`
* `run_gpu_integration_test.py trace_test --browser=release`
**Note:** If you are on Linux and see this test harness exit immediately with
`**Non zero exit code**`, it's probably because of some incompatible Python
packages being installed. Please uninstall the `python-egenix-mxdatetime` and
`python-logilab-common` packages in this case; see
[Issue 716241](http://crbug.com/716241).
You can also run a subset of tests with this harness:
* `run_gpu_integration_test.py webgl_conformance --browser=release
--test-filter=conformance_attribs`
Figuring out the exact command line that was used to invoke the test on the
bots can be a little tricky. The bots all\* run their tests via Swarming and
isolates, meaning that the invocation of a step like `[trigger]
webgl_conformance_tests on NVIDIA GPU...` will look like:
* `python -u
'E:\b\build\slave\Win7_Release__NVIDIA_\build\src\tools\swarming_client\swarming.py'
trigger --swarming https://chromium-swarm.appspot.com
--isolate-server https://isolateserver.appspot.com
--priority 25 --shards 1 --task-name 'webgl_conformance_tests on NVIDIA GPU...'`
You can figure out the additional command line arguments that were passed to
each test on the bots by examining the trigger step and searching for the
argument separator (<code> -- </code>). For a recent invocation of
`webgl_conformance_tests`, this looked like:
* `webgl_conformance --show-stdout '--browser=release' -v
'--extra-browser-args=--enable-logging=stderr --js-flags=--expose-gc'
'--isolated-script-test-output=${ISOLATED_OUTDIR}/output.json'`
You can leave off the `--isolated-script-test-output` argument, so this would
leave a full command line of:
* `run_gpu_integration_test.py
webgl_conformance --show-stdout '--browser=release' -v
'--extra-browser-args=--enable-logging=stderr --js-flags=--expose-gc'`
The Maps test requires you to authenticate to cloud storage in order to access
the Web Page Replay archive containing the test. See [Cloud Storage Credentials]
for documentation on setting this up.
[Cloud Storage Credentials]: gpu_testing_bot_details.md#Cloud-storage-credentials
Pixel tests use reference images from cloud storage. The bots pass the
`--upload-refimg-to-cloud-storage` argument, but to run locally you need to
pass the `--download-refimg-from-cloud-storage` argument instead, as well as
the other arguments the bot uses, like `--refimg-cloud-storage-bucket` and
`--os-type`. A sample desktop command line is shown after the Android example
below.
Sample command line for Android:
* `run_gpu_integration_test.py pixel --show-stdout --browser=android-chromium
-v --passthrough --extra-browser-args='--enable-logging=stderr
--js-flags=--expose-gc' --refimg-cloud-storage-bucket
chromium-gpu-archive/reference-images --os-type android
--download-refimg-from-cloud-storage`
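A corresponding sample command line for a desktop release build might look like
the following (a sketch; adjust `--os-type` to `win`, `mac`, or `linux` to match
your platform):
```sh
run_gpu_integration_test.py pixel --show-stdout --browser=release -v \
    --extra-browser-args='--enable-logging=stderr --js-flags=--expose-gc' \
    --refimg-cloud-storage-bucket chromium-gpu-archive/reference-images \
    --os-type linux --download-refimg-from-cloud-storage
```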
<!-- XXX: update this section; these isolates don't exist anymore -->
You can find the isolates for the various tests in
[src/chrome/](http://src.chromium.org/viewvc/chrome/trunk/src/chrome/):
* [angle_unittests.isolate](https://chromium.googlesource.com/chromium/src/+/master/chrome/angle_unittests.isolate)
* [content_gl_tests.isolate](https://chromium.googlesource.com/chromium/src/+/master/content/content_gl_tests.isolate)
* [gl_tests.isolate](https://chromium.googlesource.com/chromium/src/+/master/chrome/gl_tests.isolate)
* [gles2_conform_test.isolate](https://chromium.googlesource.com/chromium/src/+/master/chrome/gles2_conform_test.isolate)
* [tab_capture_end2end_tests.isolate](https://chromium.googlesource.com/chromium/src/+/master/chrome/tab_capture_end2end_tests.isolate)
* [telemetry_gpu_test.isolate](https://chromium.googlesource.com/chromium/src/+/master/chrome/telemetry_gpu_test.isolate)
The isolates contain the full or partial command line for invoking the target.
The complete command line for any test can be deduced from the contents of the
isolate plus the stdio output from the test's run on the bot.
Note that for the GN build, the isolates are simply described by build targets,
and [gn_isolate_map.pyl] describes the mapping between isolate name and build
target, as well as the command line used to invoke the isolate. Once all
platforms have switched to GN, the .isolate files will be obsolete and be
removed.
(\* A few of the one-off GPU configurations on the chromium.gpu.fyi waterfall
run their tests locally rather than via swarming, in order to decrease the
number of physical machines needed.)
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
## Running Binaries from the Bots Locally
Any binary run remotely on a bot can also be run locally, assuming the local
machine loosely matches the architecture and OS of the bot.
The easiest way to do this is to find the ID of the swarming task and use
"swarming.py reproduce" to re-run it:
* `./src/tools/swarming_client/swarming.py reproduce -S https://chromium-swarm.appspot.com [task ID]`
The task ID can be found in the stdio for the "trigger" step for the test. For
example, look at a recent build from the [Mac Release (Intel)] bot, and
look at the `gl_unittests` step. You will see something like:
[Mac Release (Intel)]: https://ci.chromium.org/buildbot/chromium.gpu/Mac%20Release%20%28Intel%29/
```
Triggered task: gl_unittests on Intel GPU on Mac/Mac-10.12.6/[TRUNCATED_ISOLATE_HASH]/Mac Release (Intel)/83664
To collect results, use:
swarming.py collect -S https://chromium-swarm.appspot.com --json /var/folders/[PATH_TO_TEMP_FILE].json
Or visit:
https://chromium-swarm.appspot.com/user/task/[TASK_ID]
```
There is a difference between the isolate's hash and Swarming's task ID. Make
sure you use the task ID and not the isolate's hash.
As of this writing, there seems to be a
[bug](https://github.com/luci/luci-py/issues/250)
when attempting to re-run the Telemetry based GPU tests in this way. For the
time being, this can be worked around by instead downloading the contents of
the isolate. To do so, look more deeply into the trigger step's log:
* <code>python -u
/b/build/slave/Mac_10_10_Release__Intel_/build/src/tools/swarming_client/swarming.py
trigger [...more args...] --tag data:[ISOLATE_HASH] [...more args...]
[ISOLATE_HASH] -- **[...TEST_ARGS...]**</code>
As of this writing, the isolate hash appears twice in the command line. To
download the isolate's contents into directory `foo` (note, this is in the
"Help" section associated with the page for the isolate's task, but I'm not
sure whether that's accessible only to Google employees or all members of the
chromium.org organization):
* `python isolateserver.py download -I https://isolateserver.appspot.com
--namespace default-gzip -s [ISOLATE_HASH] --target foo`
`isolateserver.py` will tell you the approximate command line to use. You
should concatenate the `TEST_ARGS` highlighted above with
`isolateserver.py`'s recommendation. The `ISOLATED_OUTDIR` variable can be
safely replaced with `/tmp`.
Note that `isolateserver.py` downloads a large number of files (everything
needed to run the test) and may take a while. There is a way to use
`run_isolated.py` to achieve the same result, but as of this writing, there
were problems doing so, so this procedure is not documented at this time.
Before attempting to download an isolate, you must ensure you have permission
to access the isolate server. Full instructions can be [found
here][isolate-server-credentials]. For most cases, you can simply run:
* `./src/tools/swarming_client/auth.py login
--service=https://isolateserver.appspot.com`
The above link requires that you log in with your @google.com credentials. It's
not known at the present time whether this works with @chromium.org accounts.
Email kbr@ if you try this and find it doesn't work.
[isolate-server-credentials]: gpu_testing_bot_details.md#Isolate-server-credentials
## Running Locally Built Binaries on the GPU Bots
See the [Swarming documentation] for instructions on how to upload your binaries to the isolate server and trigger execution on Swarming.
[Swarming documentation]: https://www.chromium.org/developers/testing/isolated-testing/for-swes#TOC-Run-a-test-built-locally-on-Swarming
## Adding New Tests to the GPU Bots
The goal of the GPU bots is to avoid regressions in Chrome's rendering stack.
To that end, let's add as many tests as possible that will help catch
regressions in the product. If you see a crazy bug in Chrome's rendering which
would be easy to catch with a pixel test running in Chrome and hard to catch in
any of the other test harnesses, please, invest the time to add a test!
There are a couple of different ways to add new tests to the bots:
1. Adding a new test to one of the existing harnesses.
2. Adding an entire new test step to the bots.
### Adding a new test to one of the existing test harnesses
Adding new tests to the GTest-based harnesses is straightforward and
essentially requires no explanation.
As of this writing it isn't as easy as desired to add a new test to one of the
Telemetry based harnesses. See [Issue 352807](http://crbug.com/352807). Let's
collectively work to address that issue. It would be great to reduce the number
of steps on the GPU bots, or at least to avoid significantly increasing the
number of steps on the bots. The WebGL conformance tests should probably remain
a separate step, but some of the smaller Telemetry based tests
(`context_lost_tests`, `memory_test`, etc.) should probably be combined into a
single step.
If you are adding a new test to one of the existing tests (e.g., `pixel_test`),
all you need to do is make sure that your new test runs correctly via isolates.
See the documentation from the GPU bot details on [adding new isolated
tests][new-isolates] for the `GYP_DEFINES` and authentication needed to upload
isolates to the isolate server. Most likely the new test will be Telemetry
based, and included in the `telemetry_gpu_test_run` isolate. You can then
invoke it via:
* `./src/tools/swarming_client/run_isolated.py -s [HASH]
-I https://isolateserver.appspot.com -- [TEST_NAME] [TEST_ARGUMENTS]`
[new-isolates]: gpu_testing_bot_details.md#Adding-a-new-isolated-test-to-the-bots
## Adding new steps to the GPU Bots
The tests that are run by the GPU bots are described by a couple of JSON files
in the Chromium workspace:
* [`chromium.gpu.json`](https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.json)
* [`chromium.gpu.fyi.json`](https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.fyi.json)
These files are autogenerated by the following script:
* [`generate_buildbot_json.py`](https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/generate_buildbot_json.py)
This script is completely self-contained and should hopefully be
self-explanatory. The JSON files are parsed by the chromium and chromium_trybot
recipes, and describe two types of tests:
* GTests: those which use the Googletest and Chromium's `base/test/launcher/`
frameworks.
* Telemetry based tests: those which are built on the Telemetry framework and
launch the entire browser.
A prerequisite of adding a new test to the bots is that the test [run via
isolates][new-isolates]. Once that is done, modify `generate_buildbot_json.py` to add the
test to the appropriate set of bots. Be careful when adding large new test
steps to all of the bots, because the GPU bots are a limited resource and do
not currently have the capacity to absorb large new test suites. It is safer to
get new tests running on the chromium.gpu.fyi waterfall first, and expand from
there to the chromium.gpu waterfall (which will also make them run against
every Chromium CL by virtue of the `linux_chromium_rel_ng`,
`mac_chromium_rel_ng` and `win_chromium_rel_ng` tryservers' mirroring of the
bots on this waterfall, so be careful!).
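Remember that `chromium.gpu.json` and `chromium.gpu.fyi.json` are generated
files; after editing `generate_buildbot_json.py`, regenerate them and commit
both together. A minimal sketch, assuming the script can be run from its own
directory with no arguments:
```sh
cd src/content/test/gpu
python generate_buildbot_json.py
# The regenerated files land in src/testing/buildbot/.
git add generate_buildbot_json.py ../../../testing/buildbot/chromium.gpu*.json
```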
Tryjobs which add new test steps to the chromium.gpu.json file will run those
new steps during the tryjob, which helps ensure that the new test won't break
once it starts running on the waterfall.
Tryjobs which modify chromium.gpu.fyi.json can be sent to the
`win_optional_gpu_tests_rel`, `mac_optional_gpu_tests_rel` and
`linux_optional_gpu_tests_rel` tryservers to help ensure that they won't
break the FYI bots.
## Updating and Adding New Pixel Tests to the GPU Bots
Adding new pixel tests which require reference images is a slightly more
complex process than adding other kinds of tests which can validate their own
correctness. There are a few reasons for this.
* Reference image based pixel tests require different golden images for
different combinations of operating system, GPU, driver version, OS
version, and occasionally other variables.
* The reference images must be generated by the main waterfall. The try
servers are not allowed to produce new reference images, only consume them.
The reason for this is that a patch sent to the try servers might cause an
incorrect reference image to be generated. For this reason, the main
waterfall bots upload reference images to cloud storage, and the try
servers download them and verify their results against them.
* The try servers will fail if they run a pixel test requiring a reference
image that doesn't exist in cloud storage. This is deliberate, but needs
more thought; see [Issue 349262](http://crbug.com/349262).
If a reference image based pixel test's result is going to change because of a
change in ANGLE or Blink (for example), updating the reference images is a
slightly tricky process. Here's how to do it:
* Mark the pixel test as failing in the [pixel tests]' [test expectations]
* Commit the change to ANGLE, Blink, etc. which will change the test's
results
* Note that without the failure expectation, this commit would turn some bots
red; a Blink change will turn the GPU bots on the chromium.webkit waterfall
red, and an ANGLE change will turn the chromium.gpu.fyi bots red
* Wait for Blink/ANGLE/etc. to roll
* Commit a change incrementing the revision number associated with the test
in the [test pages]
* Commit a second change removing the failure expectation, once all of the
bots on the main waterfall have generated new reference images. This change
should go through the commit queue cleanly.
[pixel tests]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/pixel_test_pages.py
[test expectations]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/pixel_expectations.py
[test pages]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/pixel_test_pages.py
When adding a brand new pixel test that uses a reference image, the steps are
similar, but simpler:
* Mark the test as failing in the same commit which introduces the new test
* Wait for the reference images to be produced by all of the GPU bots on the
waterfalls (see [chromium-gpu-archive/reference-images])
* Commit a change un-marking the test as failing
When making a Chromium-side change which changes the pixel tests' results:
* In your CL, both mark the pixel test as failing in the pixel test's test
expectations and increment the test's version number in the page set (see
above)
* After your CL lands, land another CL removing the failure expectations. If
this second CL goes through the commit queue cleanly, you know reference
images were generated properly.
In general, when adding a new pixel test, it's better to spot check a few
pixels in the rendered image rather than using a reference image per platform.
The [GPU rasterization test] is a good example of a recently added test which
performs such spot checks.
[chromium-gpu-archive/reference-images]: https://console.developers.google.com/storage/chromium-gpu-archive/reference-images
<!-- XXX: old link -->
[GPU rasterization test]: http://src.chromium.org/viewvc/chrome/trunk/src/content/test/gpu/gpu_tests/gpu_rasterization.py
## Stamping out Flakiness
It's critically important to aggressively investigate and eliminate the root
cause of any flakiness seen on the GPU bots. The bots have been known to run
reliably for days at a time, and any flaky failures that are tolerated on the
bots translate directly into instability of the browser experienced by
customers. Critical bugs in subsystems like WebGL, affecting high-profile
products like Google Maps, have escaped notice in the past because the bots
were unreliable. After much re-work, the GPU bots are now among the most
reliable automated test machines in the Chromium project. Let's keep them that
way.
Flakiness affecting the GPU tests can come in from highly unexpected sources.
Here are some examples:
* Intermittent pixel_test failures on Linux where the captured pixels were
black, caused by the Display Power Management System (DPMS) kicking in.
Disabled the X server's built-in screen saver on the GPU bots in response.
* GNOME dbus-related deadlocks causing intermittent timeouts ([Issue
309093](http://crbug.com/309093) and related bugs).
* Windows Audio system changes causing intermittent assertion failures in the
browser ([Issue 310838](http://crbug.com/310838)).
* Enabling assertion failures in the C++ standard library on Linux causing
random assertion failures ([Issue 328249](http://crbug.com/328249)).
* V8 bugs causing random crashes of the Maps pixel test (V8 issues
[3022](https://code.google.com/p/v8/issues/detail?id=3022),
[3174](https://code.google.com/p/v8/issues/detail?id=3174)).
* TLS changes causing random browser process crashes ([Issue
264406](http://crbug.com/264406)).
* Isolated test execution flakiness caused by failures to reliably clean up
temporary directories ([Issue 340415](http://crbug.com/340415)).
* The Telemetry-based WebGL conformance suite caught a bug in the memory
allocator on Android not caught by any other bot ([Issue
347919](http://crbug.com/347919)).
* context_lost test failures caused by the compositor's retry logic ([Issue
356453](http://crbug.com/356453)).
* Multiple bugs in Chromium's support for lost contexts causing flakiness of
the context_lost tests ([Issue 365904](http://crbug.com/365904)).
* Maps test timeouts caused by Content Security Policy changes in Blink
([Issue 395914](http://crbug.com/395914)).
* Weak pointer assertion failures in various webgl\_conformance\_tests caused
by changes to the media pipeline ([Issue 399417](http://crbug.com/399417)).
* A change to a default WebSocket timeout in Telemetry causing intermittent
failures to run all WebGL conformance tests on the Mac bots ([Issue
403981](http://crbug.com/403981)).
* Chrome leaking suspended sub-processes on Windows, apparently a preexisting
race condition that suddenly showed up ([Issue
424024](http://crbug.com/424024)).
* Changes to Chrome's cross-context synchronization primitives causing the
wrong tiles to be rendered ([Issue 584381](http://crbug.com/584381)).
* A bug in V8's handling of array literals causing flaky failures of
texture-related WebGL 2.0 tests ([Issue 606021](http://crbug.com/606021)).
* Assertion failures in sync point management related to lost contexts that
exposed a real correctness bug ([Issue 606112](http://crbug.com/606112)).
* A bug in glibc's `sem_post`/`sem_wait` primitives breaking V8's parallel
garbage collection ([Issue 609249](http://crbug.com/609249)).
If you notice flaky test failures either on the GPU waterfalls or try servers,
please file bugs right away with the component Internals>GPU>Testing and
include links to the failing builds and copies of the logs, since the logs
expire after a few days. [GPU pixel wranglers] should give the highest priority
to eliminating flakiness on the tree.
[GPU pixel wranglers]: pixel_wrangling.md

@@ -0,0 +1,539 @@
# GPU Bot Details
This page describes in detail how the GPU bots are set up, which files affect
their configuration, and how to both modify their behavior and add new bots.
[TOC]
## Overview of the GPU bots' setup
Chromium's GPU bots, compared to the majority of the project's test machines,
are physical pieces of hardware. When end users run the Chrome browser, they
are almost surely running it on a physical piece of hardware with a real
graphics processor. There are some portions of the code base which simply
cannot be exercised by running the browser in a virtual machine, or on a software
implementation of the underlying graphics libraries. The GPU bots were
developed and deployed in order to cover these code paths, and avoid
regressions that are otherwise inevitable in a project the size of the Chromium
browser.
The GPU bots are utilized on the [chromium.gpu] and [chromium.gpu.fyi]
waterfalls, and various tryservers, as described in [Using the GPU Bots].
[chromium.gpu]: https://build.chromium.org/p/chromium.gpu/console
[chromium.gpu.fyi]: https://build.chromium.org/p/chromium.gpu.fyi/console
[Using the GPU Bots]: gpu_testing.md#Using-the-GPU-Bots
The vast majority of the hardware for the bots lives in the Chrome-GPU Swarming
pool. The waterfall bots are simply virtual machines which spawn Swarming tasks
with the appropriate tags to get them to run on the desired GPU and operating
system type. So, for example, the [Win10 Release (NVIDIA)] bot is actually a
virtual machine which spawns all of its jobs with the Swarming parameters:
[Win10 Release (NVIDIA)]: https://ci.chromium.org/buildbot/chromium.gpu/Win10%20Release%20%28NVIDIA%29/?limit=200
```json
{
"gpu": "10de:1cb3-23.21.13.8816",
"os": "Windows-10",
"pool": "Chrome-GPU"
}
```
Since the GPUs in the Swarming pool are mostly homogeneous, this is sufficient
to target the pool of Windows 10-like NVIDIA machines. (There are a few Windows
7-like NVIDIA bots in the pool, which necessitates the OS specifier.)
Details about the bots can be found on [chromium-swarm.appspot.com] and by
using `src/tools/swarming_client/swarming.py`, for example `swarming.py bots`.
If you are authenticated with @google.com credentials you will be able to make
queries of the bots and see, for example, which GPUs are available.
[chromium-swarm.appspot.com]: https://chromium-swarm.appspot.com/
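For example, to list the machines matching the dimensions above (a sketch; it
assumes `swarming.py bots` accepts `--dimension` filters and requires
appropriate credentials):
```sh
./src/tools/swarming_client/swarming.py bots \
    -S https://chromium-swarm.appspot.com \
    --dimension pool Chrome-GPU \
    --dimension os Windows-10 \
    --dimension gpu 10de:1cb3-23.21.13.8816
```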
The waterfall bots run tests on a single GPU type in order to make it easier to
see regressions or flakiness that affect only a certain type of GPU.
The tryservers like `win_chromium_rel_ng` which include GPU tests, on the other
hand, run tests on more than one GPU type. As of this writing, the Windows
tryservers ran tests on NVIDIA and AMD GPUs; the Mac tryservers ran tests on
Intel and NVIDIA GPUs. The way these tryservers' tests are specified is simply
by *mirroring* how one or more waterfall bots work. This is an inherent
property of the [`chromium_trybot` recipe][chromium_trybot.py], which was designed to eliminate
differences in behavior between the tryservers and waterfall bots. Since the
tryservers mirror waterfall bots, if the waterfall bot is working, the
tryserver must almost inherently be working as well.
[chromium_trybot.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium_trybot.py
There are a few one-off GPU configurations on the waterfall where the tests are
run locally on physical hardware, rather than via Swarming. A few examples are:
<!-- XXX: update this list -->
* [Mac Pro Release (AMD)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Mac%20Pro%20Release%20%28AMD%29/)
* [Mac Pro Debug (AMD)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Mac%20Pro%20Debug%20%28AMD%29/)
* [Linux Release (Intel HD 630)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Linux%20Release%20%28Intel%20HD%20630%29/)
* [Linux Release (AMD R7 240)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Linux%20Release%20%28AMD%20R7%20240%29/)
There are a couple of reasons to continue to support running tests on a
specific machine: it might be too expensive to deploy the required multiple
copies of said hardware, or the configuration might not be reliable enough to
begin scaling it up.
## Adding a new isolated test to the bots
Adding a new test step to the bots requires that the test run via an isolate.
Isolates describe both the binary and data dependencies of an executable, and
are the underpinning of how the Swarming system works. See the [LUCI wiki] for
background on Isolates and Swarming.
<!-- XXX: broken link -->
[LUCI wiki]: https://github.com/luci/luci-py/wiki
### Adding a new isolate
1. Define your target using the `template("test")` template in
[`src/testing/test.gni`][testing/test.gni]. See `test("gl_tests")` in
[`src/gpu/BUILD.gn`][gpu/BUILD.gn] for an example. For a more complex
example which invokes a series of scripts which finally launches the
browser, see [`src/chrome/telemetry_gpu_test.isolate`][telemetry_gpu_test.isolate].
2. Add an entry to [`src/testing/buildbot/gn_isolate_map.pyl`][gn_isolate_map.pyl] that refers to
your target. Find a similar target to yours in order to determine the
`type`. The type is referenced in [`src/tools/mb/mb_config.pyl`][mb_config.pyl].
[testing/test.gni]: https://chromium.googlesource.com/chromium/src/+/master/testing/test.gni
[gpu/BUILD.gn]: https://chromium.googlesource.com/chromium/src/+/master/gpu/BUILD.gn
<!-- XXX: broken link -->
[telemetry_gpu_test.isolate]: https://chromium.googlesource.com/chromium/src/+/master/chrome/telemetry_gpu_test.isolate
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
[mb_config.pyl]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb_config.pyl
At this point you can build and upload your isolate to the isolate server.
See [Isolated Testing for SWEs] for the most up-to-date instructions. These
instructions are a copy which show how to run an isolate that's been uploaded
to the isolate server on your local machine rather than on Swarming.
[Isolated Testing for SWEs]: https://www.chromium.org/developers/testing/isolated-testing/for-swes
If `cd`'d into `src/`:
1. `./tools/mb/mb.py isolate //out/Release [target name]`
* For example: `./tools/mb/mb.py isolate //out/Release angle_end2end_tests`
1. `python tools/swarming_client/isolate.py batcharchive -I https://isolateserver.appspot.com out/Release/[target name].isolated.gen.json`
* For example: `python tools/swarming_client/isolate.py batcharchive -I https://isolateserver.appspot.com out/Release/angle_end2end_tests.isolated.gen.json`
1. This will write a hash to stdout. You can run it via:
`python tools/swarming_client/run_isolated.py -I https://isolateserver.appspot.com -s [HASH] -- [any additional args for the isolate]`
See the section below on [isolate server credentials](#Isolate-server-credentials).
### Adding your new isolate to the tests that are run on the bots
See [Adding new steps to the GPU bots] for details on this process.
[Adding new steps to the GPU bots]: gpu_testing.md#Adding-new-steps-to-the-GPU-Bots
## Relevant files that control the operation of the GPU bots
In the [tools/build] workspace:
* [masters/master.chromium.gpu] and [masters/master.chromium.gpu.fyi]:
* builders.pyl in these two directories defines the bots that show up on
the waterfall. If you are adding a new bot, you need to add it to
builders.pyl and use go/bug-a-trooper to request a restart of either
master.chromium.gpu or master.chromium.gpu.fyi.
* Only changes under masters/ require a waterfall restart. All other
changes, for example to scripts/slave/ in this workspace or to the
Chromium workspace, do not require a master restart (and go live the
minute they are committed).
* `scripts/slave/recipe_modules/chromium_tests/`:
* <code>[chromium_gpu.py]</code> and
<code>[chromium_gpu_fyi.py]</code> define the following for
each builder and tester:
* How the workspace is checked out (e.g., this is where top-of-tree
ANGLE is specified)
* The build configuration (e.g., this is where 32-bit vs. 64-bit is
specified)
* Various gclient defines (like compiling in the hardware-accelerated
video codecs, and enabling compilation of certain tests, like the
dEQP tests, that can't be built on all of the Chromium builders)
* Note that the GN configuration of the bots is also controlled by
<code>[mb_config.pyl]</code> in the Chromium workspace; see below.
* <code>[trybots.py]</code> defines how try bots *mirror* one or more
waterfall bots.
* The concept of try bots mirroring waterfall bots ensures there are
no differences in behavior between the waterfall bots and the try
bots. This helps ensure that a CL will not pass the commit queue
and then break on the waterfall.
* This file defines the behavior of the following GPU-related try
bots:
* `linux_chromium_rel_ng`, `mac_chromium_rel_ng`, and
`win_chromium_rel_ng`, which run against every Chromium CL, and
which mirror the behavior of bots on the chromium.gpu
waterfall.
* The ANGLE try bots, which run against ANGLE CLs, and mirror the
behavior of the chromium.gpu.fyi waterfall (including using
top-of-tree ANGLE, and running additional tests not run by the
regular Chromium try bots)
* The optional GPU try servers `linux_optional_gpu_tests_rel`,
`mac_optional_gpu_tests_rel` and
`win_optional_gpu_tests_rel`, which are triggered manually and
run some tests which can't be run on the regular Chromium try
servers mainly due to lack of hardware capacity.
[tools/build]: https://chromium.googlesource.com/chromium/tools/build/
[masters/master.chromium.gpu]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.chromium.gpu/
[masters/master.chromium.gpu.fyi]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.chromium.gpu.fyi/
[chromium_gpu.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_gpu.py
[chromium_gpu_fyi.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_gpu_fyi.py
[trybots.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/trybots.py
In the [chromium/src] workspace:
* [src/testing/buildbot]:
* <code>[chromium.gpu.json]</code> and
<code>[chromium.gpu.fyi.json]</code> define which steps are run on
which bots. These files are autogenerated. Don't modify them directly!
* <code>[gn_isolate_map.pyl]</code> defines all of the isolates' behavior in the GN
build.
* [`src/tools/mb/mb_config.pyl`][mb_config.pyl]
* Defines the GN arguments for all of the bots.
* [`src/content/test/gpu/generate_buildbot_json.py`][generate_buildbot_json.py]
* The generator script for `chromium.gpu.json` and
`chromium.gpu.fyi.json`. It defines on which GPUs various tests run.
* It's completely self-contained and should hopefully be fairly
comprehensible.
* When modifying this script, don't forget to also run it, to regenerate
the JSON files.
* See [Adding new steps to the GPU bots] for more details.
[chromium/src]: https://chromium.googlesource.com/chromium/src/
[src/testing/buildbot]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot
[chromium.gpu.json]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.json
[chromium.gpu.fyi.json]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.fyi.json
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
[mb_config.pyl]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb_config.pyl
[generate_buildbot_json.py]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/generate_buildbot_json.py
In the [infradata/config] workspace (Google internal only, sorry):
* [configs/chromium-swarm/bots.cfg]
* Defines a `Chrome-GPU` Swarming pool which contains most of the
specialized hardware: as of this writing, the Windows and Linux NVIDIA
bots, the Windows AMD bots, and the MacBook Pros with NVIDIA and AMD
GPUs. New GPU hardware should be added to this pool.
[infradata/config]: https://chrome-internal.googlesource.com/infradata/config
[configs/chromium-swarm/bots.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/bots.cfg
## Walkthroughs of various maintenance scenarios
This section describes various common scenarios that might arise when
maintaining the GPU bots, and how they'd be addressed.
### How to add a new test or an entire new step to the bots
This is described in [Adding new tests to the GPU bots].
[Adding new tests to the GPU bots]: gpu_testing.md#Adding-New-Tests-to-the-GPU-Bots
### How to add a new bot
The first decision point when adding a new GPU bot is whether it is a one-off
piece of hardware, or one which is expected to be scaled up at some point. If
it's a one-off piece of hardware, it can be added to the chromium.gpu.fyi
waterfall as a non-swarmed test machine. If it's expected to be scaled up at
some point, the hardware should be added to the swarming pool. These two
scenarios are described in more detail below.
#### How to add a new, non-swarmed, physical bot to the chromium.gpu.fyi waterfall
1. Work with the Chrome Infrastructure Labs team to get the hardware deployed
so it can talk to the chromium.gpu.fyi master.
1. Create a CL in the build workspace which:
1. Add the new machine to
[`masters/master.chromium.gpu.fyi/builders.pyl`][master.chromium.gpu.fyi/builders.pyl].
1. Add the new machine to
[`scripts/slave/recipe_modules/chromium_tests/chromium_gpu_fyi.py`][chromium_gpu_fyi.py].
Set the `enable_swarming` property to `False`.
1. Retrain recipe expectations
(`scripts/slave/recipes.py --use-bootstrap test train`) and add the
newly created JSON file(s) corresponding to the new machines to your CL.
1. Create a CL in the Chromium workspace to:
1. Add the new machine to
[`src/content/test/gpu/generate_buildbot_json.py`][generate_buildbot_json.py].
Make sure to set the `swarming` property to `False`.
1. If the machine runs GN, add a description to
[`src/tools/mb/mb_config.pyl`][mb_config.pyl].
1. Once the build workspace CL lands, use go/bug-a-trooper (or contact kbr@)
to schedule a restart of the chromium.gpu.fyi waterfall. This is only
necessary when modifying files under the masters/ directory. A reboot of
the machine may be needed once the waterfall has been restarted in order to
make it connect properly.
1. The CLs from (2) and (3) can land in either order, though it is preferable
to land the Chromium-side CL first so that the machine knows what tests to
run the first time it boots up.
[master.chromium.gpu.fyi/builders.pyl]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.chromium.gpu.fyi/builders.pyl
#### How to add a new swarmed bot to the chromium.gpu.fyi waterfall
When deploying a new GPU configuration, it should be added to the
chromium.gpu.fyi waterfall first. The chromium.gpu waterfall should be reserved
for those GPUs which are tested on the commit queue. (Some of the bots violate
this rule, namely the Debug bots, though we should strive to eliminate these
differences.) Once the new configuration is ready to be fully deployed on
tryservers, bots can be added to the chromium.gpu waterfall, and the tryservers
changed to mirror them.
In order to add Release and Debug waterfall bots for a new configuration,
experience has shown that at least 4 physical machines are needed in the
swarming pool. The reason is that the tests all run in parallel on the Swarming
cluster, so the load induced on the swarming bots is higher than it would be
for a non-swarmed bot that executes its tests serially.
With these prerequisites, these are the steps to add a new swarmed bot.
(Actually, a pair of bots -- Release and Debug.)
1. Work with the Chrome Infrastructure Labs team to get the (minimum 4)
physical machines added to the Swarming pool. Use
[chromium-swarm.appspot.com] or `src/tools/swarming_client/swarming.py bots`
to determine the PCI IDs of the GPUs in the bots. (These instructions will
need to be updated for Android bots which don't have PCI buses.)
1. Make sure to add these new machines to the Chrome-GPU Swarming pool by
creating a CL against [`configs/chromium-swarm/bots.cfg`][bots.cfg] in
the [infradata/config] workspace.
1. File a Chrome Infrastructure Labs ticket requesting 2 virtual machines for
the testers. These need to match the OS of the physical machines and
builders because of limitations in the scripts which transfer builds from
the builder to the tester; see [this feature
request](http://crbug.com/581953). For example, if you're adding a "Windows
7 CoolNewGPUType" tester, you'll need 2 Windows VMs.
1. Once the VMs are ready, create a CL in the build workspace which:
1. Adds the new VMs as the Release and Debug bots in
[`master.chromium.gpu.fyi/builders.pyl`][master.chromium.gpu.fyi/builders.pyl].
1. Adds the new VMs to [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py]. Make
sure to set the `enable_swarming` and `serialize_tests` properties to
`True`. Double-check the `parent_buildername` property for each. It
must match the Release/Debug flavor of the builder.
    1. Retrains recipe expectations
       (`scripts/slave/recipes.py --use-bootstrap test train`) and adds the
       newly created JSON file(s) corresponding to the new machines to your CL.
1. Create a CL in the Chromium workspace which:
1. Adds the new machine to
`src/content/test/gpu/generate_buildbot_json.py`.
        1. The swarming dimensions are crucial. These must match the GPU and
           OS type of the physical hardware in the Swarming pool. This is what
           causes the VMs to spawn their tests on the correct hardware. Make
           sure to use the Chrome-GPU pool, and that the new machines were
           specifically added to that pool. (A sketch of such an entry appears
           after this list.)
1. Make sure to set the `swarming` property to `True` for both the
Release and Debug bots.
1. Make triply sure that there are no collisions between the new
hardware you're adding and hardware already in the Swarming pool.
For example, it used to be the case that all of the Windows NVIDIA
bots ran the same OS version. Later, the Windows 8 flavor bots were
added. In order to avoid accidentally running tests on Windows 8
when Windows 7 was intended, the OS in the swarming dimensions of
the Win7 bots had to be changed from `win` to
`Windows-2008ServerR2-SP1` (the Win7-like flavor running in our
data center). Similarly, the Win8 bots had to have a very precise
OS description (`Windows-2012ServerR2-SP0`).
1. If the machine runs GN, adds a description to
[`src/tools/mb/mb_config.pyl`][mb_config.pyl].
1. Once the tools/build CL lands, use go/bug-a-trooper (or contact kbr@) to
schedule a restart of the chromium.gpu.fyi waterfall. This is only
necessary when modifying files under the masters/ directory. A reboot of
the VMs may be needed once the waterfall has been restarted in order to
make them connect properly.
1. The CLs from (3) and (4) can land in either order, though it is preferable
to land the Chromium-side CL first so that the machine knows what tests to
run the first time it boots up.
[bots.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/bots.cfg
[infradata/config]: https://chrome-internal.googlesource.com/infradata/config/
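To make the dimension-matching concrete, here is a hedged sketch of the kind of
tester entry that ends up in `generate_buildbot_json.py`. The key names, PCI
ID, and OS string below are placeholders; take the real values from the
machines enrolled in the Chrome-GPU pool and from the existing entries in the
script:

```python
# Hypothetical tester entry in src/content/test/gpu/generate_buildbot_json.py.
'Win7 Release (CoolNewGPUType)': {
  'swarming': True,
  'swarming_dimensions': [
    {
      'gpu': 'dead:beef',                # vendor:device PCI ID of the new GPU
      'os': 'Windows-2008ServerR2-SP1',  # precise OS string, not just 'win'
      'pool': 'Chrome-GPU',              # machines explicitly added to this pool
    },
  ],
},
```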
#### How to start running tests on a new GPU type on an existing try bot
Let's say that you want to cause the `win_chromium_rel_ng` try bot to run tests
on CoolNewGPUType in addition to the types it currently runs (as of this
writing, NVIDIA and AMD). To do this:
1. Make sure there is enough hardware capacity. Unfortunately, tools to report
utilization of the Swarming pool are still being developed, but a
back-of-the-envelope estimate is that you will need a minimum of 30
machines in the Swarming pool to run the current set of GPU tests on the
tryservers. We estimate that 90 machines will be needed in order to
additionally run the WebGL 2.0 conformance tests. Plan for the larger
capacity, as it's desired to run the larger test suite on as many
configurations as possible.
2. Deploy Release and Debug testers on the chromium.gpu waterfall, following
the instructions for the chromium.gpu.fyi waterfall above. You will also
need to temporarily add suppressions to
[`tests/masters_recipes_test.py`][tests/masters_recipes_test.py] for these
new testers since they aren't yet covered by try bots and are going on a
non-FYI waterfall. Make sure these run green for a day or two before
proceeding.
3. Create a CL in the tools/build workspace, adding the new Release tester
   to `win_chromium_rel_ng`'s `bot_ids` list
   in `scripts/slave/recipe_modules/chromium_tests/trybots.py` (roughly as
   sketched below). Rerun
   `scripts/slave/recipes.py --use-bootstrap test train`.
4. Once the CL in (3) lands, the commit queue will **immediately** start
running tests on the CoolNewGPUType configuration. Be vigilant and make
sure that tryjobs are green. If they are red for any reason, revert the CL
and figure out offline what went wrong.
[tests/masters_recipes_test.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/tests/masters_recipes_test.py
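For step 3, the change amounts to appending one more mirrored tester to the try
bot's `bot_ids` list. A hedged sketch of the shape of that entry (the exact
structure of `trybots.py` evolves over time, so follow the existing entries in
the file):

```python
# Hypothetical entry appended to win_chromium_rel_ng's 'bot_ids' list in
# scripts/slave/recipe_modules/chromium_tests/trybots.py; field names follow
# the existing entries in that file.
{
  'mastername': 'chromium.gpu',
  'buildername': 'GPU Win Builder',
  'tester': 'Win7 Release (CoolNewGPUType)',
},
```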
#### How to add a new optional try bot
The "optional" GPU try bots are a concession to the reality that there are some
long-running GPU test suites that simply can not run against every Chromium CL.
They run some additional tests that are usually run only on the
chromium.gpu.fyi waterfall. Some of these tests, like the WebGL 2.0 conformance
suite, are intended to be run on the normal try bots once hardware capacity is
available. Some are not intended to ever run on the normal try bots.
The optional try bots are a little different because they mirror waterfall bots
that don't actually exist. The waterfall bots' specifications exist only to
tell the optional try bots which tests to run.
Let's say that you intend to add a new such optional try bot on Windows; call
it `win_new_optional_tests_rel`, for example. If you only wanted to add this
GPU type to the existing `win_optional_gpu_tests_rel` try bot, you'd simply
follow the instructions above
([How to start running tests on a new GPU type on an existing try bot](#How-to-start-running-tests-on-a-new-GPU-type-on-an-existing-try-bot)).
The steps below describe how to spin up an entire new optional try bot.
1. Make sure that you have some swarming capacity for the new GPU type. Since
it's not running against all Chromium CLs you don't need the recommended 30
minimum bots, though ~10 would be good.
1. Create a CL in the Chromium workspace:
1. Add your new bot (for example, "Optional Win7 Release
(CoolNewGPUType)") to the chromium.gpu.fyi waterfall in
[generate_buildbot_json.py]. (Note, this is a bad example: the
"optional" bots have special semantics in this script. You'd probably
want to define some new category of bot if you didn't intend to add
this to `win_optional_gpu_tests_rel`.)
1. Re-run the script to regenerate the JSON files.
1. Land the above CL.
1. Create a CL in the tools/build workspace:
1. Modify `masters/master.tryserver.chromium.win`'s [master.cfg] and
[slaves.cfg] to add the new tryserver. Follow the pattern for the
existing `win_optional_gpu_tests_rel` tryserver. Namely, add the new
entry to master.cfg, and add the new tryserver to the
`optional_builders` list in `slaves.cfg`.
1. Modify [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] to add the new
"Optional Win7 Release (CoolNewGPUType)" entry.
1. Modify [`trybots.py`][trybots.py] to add
the new `win_new_optional_tests_rel` try bot, mirroring "Optional
Win7 Release (CoolNewGPUType)".
1. Land the above CL and request an off-hours restart of the
tryserver.chromium.win waterfall.
1. Now you can send CLs to the new bot with:
`git cl try -m tryserver.chromium.win -b win_new_optional_tests_rel`
[master.cfg]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.tryserver.chromium.win/master.cfg
[slaves.cfg]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.tryserver.chromium.win/slaves.cfg
#### How to test and deploy a driver update
Let's say that you want to roll out an update to the graphics drivers on one of
the configurations like the Win7 NVIDIA bots. The responsible way to do this is
to run the new driver on one of the waterfalls for a day or two to make sure
the tests are running reliably green before rolling out the driver update
everywhere. To do this:
1. Work with the Chrome Infrastructure Labs team to deploy a single,
non-swarmed, physical machine on the chromium.gpu.fyi waterfall running the
new driver. The OS and GPU should exactly match the configuration you
intend to upgrade. See
[How to add a new, non-swarmed, physical bot to the chromium.gpu.fyi waterfall](#How-to-add-a-new_non-swarmed_physical-bot-to-the-chromium_gpu_fyi-waterfall).
2. Hopefully, the new machine will pass the pixel tests. If it doesn't, then
unfortunately, it'll be necessary to follow the instructions on
[updating the pixel tests] to temporarily suppress the failures on this
particular configuration. Keep the time window for these test suppressions
as narrow as possible.
3. Watch the new machine for a day or two to make sure it's stable.
4. When it is, ask the Chrome Infrastructure Labs team to roll out the driver
update across all of the similarly configured bots in the swarming pool.
5. If necessary, update pixel test expectations and remove the suppressions
added above.
6. Prepare and land a CL removing the temporary machine from the
chromium.gpu.fyi waterfall. Request a waterfall restart.
7. File a ticket with the Chrome Infrastructure Labs team to reclaim the
temporary machine.
Note that with recent improvements to Swarming, in particular [this
RFE](https://github.com/luci/luci-py/issues/253) and others, these steps are no
longer strictly necessary: it's possible to target Swarming jobs at a
particular driver version. If
[`generate_buildbot_json.py`][generate_buildbot_json.py] were improved to be
more specific about the driver version on the various bots, then the machines
with the new drivers could simply be added to the Swarming pool, and this
process could be a lot simpler. Patches welcome. :)
[updating the pixel tests]: https://www.chromium.org/developers/testing/gpu-testing/#TOC-Updating-and-Adding-New-Pixel-Tests-to-the-GPU-Bots
## Credentials for various servers
Working with the GPU bots requires credentials to various services: the isolate
server, the swarming server, and cloud storage.
### Isolate server credentials
To upload and download isolates you must first authenticate to the isolate
server. From a Chromium checkout, run:
* `./src/tools/swarming_client/auth.py login
--service=https://isolateserver.appspot.com`
This will open a web browser to complete the authentication flow. A @google.com
email address is required in order to properly authenticate.
To test your authentication, find a hash for a recent isolate. Consult the
instructions on [Running Binaries from the Bots Locally] to find a random hash
from a target like `gl_tests`. Then try downloading that isolate from the server.
[Running Binaries from the Bots Locally]: https://www.chromium.org/developers/testing/gpu-testing#TOC-Running-Binaries-from-the-Bots-Locally
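A minimal sketch of that download, assuming the standard `isolateserver.py
download` interface (`-I` selects the server, `-f` takes a file hash and a
destination path); check `isolateserver.py download --help` for the exact
flags:

```sh
# <hash-you-found-above> is the isolate hash located in the previous step.
./src/tools/swarming_client/isolateserver.py download \
    -I https://isolateserver.appspot.com \
    -f <hash-you-found-above> delete_me
```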
If authentication succeeded, this will silently download a file called
`delete_me` into the current working directory. If it failed, the script will
report multiple authentication errors. In this case, use the following command
to log out and then try again:
* `./src/tools/swarming_client/auth.py logout
--service=https://isolateserver.appspot.com`
### Swarming server credentials
The swarming server uses the same `auth.py` script as the isolate server. You
will need to authenticate if you want to manually download the results of
previous swarming jobs, trigger your own jobs, or run `swarming.py reproduce`
to re-run a remote job on your local workstation. Follow the instructions
above, replacing the service with `https://chromium-swarm.appspot.com`.
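For example, a sketch of authenticating and then re-running a remote task
locally; the task ID comes from the task's page on the Swarming server, and the
exact flags are per `swarming.py reproduce --help`:

```sh
# Log in to the swarming server (same flow as for the isolate server).
./src/tools/swarming_client/auth.py login \
    --service=https://chromium-swarm.appspot.com

# Re-run a remote task on your local workstation.
./src/tools/swarming_client/swarming.py reproduce \
    -S https://chromium-swarm.appspot.com <task-id>
```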
### Cloud storage credentials
Authentication to Google Cloud Storage is needed for a couple of reasons:
uploading pixel test results to the cloud, and potentially uploading and
downloading builds as well, at least in Debug mode. Use the copy of gsutil in
`depot_tools/third_party/gsutil/gsutil`, and follow the [Google Cloud Storage
instructions] to authenticate. You must use your @google.com email address and
be a member of the Chrome GPU team in order to receive read-write access to the
appropriate cloud storage buckets. Roughly:
1. Run `gsutil config`
2. Copy/paste the URL into your browser
3. Log in with your @google.com account
4. Allow the app to access the information it requests
5. Copy-paste the resulting key back into your Terminal
6. Press "enter" when prompted for a project-id (i.e., leave it empty)
At this point you should be able to write to the cloud storage bucket.
Navigate to
<https://console.developers.google.com/storage/chromium-gpu-archive> to view
the contents of the cloud storage bucket.
[Google Cloud Storage instructions]: https://developers.google.com/storage/docs/gsutil
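A quick way to sanity-check the credentials from the command line, using the
same copy of gsutil and the bucket mentioned above:

```sh
# Lists the pixel test reference images; requires the access described above.
depot_tools/third_party/gsutil/gsutil ls gs://chromium-gpu-archive/reference-images
```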

298
docs/gpu/pixel_wrangling.md Normal file

@ -0,0 +1,298 @@
# GPU Bots & Pixel Wrangling
![](images/wrangler.png)
(December 2017: presentation on GPU bots and pixel wrangling: see [slides].)
GPU Pixel Wrangling is the process of keeping various GPU bots green. On the
GPU bots, tests run on physical hardware with real GPUs, not in VMs like the
majority of the bots on the Chromium waterfall.
[slides]: https://docs.google.com/presentation/d/1sZjyNe2apUhwr5sinRfPs7eTzH-3zO0VQ-Cj-8DlEDQ/edit?usp=sharing
[TOC]
## Fleet Status
The following links (sorry, Google employees only) show the status of various
GPU bots in the fleet.
Primary configurations:
* [Windows 10 Quadro P400 Pool](http://shortn/_dmtaFfY2Jq)
* [Windows 10 Intel HD 630 Pool](http://shortn/_QsoGIGIFYd)
* [Linux Quadro P400 Pool](http://shortn/_fNgNs1uROQ)
* [Linux Intel HD 630 Pool](http://shortn/_dqEGjCGMHT)
* [Mac AMD Retina 10.12.6 GPU Pool](http://shortn/_BcrVmfRoSo)
* [Mac Mini Chrome Pool](http://shortn/_Ru8NESapPM)
* [Android Nexus 5X Chrome Pool](http://shortn/_G3j7AVmuNR)
Secondary configurations:
* [Windows 7 Quadro P400 Pool](http://shortn/_cuxSKC15UX)
* [Windows AMD R7 240 GPU Pool](http://shortn/_XET7RTMHQm)
* [Mac NVIDIA Retina 10.12.6 GPU Pool](http://shortn/_jQWG7W71Ek)
## GPU Bots' Waterfalls
The waterfalls work much like any other; see the [Tour of the Chromium Buildbot
Waterfall] for a more detailed explanation of how this is laid out. We have
more subtle configurations because the GPU matters, not just the OS and release
v. debug. Hence we have Windows Nvidia Release bots, Mac Intel Debug bots, and
so on. The waterfalls we're interested in are:
* [Chromium GPU]
* Various operating systems, configurations, GPUs, etc.
* [Chromium GPU FYI]
* These bots run less-standard configurations like Windows with AMD GPUs,
Linux with Intel GPUs, etc.
* These bots build with top of tree ANGLE rather than the `DEPS` version.
* The [ANGLE tryservers] help ensure that these bots stay green. However,
it is possible that due to ANGLE changes these bots may be red while
the chromium.gpu bots are green.
* The [ANGLE Wrangler] is on-call to help resolve ANGLE-related breakage
      on this waterfall.
    * To determine if a different ANGLE revision was used between two builds,
      compare the `got_angle_revision` buildbot property on the GPU builders
      or `parent_got_angle_revision` on the testers. This revision can be
      used to do a `git log` in the `third_party/angle` repository, as
      sketched below.
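For example, a sketch of turning those two revisions into a change log (the
revision placeholders come from the buildbot properties mentioned above):

```sh
# Run inside src/third_party/angle; substitute the two builds' revisions.
git log --oneline <angle-revision-of-last-good-build>..<angle-revision-of-first-bad-build>
```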
<!-- TODO(kainino): update link when the page is migrated -->
[Tour of the Chromium Buildbot Waterfall]: http://www.chromium.org/developers/testing/chromium-build-infrastructure/tour-of-the-chromium-buildbot
[Chromium GPU]: https://ci.chromium.org/p/chromium/g/chromium.gpu/console?reload=120
[Chromium GPU FYI]: https://ci.chromium.org/p/chromium/g/chromium.gpu.fyi/console?reload=120
[ANGLE tryservers]: https://build.chromium.org/p/tryserver.chromium.angle/waterfall
<!-- TODO(kainino): update link when the page is migrated -->
[ANGLE Wrangler]: https://sites.google.com/a/chromium.org/dev/developers/how-tos/angle-wrangling
## Test Suites
The bots run several test suites. The majority of them have been migrated to
the Telemetry harness, and are run within the full browser, in order to better
test the code that is actually shipped. As of this writing, the tests included:
* Tests using the Telemetry harness:
* The WebGL conformance tests: `webgl_conformance_integration_test.py`
* A Google Maps test: `maps_integration_test.py`
* Context loss tests: `context_lost_integration_test.py`
* Depth capture tests: `depth_capture_integration_test.py`
* GPU process launch tests: `gpu_process_integration_test.py`
* Hardware acceleration validation tests:
`hardware_accelerated_feature_integration_test.py`
* Pixel tests validating the end-to-end rendering pipeline:
`pixel_integration_test.py`
* Stress tests of the screenshot functionality other tests use:
`screenshot_sync_integration_test.py`
* `angle_unittests`: see `src/gpu/gpu.gyp`
* drawElements tests (on the chromium.gpu.fyi waterfall): see
`src/third_party/angle/src/tests/BUILD.gn`
* `gles2_conform_test` (requires internal sources): see
`src/gpu/gles2_conform_support/gles2_conform_test.gyp`
* `gl_tests`: see `src/gpu/BUILD.gn`
* `gl_unittests`: see `src/ui/gl/BUILD.gn`
And more. See `src/content/test/gpu/generate_buildbot_json.py` for the
complete description of bots and tests.
Additionally, the Release bots run:
* `tab_capture_end2end_tests`: see
`src/chrome/browser/extensions/api/tab_capture/tab_capture_apitest.cc` and
`src/chrome/browser/extensions/api/cast_streaming/cast_streaming_apitest.cc`
### More Details
More details about the bots' setup can be found on the [GPU Testing] page.
[GPU Testing]: https://sites.google.com/a/chromium.org/dev/developers/testing/gpu-testing
## Wrangling
### Prerequisites
1. Ideally a wrangler should be a Chromium committer. If you're on the GPU
pixel wrangling rotation, there will be an email notifying you of the upcoming
shift, and a calendar appointment.
* If you aren't a committer, don't panic. It's still best for everyone on
the team to become acquainted with the procedures of maintaining the
GPU bots.
* In this case you'll upload CLs to Gerrit to perform reverts (optionally
using the new "Revert" button in the UI), and might consider using
`TBR=` to speed through trivial and urgent CLs. In general, try to send
all CLs through the commit queue.
* Contact bajones, kainino, kbr, vmiura, zmo, or another member of the
Chrome GPU team who's already a committer for help landing patches or
reverts during your shift.
2. Apply for [access to the bots].
[access to the bots]: https://sites.google.com/a/google.com/chrome-infrastructure/golo/remote-access?pli=1
### How to Keep the Bots Green
1. Watch for redness on the tree.
1. [Sheriff-O-Matic now has support for the chromium.gpu.fyi waterfall]!
1. The chromium.gpu bots are covered under Sheriff-O-Matic's [Chromium
tab]. As pixel wrangler, ignore any non-GPU test failures in this tab.
1. The bots are expected to be green all the time. Flakiness on these bots
is neither expected nor acceptable.
1. If a bot goes consistently red, it's necessary to figure out whether a
recent CL caused it, or whether it's a problem with the bot or
infrastructure.
1. If it looks like a problem with the bot (deep problems like failing to
check out the sources, the isolate server failing, etc.) notify the
Chromium troopers and file a P1 bug with labels: Infra\>Labs,
Infra\>Troopers and Internals\>GPU\>Testing. See the general [tree
sheriffing page] for more details.
1. Otherwise, examine the builds just before and after the redness was
introduced. Look at the revisions in the builds before and after the
failure was introduced.
1. **File a bug** capturing the regression range and excerpts of any
associated logs. Regressions should be marked P1. CC engineers who you
think may be able to help triage the issue. Keep in mind that the logs
on the bots expire after a few days, so make sure to add copies of
relevant logs to the bug report.
1. Use the `Hotlist=PixelWrangler` label to mark bugs that require the
pixel wrangler's attention, so it's easy to find relevant bugs when
handing off shifts.
1. Study the regression range carefully. Use drover to revert any CLs
which break the chromium.gpu bots. Use your judgment about
chromium.gpu.fyi, since not all bots are covered by trybots. In the
revert message, provide a clear description of what broke, links to
failing builds, and excerpts of the failure logs, because the build
logs expire after a few days.
1. Make sure the bots are running jobs.
1. Keep an eye on the console views of the various bots.
1. Make sure the bots are all actively processing jobs. If they go offline
for a long period of time, the "summary bubble" at the top may still be
green, but the column in the console view will be gray.
1. Email the Chromium troopers if you find a bot that's not processing
jobs.
1. Make sure the GPU try servers are in good health.
1. The GPU try servers are no longer distinct bots on a separate
waterfall, but instead run as part of the regular tryjobs on the
Chromium waterfalls. The GPU tests run as part of the following
tryservers' jobs:
1. <code>[linux_chromium_rel_ng]</code> on the [luci.chromium.try]
waterfall
<!-- TODO(kainino): update link to luci.chromium.try -->
1. <code>[mac_chromium_rel_ng]</code> on the [tryserver.chromium.mac]
waterfall
<!-- TODO(kainino): update link to luci.chromium.try -->
1. <code>[win7_chromium_rel_ng]</code> on the [tryserver.chromium.win]
waterfall
1. The best tool to use to quickly find flakiness on the tryservers is the
new [Chromium Try Flakes] tool. Look for the names of GPU tests (like
maps_pixel_test) as well as the test machines (e.g.
mac_chromium_rel_ng). If you see a flaky test, file a bug like [this
one](http://crbug.com/444430). Also look for compile flakes that may
indicate that a bot needs to be clobbered. Contact the Chromium
sheriffs or troopers if so.
1. Glance at these trybots from time to time and see if any GPU tests are
failing frequently. **Note** that test failures are **expected** on
these bots: individuals' patches may fail to apply, fail to compile, or
break various tests. Look specifically for patterns in the failures. It
isn't necessary to spend a lot of time investigating each individual
failure. (Use the "Show: 200" link at the bottom of the page to see
more history.)
1. If the same set of tests are failing repeatedly, look at the individual
runs. Examine the swarming results and see whether they're all running
on the same machine. (This is the "Bot assigned to task" when clicking
any of the test's shards in the build logs.) If they are, something
might be wrong with the hardware. Use the [Swarming Server Stats] tool
to drill down into the specific builder.
1. If you see the same test failing in a flaky manner across multiple
machines and multiple CLs, it's crucial to investigate why it's
happening. [crbug.com/395914](http://crbug.com/395914) was one example
of an innocent-looking Blink change which made it through the commit
queue and introduced widespread flakiness in a range of GPU tests. The
failures were also most visible on the try servers as opposed to the
main waterfalls.
1. Check if any pixel test failures are actual failures or need to be
rebaselined.
1. For a given build failing the pixel tests, click the "stdio" link of
the "pixel" step.
1. The output will contain a link of the form
<http://chromium-browser-gpu-tests.commondatastorage.googleapis.com/view_test_results.html?242523_Linux_Release_Intel__telemetry>
1. Visit the link to see whether the generated or reference images look
incorrect.
1. All of the reference images for all of the bots are stored in cloud
storage under [chromium-gpu-archive/reference-images]. They are indexed
by version number, OS, GPU vendor, GPU device, and whether or not
antialiasing is enabled in that configuration. You can download the
reference images individually to examine them in detail.
1. Rebaseline pixel test reference images if necessary.
1. Follow the [instructions on the GPU testing page].
1. Alternatively, if absolutely necessary, you can use the [Chrome
Internal GPU Pixel Wrangling Instructions] to delete just the broken
reference images for a particular configuration.
1. Update Telemetry-based test expectations if necessary.
1. Most of the GPU tests are run inside a full Chromium browser, launched
by Telemetry, rather than a Gtest harness. The tests and their
expectations are contained in [src/content/test/gpu/gpu_tests/] . See
for example <code>[webgl_conformance_expectations.py]</code>,
<code>[gpu_process_expectations.py]</code> and
<code>[pixel_expectations.py]</code>.
    1. See the header of the file for a list of modifiers to specify a bot
       configuration. It is possible to specify OS (down to a specific
       version, say, Windows 7 or Mountain Lion), GPU vendor
       (NVIDIA/AMD/Intel), and a specific GPU device. (A sketch of an entry
       appears after this list.)
    1. The key is to maintain the highest coverage: if you have to disable a
       test, disable it only on the specific configurations on which it's
       failing. Note that it is not possible to discern between Debug and
       Release configurations.
1. Mark tests failing or skipped, which will suppress flaky failures, only
as a last resort. It is only really necessary to suppress failures that
are showing up on the GPU tryservers, since failing tests no longer
close the Chromium tree.
1. Please read the section on [stamping out flakiness] for motivation on
how important it is to eliminate flakiness rather than hiding it.
1. For the remaining Gtest-style tests, use the [`DISABLED_`
modifier][gtest-DISABLED] to suppress any failures if necessary.
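A hedged sketch of what an expectation entry in one of these files looks like;
the test name, modifiers, and bug number below are placeholders, so copy the
exact form from the neighboring entries in the file you're editing:

```python
# Fail this test only on Windows 7 machines with NVIDIA GPUs, and link the
# tracking bug. (Hypothetical test name and bug number.)
self.Fail('WebglConformance_conformance_misc_some_flaky_test',
          ['win7', 'nvidia'], bug=123456)
```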
[Sheriff-O-Matic now has support for the chromium.gpu.fyi waterfall]: https://sheriff-o-matic.appspot.com/chromium.gpu.fyi
[Chromium tab]: https://sheriff-o-matic.appspot.com/chromium
[tree sheriffing page]: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs
[linux_chromium_rel_ng]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_chromium_rel_ng
[luci.chromium.try]: https://ci.chromium.org/p/chromium/g/luci.chromium.try/builders
[mac_chromium_rel_ng]: https://ci.chromium.org/buildbot/tryserver.chromium.mac/mac_chromium_rel_ng/
[tryserver.chromium.mac]: https://ci.chromium.org/p/chromium/g/tryserver.chromium.mac/builders
[win7_chromium_rel_ng]: https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/
[tryserver.chromium.win]: https://ci.chromium.org/p/chromium/g/tryserver.chromium.win/builders
[Chromium Try Flakes]: http://chromium-try-flakes.appspot.com/
<!-- TODO(kainino): link doesn't work, but is still included from chromium-swarm homepage so not removing it now -->
[Swarming Server Stats]: https://chromium-swarm.appspot.com/stats
[chromium-gpu-archive/reference-images]: https://console.developers.google.com/storage/chromium-gpu-archive/reference-images
[instructions on the GPU testing page]: https://sites.google.com/a/chromium.org/dev/developers/testing/gpu-testing#TOC-Updating-and-Adding-New-Pixel-Tests-to-the-GPU-Bots
[Chrome Internal GPU Pixel Wrangling Instructions]: https://sites.google.com/a/google.com/client3d/documents/chrome-internal-gpu-pixel-wrangling-instructions
[src/content/test/gpu/gpu_tests/]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/
[webgl_conformance_expectations.py]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/webgl_conformance_expectations.py
[gpu_process_expectations.py]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/gpu_process_expectations.py
[pixel_expectations.py]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/pixel_expectations.py
[stamping out flakiness]: gpu_testing.md#Stamping-out-Flakiness
[gtest-DISABLED]: https://github.com/google/googletest/blob/master/googletest/docs/AdvancedGuide.md#temporarily-disabling-tests
### When Bots Misbehave (SSHing into a bot)
1. See the [Chrome Internal GPU Pixel Wrangling Instructions] for information
on ssh'ing in to the GPU bots.
[Chrome Internal GPU Pixel Wrangling Instructions]: https://sites.google.com/a/google.com/client3d/documents/chrome-internal-gpu-pixel-wrangling-instructions
### Reproducing WebGL conformance test failures locally
1. From the buildbot build output page, click on the failed shard to get to
the swarming task page. Scroll to the bottom of the left panel for a
command to run the task locally. This will automatically download the build
and any other inputs needed.
2. Alternatively, to run the test on a local build, pass the arguments
   `--browser=exact --browser-executable=/path/to/binary` to
   `content/test/gpu/run_gpu_integration_test.py`, roughly as sketched below.
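For example, a sketch of reproducing a WebGL conformance failure against a
local Release build (the suite name and browser path are placeholders for
whatever you are debugging):

```sh
# Run from the src/ directory of a Chromium checkout.
python content/test/gpu/run_gpu_integration_test.py webgl_conformance \
    --browser=exact \
    --browser-executable=/path/to/out/Release/chrome
```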
Also see the [telemetry documentation].
[telemetry documentation]: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/docs/run_benchmarks_locally.md
## Extending the GPU Pixel Wrangling Rotation
See the [Chrome Internal GPU Pixel Wrangling Instructions] for information on extending the rotation.
[Chrome Internal GPU Pixel Wrangling Instructions]: https://sites.google.com/a/google.com/client3d/documents/chrome-internal-gpu-pixel-wrangling-instructions