Update GPU upgrade documentation

Updates the GPU documentation on performing a driver/OS upgrade to use
the swarming OR operator instead of the synthetic swarming dimensions.

Bug: 920665
Change-Id: Id510e75e34b577282ad7b45d1b1282c135e00288
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2212922
Auto-Submit: Brian Sheedy <bsheedy@chromium.org>
Commit-Queue: Yuly Novikov <ynovikov@chromium.org>
Reviewed-by: Yuly Novikov <ynovikov@chromium.org>
Cr-Commit-Position: refs/heads/master@{#771202}
Author: Brian Sheedy
Date: 2020-05-21 21:34:14 +00:00
Committed by: Commit Bot
Parent: 3fd577db56
Commit: 811cca77b7
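The commit message replaces synthetic stable dimensions with Swarming's OR operator `|`. The note in the diff caps OR usage at 8 combinations. What follows is a minimal sketch of that arithmetic. It assumes the combination count is the product of the number of `|`-separated options in each dimension, which matches the note's "up to 3 dimensions with two options each" example. `expand_or_dimensions` and `check_combination_cap` are hypothetical helpers, not part of Swarming or Chromium's tooling, and the sketch does not talk to a Swarming server.

```python
from itertools import product

# Combination cap as stated in the NOTE in this change (assumed fixed at 8).
MAX_COMBINATIONS = 8


def expand_or_dimensions(dimensions):
    """Expand '|'-separated dimension values into every concrete combination.

    Hypothetical helper: assumes every value is a plain string and that '|'
    is the only operator used.
    """
    keys = sorted(dimensions)
    options = [dimensions[key].split('|') for key in keys]
    return [dict(zip(keys, combo)) for combo in product(*options)]


def check_combination_cap(dimensions, cap=MAX_COMBINATIONS):
    """Return the number of combinations, or raise ValueError above `cap`."""
    count = len(expand_or_dimensions(dimensions))
    if count > cap:
        raise ValueError(
            f'{count} dimension combinations exceeds the cap of {cap}')
    return count


# Dimensions copied from the win10_intel_hd_630_stable example in this change.
stable = {
    'gpu': '8086:5912-26.20.100.7870|8086:5912-26.20.100.8141',
    'os': 'Windows-10',
    'pool': 'chromium.tests.gpu',
}
print(check_combination_cap(stable))  # 2

# Three dimensions with two options each hits the cap exactly (2 * 2 * 2 = 8);
# adding a fourth two-option dimension (16 combinations) would raise ValueError.
```

Counting via the Cartesian product matches the note's wording. Three options in one dimension and two in another gives 6 combinations, which is allowed even though one dimension has more than two options.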

@@ -251,14 +251,11 @@ sorry):
* [`pools.cfg`][pools.cfg]
* Defines the Swarming pools for GCEs and Mac VMs used for manually
triggered trybots.
* [`bot_config.py`][bot_config.py]
* Defines the stable GPU driver and OS versions in GPU Swarming pools.
[infradata/config]: https://chrome-internal.googlesource.com/infradata/config
[gpu.star]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/starlark/bots/chromium/gpu.star
[chromium.star]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/starlark/bots/chromium/chromium.star
[pools.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/pools.cfg
[bot_config.py]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/scripts/bot_config.py
[main.star]: https://chrome-internal.googlesource.com/infradata/config/+/master/main.star
[vms.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/gce-provider/vms.cfg
@@ -666,10 +663,6 @@ or OS update. To do this:
1. Make sure that all of the current Swarming jobs for this OS and GPU
configuration are targeted at the "stable" version of the driver and the OS
in [`waterfalls.pyl`][waterfalls.pyl] and [`mixins.pyl`][mixins.pyl].
Make sure that there are "named" stable versions of the driver and the OS
there, which target the `_TARGETED_DRIVER_VERSIONS` and
`_TARGETED_OS_VERSIONS` dictionaries in [`bot_config.py`][bot_config.py]
(Google internal).
1. File a `Build Infrastructure` bug, component `Infra>Labs`, to have ~4 of
the physical machines already in the Swarming pool upgraded to the new
version of the driver or the OS.
@@ -684,51 +677,41 @@ or OS update. To do this:
it'll be necessary to follow the instructions on
[updating Gold baselines (step #4)][updating gold baselines].
1. Watch the new machine for a day or two to make sure it's stable.
1. When it is, update [`bot_config.py`][bot_config.py] (Google internal)
to *add* a mapping between the new driver version and the "stable" version.
For example:
1. When it is, add the experimental driver/OS to the `_stable` mixin using the
swarming OR operator `|`. For example:
```
_TARGETED_DRIVER_VERSIONS = {
# NVIDIA Quadro P400, Ubuntu Stable version
'10de:1cb3-384.90': 'nvidia-quadro-p400-ubuntu-stable',
# NVIDIA Quadro P400, new Ubuntu Stable version
'10de:1cb3-410.78': 'nvidia-quadro-p400-ubuntu-stable',
# ...
}
```
And/or a mapping between the new OS version and the "stable" version.
For example:
```
_TARGETED_OS_VERSIONS = {
# Linux NVIDIA Quadro P400
'10de:1cb3': {
'Ubuntu-14.04': 'linux-nvidia-stable',
'Ubuntu-19.04': 'linux-nvidia-stable',
'win10_intel_hd_630_stable': {
'swarming': {
'dimensions': {
'gpu': '8086:5912-26.20.100.7870|8086:5912-26.20.100.8141',
'os': 'Windows-10',
'pool': 'chromium.tests.gpu',
},
},
# ...
}
```
The new driver or OS version should match the one just added for the
experimental bot. Get this CL reviewed and landed.
[Sample CL (Google internal)][sample targeted version cl].
This will cause tests triggered using the `_stable` mixin to run on either
the old stable dimension or the experimental/new stable dimension.
**NOTE** There is a hard cap of 8 combinations in swarming, so you can only
use the OR operator in up to 3 dimensions if each dimension only has two
options. More than two options per dimension is allowed as long as the total
number of combinations is 8 or less.
1. After it lands, ask the Chrome Infrastructure Labs team to roll out the
driver update across all of the similarly configured bots in the swarming
pool.
1. If necessary, update pixel test expectations and remove the suppressions
added above.
1. Remove the old driver or OS version from [`bot_config.py`][bot_config.py],
leaving the "stable" driver version pointing at the newly upgraded version.
1. Remove the old driver or OS version from the `_stable` mixin, leaving just
the new stable version.
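The add-then-remove lifecycle in the steps above can be sketched on a mixin dictionary shaped like the `win10_intel_hd_630_stable` example in this change. The sketch makes two hypothetical assumptions. First, `...7870` is the old stable driver and `...8141` the new one. Second, before the change the mixin's `gpu` dimension held only the old driver. `add_or_value` and `remove_or_value` are illustrative helpers only; the documented procedure edits the `_stable` mixin in `mixins.pyl` directly.

```python
import copy


def add_or_value(mixin, dimension, new_value):
    """Return a copy of `mixin` with `new_value` OR'd into `dimension`."""
    updated = copy.deepcopy(mixin)
    dims = updated['swarming']['dimensions']
    values = dims[dimension].split('|')
    if new_value not in values:
        values.append(new_value)
    dims[dimension] = '|'.join(values)
    return updated


def remove_or_value(mixin, dimension, old_value):
    """Return a copy of `mixin` with `old_value` dropped from `dimension`."""
    updated = copy.deepcopy(mixin)
    dims = updated['swarming']['dimensions']
    values = [v for v in dims[dimension].split('|') if v != old_value]
    if not values:
        raise ValueError(f'removing {old_value!r} would leave {dimension!r} empty')
    dims[dimension] = '|'.join(values)
    return updated


# Hypothetical old/new stable drivers, taken from the example in this change.
OLD_DRIVER = '8086:5912-26.20.100.7870'
NEW_DRIVER = '8086:5912-26.20.100.8141'

# Hypothetical pre-upgrade state of the mixin: only the old driver is targeted.
win10_intel_hd_630_stable = {
    'swarming': {
        'dimensions': {
            'gpu': OLD_DRIVER,
            'os': 'Windows-10',
            'pool': 'chromium.tests.gpu',
        },
    },
}

# Once the experimental bot is stable, target both drivers.
transition = add_or_value(win10_intel_hd_630_stable, 'gpu', NEW_DRIVER)
print(transition['swarming']['dimensions']['gpu'])
# 8086:5912-26.20.100.7870|8086:5912-26.20.100.8141

# After the pool is upgraded, keep only the new stable driver.
final = remove_or_value(transition, 'gpu', OLD_DRIVER)
print(final['swarming']['dimensions']['gpu'])
# 8086:5912-26.20.100.8141
```

Copying instead of mutating keeps each stage inspectable. The empty-value check guards against removing the only remaining driver by mistake.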
Note that we leave the experimental bot in place. We could reclaim it, but it
seems worthwhile to continuously test the "next" version of graphics drivers as
well as the current stable ones.
[sample driver cl]: https://chromium-review.googlesource.com/c/chromium/src/+/1726875
[sample targeted version cl]: https://chrome-internal-review.googlesource.com/c/infradata/config/+/1602377
[updating gold baselines]: https://chromium.googlesource.com/chromium/src/+/HEAD/docs/gpu/pixel_wrangling.md#how-to-keep-the-bots-green
## Credentials for various servers