
# GPU Synchronization in Chrome

Chrome supports multiple mechanisms for sequencing GPU drawing operations; this
document provides a brief overview. The main focus is a high-level explanation
of when synchronization is needed and which mechanism is appropriate.

[TOC]

## Glossary

**GL Sync Object**: Generic GL-level synchronization object that can be in an
"unsignaled" or "signaled" state. The only current implementation of this is a
GL fence.

**GL Fence**: A GL sync object that is inserted into the GL command stream. It
starts out unsignaled and becomes signaled when the GPU reaches this point in
the command stream, implying that all previous commands have completed.

**Client Wait**: Blocks the client thread until a sync object becomes signaled,
or until a timeout occurs.

**Server Wait**: Tells the GPU to defer executing commands issued after a fence
until the fence signals. The client thread continues executing immediately and
can continue submitting GL commands.

**CHROMIUM fence sync**: A command-buffer-specific GL fence that sequences
operations among command buffer GL contexts without requiring driver-level
execution of previous commands.

**Native GL Fence**: A GL fence backed by a platform-specific cross-process
synchronization mechanism.

**GPU Fence Handle**: An IPC-transportable object (typically a file descriptor)
that can be used to duplicate a native GL fence into a different process's
context.

**GPU Fence**: A Chrome abstraction that owns a GPU fence handle representing a
native GL fence, usable for cross-process synchronization.

## Use case overview

The core scenario is synchronizing read and write access to a shared resource,
for example drawing an image into an offscreen texture and compositing the
result into a final image. The drawing operations need to be completed before
reading to ensure correct output. A typical effect of wrong synchronization is
that the output contains blank or incomplete results instead of the expected
rendered sub-images, causing flickering or tearing.

"Completed" in this case means that the end result of using a resource as input
will be equivalent to waiting for everything to finish rendering, but it does
not necessarily mean that the GPU has fully finished all drawing operations at
that time.

## Single GL context: no synchronization needed

If all access to the shared resource happens in the same GL context, there is no
need for explicit synchronization. GL guarantees that commands are logically
processed in the order they are submitted. This is true both for local GL
contexts (GL calls via ui/gl/ interfaces) and for a single command buffer GL
context.

## Multiple driver-level GL contexts in the same share group: use GLFence

A process can create multiple GL contexts that are part of the same share group.
These contexts can be created on different threads within this process.

In this case, GL fences must be used for sequencing, for example:

1. Context A: draw image, create GLFence
1. Context B: server wait or client wait for GLFence, read image

[gl::GLFence](/ui/gl/gl_fence.h) and its subclasses provide wrappers for
GL/EGL fence handling methods such as `eglFenceSyncKHR` and `eglWaitSyncKHR`.
These fence objects can be used cross-thread as long as both threads' GL
contexts are part of the same share group.

For more details, please refer to the underlying extension documentation, for
example:

* https://www.khronos.org/opengl/wiki/Synchronization
* https://www.khronos.org/registry/EGL/extensions/KHR/EGL_KHR_fence_sync.txt
* https://www.khronos.org/registry/EGL/extensions/KHR/EGL_KHR_wait_sync.txt

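As a sketch, the two-step sequence above maps onto the `gl::GLFence` API
roughly as follows (the `Render*` helpers and the cross-thread handoff are
placeholders, not real Chrome functions):

```c++
// Thread 1, context A current: draw, then create the fence.
Render1();
std::unique_ptr<gl::GLFence> fence = gl::GLFence::Create();

// Hand the fence over to thread 2 (both contexts are in the same share group).

// Thread 2, context B current: wait, then read.
fence->ServerWait();  // or fence->ClientWait() to block the CPU instead
Render2();            // sees the results of Render1
```

A server wait is usually preferable here since it leaves the reading thread
free to keep submitting commands; a client wait is only needed if the CPU
itself must not proceed before the GPU work has finished.
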
## Implementation-dependent: same-thread driver-level GL contexts

Many GL driver implementations are based on a per-thread command queue,
with the effect that commands are processed in order even if they were issued
from different contexts on that thread without explicit synchronization.

This behavior is not part of the GL standard, and some driver implementations
use a per-context command queue where this assumption is not true.

See [issue 510243](http://crbug.com/510243#c23) for an example of a problematic
sequence:

```
// In one thread:
MakeCurrent(A);
Render1();
MakeCurrent(B);
Render2();
CreateSync(X);

// And in another thread:
MakeCurrent(C);
WaitSync(X);
Render3();
MakeCurrent(D);
Render4();
```

The only serialization guarantee is that Render2 will complete before Render3,
but Render4 could theoretically complete before Render1.

Chrome assumes that the render steps happen in the order Render1, Render2,
Render3, and Render4, and requires this behavior to ensure security. If the
driver doesn't ensure this sequencing, Chrome has to emulate it using virtual
contexts. (Or by using explicit synchronization, but it doesn't do that today.)
See also the "CHROMIUM fence sync" section below.

## Command buffer GL clients: use CHROMIUM sync tokens

Chrome's command buffer IPC interface uses multiple layers. There are multiple
active IPC channels (typically one per process, i.e. one per Renderer and one
for the Browser). Each IPC channel has multiple scheduling groups (also called
streams), and each stream can contain multiple command buffers, which in turn
contain a sequence of GL commands.

Command buffers in the same client-side share group must be in the same stream.
Command scheduling granularity is at the stream level, and a client can choose
to create and use multiple streams with different stream priorities. Stream IDs
are arbitrary integers assigned by the client at creation time; see for example
the
[viz::ContextProviderCommandBuffer](/services/viz/public/cpp/gpu/context_provider_command_buffer.h)
constructor.

The CHROMIUM sync token is intended to order operations among command buffer GL
contexts. It inserts an internal fence sync command in the stream, flushing it
appropriately (see below), and generates a sync token from it, which is a
cross-context transportable reference to the underlying fence sync. A
WaitSyncTokenCHROMIUM call does **not** ensure that the underlying GL commands
have been executed at the GPU driver level; this mechanism is not suitable for
synchronizing command buffer GL operations with a local driver-level GL context.

See the
[CHROMIUM_sync_point](/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_sync_point.txt)
documentation for details.

Commands issued within a single command buffer don't need to be synchronized
explicitly; they will be executed in the same order that they were issued.

Multiple command buffers within the same stream can use an ordering barrier to
sequence their commands. Sync tokens are not necessary. Example:

```c++
// Command buffers gl1 and gl2 are in the same stream.
Render1(gl1);
gl1->OrderingBarrierCHROMIUM();
Render2(gl2);  // will happen after Render1.
```

Command buffers that are in different streams need to use sync tokens. If both
are using the same IPC channel (i.e. same client process), an unverified sync
token is sufficient, and commands do not need to be flushed to the server:

```c++
// stream A
Render1(glA);
glA->GenUnverifiedSyncTokenCHROMIUM(out_sync_token);

// stream B
glB->WaitSyncTokenCHROMIUM(sync_token);
Render2(glB);  // will happen after Render1.
```

Command buffers that are using different IPC channels must use verified sync
tokens. Verification is a check that the underlying fence sync was flushed to
the server. Cross-process synchronization always uses verified sync tokens.
`GenSyncTokenCHROMIUM` will force a shallow flush as a side effect if necessary.
Example:

```c++
// IPC channel in process X
Render1(glX);
glX->GenSyncTokenCHROMIUM(out_sync_token);

// IPC channel in process Y
glY->WaitSyncTokenCHROMIUM(sync_token);
Render2(glY);  // will happen after Render1.
```

Alternatively, unverified sync tokens can be converted to verified ones in bulk
by calling `VerifySyncTokensCHROMIUM`. This will wait for a flush to complete as
necessary. Use this to avoid multiple sequential flushes:

```c++
gl->GenUnverifiedSyncTokenCHROMIUM(out_sync_tokens[0]);
gl->GenUnverifiedSyncTokenCHROMIUM(out_sync_tokens[1]);
gl->VerifySyncTokensCHROMIUM(out_sync_tokens, 2);
```

### Implementation notes

Correctness of the CHROMIUM fence sync mechanism depends on the assumption that
commands issued from the command buffer service side happen in the order they
were issued in that thread. This is handled in different ways:

* Issue a glFlush on switching contexts on platforms where glFlush is
  sufficient to ensure ordering, i.e. macOS. (This approach would not be well
  suited to the tiling GPUs used in many mobile devices, where glFlush is an
  expensive operation; it may force content load/store between tile memory and
  main memory.) See for example
  [gl::GLContextCGL::MakeCurrent](/ui/gl/gl_context_cgl.cc):

  ```c++
  // It's likely we're going to switch OpenGL contexts at this point.
  // Before doing so, if there is a current context, flush it. There
  // are many implicit assumptions of flush ordering between contexts
  // at higher levels, and if a flush isn't performed, OpenGL commands
  // may be issued in unexpected orders, causing flickering and other
  // artifacts.
  ```

* Force context virtualization so that all commands are issued into a single
  driver-level GL context. This is used on Qualcomm/Adreno chipsets, see
  [issue 691102](http://crbug.com/691102).

* Assume per-thread command queues without explicit synchronization. GLX
  effectively ensures this. On Windows, ANGLE uses a single D3D device
  underneath all contexts, which ensures strong ordering.

GPU control tasks are processed out of band and are only partially ordered with
respect to GL commands. A gpu_control task always happens before any following
GL commands issued on the same IPC channel. It usually executes before any
preceding unflushed GL commands, but this is not guaranteed. A
`ShallowFlushCHROMIUM` ensures that any following gpu_control tasks will execute
after the flushed GL commands.

In this example, DoTask will execute after GLCommandA and before GLCommandD, but
there is no ordering guarantee relative to GLCommandB and GLCommandC:

```c++
// gles2_implementation.cc

helper_->GLCommandA();
ShallowFlushCHROMIUM();

helper_->GLCommandB();
helper_->GLCommandC();
gpu_control_->DoTask();

helper_->GLCommandD();

// Execution order is one of:
//   A | DoTask B C | D
//   A | B DoTask C | D
//   A | B C DoTask | D
```

The shallow flush adds the pending GL commands to the service's task queue, and
this task queue is also used by incoming gpu control tasks and processed in
order. The `ShallowFlushCHROMIUM` command returns as soon as the tasks are
queued and does not wait for them to be processed.

## Cross-process transport: GpuFence and GpuFenceHandle

Some platforms, such as Android (most devices running N and above) and
ChromeOS, support synchronizing a native GL context with a command buffer GL
context through a GpuFence.

Use the static `gl::GLFence::IsGpuFenceSupported()` method to check at runtime
whether the current platform has support for the GpuFence mechanism, including
GpuFenceHandle transport.

The GpuFence mechanism supports two use cases:

* Create a GLFence object in a local context, convert it to a client-side
  GpuFence, duplicate it into a command buffer service-side gpu fence, and
  issue a server wait on the command buffer service side. That service-side
  wait will be unblocked when the *client-side* GpuFence signals.

* Create a new command buffer service-side gpu fence, request a GpuFenceHandle
  from it, use this handle to create a native GL fence object in the local
  context, then issue a server wait on the local GL fence object. This local
  server wait will be unblocked when the *service-side* gpu fence signals.

The [CHROMIUM_gpu_fence
extension](/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_gpu_fence.txt) documents
the GLES API as used through the command buffer interface. This section contains
additional information about the integration with local GL contexts that is
needed to work with these objects.

### Driver-level wrappers

In general, you should use the static `gl::GLFence::CreateForGpuFence()` and
`gl::GLFence::CreateFromGpuFence()` factory methods to create a
platform-specific local fence object instead of using an implementation class
directly.

For Android and ChromeOS, the
[gl::GLFenceAndroidNativeFenceSync](/ui/gl/gl_fence_android_native_fence_sync.h)
implementation wraps the
[EGL_ANDROID_native_fence_sync](https://www.khronos.org/registry/EGL/extensions/ANDROID/EGL_ANDROID_native_fence_sync.txt)
extension, which allows creating a special EGLFence object from which a file
descriptor can be extracted, and then creating a duplicate fence object from
that file descriptor that is synchronized with the original fence.

### GpuFence and GpuFenceHandle

A [gfx::GpuFence](/ui/gfx/gpu_fence.h) object owns a GPU fence handle
representing a native GL fence. The `AsClientGpuFence` method casts it to a
ClientGpuFence type for use with the [CHROMIUM_gpu_fence
extension](/gpu/GLES2/extensions/CHROMIUM/CHROMIUM_gpu_fence.txt)'s
`CreateClientGpuFenceCHROMIUM` call.

A [gfx::GpuFenceHandle](/ui/gfx/gpu_fence_handle.h) is an IPC-transportable
wrapper for a file descriptor or other underlying primitive object, and is used
to duplicate a native GL fence into another process. It has value semantics and
can be copied multiple times, then consumed exactly once. Consumers take
ownership of the underlying resource. Current GpuFenceHandle consumers are:

* The `gfx::GpuFence(gpu_fence_handle)` constructor takes ownership of the
  handle's resources without constructing a local fence.

* The IPC subsystem closes resources after sending. The typical idiom is to
  call `gfx::CloneHandleForIPC(handle)` on a GpuFenceHandle retrieved from a
  scope-lifetime object to create a copied handle that will be owned by the
  IPC subsystem.

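As a sketch of the second idiom, assuming a `SendGpuFenceHandle()` IPC send
function and a `GetGpuFenceHandle()` accessor (both placeholder names, not
necessarily the real Chrome API):

```c++
// gpu_fence is a gfx::GpuFence with scope lifetime; clone its handle so the
// IPC subsystem can take ownership of the copy and close it after sending,
// leaving gpu_fence itself untouched.
SendGpuFenceHandle(gfx::CloneHandleForIPC(gpu_fence->GetGpuFenceHandle()));
```
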
### Sample Code

A usage example for two-process synchronization is to sequence access to a
globally shared drawable, such as an AHardwareBuffer on Android, where the
writer uses a local GL context and the reader is a command buffer context in
the GPU process. The writer process draws into an AHardwareBuffer-backed
GLImage in the local GL context, then creates a gpu fence to mark the end of
drawing operations:

```c++
// This example assumes that GpuFence is supported. If not, the application
// should fall back to a different transport or synchronization method.
DCHECK(gl::GLFence::IsGpuFenceSupported());

// ... write to the shared drawable in local context, then create
// a local fence.
std::unique_ptr<gl::GLFence> local_fence = gl::GLFence::CreateForGpuFence();

// Convert to a GpuFence.
std::unique_ptr<gfx::GpuFence> gpu_fence = local_fence->GetGpuFence();
// It's ok for local_fence to be destroyed now, the GpuFence remains valid.

// Create a matching gpu fence on the command buffer context, issue a
// server wait, and destroy it.
GLuint id = gl->CreateClientGpuFenceCHROMIUM(gpu_fence->AsClientGpuFence());
// It's ok for gpu_fence to be destroyed now.
gl->WaitGpuFenceCHROMIUM(id);
gl->DestroyGpuFenceCHROMIUM(id);

// ... read from the shared drawable via command buffer. These reads
// will happen after the local_fence has signaled. The local
// fence and gpu_fence don't need to remain alive for this.
```

If a process wants to consume a drawable that was produced through a command
buffer context in the GPU process, the sequence is as follows:

```c++
// Set up a callback that waits for the drawable to be ready.
void callback(std::unique_ptr<gfx::GpuFence> gpu_fence) {
  // Create a local context GL fence from the GpuFence.
  std::unique_ptr<gl::GLFence> local_fence =
      gl::GLFence::CreateFromGpuFence(*gpu_fence);
  local_fence->ServerWait();
  // ... read from the shared drawable in the local context.
}

// ... write to the shared drawable via command buffer, then
// create a gpu fence:
GLuint id = gl->CreateGpuFenceCHROMIUM();
context_support->GetGpuFenceHandle(id, base::BindOnce(callback));
gl->DestroyGpuFenceCHROMIUM(id);
```

It is legal to create the GpuFence on a separate command buffer context instead
of on the command buffer channel that did the drawing operations, but in that
case `gl->WaitSyncTokenCHROMIUM()` or equivalent must be used to sequence the
operations between the distinct command buffer contexts as usual.
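
For example, a sketch of that cross-context case, where `gl_draw` and
`gl_fence` are placeholder names for two distinct command buffer contexts on
the same IPC channel:

```c++
// Draw on one command buffer context and generate a sync token.
Render(gl_draw);
gl_draw->GenUnverifiedSyncTokenCHROMIUM(sync_token_data);

// Create the gpu fence on a different context, sequenced after the drawing.
gl_fence->WaitSyncTokenCHROMIUM(sync_token_data);
GLuint id = gl_fence->CreateGpuFenceCHROMIUM();
context_support->GetGpuFenceHandle(id, base::BindOnce(callback));
gl_fence->DestroyGpuFenceCHROMIUM(id);
```

An unverified sync token is sufficient here because both contexts share the
same IPC channel; contexts on different channels would need
`GenSyncTokenCHROMIUM` as described earlier.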