0

Add documentation for non-fatal crash reporting

Bug: None
Change-Id: I3e26d39ad2cee8ca02685f78affb04ccf3035ed0
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5951744
Commit-Queue: Nico Weber <thakis@chromium.org>
Auto-Submit: Peter Boström <pbos@chromium.org>
Reviewed-by: Greg Thompson <grt@chromium.org>
Reviewed-by: Nico Weber <thakis@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1373387}
This commit is contained in:
Peter Boström
2024-10-24 16:59:24 +00:00
committed by Chromium LUCI CQ
parent 58089e51c1
commit 09d03e12a8

@ -185,6 +185,40 @@ problem rather than resolving it. In rare exceptions you could use
timeline expectations, though in this case you must also handle failure as best
you can as failures are known to happen.
## Non-fatal crash reporting
For non-invariant situations we'd like to be notified about, such as an OS API
returning undocumented or unexpected values, we'd like to collect enough
information to diagnose what's going on. Here non-fatal crash reporting can be
done with `base::debug::DumpWithoutCrashing()`. Using crash keys is helpful for
gathering enough information to take action. When doing so, provide enough
context (such as a link to a bug) to explain why the information is being
collected and actions to take when it fires.
Note that this should only be used in cases where crash dumping yields something
actionable and should not be kept dumping indefinitely. Crash dumping causes
jank and is rate limited which hides (throttles) other crash reporting. As a
`DumpWithoutCrashing()` starts firing, it should be made to stop firing. Either
remove it if this was part of a one-off investigation (and we have enough data)
or update the code to make sure it no longer generates reports (for instance,
handle a new OS API result). In either case consider merging to release branches
to avoid generating a large number of crash reports.
As an illustrative example here's a snippet that notifies us of unexpected OS
API results and the last reported error from WaitableEvent on Windows. When this
hits we want to update surrounding code to handle the new return code or prevent
it from happening. If we're generating a concerning number of crash reports we
should also decide whether to merge a fix to release branches or remove
`base::debug::DumpWithoutCrashing();` on branch to prevent excessive flooding.
```c++
NOINLINE void ReportInvalidWaitableEventResult(DWORD result) {
SCOPED_CRASH_KEY_NUMBER("WaitableEvent", "result", result);
SCOPED_CRASH_KEY_NUMBER("WaitableEvent", "last_error", ::GetLastError());
base::debug::DumpWithoutCrashing(); // https://crbug.com/1478972.
}
```
## Alternatives in tests
For failures in tests, GoogleTest macros such as `EXPECT_*`, `ASSERT_*` or