
# Debugging Memory Issues

This page is designed to help Chromium developers debug memory issues.

When in doubt, reach out to memory-dev@chromium.org.

[TOC]

## Investigating Reproducible Memory Regression

Let's say that there's a CL or feature that reproducibly increases memory usage
when it's landed/enabled, given a particular set of repro steps.

* Take a look at [the documentation](/docs/memory/README.md) for both
  taking and navigating memory-infra traces.
* Take two memory-infra traces: one with the reproducible memory regression and
  one without.
* Load the memory-infra traces into two tabs.
* Compare the memory dump providers and look for the one that shows the
  regression. Follow the relevant link.
    * [The regression is in the Malloc MemoryDumpProvider.](#Regression-in-Malloc-MemoryDumpProvider)
    * [The regression is in a non-Malloc
      MemoryDumpProvider.](#Regression-in-Non-Malloc-MemoryDumpProvider)
    * [The regression is only observed in **private
      footprint**.](#Regression-only-in-Private-Footprint)
    * [No regression is observed.](#No-observed-regression)

### Regression in Malloc MemoryDumpProvider

Repeat the above steps, but this time also [take a heap
dump](#Taking-a-Heap-Dump). Confirm that the regression is also visible in the
heap dump, and then compare the two heap dumps to find the difference. You can
also use
[diff_heap_profiler.py](https://cs.chromium.org/chromium/src/third_party/catapult/experimental/tracing/bin/diff_heap_profiler.py)
to perform the diff.

### Regression in Non-Malloc MemoryDumpProvider

Hopefully the MemoryDumpProvider has sufficient information to help diagnose the
leak. If the leaked object is allocated via malloc or new (it usually should
be), you can also use the steps for debugging a Malloc MemoryDumpProvider
regression.

### Regression only in Private Footprint

* Repeat the repro steps, but instead of taking a memory-infra trace, use
  the following tools to map the process's virtual address space:
    * On macOS, use `vmmap`.
    * On Windows, use Sysinternals VMMap.
    * On other OSes, use `/proc/<pid>/smaps`.
* The results should help diagnose what's happening. Contact the
  memory-dev@chromium.org mailing list for more help.

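As a starting point for reading `smaps` output, the sketch below sums the
`Private_Clean` and `Private_Dirty` fields per mapping. `SummarizePrivateKb` is
a hypothetical helper written for this page, not a Chromium API, and it assumes
the usual `smaps` text layout: a mapping header line followed by
`Key: value kB` lines.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper, not a Chromium API. Sums Private_Clean + Private_Dirty
// (in kB) per mapping name from text in /proc/<pid>/smaps format. Mappings
// without a pathname are grouped under "[anon]".
std::map<std::string, std::uint64_t> SummarizePrivateKb(std::istream& smaps) {
  std::map<std::string, std::uint64_t> private_kb_by_mapping;
  std::string current_mapping = "[anon]";
  std::string line;
  while (std::getline(smaps, line)) {
    std::istringstream fields(line);
    std::string first;
    if (!(fields >> first)) {
      continue;  // Skip blank lines.
    }
    if (first.back() != ':') {
      // Mapping header: "start-end perms offset dev inode [pathname]".
      std::string perms, offset, dev, inode, pathname;
      fields >> perms >> offset >> dev >> inode;
      std::getline(fields >> std::ws, pathname);
      current_mapping = pathname.empty() ? "[anon]" : pathname;
      continue;
    }
    if (first == "Private_Clean:" || first == "Private_Dirty:") {
      std::uint64_t kb = 0;
      if (fields >> kb) {
        private_kb_by_mapping[current_mapping] += kb;
      }
    }
  }
  return private_kb_by_mapping;
}
```

One way to use it is to run it over `smaps` captured before and after the repro
steps and compare the two maps. Mappings whose private total grows are a
reasonable first place to look.
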
### No observed regression

* If there isn't a regression in PrivateMemoryFootprint, then this might become
  a question of semantics for what constitutes a memory regression. Common
  problems include:
    * Shared Memory, which is hard to attribute, but is mostly accounted for in
      the memory-infra trace.
    * Binary size, which is currently not accounted for anywhere.

## Investigating Heap Dumps From the Wild

For a small set of Chrome users in the wild, Chrome will record and upload
anonymized heap dumps. This has the benefit of wider coverage for real code
paths, at the expense of reproducibility.

These heap dumps can take some time to grok, but frequently yield valuable
insight. At the time of this writing, heap dumps from the wild have resulted in
real, high-impact bugs being found in Chrome code ~90% of the time.

For an example investigation of a real heap dump, see [this
link](/docs/memory/investigating_heap_dump_example.md).

* Raw heap dumps can be viewed in the trace viewer. [See detailed
  instructions.](/docs/memory-infra/heap_profiler.md#how-to-manually-browse-a-heap-dump)
  This interface surfaces all available information, but can be overwhelming and
  is usually unnecessary for investigating heap dumps.
* Important note: Heap profiling in the field uses
  [Poisson process sampling](https://bugs.chromium.org/p/chromium/issues/detail?id=810748)
  with a rate parameter of 10000. This means that for large/frequent allocations
  (e.g. >100 MB), the noise will be quite small (much less than 1%). But
  there is noise, so counts will not be exact.
* The heap dump summary typically contains all information necessary to diagnose
  a memory issue.
* The stack trace of the potential memory leak is almost always sufficient to
  tell the type of object being leaked, since most functions in Chrome
  have a limited number of calls to new and malloc.
* The next thing to do is to determine whether the memory usage is intentional.
  Very rarely, components in Chrome legitimately need to use many 100s of MBs of
  memory. In this case, it's important to create a
  [MemoryDumpProvider](https://cs.chromium.org/chromium/src/base/trace_event/memory_dump_provider.h)
  to report this memory usage, so that we have a better understanding of which
  components are using a lot of memory. For an example, see
  [Issue 813046](https://bugs.chromium.org/p/chromium/issues/detail?id=813046).
* Assuming the memory usage is not intentional, the next thing to do is to
  figure out what is causing the memory leak.
    * The most common cause is adding elements to a container with no limit.
      Usually the code makes assumptions about how frequently it will be called
      in the wild, and something breaks those assumptions. Or sometimes the code
      to clear the container is not called as frequently as expected (or at
      all). [Example
      1](https://bugs.chromium.org/p/chromium/issues/detail?id=798012). [Example
      2](https://bugs.chromium.org/p/chromium/issues/detail?id=804440).
    * Retain cycles for ref-counted objects.
      [Example](https://bugs.chromium.org/p/chromium/issues/detail?id=814334#c23).
    * Straight up leaks resulting from incorrect use of APIs. [Example
      1](https://bugs.chromium.org/p/chromium/issues/detail?id=801702#c31).
      [Example
      2](https://bugs.chromium.org/p/chromium/issues/detail?id=814444#c17).

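To illustrate the most common cause above, here is a minimal sketch of a
container that enforces its own limit, so that an unexpectedly high call
frequency cannot grow memory without bound. `BoundedHistory` is a hypothetical
class made up for this example, not Chromium code, and evicting the oldest
entry is only one of several possible policies.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical example class, not Chromium code. Keeps at most `max_entries`
// recent items and drops the oldest ones beyond that.
class BoundedHistory {
 public:
  explicit BoundedHistory(std::size_t max_entries)
      : max_entries_(max_entries) {}

  void Add(std::string entry) {
    entries_.push_back(std::move(entry));
    // Evict the oldest entries once the cap is exceeded.
    while (entries_.size() > max_entries_) {
      entries_.pop_front();
    }
  }

  std::size_t size() const { return entries_.size(); }
  const std::string& oldest() const { return entries_.front(); }

 private:
  const std::size_t max_entries_;
  std::deque<std::string> entries_;
};
```

With the cap in place, the container's footprint depends on `max_entries`
rather than on how often `Add()` is called in the wild.
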
## Taking a Heap Dump

Navigate to chrome://flags and search for **memlog**. There are several options
that can be used to configure heap dumps. All of these options are also
available as command line flags, for automated test runs (e.g. telemetry).

* `#memlog` controls which processes are profiled. It's also possible to
  manually specify the process via the interface at `chrome://memory-internals`.
* `#memlog-in-process` runs the profiling service within the
  Chrome browser process. By default, the service runs as a separate dedicated
  process.
* `#memlog-sampling-rate` specifies the sampling interval in bytes. The lower
  the interval, the more precise the profile, at the cost of
  performance. The default value is 100KB, which is enough to observe allocation
  sites that allocate >500KB in total, where the total is the size of a single
  allocation multiplied by the number of such allocations at the same call site.
* `#memlog-stack-mode` describes the type of metadata recorded for each
  allocation. `native` stacks provide the most utility. The only time the other
  options should be considered is for Android official builds, most of which do
  not support `native` stacks.

Once the flags have been set appropriately, restart Chrome and take a
memory-infra trace. The results will have a heap dump.

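A rough way to reason about the sampling interval: on average a call site is
sampled once per interval's worth of allocated bytes, so a site that allocates
500KB in total with a 100KB interval expects about five samples. The sketch
below does that arithmetic. It also estimates the chance that a site is sampled
at least once, under the assumption that samples arrive as a Poisson process.
That assumption and both function names are illustrative, not taken from the
profiler's implementation.

```cpp
#include <cmath>

// Hypothetical helpers for back-of-the-envelope estimates only.

// Expected number of samples for a call site that allocates `total_bytes`
// overall when one sample is taken per `interval_bytes` on average.
inline double ExpectedSamples(double total_bytes, double interval_bytes) {
  return total_bytes / interval_bytes;
}

// Probability that the call site is sampled at least once, ASSUMING the
// samples follow a Poisson process with the mean computed above.
inline double ProbabilityObserved(double total_bytes, double interval_bytes) {
  return 1.0 - std::exp(-ExpectedSamples(total_bytes, interval_bytes));
}
```

Under these assumptions, lowering the interval raises the expected sample count
for small call sites, which matches the precision versus performance trade-off
described for `#memlog-sampling-rate`.
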
## Investigating Memory Corruption

If you can reproduce the corruption locally, run sanitizers (e.g.
[ASan](https://chromium.googlesource.com/chromium/src/+/HEAD/docs/asan.md))
to locate and fix the undefined behavior (UB).

Otherwise, look into the
[minidump](https://sites.google.com/a/google.com/crash/users/how-to/manually-debug-a-minidump)
(Googlers-only link), if available.

### Known Memory Poisoning Patterns

A memory allocation goes through multiple states,
and its payload sometimes has a distinctive pattern.
You may also see some variance in the lower bits, introduced by
e.g. an offset within a `struct`.

#### Memory held by the OS

* All memory comes from the OS and returns to the OS at some point.
* Accessing memory that has already been returned to the OS is likely to crash.
* Large allocations (>= ~1 MiB) tend to go back to the OS quickly when
  freed, while smaller allocations are mostly reused.

#### Memory held by the allocator

* The allocator holds the memory region borrowed from the OS in a free-list.
* Payload and behavior are implementation-specific.
* In Chrome, we use
  [PartitionAlloc](/base/allocator/partition_allocator/PartitionAlloc.md) as the
  main allocator.
    * We embed some data in the payload, and the original payload from before
      `free()` may or may not be overwritten.
    * Writes to `free()`d memory may be caught as "free-list corruption".
* The following patterns can be written at this stage:
    * `0xCDCDCDCDCDCDCDCD`: when an allocation gets returned to PartitionAlloc.
        * Shows up only in `PA_BUILDFLAG(EXPENSIVE_DCHECKS_ARE_ON)` builds.

#### Quarantined Memory

* Optionally, the allocator may keep `free()`d memory in quarantine
  for a while before returning it to a free-list, to detect and mitigate
  use-after-free (UaF) bugs.
* The following patterns can be written at this stage:
    * `0xCDCDCDCDCDCDCDCD`: PartitionAlloc's `FreeFlags::kZap`.
        * As of Aug. 2024, this is used only by [AMSC](https://docs.google.com/document/d/12OM0CSKgKv6NhM9YylSqAAXiV_f4uMgYgaH8KABUe-o/edit?usp=sharing).
    * `0xEFEFEFEFEFEFEFEF`: In [BRP](https://chromium.googlesource.com/chromium/src/+/HEAD/base/memory/raw_ptr.md) quarantine.
        * You are using a dangling pointer to access an invalidated memory region.
    * `0xEFED????????8000`: In [LUD](https://docs.google.com/document/d/1xfGa_IMtFZiQ3beOmkncEafODwn4U90ZyL4NfPaAtDY/edit?usp=sharing&resourcekey=0-89BZl1SVILB6ylOHula0IA) quarantine.
        * (Googlers-only) You may have access to the `free()` stack trace on Crashpad.
    * `0xECEC????????8000`: In [E-LUD](https://docs.google.com/document/d/1_9TSOtQuPR3NjorLDjAkuloi8lYqblb6Ykt5nbVnh9I/edit?usp=sharing) quarantine.

#### Memory allocation you officially own

In principle, once initialized you should only see values written
by your code while your allocation is alive.
However, in rare cases, you may see values from a Write-after-Free.

```txt
void YourFunc() {               | void TheirFunc() {
                                |   int* p1 = new int;
                                |   delete p1;
  // The allocator may          |
  // redistribute `p1` to `p2`  |
  int* p2 = new int;            |
  *p2 = 123;                    |
                                |   // Write-after-Free
                                |   *p1 = 456;
  // 456 may show up            |
  printf("%d\n", *p2);          |
}                               | }
```

...or values from a Double-Free.

```txt
void YourFunc() {               | void TheirFunc() {
                                |   int* p1 = new int;
                                |   delete p1;
  // The allocator may          |
  // redistribute `p1` to `p2`  |
  int* p2 = new int;            |
  *p2 = 123;                    |
                                |   // Double-Free
                                |   delete p1;
                                |
                                |   // The allocator may
                                |   // redistribute `p2` to `p3`
                                |   int* p3 = new int;
                                |   *p3 = 456;
  // 456 may show up            |
  printf("%d\n", *p2);          |
}                               | }
```

* The following patterns can be written at this stage:
    * `0x0000000000000000`: [zero initialization](https://en.cppreference.com/w/cpp/language/zero_initialization).
    * `0x0000000000000000`: PartitionAlloc's `AllocFlags::kZeroFill`.
        * This payload is written as a part of memory allocation but requires
          explicit opt-in, e.g. `calloc()`.
    * `0xABABABABABABABAB`: PartitionAlloc's newly allocated memory.
        * Shows up only in `PA_BUILDFLAG(EXPENSIVE_DCHECKS_ARE_ON)` builds.
        * MSan should be capable of catching this kind of read from
          uninitialized regions.

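When scanning a crash dump, the non-zero patterns listed in the sections above
can be checked mechanically. The sketch below is a hypothetical helper, not
part of Chromium or PartitionAlloc. It matches whole 64-bit words only, so
values whose lower bits vary (e.g. because of an offset within a `struct`) are
not recognized. All-zero words are deliberately left out because zero is also
an ordinary value.

```cpp
#include <cstdint>

// Hypothetical classification for illustration; the constants are the
// patterns listed in this document.
enum class PoisonPattern {
  kNone,
  kFreedOrZapped,   // 0xCDCDCDCDCDCDCDCD
  kBrpQuarantine,   // 0xEFEFEFEFEFEFEFEF
  kLudQuarantine,   // 0xEFED????????8000
  kELudQuarantine,  // 0xECEC????????8000
  kNewlyAllocated,  // 0xABABABABABABABAB
};

// Hypothetical helper: returns the pattern that `value` matches exactly,
// or PoisonPattern::kNone if it matches none of them.
inline PoisonPattern ClassifyPoisonPattern(std::uint64_t value) {
  switch (value) {
    case 0xCDCDCDCDCDCDCDCDull:
      return PoisonPattern::kFreedOrZapped;
    case 0xEFEFEFEFEFEFEFEFull:
      return PoisonPattern::kBrpQuarantine;
    case 0xABABABABABABABABull:
      return PoisonPattern::kNewlyAllocated;
    default:
      break;
  }
  // For the `????????` patterns, compare only the fixed top and bottom
  // 16 bits and ignore the 32 bits in between.
  const std::uint64_t kFixedBitsMask = 0xFFFF00000000FFFFull;
  switch (value & kFixedBitsMask) {
    case 0xEFED000000008000ull:
      return PoisonPattern::kLudQuarantine;
    case 0xECEC000000008000ull:
      return PoisonPattern::kELudQuarantine;
    default:
      return PoisonPattern::kNone;
  }
}
```

A match is only a hint about which state the memory was in. The same bit
patterns can also appear as ordinary data.
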
#### Memory allocation owned by someone else

You may see random values written by someone else
if you keep using pointers to a `free()`d region.

```txt
void YourFunc() {               | void TheirFunc() {
  int* p1 = new int;            |
  *p1 = 123;                    |
  delete p1;                    |
                                |   // The allocator may
                                |   // redistribute `p1` to `p2`
                                |   int* p2 = new int;
                                |   *p2 = 456;
  // Use-after-Free;            |
  // 456 may show up            |
  printf("%d\n", *p1);          |
}                               | }
```