
This is taking care of a long-standing TODO to move these OnceClosure holders rather than copy them around with their OnceClosure* members. This is a precursor to https://chromium-review.googlesource.com/c/chromium/src/+/3187153/35#message-fcc92e9f85e73f0e5ba6c03610a95cda8736f1f9 which highlighted a problem where some tests see a non-null MainFunctionParams::ui_task but running the closure results in a UAF. Logs show that the test hitting the UAF is not the one setting this field. This CL makes that impossible and fixes the issue in the follow-up CL. This CL is intended to be a logical no-op. This CL touches a lot of files and must happen all at once. The core change is that ContentMainParams and MainFunctionParams's moveable fields (ui_task, created_main_parts_closure, and startup_data) are now held by moveable types rather than raw pointers. This trickles in the following chain: main() (in various *_main.cc) (or SetUp() in !OS_ANDROID browser_test_base.cc) -> ContentMain() -> ContentMainRunnerImpl::Initialize() (forwards arg into MainFunctionParams) -> RunBrowser() -> BrowserMain() -> BrowserMainRunnerImpl::Initialize() -> BrowserMainLoop (stores MainFunctionParams) -> BrowserMainLoop::Init -> ContentBrowserClient::CreateBrowserMainParts() -> (Embedder)ContentBrowserClient::CreateBrowserMainParts() -> (Embedder)BrowserMainParts(Platform) -> (Embedder)BrowserMainParts -> RunOtherNamedProcessTypeMain() -> (Embedder)ContentMainDelegate::RunProcess() (or) -> FooMain() (kMainFunctions) (or) -> RunZygote() (creates its own MainFunctionParams) -> (Embedder)ContentMainDelegate::RunProcess() (on OS_ANDROID, browser_test_base.cc calls directly into ContentMainDelegate::RunProcess()) Few of these needed the params after passing them down so a move-only model was simple to adapt (even if invasive). The few exceptions like BrowserMainRunnerImpl::Initialize consuming |created_main_parts_closure| are better off in the new model (where they take the OnceClosure before passing down the params) because that prevents others down the chain from having access to a OnceClosure they shouldn't invoke anyways. Noteworthy: - ContentMainDelegate::RunProcess(): Returned an exit_code >= 0 to indicate the embedder elected to handle the run request given these params. With move-only semantics it is necessary to return the params back when the embedder declines handling this run request. An absl::variant return value is used to satisfy this requirement. - content/public/test/test_launcher.h : GetContentMainParams(): Becomes CopyContentMainParams() and only exposes a copy of copyable params. Uses new ContentMainParams::ShallowCopyForTesting() which verifies that moveable fields are still null by that time as should be the case in the order browser tests are initialized. - MainFunctionParams::command_line being const& violated the style-guide rule to "avoid defining functions that require a const reference parameter to outlive the call". This also prevented moving. The type was hence switched to a const CommandLine*. - BUILD.gn changes for nacl_helper_win_64 which requires static linking of its minimal //content deps (was previously missing a dep but was getting away with it because MainFunctionParams was .h only; required now with .cc). This was already done for static_switches and this CL adds static_main_function_params, reusing a similar static_features target that already existed but was no longer required in /c/nacl/broker, cleaning that up by replacing rather than copying that target's definition in this CL. - ContentMainParams::minimal_browser_mode was weirdly passed as a parameter to ContentMainRunner::Run(bool start_minimal_browser) but that method also has access to the ContentMainParams originally passed via ContentMainRunner::Init(). Passing the param again from Run() would be a use-after-move in content_main.cc, instead content_main_runner_impl.cc was updated to use the param it already has in store. Bug: 1175074 Change-Id: I3af90505525e426383c59107a3903d645d455682 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3244976 Commit-Queue: Gabriel Charette <gab@chromium.org> Auto-Submit: Gabriel Charette <gab@chromium.org> Reviewed-by: Alexander Timin <altimin@chromium.org> Reviewed-by: Brad Nelson <bradnelson@chromium.org> Reviewed-by: Scott Violet <sky@chromium.org> Reviewed-by: Clark DuVall <cduvall@chromium.org> Owners-Override: Alexander Timin <altimin@chromium.org> Cr-Commit-Position: refs/heads/main@{#940478}
302 lines
13 KiB
Markdown
302 lines
13 KiB
Markdown
# Example Investigation of a Heap Dump
|
|
|
|
This document describes the steps taken to investigate a real memory leak
|
|
discovered by heap profiling in the wild. For investigators less familiar with
|
|
the code base, `Navigating the Stack Trace` should be enough information to
|
|
determine the relevant component, and to forward the bug to a component OWNER.
|
|
|
|
## Understanding the heap dump summary
|
|
|
|
The opening comment of [Issue
|
|
834033](https://bugs.chromium.org/p/chromium/issues/detail?id=834033) contains a
|
|
heap dump summary. The highlights are:
|
|
|
|
* 315723 calls to malloc without corresponding call to free.
|
|
* 806MB of memory.
|
|
* The common stacktrace for all 315723 allocations.
|
|
|
|
Usually, anything that uses over 10MB of memory is a red flag. With the
|
|
exception of large image resources, most code in Chrome should use much less
|
|
than 10MB. Anything that has over 100k allocations is also a red flag.
|
|
|
|
### Navigating the Stack Trace - Detailed Breakdown
|
|
|
|
Let's take a look at the stack trace:
|
|
|
|
```
|
|
profiling::(anonymous namespace)::HookAlloc(base::allocator::AllocatorDispatch const*, unsigned long, void*)
|
|
base::allocator::MallocZoneFunctionsToReplaceDefault()::$_1::__invoke(_malloc_zone_t*, unsigned long)
|
|
<???>
|
|
<???>
|
|
base::allocator::UncheckedMallocMac(unsigned long, void**)
|
|
sk_malloc_flags(unsigned long, unsigned int)
|
|
SkMallocPixelRef::MakeAllocate(SkImageInfo const&, unsigned long)
|
|
SkBitmap::tryAllocPixels(SkImageInfo const&, unsigned long)
|
|
IPC::ParamTraits<SkBitmap>::Read(base::Pickle const*, base::PickleIterator*, SkBitmap*)
|
|
ExtensionAction::ParseIconFromCanvasDictionary(base::DictionaryValue const&, gfx::ImageSkia*)
|
|
extensions::ExtensionActionSetIconFunction::RunExtensionAction()
|
|
extensions::ExtensionActionFunction::Run()
|
|
ExtensionFunction::RunWithValidation()
|
|
extensions::ExtensionFunctionDispatcher::DispatchWithCallbackInternal(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int, base::RepeatingCallback<void (ExtensionFunction::ResponseType, base::ListValue const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, extensions::functions::HistogramValue)> const&)
|
|
extensions::ExtensionFunctionDispatcher::Dispatch(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int)
|
|
bool IPC::MessageT<ExtensionHostMsg_Request_Meta, std::__1::tuple<ExtensionHostMsg_Request_Params>, void>::Dispatch<extensions::ExtensionWebContentsObserver, extensions::ExtensionWebContentsObserver, content::RenderFrameHost, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&)>(IPC::Message const*, extensions::ExtensionWebContentsObserver*, extensions::ExtensionWebContentsObserver*, content::RenderFrameHost*, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&))
|
|
extensions::ExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*)
|
|
extensions::ChromeExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*)
|
|
content::WebContentsImpl::OnMessageReceived(content::RenderFrameHostImpl*, IPC::Message const&)
|
|
content::RenderFrameHostImpl::OnMessageReceived(IPC::Message const&)
|
|
IPC::ChannelProxy::Context::OnDispatchMessage(IPC::Message const&)
|
|
base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)
|
|
base::MessageLoop::RunTask(base::PendingTask*)
|
|
base::MessageLoop::DoWork()
|
|
base::MessagePumpCFRunLoopBase::RunWork()
|
|
base::mac::CallWithEHFrame(void () block_pointer)
|
|
base::MessagePumpCFRunLoopBase::RunWorkSource(void*)
|
|
<???>
|
|
<???>
|
|
<???>
|
|
<???>
|
|
<???>
|
|
<???>
|
|
<???>
|
|
<???>
|
|
<???>
|
|
__71-[BrowserCrApplication nextEventMatchingMask:untilDate:inMode:dequeue:]_block_invoke
|
|
base::mac::CallWithEHFrame(void () block_pointer)
|
|
-[BrowserCrApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
|
|
<???>
|
|
base::MessagePumpNSApplication::DoRun(base::MessagePump::Delegate*)
|
|
base::MessagePumpCFRunLoopBase::Run(base::MessagePump::Delegate*)
|
|
<name omitted>
|
|
ChromeBrowserMainParts::MainMessageLoopRun(int*)
|
|
content::BrowserMainLoop::RunMainMessageLoopParts()
|
|
content::BrowserMainRunnerImpl::Run()
|
|
content::BrowserMain(content::MainFunctionParams)
|
|
content::ContentMainRunnerImpl::Run()
|
|
service_manager::Main(service_manager::MainParams const&)
|
|
content::ContentMain(content::ContentMainParams const&)
|
|
ChromeMain
|
|
main
|
|
<???>
|
|
```
|
|
|
|
The first step is to divide the stack trace into smaller segments to get a
|
|
better understanding of what's happening at the time of allocations. The best
|
|
way to do this is to segment by name space and/or function prefixes.
|
|
|
|
```
|
|
profiling::(anonymous namespace)::HookAlloc(base::allocator::AllocatorDispatch const*, unsigned long, void*)
|
|
base::allocator::MallocZoneFunctionsToReplaceDefault()::$_1::__invoke(_malloc_zone_t*, unsigned long)
|
|
<???>
|
|
<???>
|
|
base::allocator::UncheckedMallocMac(unsigned long, void**)
|
|
```
|
|
|
|
The top of each stack will always contain some `base` and/or `profiling`
|
|
code. This is the code responsible for allocating and recording the memory.
|
|
|
|
```
|
|
sk_malloc_flags(unsigned long, unsigned int)
|
|
SkMallocPixelRef::MakeAllocate(SkImageInfo const&, unsigned long)
|
|
SkBitmap::tryAllocPixels(SkImageInfo const&, unsigned long)
|
|
```
|
|
|
|
Next, we three 3 frames with the prefix `sk`. Searching for
|
|
`sk_malloc_flags` on
|
|
[codesearch](https://cs.chromium.org/search/?q=sk_malloc_flags&sq=package:chromium&type=cs)
|
|
reveals that the component is `third_party/skia`. Looking at the
|
|
[README](https://cs.chromium.org/chromium/src/third_party/skia/README) reveals
|
|
that Skia is a 2D graphics library.
|
|
|
|
```
|
|
IPC::ParamTraits<SkBitmap>::Read(base::Pickle const*, base::PickleIterator*, SkBitmap*)
|
|
```
|
|
|
|
Next we see a templated function called `Read` in the namespace `IPC`.
|
|
`IPC` stands for inter-process communication. This suggests that the
|
|
function is responsible for reading an IPC Message, perhaps concerning an
|
|
`SkBitmap`.
|
|
|
|
```
|
|
ExtensionAction::ParseIconFromCanvasDictionary(base::DictionaryValue const&, gfx::ImageSkia*)
|
|
extensions::ExtensionActionSetIconFunction::RunExtensionAction()
|
|
extensions::ExtensionActionFunction::Run()
|
|
ExtensionFunction::RunWithValidation()
|
|
extensions::ExtensionFunctionDispatcher::DispatchWithCallbackInternal(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int, base::RepeatingCallback<void (ExtensionFunction::ResponseType, base::ListValue const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, extensions::functions::HistogramValue)> const&)
|
|
extensions::ExtensionFunctionDispatcher::Dispatch(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int)
|
|
bool IPC::MessageT<ExtensionHostMsg_Request_Meta, std::__1::tuple<ExtensionHostMsg_Request_Params>, void>::Dispatch<extensions::ExtensionWebContentsObserver, extensions::ExtensionWebContentsObserver, content::RenderFrameHost, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&)>(IPC::Message const*, extensions::ExtensionWebContentsObserver*, extensions::ExtensionWebContentsObserver*, content::RenderFrameHost*, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&))
|
|
extensions::ExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*)
|
|
extensions::ChromeExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*)
|
|
```
|
|
|
|
Next, we see many frames with the `extension` prefix. Extensions are exactly
|
|
what they sound like - Chrome extensions like AdBlock are used to modify the
|
|
behavior of the browser.
|
|
|
|
```
|
|
content::WebContentsImpl::OnMessageReceived(content::RenderFrameHostImpl*, IPC::Message const&)
|
|
content::RenderFrameHostImpl::OnMessageReceived(IPC::Message const&)
|
|
```
|
|
|
|
`content` is the name of code that glues together web code [like extensions] and
|
|
the rest of Chrome.
|
|
|
|
```
|
|
IPC::ChannelProxy::Context::OnDispatchMessage(IPC::Message const&)
|
|
```
|
|
|
|
More `IPC` code.
|
|
|
|
```
|
|
base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)
|
|
base::MessageLoop::RunTask(base::PendingTask*)
|
|
base::MessageLoop::DoWork()
|
|
base::MessagePumpCFRunLoopBase::RunWork()
|
|
base::mac::CallWithEHFrame(void () block_pointer)
|
|
base::MessagePumpCFRunLoopBase::RunWorkSource(void*)
|
|
```
|
|
|
|
More `base` code. The bottom of most stack traces should go back to
|
|
`MessageLoop`, a primitive Chrome construct used to run tasks.
|
|
|
|
### Navigating the Stack Trace - Summary
|
|
|
|
* The top and bottom of the stack should generally be the same and are not very
|
|
interesting.
|
|
* The prefixes of frames can be used to get a rough idea of the components
|
|
involved.
|
|
* Function names can be used to get a rough idea of what's going on.
|
|
|
|
In this case, extension code is calling `ParseIconFromCanvasDictionary` - so
|
|
it's probably trying to parse an icon. This calls into Skia code. Given that
|
|
Skia is a 2D drawing library, and the function is `tryAllocPixels`, Skia is
|
|
allocating some pixels for the icon. This process is being repeated 315 thousand
|
|
times, and the icon is being leaked every time.
|
|
|
|
|
|
## Diving into the code
|
|
|
|
Now that we have a rough idea of what's happening, let's look at the code for
|
|
ParseIconFromCanvasDictionary.
|
|
|
|
```cpp
|
|
bool ExtensionAction::ParseIconFromCanvasDictionary(
|
|
const base::DictionaryValue& dict,
|
|
gfx::ImageSkia* icon) {
|
|
for (base::DictionaryValue::Iterator iter(dict); !iter.IsAtEnd();
|
|
iter.Advance()) {
|
|
std::string binary_string64;
|
|
IPC::Message pickle;
|
|
if (iter.value().is_blob()) {
|
|
pickle = IPC::Message(iter.value().GetBlob().data(),
|
|
iter.value().GetBlob().size());
|
|
} else if (iter.value().GetAsString(&binary_string64)) {
|
|
std::string binary_string;
|
|
if (!base::Base64Decode(binary_string64, &binary_string))
|
|
return false;
|
|
pickle = IPC::Message(binary_string.c_str(), binary_string.length());
|
|
} else {
|
|
continue;
|
|
}
|
|
base::PickleIterator pickle_iter(pickle);
|
|
SkBitmap bitmap;
|
|
if (!IPC::ReadParam(&pickle, &pickle_iter, &bitmap))
|
|
return false;
|
|
CHECK(!bitmap.isNull());
|
|
|
|
// Chrome helpfully scales the provided icon(s), but let's not go overboard.
|
|
const int kActionIconMaxSize = 10 * ActionIconSize();
|
|
if (bitmap.drawsNothing() || bitmap.width() > kActionIconMaxSize)
|
|
continue;
|
|
|
|
float scale = static_cast<float>(bitmap.width()) / ActionIconSize();
|
|
icon->AddRepresentation(gfx::ImageSkiaRep(bitmap, scale));
|
|
}
|
|
return true;
|
|
}
|
|
```
|
|
|
|
There's a lot going on here, but we can use the information we have to focus.
|
|
The leak happens in IPC::ReadParam, so the relevant lines are:
|
|
|
|
```
|
|
SkBitmap bitmap;
|
|
if (!IPC::ReadParam(&pickle, &pickle_iter, &bitmap))
|
|
return false;
|
|
```
|
|
|
|
The `IPC` message is being decoded into `bitmap`.
|
|
|
|
```
|
|
icon->AddRepresentation(gfx::ImageSkiaRep(bitmap, scale));
|
|
```
|
|
Looking at subsequent consumers of `bitmap`, we see that it is being added as a
|
|
representation to `icon`. `icon` is an output parameter of this function, so we
|
|
have to look at the calling frame,
|
|
`ExtensionActionSetIconFunction::RunExtensionAction`.
|
|
|
|
```
|
|
ExtensionFunction::ResponseAction
|
|
ExtensionActionSetIconFunction::RunExtensionAction() {
|
|
...
|
|
EXTENSION_FUNCTION_VALIDATE(
|
|
ExtensionAction::ParseIconFromCanvasDictionary(*canvas_set, &icon));
|
|
|
|
if (icon.isNull())
|
|
return RespondNow(Error("Icon invalid."));
|
|
|
|
extension_action_->SetIcon(tab_id_, gfx::Image(icon));
|
|
...
|
|
}
|
|
```
|
|
|
|
In this case, I've already focused on the code that calls
|
|
`ParseIconFromCanvasDictionary`. Let's look at `SetIcon`.
|
|
|
|
```
|
|
void ExtensionAction::SetIcon(int tab_id, const gfx::Image& image) {
|
|
SetValue(&icon_, tab_id, image);
|
|
}
|
|
```
|
|
|
|
```
|
|
template<class T>
|
|
void SetValue(std::map<int, T>* map, int tab_id, const T& val) {
|
|
(*map)[tab_id] = val;
|
|
}
|
|
```
|
|
|
|
The icon is being added to a map `icon_`, with `tab_id` as the key. Ah ha!
|
|
Adding elements to a container [and never removing them] is one of the most
|
|
common sources of memory issues.
|
|
|
|
There are two ways for this memory to be released - the container `icon_` can be
|
|
destroyed, or the element can be removed from the container.
|
|
|
|
`icon_` is a member of `ExtensionAction`, whose documentation reads:
|
|
```
|
|
// ExtensionAction encapsulates the state of a browser action or page action.
|
|
// Instances can have both global and per-tab state. If a property does not have
|
|
// a per-tab value, the global value is used instead.
|
|
```
|
|
|
|
This suggests that the lifetime of `icon_` is tied to the lifetime of the
|
|
ExtensionAction, which we can guess is tied to the lifetime of the Extension. As
|
|
long as the extension stays installed and enabled, `icon_` will not be
|
|
destroyed.
|
|
|
|
Next, we use codesearch to look at all code that removes elements from `icon_`.
|
|
The only place that performs removal is
|
|
|
|
```
|
|
void ExtensionAction::ClearAllValuesForTab(int tab_id) {
|
|
...
|
|
icon_.erase(tab_id);
|
|
...
|
|
}
|
|
```
|
|
|
|
This is called by `ExtensionActionAPI::ClearAllValuesForTab`, which is called by
|
|
`TabHelper::DidFinishNavigation`. The name of this method suggests that each
|
|
time a tab is navigated, the previous tab-specific icon is cleared. However,
|
|
that means that if a tab is closed, then the icon is leaked forever.
|