
Chrome's URL library
Layers
There are several conceptual layers in this directory. Going from the lowest level up, they are:
Parsing
The url_parse.* files are the parser. This code does no string
transformations. Its only job is to take an input string and split out the
components of the URL as best as it can deduce them, for a given type of URL.
Parsing can never fail; it will take its best guess. This layer does not
have logic for determining the type of URL parsing to apply; that decision
must be made at a higher layer (the "util" layer below).
Because the parser code is derived (very distantly) from some code in
Mozilla, some of the parser files are in url/third_party/mozilla/.
The main header to include for calling the parser is
url/third_party/mozilla/url_parse.h.
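To make the "split only, never transform, never fail" contract concrete, here is a minimal sketch of a parser in that style. It is not the library's code: ToySpan, ToyParsed, Slice and ParseToy are hypothetical names invented for this illustration, and the sketch ignores userinfo and IPv6 literals. It only records where each component sits in the input and leaves every character as it found it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical types for illustration only; not the library's real structures.
struct ToySpan {
  int begin;
  int len;  // -1 means the component is absent.
  ToySpan() : begin(0), len(-1) {}
  ToySpan(int b, int l) : begin(b), len(l) {}
  bool present() const { return len >= 0; }
};

struct ToyParsed {
  ToySpan scheme, host, port, path, query, ref;
};

// Returns the text a span refers to, or an empty string if it is absent.
std::string Slice(const std::string& spec, const ToySpan& s) {
  if (!s.present()) return std::string();
  assert(s.begin >= 0 &&
         static_cast<std::size_t>(s.begin + s.len) <= spec.size());
  return spec.substr(s.begin, s.len);
}

// Splits `spec` into component spans. It never fails and never rewrites
// characters. Simplification: no userinfo or IPv6 literal handling.
ToyParsed ParseToy(const std::string& spec) {
  ToyParsed out;
  const int n = static_cast<int>(spec.size());
  int pos = 0;

  // Scheme: text before the first ':' unless '/', '?' or '#' comes first.
  for (int i = 0; i < n; ++i) {
    const char c = spec[i];
    if (c == ':') { out.scheme = ToySpan(0, i); pos = i + 1; break; }
    if (c == '/' || c == '?' || c == '#') break;
  }

  // Fragment: everything after the first '#'.
  int end = n;
  for (int i = pos; i < n; ++i) {
    if (spec[i] == '#') { out.ref = ToySpan(i + 1, n - i - 1); end = i; break; }
  }

  // Query: everything after the first '?' and before the fragment.
  for (int i = pos; i < end; ++i) {
    if (spec[i] == '?') { out.query = ToySpan(i + 1, end - i - 1); end = i; break; }
  }

  // Authority: present only when the remainder starts with "//".
  if (end - pos >= 2 && spec[pos] == '/' && spec[pos + 1] == '/') {
    const int auth_begin = pos + 2;
    int auth_end = auth_begin;
    while (auth_end < end && spec[auth_end] != '/') ++auth_end;
    int colon = -1;  // Position of the last ':' in the authority, if any.
    for (int i = auth_begin; i < auth_end; ++i) {
      if (spec[i] == ':') colon = i;
    }
    if (colon >= 0) {
      out.host = ToySpan(auth_begin, colon - auth_begin);
      out.port = ToySpan(colon + 1, auth_end - colon - 1);
    } else {
      out.host = ToySpan(auth_begin, auth_end - auth_begin);
    }
    pos = auth_end;
  }

  // Whatever is left is treated as the path (the "best guess").
  if (end > pos) out.path = ToySpan(pos, end - pos);
  return out;
}
```

Under these assumptions, parsing "HTTP://Example.com:8080/a/b?x=1#top" yields spans for HTTP, Example.com, 8080, /a/b, x=1 and top with the original casing intact, and an input such as "not a url" simply comes back as a path with no scheme. Lowercasing and validation are left to the canonicalizer.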
Canonicalization
The url_canon* files are the canonicalizer. This code will transform specific
URL components or specific types of URLs into a standard form. For some
dangerous or invalid data, the canonicalizer will report that a URL is invalid,
although it will always try its best to produce output (so the calling code
can, for example, show the user an error that the URL is invalid). The
canonicalizer attempts to provide as consistent a representation as possible
without changing the meaning of a URL.
The canonicalizer layer is designed to be independent of the string type of
the embedder, so all string output is done through a CanonOutput wrapper
object. An implementation for std::string output is provided in
url_canon_stdstring.h.
The main header to include for calling the canonicalizer is url/url_canon.h.
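The two ideas above (output goes through a string-type-agnostic sink, and invalid input is reported while output is still produced) can be sketched as follows. This is an illustration under stated assumptions, not the real CanonOutput interface: ToyOutput, ToyStringOutput and CanonicalizeSchemeToy are hypothetical names, and the validity rule used is the generic scheme syntax (a letter followed by letters, digits, '+', '-' or '.').

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical output sink: the canonicalizer only ever appends characters,
// so it does not need to know which string type the embedder uses.
class ToyOutput {
 public:
  virtual ~ToyOutput() {}
  virtual void Append(char c) = 0;
};

// One possible sink backed by std::string.
class ToyStringOutput : public ToyOutput {
 public:
  explicit ToyStringOutput(std::string* dest) : dest_(dest) {
    assert(dest_ != nullptr);
  }
  void Append(char c) override { dest_->push_back(c); }

 private:
  std::string* dest_;
};

inline bool IsAsciiAlpha(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

inline bool IsAsciiDigit(char c) { return c >= '0' && c <= '9'; }

// Lowercases `scheme` into `out`. Returns false if the scheme is empty or
// contains a character that is not allowed, but still writes its best effort
// so the caller has something to show in an error message.
bool CanonicalizeSchemeToy(const std::string& scheme, ToyOutput* out) {
  assert(out != nullptr);
  bool valid = !scheme.empty();
  for (std::size_t i = 0; i < scheme.size(); ++i) {
    const char c = scheme[i];
    const bool ok =
        IsAsciiAlpha(c) ||
        (i > 0 && (IsAsciiDigit(c) || c == '+' || c == '-' || c == '.'));
    if (!ok) valid = false;
    // ASCII-only lowercasing; other bytes are passed through unchanged.
    out->Append((c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c);
  }
  return valid;
}
```

An embedder with its own string class would supply another ToyOutput subclass and reuse the same canonicalization function unchanged, which is the kind of insulation the wrapper object is meant to provide.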
Utility
The url_util* files provide a higher-level wrapper around the parser and
canonicalizer. While this layer can be called directly, it is designed to be
the foundation for writing URL wrapper objects (the GURL layer and Blink's KURL
object use the Utility layer to implement the low-level logic).
The Utility code makes decisions about URL types and calls the correct parsing and canonicalization functions for those types. It provides an interface to register application-specific schemes that have specific requirements. Sharing this logic between KURL and GURL is important so that URLs are handled consistently across the application.
The main header to include is url/url_util.h.
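The dispatch role described above can be pictured with a small scheme registry. This is a sketch under stated assumptions rather than the real url_util interface: ToyUrlKind, ToySchemeRegistry, RegisterStandardScheme and Classify are hypothetical names, and the built-in scheme list is only an example. The point is that one shared component decides which kind of parsing a URL gets, so every wrapper object built on it makes the same decision.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical URL categories, each of which would get its own parse rules.
enum class ToyUrlKind { kStandard, kFile, kPath };

class ToySchemeRegistry {
 public:
  ToySchemeRegistry() {
    // Example defaults chosen for this sketch.
    standard_.push_back("http");
    standard_.push_back("https");
    standard_.push_back("ws");
    standard_.push_back("wss");
    standard_.push_back("ftp");
  }

  // Lets an application declare that its own scheme follows the standard
  // "scheme://host/path" layout.
  void RegisterStandardScheme(const std::string& scheme) {
    assert(!scheme.empty());
    standard_.push_back(Lower(scheme));
  }

  // Decides which kind of parsing applies to `spec`, based on its scheme.
  ToyUrlKind Classify(const std::string& spec) const {
    const std::size_t colon = spec.find(':');
    if (colon == std::string::npos) return ToyUrlKind::kPath;
    const std::string scheme = Lower(spec.substr(0, colon));
    if (scheme == "file") return ToyUrlKind::kFile;
    if (std::find(standard_.begin(), standard_.end(), scheme) !=
        standard_.end()) {
      return ToyUrlKind::kStandard;
    }
    return ToyUrlKind::kPath;
  }

 private:
  // ASCII-only lowercasing so scheme comparison is case-insensitive.
  static std::string Lower(std::string s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
      if (s[i] >= 'A' && s[i] <= 'Z') {
        s[i] = static_cast<char>(s[i] - 'A' + 'a');
      }
    }
    return s;
  }

  std::vector<std::string> standard_;
};
```

In this sketch, "myapp://host/x" is classified as kPath until the application calls RegisterStandardScheme("myapp"), after which it is classified as kStandard and would be parsed with host and path components.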
GURL and Origin
At the highest layer, a C++ object for representing URLs is provided. This
object uses STL. Most uses need only this layer. Include url/gurl.h.
Also at this layer is the Origin object, which exists to make security
decisions on the web. Include url/origin.h.
Historical background
This code was originally a separate library that was designed to be embedded into both Chrome (which uses STL) and WebKit (which didn't use any STL at the time). As a result, the parsing, canonicalization, and utility code could not use STL, or any other common code in Chromium like base.
When WebKit was forked into the Chromium repo and renamed Blink, this restriction was relaxed somewhat. Blink still provides its own URL object using its own string type, so the insulation that the Utility layer provides is still useful. But some STL strings and calls to base functions have gradually been added in places where doing so is possible.