src/headless
Johannes Henkel c29bddc20d [DevTools] Roll inspector_protocol (Chromium)
Upstream PR:
Introduce a crdtp/dispatch.{h,cc} library.
https://chromium-review.googlesource.com/c/deps/inspector_protocol/+/1974680

New Rev: 8c2064ea99cffdb5647760ad3953c43e612e07fc

Notable downstream changes:
- ChromeDevToolsManagerDelegate::HandleCommand no longer carries
  a method. Reason being, the shallow parser (crdtp::Dispatchable)
  should be efficient enough to parse an incoming message a couple
  of times, e.g. once for the content layer and once for the
  embedder, and if we felt differently then we'd quite possibly
  want to carry more than just the method - e.g., we'd want to also
  pass the params. Anyway, for now simplifying this interface.
- crdtp::FrontendChannel::FallThrough; here, it's advantageous
  to keep the method around, but now it's a crdtp::span. This is
  much better than const std::string& because the generated code
  knows exactly which method is going to fall through and we
  can pass a C++ string literal via this span.
- The crdtp/dispatch library presents a somewhat different surface
  between the UberDispatcher and the session implementations.
  The session implementations are responsible for creating a
  crdtp::Dispatchable instance (the shallow parser) which
  it then hands to the dispatcher for dispatching. Rather than
  querying for whether it can dispatch and then doing it,
  the result of the Dispatch indicates whether a method was
  found and can be executed. There's no more need to instantiate
  protocol::Value in a devtools session class.
- Since the dispatch library uses crdtp::span to represent
  method names, we no longer need to reference platform specific
  routines for finding strings and making substrings. As a result,
  v8_inspector_string.h is losing a few more methods
  (and same for base_string_adapter_h.template).
- crdtp::DispatchResponse (also known as protocol::Response) has
  some renames for consistency (
  Response::Error -> Response::ServerError,
  Response::OK -> Response::Success). Touches all domain handlers
  but is mechanical.
- All protocol error messages, such as the parameters passed
  to DispatchResponse::ServerError are required to be UTF8 strings,
  even for Blink, so when code generates them as WTF::String,
  we convert them to std::string using WTF::String::Utf8()
  before passing them to DispatchResponse::ServerError.
- We're better about checking messages and sending errors when we
  can't parse them - esp. we no longer drop messages on the floor
  if we can't make sense of them; the LOG statements are gone,
  because we can either send an error or assume that we've
  previously parsed the message elsewhere and put a DCHECK (e.g.,
  a message received by blink has always been shallow-parsed by the
  browser before).
- DevToolsAgentHost::DispatchProtocolMessage no longer has a boolean
  return value. Reason being, it's not well defined what the
  boolean should indicate. If we reject the message and send
  an error, arguably we've also handled it. And of course, we
  always do that (now). So it's not useful to generate or check
  this return value, and there is only one DCHECK on it currently,
  which is also not covered by tests.
- content::DevToolsSession uses binary searches to match method names
  (e.g. in ShouldSendOnIO).
  This is because I've switched the method names to span, and
  implemented similar searches for the dispatching library, so
  it's best to be consistent. It will also scale better if we add
  more methods.
- The additional unittests added to the CRDTP library upstream cover
  shallow parsing, dispatching, etc. and are now also part of
  the content_unittests.
- Improves Android binary size by about 36k, by reducing code
  duplication. There's now just one UberDispatcher, for example,
  as opposed to one for blink, one for content, one for headless,
  one for chromium, ...
- Speeds up execution by 1-2% (based on internal measurements),
  because message serialization no longer takes a detour via
  protocol::Value.

Change-Id: I422fe527d6f8a6ffb098b3992728ecba408b571f
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2047966
Reviewed-by: Dmitry Gozman <dgozman@chromium.org>
Reviewed-by: Ganggui Tang <gogerald@chromium.org>
Reviewed-by: Andrey Kosyakov <caseq@chromium.org>
Reviewed-by: Leonard Grey <lgrey@chromium.org>
Commit-Queue: Johannes Henkel <johannes@chromium.org>
Cr-Commit-Position: refs/heads/master@{#750284}
2020-03-13 20:04:15 +00:00

Headless Chromium

Headless Chromium allows running Chromium in a headless/server environment. Expected use cases include loading web pages, extracting metadata (e.g., the DOM) and generating bitmaps from page contents -- using all the modern web platform features provided by Chromium and Blink.

There are two ways to use Headless Chromium:

Usage via the DevTools remote debugging protocol

  1. Start a normal Chrome binary with the --headless command line flag (Linux-only for now):
$ chrome --headless --remote-debugging-port=9222 https://chromium.org

Currently you'll also need to use --disable-gpu to avoid an error from a missing Mesa library.

  2. Navigate to http://localhost:9222 in another browser to open the DevTools interface or use a tool such as Selenium to drive the headless browser.
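
The remote debugging port also serves a JSON list of debuggable targets at /json/list, which can be used to find the WebSocket URL for a tab without opening the DevTools interface. The following is a minimal sketch rather than an official client: listTargets and firstPageTarget are hypothetical helper names, and sampleTargets is an illustrative payload, not output captured from a real browser.

```javascript
const http = require('http');

// Fetch and parse the target list from a running browser.
// Assumes the browser was started with --remote-debugging-port=<port>.
function listTargets(port) {
  return new Promise((resolve, reject) => {
    http.get(`http://localhost:${port}/json/list`, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        try {
          resolve(JSON.parse(body));
        } catch (err) {
          reject(err);
        }
      });
    }).on('error', reject);
  });
}

// Hypothetical helper: return the first target of type "page" that
// exposes a WebSocket debugger URL, or null if there is none.
function firstPageTarget(targets) {
  return targets.find((t) => t.type === 'page' && t.webSocketDebuggerUrl) || null;
}

// Illustrative sample only; real responses carry additional fields.
const sampleTargets = [
  {
    id: 'A1',
    type: 'service_worker',
    url: 'https://example.com/sw.js',
    webSocketDebuggerUrl: 'ws://localhost:9222/devtools/page/A1',
  },
  {
    id: 'B2',
    type: 'page',
    url: 'https://chromium.org/',
    webSocketDebuggerUrl: 'ws://localhost:9222/devtools/page/B2',
  },
];

console.log(firstPageTarget(sampleTargets).webSocketDebuggerUrl);
```

Against a live browser, the two helpers can be chained as listTargets(9222).then(firstPageTarget).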

Usage from Node.js

For example, the chrome-remote-interface Node.js package can be used to extract a page's DOM like this:

const CDP = require('chrome-remote-interface');

CDP((client) => {
  // Extract used DevTools domains.
  const {Page, Runtime} = client;

  // Enable events on domains we are interested in.
  Promise.all([
    Page.enable()
  ]).then(() => {
    return Page.navigate({url: 'https://example.com'});
  });

  // Evaluate outerHTML after page has loaded.
  Page.loadEventFired(() => {
    Runtime.evaluate({expression: 'document.body.outerHTML'}).then((result) => {
      console.log(result.result.value);
      client.close();
    });
  });
}).on('error', (err) => {
  console.error('Cannot connect to browser:', err);
});
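
Under the hood, calls such as Page.navigate are sent over the DevTools WebSocket as JSON messages. A command carries an id, a method and params; the matching response echoes the id with either a result or an error; events carry a method but no id. The sketch below illustrates that framing only. makeCommand and classify are hypothetical helpers, not part of chrome-remote-interface.

```javascript
// Monotonically increasing command id, used to match responses to commands.
let nextId = 1;

// Serialize a DevTools command such as Page.navigate.
function makeCommand(method, params = {}) {
  return JSON.stringify({id: nextId++, method, params});
}

// Classify an incoming message by looking only at its top-level fields.
function classify(raw) {
  const msg = JSON.parse(raw);
  if (typeof msg.id === 'number') {
    return msg.error ? 'error' : 'response';
  }
  if (typeof msg.method === 'string') {
    return 'event';
  }
  return 'unknown';
}

const navigate = makeCommand('Page.navigate', {url: 'https://example.com'});
console.log(navigate);
console.log(classify('{"id":1,"result":{}}'));
console.log(classify('{"method":"Page.loadEventFired","params":{"timestamp":1}}'));
```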

Usage as a C++ library

Headless Chromium can be built as a library for embedding into a C++ application. This approach is otherwise similar to controlling the browser over a DevTools connection, but it provides more customization points, e.g., for networking and mojo services.

Headless Example is a small sample application which demonstrates the use of the headless C++ API. It loads a web page and outputs the resulting DOM. To run it, first initialize a headless build configuration:

$ mkdir -p out/Debug
$ echo 'import("//build/args/headless.gn")' > out/Debug/args.gn
$ gn gen out/Debug

Then build the example:

$ ninja -C out/Debug headless_example

After the build completes, the example can be run with the following command:

$ out/Debug/headless_example https://www.google.com

Headless Shell is a more capable headless application. For instance, it supports remote debugging with the DevTools protocol. To do this, start the application with an argument specifying the debugging port:

$ ninja -C out/Debug headless_shell
$ out/Debug/headless_shell --remote-debugging-port=9222 https://youtube.com

Then navigate to http://localhost:9222 with your browser.

Embedder API

The embedder API allows developers to integrate the headless library into their application. The API provides default implementations for low level adaptation points such as networking and the run loop.

The main embedder API classes are:

  • HeadlessBrowser::Options::Builder - Defines the embedding options, e.g.:
    • SetMessagePump - Replaces the default base message pump. See base::MessagePump.
    • SetProxyServer - Configures an HTTP/HTTPS proxy server to be used for accessing the network.

Client/DevTools API

The headless client API is used to drive the browser and interact with loaded web pages. Its main classes are:

  • HeadlessBrowser - Represents the global headless browser instance.
  • HeadlessWebContents - Represents a single "tab" within the browser.
  • HeadlessDevToolsClient - Provides a C++ interface for inspecting and controlling a tab. The API functions correspond to DevTools commands. See the client API documentation for more information.

Resources and Documentation

Mailing list: headless-dev@chromium.org

Bug tracker: Internals>Headless

File a new bug (bit.ly/2pP6SBb)