src/headless
Yoshifumi Inoue 72d438ee55 Implement Element#innerText to conform to the spec
NOT READY FOR COMMIT
To commit this patch, I need to:
 - Rebaseline 9500+ layout files for linux, mac, win, since the existing
 implementation doesn't conform to the spec[1]
 - Fix DOM distiller bug[3]
  It depends on textContent(true) to have a newline for <br>
  https://github.com/chromium/dom-distiller/issues/10
  WebTextTest.testGenerateOutputBRElements should have a spec-compliant test
  expectation
 - Fix CrSettingsSiteDetailsPermissionTest.All
   change expectations to have <option>s


This patch implements Element#innerText to conform to the spec[1].
With it, the number of WPT failures drops from 78 to 6 out of 213 test cases.

The design doc is https://goo.gl/VW9xxe.

The differences from the current implementation are:
 - No more leading/trailing newlines
 - No more trailing whitespace
 - At most two newlines between sequences of <p> and <div>.
 - Contents of <select>, <optgroup> and <option> are included in the result.
 - No newline for <br> in a disconnected element.

Note: Handling of <select>, <optgroup> and <option> doesn't conform to the
spec[1], since the spec[1] requires implementing Element#innerText-specific
CSS handling (::first-line, ::first-letter, text-transform, etc.) for the
contents of <option>. I filed the issue[2].


[1] https://html.spec.whatwg.org/multipage/dom.html#the-innertext-idl-attribute
[2] https://github.com/whatwg/html/issues/3797 innerText for <select>, <optgroup> and <option>

TBR=alexmos@chromium.org
TBR=dpapad@chromium.org
TBR=dmazzoni@chromium.org
TBR=skyostil@chromium.org

Bug: 651764, 859410
Cq-Include-Trybots: luci.chromium.try:linux_layout_tests_layout_ng
Change-Id: I48a02db0347d8ebd189f3ef608b31a4a93d89e84
Reviewed-on: https://chromium-review.googlesource.com/1114673
Commit-Queue: Yoshifumi Inoue <yosin@chromium.org>
Reviewed-by: Sami Kyöstilä <skyostil@chromium.org>
Reviewed-by: Yoshifumi Inoue <yosin@chromium.org>
Reviewed-by: Kent Tamura <tkent@chromium.org>
Reviewed-by: Yoichi Osato <yoichio@chromium.org>
Cr-Commit-Position: refs/heads/master@{#585756}
2018-08-24 08:17:29 +00:00

Headless Chromium

Headless Chromium allows running Chromium in a headless/server environment. Expected use cases include loading web pages, extracting metadata (e.g., the DOM) and generating bitmaps from page contents -- using all the modern web platform features provided by Chromium and Blink.

There are two ways to use Headless Chromium:

Usage via the DevTools remote debugging protocol

  1. Start a normal Chrome binary with the --headless command line flag (Linux-only for now):
$ chrome --headless --remote-debugging-port=9222 https://chromium.org

Currently you'll also need to use --disable-gpu to avoid an error from a missing Mesa library.

  2. Navigate to http://localhost:9222 in another browser to open the DevTools interface or use a tool such as Selenium to drive the headless browser.
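
Besides the DevTools interface, the debugging port also serves a JSON list of debuggable targets over HTTP. The following is a minimal Node.js sketch, assuming the /json/list endpoint on the host and port used above. The helper names (targetListUrl, pageTargets, listPages) are hypothetical and not part of any Chromium API.

```javascript
// Sketch: list the debuggable targets exposed by a browser started with
// --remote-debugging-port. Host and port defaults are assumptions taken
// from the command line shown above.
const http = require('http');

// Build the URL of the target list endpoint.
function targetListUrl(host = 'localhost', port = 9222) {
  return `http://${host}:${port}/json/list`;
}

// Keep only targets of type "page" (i.e. tabs).
function pageTargets(targets) {
  return targets.filter((t) => t.type === 'page');
}

// Fetch and parse the target list; resolves to an array of page targets.
function listPages(host, port) {
  return new Promise((resolve, reject) => {
    http.get(targetListUrl(host, port), (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        try {
          resolve(pageTargets(JSON.parse(body)));
        } catch (err) {
          reject(err);
        }
      });
    }).on('error', reject);
  });
}
```

With a headless browser running, something like listPages().then((pages) => pages.forEach((p) => console.log(p.url, p.webSocketDebuggerUrl))) should print one line per open tab.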

Usage from Node.js

For example, the chrome-remote-interface Node.js package can be used to extract a page's DOM like this:

const CDP = require('chrome-remote-interface');

CDP((client) => {
  // Extract used DevTools domains.
  const {Page, Runtime} = client;

  // Enable events on domains we are interested in.
  Promise.all([
    Page.enable()
  ]).then(() => {
    return Page.navigate({url: 'https://example.com'});
  });

  // Evaluate outerHTML after page has loaded.
  Page.loadEventFired(() => {
    Runtime.evaluate({expression: 'document.body.outerHTML'}).then((result) => {
      console.log(result.result.value);
      client.close();
    });
  });
}).on('error', (err) => {
  console.error('Cannot connect to browser:', err);
});
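
The same extraction can be written with async/await. This is a minimal sketch, assuming the promise-based interface of chrome-remote-interface (calling it without a callback yields a promise for the client, and calling an event method such as Page.loadEventFired() without a callback yields a promise for the next event). Passing the connect function in as a parameter is an illustrative choice, and extractOuterHtml is a hypothetical name.

```javascript
// Sketch: async/await variant of the DOM extraction example.
// `connect` is expected to be the chrome-remote-interface export, or any
// function returning a promise for a compatible client (assumption).
async function extractOuterHtml(connect, url) {
  const client = await connect();
  try {
    // Extract used DevTools domains.
    const {Page, Runtime} = client;
    await Page.enable();
    // Start waiting for the load event before navigating so it is not missed.
    const loaded = Page.loadEventFired();
    await Page.navigate({url});
    await loaded;
    // Evaluate outerHTML after the page has loaded.
    const {result} = await Runtime.evaluate({
      expression: 'document.body.outerHTML',
    });
    return result.value;
  } finally {
    // Always release the DevTools connection, even on failure.
    await client.close();
  }
}
```

A call such as extractOuterHtml(require('chrome-remote-interface'), 'https://example.com').then(console.log, console.error) would then print the page's DOM or the connection error.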

Usage as a C++ library

Headless Chromium can be built as a library for embedding into a C++ application. This approach is otherwise similar to controlling the browser over a DevTools connection, but it provides more customization points, e.g., for networking and Mojo services.

Headless Example is a small sample application which demonstrates the use of the headless C++ API. It loads a web page and outputs the resulting DOM. To run it, first initialize a headless build configuration:

$ mkdir -p out/Debug
$ echo 'import("//build/args/headless.gn")' > out/Debug/args.gn
$ gn gen out/Debug

Then build the example:

$ ninja -C out/Debug headless_example

After the build completes, the example can be run with the following command:

$ out/Debug/headless_example https://www.google.com

Headless Shell is a more capable headless application. For instance, it supports remote debugging with the DevTools protocol. To do this, start the application with an argument specifying the debugging port:

$ ninja -C out/Debug headless_shell
$ out/Debug/headless_shell --remote-debugging-port=9222 https://youtube.com

Then navigate to http://localhost:9222 with your browser.

Embedder API

The embedder API allows developers to integrate the headless library into their application. The API provides default implementations for low-level adaptation points such as networking and the run loop.

The main embedder API classes are:

  • HeadlessBrowser::Options::Builder - Defines the embedding options, e.g.:
    • SetMessagePump - Replaces the default base message pump. See base::MessagePump.
    • SetProxyServer - Configures an HTTP/HTTPS proxy server to be used for accessing the network.

Client/DevTools API

The headless client API is used to drive the browser and interact with loaded web pages. Its main classes are:

  • HeadlessBrowser - Represents the global headless browser instance.
  • HeadlessWebContents - Represents a single "tab" within the browser.
  • HeadlessDevToolsClient - Provides a C++ interface for inspecting and controlling a tab. The API functions correspond to DevTools commands. See the client API documentation for more information.

Resources and Documentation

Mailing list: headless-dev@chromium.org

Bug tracker: Internals>Headless

File a new bug (bit.ly/2pP6SBb)