Add more topics to accessiiblity documentation.

Thanks to Elly for writing a great first draft of
technical documentation. I took a stab at introducing
a lot of the key higher-level concepts that provide
important context for understanding the rest of the
code.

BUG=none
NOTRY=true

Review-Url: https://codereview.chromium.org/2478083002
Cr-Commit-Position: refs/heads/master@{#430033}

This commit is contained in:

dmazzoni

2016-11-04 15:17:06 -07:00

committed by

Commit bot

parent 01538e07d5

commit c6ff030caf

1 changed files with 384 additions and 27 deletions

									
										411

docs/accessibility.md
									
				@ -1,55 +1,412 @@

				# Accessibility Overview

				This document describes how accessibility is implemented throughout Chromium at

				a high level.

				Accessibility means ensuring that all users, including users with disabilities,

				have equal access to software. One piece of this involves basic design

				principles such as using appropriate font sizes and color contrast,

				avoiding using color to convey important information, and providing keyboard

				alternatives for anything that is normally accomplished with a pointing device.

				However, when you see the word "accessibility" in a directory name in Chromium,

				that code's purpose is to provide full access to Chromium's UI via external

				accessibility APIs that are utilized by assistive technology.

				**Assistive technology** here refers to software or hardware which

				makes use of these APIs to create an alternative interface for the user to

				accommodate some specific needs, for example:

				Assistive technology includes:

				* Screen readers for blind users that describe the screen using

				  synthesized speech or braille

				* Voice control applications that let you speak to the computer,

				* Switch access that lets you control the computer with a small number

				  of physical switches,

				* Magnifiers that magnify a portion of the screen, and often highlight the

				  cursor and caret for easier viewing, and

				* Assistive learning and literacy software that helps users who have a hard

				  time reading print, by highlighting and/or speaking selected text

				In addition, because accessibility APIs provide a convenient and universal

				way to explore and control applications, they're often used for automated

				testing scripts, and UI automation software like password managers.

				Web browsers play an important role in this ecosystem because they need

				to not only provide access to their own UI, but also provide access to

				all of the content of the web.

				Each operating system has its own native accessibility API. While the

				core APIs tend to be well-documented, it's unfortunately common for

				screen readers in particular to depend on additional undocumented or

				vendor-specific APIs in order to fully function, especially with web

				browsers, because the standard APIs are insufficient to handle the

				complexity of the web.

				Chromium needs to support all of these operating system and

				vendor-specific accessibility APIs in order to be usable with the full

				ecosystem of assistive technology on all platforms. Just like Chromium

				sometimes mimics the quirks and bugs of older browsers, Chromium often

				needs to mimic the quirks and bugs of other browsers' implementation

				of accessibility APIs, too.

				## Concepts

				The three central concepts of accessibility are:

				While each operating system and vendor accessibility API is different,

				there are some concepts all of them share.

				1. The *tree*, which models the entire interface as a tree of objects, exposed

				   to screenreaders or other accessibility software;

				2. *Events*, which let accessibility software know that a part of the tree has

				   to assistive technology via accessibility APIs;

				2. *Events*, which let assistive technology know that a part of the tree has

				   changed somehow;

				3. *Actions*, which come from accessibility software and ask the interface to

				3. *Actions*, which come from assistive technology and ask the interface to

				   change.

				Here's an example of an accessibility tree looks like. The following HTML:

				Consider the following small HTML file:

				```

				<select title="Select A">

				  <option value="1">Option 1</option>

				  <option value="2" selected>Option 2</option>

				  <option value="3">Option 3</option>

				</select>

				<html>

				<head>

				  <title>How old are you?</title>

				</head>

				<body>

				  <label for="age">Age</label>

				  <input id="age" type="number" name="age" value="42">

				  <div>

				    <button>Back</button>

				    <button>Next</button>

				  </div>

				</body>

				</html>

				```

				has a generated accessibility tree like this:

				### The Accessibility Tree and Accessibility Attributes

				Internally, Chromium represents the accessibility tree for that web page

				using a data structure something like this:

				```

				0: AXMenuList title="Select A"

				1:   AXMenuListOption title="Option 1"

				2:   AXMenuListOption title="Option 2" selected

				3:   AXMenuListOption title="Option 3"

				id=1 role=WebArea name="How old are you?"

				    id=2 role=Label name="Age"

				    id=3 role=TextField labelledByIds=[2] value="42"

				    id=4 role=Group

				        id=5 role=Button name="Back"

				        id=6 role=Button name="Next"

				```

				Given that accessibility tree, an example of the events generated when selecting

				"Option 1" might be:

				Note that the tree structure closely resembles the structure of the

				HTML elements, but slightly simplified. Each node in the accessibility

				tree has an ID and a role. Many have a name. The text field has a value,

				and instead of a name it has labelledByIds, which indicates that its

				accessible name comes from another node in the tree, the label node

				with id=2.

				On a particular platform, each node in the accessibility tree is implemented

				by an object that conforms to a particular protocol.

				On Windows, the root node implements the IAccessible protocol and

				if you call IAccessible::get_accRole, it returns ROLE_SYSTEM_DOCUMENT,

				and if you call IAccessible::get_accName, it returns "How old are you?".

				Other methods let you walk the tree.

				On macOS, the root node implements the NSAccessibility protocol and

				if you call [NSAccessibility accessibilityRole], it returns @"AXWebArea",

				and if you call [NSAccessibility accessibilityLabel], it returns

				"How old are you?".

				The Linux accessibility API, ATK, is more similar to the Windows APIs;

				they were developed together. (Chrome's support for desktop Linux

				accessibility is unfinished.)

				The Android accessibility API is of course based on Java. The main

				data structure is AccessibilityNodeInfo. It doesn't have a role, but

				if you call AccessibilityNodeInfo.getClassName() on the root node

				it returns "android.webkit.WebView", and if you call

				AccessibilityNodeInfo.getContentDescription() it returns "How old are you?".

				On Chrome OS, we use our own accessibility API that closely maps to

				Chrome's internal accessibility API.

				So while the details of the interface vary, the underlying concepts are

				similar. Both IAccessible and NSAccessibility have a concept of a role,

				but IAccessible uses a role of "document" for a web page, while NSAccessibility

				uses a role of "web area". Both IAccessible and NSAccessibility have a

				concept of the primary accessible text for a node, but IAccessible calls

				it the "name" while NSAccessibility calls it the "label", and Android

				calls it a "content description".

				**Historical note:** The internal names of roles and attributes in

				Chrome often tend to most closely match the macOS accessibility API

				because Chromium was originally based on WebKit, where most of the

				accessibility code was written by Apple. Over time we're slowly

				migrating internal names to match what those roles and attributes are

				called in web accessibility standards, like ARIA.

				### Accessibility Events

				In Chromium's internal terminology, an Accessibility Event always represents

				communication from the app to the assistive technology, indicating that the

				accessibility tree changed in some way.

				As an example, if the user were to press the Tab key and the text

				field from the example above became focused, Chromium would fire a

				"focus" accessibility event that assistive technology could listen

				to. A screen reader might then announce the name and current value of

				the text field. A magnifier might zoom the screen to its bounding

				box. If the user types some text into the text field, Chromium would

				fire a "value changed" accessibility event.

				As with nodes in the accessibility tree, each platform has a slightly different

				API for accessibility events. On Windows we'd fire EVENT_OBJECT_FOCUS for

				a focus change, and on Mac we'd fire @"AXFocusedUIElementChanged".

				Those are pretty similar. Sometimes they're quite different - to support

				live regions (notifications that certain key parts of a web page have changed),

				on Mac we simply fire @"AXLiveRegionChanged", but on Windows we need to

				fire IA2_EVENT_TEXT_INSERTED and IA2_EVENT_TEXT_REMOVED events individually

				on each affected node within the changed region, with additional attributes

				like "container-live:polite" to indicate that the affected node was part of

				a live region. This discussion is not meant to explain all of the technical

				details but just to illustrate that the concepts are similar,

				but the details of notifying software on each platform about changes can

				vary quite a bit.

				### Accessibility Actions

				Each native object that implements a platform's native accessibility API

				supports a number of actions, which are requests from the assistive

				technology to control or change the UI. This is the opposite of events,

				which are messages from Chromium to the assistive technology.

				For example, if the user had a voice control application running, such as

				Voice Access on Android, the user could just speak the name of one of the

				buttons on the page, like "Next". Upon recognizing that text and finding

				that it matches one of the UI elements on the page, the voice control

				app executes the action to click the button id=6 in Chromium's accessibility

				tree. Internally we call that action "do default" rather than click, since

				it represents the default action for any type of control.

				Other examples of actions include setting focus, changing the value of

				a control, and scrolling the page.

				### Parameterized attributes

				In addition to accessibility attributes, events, and actions, native

				accessibility APIs often have so-called "parameterized attributes".

				The most common example of this is for text - for example there may be

				a function to retrieve the bounding box for a range of text, or a

				function to retrieve the text properties (font family, font size,

				weight, etc.) at a specific character position.

				Parameterized attributes are particularly tricky to implement because

				of Chromium's multi-process architecture. More on this in the next section.

				## Chromium's multi-process architecture

				Native accessibility APIs tend to have a *functional* interface, where

				Chromium implements an interface for a canonical accessible object that

				includes methods to return various attributes, walk the tree, or perform

				an action like click(), focus(), or setValue(...).

				In contrast, the web has a largely *declarative* interface. The shape

				of the accessibility tree is determined by the DOM tree (occasionally

				influenced by CSS), and the accessible semantics of a DOM element can

				be modified by adding ARIA attributes.

				One important complication is that all of these native accessibility APIs

				are *synchronous*, while Chromium is multi-process, with the contents of

				each web page living in a different process than the process that

				implements Chromium's UI and the native accessibility APIs. Furthermore,

				the renderer processes are *sandboxed*, so they can't implement

				operating system APIs directly.

				If you're unfamiliar with Chrome's multi-process architecture, see

				[this blog post introducing the concept](

				https://blog.chromium.org/2008/09/multi-process-architecture.html) or

				[the design doc on chromium.org](

				https://www.chromium.org/developers/design-documents/multi-process-architecture)

				for an intro.

				Chromium's multi-process architecture means that we can't implement

				accessibility APIs the same way that a single-process browser can -

				namely, by calling directly into the DOM to compute the result of each

				API call. For example, on some operating systems there might be an API

				to get the bounding box for a particular range of characters on the

				page.  In other browsers, this might be implemented by creating a DOM

				selection object and asking for its bounding box.

				That implementation would be impossible in Chromium because it'd require

				blocking the main thread while waiting for a response from the renderer

				process that implements that web page's DOM. (Not only is blocking the

				main thread strictly disallowed, but the latency of doing this for every

				API call makes it prohibitively slow anyway.) Instead, Chromium takes an

				approach where a representation of the entire accessibility tree is

				cached in the main process. Great care needs to be taken to ensure that

				this representation is as concise as possible.

				In Chromium, we build a data structure representing all of the

				information for a web page's accessibility tree, send the data

				structure from the renderer process to the main browser process, cache

				it in the main browser process, and implement native accessibility

				APIs using solely the information in that cache.

				As the accessibility tree changes, tree updates and accessibility events

				get sent from the renderer process to the browser process. The browser

				cache is updated atomically in the main thread, so whenever an external

				client (like assistive technology) calls an accessibility API function,

				we're always returning something from a complete and consistent snapshot

				of the accessibility tree. From time to time, the cache may lag what's

				in the renderer process by a fraction of a second.

				Here are some of the specific challenges faced by this approach and

				how we've addressed them.

				### Sparse data

				There are a *lot* of possible accessibility attributes for any given

				node in an accessibility tree. For example, there are more than 150

				unique accessibility API methods that Chrome implements on the Windows

				platform alone. We need to implement all of those APIs, many of which

				request rather rare or obscure attributes, but storing all possible

				attribute values in a single struct would be quite wasteful.

				To avoid each accessible node object containing hundreds of fields the

				data for each accessibility node is stored in a relatively compact

				data structure, ui::AXNodeData. Every AXNodeData has an integer ID, a

				role enum, and a couple of other mandatory fields, but everything else

				is stored in attribute arrays, one for each major data type.

				```

				AXMenuListItemUnselected 2

				AXMenuListItemSelected 1

				AXMenuListValueChanged 0

				struct AXNodeData {

				  int32_t id;

				  AXRole role;

				  ...

				  std::vector<std::pair<AXStringAttribute, std::string>> string_attributes;

				  std::vector<std::pair<AXIntAttribute, int32_t>> int_attributes;

				  ...

				}

				```

				An example of a command used to change the selection from "Option 1" to "Option

				3" might be:

				So if a text field has a placeholder attribute, we can store

				that by adding an entry to `string_attributes` with an attribute

				of ui::AX_ATTR_PLACEHOLDER and the placeholder string as the value.

				### Incremental tree updates

				Web pages change frequently. It'd be terribly inefficient to send a

				new copy of the accessibility tree every time any part of it changes.

				However, the accessibility tree can change shape in complicated ways -

				for example, whole subtrees can be reparented dynamically.

				Rather than writing code to deal with every possible way the

				accessibility tree could be modified, Chromium has a general-purpose

				tree serializer class that's designed to send small incremental

				updates of a tree from one process to another. The tree serializer has

				just a few requirements:

				* Every node in the tree must have a unique integer ID.

				* The tree must be acyclic.

				* The tree serializer must be notified when a node's data changes.

				* The tree serializer must be notified when the list of child IDs of a

				  node changes.

				The tree serializer doesn't know anything about accessibility attributes.

				It keeps track of the previous state of the tree, and every time the tree

				structure changes (based on notifications of a node changing or a node's

				children changing), it walks the tree and builds up an incremental tree

				update that serializes as few nodes as possible.

				In the other process, the Unserialization code applies the incremental

				tree update atomically.

				### Text bounding boxes

				One challenge faced by Chromium is that accessibility clients want to be

				able to query the bounding box of an arbitrary range of text - not necessarily

				just the current cursor position or selection. As discussed above, it's

				not possible to block Chromium's main browser process while waiting for this

				information from Blink, so instead we cache enough information to satisfy these

				queries in the accessibility tree.

				To compactly store the bounding box of every character on the page, we

				split the text into *inline text boxes*, sometimes called *text runs*.

				For example, in a typical paragraph, each line of text would be its own

				inline text box. In general, an inline text box or text run contians a

				sequence of text characters that are all oriented in the same direction,

				in a line, with the same font, size, and style.

				Each inline text box stores its own bounding box, and then the relative

				x-coordinate of each character in its text (assuming left-to-right).

				From that it's possible to compute the bounding box

				of any individual character.

				The inline text boxes are part of Chromium's internal accessibility tree.

				They're used purely internally and aren't ever exposed directly via any

				native accessibility APIs.

				For example, suppose that a document contains a text field with the text

				"Hello world", but the field is narrow, so "Hello" is on the first line and

				"World" is on the second line. Internally Chromium's accessibility tree

				might look like this:

				```

				AccessibilityMsg_DoDefaultAction 3

				staticText location=(8, 8) size=(38, 36) name='Hello world'

				    inlineTextBox location=(0, 0) size=(36, 18) name='Hello ' characterOffsets=12,19,23,28,36

				    inlineTextBox location=(0, 18) size=(38, 18) name='world' characterOffsets=12,20,25,29,37

				```

				All three concepts are handled at several layers in Chromium.

				### Scrolling, transformations, and animation

				Native accessibility APIs typically want the bounding box of every element in the

				tree, either in window coordinates or global screen coordinates. If we

				stored the global screen coordinates for every node, we'd be constantly

				re-serializing the whole tree every time the user scrolls or drags the

				window.

				Instead, we store the bounding box of each node in the accessibility tree

				relative to its *offset container*, which can be any ancestor. If no offset

				container is specified, it's assumed to be the root of the tree.

				In addition, any offset container can contain scroll offsets, which can be

				used to scroll the bounding boxes of anything in that subtree.

				Finally, any offset container can also include an arbitrary 4x4 transformation

				matrix, which can be used to represent arbitrary 3-D rotations, translations, and

				scaling, and more. The transformation matrix applies to the whole subtree.

				Storing coordinates this way means that any time an object scrolls, moves, or

				animates its position and scale, only the root of the scrolling or animation

				needs to post updates to the accessibility tree. Everything in the subtree

				remains valid relative to that offset container.

				Computing the global screen coordinates for an object in the accessibility

				tree just means walking up its ancestor chain and applying offsets and

				occasionally multiplying by a 4x4 matrix.

				### Site isolation / out-of-process iframes

				At one point in time, all of the content of a single Tab or other web view

				was contained in the same Blink process, and it was possible to serialize

				the accessibility tree for a whole frame tree in a single pass.

				Today the situation is a bit more complicated, as Chromium supports

				out-of-process iframes. (It also supports "browser plugins" such as

				the `<webview>` tag in Chrome packaged apps, which embeds a whole

				browser inside a browser, but for the purposes of accessibility this

				is handled the same as frames.)

				Rather than a mix of in-process and out-of-process frames that are handled

				differently, Chromium builds a separate independent accessibility tree

				for each frame. Each frame gets its own tree ID, and it keeps track of

				the tree ID of its parent frame (if any) and any child frames.

				In Chrome's main browser process, the accessibility trees for each frame

				are cached separately, and when an accessibility client (assistive

				technology) walks the accessibility tree, Chromium dynamically composes

				all of the frames into a single virtual accessibility tree on the fly,

				using those aforementioned tree IDs.

				The node IDs for accessibility trees only need to be unique within a

				single frame. Where necessary, separate unique IDs are used within

				Chrome's main browser process. In Chromium accessibility, a "node ID"

				always means that ID that's only unique within a frame, and a "unique ID"

				means an ID that's globally unique.

				## Blink

				@ -106,7 +463,7 @@ usually forwarded to [BrowserAccessibilityManager] which is responsible for:

				On Chrome OS, RenderFrameHostImpl does not route events to

				BrowserAccessibilityManager at all, since there is no platform screenreader

				outside Chrome to integrate with.

				outside Chromium to integrate with.

				## Views

Add more topics to accessiiblity documentation.

411 docs/accessibility.md

411

docs/accessibility.md