[privacy_budget] Document guidance on selecting surfaces.
Bug: 973801
Change-Id: I28dee78a9b9a3d4eedbdc7c2a2f13292a4aade35
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2463449
Reviewed-by: Matt Menke <mmenke@chromium.org>
Commit-Queue: Matt Menke <mmenke@chromium.org>
Auto-Submit: Asanka Herath <asanka@chromium.org>
Cr-Commit-Position: refs/heads/master@{#821460}
committed by
Commit Bot

parent
a858ebf1ba
commit
d037c97b4c
docs/privacy_budget
286
docs/privacy_budget/good_identifiable_surface.md
Normal file
@ -0,0 +1,286 @@
# What's a Good Candidate For an IdentifiableSurface? {#good-surface}

Once you have a source of potentially identifying information picked out, you
need to determine how to represent the surface using
[`blink::IdentifiableSurface`].

The first step would be to determine what the surface *is* in the first place.

_If the surface were to be presented as a question that the document asks
a user-agent, what details should the question include in order for the answer
to be identifiable across a wide range of user-agents?_

Sometimes the question is straightforward. E.g. [`window.screenTop`] pretty much
captures the entire question. But it can get tricky as we'll see in the
examples below.

All the pieces of information that the document needs to present to the
user-agent in order to ask the identifiable question should be represented in
the [`blink::IdentifiableSurface`].

There are two broad categories of identifiable surfaces:

* **Direct Surfaces**: Surfaces accessible via looking up an attribute or
  invoking a parameter-less operation of a global singleton object.

  A *global singleton object* is an object that is effectively the only
  instance of its interface in a single execution context. If one were to
  start from the global object and follow a chain of attributes or
  parameter-less methods, all objects encountered along the way are global
  singleton objects. One could pluck the attribute or operation out of the
  last interface and stick it in the global object and there would be no
  semantic difference.

  In our `window.screenTop` example the global object exposes the `Window`
  interface. `Window.window` or just `window` is a reference back to this
  global object. The `Window` interface exposes the `screenTop` attribute. So
  `window.screenTop` is an expression that looks up an attribute of the global
  object.

  By convention direct surfaces are represented using their corresponding
  [`blink::WebFeature`]. All APIs that are direct identifiable surfaces should
  have corresponding [Use Counters] and hence corresponding `WebFeature`
  values.

  For `window.screenTop`, the resulting `IdentifiableSurface` constructor
  would look like this:

  ```cpp
  IdentifiableSurface::FromTypeAndToken(
      Type::kWebFeature,  // All direct surfaces use this type.
      WebFeature::WindowScreenTop)
  ```

  See [Instrumenting Direct Surfaces] for details on how to instrument
  these surfaces.

* **Indirect Surfaces**: A.k.a. everything else.

  [`HTMLMediaElement.canPlayType()`] takes a string indicating a MIME type and
  returns a vague indication of whether that media type is supported (its
  return value is one of `"probably"`, `"maybe"`, or `""`). So the question
  necessarily must include the MIME type.

  Hence the `IdentifiableSurface` constructor could look like this:

  ```cpp
  IdentifiableSurface::FromTypeAndToken(
      Type::kHTMLMediaElement_CanPlayType,
      IdentifiabilityBenignStringToken(mime_type))
  ```

  The [`blink::IdentifiableSurface`] includes:

  * That it represents an `HTMLMediaElement.canPlayType()` invocation.
    `Type::kHTMLMediaElement_CanPlayType` is a bespoke enum value that was
    introduced for this specific surface. See [Adding a Surface Type]
    for details on how to add a surface type.
  * The parameter that was passed to the operation.

    `IdentifiabilityBenignStringToken` is a helper function that calculates
    a digest of a "benign" string. See [Instrumentation] for more details on how
    to represent operation arguments.

The distinction between a direct surface and an indirect surface can sometimes
be fuzzy. But it's always based on what's known _a priori_ and what's practical
to measure. A `canPlayType("audio/ogg; codecs=vorbis")` query could just as
easily be represented as a `WebFeature` like
`MediaElementCanPlayType_Audio_Ogg_Codecs_Vorbis`. But:

* [This doesn't scale].
* The set of MIME types can be pretty large and changing.
* It's not possible to hardcode all possible values at coding time.
* Most of the values will be irrelevant to identifiability, but we don't know which ones.

All things considered, deriving a digest for the argument is much more
practical than alternatives.

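To make the digest approach concrete, here is a minimal, self-contained sketch. It does not use Blink's real classes: `SurfaceType`, `BenignStringDigest`, `MakeSurfaceKey`, and `CanPlayTypeSurface` are hypothetical stand-ins, 64-bit FNV-1a is only a placeholder for whatever digest the real helper computes, and packing the type into the low 8 bits is an assumption made for illustration.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for the surface type enum.
enum class SurfaceType : std::uint8_t {
  kWebFeature = 1,
  kCanPlayType = 2,
};

// 64-bit FNV-1a, used here only as a placeholder digest function.
constexpr std::uint64_t BenignStringDigest(std::string_view s) {
  std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ull;  // FNV prime
  }
  return h;
}

// Assumed layout: type in the low 8 bits, truncated digest in the rest.
constexpr std::uint64_t MakeSurfaceKey(SurfaceType type, std::uint64_t digest) {
  return (digest << 8) | static_cast<std::uint64_t>(type);
}

// Any MIME type string yields a surface key; nothing is enumerated up front.
constexpr std::uint64_t CanPlayTypeSurface(std::string_view mime_type) {
  return MakeSurfaceKey(SurfaceType::kCanPlayType,
                        BenignStringDigest(mime_type));
}
```

The point of the sketch is that a new MIME type needs no code change: `CanPlayTypeSurface("audio/ogg; codecs=vorbis")` and any future string both map to a key at run time, whereas a `WebFeature` per MIME type would need a new enum value each time.
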
### Example: NetworkInformation.saveData {#eg-net-effective-type}

The following expression yields whether the user-agent is operating under
a reduced data usage constraint (see [`NetworkInformation.saveData`]):

```js
navigator.connection.saveData
```

This is a [direct surface]. As such, constructing an `IdentifiableSurface` only
requires knowing the interface and name of the final attribute or operation in
the expression.

Hence the `IdentifiableSurface` is of type `kWebFeature` with a web feature
named `NetInfoSaveData` (which was a pre-existing [Use Counter][Use
counters]). I.e.:

```cpp
IdentifiableSurface::FromTypeAndToken(
    Type::kWebFeature,
    WebFeature::NetInfoSaveData)
```

### Example: Media Capabilities {#eg-media-capabilities}

The [Media Capabilities API] helps determine whether some media type is
supported. E.g.:

```js
await navigator.mediaCapabilities.decodingInfo({
  type: "file",
  audio: { contentType: "audio/mp3" }
});
```

In this case the script is specifying a [`MediaDecodingConfiguration`]
dictionary. The [`MediaCapabilitiesInfo`] object returned by [`decodingInfo()`]
depends on the input. Hence we have to capture the input in the
`IdentifiableSurface` as follows:

```cpp
IdentifiableSurface::FromTypeAndToken(
    // Assumed bespoke type; kHTMLMediaElement_CanPlayType is a different surface.
    Type::kMediaCapabilities_DecodingInfo,
    // Hypothetical digest of the entire MediaDecodingConfiguration.
    decoding_configuration_token)
```

See [Instrumentation] for more details on how to represent operation arguments
and caveats around encoding strings.

### Example: Media Streams API {#eg-media-streams}

Another more complicated example is this use of the [Media Streams API].

```js
var mediaStream = await navigator.mediaDevices.getUserMedia({video: {
  height: 240,
  width: 320
}});

var firstAudioTrack = mediaStream.getAudioTracks()[0];

var capabilities = firstAudioTrack.getCapabilities();
```

The target identifiable surface is the value of `capabilities`.

An important consideration here is that the [`MediaDevices.getUserMedia`]
operation involves user interaction.

In theory, if the `getUserMedia` operation is successful, the
`IdentifiableSurface` for `capabilities` should represent the following
artifacts (starting with the last global singleton object):

1. The operation [`MediaDevices.getUserMedia`].

1. A [`MediaStreamConstraints`] dictionary with value
   `{video: {height: 240, width: 320}}`.

1. The user action.

1. The operation [`MediaStream.getAudioTracks`] -- which is invoked on the
   result of the prior step assuming the operation succeeded.

1. The index `[0]` -- applied to the list of [`MediaStreamTrack`]s resulting
   from the previous step.

   > The Media Streams API does not specify the order of tracks. In general
   > where a spec doesn't state the ordering of a sequence, the ordering itself
   > can be a tracking concern. However in this case the implementation yields
   > at most one audio track after a successful `getUserMedia` invocation.
   > Hence there's no ordering concern here at least in Chromium.

1. The operation [`MediaStreamTrack.getCapabilities`] -- which is invoked on
   the result of the prior step.

However,

* The user action is not observable by the document. The only outcome exposed
  to the document is whether `getUserMedia()` returned a `MediaStream` or if
  the request was rejected due to some reason.

  It's not necessary to go beyond what the document can observe.

* If the call is successful the initial state of the resulting `MediaStream`
  determines the stable properties that a document can observe.

  The remaining accessors (e.g. `getAudioTracks()`, `getVideoTracks()` etc...)
  deterministically depend on the returned `MediaStream` with the exception of
  the indexing in step 5 which can be non-deterministic if there is more than
  one audio track.

  The diversity of document exposed state past step 3 is a subset of the
  diversity of the initial `MediaStream` object.

* If the call is rejected due to the request being over-constrained, then the
  exception could indicate limitations of the underlying devices.

Considering the above, we can tease apart multiple identifiable surfaces:

1. **VALID** The mapping from <`"MediaDevices.getUserMedia"` operation,
   `MediaStreamConstraints` instance> to <`Exception` instance> when
   the call rejects prior to any user interaction.

1. **OUT OF SCOPE** The mapping from <`"MediaDevices.getUserMedia"`
   operation, `MediaStreamConstraints` instance> to <time elapsed> when
   the call resolves.

   Timing vectors like this are outside the scope of the initial study.

1. **INFEASIBLE** The mapping from <`"MediaDevices.getUserMedia"` operation,
   `MediaStreamConstraints` instance, (user action)> to
   <`MediaStream` instance>.

   As mentioned earlier the user action is not exposed to the document. Hence
   we end up with an incomplete metric where the key doesn't have sufficient
   diversity to account for the outcomes.

1. **INFEASIBLE** The mapping from <`"MediaDevices.getUserMedia"` operation,
   `MediaStreamConstraints` instance, (user action)> to <`Exception`
   instance>.

   Same problem as above.

1. **VALID** The mapping from <`MediaStreamTrack` instance> to
   <`MediaTrackCapabilities` instance>.

   We can reason that this mapping is going to be surjective. The diversity of
   <`MediaTrackCapabilities` instance> is not going to add information.

   For simplicity surjective mappings can be collapsed into a single point
   without losing information. Thus the mapping here is just
   <`MediaStreamTrack` instance> to <`1`> where the value is arbitrary and
   doesn't matter.

1. **VALID** The mapping from <`MediaStreamTrack.label` string> to
   <`MediaTrackCapabilities` instance>.

   The label is a string like "Internal microphone" which can be presented to
   the user and assumed to be discerning enough that the user will find the
   string sufficient to identify the correct device.

The metrics we can derive from this surface are marked as **VALID**.

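One way to see why the **INFEASIBLE** entries are a problem is to check whether a key determines its outcome for a single client. The sketch below is purely illustrative and is not part of the study's tooling: `Sample` and `KeyDeterminesOutcome` are hypothetical names, and the string keys stand in for whatever digest would represent the surface.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// One observation from a single client: (surface key, observed outcome).
using Sample = std::pair<std::string, std::string>;

// Returns true if every key seen in `samples` maps to exactly one outcome.
// A key that omits a hidden input (e.g. the user action) fails this check,
// because the same key can be followed by different outcomes.
bool KeyDeterminesOutcome(const std::vector<Sample>& samples) {
  std::map<std::string, std::string> first_outcome;
  for (const auto& [key, outcome] : samples) {
    auto [it, inserted] = first_outcome.emplace(key, outcome);
    if (!inserted && it->second != outcome) {
      return false;  // Same key, different outcome: the key is incomplete.
    }
  }
  return true;
}
```

With a key built only from the operation and the constraints, the same client can observe both a `MediaStream` and a rejection depending on what the user clicked, so the check fails. A key such as the track label, which the outcome actually depends on, passes.
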
Constructing a digest out of any of the dictionary instances also requires some
care. Only include properties of each object that are expected to persist
across browsing contexts. For example, any identifier that is origin-local,
document-local, or depends on input from the document is not a good candidate.

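As a rough illustration of that rule, the sketch below builds a canonical string from only the properties assumed to be stable, which could then be fed to a digest helper. Everything here is hypothetical: `AudioTrackCapabilities` is a simplified stand-in rather than Blink's dictionary type, `StableCanonicalForm` is an invented name, and treating `group_id` as unstable is an assumption made for the example.

```cpp
#include <cstdint>
#include <string>

// Simplified, hypothetical view of an audio track's capabilities.
struct AudioTrackCapabilities {
  std::string device_id;  // Origin-local identifier: excluded from the digest.
  std::string group_id;   // Assumed unstable here: also excluded.
  std::uint32_t max_channel_count = 0;
  std::uint32_t max_sample_rate = 0;
};

// Serializes only the properties expected to persist across browsing
// contexts, in a fixed order so equal inputs give equal strings.
std::string StableCanonicalForm(const AudioTrackCapabilities& caps) {
  return "maxChannelCount=" + std::to_string(caps.max_channel_count) +
         ";maxSampleRate=" + std::to_string(caps.max_sample_rate);
}
```

Two documents on different origins that see the same microphone get different `device_id` values, but the canonical form, and therefore any digest derived from it, stays the same.
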
<!-- sort, case insensitive -->
[`blink::IdentifiableSurface`]: ../../third_party/blink/public/common/privacy_budget/identifiable_surface.h
[`blink::WebFeature`]: ../../third_party/blink/public/mojom/web_feature/web_feature.mojom
[`decodingInfo()`]: https://www.w3.org/TR/media-capabilities/#dom-mediacapabilities-decodinginfo
[`HTMLMediaElement.canPlayType()`]: https://developer.mozilla.org/en-US/docs/Web/API/HTMLMediaElement/canPlayType
[`MediaCapabilitiesInfo`]: https://www.w3.org/TR/media-capabilities/#dictdef-mediacapabilitiesinfo
[`MediaDecodingConfiguration`]: https://developer.mozilla.org/en-US/docs/Web/API/MediaDecodingConfiguration
[`MediaDevices.getUserMedia`]: https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getUserMedia
[`MediaStream.getAudioTracks`]: https://developer.mozilla.org/en-US/docs/Web/API/MediaStream/getAudioTracks
[`MediaStream`]: https://developer.mozilla.org/en-US/docs/Web/API/MediaStream
[`MediaStreamConstraints`]: https://developer.mozilla.org/en-US/docs/Web/API/MediaTrackConstraints
[`MediaStreamTrack.getCapabilities`]: https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamTrack/getCapabilities
[`MediaStreamTrack`]: https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamTrack
[`NetworkInformation.saveData`]: https://developer.mozilla.org/en-US/docs/Web/API/NetworkInformation/saveData
[`window.screenTop`]: https://developer.mozilla.org/en-US/docs/Web/API/Window/screenTop
[Adding a Surface Type]: privacy_budget_instrumentation.md#adding-a-surface-type
[direct surface]: privacy_budget_glossary.md#directsurface
[Instrumentation]: privacy_budget_instrumentation.md
[Instrumenting Direct Surfaces]: privacy_budget_instrumentation.md#annotating-direct-surfaces
[Media Capabilities API]: https://developer.mozilla.org/en-US/docs/Web/API/Media_Capabilities_API
[Media Streams API]: https://developer.mozilla.org/en-US/docs/Web/API/Media_Streams_API
[this doesn't scale]: https://thecooperreview.com/10-tricks-appear-smart-meetings/
[Use Counters]: ../use_counter_wiki.md
@ -17,10 +17,10 @@ Follow the instructions below for adding instrumentation for an API.
1. Determine the `UkmSourceId` and `UkmRecorder` to use for reporting, which
   depends on what you have. See the table below:

   |You have this              |Use this                                                                 |
   |---------------------------|-------------------------------------------------------------------------|
   |[`blink::Document`]        |`Document::UkmRecorder()` and `Document::UkmSourceID()`                  |
   |[`blink::ExecutionContext`]|`ExecutionContext::UkmRecorder()` and `ExecutionContext::UkmSourceID()`  |
   | You have this              | Use this                                                                |
   |----------------------------|-------------------------------------------------------------------------|
   |[`blink::Document`]         |`Document::UkmRecorder()` and `Document::UkmSourceID()`                  |
   |[`blink::ExecutionContext`] |`ExecutionContext::UkmRecorder()` and `ExecutionContext::UkmSourceID()`  |

   Several classes inherit `blink::ExecutionContext` and therefore implement
   `UkmRecorder()` and `UkmSourceID()` methods. E.g.:
@ -38,12 +38,12 @@ Follow the instructions below for adding instrumentation for an API.
   source ID can be mapped to a top level navigation.

1. Decide on the [`blink::IdentifiableSurface`] to use, and the method for
   constructing it. If there's no corresponding surface type, see the [Surface
   Types](#surface-types) section for instructions on adding a new type.
   constructing it. If there's no corresponding surface type, see the
   [Surface Types](#surface-types) section for instructions on adding a new type.

   *** note
   When constructing the [`blink::IdentifiableSurface`] instance, prefer
   `FromTypeAndToken` instead of `FromTypeFromInput`.
   What's a good candidate for [`blink::IdentifiableSurface`]?
   See [What's a good candidate for IdentifiableSurface?] below.
   ***

1. Condition all additional work on whether the study is active and whether the
@ -236,7 +236,7 @@ belong to two different types. I.e. two different
[`blink::IdentifiableSurface::Type`]`s`. If a matching type doesn't exist,
you'll need to add one. See the next section for how to do that.

### Adding a Surface Type
### Adding a Surface Type {#adding-a-surface-type}

All surface types and their parameters must be documented in
[`identifiable_surface.h`]. When adding a new type, you should document:
@ -291,8 +291,6 @@ Here's a sample CL that shows what needs to be done:
* http://crrev.com/c/2351957: Adds IDL based instrumentation for
  `Screen.internal` and `Screen.primary`.

TODO(dylancutler@google.com): Add examples of adding support for new V8 types.

Don't add custom `UseCounter` enums and instead rely on the generated
`UseCounter` name whenever possible.

@ -327,7 +325,7 @@ detail that doesn't belong in the IDL. It also adds unnecessary noise.

*** note
**IMPORTANT** Make sure that each API has its own `UseCounter` name. Otherwise
multiple APIs will have their samples accumulate within the same bucket. This
multiple APIs will have their samples aggregated within the same bucket. This
alters the observed characteristics of the API from what it really is.
***

@ -337,12 +335,15 @@ alters the observed characteristics of the API from what it really is.
[`blink::IdentifiabilityStudySettings`]: ../../third_party/blink/public/common/privacy_budget/identifiability_study_settings.h
[`blink::IdentifiableSurface::Type`]: ../../third_party/blink/public/common/privacy_budget/identifiable_surface.h
[`blink::IdentifiableSurface`]: ../../third_party/blink/public/common/privacy_budget/identifiable_surface.h
[`blink::WebFeature`]: ../../third_party/blink/public/mojom/web_feature/web_feature.mojom
[`HighEntropy`]: ../../third_party/blink/renderer/bindings/IDLExtendedAttributes.md#HighEntropy_m_a_c
[`identifiability_digest_helpers.h`]: ../../third_party/blink/renderer/platform/privacy_budget/identifiability_digest_helpers.h
[`identifiable_surface.h`]: ../../third_party/blink/public/common/privacy_budget/identifiable_surface.h
[`identifiable_token.h`]: ../../third_party/blink/public/common/privacy_budget/identifiable_token.h
[`Measure`]: ../../third_party/blink/renderer/bindings/IDLExtendedAttributes.md#Measure_i_m_a_c
[`Plugin`]: ../../third_party/blink/renderer/modules/plugins/plugin.idl
[`Screen`]: ../../third_party/blink/renderer/core/frame/screen.idl
[direct surface]: privacy_budget_glossary.md#directsurface
[Use Counter]: ../use_counter_wiki.md
[volatile surface]: privacy_budget_glossary.md#volatilesurface
[`HighEntropy`]: ../../third_party/blink/renderer/bindings/IDLExtendedAttributes.md#HighEntropy_m_a_c
[`Measure`]: ../../third_party/blink/renderer/bindings/IDLExtendedAttributes.md#Measure_i_m_a_c
[What's a good candidate for IdentifiableSurface?]: good_identifiable_surface.md