
[docs] add "What's Up With That" transcripts

Change-Id: Ie7f34cd19b5f97f9330e914d13de0f6e3ea2d7de
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4886394
Commit-Queue: Nigel Tao <nigeltao@chromium.org>
Reviewed-by: Sharon Yang <yangsharon@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1202896}

@@ -438,6 +438,22 @@ used when committed.
### UI
* [Chromium UI Platform](ui/index.md) - All things user interface
### What's Up With That Transcripts
These are transcripts of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq),
a video series of interviews with Chromium software engineers.
* [What's Up With Pointers - Episode 1](transcripts/wuwt-e01-pointers.md)
* [What's Up With DCHECKs - Episode 2](transcripts/wuwt-e02-dchecks.md)
* [What's Up With //content - Episode 3](transcripts/wuwt-e03-content.md)
* [What's Up With Tests - Episode 4](transcripts/wuwt-e04-tests.md)
* [What's Up With BUILD.gn - Episode 5](transcripts/wuwt-e05-build-gn.md)
* [What's Up With Open Source - Episode 6](transcripts/wuwt-e06-open-source.md)
* [What's Up With Mojo - Episode 7](transcripts/wuwt-e07-mojo.md)
* [What's Up With Processes - Episode 8](transcripts/wuwt-e08-processes.md)
* [What's Up With Site Isolation - Episode 9](transcripts/wuwt-e09-site-isolation.md)
### Probably Obsolete
* [TPM Quick Reference](tpm_quick_ref.md) - Trusted Platform Module notes.
* [System Hardening Features](system_hardening_features.md) - A list of

@@ -0,0 +1,601 @@
# What's Up With Pointers
This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 1, a 2022 video discussion between [Sharon (yangsharon@chromium.org)
and Dana (danakj@chromium.org)](https://www.youtube.com/watch?v=MpwbWSEDfjM).
The transcript was automatically generated by speech-to-text software. It may
contain minor errors.
---
Welcome to the first episode of What's Up With That, all about pointers! Our
special guest is C++ expert Dana. This talk covers smart pointer types we have
in Chrome, how to use them, and what can go wrong.
Notes:
- https://docs.google.com/document/d/1VRevv8JhlP4I8fIlvf87IrW2IRjE0PbkSfIcI6-UbJo/edit
Links:
- [Life of a Vulnerability](https://www.youtube.com/watch?v=HAJAEQrPUN0)
- [MiraclePtr](https://www.youtube.com/watch?v=WhI1NWbGvpE)
---
0:00 SHARON: Hi, everyone, and welcome to the first installment of "What's Up
With That", the series that demystifies all things Chrome. I'm your host,
Sharon, and today's inaugural episode will be all about pointers. There are so
many types of types - which one should I use? What can possibly go wrong? Our
guest today is Dana, who is one of our Base and C++ OWNERS and is currently
working on introducing Rust to Chromium. Previously, she was part of bringing
C++ 11 support to the Android NDK and then to Chrome. Today, she'll be telling
us what's up with pointers. Welcome, Dana!
00:31 DANA: Thank you, Sharon. It's super exciting to be here. Thank you for
letting me be on your podcast thingy.
00:36 SHARON: Yeah, thanks for being the first episode. So let's just jump
right in. So when you use pointers wrong, what can go wrong? What are the
problems? What can happen?
00:48 DANA: So pointers are a big cause of security problems for Chrome, and
that's what we mostly think about when things go wrong with pointers. So you
have a pointer to some thing, like you've pointed to a goat. And then you
delete the goat, and you allocate some new thing - a cow. And it gets stuck in
the same spot. Your pointer didn't change. It's still pointing to what it
thinks is a goat, but there's now a cow there. And so when you go to use that
pointer, you use something different. And this is a tool that malicious actors
use to exploit software, like Chrome, in order to gain access to your system,
your information, et cetera.
01:39 SHARON: And we want to avoid those. So what's that general type of attack
called?
01:39 DANA: That's a Use-After-Free because you have freed the goat and
replaced it with a cow. And you're using your pointer, but the thing it pointed
to was freed. There are other kinds of pointer badness that can happen. If you
take a pointer and you add to it some number, or you go to an offset off the
pointer, and you have an array of five things, and you go and read 20, or minus
2, or something, now you're reading out of bounds of that memory allocation.
And that's not good. These are both memory safety bugs that occur a lot with
pointers.
02:23 SHARON: Today, we'll be mostly looking at the Use-After-Free kind of
bugs. We definitely see a lot of those. And if you want to see an example of
one being used, Dana has previously done a talk called, "Life of a
Vulnerability." It'll be linked below. You can check that out. So that being
said, should we ever be using just a regular raw pointer in C++ in Chrome?
02:41 DANA: First of all, let's call them native pointers. You will see them
called raw pointers a lot in literature and stuff. But later on, we'll see why
that could be a bit ambiguous in this context. So we'll call them a native
pointer. So should you use a native pointer? If you don't want to
Use-After-Free, if you don't want a problem like that, no. However, there is a
performance implication with using smart pointers, and so the answer is yes.
The style guide that we have right now takes this pragmatic approach of saying
you should use raw pointers for giving access to an object. So if you're
passing them as a function parameter, you can share it as a pointer or a
reference, which is like a pointer with slightly different rules. But you
should not store native pointers as fields in objects because that is a place
where they go wrong a lot. And you should not use a native pointer to express
ownership. So before C++ 11, you would just say, this is my pointer, use a
comment, say this one is owning it. And then if you wanted to pass the
ownership, you just pass this native pointer over to something else as an
argument, and put a comment and say this is passing ownership. And you just
kind of hope it works out. But then it's very difficult. It requires the
programmer to understand the whole system to do it correctly. There is no help.
So in C++ 11, the type called `std::optional_ptr` - or sorry, `std::unique_ptr`
- was introduced. And this is expressing unique ownership. That's why it's
called `unique_ptr`. And it's just going to hold your pointer, and when it goes
out of scope, it gets deleted. It can't be copied because it's unique
ownership. But it can be moved around. And so if you're going to express
ownership to an object in the heap, you should use a `unique_ptr`.
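As an illustrative sketch, not from the video: this is roughly what unique
ownership with `std::unique_ptr` looks like in plain C++ (the `Goat` type is
made up).

```cpp
#include <memory>
#include <utility>

struct Goat {};

// The returned unique_ptr expresses that the caller now owns the Goat.
std::unique_ptr<Goat> MakeGoat() {
  return std::make_unique<Goat>();
}

void TakeOwnership(std::unique_ptr<Goat> goat) {
  // The Goat is deleted here, when the owning parameter goes out of scope.
}

int main() {
  std::unique_ptr<Goat> goat = MakeGoat();
  // unique_ptr can't be copied, only moved, because ownership is unique.
  TakeOwnership(std::move(goat));
  // `goat` is now null; the Goat was deleted inside TakeOwnership().
}
```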
04:48 SHARON: That makes sense. And that sounds good. So you mentioned smart
pointers before. You want to tell us a bit more about what those are? It sounds
like `unique_ptr` is one of those.
04:55 DANA: Yes, so a smart pointer, which can also be referred to as a
pointer-like object, perhaps as a subset of them, is a class that holds inside
of it a pointer and mediates access to it in some way. So unique pointer
mediates access by saying I own this pointer, I will delete this pointer when I
go away, but I'll give you access to it. So you can use the arrow operator or
the star operator to get at the underlying pointer. And you can construct them
out of native pointers as well. So that's an example of a smart pointer.
There's a whole bunch of smart pointers, but that's the general idea. I'm going
to add something to what a native pointer is, while giving you access to it in
some way.
05:40 SHARON: That makes sense. That's kind of what our main thing is going to
be about today, because if you look around in Chrome, you'll see a lot of these
wrapper types. It'll be a `unique_ptr` and then a type. And you'll see so many
types of these, and talking to other people, myself, I find this all very
confusing. So we'll cover some of the more common types today. We just talked
about unique pointers. Next, let's talk about `absl::optional`. So why don't you tell
us about that.
06:10 DANA: So that's actually a really great example of a pointer-like object
that's not actually holding a pointer, so it's not a smart pointer. But it
looks like one. So this is this distinction. So `absl::optional`, also known as
`std::optional`, if you're not working in Chromium, and at some point, we will
hopefully migrate to it, `std::optional` and `absl::optional` hold an object
inside of it by value instead of by pointer. This means that the object is held
in that space allocated for the `optional`. So the size of the `optional` is
the size of the thing it's holding, plus some space for a presence flag.
Whereas a `unique_ptr` holds only a pointer. And its size is the size of a
pointer. And then the actual object lives elsewhere. So that's the difference
in how you can think about them. But otherwise, they do look quite similar. An
`optional` is a unique ownership because it's literally holding the object
inside of it. However, an `optional` is copyable if the object inside is
copyable, for instance. So it doesn't have quite the same semantics. And it
doesn't require a heap allocation, the way unique pointer does because it's
storing the memory in place. So if you have an `optional` on the stack, the
object inside is also right there on the stack. That's good or bad, depending
what you want. If you're worried about your object sizes, not so good. If
you're worried about the cost of memory allocation and free, good. So this is
the trade-off between the two.
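As a rough illustration of that difference (not from the video; `Goat` is a
made-up type), the `optional` stores the object in place while the
`unique_ptr` points at the heap:

```cpp
#include <memory>

#include "third_party/abseil-cpp/absl/types/optional.h"

struct Goat {
  char fur[64];
};

void Compare() {
  // The Goat lives inside the optional itself: no heap allocation, and
  // sizeof(maybe_goat) is roughly sizeof(Goat) plus a presence flag.
  absl::optional<Goat> maybe_goat = Goat();

  // The Goat lives on the heap; the unique_ptr itself is pointer-sized.
  std::unique_ptr<Goat> heap_goat = std::make_unique<Goat>();
}
```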
07:51 SHARON: Can you give any examples of when you might want to use one
versus the other? Like you mentioned some kind of general trade-offs, but any
specific examples? Because I've definitely seen use cases where `unique_ptr` is
used when maybe an `optional` makes more sense or vice versa. Maybe it's just
because someone didn't know about it or it was chosen that way. Do you have any
specific examples?
08:14 DANA: So one place where you might use a `unique_ptr`, even though
`optional` is maybe the better choice, is because of forward declarations. So
because an `optional` holds the type inside of it, it needs to know the type
size, which means it needs to know the full declaration of that type, or the
whole definition of that type. And a `unique_ptr` doesn't because it's just
holding a pointer, so it only needs to know the size of a pointer. And so if
you have a header file, and you don't want to include another header file, and
you just want to forward declare the types, you can't stick an optional of that
type right there because you don't know how big it's supposed to be. So that
might be a case where it's maybe not the right choice, but for other
constraining reasons, you choose to use a `unique_ptr` here. And you pay the
cost of a heap allocation and free as a result. But when would you use an
`optional`? So `optional` is fantastic for returning a value sometimes. I want
to do this thing, and I want to give you back a result, but I might fail. Or
sometimes there's no value to give you back. Typically, before C++ - what are
we on now, was it came in 14? I'm going to say it wrong. That's OK. Before we
had `absl::optional`, you would have to do different tricks. So you would pass
in a native pointer as a parameter and return a bool as the return value to say
did I populate the pointer. And yes, that works. But it's easy to mess it up.
It also generates less optimal code. Pointers cause the optimizer to have
troubles. And it doesn't express as nicely what your intention is: I return
this thing, sometimes. And so in place of using this pointer plus bool, you can
put that into a single type, return an `optional`. Similar for holding
something as a field, where you want it to be held in line in your class, but
you don't always have it present, you can do that with an `optional` now, where
you would have probably used a pointer before. Or a `union` or something, but
that gets even more tricky. And then another place you might use it is as a
function argument. However, that's usually not the right choice for a function
argument. Why? Because the `optional` holds the value inside of it.
Constructing an `optional` requires constructing the whole object inside of it.
And so that's not free. It can be arbitrarily expensive, depending on what your
type is. And if your caller to your function doesn't have already an
`optional`, they have to go and construct it to pass it to you. And that's a
copy or move of that inner type. So generally, if you're going to receive a
parameter, maybe sometimes, the right way to spell that is just to pass it as a
pointer, because a native pointer can be null when it's not present.
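A hedged sketch of the two return-value patterns contrasted here
(illustrative only; `ParseAge` is hypothetical, and `base::StringToInt`
happens to be a real example of the bool-plus-out-parameter style being
wrapped):

```cpp
#include <string>

#include "base/strings/string_number_conversions.h"
#include "third_party/abseil-cpp/absl/types/optional.h"

// The old pattern would be: bool ParseAge(const std::string& in, int* out);

// The modern pattern: the return type itself says "an int, sometimes".
absl::optional<int> ParseAge(const std::string& input) {
  int age = 0;
  if (!base::StringToInt(input, &age))
    return absl::nullopt;  // No value to give back.
  return age;
}

void Use() {
  absl::optional<int> age = ParseAge("42");
  if (age.has_value()) {
    // Use *age here.
  }
}
```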
11:29 SHARON: Hopefully that clarifies some things for people who are trying to
decide which one best suits their use case. So moving on from that, some people
might remember from a couple of years ago that instead of being called
`absl::optional`, it used to be called `base::optional`. And do you want to
quickly mention why we switched from `base` to `absl`? And you mentioned even
switching to `std::optional`. Why this transition?
11:53 DANA: Yeah, absolutely. So as the C++ standards come out, we want to use
them, but we can't until our toolchain is ready. What's our toolchain? So our
compiler, our standard library, and unfortunately, we have more than one
compiler that we need to worry about. So we have the NaCl compiler. Luckily, we
just have Clang for the compiler choice we really have to worry about. But we
do have to wait for these things to be ready, and for a code base to be ready
to turn on the new standard because sometimes there are some non-backwards
compatible changes. But we can forward port stuff out of the standard library
into base. And so we've done that. We have a bunch of C++ 20 backports in base
now. We had 17 backports before. We turned on 17, now they should hopefully be
gone. And so `base::optional` was an example of a backport, while `optional`
was still considered experimental in the standard library. We adopted use of
`absl` since then, and `absl` had also, essentially, a backport of the
`optional` type inside of it for presumably the same reasons. And so why have
two when you can have one? That's a pretty good rule. And so we deprecated the
`base` one, removed it, and moved everything to the `absl` one. One thing to
note here, possibly of interest, is we often add security hardening to things
in `base`. And so sometimes there is something available in the standard
library, but we choose to use the version in `base` or `absl` instead, because
we have extra hardening checks. And so part of
the process of removing `base::optional` and moving to `absl::optional` was
ensuring those same security hardening checks are present in `absl`. And we're
going to have to do the same thing to stop using `absl` and start using the
standard one. And that's currently a work in progress.
13:48 SHARON: So let's go through some of the `base` types because that's
definitely where the most of these kind of wrapper types live. So let's just
start with one that I learned about recently, and that's a `scoped_refptr`.
What's that? When should we use it?
13:59 DANA: So `scoped_refptr` is kind of your Chromium equivalent to
`shared_ptr` in the standard library. So if you're familiar with that, it's
quite similar, but it has some slight differences. So what is `scoped_refptr`?
It gives you shared ownership of the underlying object. And it's a smart
pointer. It holds a pointer to an object that's allocated in the heap. When all
`scoped_refptr`s that point to the same object are gone, it'll be deleted. So
it's like `unique_ptr`, except it can be copied to add to your ref count,
basically. And when all of them are gone, it's destroyed. And it gives access
to the underlying pointer in exactly the same ways. Oh, but why is it different
than `shared_ptr`? I did say it is. `scoped_refptr` requires the type that is
held inside of it to inherit from `RefCounted` or `RefCountedThreadSafe`.
`shared_ptr` doesn't require this. Why? So `shared_ptr` sticks an allocation
beside your object and puts the ref count there. So the ref count is external
to your object, being stored and owned by the shared pointer. Chromium instead
took the position of doing intrusive ref counting. So because we
inherit from a known type, we stick the ref count in that base class,
`RefCounted` or `RefCountedThreadSafe`. And so that is enforced by the
compiler. You must inherit from one of these two in order to be stored and
owned in a `scoped_refptr`. What's the difference? `RefCounted` is the default
choice, but it's not thread safe. So the ref counting is cheap. It's the more
performant one, but if you have a `scoped_refptr` on two different threads
owning the same object, their ref counting will race, can be wrong, you can end
up with a double free - which is another way that pointers can go wrong, two
things free in the same thing - or you could end up with potentially not
freeing it at all, probably. I guess I've never checked if that's possible. But
they can race, and then bad things happen. Whereas `RefCountedThreadSafe`
gives you atomic ref counting. So atomic means that across all threads, they're
all going to have the same view of the state. And so it can be used across
threads and be owned across threads. And the tricky part there is the last
thread that owns that object is where it's going to be destroyed. So if your
object's destructor does things that you expect to happen on a specific thread,
you have to be super careful that you synchronize which thread that last
reference is going away on, or it could explode in a really flakey way.
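An illustrative sketch of the intrusive ref counting described above (not
from the video; `Settings` is a made-up class):

```cpp
#include "base/memory/ref_counted.h"
#include "base/memory/scoped_refptr.h"

class Settings : public base::RefCounted<Settings> {
 private:
  // RefCounted deletes the object, so the destructor is kept private.
  friend class base::RefCounted<Settings>;
  ~Settings() = default;
};

void Share() {
  scoped_refptr<Settings> a = base::MakeRefCounted<Settings>();
  scoped_refptr<Settings> b = a;  // Copying bumps the ref count to 2.
  a.reset();                      // Still alive: b holds a reference.
  b.reset();                      // Last reference gone: ~Settings() runs.
}
```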
17:02 SHARON: This sounds useful in other ways. What are some kind of more
design things to consider, in terms of when a `scoped_refptr` is useful and
does help enforce things that you want to enforce, like relative lifetimes of
certain objects?
17:15 DANA: Generally, we recommend that you don't use ref counting if you can
help it. And that's because it's hard to understand when it's going to be
destroyed, like I kind of alluded to with the thread situation. Even in a
single thread situation, how do you know which one is the last reference? And
is this object going to outlive that other object? Maybe sometimes. It's not
super obvious. It's a little more clear with a `unique_ptr`, at least local to
where that `unique_ptr`'s destruction is. But there's usually no
`scoped_refptr` you can point to and say, this is the last one, so I know it's
gone after this thing is gone. Maybe it is, maybe it's not. So it's often a bit tricky.
However, there are scenarios when you truly want a bunch of things to have
access to a piece of data. And you want that data to go away when nobody needs
it anymore. And so that is your use case for a `scoped_refptr`. It is nicer
when that thing with shared ownership is not doing a lot of interesting
things, especially in its destructor because of the complexity that's involved
in shared ownership. But you're welcome to shoot yourself in the foot with this
one if you need to.
18:33 SHARON: We're hoping to help people not shoot themselves in the foot. So
use `scoped_refptr` carefully, is the lesson there. So you mentioned
`shared_ptr`. Is that something we see much of in Chrome, or is that something
that we generally try to avoid in terms of things from the standard library?
18:51 DANA: That is something that is banned in Chrome. And that's just
basically because we already have `scoped_refptr`, and we don't want two of the
same thing. There's been various times where people have brought up why do we
need to have both? Can we just use `shared_ptr` now? And nobody's ever done the
kind of analysis needed to make that kind of decision. And so we stay with what
we're at.
19:18 SHARON: If you want to do that, there's someone that'll tell you what to
do. So something that when I was using `scoped_refptr`, I came across that you
need a WeakPtrFactory to create such a pointer. So weak pointers and WeakPtr
factories are one of those things that you see a lot in Chrome and one of these
base things. So tell us a bit about weak pointers and their factories.
19:42 DANA: So WeakPtr and WeakPtrFactory have a bit of an interesting history.
Their major purpose is for asynchronous work. Chrome is basically a large
asynchronous machine, and what does that mean? It means that we break all of
the work of Chrome up into small pieces of work. And every time you've done a
piece, you go and say, OK, I'm done. And when the next piece is ready, run this
thing. And maybe that next thing is like a user input event, maybe that's a
reply from the network, whatever it might be. And there's just a ton of steps
in things that happen in Chrome. Like, a navigation has a request, a response,
maybe another request - some redirects, whatever. That's an example of tons of
smaller asynchronous tasks that all happen independently. So what goes on with
asynchronous tasks? You don't have a continuous stack frame. What does that
mean? So if you're just running some synchronous code, you make a variable, you
go off and you do some things, you come back. Your variable is still here,
right? You're in this stack frame. And you can keep using it. You have
asynchronous tasks. You make a variable, you go and do some work, and you are
done your task. Boop, your stack's gone. You come back later, you're going to
continue. You don't have your variable anymore. So any state that you want to
keep across your various tasks has to be stored and what we call bound in with
that task. If that's a pointer, that's especially risky. So we talked earlier
about Use-After-Frees. Well, you can, I hope, imagine how easy it is to stick a
pointer into your state. This pointer is valid, I'm using it. I go away, I come
back when? I don't know, sometime in the future. And I'm going to go use this
pointer. Is it still around? I don't own it. I didn't use a `unique_ptr`. So
who owns it? How do they know that I have a task waiting to use it? Well,
unless we have some side channel communicating that, they don't. And how do I
know if they've destroyed it if we don't have some side channel communicating
that? I don't know. And so I'm just going to use this pointer and bad things
happen. Your bank account is gone.
22:06 SHARON: No! My bank account!
22:06 DANA: I know. So what's the side channel? The side channel that we have
is WeakPtr. So a WeakPtr and WeakPtrFactory provide this communication
mechanism where WeakPtrFactory watches an object, and when the object gets
destroyed, the WeakPtrFactory inside of it is destroyed. And that sends this
little bit that says, I'm gone. And then when your asynchronous task comes back
with its pointer, but it's a WeakPtr inside of it and tries to run, it can be
like, am I still here? If the WeakPtrFactory was destroyed, no, I'm not. And
then you have a choice of what to do at that point. Typically, we're like,
abandon ship. Don't do anything here. This whole task is aborted. But maybe you
do something more subtle. That's totally possible.
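A minimal sketch of that side channel in code (illustrative; `Downloader`
and `FetchAsync` are hypothetical):

```cpp
#include "base/functional/bind.h"
#include "base/functional/callback.h"
#include "base/memory/weak_ptr.h"

class Downloader {
 public:
  void Start() {
    // Bind the reply against a WeakPtr instead of a native `this`. If the
    // Downloader is destroyed before the reply runs, the factory flips the
    // "I'm gone" bit and the bound task is skipped instead of touching
    // freed memory.
    FetchAsync(base::BindOnce(&Downloader::OnFetched,
                              weak_factory_.GetWeakPtr()));
  }

 private:
  void OnFetched(int result) {}

  // Hypothetical async API that runs the callback later.
  void FetchAsync(base::OnceCallback<void(int)> callback);

  // Conventionally the last member, so it's destroyed first and
  // invalidates outstanding WeakPtrs before the rest of the object.
  base::WeakPtrFactory<Downloader> weak_factory_{this};
};
```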
22:59 SHARON: I think the example I actually meant to say that uses a
WeakPtrFactory is a SafeRef, which is another base type. So tell us a bit about
SafeRefs.
23:13 DANA: WeakPtr is cool because of the side channel that you can examine.
So you can say are you still alive, dear object? And it can tell you, no, it's
gone. Or yeah, it's here. And then you can use it. The problem with this is
that there are places where you as the code author want to believe that this
object is actually always there, but you don't want a security bug if you're
wrong. And it doesn't mean that you're wrong now, even. Sometime later, someone
can change code, unrelated to where this is, where the ownership happens, and
break you. And maybe they don't know all the users of a given object and change
its lifetime in some subtle way, maybe not even realizing they are. Suddenly you're
eventually seeing security bugs. And so that's why native pointers can be
pretty scary. And so SafeRef is something we can use instead of a native
pointer to protect you against this type of bug. It's built on top of WeakPtr
and WeakPtrFactory. That's its relationship, but its purpose is not the same.
So what SafeRef does is it says - SafePtr?
24:31 SHARON: SafeRef.
24:31 DANA: SafeRef.
24:31 SHARON: I think there's also a safe pointer, but there -
24:38 DANA: We were going to add it. I'm not sure if it's there yet. But so two
differences between SafeRef and WeakPtr then, ref versus ptr, it can't be null.
So it's like a reference wrapper. But the other difference is you can't observe
whether the object is actually alive or not. So it has the side channel, but it
doesn't show it to you. Why would you want that? If the information is there
anyway, why wouldn't you want to expose it? And the reason is because you are
documenting that you as the author understand and expect that this pointer is
always valid at this time. Say it turns out it's not valid. What do you do? If it's
a WeakPtr, people tend to say, we don't know if it's valid. It's a WeakPtr.
Let's check. Am I valid? And if I'm not, return. And what does that result in?
It results in adding a branch to your code. You do that over, and over, and
over, and over, and static analysis, which is what we as humans have to do -
we're not running the program, we're reading the code - can't really tell what
will happen because there's so many things that could happen. We could exit
here, we could exit there, we could exit here. Who knows. And that makes it
increasingly hard to maintain and refactor the code. So SafeRef gives you the option
to say this is always going to be valid. You can't check it. So if it's not
valid, go fix that bug somewhere else. It should be valid here.
26:16 SHARON: So what kind of -
26:16 DANA: The assumptions are broken.
26:16 SHARON: So what kind of errors happen when that assumption is broken? Is
that a crash? Is that a DCHECK kind of thing?
26:22 DANA: For SafeRef and for WeakPtr, if you try to use it without checking
it, or write it incorrectly, they will crash. And crashing in this case means a
safe crash. It's not going to lead to a security bug. It's literally just
terminating the program.
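Putting that together, a hedged sketch of how a `SafeRef` might be handed out
and stored (illustrative; `Tab` and `Toolbar` are made-up classes):

```cpp
#include "base/memory/safe_ref.h"
#include "base/memory/weak_ptr.h"

class Tab {
 public:
  base::SafeRef<Tab> GetSafeRef() { return weak_factory_.GetSafeRef(); }

 private:
  base::WeakPtrFactory<Tab> weak_factory_{this};
};

class Toolbar {
 public:
  // Documents the lifetime assumption: the Tab outlives this Toolbar.
  // There is no way to ask tab_ whether the Tab is alive; if the
  // assumption is ever broken, using tab_ crashes safely instead of
  // becoming an exploitable Use-After-Free.
  explicit Toolbar(base::SafeRef<Tab> tab) : tab_(tab) {}

 private:
  base::SafeRef<Tab> tab_;
};
```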
26:41 SHARON: Does that also mean you get a sad tab as a user? Like when the
little sad file comes up?
26:47 DANA: Yep. It would. If you're in the renderer process, you take it down.
It's a sad tab. So that's not great. It's better than a security bug. Because
your options here are don't write bugs. Ideal. I love that idea, but we know
that bugs happen. Use a native pointer, security problem. Use a WeakPtr, that
makes sense if you wanted it to sometimes not be there. But if you want it to
always be there - because you have to make a choice now of what you're supposed
to do if it's not, and it makes the code very hard to understand. And you're
only going to find out it can't be there through a crash anyhow. Or use a
SafeRef. And it's going to just give you the option to crash. You're going to
figure out what's wrong and make it no longer do that.
27:38 SHARON: I think wanting to guarantee the lifetime of some other things
seems like a pretty common thing that you might come across. So I'm sure there
are many cases for many people to be adding SafeRefs to make their code a bit
safer, and also ensure that if something does go wrong, it's not leading to a
memory bug that could be exploited for who knows how long. Because we don't
always hear about those. If it crashes, and they can reliably crash, at least
you know it's there. You can fix it. If it's not, we're hoping that one of our
VRP vulnerability researchers finds it and reports it, but that doesn't always
happen. So if we can know about these things, that's good. So another new type
in base that people might have been seeing recently is a `raw_ptr` which is
maybe why earlier we were saying let's call them native pointers, not raw
pointers. Because the difference between `raw_ptr` and raw pointer, very easy
to mix those up. So why don't you tell us a bit about `raw_ptr`s?
28:40 DANA: So `raw_ptr` is really cool. It's a non-owning smart pointer. So
that's kind of like WeakPtr or SafeRef. These are also non-owning. And it's actually
very similar in inspiration to what WeakPtr is. So it has a side channel where
it can see if the thing it's pointing to is alive or gone. So for WeakPtr, it
talks to the WeakPtrFactory and says am I deleted? And for `raw_ptr`, what it
does is it keeps a reference count, kind of like `scoped_refptr`, but it's a
weak reference count. It's not owning. And it keeps this reference count in the
memory allocator. So Chrome has its own memory allocator for new and delete
called PartitionAlloc. And that lets us do some interesting stuff. And this is
one of them. And so what happens is as long as there is a `raw_ptr` around, this
reference count is non-zero. So even if you go and you delete the object, the
allocator knows there is some pointer to it. It's still out there. And so it
doesn't free it. It holds it. And it poisons the memory, so that just means
it's going to write some bit pattern over it, so it's not really useful
anymore. It basically re-initializes the memory. And so later, if you go and
use this `raw_ptr`, you get access to just dead memory. It's there, but it's
not useful anymore. You're not going to be able to create security bugs in the
same way. Because when we first started talking about a Use-After-Free - you
have your goat, you free it, a cow is there, and now your pointer is pointing
at the wrong thing - you can't do that because as long as there's this
`raw_ptr` to your goat, the goat can be gone, but nothing else is going to come
back here. It's still taken by that poisoned memory until all the `raw_ptr`s
are gone. So that's their job, to protect us from a Use-After-Free being
exploitable. It doesn't necessarily crash when you use it incorrectly, you just
get to use this bad memory inside of it. If you try to use it as a pointer,
then you're using a bad pointer, you're going to probably crash. But it's a
little bit different than a WeakPtr, which is going to deterministically crash
as soon as you try to use it when it's gone. It's really just a protection or a
mitigation against security exploits through Use-After-Free. And then we
recently just added `raw_ref`, which is really the same as `raw_ptr`, except
addressing nullability. So smart pointers in C++ have historically all allowed
a null state. That's representative of what native pointers did in C and C++.
And so this is kind of just bringing this along in this obvious, historical
way. But if you look at other languages that have been able to break with
history and make their own choices kind of fresh, we see that they make choices
like not having null pointers, not having null smart pointers. And that
increases the readability and the understanding of your code greatly. So just
like for WeakPtr, how we said, we just check if it's there or not. And if it's
not, we return, and so on. It's every time you have a WeakPtr, if you were
thinking of a timeline, every time you touch a WeakPtr, your timeline splits.
And so you get this exponential timeline of possible states that your
software's in. That's really intense. Whereas every time you cannot do that,
say this can't be null, so instead of WeakPtr, you're using SafeRef. This can't
be not here or null, actually. WeakPtr can just be straight up null. This is
always present. Then you don't have a split in your timeline, and that makes it
a lot easier to understand what your software is doing. And so for `raw_ptr`,
it followed this historical precedent. It lets you have a null value inside of
it. And `raw_ref` is our kind of modern answer to this new take on nullability.
And so `raw_ref` is a reference wrapper, meaning it holds a reference inside of
it, conceptually, meaning it just can't be null. That is just basically - it's
a pointer, but it can't be null.
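As a rough sketch of the field rule described here (illustrative; `Profile`,
`Delegate`, and `Controller` are made-up names):

```cpp
#include "base/memory/raw_ptr.h"
#include "base/memory/raw_ref.h"

class Profile;
class Delegate;

class Controller {
 public:
  Controller(Profile& profile, Delegate* delegate)
      : profile_(profile), delegate_(delegate) {}

 private:
  // Non-owning, can never be null, and const so it can't rebind.
  const raw_ref<Profile> profile_;
  // Non-owning, but may be null.
  raw_ptr<Delegate> delegate_;
};
```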
33:24 SHARON: So these do sound the most straightforward to use. So basically,
if you're not sure - or your class members at least - any time you would use a
native pointer or an ampersand, basically you should always just put those in
either a `raw_ptr` or a `raw_ref`, right?
33:45 DANA: Yeah, that's what our style guide recommends, with one nuance. So
because `raw_ptr` and `raw_ref` interact with the memory allocator, they have
the ability to be, like, turned on or off dynamically at runtime. And there's a
performance hit on keeping this reference count around. And so at the moment,
they are not turned on in the renderer process because it's a really
performance-critical place. And the impact of security bugs there is a little
less than in the browser process where you just immediately get access to the
whole system. And so we're working on turning it on there. But if you're
writing code that's only in the renderer process, then there's no point to use
it. And we don't recommend that you use it. But the default rule is yes. Don't
use a native pointer, don't use a native reference. As a field in an object,
use a `raw_ptr`, use a `raw_ref`. Prefer something with fewer states, always,
because you get fewer branches in your timeline. And then you can make it const
if you don't want it to be able to rebind to an object, if you don't want the
pointer to change. Or you can make it mutable if you want it to be able to.
34:58 SHARON: So you did mention that these types are ref counted, but earlier
you said that you should avoid ref counting things. So
35:04 DANA: Yes.
35:11 SHARON: So what's the balance there? Is it because with a
`scoped_refptr`, you're a bit more involved in the ref counting, or is it just,
this is - we've done it for you, you can use it. This is OK.
35:19 DANA: No, this is a really good question. Thank you for asking that. So
there's two kinds of ref counts going on here. I tried to kind of allude to it,
but it's great to make it clear. So `scoped_refptr` is a strong ref count,
meaning the ref count owns the object. So the destructor runs, the object is
gone and deleted when that ref count goes to 0. `raw_ref` and `raw_ptr` are a
weak ref count. They could be pointing to something owned in a
`scoped_refptr` even. So they can exist at the same time. You can have both
kind of ref counts going at the same time. A weak ref count, in this case, is
holding the memory alive so that it doesn't get re-used. But it's not keeping
the object in that memory alive. And so from a programming state point-of-view,
the weak refs don't matter. They're helping protect you from security bugs.
They're helping to make - when things go wrong, when a bug happens, they're
helping to make it less impactful. But they don't change your program in a
visible way. Whereas strong references do. The destructor runs based on when
the ref count goes to 0 for a strong reference. So that's the difference
between these two.
36:46 SHARON: So when you say don't use ref counting, you mean don't use strong
ref counting.
36:46 DANA: I do, yes.
36:51 SHARON: And if you want to learn more about the raw pointer, `raw_ptr`,
`raw_ref`, that's all part of the MiraclePtr project, and there's a talk about
that from BlinkOn. I'll link that below also. So in terms of other base types,
there's a new one that's called `base::expected`. I haven't even really seen
this around. So can you tell us a bit more about how we use that, and what
that's for?
37:09 DANA: `base::expected` is a backport from C++ 23, I want to say. So the
proposal for `base::expected` actually cites a Rust type as inspiration, which
is called `std::result` in Rust. And it's a lot like `optional`, so it's used
for return values. And it's more or less kind of a replacement for exceptions.
So Chrome doesn't compile with exceptions enabled even, so we've never relied
on exceptions to report errors. But we have to do complicated things, like with
`optional` to return a bool or an enum. And then maybe some value. And so this
kind of compresses all that down into a single type, but it's got more state
than just an `optional`. So `expected` gives you two choices. It either returns
your value, like `optional` can, or it returns an error. And so that's the
difference between `optional` and `expected`. You can give a full error type.
And so this is really useful when you want to give more context on what went
wrong, or why you're not returning the value. So it makes a lot of sense in
stuff like File IO. So you're opening a file, and it can fail for various
reasons, like I don't have permission, it doesn't exist, whatever. And so in
that case, the way you would express that in a modern way would be to return
`base::expected` of your file handle or file class. And as an error, some
enumerator, perhaps, or even an object that has additional state beyond just I
couldn't open the file. But maybe a string about why you couldn't open the file
or something like this. And so it gives you a way to return a structured error
result.
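An illustrative sketch of that pattern (not from the video; `OpenFile` and
its error enum are hypothetical, and it returns a pretend file descriptor
rather than a real file class):

```cpp
#include <string>

#include "base/types/expected.h"

enum class OpenError { kNotFound, kPermissionDenied };

// Either a file descriptor, or a structured error saying why it failed -
// more context than optional's bare "no value".
base::expected<int, OpenError> OpenFile(const std::string& path) {
  if (path.empty())
    return base::unexpected(OpenError::kNotFound);
  return 3;  // Pretend we opened the file and got descriptor 3.
}

void Use() {
  auto result = OpenFile("/tmp/goat.txt");
  if (result.has_value()) {
    // Use result.value() - the open file descriptor.
  } else {
    // Inspect result.error() to see why it failed.
  }
}
```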
39:05 SHARON: That sounds useful in lots of cases. So all of these types are
making up for basically what is lacking in C++, which is memory safety. C++, it
does a lot. It's been around for a long time. Most of Chrome is written in it.
But there are all these memory issues. And a lot of our security bugs are a
result of this. So you are working on bringing Rust to Chromium. Why is that a
good next step? Why does that solve these problems we're currently facing?
39:33 DANA: So Rust has some very cool properties to it. Its first property
that is really important to this conversation is the way that it handles
pointers, which in Rust would be treated pretty much exclusively as references.
And what Rust does is it requires you to tell the compiler the relationships
between the lifetimes of your references. And the outcome of this additional
knowledge to the compiler is memory safety. And so what does that mean? It
means that you can't write a Use-After-Free bug in Rust unless you're going
into the unsafe part of the language, which is where scariness exists. But you
don't need to go there to write a normal program. So we'll ignore it. And so
what that means is you can't write the bug. And so that doesn't just mean - I'd
also like to believe I can write C++ without a bug. That's not true. But I
would love to believe that. But it means that later, when I come back and
refactor my code, or someone comes who's never seen this before and fixes some
random bug somewhere related to it, they can't introduce a Use-After-Free
either. Because if they do, the compiler is like, hey - it's going to outlive
it. You can't use it. Sorry. And so there's this whole class of bugs that you
never have to debug, you never ship, they never affect users. And so this is a
really nice promise, really appealing for a piece of software like Chrome,
where our basic purpose is to handle arbitrary and adversarial data. You want
to be able to go on some web page, maybe it's hostile, maybe not. You just get
a link. You want to be able to click that link and trust that even if it's
really hostile and wanting to destroy you, it can't. Chrome is that safety net
for you. And so Rust is that kind of safety net for our code, to say no matter
how you change it over time, it's got your back. You can't introduce this kind
of bug.
42:03 SHARON: So the Rust project sounds really cool. If people want to learn
more or get involved - if you're into the whole languages, memory kind of thing
- where can people go to learn more?
42:09 DANA: So if you're interested in helping out with our Rust experiment,
then you can look for us in the Rust channel on Slack. If you're interested in
C++ language stuff, you can find us in the CXX channel on Slack, as well as
the cxx@chromium.org mailing list. And there is, of course, the
rust-dev@chromium.org mailing list if you want to use email to reach us as
well.
42:44 SHARON: Thank you very much, Dana. There will be notes from all of this
also linked in the description box. And thank you very much for this first
episode.
42:52 DANA: Thanks, Sharon. This was fun.

@@ -0,0 +1,453 @@
# What's Up With DCHECKs
This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 2, a 2022 video discussion between [Sharon (yangsharon@chromium.org)
and Peter (pbos@chromium.org)](https://www.youtube.com/watch?v=MpwbWSEDfjM).
The transcript was automatically generated by speech-to-text software. It may
contain minor errors.
---
You've seen DCHECKs around and been asked to use them in code review, but what
are they? What's the difference between a CHECK and a DCHECK? How do you use
them? Here to answer that is special guest Peter, who works on UI and
improving crash reports.
Notes:
- https://docs.google.com/document/d/146LoJ1E3N3E6fb4zDh92HPQc6yhRpNI7DSKlJjaYlLw/edit
Links:
- [What's Up With Pointers](https://www.youtube.com/watch?v=MpwbWSEDfjM)
---
00:00 SHARON: Hello, and welcome to What's Up With That?, the series that
demystifies all things Chrome. I'm your host, Sharon. And today, we're talking
about DCHECKs. You've seen them around. You've probably been told to add one in
code review before. But what are they? What are they for, and what do they do?
Our guest today is Peter, who works on desktop UI and Core UI. He's also
working on improving Chrome's crash reports, which includes DCHECKs. Today
he'll help us answer, what's up with DCHECKs? Welcome, Peter.
00:30 PETER: Thanks for having me.
00:32 SHARON: Yeah. Thanks for being here. So the most obvious question to
start with, what is a DCHECK?
00:39 PETER: So a CHECK and a DCHECK are both sort of things that make sure
that what you think is true is true. Right? So this should never be called with
an empty vector. You might add a CHECK for it, or you might add a DCHECK for
it. And it's sort of similar to an assert, which you may have hit during earlier
programming outside of Chrome. And what it means is when this line gets hit, we
check and see if it's true. And if it's not true, we crash. DCHECKs differ from
CHECKs in that they are traditionally only in debug builds, or local
development builds, or on our try-bots. So they have zero overhead when Chrome
hits stable, because the CHECK just won't be there.
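A quick illustrative sketch of the difference (not from the video;
`ProcessFirst` is a made-up function):

```cpp
#include <vector>

#include "base/check.h"
#include "base/check_op.h"

void ProcessFirst(const std::vector<int>& items) {
  // Enforced in every build: crashes safely and deterministically here
  // if the invariant is violated.
  CHECK(!items.empty());

  // Traditionally enforced only in debug/developer builds; compiled out
  // of official builds, so it costs nothing on stable.
  DCHECK_EQ(items.size() % 2, 0u);

  // ... items[0] can now be used knowing the invariants held ...
}
```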
01:24 SHARON: OK. So, like, the D stands for Debug. That makes sense.
01:28 PETER: Yeah. I want debug to turn into developer, because now we have
them by default if you're no longer - if you're doing a release build, and
you're not turning them off, and you're not doing an official build, you get
them.
01:42 SHARON: OK. Well, you heard it here first, or maybe you heard it before.
I heard it here first. So you mentioned asserts. So something that I've seen a
couple times in Chrome, and also is part of the standard library, is
`static_assert`. So how is that similar or different to DCHECKs? And why do we
use or not use them?
02:00 PETER: Right. So `static_assert`s are - and you're going to have to ask
C++ experts, who can probably take some of the sharp edges off of this - but
it's basically, if you can assert something in compile time, then you can use a
`static_assert`, which means that you don't have to hit a code path where it's
wrong. It sort of has to always hold true. And whenever you can use a
`static_assert`, use a `static_assert`, because it's free. And basically, you
can't compile the program if it's not true.
02:31 SHARON: OK. That's good to know, because I definitely thought that was
one of the C++ standard library things we should avoid, because we have a
similar thing in Chromium. But I guess that's not the case.
02:41 PETER: Yeah. Assert is the one that is - OK, so this is a little
complicated, right? `static_assert` is a language feature, not a library
feature. And someone will tell me that I'm wrong about something about this.
Asserts are just sort of a poorer version of DCHECKs. So they won't go through
our crash handling. It won't print the pretty stacks, et cetera.
`static_assert`s, on the other hand, are a compile time feature. And we don't,
as far as I know, have our own wrapper around it. We just use `static_assert`.
So what you would maybe use this for is like if you have a constant - like, say
you have an array, and the code makes an assumption that some constant is the
size of this array, you can assert that in compile time, and that would be a
good use of a `static_assert`.
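For illustration, a `static_assert` along the lines of that array example
(the names are made up):

```cpp
#include <iterator>

constexpr int kNumColors = 3;
const char* kColorNames[] = {"red", "green", "blue"};

// Checked at compile time, for free: the program won't build if someone
// adds a color name without updating the constant (or vice versa).
static_assert(std::size(kColorNames) == kNumColors,
              "kColorNames must have kNumColors entries");
```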
03:26 SHARON: OK. Cool. So you mentioned that some things have changed with how
DCHECKs work. So can you give us a brief overview of the history of DCHECKs -
what they used to be - for people who have been using them for a while, how might
they have changed from the idea of what they have as a DCHECK in their mind?
03:43 PETER: Sure. So this is as best I know. I'm just sort of extrapolating
from what I've seen. And what I think originally was true is that a CHECK used
to be this logging statement, where you essentially compile the file name and
the line number. And if this ever hits, then we'll log some stuff and then
crash. Right? Which comes with a little bit of overhead, especially on size,
that you basically take the file name and line number for every instance, and
that generates a bunch of strings and numbers that essentially add to Chrome's
binary size. I don't know how many steps between that and where we currently
are. But right now, our CHECKs are just, if condition is false, crash, which
means that you won't, out of the CHECK, get file name and line number. We'll
get those out of debugging symbols. And you also won't get any of the logging
messages that you can add to the end of a CHECK, which means that your debug
info will be poorer, but it will be cheaper to use. So they've gotten from
being pretty heavy CHECKs to being really cheap.
05:01 SHARON: OK. So that kind of leads us into the question that I think most
people want to have answered, which is, when should I use a DCHECK? When should
I use a CHECK? When should I use neither?
05:13 PETER: I would say that historically, we've said CHECKs are expensive.
Don't use them unless you sort of have to. And I don't think that holds true
anymore. So basically, unless you are in really performance-critical code, then
use a CHECK. If there's anything that you care about where the program state
will be unpredictable from this point on if it's not true, CHECK it. It's not
that expensive. Right? We have a lot of code where we push a string onto a
vector, and that never gets flagged in code review. And it's probably like 10
times more expensive, if not 100 times more expensive, than adding a CHECK. The
exception to that is if you're in a really hot loop where you don't want to
dereference a pointer, then a CHECK might add some cost. And the other is if
the condition that you're trying to validate is really expensive. It's not the
CHECK itself that's expensive. It's the thing you're evaluating. And if that's
expensive, then you might not afford doing a CHECK. If you don't know that it's
expensive, it's probably not expensive.
06:20 SHARON: Can you give us an example of something expensive to evaluate for
a CHECK?
06:24 PETER: Right. So say that you have something in video code that for every
video frame, for every pixel validates the alpha value as opaque, or something.
That would probably make video conferencing a little bit worse performance.
Another thing would just be if you have to traverse a graph on every frame, and
it will sort of jump all over memory to see if some reachability problem in
your graph is true, that's going to be a lot more expensive. But CHECKing that
index is less than some vector bounds, I think that should fall under cheap.
And -
07:02 SHARON: OK.
07:02 PETER: culturally, we've tried to avoid doing a lot of these. And I think
it's just hurting us.
07:09 SHARON: OK. So since most places we should use CHECKs, are there any
places where a DCHECK would be better then? Or any time you would have normally
previously used a DCHECK, you should just make that a check?
07:23 PETER: So we have a new construct that's called `EXPENSIVE_DCHECKS_ARE_ON` -
or, if expensive DCHECKs are on - and I think we should add a corresponding macro,
`EXPENSIVE_DCHECK`. And then you should be able to just say, either it's
expensive and has to be a DCHECK, so use `EXPENSIVE_DCHECK`; otherwise, use
CHECK. And my hunch would be like 95% of what we have as DCHECKs would probably
serve us better as CHECKs. But your code owner and reviewer might disagree with
that. And it's not yet documented policy that we say CHECKs are cheap; just add
a billion of them. But I would like to get there eventually.
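The exact macro naming discussed above is approximate; as one hedged sketch
of the gating pattern, using the `DCHECK_IS_ON()` macro from
`base/dcheck_is_on.h` (`Frame` and `AllPixelsOpaque` are hypothetical):

```cpp
#include "base/check.h"
#include "base/dcheck_is_on.h"

struct Frame {
  bool IsValid() const;
};

// Hypothetical expensive validation that touches every pixel.
bool AllPixelsOpaque(const Frame& frame);

void Paint(const Frame& frame) {
  // Cheap invariant: fine as a CHECK in every build.
  CHECK(frame.IsValid());

#if DCHECK_IS_ON()
  // Expensive invariant: only evaluate it at all in builds where
  // DCHECKs are enabled, so stable doesn't pay for it.
  DCHECK(AllPixelsOpaque(frame));
#endif
}
```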
08:04 SHARON: OK. So if you put in a CHECK, and your reviewer tells you this
should be a DCHECK, the person writing the CL can point them to this video, and
then they can discuss from there.
08:13 PETER: I mean, yeah, you can either say Peter disagrees with you, or I
can get further along this and say we make policy that CHECKs are cheap, so
they are preferable. So a lot of the foot-shooting with DCHECKs is that you expect
this property to hold true, but you never effectively CHECK it. And that can
lead to all sorts of bad stuff, right? Like if you're trying to DCHECK that
some origin for some frame makes some assumptions of site iso - I don't know
site isolation well enough to say this. But basically, if you're DCHECKing that
the code that you're running runs under some sort of permissions, then that is
effectively unchecked in stable, right? And we do care about those properties,
and it would be really good if we crashed rather than leaked information
between sites.
09:12 SHARON: Right.
09:14 PETER: Yeah.
09:16 SHARON: So that seems like a good tie-in for the fact that within some
security people, they don't have the most positive impression of DCHECKs, shall
we say? So a couple examples of this, for listeners who maybe aren't familiar
with this, is one person previously on security saying DCHECKs are pronounced
as "code that's not tested". Someone else I told about this episode - I said,
we're going to talk about DCHECKs - they immediately said, is it going to be
about why DCHECKs are bad? So amongst the Chrome security folks, they are not a
huge fan of DCHECKs. Can you tell us maybe why that is?
09:51 PETER: So if we go back a little bit in time, it used to be that DCHECKs
were only built for developers if they do a debug build. And Chrome has gotten
so big that you don't want to do a debug build - the UI is incredibly slow.
Unfortunately, it's sort of not that great an experience to work in a debug
build. So people work in a release build. That doesn't mean that they don't
care about the things they put under DCHECK. It just means they want to go on
with their lives and not wait x minutes for the browser to launch, or however
bad it is nowadays. And that means that they, unfortunately, lose coverage for
the DCHECKs. So this means that if your code is not exercised well under tests,
then this is completely not enforced. But it's slightly better than a comment,
in that you're really expecting this thing to hold true, and that's clearly an
expectation. But how good is the expectation if you don't look at it? So last
year, I believe, we made it so that DCHECKs are on by default if you're not
doing an official build. And this included release builds. So now, it's like at
least if you're doing development and you hit this condition, it's going to
explode, which is really good, because then you can find a lot of issues, and
we can prevent a lot of issues from ever happening in the first place. It is
really hard for you, as a developer, to make the assumption that if this
invariant is ever false, I will find it during development, and it will never
happen in the wild. And DCHECKs are essentially either, I will find this
locally before I submit it, or all bets are off; or it is I don't care that
much if this thing doesn't hold true, which is sort of a weird assertion to
make. So I think we're in this little awkward in-between state. And this
in-between state, remember, mostly exists as a performance optimization from
when CHECKs used to be a lot more expensive, in terms of code size. So did I
cover most of this?
12:06 SHARON: Yeah. I think, based on that, I think it's pretty easy to see why
people who are more concerned about security are not a fan of this.
12:13 PETER: I mean, if you care about it, especially if it causes privacy or
security or user-harm sort of things, just CHECK. Just CHECK, right? If it
makes your code animate a thing slightly weirder, like it will just jump to the
end position instead of going through your fence load, whatever. Maybe you can
make that a DCHECK. Maybe it doesn't matter. Like it's wrong, but it's not that
bad. But most of the cases, you DCHECK something, where it's like the program
is going to be in some indeterminate state, and we actually care about if it's
ever false. So maybe we can afford to make it a CHECK. Maybe we should look
more at our sort of vector push_backs than we should look at our CHECKs, and
then just have more CHECKs. More CHECKs. Because it's also like when things
break, it's a lot cheaper to debug a DCHECK than your program is in some
indeterminate state, because it was allowed to pass through a DCHECK that you
thought was - and when you read the code, unless you're used to reading it as
DCHECKs - oh, that just didn't get enforced - it's sort of hard to try to
figure out why the thing was doing the wrong thing in the first place.
13:22 SHARON: OK. How is this as a summary? When in doubt, CHECK it out.
13:27 PETER: I like that. I like that. And you might get pushback by reviewers,
who aren't on my side of the fence yet. And then you can decide on which hill
you want to die on, at least until we've made policy to just not complain about
DCHECKs, or not complain about CHECKs.
13:45 SHARON: All right. That sounds good. So you mentioned stuff failing in
the wild. And for people who might not know, do you want to just briefly
explain what failing in the wild means?
13:54 PETER: OK. So there's two things. Just failing in the wild just means
that when this thing rolls out to Canary, Dev, Beta, Stable, if you have a
CHECK that will crash and generate a crash report as if you had a memory bug,
but it crashes in a deterministic way, at a deterministic spot - so you can
find out exactly what assumption was violated. Say that this should never be
called with a null pointer. Then you can say, look at this line where it
crashed. It clearly got hit with a null pointer. And then you can try to figure
out, from the stack, why that happened, rather than after you post this pointer
to a task, it crashes somewhere completely irrelevant from the actual call
site. Well, so in the wild specifically means it generates a crash report so
you can look at it, or in the wild means it crashes on a user's computer rather
than - in the wildness outside of development. And as for the other part of in
the wild, it's that we have started running non-crashy DCHECKs for a percentage
of Windows Canary. And we're looking to expand that. And we're gathering
information, basically, about which assertions or invariants that we have are
violated in practice in the wild, even though we don't think that they should
be. And that will sort of also culturally move the needle so that we do care
about DCHECKs. And when we care about DCHECKs, sort of similarly to how we care
about CHECKs, is it really that important to make the big distinction between
the two? Except for the case where you have really expensive DCHECKs, they
might still be worth keeping separate. And those will be things like, if you do
things for - say that you zero out memory or something for every memory block
that you allocate and free, or you do things for every audio sample, or for
every video frame pixel, those sort of things. And then we can sort of keep
expensive stuff gated out from CHECKs. And then maybe we don't need this
in-between where people don't know whether they can trust a DCHECK or not.
16:04 SHARON: So you mentioned that certain release builds now have DCHECKs
enabled. So for those in the wild versus regular CHECKs in the wild, if those
happen to fail, do the reports for those look the same? Are they in the same
place? Can they be treated the same?
16:20 PETER: Yeah. Well, they are uploaded to the same crash-reporting thing.
They show up under a special branch. And you likely will get bugs filed to you
if they hit very frequently, just like you would with crashes. There's a sort
of slight difference, in that they say dump without crashing. And that's just
sort of a rollout strategy for us. Because if we made DCHECK builds incredibly
crashy, because they hit more than CHECKs, then we can never roll this thing
out. Or it gets a lot scarier for us to put this on 5% of a new platform that
we haven't tested. But as it is right now, the first DCHECK that gets hit for
every process gets a crash dump uploaded.
17:07 SHARON: OK. So I've been definitely told to use dump without crashing at
certain points in CLs, where it's like, OK, we think that this shouldn't
happen. But if it does, we don't necessarily want to crash the browser because
of it. With the changes you've mentioned to DCHECKs happening, should those
just be CHECKs instead now or should those still be dump without crashing?
17:29 PETER: So if you want dump without crashing, and you made those a DCHECK,
then you would only have coverage in the Canary channels that we are testing.
Right? So if you want to get dump reports from the platforms that we're not
currently testing, including all the way up to Stable, you probably still want
to keep that a dump without crashing. You want to make sure that you're not
using the sort of - you want to make sure that you triage these, because you
don't want to keep these generating crash dumps forever. You should still
treat them as if they were crashes. And I think the same thing should hold true
for DCHECKs. You should only add them for an invariant that you care about
being violated, right? So as it is violated, you should either figure out why
your invariant was wrong, or you should try to fix the breakage. And you can
probably add more information to logging to figure out why that happened.
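As a rough sketch (the surrounding function is hypothetical; the call itself is
the one in `base/debug/dump_without_crashing.h`), dump without crashing looks
like this:

```cpp
#include "base/debug/dump_without_crashing.h"

// Hypothetical example: we believe this state is unreachable, but if it does
// happen we want a report from every channel, including Stable, without
// killing the process.
void OnUnexpectedState() {
  // Uploads a crash report from this point; execution then continues.
  // Reports should be triaged like crashes, not left to pile up.
  base::debug::DumpWithoutCrashing();
}
```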
18:41 SHARON: So when you have a CHECK, and it crashes in the wild, you get a
stack trace. And that's what you have to work on to figure out what went wrong
for debugging. Right? So what are some things that you can do, as a developer,
to make these CHECKs a bit more useful for you - ways to incorporate other
information that you can use to help yourself debug?
19:01 PETER: So some of the stuff that we have is we have something called
crash keys, which are essentially, you can write a piece of string data,
essentially - there's probably some other data types - and if you write those
before you're running dump without crashing, or before you hit a CHECK, or
before you hit a DCHECK, then those will be uploaded along the crash dump. And
if you talk to someone who knows where to find them, you can basically go in
under a crash report, and then under field product data, or something like
that, you should be able to find your key-value pair. And if you have
information in there, you'll be able to look at it. The other thing that I like
to do, which is probably the more obvious thing, is if you have somewhat of a
hypothesis that this thing should only fail if a or b or c is not true, then
you can add CHECKs for those. Like, if a CHECK is failing, you can add more
CHECKs to see why the CHECK was failing. In general, you're not going to get as
much out of a mini-dump as you want. You're not going to have the full heap
available to you, because that would be a mega-dump. You can usually find
whatever is on the stack if you go in with a debugger. And I know that you
wanted to lead me into talking about CHECK\_GT and CHECK\_EQ, which are
essentially, if you want to check that x is greater than y, then you should use
CHECK\_GT(x,y). The problem with those, in this sort of context, is that,
similarly to CHECKs - so CHECK\_GT gets compiled into, basically, if not x is
greater than y, crash. So unfortunately, the values of x and y are optimized
out when you're doing an official build.
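A hedged sketch of both ideas Peter mentions - crash keys riding along with a
report, and CHECK\_GT losing its operand values in official builds (the
function and key names here are hypothetical; the macros are assumed to be the
ones in `base/debug/crash_logging.h`):

```cpp
#include "base/check_op.h"
#include "base/debug/crash_logging.h"

void Resize(int new_size, int capacity) {
  // These scoped crash keys are attached to any crash dump (or dump without
  // crashing report) uploaded while they are in scope.
  SCOPED_CRASH_KEY_NUMBER("Resize", "new_size", new_size);
  SCOPED_CRASH_KEY_NUMBER("Resize", "capacity", capacity);

  // Compiles to roughly: if (!(capacity > new_size)) crash. In official
  // builds the operand values are optimized out of the failure message,
  // which is why the crash keys above can be so useful.
  CHECK_GT(capacity, new_size);
}
```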
21:02 SHARON: So this makes me think of some stuff we mentioned in the last
episode, which was with Dana. Check it out if you haven't. But one of the types
we mentioned there was SafeRef, which enforces a certain condition. And if that
fails - so in the case of a SafeRef, it ensures that the value you have there
is not null. And if that's ever not true, then you do get a crash similar to if
a CHECK fails. So in general, would you say it's better practice to enforce and
make sure your assumptions are held in these other, more structural ways than
relying on CHECKs instead?
21:41 PETER: So let me see if I can get at what you actually want out of that
one. So if we look at - there's a RawRef type, right? So what's good with the
RawRef is that you have a type that annotates that this thing cannot possibly
be null. So if you assign to it, and you're assigning a null pointer, your
program is going to crash, and you don't need to think about whether you throw
a null pointer in or not. If you keep passing a RawRef around, then that's
essentially you passing around a non-null pointer. And therefore, you don't
have to check that it's not null pointer in every step of the way. You only
need to do it when you're - I mean, the type will do it for you, but it only
needs to happen when you're converting from a pointer to a ref, essentially, or
a RawRef. And what's so good about that is now you have the - previously, you
might just CHECK that this isn't called with null pointer or whatever. But then
you would do that for four or five arguments. And you'd be like, null pointer
CHECKs are this part of the function body. And then it just gets super-noisy.
But if you're using the RawRef types, then the semantics of the type will
enforce that for you. And you don't have to think about that when reading the
code, because usually when you read the code, you're going to be like, it's a
pointer. Can it be null or not? What does it point to? And this thing will at
least tell you, it can't be null. And you still have the question of, what does
it point to? And that's fine. So I like enforcing this through types more than
checking those assumptions, and then checking inside of what happens. If you
were assigned to this RawRef, then it's going to crash in the constructor if
you have a null pointer. And then based on that stack trace, if we have good
stack data, you're going to know at what line you created the RawRef. And
therefore, it's equivalent to checking for not null pointer, because you can
trust the type to do the checking. And since I know Dana made this, I can
probably with 200% certainty say that it's a CHECK and not a DCHECK. But we do
have a couple of other places where you have a WeakPtr that shouldn't be
dereferenced on the wrong sequence. And those are complicated words. And that,
unfortunately, is a DCHECK. So we're hitting some sort of - I don't know if
that CHECK is actually expensive, or if it should be a CHECK, or if it could be
a CHECK. I think, especially, if you're in core types, the size overhead of
adding a CHECK is negligible, because all of the users of it benefit from that
CHECK. So unless it's incredibly -
24:28 SHARON: What do you mean by core types?
24:30 PETER: Say that you make a `scoped_refptr` something, that ref pointer is
used everywhere. So if you CHECKed in the destructor, then you're validating
all of the clients of your scope ref pointer. So for one CHECK, you get the
price of a lot of CHECKing. Whereas if in your client code you're validating
some parameters of an API call that only gets called once, then that's one
CHECK you add for one case. But if your code is re-used, then your CHECK gets a lot
more value. And it's also easier to get parameters wrong sometimes if you have
500 clients that are calling your API. You can't trust all of them to get it
right. Whereas if you're just developing your feature, and it's only used by
your feature, then you can be a little bit more certain with how it's being
called. I would say, still add CHECKs, because code evolves over time. It's
sort of like how you can add unit tests to make sure that no one breaks your
code in the future. If you add CHECKs, then no one can break your code in the
future.
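A minimal sketch of the RawRef pattern Peter describes, assuming Chromium's
`base/memory/raw_ref.h` (the `Widget` and `Frobber` names are made up):

```cpp
#include "base/memory/raw_ref.h"

class Widget;

class Frobber {
 public:
  // from_ptr() CHECKs that the pointer is non-null, so the null check
  // happens exactly once, at the pointer-to-ref conversion...
  explicit Frobber(Widget* widget)
      : widget_(raw_ref<Widget>::from_ptr(widget)) {}

  // ...and every method can use widget_ without re-checking for null,
  // because the type guarantees it.
 private:
  raw_ref<Widget> widget_;
};
```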
25:37 SHARON: Mm-hmm. OK. So you mentioned a few things about how CHECKs and
DCHECKs are changing. [AUDIO OUT] what is currently in the works, and what is
the long-term goal and plan for CHECKs and DCHECKs.
25:53 PETER: So currently what's in the works is we've made sure that some
libraries that we use, like Abseil and WebRTC, which is a first-party
third-party library, that they both use Chrome's crashing report system, which
means that you get more predictable crash stacks because it's using the
immediate crash macro. But also, you get the fatal logging field that I talked
about. That gets logged as part of crash dumps. So you hopefully have more
glanceable, actionable crash reports whenever a CHECK is violated inside of
Abseil, or in WebRTC, as it were. And then upcoming is we want to make sure
that we keep an eye out for our DCHECKs on other platforms, such as Mac. I know
that there's some issues with getting that fatal log field in the GPU process,
and I'm working on fixing that as well. So hopefully, it just means more
reports for the things you care about and easier to action on reports. That's
what we're hoping.
27:03 SHARON: If people think that this sounds really cool, want to have some
more involvement, or want to ask more questions, what's a good place for them
to do that?
27:11 PETER: I like Slack as a thing for this. So the #cxx channel on Slack,
the #base channel on Slack, the #halp channel on Slack is really good. #halp is
really, I think, unintimidating. You can just throw whatever question you have
in there, and I happen to be around there. If you can find out what my last
name is through sheer force of will, you can send me an email to my Chromium
username. What else would we have? I think if they want to get involved, just
add CHECKs to your code. That's a really good way to do it. Just make sure that
your code does what you expect it to in more cases.
27:48 SHARON: Maybe if you have a CL, and you're just doing some drive-by
cleanup, you can turn some DCHECKs into CHECKs also?
27:56 PETER: If your reviewer is cool with that, I'm cool with that. Otherwise,
you can just try to hope for us making that policy that we use CHECKs - if it's
something we care about, we use a CHECK instead of a DCHECK, unless we have a
really good reason to use a DCHECK. And that would be performance.
28:15 SHARON: That sounds good. And one last question is, what do you want
people to take away as their main takeaway from this discussion?
28:26 PETER: I think validating code assumptions is really valuable. So you
think that you're pretty smart when you're writing something, or you remember -
I mean, you're sometimes kind of smart when you're writing something. And
you're like, this can't possibly be wrong. And in practice, looking at crash
reports, these things are wrong all the time. So please validate any
assumptions that you make. It's also, I would say, better than a comment,
because it's a comment that doesn't get outdated without you noticing it. So, I
think, validate your assumptions to make sure that your code is more robust.
And validate properties you care about. And don't be afraid to use CHECKs.
29:13 SHARON: All right. That sounds like a good summary. Thank you very much
for being here, Peter. It was great to learn about DCHECKs.
29:18 PETER: Yeah. Thanks for having me.
29:24 SHARON: Action. Hello.
29:26 PETER: Oh. Take four.
29:29 SHARON: [LAUGHS] Take four. And action.

@ -0,0 +1,488 @@
# Whats Up With //content
This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 3, a 2022 video discussion between [Sharon (yangsharon@chromium.org)
and John (jam@chromium.org)](https://www.youtube.com/watch?v=SD3cjzZl25I).
The transcript was automatically generated by speech-to-text software. It may
contain minor errors.
---
What lives in the content directory? What is the content layer? How does it fit
into Chrome and the web at large? Here to answer all that and more is todays
special guest, John, who not only is a Content owner, but actually split the
codebase to create the Content layer.
Notes:
- https://docs.google.com/document/d/1EJnG5gK8rQwHkdZTKl8vIwx9oScP8TaKBgwzBafIh9M/edit
Links:
- [//content/README.md](https://crsrc.org/c/content/README.md)
- [//content/public/README.md](https://crsrc.org/c/content/public/README.md)
- [What's Up With Pointers](https://www.youtube.com/watch?v=MpwbWSEDfjM)
---
00:00 SHARON: Hello, and welcome to "What's Up with That", the series that
demystifies all things Chrome. I'm your host, Sharon, and today, we're talking
about content. What lives in the content directory? What is the content layer?
How does it fit into Chrome and the web at large? Here to answer all of that
and more is today's special guest, John. He's not only a content owner, but
actually split the code base to create the content layer. Since then, a theme
of his work has been Chrome's architecture, and how to make it usable by
others. He's been involved far and wide across Chrome, but today, we're
focusing on content. John, welcome to the program.
00:33 JOHN: Hi, everyone, and thanks for setting this up, Sharon. My name's
John, and I'm happy to try to shed some light and history on this part of the
Chrome codebase. I've had the pleasure of working on a lot of different parts
of Chrome over a number of years I've worked on it. A theme of my work has been
on the architecture of Chrome and making it reusable by other products. And one
of the projects has been splitting up the codebase and helping create this
content layer.
01:02 SHARON: So, can you tell us what the content layer is? Because content is
a very overloaded term, and we're going to say it a lot today. So you mentioned
the content layer. Can you tell us what that is?
01:10 JOHN: Yes. The content layer is a part of the Chrome codebase that's
responsible for the multiprocess sandbox implementation of our platform.
01:24 SHARON: And another term that I had heard a lot tossed around before I
really understood what was going on was the content public API. So is that the
same as the content layer, or is that different?
01:36 JOHN: It's part of it. So the content component is very large, and so,
we've surrounded it by this small public API. So that you hide the
implementation details and the private directories, and then, embedders just
only have access to a small public layer.
01:56 SHARON: How did we end up with this content layer? Can you give us a bit
of history of how we came up with it? And also, maybe why it's called content?
02:02 JOHN: Sure. The history is - in the beginning, Chrome, like all software
projects, began nice and easy to understand. But over time, as you add a lot
more features to go from zero users to billions of users, it becomes harder to
understand. Small files and small classes become much larger; small functions
kind of get numerous hooks to talk to every feature, because they want to know
when something happens. And so, this idea started: let's separate the product -
the things that make Google Chrome what it is - from the platform, which is
what any browser, any minimal browser doing the latest HTML specs, would need
to implement them in a sandboxed, multiprocess way. And so, content was the
lower part, and that's how it started.
02:58 SHARON: How did we get the name content?
02:58 JOHN: The name is like a pun. And when we started Chrome, one of the
ideas was, we'll focus on content and not Chrome, and so, the browser will get
out of the way. Chrome is a term used to refer to all the user interface parts
of the browser. And so, we said, it's going to be content and not Chrome. And
so, when you open Chrome, you just see a very small UI. Most of what you see is
the content. And so, when we split the directory, it was originally called
Source Chrome, and so, the content part, that's the pun. That's where it came
from.
03:34 SHARON: That's fun. Earlier, you mentioned embedders of content. Can you
tell us what an embedder of content is? And this is part of why I was very
excited about this episode, because I was working on a team where we were
embedders of content for a long time. Well over a year, and it took me a long
time to really understand what that was. Because, as you mentioned now,
Chrome's grown a lot. You work on a very specific thing, so understanding these
more general concepts - what is content? what is a content embedder? - is less
important to what you do day-to-day. But can you tell us what an embedder of
content is?
04:13 JOHN: Sure. An embedder of content is simply anybody who chooses to use
that code to build a browser on top of it. And so, in the beginning, right when
we did this, the goal was just to have one embedder. Or not the goal, what we
had was just one embedder. It was Chrome. But then, right away, we were like,
you know what? It would be nice for people who work on content and not the
feature part to build a smaller binary. It builds faster. It debugs faster,
runs faster. And so, we built this minimal example, also as an example for
other people, called content shell. And then, we started running tests against
that, and that was
the first - or the second embedder of content. And then since then, what was
unexpected, what we started for code health reasons turned out to be very
useful for other projects to restart - or start building their browser from.
And so, things like Android WebView, which was using its own fork of WebKit,
then started using content. That was one first-party example. But then, other
projects came along. Things like Electron and the Chromium Embedded Framework
all started building not just products on top of it, but other frameworks.
05:30 SHARON: That was really surprising to learn about, because it seems
unsurprising that you would build another browser based on Chromium. And people
have heard about this when Edge switched over to Chromium. But to learn that
things like Electron are built around content seem really surprising, because
that's very different from what a browser is.
05:52 JOHN: But they have common needs. They have some HTML data, and they want
to render it and do so in a safe, and stable, and secure way. And that's not
their value add, working on that code. So it's better for them to use something
else.
06:11 SHARON: That makes sense. You also mentioned that Chrome is dependent on
content. And when I first started working on Chrome as an intern, I had it
told to me so many times - because I couldn't remember - that Chrome can depend on
content, but not the other way around. So can you tell us a bit about this
layering, and why it's there?
06:31 JOHN: I should also start by saying, content is not just - when we say
you embed content, what we often mean is you embed content and everything that
sits below it in the layer tree. So that includes things like Blink, our
rendering engine; V8, our JavaScript engine; Net, our networking library; and
so on. And so you can talk to the content public APIs, but also, sometimes,
you talk to the Blink API files, and V8, and so on.
07:07 SHARON: So you have this many-layered API or product? And, at the bottom,
we have things like Net, Blink, and those probably have dependencies on them
that I don't know about. And on top of that, we have content, and then, on top
of that, we have Chrome?
07:23 JOHN: Right. And so, Chrome, as an embedder of content, can include the
content public API directory. But since content can have multiple embedders, it
can't include Chrome. If content reached out directly to Chrome, then other
people wouldn't be able to use it. Because if you try to bring in this code, it
includes files from a directory that you're not using. So, instead, the content
public API, it has APIs going two different directions. One direction is going
into content, and then, one direction are these abstract interfaces that go out
from content. And any embedder has to implement them. And so, these usually end
up in terms like client or delegate. And these are implemented by Chrome, and
that's how content is able to call back to it. But then, any other, of course,
product or embedder can also implement these same interfaces.
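A simplified sketch of that pattern (the real interface in content/public is
`content::ContentBrowserClient`; the method shown here is invented for
illustration):

```cpp
// In content/public/: an abstract interface pointing "out" of content.
// Content calls it, but never knows which embedder implemented it.
namespace content {
class ContentBrowserClient {
 public:
  virtual ~ContentBrowserClient() = default;
  // Hypothetical hook that an embedder can override.
  virtual bool ShouldAllowNavigation() = 0;
};
}  // namespace content

// In chrome/: Chrome's implementation. Content never includes chrome/
// headers, so Content Shell, WebView, and others can implement their own.
class ChromeContentBrowserClient : public content::ContentBrowserClient {
 public:
  bool ShouldAllowNavigation() override { return true; }
};
```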
08:23 SHARON: You mentioned Blink and also some things called delegate and
whatever. So we have a lot of things called something something host in
content. Can you talk a bit about what the relationship between content and
Blink is? Because there's a lot of mirroring in terms of how they might be set
up, and how they relate to each other.
08:37 JOHN: So Blink was the rendering engine that originally started as
WebKit. And we forked, and we named it Blink a number of years ago. And that did
not have any concept of processes. So it was something that you call it in one
process, and it does its job. And you give it whatever data it needs, and it
gives you back the rendered data. And you can poke at it or whatever you want
to do with it. But you needed to wrap that with some - you needed a bunch of
code around it to make it multi-process. And also, to figure out when it needs
something that's not available in the sandbox that it runs in, you have to
provide that data. And so, this is where the content layer comes in. It's the
one that wraps the rendering engine and uses the networking library and other
things to be able to create a fully working browser.
09:33 SHARON: More about processes. So it's easy to think, maybe, that the
content layer just is the browser process. So can you just talk a bit about how
processes work in content? And what the content API provides in terms of
accessing these processes?
09:54 JOHN: So the content code runs in - it's the initial process that runs.
Content starts up, and then - and so, it's in the browser process. But it also
creates the render processes for where Blink runs. It creates a GPU process
that talks to the GPU and where a bunch of the compositing happens. It creates
a network process where we do networking. It creates other processes, things
like audio on some platforms, storage process to isolate storage. And then, a
lot of short lived processes for security and stability reasons. And so, you
can have processes that run content code, but, sometimes, an embedder wants to
run its own code in a different process. So it could re-use the same helpers
that content has for creating a process, and we'll use that. And then, I think
I didn't fully answer your previous question yet, which was the host part. So,
often, you'll have classes in Blink that are running in the renderer process,
and you need an equivalent class to drive it from the browser process. And
that's where we often have the host suffix. So it'd be like a class for -
11:11 SHARON: Can you give an example of -
11:11 JOHN: Yes. So, for example, every renderer process has a class in content
browser called render process host. And then, every tab object in Blink will
have this class called render view, and then, in content browser, it will have
this class called render view host.
11:36 SHARON: Those are classes that, depending on what you work on, you might
see pop up quite a bit. And there's a lot of them. They're all called render
something host, and it's a bit tough to keep them straight. But that makes
sense as to why they're called render and - why render and host are in the
names for them. So you just listed a bunch of different process types. The GPU
process, the browser process, render processes. And, usually, whenever we have
different processes, we have some security boundary between them. Can you talk
a bit about how security and the content layer overlap? Is the content API a
security boundary? What happens if someone calls it maliciously? What could go
wrong if they do and do it successfully?
12:26 JOHN: So the security boundaries in any browser built on top of content
is the processes. We separate things to not just have render processes per tab,
but there are multiple render processes per tab thanks to the amazing work of
the Site Isolation project. And that's what split up different iframes into
different processes. And so, how they talk, all these processes talk through
IPC, and our current IPC system's called Mojo. And so, any time you talk, you
use Mojo between processes. You're usually talking from between processes of
different privileges. And so, one could be sandboxed and the other one not
sandboxed. Or one could be sandboxed, and the other one only partially
sandboxed. So you have to scrutinize any time you use these Mojo calls to make
sure that they can't inadvertently lead to a security vulnerability. Now, even
though, as hard as you try, people could still misuse code. Or, also, embedders
like Chrome or other content embedders can add their own IPCs. So content
obviously doesn't know about the IPCs from other layers, and so, it's possible
that it could be an embedder of content that has security vulnerability in
their own Mojo calls. And so, content doesn't know about them, so it can't do
anything about them. You could write insecure code in content. You can also
write insecure code in an embedder, and if someone finds a vulnerability - so
let's say someone finds a vulnerability in Blink, and maybe they're only
running their code in a minimal content shell. Maybe they can't find any other
Mojo calls that they can abuse to be able to get access to the browser process.
But maybe someone else, an embedder, is a more full-featured browser. It has
more IPC surface, and that could be more of an attack surface for that - to
start with that Blink vulnerability and then to hop into the browser process.
14:38 SHARON: And if you gain control of the browser process, that's a very
highly privileged process.
14:44 JOHN: Because that has full access to your system. So that's the point
where you can leave persistent changes to the user system, which is pretty bad.
14:55 SHARON: That sounds not great. So if you're an average, say, Chrome
engineer, that could be anyone. This is probably not too much of a concern. All
the stuff we mentioned, this is good to know. How would a Chrome engineer who
doesn't directly work on content or in the content directory interact with the
content layer?
15:20 JOHN: Well, they might need a signal from Blink, for example. That's
often how someone will do that. They'll be working on a feature in the browser,
and everything works great. But then, they'll be like, I just need something
from Blink. But it's not there. And so, sometimes, they'll have to add an IPC
between processes, and that might interact. They'll be like, how do I get it?
It's in Blink. It's in the render view class. So I need an interface that talks
between each render view host and each render view. And that's how they might
get - well, that would be how they get interaction with the multiprocessor part
of it. But if someone is just working on something only in a browser process,
they might still be trying to get information about the current tab. And that's
represented by the web contents class in content. So they'll look in content
public browser, and they'll see web contents. And there will be a lot of
interfaces that hang off it. So they'll be looking at it, going through a trail
of interfaces and classes to be able to get more information on what's going on
in the current tab.
16:29 SHARON: Can you give us a quick overview of the web contents class?
Because it is one, massive, and two, called something like web contents. Which
suggests it's important because content plus the web, and it's also something
you see all over the place. So can you just give us a quick overview of what
that class does? What it's for? What it represents?
16:46 JOHN: Yes. Things now are a lot more complicated than before, but if you
go back in a time machine and see how these things started, you can roughly
think in initial Chrome. Every tab had a class to represent the content in that
tab, and that was called web contents. And then, it was called web contents
because we had other classes. We used to be able to put native stuff in a tab.
And so, that would be called tab contents. But that's gone now, and we just
have web contents. So that's where the name comes from. And then even, for
example, there was render process host, which I mentioned earlier. And then,
each tab, each web contents roughly translated into one render process. And so,
now, it's a bit more complicated. There are examples where you can have web
contents inside of web contents, and that's more esoteric - most people
don't have to deal with it. And then, so that's what web contents is for. It will
do things like take input and feed it to the page. Every time there's a
permission prompt, you usually go through that. If a page wants access to a
microphone, or video, and so on. It keeps track of this navigation going on.
What's the current URL? What's the pending URL? It uses other classes to drive
all that stuff as you send out the network request and get it back. And that's
not inside of web contents itself, but it's driven by other helper classes.
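For a concrete flavor, a hedged sketch of hanging code off a tab's web contents
using a `WebContentsObserver` (a real content/public class, though `TabLogger`
and the logging here are invented):

```cpp
#include "base/logging.h"
#include "content/public/browser/navigation_handle.h"
#include "content/public/browser/web_contents.h"
#include "content/public/browser/web_contents_observer.h"

// Hypothetical observer that watches one tab's WebContents.
class TabLogger : public content::WebContentsObserver {
 public:
  explicit TabLogger(content::WebContents* contents)
      : content::WebContentsObserver(contents) {}

  // Called by content as navigations in this tab finish.
  void DidFinishNavigation(content::NavigationHandle* handle) override {
    VLOG(1) << "Tab is now at " << web_contents()->GetLastCommittedURL();
  }
};
```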
18:28 SHARON: I tend to think of content as being the home of navigation, which
I think is a decent way to think about it and also is maybe biased because of
the stuff I've been working on. But you have Chrome, and navigation, and
content, and all the stuff here. And then, separately, you have the actual web,
the internet. And that has things like actual websites. And there are web
standards, and there's things like HTML. And these two things somehow have to
intersect. But being on the Chrome side, working on Chrome, apart from writing
some browser tests, maybe, you never really interact with any of the more web
things. JavaScript, you don't really touch. That's more Blink and HTML only in
a test kind of thing. So how do these web standards - there's navigation web
standards and all that. How do we actually make sure that they're implemented
in Chrome? And where does that happen?
19:32 JOHN: So that happens all over the code, but there's a few critical
directories. If you look at net at a low level, a lot of IETF specs - some
aspects will be implemented there at that layer. Either in net or in the network
service, which is code that runs inside the network process. Then you've got
V8, of course, our JavaScript engine, and that has to follow the ECMAScript
standards. And then, there's a lot of the platform standards. Some of
them don't need multiple processes to implement them, so they'll
just be completely inside Blink. But some of them require multiple processes,
things that need access to devices and so on. And so, that implementation will
be split across Blink and content browser. But then, how do you ensure that,
not only do you implement this correctly, but also that you don't regress it?
So there's a whole slew of tests. There's the Blink tests, which used to be
called the layout tests. And those run across the simple, simple test cases for
many features to make sure that each one works. And there's also this cool
thing where we share now a lot of these tests with other embedders, and that
way, you run the same test in every browser. And so, when you write a test, you
don't have to write it n times. You can just write it once. So that's how we
ensure that we meet the specs.
21:10 SHARON: That makes sense. Because I've been pointed - when I was looking
into a class, like, what does this do? - I've been linked to, say, one of the HTML
specs or web specs. But the whole time, I'm just thinking, how do we make
sure - or who's checking that we're actually implementing this and correctly?
But these tests seem like a good way to do it and also ensure some level of
consistency across browsers. Assuming you know whether or not the browser you
use chooses to run these tests or not, I guess.
21:41 JOHN: And as an engineer on a project like that, the first time you'll
hit them is when you're breaking them. You'll make a change, and I think this
is fine. And then, you send it to the commit queue, and you break some layout
tests. What's happening to me today? And then, you have to drill into it. And
the nice thing about layout test is because each one is small, you - it's
faster to figure out what you broke because it's just like, hopefully, you only
broke a small number of tests.
22:06 SHARON: For sure, and it's a good example of why we have all these tests,
is to make sure things don't break. So that is pretty much all the questions I
have written down. Is there anything else generally content layer, content
public API-ish related that is interesting that maybe we didn't get a chance to
cover?
22:31 JOHN: Yes. The most common question is people will be like, well, does
this belong in content or not? So I can have a chance to point people towards
the README files - content/README, which describes what's supposed to go in
or not. And then, there's also a content/public/README that describes the
guidelines we have for the API to make it consistent.
22:59 SHARON: I've definitely seen those questions before. You're updating one
of the content public APIs. Does this belong? While we're here, can you give us
a quick breakdown heuristic of what things generally would belong in the
content public API versus you put it up for review, and the reviewer's like,
no. This does not belong in content public?
23:24 JOHN: So sometimes, for example, for convenience, maybe the Chrome layer
wants to call other parts of Chrome layer, but they don't have a direct
connection. Or maybe a Chrome layer wants to talk to a different component. And
so, they'll be like, we'll add something to the content API, and then, that
way, Chrome can talk to this other part of Chrome or this other component
through content as a shortcut. We don't allow that, and the reason for that is
anybody who's gone through the content public directory, it's already huge. And
so, we feel that if Chrome wants to talk to Chrome or to another layer, they
should have their own API to each other directly instead of hopping through
content. Just because the content API's already very large, very complex, hard
to understand. So we don't want to add things that are absolutely not necessary
to it. And another thing we try to do is to not add multiple ways of doing
something. We only add something to the content API when there's no other way
of getting this data from inside content, or there's no other way of getting
this data from the embedder to content. But if there's something similar that
can do the same thing, we push back on that.
24:39 SHARON: And also, test-only things? Are those generally OK, or do you
want to generally avoid those?
24:45 JOHN: Well, yes. Test-only methods, we try really hard to avoid - not
just for the public API, but inside, because we don't want to bloat the binary.
But we do have content/public/test, which gives you a lot more leeway to poke at
things in your browser test, for example, or your unit tests. Another thing is,
we also have guidelines for how the API should be. We don't have, really,
concrete classes. It's mostly abstract interfaces. And so, there's a bunch of
rules there, and they're all listed in content/public/README. Just so people
know the guidelines we have for interfaces there.
25:28 SHARON: On the Chrome binary point, how much is the size of the binary
dependent on the size of the content public API? Is that a big part of the
binary, or is it small enough where, sure, we want to keep it from being
unnecessarily large but not too much of an issue?
25:48 JOHN: The size is not going to come as much from the content/public API
but just from the entire content and all its dependencies. And those are in the
tens of megabytes. So, sometimes, for example, if you're bundling the content
layer, you're not going to be a small binary. You'll just start off in the 30
megabyte range or 40 megabyte range once you put everything together.
26:12 SHARON: And I guess that's something you have to be more conscious of if
you're working in content versus another directory, even in Chrome - that you
have to be wary of your dependencies more so than anywhere else. Not only for
Chrome, but also, any other embedders who might want to use content.
26:31 JOHN: Yes. And so, for example, if someone's trying to add something in
Chrome, we also ask, does this have to be in content? Or can this be part of
Chrome, so that not every embedder has to pay that cost if they don't need it?
Maybe we'll have an interface, and the embedder can plug the data in through
that way but still not have it in content. Another problem, of course, with
having data inside content is that not all embedders update at the same speed.
So if you're putting something in content, it can quickly go stale - the
content, whatever the data is - if you're not updating it quickly.
27:08 SHARON: That makes sense. So we mentioned a bit of what content is, a bit
of the history of it. Can you tell us anything about what are upcoming changes
that might happen in content? What is the future of the content directory, the
layer, the API?
27:28 JOHN: Well, it's always changing. It's not static; it's driven by the
needs of the product. And so, you look at big changes happening today like MPArch to
support various use cases that we didn't have, or we never thought about
initially. And that's where the web contents inside web contents, some of that
comes in. There are big changes like banning, for example, native pointers and
replacing them with `raw_ptr`. So we can try to address some of the
security problems we have with Use-After-Frees. So that's where, when you look
at the content code or the Chrome code in general, too, it might look a little
bit different than the average C++ project that you see. You'll be like, I'm
getting errors if I try to have a raw pointer, and that's why.
28:15 SHARON: Check out episode one for more on that. We'll link it below.
Anything else random content-related or otherwise you would like to share with
us?
28:27 JOHN: I think the only other thing I would add is familiarize yourself
with the READMEs in content/README and content/public/README before making
changes. That will make the author and reviewer's time more efficient. And if
you're working on content and below, you can build Content Shell instead of
Chrome. That would be faster to build and debug and hopefully make you more
productive.
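For example (assuming a typical GN setup with an `out/Default` build
directory):

```sh
# Build and run Content Shell instead of all of Chrome.
autoninja -C out/Default content_shell
out/Default/content_shell
```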
28:52 SHARON: Good tips. Hopefully, our viewers follow them. They would never
try to change a content/public API without reading the READMEs first. Well,
thank you so much, John, for sitting down and chatting with me about content.
This was great, and, hopefully, people find it useful.
29:14 JOHN: And thank you for hosting me, Sharon.
29:23 SHARON: Did you start working on Chrome from the very start, or just -
obviously, pre-launch. Because, I think, based on your profile picture - the
picture from that comic book that released when Chrome did - which I was lucky
enough to get a copy of when I was an intern. Shout-out Peter. But that
obviously suggests you were a major contributor before the public launch of
Chrome. So were you working on Chrome from the very beginning?
29:47 JOHN: I was not. It took about six months. I tried to join from the
beginning, but I couldn't join right at the beginning. So my sneaky way was I
found another project under that same director who was running Chrome, and
then, once that project finished in six months, then I jumped into Chrome.
30:09 SHARON: And do you ever think about how crazy it is from this thing that
you worked on, effectively, from the start before the public launch? To what it
is now where Chrome is one of the foundational pieces of the internet at large?
Any time the internet gets used, period, probably something in Chrome is
running, like the net stack, if not, obviously, the browser? Do you ever think about
that, and how crazy that is? And your place in that?
30:38 JOHN: Yes. It's amazing how far Chrome has come, and it's really humbling
to see it be the number one browser, the most widely-used browser. Because when
we were working on Chrome at the beginning, we were just trying to guess what
market share it would have. And people would be like, it'll be 10%, and we're
like, no way. Even the people working on it, we didn't think that was going to
be possible. So to see users really enjoy using it, and for us to keep
demonstrating value by sticking to our four principles, security and stability,
simplicity and speed. And seeing people not just adopt Chrome as a product, but
Chromium as a platform is - it's beyond our wildest dreams. And it's a
responsibility that we have every time we make a change to Chrome to all these
users and developers using it. You were asking earlier, how does it feel to be
here from the start? There's almost a sense of feeling super lucky. But also
this humbling feeling where we started in Chrome when it was really small, and
our knowledge built up incrementally as it got more complicated. But so, it's
like, well, what if I was to jump into Chrome today? It seems like way too
much - the code is so complicated now compared to before. There's almost this
responsibility we have, having been in Chrome for a long time, to share
knowledge, to help people
pick it up. Because we would ourselves struggle if we were to jump in now.
32:22 SHARON: Yes. As those people, we certainly did struggle. But people are
pretty smart, I think, and they can figure it out. But that doesn't mean you
can't make it easier for the people in the future figuring it out. Or even
people who - you just work on a different part. If I were to do anything in
Blink, I'm just like -
32:44 JOHN: Same. I've been on it for a long time. I don't touch Blink.
32:50 SHARON: Yes. Yes.

@ -0,0 +1,968 @@
# Whats Up With Tests
This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 4, a 2022 video discussion between [Sharon (yangsharon@chromium.org)
and Stephen
(smcgruer@chromium.org)](https://www.youtube.com/watch?v=KePsimOPSro).
The transcript was automatically generated by speech-to-text software. It may
contain minor errors.
---
Testing is important! What kinds of tests do we have in Chromium? What are they
all about? Join in as Stephen, who led Chrome's involvement in web platform
tests, tells us all about them.
Notes:
- https://docs.google.com/document/d/1SRoNMdPn78vwZVX7YzcdpF4cJdHTIV6JLGiVC2dJUaI/edit
---
00:00 SHARON: Hello, everyone, and welcome to "What's Up With That," the series
that demystifies all things Chrome. I'm your host, Sharon. And today we're
talking testing. Within Chrome, there are so many types of tests. What are they
all? What's the difference? What are the Chromium-specific quirks? Today's
guest is Stephen. He previously led Chrome's involvement in web platform tests.
Since then, he's worked on rendering, payments, and interoperability. As a fun
aside, he's one of the first people I met who worked on Chrome and is maybe
part of why I'm here today. So welcome, Stephen.
00:33 STEPHEN: Well, thank you very much for having me, Sharon, I'm excited to
be here.
00:33 SHARON: Yeah, I'm excited to have you here. So today, we're in for maybe
a longer episode. Testing is a huge topic, especially for something like
Chrome. So grab a snack, grab a drink, and let's start. We'll start with what
are all of the things that we have testing for in Chrome. What's the purpose of
all these tests we have?
00:51 STEPHEN: Yeah. It's a great question. It's also an interesting one
because I wanted to put one caveat on this whole episode, which is that there
is no right answer in testing. Testing, even in the literature, never mind in
Chromium itself, is not a solved problem. And so you'll hear a lot of different
opinions. People will have different thoughts. And I'm sure that no matter how
hard we try, by the end of this episode, our inbox will be filled with angry
emails from people being like, no, you are wrong. So all of the stuff we're
saying here today is my opinion, albeit I'll try and be as useful as possible.
But yeah, so why do we test was the question, right? So there's a lot of
different reasons that we write tests. Obviously, correctness is the big one.
You're writing some code, you're creating a feature, you want it to be correct.
Other reasons we write them, I mean, tests can be useful as a form of
documentation in itself. If you're ever looking at a class and you're like,
what does - why is this doing this, why is the code doing this, the test can
help inform that. They're also useful - I think a topic of this podcast is sort
of security. Tests can be very useful for security. Often when we have a
security bug, we go back and we write what are called regression tests, so at
least we try and never do that security failure again. And then there are other
reasons. We have tests for performance. We have tests for - our launch process
uses tests. There's lots and lots of reasons we have tests.
02:15 SHARON: Now that you've covered all of the different reasons why we test,
how do we do each of these types of tests in Chromium? What are the test types
we have?
02:27 STEPHEN: Yeah. So main test types we have in Chromium, unit tests,
browser tests, what we call web tests, and then there's a bunch of more
specialized ones, performance tests, testing on Android, and of course manual
testing.
02:43 SHARON: We will get into each of these types now, I guess. The first type
of test you mentioned is unit tests. Why don't you give us a quick rundown of
what unit tests are. I'm sure most people have encountered them or heard of
them before. But just a quick refresher for those who might not.
02:55 STEPHEN: Yeah, absolutely. So as the name implies, a unit test is all
about testing a unit of code. And what that is not very well defined. But you
can usually think of it as just a class, a file, a small isolated component
that doesn't have to talk to all the other bits of the code to work. Really,
the goal is on writing something that's testing just the code under test - so
that new method you've added or whatever. And it should be quick and easy to
run.
03:22 SHARON: So on the screen now we have an example of a pretty typical unit
test we see in Chrome. So there's three parts here. Let's go through each of
them. So the first type - the first part of this is `TEST_P`. What is that
telling us?
03:38 STEPHEN: Yeah. So that is - in Chromium we use a unit testing framework
called Google test. It's very commonly used for C++. You'll see it all over the
place. You can go look up documentation. The test macros, that's what this is,
are essentially the hook into Google test to say, hey, the thing that's coming
here is a test. There's three types. There is just test, which it just says
here is a function. It is a test function. `TEST_F` says that you basically
have a wrapper class. It's often called a test fixture, which can do some
common setup across multiple different tests, common teardown, and that sort of
thing. And finally, `TEST_P` is what we call a parameterized test. And what
this means is that the test can take some input parameters, and it will run the
same test with each of those values. Very useful for things like when you want
to test a new flag. What happens if the flag is on or off?
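As a hedged sketch of the first two flavors (all names here are hypothetical):

```cpp
#include "testing/gtest/include/gtest/gtest.h"

// Plain TEST: just a test function, no fixture.
TEST(MathTest, AddsTwoNumbers) {
  EXPECT_EQ(4, 2 + 2);
}

// TEST_F: a fixture class provides shared setup and teardown.
class BeaconHostTest : public testing::Test {
 protected:
  void SetUp() override { /* common setup for every test in the suite */ }
};

TEST_F(BeaconHostTest, StoresBeacon) {
  // The body can use anything the fixture set up.
}
```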
04:34 SHARON: That's cool. And a lot of the things we're mentioning for unit
test also apply to browser test, which we'll cover next. But the
parameterization is an example of something that carries over to both. So
that's the first part. That's the `TEST_P`, the macro. What's the second part,
PendingBeaconHostTest? What is that?
04:54 STEPHEN: Yeah. So that is the fixture class, the test container class I
was talking about. So in this case, we're assuming that in order to write a
beacon test, whatever that is, they have some set up, some teardown they need
to do. They might want to encapsulate some common functionality. So all you
have to do to write one of these classes is, you declare a C++ class and you
subclass from the Google test class name.
05:23 SHARON: So this is a `TEST_P`, but you mentioned that this is a fixture.
So are fixture tests a subset of parameterized tests?
05:35 STEPHEN: Parameterized tests are a subset of fixture tests, is that the
right way around to put it? All parameterized tests are fixture tests. Yes.
05:41 SHARON: OK.
05:41 STEPHEN: You cannot have a parameterized test that does not have a
fixture class. And the reason for that is how Google test actually works under
the covers is it passes those parameters to your test class. You will have to
additionally extend from the `testing::WithParamInterface`. And that says, hey,
I'm going to take parameters.
06:04 SHARON: OK. But not all fixture tests are parameterized tests.
06:04 STEPHEN: Correct.
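Putting that together, a sketch of a parameterized test (the names are made
up):

```cpp
#include "testing/gtest/include/gtest/gtest.h"

// A parameterized fixture extends both testing::Test and
// testing::WithParamInterface<T> - here T is a bool, like a feature flag.
class BeaconFlagTest : public testing::Test,
                       public testing::WithParamInterface<bool> {};

TEST_P(BeaconFlagTest, SendsBeacon) {
  const bool flag_enabled = GetParam();
  // ... exercise the code under test with the flag on or off ...
  EXPECT_TRUE(flag_enabled || !flag_enabled);
}

// Instantiates SendsBeacon once per parameter value.
INSTANTIATE_TEST_SUITE_P(All, BeaconFlagTest, testing::Bool());
```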
06:04 SHARON: OK. And the third part of this, SendOneOfBeacons. What is that?
06:10 STEPHEN: That is your test name. Whatever you want to call your test,
whatever you're testing, put it here. Again, naming tests is as hard as naming
anything. A lot of yak shaving, finding out what exactly you should call the
test. I particularly enjoy when you see test names that themselves have
underscores in them. It's great.
06:30 SHARON: Uh-huh. What do you mean by yak shaving?
06:35 STEPHEN: Oh, also known as painting a bike shed? Bike shed, is that the
right word? Anyway, generally speaking -
06:40 SHARON: Yeah, I've heard -
06:40 STEPHEN: arguing about pointless things because at the end of the day,
most of the time it doesn't matter what you call it.
06:46 SHARON: OK, yeah. So I've written this test. I've decided it's going to
be parameterized. I've come up with a test fixture for it. I have finally named
my test. How do I run my tests now?
06:57 STEPHEN: Yeah. So all of the tests in Chromium are built into different
test binaries. And these are usually named after the top level directory that
they're under. So we have `components_unittests`, `content_unittests`. I think
the Chrome one is just called `unit_tests` because it's special. We should
really rename that. But I'm going to assume a bunch of legacy things depend on
it. Once you have built whichever the appropriate binary is, you can just run
that from your `out` directory, so `out/release/components_unittests`, for
example. And then that, if you don't pass any flags, will run every single
components unit test. You probably don't want to do that. They're not that
slow, but they're not that fast. So there is a flag `--gtest_filter`, which
allows you to filter. And then it takes a test name after that. The format of
test names is always test class dot test name. So for example, here
PendingBeaconHostTest dot SendOneOfBeacons.
08:04 SHARON: Mm-hmm. And just a fun aside for that one, if you do have
parameterized tests, it'll have an extra slash and a number at the end. So
normally, whenever I use it, I just put a star before and after. And that
generally does - covers the cases.
08:17 STEPHEN: Yeah, absolutely.
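Putting that together (assuming the `out/release` build directory from above):

```sh
# Build the test binary, then run just the matching tests.
autoninja -C out/release components_unittests
out/release/components_unittests \
    --gtest_filter='PendingBeaconHostTest.SendOneOfBeacons*'
```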
08:23 SHARON: Cool. So with the actual test names, you will often see them
prefixed with either `MAYBE_` or `DISABLED_`, or before the test, there will be
an ifdef with usually a platform and then depending on the cases, it'll prefix
the test name with something. So I think it's pretty clear what these are
doing. Maybe is a bit less clear. Disabled pretty clear what that is. But can
you tell us a bit about these prefixes?
08:51 STEPHEN: Yeah, absolutely. So this is our way of trying to deal with that
dreaded thing in testing, flake. So when a test is flaky, when it doesn't
produce a consistent result, sometimes it fails. We have in Chromium a whole
continuous integration waterfall. That is a bunch of bots on different
platforms that are constantly building and running Chrome tests to make sure
that nothing breaks, that bad changes don't come in. And flaky tests make that
very hard. When something fails, was that a real failure? And so when a test is
particularly flaky and is causing sheriffs - the build sheriffs - trouble, they
will come in and they will disable that test. Basically say, hey, sorry, but
this test is causing too much pain. Now, as you said, the `DISABLED_` prefix,
that's pretty obvious. If you put that in front of a test, Google test knows
about it and it says, nope, will not run this test. It will be compiled, but it
will not be run. `MAYBE_` doesn't actually mean anything. It has no meaning to
Google test. But that's where you'll see, as you said, you see these ifdefs.
And that's so that we can disable it on just one platform. So maybe your test
is flaky only on Mac OS, and you'll see basically, oh, if Mac OS, change the
name from maybe to disabled. Otherwise, define maybe as the normal test name.
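A sketch of that pattern (hypothetical test and fixture names):

```cpp
#include "build/build_config.h"
#include "testing/gtest/include/gtest/gtest.h"

class ReportTest : public testing::Test {};

// Flaky only on macOS: disable it there, run it everywhere else.
#if BUILDFLAG(IS_MAC)
#define MAYBE_UploadsReport DISABLED_UploadsReport
#else
#define MAYBE_UploadsReport UploadsReport
#endif
TEST_F(ReportTest, MAYBE_UploadsReport) {
  // ... the test body ...
}
```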
10:14 SHARON: Makes sense. We'll cover flakiness a bit later. But yeah, that's
a huge problem. And we'll talk about that for sure. So these prefixes, the
parameterization and stuff, this applies to both unit and browser tests.
10:27 STEPHEN: Yeah.
10:27 SHARON: Right? OK. So what are browser tests? Chrome's a browser. Browser
test, seems like there's a relation.
10:34 STEPHEN: Yeah. They test the browser. Isn't it obvious? Yeah. Browser
tests are our version - our sort of version of an integration or a functional
test depending on how you look at things. What that really means is they're
testing larger chunks of the browser at once. They are integrating multiple
components. And this is somewhere that I think Chrome's a bit weird because in
many large projects, you can have an integration test that doesn't bring your
entire product up in order to run. Unfortunately, or fortunately, I guess
it depends on your viewpoint, Chrome is so interconnected, it's so
interdependent, that more or less we have to bring up a huge chunk of the
browser in order to connect any components together. And so that's what browser
tests are. When you run one of these, there's a massive amount of machinery in
the background that goes ahead, and basically brings up the browser, and
actually runs it for some definition of what a browser is. And then you can
write a test that pokes at things within that running browser.
11:42 SHARON: Yeah. Something I've heard multiple times before is that browser
tests launch the whole browser. And that's -
11:47 STEPHEN: More or less true. It's - yeah.
11:47 SHARON: Yes. OK. Does that also mean that because you're running all this
stuff that all browser tests have fixtures? Is that the case?
11:59 STEPHEN: Yes, that is the case. Absolutely. So there is only - I think
it's - oh my goodness, probably on the screen here somewhere. But it's
`IN_PROC_BROWSER_TEST_F` and `IN_PROC_BROWSER_TEST_P`. There is no version that
doesn't have a fixture.
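A hedged sketch of a Chrome browser test (the fixture base class and macro are
real; the test itself is hypothetical):

```cpp
#include "chrome/test/base/in_process_browser_test.h"
#include "content/public/test/browser_test.h"

class MyFeatureBrowserTest : public InProcessBrowserTest {};

IN_PROC_BROWSER_TEST_F(MyFeatureBrowserTest, BrowserComesUp) {
  // browser() is provided by the fixture and points at the browser that
  // the test machinery brought up before this body ran.
  EXPECT_TRUE(browser());
}
```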
12:15 SHARON: And what does the in proc part of that macro mean?
12:15 STEPHEN: So that's, as far as I know - and I might get corrected on this.
I'll be interested to learn. But it refers to the fact that we've run these in
the same process. Normally, the whole Chromium is a multi-process architecture.
For the case of testing, we put that aside and just run everything in the same
process so that it doesn't leak, basically.
12:38 SHARON: Yeah. There's flags when you run them, like `--single-process`.
And then there's `--single-process-test`. And they do slightly different
things. But if you do run into that, probably you will be working with people
who can answer and explain the differences between those more. So something
that I've seen quite a bit in browser and unit tests, and only in these, are
run loops. Can you just briefly touch on what those are and what we use them
for in tests?
13:05 STEPHEN: Oh, yeah. That's a fun one. I think actually on a previous
episode of this very program, you and Dana talked a little bit around the fact
that Chrome is not a completely synchronous program, that we do task
splitting. We have a task scheduler. And so run loops are part of that,
basically. They're part of our stack for handling asynchronous tasks. And so
this comes up in testing because sometimes you might be testing something
that's not synchronous. It takes a callback, for example, rather than returning
a value. And so if you just wrote your test as normal, you call the function,
and you don't - you pass a callback, but then your test function ends. Your
test function ends before that callback ever runs. Run loop gives you the
ability to say, hey, put this callback into some controlled run loop. And then
after that, you can basically say, hey, wait on this run loop. I think it's
often called quit when idle, which basically says keep running until you have
no more tasks to run, including our callback, and then finish. They're
powerful. They're very useful, obviously, with asynchronous code. They're also
a source of a lot of flake and pain. So handle with care.
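A sketch of the pattern, assuming a hypothetical async API `DoAsyncWork`:

```cpp
#include "base/functional/callback.h"
#include "base/run_loop.h"
#include "base/test/bind.h"
#include "base/test/task_environment.h"
#include "testing/gtest/include/gtest/gtest.h"

// Hypothetical asynchronous API under test: reports its result via callback.
void DoAsyncWork(base::OnceCallback<void(int)> done);

class AsyncTest : public testing::Test {
 protected:
  // Gives the test a task runner, so posted tasks and RunLoop work.
  base::test::TaskEnvironment task_environment_;
};

TEST_F(AsyncTest, WaitsForCallback) {
  base::RunLoop run_loop;
  int result = 0;
  DoAsyncWork(base::BindLambdaForTesting([&](int value) {
    result = value;
    run_loop.Quit();  // Lets Run() below return.
  }));
  run_loop.Run();  // Pumps tasks until Quit() is called.
  EXPECT_EQ(42, result);
}
```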
14:24 SHARON: Yeah. A tip is maybe using the `--gtest_repeat` flag. So that
one lets you run your test however many times you tell it to.
14:30 STEPHEN: Yeah.
14:36 SHARON: And that can help with testing for flakiness or if you're trying
to debug something flaky. In tests, we have a variety of macros that we use. In
the unit test and the browser tests, you see a lot of macros, like `EXPECT_EQ`,
`EXPECT_GT`. These seem like they're part of maybe Google test. Is that true?
14:54 STEPHEN: Yeah. They come from Google test itself. So they're not
technically Chromium-specific. But they basically come in two flavors. There's
the `EXPECT_SOMETHING` macros. And there's the `ASSERT_SOMETHING` macros. And
the biggest thing to know about them is that expect doesn't actually cause - it
causes a test to fail, but it doesn't stop the test from executing. The test
will continue to execute the rest of the code. Assert actually bails out and
stops the test right there. And so this can be useful, for
example, if you want to line up a bunch of expects. And your code still makes
sense. You're like, OK, I expect to return object, and it's got these fields.
And I'm just going to expect each one of the fields. That's probably fine to
do. And it may be nice to have output that's like, no, actually, both of these
fields are wrong. Assert is used when you're like, OK, if this fails, the rest
of the test makes no sense. Very common thing you'll see. Call an API, get back
some sort of pointer, hopefully a smart pointer, hey. And you're going to be
like, assert that this pointer is non-null because if this pointer is null,
everything else is just going to be useless.
15:57 SHARON: I think we see a lot more expects than asserts in general
anecdotally from looking at the tests. Do you think, in your opinion, that
people should be using asserts more generously rather than expects, or do we
maybe want to see what happens - what does go wrong if things continue beyond a
certain point?
16:15 STEPHEN: Yeah. I mean, general guidance would be just keep using expect.
That's fine. It's also not a big deal if your test actually just crashes. It's
a test. It can crash. It's OK. So use expects. Use an assert if, like I said,
that the test doesn't make any sense. So most often if you're like, hey, is
this pointer null or not and I'm going to go do something with this pointer,
assert it there. That's probably the main time you'd use it.
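A sketch of that advice, with a made-up `Widget` and `CreateWidget`:

```cpp
TEST(WidgetTest, HasExpectedFields) {
  std::unique_ptr<Widget> widget = CreateWidget();
  // If this is null, nothing below makes sense, so stop the test here.
  ASSERT_NE(widget, nullptr);
  // These keep running even if an earlier one fails, so a single run
  // reports every field that is wrong.
  EXPECT_EQ(widget->width(), 100);
  EXPECT_EQ(widget->height(), 50);
}
```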
16:45 SHARON: A lot of the browser test classes, like the fixture classes
themselves, are subclassed from other base classes.
16:53 STEPHEN: Mm-hmm.
16:53 SHARON: Can you tell us about that?
16:53 STEPHEN: Yeah. So basically, we have one base class for browser tests. I
think it's literally called `BrowserTestBase`, which sits at the bottom and
does a lot of the very low-level setup of bringing up a browser. But
as folks know, there's more than one browser in the Chromium project. There is
Chrome, the Chrome browser that is the more full-fledged version. But there's
also content shell, which people might have seen. It's built out of content.
It's a very simple browser. And then there are other things. We have a headless
mode. There is a headless Chrome you can build which doesn't show any UI. You
can run it entirely from the command line.
17:32 SHARON: What's the difference between headless and content shell?
17:39 STEPHEN: So content shell does have a UI. If you run content shell, you
will actually see a little UI pop up. What content shell doesn't have is all of
those features from Chrome that make Chrome Chrome, if you will. So I mean,
everything from bookmarks, to integration with having an account profile, that
sort of stuff is not there. I don't think content shell even supports tabs. I
think it's just one page you get. It's almost entirely used for testing. But
then, headless, sorry, as I was saying, it's just literally there is no UI
rendered. It's just headless.
18:13 SHARON: That sounds like it would make -
18:13 STEPHEN: And so, yeah. And so - sorry.
18:13 SHARON: testing faster and easier. Go on.
18:18 STEPHEN: Yeah. That's a large part of the point, as well as when you want
to deploy a browser in an environment where you don't see the UI. So for
example, if you're running on a server or something like that. But yeah. So for
each of these, we then subclass that `BrowserTestBase` in order to provide
specific types. So there's content browser test. There's headless browser test.
And then of course, Chrome has to be special, and they called their version in
process browser test because it wasn't confusing enough. But again, it's sort
of straightforward. If you're in `/chrome`, use `in_process_browser_test`. If
you're in `/content`, use `content_browsertest`. It's pretty straightforward
most of the time.
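Putting that together, a minimal Chrome-layer browser test might look like
this (the fixture and test names are invented); a content-layer test would
subclass `ContentBrowserTest` instead:

```cpp
#include "chrome/test/base/in_process_browser_test.h"
#include "content/public/test/browser_test.h"

class MyFeatureBrowserTest : public InProcessBrowserTest {};

// By the time the test body runs, a real browser is up and running.
IN_PROC_BROWSER_TEST_F(MyFeatureBrowserTest, BrowserIsRunning) {
  EXPECT_NE(browser(), nullptr);
}
```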
18:58 SHARON: That makes sense. Common functions you see overridden from those
base classes are these setup functions. So there's `SetUp`,
`SetUpOnMainThread` - there seems to be a lot of different setup options. Is
there anything we should know about any of those?
19:13 STEPHEN: I don't think that - I mean, most of it's fairly
straightforward. I believe you should mostly be using `SetUpOnMainThread`. I
can't say that for sure. But generally speaking, `SetUpOnMainThread` and
`TearDownOnMainThread` - or is it shutdown on main thread? I can't remember -
whichever one is for afterwards, are what you should usually be using in a
browser test. You can also usually do most of your work in a constructor.
That's something that people often don't know about testing. I think it's
something that's changed over time. Even with unit tests, people use the setup
function a lot. You can just do it in the constructor a lot of the time. Most
of the background initialization has already happened.
19:45 SHARON: I've definitely wondered that, especially when you have things in
the constructor as well as in a setup method. It's one of those things where
you just kind of think, I'm not going to touch this because eh, but -
19:57 STEPHEN: Yeah. There are some rough edges, I believe. By
`SetUpOnMainThread`, some things have been initialized that aren't around when
your class is being constructed. So it is fair. I'm not sure I have any great
advice other than you may need to dig in if it happens.
20:19 SHARON: One last thing there. Which one gets run first, the setup
functions or the constructor?
20:19 STEPHEN: The constructor always happens first. You have to construct the
object before you can use it.
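So, roughly, the pieces of a fixture run in this order (a sketch, not
exhaustive - the inherited `SetUp`/`TearDown` machinery runs in between):

```cpp
class LifecycleBrowserTest : public InProcessBrowserTest {
 public:
  LifecycleBrowserTest() {
    // 1. Runs first, before any browser machinery exists. Fine for plain
    //    member initialization; don't touch browser state here.
  }

  void SetUpOnMainThread() override {
    InProcessBrowserTest::SetUpOnMainThread();
    // 2. Runs once the browser is up; helpers like browser() are usable.
  }

  void TearDownOnMainThread() override {
    // 3. Runs after the test body, while the browser is still alive.
    InProcessBrowserTest::TearDownOnMainThread();
  }
};
```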
20:25 SHARON: Makes sense. This doesn't specifically relate to a browser test
or unit test, but it does seem like it's worth mentioning, which is the content
public test API. So if you want to learn more about content and content public,
check out episode three with John. But today we're talking about testing. So
we're talking about content public test. What is in that directory? And how
does that - how can people use what's in there?
20:48 STEPHEN: Yeah. It's basically just a bunch of useful helper functions and
classes for when you are doing mostly browser tests. So for example, there are
methods in there that will automatically handle navigating the browser to a URL
and actually waiting till it's finished loading. There are other methods for
essentially accessing the tab strip of a browser. So if you have multiple tabs
and you're testing some cross-tab thing, there are methods in there to do
that. I think that's probably where the content browser test base class lives
as well. So take a look at it. It's the equivalent of base in many ways for
testing. If you're thinking, someone should have written a library function
for this, possibly someone has already. And you should take a look. And if
they haven't, you should write one.
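For example, `content::NavigateToURL` from
`content/public/test/browser_test_utils.h` navigates and blocks until the load
finishes. A sketch, reusing the hypothetical fixture from above (other
includes trimmed):

```cpp
#include "content/public/test/browser_test_utils.h"

IN_PROC_BROWSER_TEST_F(MyFeatureBrowserTest, LoadsPage) {
  ASSERT_TRUE(embedded_test_server()->Start());
  GURL url = embedded_test_server()->GetURL("/title1.html");
  // Navigates the active tab and waits for the navigation to complete.
  EXPECT_TRUE(content::NavigateToURL(
      browser()->tab_strip_model()->GetActiveWebContents(), url));
}
```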
21:43 SHARON: Yeah. I've definitely heard people, code reviewers, say when you
want to add something that seems a bit test only to content public, put that in
content public test because that doesn't get compiled into the actual release
binaries. So if things are a bit less than ideal there, it's a bit more
forgiving for a place for that.
22:02 STEPHEN: Yeah, absolutely. I mean, one of the big things about all of our
test code is that you can actually make it so that it's in many cases not
compiled into the binary. And that is both useful for binary size as well as
you said in case it's concerning. One thing you can do actually in test, by the
way, for code that you cannot avoid putting into the binary - so let's say
you've got a class, and for the reasons of testing it because you've not
written your class properly to do a dependency injection, you need to access a
member. You need to set a member. But you only want that to happen from test
code. No real code should ever do this. You can actually name methods
blah-blah-blah `ForTest` or `ForTesting`. And this doesn't have any - there's
no code impact to this. But we have presubmits that actually go ahead and
check, hey, are you calling this from code that's not marked as test code? And
it will then refuse to - it will fail the presubmit on upload if that happens.
So it could be useful.
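A sketch of that naming convention (the class and member are made up):

```cpp
class ThingManager {
 public:
  // Compiled into the real binary, but a presubmit check rejects uploads
  // that call this from anywhere not marked as test code.
  void set_state_for_testing(State state) { state_ = state; }

 private:
  State state_;
};
```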
23:03 SHARON: And another thing that relates to that would be the friend test
or friend something macro that you see in classes. Is that a gtest thing also?
23:15 STEPHEN: It's not a gtest thing. It's just a C++ thing. So C++ has the
concept of friending another class. It's very cute. It basically just says,
this other class and I, we can access each other's internal states. Don't
worry, we're friends. Generally speaking, that's a bad idea. We write classes
for a reason to have encapsulation. The entire goal of a class is to
encapsulate behavior and to hide the implementation details that you don't want
to be exposed. But obviously, again, when you're writing tests, sometimes it is
the correct thing to do to poke a hole in the class and get at something. Very
much in the schools of thought here, some people would be like, you should be
doing dependency injection. Some people are like, no, just friend your class.
It's OK. If folks want to look up more, go look up the difference between open
box and closed box testing.
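In Chromium this usually looks like the following, using
`FRIEND_TEST_ALL_PREFIXES` from `base/gtest_prod_util.h` (class and test names
invented):

```cpp
#include "base/gtest_prod_util.h"

class Cache {
 private:
  // Lets CacheTest.EvictsOldestEntry reach private members, whatever
  // prefix (DISABLED_, FLAKY_, ...) the test currently carries.
  FRIEND_TEST_ALL_PREFIXES(CacheTest, EvictsOldestEntry);

  void EvictOldest();
};
```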
24:00 SHARON: For those of you who are like, oh, this sounds really cool, I
will learn more.
24:00 STEPHEN: Yeah, for my test nerds out there.
24:06 SHARON: [LAUGHS] Yeah, Stephen's got a club. Feel free to join.
24:06 STEPHEN: Yeah. [LAUGHTER]
24:11 SHARON: You get a card. Moving on to our next type of test, which is your
wheelhouse, which is web tests. This is something I don't know much about. So
tell us all about it.
24:22 STEPHEN: [LAUGHS] Yeah. This is my - this is where hopefully I'll shine.
It's the area I should know most about. But web tests are - they're an
interesting one. So I would describe them is our version of an end-to-end test
in that a web test really is just an HTML file, a JavaScript file that is when
you run it, you literally bring up - you'll remember I said that browser tests
are most of a whole browser. Web tests bring up a whole browser. It's just the
same browser as content shell or Chrome. And it runs that whole browser. And
the test does something, either in HTML or JavaScript, that then is asserted
and checked. And the reason I say that I would call them this, I have heard
people argue that they're technically unit tests, where the unit is the
JavaScript file and the entire browser is just, like, an abstraction that you
don't care about. I guess it's how you view them really. I view the browser as
something that is big and flaky, and therefore these are end-to-end tests. Some
people disagree.
25:22 SHARON: In our last episode, John touched on these tests and how the
scope that each test covers is very small. But how you run them is not. And I
guess you can pick whichever side you like more and go with that. So what are
examples of things we test with these kinds of
tests?
25:49 STEPHEN: Yeah. So the two big categories of things that we test with web
tests are basically web APIs, so JavaScript APIs, provided by the browser to do
something. There are so many of those, everything from the fetch API for
fetching stuff to the web serial API for talking to devices over serial ports.
The web is huge. But anything you can talk to via JavaScript API, we call those
JavaScript tests. It's nice and straightforward. The other thing that web tests
usually encompass are what are called rendering tests or sometimes referred to
as ref tests for reference tests. And these are checking the actual, as the
first name implies, the rendering of some HTML, some CSS by the browser. The
reason they're called reference tests is that usually the way you check
whether a rendering is correct is you set up your test, and then you compare
it to some image or some other reference rendering that you're like, OK, this
should look like that. If it does look like that, great. If it doesn't, it
fails.
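A sketch of what a reference test pair can look like in web-platform-tests
style (file names invented): the test page declares its reference with
`<link rel="match">`, and the harness compares the two renderings.

```html
<!-- green-square.html: the test. It must render identically to the ref. -->
<!DOCTYPE html>
<link rel="match" href="green-square-ref.html">
<style>div { width: 100px; height: 100px; background: green; }</style>
<div></div>

<!-- green-square-ref.html: the reference, producing the same pixels
     through different, usually simpler, markup. -->
```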
26:54 SHARON: Ah-ha. And are these the same as - so there's a few other test
names that are all kind of similar. And as someone who doesn't work in them,
they all kind of blur together. So I've also heard web platform tests. I've
heard layout tests. I've heard Blink tests, all of which do - all of which are
JavaScript HTML-like and have some level of images in them. So are these all
the same thing? And if not, what's different?
27:19 STEPHEN: Yeah. So yes and no, I guess, is my answer. So a long time ago,
there were layout tests basically. And that was something we inherited from the
WebKit project when we forked there, when we forked Chromium from WebKit all
those years ago. And they're exactly what I've described. They were both
JavaScript-based tests and they were also HTML-based tests for just doing
reference renderings. However, web platform test came up as an external project
actually. Web platform test is not a Chromium project. It is external upstream.
You can find them on GitHub. And their goal was to create a set of - a test
suite shared between all browsers so that all browsers could test - run the
same tests and we could actually tell, hey, is the web interoperable? Does it
work the same way no matter what browser you're on? The answer is, no. But
we're trying. And so inside of Chromium we said, that's great. We love this
idea. And so what we did was we actually import web platform test into our
layout tests. So web platform test now becomes a subdirectory of layout tests.
OK?
28:30 SHARON: OK. [LAUGHS]
28:30 STEPHEN: To make things more confusing, we don't just import them, but we
also export them. We run a continuous two-way sync. And this means that
Chromium developers don't have to worry about that upstream web platform test
project most of the time. They just land their code in Chromium, and a magic
process happens, and it goes up into the GitHub project. So that's where we
were for many years - layout tests, which are a whole bunch of legacy tests,
and then also web platform tests. But fairly recently - and I say that knowing
that COVID means that might be anything within the last three years because who
knows where time went - we decided to rename layout tests. And the name we
chose was web tests. So now you have web tests, of which web platform tests
are a subset. Easy.
29:20 SHARON: Cool.
29:20 STEPHEN: [LAUGHS]
29:20 SHARON: Cool. And what about Blink tests? Are those separate, or are
those these altogether?
29:27 STEPHEN: I mean, if they're talking about the JavaScript and HTML, that's
going to just be another name for the web tests. I find that term confusing
because there is also the Blink tests target, which builds the infrastructure
that is used to run web tests. So that's probably what you're referring to,
like `blink_tests`. It is the target that you build to run these tests.
29:50 SHARON: I see. So `blink_tests` is a target. These other ones, web tests
and web platform tests, are actual test suites.
29:57 STEPHEN: Correct. Yes. That's exactly right.
30:02 SHARON: OK. All right.
30:02 STEPHEN: Simple.
30:02 SHARON: Yeah. So easy. So you mentioned that the web platform tests are
cross-browser. But a lot of browsers are based on Chromium. Is it one of the
things where it's open source and stuff but majority of people contributing to
these and maintaining it are Chrome engineers?
30:23 STEPHEN: I must admit, I don't know what that stat is nowadays. Back when
I was working on interoperability, we did measure this. And it was certainly
the case that Chromium is a large project. There were a lot of tests being
contributed by Chromium developers. But we also saw historically - I would like
to recognize Mozilla, most of all, who were a huge contributor to the web
platform test project over the years and are probably the reason that it
succeeded. And we also - web platform test also has a fairly healthy community
of completely outside developers. So people that just want to come along. And
maybe they're not able to or willing to go into a browser, and actually build a
browser, and muck with code. But they could write a test for something. They
can find a broken behavior and be like, hey, there's a test here, Chrome and
Firefox do different things.
31:08 SHARON: What are examples of the interoperability things that you're
testing for in these cross-browser tests?
31:17 STEPHEN: Oh, wow, that's a big question. I mean, really everything and
anything. So on the ref test side, the rendering test, it actually does matter
that a web page renders the same in different browsers. And that is very hard
to achieve. It's hard to make two completely different engines render some HTML
and CSS exactly the same way. But it also matters. We often see bugs where you
have a lovely - you've got a lovely website. It's got this beautiful header at
the top and some content. And then on one browser, there's a two-pixel gap
here, and you can see the background, and it's not a great experience for your
users. So ref tests, for example, are used to try and track those down. And
then, on the JavaScript side, I mean really, web platform APIs are complicated.
They're very powerful. There's a reason they are in the browser and you cannot
do them in JavaScript. And that is because they are so powerful. So for
example, web USB to talk to USB devices, you can't just do that from
JavaScript. But because they're so powerful, because they're so complicated,
it's also fairly easy for two browsers to have slightly different behavior. And
again, it comes down to what is the web developer's experience. When I try and
use the web USB API, for example, am I going to have to write code that's like,
if Chrome, call it this way, if Fire - we don't want that. That is what we do
not want for the web. And so that's the goal.
32:46 SHARON: Yeah. What a team effort, making the whole web work is. All
right. That's cool. So in your time working on these web platform tests, do you
have any fun stories you'd like to share or any fun things that might be
interesting to know?
33:02 STEPHEN: Oh, wow. [LAUGHS] One thing I like to bring up - I'm afraid it's
not that fun, but I like to repeat it a lot of times because it's weird and
people get tripped up by it - is that inside of Chromium, we don't run web
platform tests using the Chrome browser. We run them using content shell. And
this is partially historical. That's how layout tests run. We always ran them
under content shell. And it's partially for I guess what I will call
feasibility. As I talked about earlier, content shell is much simpler than
Chrome. And that means that if you want to just run one test, it is faster, it
is more stable, it is more reliable I guess I would say, than trying to bring
up the behemoth that is Chrome and making sure everything goes correctly. And
this often trips people up because in the upstream world of this web platform
test project, they run the test using the proper Chrome binary. And so they're
different. And different things do happen. Sometimes it's rendering
differences. Sometimes it's because web APIs are not always implemented in both
Chrome and content shell. So yeah, fun fact.
34:19 SHARON: Oh, boy. [LAUGHTER]
34:19 STEPHEN: Oh, yeah.
34:19 SHARON: And we wonder why flakiness is a problem. Ah. [LAUGHS]
34:19 STEPHEN: Yeah. It's a really sort of fun but also scary fact that even if
we put aside web platform tests and we just look at layout tests, we don't
test what we ship. Layout tests run in content shell, and then we turn around and
we're like, here's a Chrome binary. Like uh, those are different. But, hey, we
do the best we can.
34:43 SHARON: Yeah. We're out here trying our best. So that all sounds very
cool. Let's move on to our next type of test, which is performance. You might
have heard the term telemetry thrown around. Can you tell us what telemetry is
and what these performance tests are?
34:54 STEPHEN: I mean, I can try. We've certainly gone straight from the thing
I know a lot about into the thing I know very little about. But -
35:05 SHARON: I mean, to Stephen's credit, this is a very hard episode to find
one single guest for. People who are working extensively usually in content
aren't working a ton in performance or web platform stuff. And there's no one
who is - just does testing and does every kind of testing. So we're trying our
best. [INAUDIBLE]
35:24 STEPHEN: Yeah, absolutely. You just need to find someone arrogant enough
that he's like, yeah, I'll talk about all of those. I don't need to know the
details. It's fine. But yeah, performance test, I mean, the name is self
explanatory. These are tests that are trying to ensure the performance of
Chromium. And this goes back to the four S's when we first started Chrome as a
project - speed, simplicity, security, and I've forgotten the fourth S now.
Speed, simplicity, security - OK, let's not reference the four S's then.
[LAUGHTER] You have the Comet. You tell me.
36:01 SHARON: Ah. Oh, I mean, I don't read it every day. Stability. Stability.
36:08 STEPHEN: Stability. God damn it. That's literally what the rest of this
is about. OK, where were we?
36:13 SHARON: We're leaving this in, don't worry. [LAUGHTER]
36:19 STEPHEN: Yeah. So the basic idea of performance test is to test
performance because as much as you can view behavior as a correctness thing, in
Chromium we also consider performance a correctness thing. It is not a good
thing if a change lands and performance regresses. So obviously, testing
performance is also hard to do absolutely. There's a lot of noise in any sort
of performance testing. And so, we do it essentially heuristically,
probabilistically. We run whatever the tests are, which I'll talk about in a
second. And then we look at the results and we try and say, hey, OK, is there a
statistically significant difference here? And there's actually a whole
performance sheriffing rotation to try and track these down. But in terms of,
yeah, you mentioned telemetry. That weird word. You're like, what is a
telemetry test? Well, telemetry is the name of the framework that Chromium
uses. It's part of the wider catapult project, which is all about different
performance tools. And none of the names, as far as I know, mean anything.
They're just like, hey, catapult, that's a cool name. I'm sure someone will
explain to me now the entire history behind the name catapult and why it's
absolutely vital. But anyway, so telemetry basically is a framework that when
you give it some input, which I'll talk about in a second, it launches a
browser, performs some actions on a web page, and records metrics about those
actions. So the input, the test essentially, is basically a collection of go to
this web page, do these actions, record these metrics. And I believe in
telemetry that's called a story, the story of someone visiting a page, I guess,
is the idea. One important thing to know is that because it's sort of insane
to actually visit real websites - they keep doing things like changing - we
actually cache the websites. We download a version of the websites once and
actually check that in. And when you go run a telemetry test, it's not running
against literally the real Reddit.com or something. It's running against a
version we saved at some point.
38:31 SHARON: And how often - so I haven't really heard of anyone who actually
works on this, but then, you don't interact with everyone. As new web features
get added and things in the browser change, how often are these tests
specifically getting updated to reflect that?
38:44 STEPHEN: I would have to plead some ignorance there. It's certainly also
been my experience as a browser engineer who has worked on many web APIs that
I've never written a telemetry test myself. I've never seen one added. My
understanding is that they are - a lot of the use cases are fairly general with
the hope that if you land some performance problematic feature, it will regress
on some general test. And then we can be like, oh, you've regressed. Let's
figure out why. Let's dig in and debug. But it certainly might be the case if
you are working on some feature and you think that it might have performance
implications that aren't captured by those tests, there is an entire team that
works on the speed of Chromium. I cannot remember their email address right
now. But hopefully we will get that and put that somewhere below. But you can
certainly reach out to them and be like, hey, I think we should test the
performance of this. How do I go about and do that?
39:41 SHARON: Yeah. That sounds useful. I've definitely gotten bugs filed
against me for performance stuff. [LAUGHS] Cool. So that makes sense. Sounds
like good stuff. And in talking to some people in preparation for this episode,
I had a few people mention Android testing specifically. Not any of the other
platforms, just Android. So do you want to tell us why that might be? What are
they doing over there that warrants additional mention?
40:15 STEPHEN: Yeah. I mean, I think probably the answer would just be that
Android is such a huge part of our code base. Chrome is a browser, a
multi-platform browser, runs on multiple desktop platforms, but it also runs on
Android. And it runs on iOS. And so I assume that iOS has its own testing
framework. I must admit, I don't know much about that at all. But certainly on
Android, we have a significant amount of testing framework built up around it.
And so there's the option, the ability for you to test your Java code as well
as your C++ code.
40:44 SHARON: That makes sense. And yeah, with iOS, because they don't use
Blink, I guess there's - that reduces the amount of tests that they might need
to add, whereas on Android they're still using Blink. But there's a lot of
differences because it is mobile, so they're just, OK, we actually can test
those things. So let's go more general now. At almost every stage, you've
mentioned flakiness. So let's briefly run down, what is flakiness in a test?
41:14 STEPHEN: Yes. So flakiness for a test is just - the definition is just
that the test does not consistently produce the same output. When you're
talking about flakiness, you actually don't care what the output is. A test
that always fails, that's fine. It always fails. But a test that passes 90% of
the time and fails 10%, that's not good. That test is not consistent. And it
will cause problems.
41:46 SHARON: What are common causes of this?
41:46 STEPHEN: I mean, part of the cause is, as I've said, we write a lot of
integration tests in Chromium. Whether those are browser tests, or whether
those are web tests, we write these massive tests that span huge stacks. And
what comes implicitly with that is timing. Timing is almost always the
problem - timing and asynchronicity. Whether that is in the same thread or
multiple threads, you write your test, you run it on your developer machine,
and it works. And you're like, cool, my test works. But what you don't realize
is that you're assuming that in some part of the browser, this function ran,
then this function ran. And that always happens on your developer machine
because you have this CPU, and this much memory, and et cetera, et cetera. Then
you commit your code, you land your code, and somewhere a bot runs. And that
bot is slower than your machine. And on that bot, those two functions run in
the opposite order, and something goes horribly wrong.
42:50 SHARON: What can the typical Chrome engineer writing these tests do in
the face of this? What are some practices that you generally should avoid or
generally should try to do more often that will keep this from happening in
your test?
43:02 STEPHEN: Yeah. So first of all, write more unit tests, write fewer
browser tests, please. Unit tests are - as I've talked about, they're small. They're
compact. They focus just on the class that you're testing. And too often, in my
opinion - again, I'm sure we'll get some nice emails stating I'm wrong - but
too often, in my opinion, people go straight to a browser test. And they bring
up a whole browser just to test functionality in their class. This sometimes
requires writing your class differently so that it can be tested by a unit
test. That's worth doing. Beyond that, though, when you are writing a browser
test or a web test, something that is more integration, more end to end, be
aware of where timing might be creeping in. So to give an example, in a browser
test, you often do things like start by loading some web contents. And then you
will try and poke at those web contents. Well, so one thing that people often
don't realize is that loading web contents, that's not a synchronous process.
Actually knowing when a page is finished loading is slightly difficult. It's
quite interesting. And so there are helper functions to try and let you wait
for this to happen, sort of event waiters. And you should - unfortunately, the
first part is you have to be aware of this, which is just hard to be. But the
second part is, once you are aware of where these can creep in, make sure
you're waiting for the right events. And make sure that once those events have
happened, you are in a state where the next call makes sense.
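For instance, `content::TestNavigationObserver` (a real helper in
`content/public/test`) wraps that waiting for you. A fragment, assuming
`web_contents` and `url` are already in hand:

```cpp
#include "content/public/test/test_navigation_observer.h"

// Wait for the navigation to finish instead of assuming it already has.
content::TestNavigationObserver observer(web_contents);
web_contents->GetController().LoadURL(url, content::Referrer(),
                                      ui::PAGE_TRANSITION_TYPED,
                                      std::string());
observer.Wait();  // Spins a RunLoop until the navigation completes.
```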
44:28 SHARON: That makes sense. You mentioned rewriting your classes so they're
more easily testable by a unit test. So what are common things you can do in
terms of how you write or structure your classes that make them more testable?
And just that seems like a general good software engineering practice to do.
44:50 STEPHEN: Yeah, absolutely. So one of the biggest ones I think we see in
Chromium is to not use singleton accessors to get at state. And what I mean by
that is, you'll see a lot of code in Chromium that just goes ahead and threw
some mechanism that says, hey, get the current web contents. And as you, I
think, you've talked about on this program before, web contents is this massive
class with all these methods. And so if you just go ahead and get the current
web contents and then go do stuff on that web contents, whatever, when it comes
to running a test, well, it's like, hold on. That's trying to fetch a real web
contents. But we're writing a unit test. What does that even look like? And so
the way around this is to do what we call dependency injection. And I'm sure as
I've said that word, a bunch of listeners or viewers have just recoiled in
fear. But we don't lean heavily into dependency injection in Chromium. But it
is useful for things like this. Instead of saying, go get the web contents,
pass a web contents into your class. Make a web contents available as an input.
And that means when you create the test, you can use a fake or a mock web
contents. We can talk about difference between fakes and mocks as well. And
then, instead of having it go do real things in real code, you can just be
like, no, no, no. I'm testing my class. When you call it web contents do a
thing, just return this value. I don't care about web contents. Someone else is
going to test that.
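A sketch of that injection, with an invented class:

```cpp
class PageTitleChecker {
 public:
  // The WebContents is an input rather than something fetched through a
  // global accessor, so a test can hand in a test double.
  explicit PageTitleChecker(content::WebContents* web_contents)
      : web_contents_(web_contents) {}

  bool TitleIsEmpty() const { return web_contents_->GetTitle().empty(); }

 private:
  content::WebContents* web_contents_;
};
```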
46:19 SHARON: Something else I've either seen or been told in code review is to
add delegates and whatnot.
46:25 STEPHEN: Mm-hmm.
46:25 SHARON: Is that a good general strategy for making things more testable?
46:25 STEPHEN: Yeah. It's similar to the idea of doing dependency injection by
passing in your web contents. Instead of passing in your web contents, pass in
a class that can provide things. And it's sort of a balance. It's a way to
balance, if you have a lot of dependencies, do you really want to add 25
different inputs to your class? Probably not. But you define a delegate
interface, and then you can mock out that delegate. You pass in that one
delegate, and then when delegate dot get web content is called, you can mock
that out. So very much the same goal, another way to do it.
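Sketched out with the same invented class, reworked to take a delegate:

```cpp
// One small interface in place of many constructor parameters.
class PageTitleCheckerDelegate {
 public:
  virtual ~PageTitleCheckerDelegate() = default;
  virtual content::WebContents* GetWebContents() = 0;
};

class PageTitleChecker {
 public:
  // Production passes a real delegate; tests pass a fake or mock whose
  // GetWebContents() returns whatever the test wants.
  explicit PageTitleChecker(PageTitleCheckerDelegate* delegate)
      : delegate_(delegate) {}

 private:
  PageTitleCheckerDelegate* delegate_;
};
```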
47:04 SHARON: That sounds good. Yeah, I think in general, in terms of Chrome
specifically, a lot of these testing best practices, making things testable,
these aren't Chrome-specific. These are general software engineering-specific,
C++-specific, and those you can look more into separately. Here we're mostly
talking about what are the Chrome things. Right?
47:24 STEPHEN: Yeah.
47:24 SHARON: Things that you can't just find as easily on Stack Overflow and
such. So you mentioned fakes and mocks just now. Do you want to tell us a bit
about the difference there?
47:32 STEPHEN: I certainly can do it. Though I want to caveat that you can also
just go look up those on Stack Overflow. But yeah. So just to go briefly into
it, there is - in testing you'll often see the concept of a fake version of a
class and also a mock version of a class. And the difference is just that a
fake version of the class is, what I'm going to call, a real class that you
write in C++. And you will probably write some code to be like, hey, when it
calls this function, maybe you keep some state internally. But you're not using
the real web contents, for example. You're using a fake. A mock is actually a
thing out of the Google test support library. It's part of a - Google mock is
the name of the sub-library, I guess, the sub-framework that provides this. And
it is basically a bunch of magic that makes that fake stuff happen
automatically. So you can basically say, hey, instead of a web contents, just
mock that web contents out. And the nice part about mock is, you don't have to
define behavior for any method you don't care about. So if there are, as we've
discussed, 100 methods inside web contents, you don't have to implement them
all. You can be like, OK, I only care about the do Foobar method. When that is
called, do this.
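A sketch of both flavors for an invented `Backend` interface - the fake is
hand-written, the mock comes from gMock:

```cpp
#include <map>
#include <string>

#include "testing/gmock/include/gmock/gmock.h"
#include "testing/gtest/include/gtest/gtest.h"

class Backend {
 public:
  virtual ~Backend() = default;
  virtual int Fetch(const std::string& key) = 0;
};

// Fake: a real implementation with just enough behavior for tests.
class FakeBackend : public Backend {
 public:
  int Fetch(const std::string& key) override { return values_[key]; }
  std::map<std::string, int> values_;
};

// Mock: gMock generates the machinery; stub only what the test cares about.
class MockBackend : public Backend {
 public:
  MOCK_METHOD(int, Fetch, (const std::string& key), (override));
};

TEST(BackendTest, MockReturnsCannedValue) {
  MockBackend backend;
  EXPECT_CALL(backend, Fetch("answer")).WillOnce(testing::Return(42));
  EXPECT_EQ(backend.Fetch("answer"), 42);
}
```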
48:51 SHARON: Makes sense. One last type of test, which we don't hear about
that often in Chrome but does exist quite a bit in other areas, is manual
testing. So do we actually have manual testing in Chrome? And if so, how does
that work?
49:03 STEPHEN: Yeah, we actually do. We're slightly crossing the boundary here
from the open Chromium into the product that is Google Chrome. But we do have
manual tests. And they are useful. They are a thing. Most often, you will see
this in two cases as a Chrome engineer. You basically work with the test team.
As I said, it's all a little bit internal now. But you work with the test team to
define a set of test cases for your feature. And these are almost always
end-to-end tests. So go to this website, click on this button, you should see
this flow, this should happen, et cetera. And sometimes we run these just as
part of the launch process. So when you're first launching a new feature, you
can be like, hey, I would love for some people to basically go through this and
smoke test it, make sure that everything is correct. Some things we test every
release. They're so important that we need to have them tested. We need to be
sure they work. But obviously, all of the caveats about manual testing out
there in the real world, they apply equally to Chromium or to Chrome. Manual
testing is slow. It's expensive. We require people - specialized people that we
have to pay and who have to sit there, and click on things, and that sort
of thing, and file bugs when it doesn't work. So wherever possible, please do
not write manual tests. Please write automated testing. Test your code, please.
But then, yeah, it can be used.
50:33 SHARON: In my limited experience working on Chrome, the only place that
I've seen there actually be any level of dependency on manual test has been in
accessibility stuff -
50:38 STEPHEN: Yeah.
50:38 SHARON: which kind of makes sense. A lot of that stuff is not
necessarily - it is stuff that you would want to have a person check because,
sure, we can think that the speaker is saying this, but we should make sure
that that's the case.
50:57 STEPHEN: Exactly. I mean, that's really where manual test shines, where
we can't integration test accessibility because you can't test the screen
reader device or the speaker device. Whatever you're using, we can't test that
part. So yes, you have to then have a manual test team that checks that things
are actually working.
51:19 SHARON: That's about all of our written down points to cover. Do you have
any general thoughts, things that you think people should know about tests,
things that people maybe ask you about tests quite frequently, anything else
you'd like to share with our lovely listeners?
51:30 STEPHEN: I mean, I think I've covered most of them. Please write tests.
Write tests not just for code you're adding but for code you're modifying, for
code that you wander into a directory and you say, how could this possibly
work? Go write a test for it. Figure out how it could work or how it couldn't
work. Writing tests is good.
51:50 SHARON: All right. And we like to shout out a Slack channel of interest.
Which one would be the - which one or ones would be a good Slack channel to
post in if you have questions or want to get more into testing?
52:03 STEPHEN: Yeah. It's a great question. I mean, I always like to - I think
it's been called out before, but the hashtag #halp channel is very useful for
getting help in general. There is a hashtag #wpt channel. If you want to go ask
about web platform tests, that's there. There's probably a hashtag #testing.
But I'm going to admit, I'm not in it, so I don't know.
52:27 SHARON: Somewhat related is there's a hashtag #debugging channel.
52:27 STEPHEN: Oh.
52:27 SHARON: So if you want to learn about how to actually do debugging and
not just do log print debugging.
52:34 STEPHEN: Oh, I was about to say, do you mean by printf'ing everywhere in
your code?
52:41 SHARON: [LAUGHS] So there are a certain few people who like to do things
in an actual debugger or enjoy doing that. And for a test, that can be a useful
thing too - a tool to have. So that also might be something of interest. All
right, yeah. And kind of generally, as you mentioned, a lot of things are your
opinion. And it seems like we currently don't have a style guide for tests or
best practices kind of thing. So how can we -
53:13 STEPHEN: [LAUGHS] How can we get there? How do we achieve that?
53:19 SHARON: How do we get one?
53:19 STEPHEN: Yeah.
53:19 SHARON: How do we make that happen?
53:19 STEPHEN: It's a hard question. We do - there is documentation for
testing, but it's everywhere. I think there's `/docs/testing`, which has some
general information. But so often, there's just random READMEs around the code
base that are like, oh, hey, here's the content public test API surface. Here's
a bunch of useful information you might want to know. I hope you knew to look
in this location. Yeah, it's a good question. Should we have some sort of
process for - like you said, like a style guide but for testing? Yeah, I don't
know. Maybe we should enforce that people dependency inject their code.
54:04 SHARON: Yeah. Well, if any aspiring test nerds want to really get into
it, let me know. I have people who are also interested in this and maybe can
give you some tips to get started. But yeah, this is a hard problem and
especially with so many types of tests everywhere. I mean, even just getting
one for each type of test would be useful, let alone all of them together. So
anyway - well, that takes us to the end of our testing episode. Thank you very
much for being here, Stephen. I think this was very useful. I learned some
stuff. So that's cool. So hopefully other people did too. And, yeah, thanks for
sitting and answering all these questions.
54:45 STEPHEN: Yeah, absolutely. I mean, I learned some things too. And
hopefully we don't have too many angry emails in our inbox now.
54:52 SHARON: Well, there is no email list, so people can't email in if they
have issues. [LAUGHTER]
54:58 STEPHEN: If you have opinions, keep them to yourself -
54:58 SHARON: Yeah. [INAUDIBLE]
54:58 STEPHEN: until Sharon invites you on her show.
55:05 SHARON: Yeah, exactly. Yeah. Get on the show, and then you can air your
grievances at that point. [LAUGHS] All right. Thank you.

@ -0,0 +1,923 @@
# Whats Up With BUILD.gn
This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 5, a 2023 video discussion between [Sharon (yangsharon@chromium.org)
and Nico (thakis@chromium.org)](https://www.youtube.com/watch?v=NcvJG3MqquQ).
The transcript was automatically generated by speech-to-text software. It may
contain minor errors.
---
Building Chrome is an integral part of being a Chrome engineer. What actually
happens when you build Chrome, and what exactly happens when you run those
build commands? Today, we have Nico, who was responsible for making Ninja the
Chrome default build system, to tell us more.
Notes:
- https://docs.google.com/document/d/1iDFqA3cZAUo0TUFA69cu5wEKL4HjSoIGfcoLIrH3v4M/edit
---
00:00 SHARON: Hello, and welcome to "What's Up With That," the series that
demystifies all things Chrome. I'm your host, Sharon, and today, we're talking
about building Chrome. How do you go from a bunch of files on your computer to
running a browser? What are all the steps involved? Our special guest today is
Nico. He's responsible for making Ninja the Chrome default build system, and
he's worked on Clang and all sorts of areas of the Chrome build. If you don't
know what some of those things are, don't worry. We'll get into it. Welcome,
Nico.
00:29 NICO: Hello, Sharon, and hello, internet.
00:29 SHARON: Hello. We have lots to cover, so let's get right into it. If I
want to build Chrome at a really quick overview, what are all the steps that I
need to do?
00:41 NICO: It's very easy. First, you download `depot_tools` and add that to
your path. Then you run `fetch chromium`. Then you type `cd src`, run `gclient
sync`, `gn gen out/GN`, and `ninja -C out/GN chrome`. And that's it.
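Spelled out as commands (the build directory name is up to you):

```sh
fetch chromium              # from depot_tools; downloads the source
cd src
gclient sync                # pull the dependencies listed in DEPS, run hooks
gn gen out/GN               # generate ninja files into the build directory
ninja -C out/GN chrome      # build the "chrome" target
```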
00:53 SHARON: Wow. Sounds so easy. All right. We can wrap that up. See you guys
next time. OK. All right. Let's take it from the start, then, and go over in
more detail what some of those things are. So the first thing you mentioned is
`depot_tools`. What is that?
01:11 NICO: `depot_tools` is just a collection of random utilities for - like,
back in the day, for managing subversion repositories, nowadays for pulling
things from git. It contains Ninja and GN. Just adds a bunch of stuff to your
path that you need for working on Chrome.
01:25 SHARON: OK. Is this a Chrome-specific thing, or is this used elsewhere,
too?
01:33 NICO: In theory, it's fairly flexible. In practice, I think it's mostly
used by Chromium projects.
01:39 SHARON: OK, all right. And there, you mentioned Ninja and GN. And for
people - I think most people who are watching this have built Chrome at some
point. But what is the difference between Ninja and GN? Because you have your
build files, which are generally called Build.gn, and then you run a command
that has Ninja in it. So are those the same thing? Are those related?
01:57 NICO: Yes. So GN is short for Generate Ninja. So Ninja is a build system.
It's similar to Make. It basically gets a list of source files and a list of
build outputs. And then when you run Ninja, Ninja figures out which build steps
do I have to run, and then it runs them. So it's kind of like Make but simpler
and faster. And then GN - Ninja doesn't have any conditionals or anything, so
GN describes the build. And then it generates Ninja files.
02:34 SHARON: OK.
02:34 NICO: So if you want to do, like, add these files only if you're building
for Windows, this is something you can do, say, in GN. But then it only
generates a Windows-specific Ninja file.
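For example, a `BUILD.gn` snippet like this (target and file names invented)
only adds the extra source when targeting Windows:

```gn
source_set("my_feature") {
  sources = [ "my_feature.cc" ]
  if (is_win) {
    sources += [ "my_feature_win.cc" ]
  }
}
```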
02:46 SHARON: All right. And in terms of when you mention OS, so there's a
couple places that you can specify different arguments for how you build
Chrome. So you have your gclient sync - sorry, your gclient file, and then you
have a separate args.gn. And in both of these places, you can specify different
arguments. And for example, the operating system you use - that can be
specified in both places. There's an OS option in both. So what is the purpose
of the gclient file, and what is the purpose of the args.gn file?
03:25 NICO: Yes. So gclient reads the DEPS file that is at the root of the
directory, and the DEPS file basically specifies dependencies that Chrome pulls
in. It's kind of similar to git submodules, but it predates git, so we don't
use git submodules also for other reasons. And so if you run gclient sync, that
reads the DEPS file at the Chrome root, and that downloads a couple hundred
repositories that Chrome depends on. And then it executes a bunch of so-called
hooks, which are just Python scripts, which also download a bunch of more
stuff. And the hooks and the dependencies are operating system dependent, so
gclient needs to know the operating system. But the build also needs to know
the operating system. And GN args are basic things that are needed for the
builds. So the OS is something that's needed in both places, but many GN args
gclient doesn't need to know about. For example, if you enable DCHECKs, like
Peter discussed a few episodes ago, that's a GN-only thing.
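A sketch of an `args.gn` (it lives in the build directory, e.g.
`out/GN/args.gn`):

```gn
target_os = "android"    # the OS knob; gclient has its own copy for DEPS/hooks
is_debug = false
dcheck_always_on = true  # the DCHECK flag from the episode mentioned above
use_goma = true          # send compiles to the remote build service
```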
04:26 SHARON: All right. That sounds good. So let's see. When you actually -
OK. So when you run Chrome and you - say you build Chrome, right? A typical
example of a command to do that would be, say, `autoninja -C out/default
content`, right? And let's just go through each part of that and say what each
of those things is doing and what happens there. Because I think that's just an
example from one of the starter docs. That's just the copy and paste command
that they give you. So autoninja seems like it's based on Ninja. What is the
auto they're doing for us?
05:15 NICO: Yeah. So autoninja is also one of the things that's just
`depot_tools`. It's a very - or it used to be a very thin wraparound Ninja. Now
it's maybe a little thicker, but it's optional. You don't have to use autoninja
if you don't want to. But what it does is basically - like, it helps - So
Chrome contains a lot of code. So we have this system called Goma, which can
run all the C++ compilations in a remote data center. And if you do use the
system, then you want to build with a very high build parallelism. You want to,
say, `-j 1000` or what and run, like, a thousand build processes in parallel. But
if you're building locally, you don't want to do that. So what autoninja
basically does - it looks at your args.gn file, sees if you have enabled Goma,
and if so, it runs Ninja with many processes, and else, it runs it with just
one process per core, or something like that. So that's originally all that
autoninja does. Nowadays, I think it also uploads a bunch of stuff. But you can
just run `which autoninja`, and that prints some path, and you can just open that
in the editor and read it. I think it's still short enough to fairly quickly
figure out what it does.
06:17 SHARON: OK. What does `-C` do? Because I think I've been using that this
whole time because I copied and pasted it from somewhere, and I've just always
had it.
06:28 NICO: It says - it changes the current directory where Ninja runs, like
in Make. So it basically says, change the current directory to out/GN, or
whatever your build directory is, and then run the build from there. So for
Chrome, the build always - the current directory during the build is always the
build directory. And then Ninja looks for a file called build.ninja in the
current directory, so GN writes build.ninja to out/GN, or whatever your build
directory is. And then Ninja finds it there and reads it and does its thing.
06:57 SHARON: All right. So the next part of this would be out/default, or out
slash something else. So what are out directories, and how do we make use of
them?
07:11 NICO: An out directory - it's just a build directory. That's where all
the build artifacts go to, all the generated object files, executables, random
things that are generated during the build. So it can be any directory, really.
You can make up any directory name that you like. You can build your Chrome in,
I don't know, fluffy/kitten, or whatever. But I think most people use out just
because it's in the global `.gitignore` file already. Then you want to use
something that's two directories deep so that the path from the directory to
the source is always `../..`. And that makes sure that this is deterministic.
We try to have a so-called deterministic build, where you get exactly the same
binary when you build Chrome at the same revision, independent of the host
machine, more or less. And the path from the build directory to the source file
is something that goes into debug info. So if you want to have the same build
output as everyone else, you want a build directory path that's two directories
deep. And the names of those two directories don't really matter. So what
some people do is they use out/debug for the debug builds and out/release for
their release builds. But it's really up to you.
08:26 SHARON: Right. Other common ones are, like, yeah. ASan is a common one,
different -
08:33 NICO: Right.
08:33 SHARON: OSes. Right. So you mentioned having a deterministic build. And
assuming you're on the same version of Chrome, at the same checkout,
tip-of-tree, or whatever as someone else, I would have expected that all of the
builds are just deterministic, but maybe that's because of work that people
like you and the build team have done. But what are things that could cause
that to be nondeterministic? Because you have all the same files. Where is the
actual nondeterminism coming from? Or is it just different configurations and
setups you have on your local machine?
09:09 NICO: Yeah, that's a great question. I always thought this would be very
easy to - but turns out it mostly isn't. We wrote a very long blog post that we
can link to from the show notes about this. But there's many things that can
go wrong. Like for example, in C++, there's the preprocessor macro `__DATE__`,
which embeds the current date into the build output. So if you do that, then
you're time dependent already. By default, I think you end up with absolute
paths to everything in debug information. So if you build under
`/home/sharon/blah`, then that's already different from all the people who are
not called Sharon. Then there's - we run tools as part of the build that
produce output. For example, the protobuf compiler or whatnot. And so if that
binary iterates over some map, some hash map, and that doesn't have
deterministic iteration order, then the output might be different. And there's
a long, long, long, long, long list of things. Making the build deterministic
was a very big project, and there's still a few open things.
10:08 SHARON: OK, cool. So I guess it's - yeah, it's not true nondeterminism,
maybe, but there's enough factors that go into it that to a typical person
interacting with it, it does seem -
10:21 NICO: Yeah, but there's also true nondeterminism. Like, every now and
then, when we update the compiler, the compiler will write different object
files on every run just because the compiler internally iterates about some -
over some hash map. And then we have to complain upstream, and then they fix
it.
10:34 SHARON: OK. Oh, wow. OK. That's very cool. Well, thank you for dealing
with this kind of stuff so people like us don't have to worry about it. OK. And
the last part of our typical build thing is content. So what is content in this
context? If you want to learn about content more in general, check out
episode 3. But in this case, what does that mean?
10:58 NICO: So just a build target. So I think people - at least I usually
build some executable. I usually build, I don't know, `base_unittests` or
`unit_tests` or Chrome or content shell or what. And it's just - so in the
Ninja files, there's basically - there's many, many lines that go, if you want
to build this file, you need to have these inputs and then run this command. If
you want to build this file, instead, you need these other files. You need to
run this other command. So for example, if you want to build `base_unittests`,
you need a couple thousand object files, and then you need to run the linker
on what's in there. And so if you tell Ninja - the last thing you give it -
basically, it tells Ninja, what do you want to build? So if you say, `ninja -C
out/GN content_shell` or what, then Ninja is like, let's look at the line that
says `content_shell`. And then it checks - I need these files, so it builds all
the prerequisites, which usually means compiling a whole bunch of files. And
then it runs the final command and runs the linker. So Ninja basically decides
what it needs to do and then invokes other commands to do the actual work.
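Heavily simplified, the generated build.ninja entries look roughly like this;
Ninja walks the graph backwards from whatever target you name on the command
line:

```ninja
build obj/foo/foo.o: cxx ../../foo/foo.cc
build obj/foo/bar.o: cxx ../../foo/bar.cc
build chrome: link obj/foo/foo.o obj/foo/bar.o
```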
12:08 SHARON: OK, makes sense. So say I run the build - so say I built the
target Chrome, which is the one that actually is an executable - if you run
that, you get the browser. So say I've built the Chrome build target. How do I
run that now?
12:31 NICO: Well, it's written - so normally, the thing you give to Ninja is
actually a file name. And the `-C` changes the current directory. So if you say, `-C
out/release chrome`, then this creates the file `out/release/chrome`. It just
creates that file in the out directory. So to run that, you just run
`out/release/chrome`, and hopefully it'll start up and work.
12:54 SHARON: Great. Sounds so easy. So you mentioned earlier something called
Goma, which had remote data centers and stuff. Is this something that's
available to people who don't work at Google, or is this one of the
Google-specific things? Because I think so far, everything mentioned is anyone,
anywhere can do all this. Is that the case with Goma, also?
13:14 NICO: Yeah. For the other things - so Ninja is actually something that
started in Chrome land, but that's been fairly widely adopted across the world.
Like, that's used by many projects. But yeah, Goma - I think it's kind of like
distcc. Like, it's a distributed compiler thing. I think the source code for
both the client and the server are open source. And we can link to that. But
the access to the service, I think, isn't public. So they have to work at
Google or at a partner company. I think we hand out access to a few partners.
And as far as I know, there's a few independent implementations of the
protocol, so other people also use something like Goma. But as far as I know,
these other services also aren't public.
13:53 SHARON: OK. Right. Yeah, because I think one of the main things is - I
mean, as someone who did an internship on Chrome, after, I was like, I'll
finish some of these remaining to do items once I go back to school, right? And
then I started to build Chrome on my laptop, just a decent laptop, but still a
laptop, and I was like, no, I guess I won't be doing that.
14:17 NICO: No, it's doable. You just need to be patient and strategic. Like, I
used to do that every now and then. You have to start the build at night, and
then when you get up, it's done. And if you only change one or two CC files,
it's reasonably fast. It's just, full builds take a very long time.
14:29 SHARON: Yeah, well, yeah. There was enough stuff going on that I was
like, OK. We maybe won't do this. Right. Going back to another thing you
mentioned is the compiler and Clang. So can you tell us a bit more about Clang
and how compiling fits into the build process?
14:50 NICO: Yeah, sure. I mean, compiling just means - almost all of Chrome
currently is written in C++, and compiling just means taking a CC file, like a
C++ file, and turning it into - turning that into an object file. And there are
a whole bunch of C++ compilers. And back in the day, we used to use many, many
different C++ compilers, and they're all slightly different, so that was a
little bit painful. And then the C++ language started changing more frequently,
like with C++ 11, 14, 17, 20, and so on. And so that was a huge drain on
productivity. Updating compilers was always a year-long project, and we had to
update, like, seven different compilers, one each on Android, iOS, Windows,
macOS, Fuchsia, whatnot. So over time, we moved to - we moved from using
basically the system compiler to using a hermetically built Clang that we
download as a gclient DEPS hook. So when you run gclient sync, that downloads a
prebuilt Clang binary. And we use that Clang binary to build Chrome on all
operating systems. So if one file builds for you on your OS, then chances are
it'll build on all the other OSes because it's built by the same compiler. And
that also enables things like cross builds, so you can build Chrome for Windows
on Linux if you want to because your compiler is right there.
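As a rough illustration of the cross-build idea mentioned here, an `args.gn`
along these lines asks for a Windows build from a Linux host. Treat the exact
values as an example rather than a recommended configuration:

```gn
# Example args.gn for building Chrome for Windows on a Linux machine.
target_os = "win"
target_cpu = "x64"
```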
16:11 SHARON: Oh, cool. All right. I didn't know that. Is there any reason,
historically, that Clang beat out these other compilers as the compiler of
choice?
16:24 NICO: Yes. So it's basically - I think when we looked at this - so Clang
is basically the native compiler on macOS and iOS, and GCC is kind of the
system compiler on Linux, I suppose. But Clang has always had very good GCC
compatibility. And then on Windows, the default choice is Visual Studio. And we
still want to link against the normal Microsoft library, so we need a compiler
that's ABI-compatible with the Microsoft ABI. And GCC couldn't do that. And
Clang also couldn't do that, but we thought if we teach Clang to do that, then
Clang basically can target all the operating systems we care about. And so we
made Clang work on Windows, also with others. But there was a team funded by
Chrome that worked on that for a few years. And also, Clang has a pretty good
tooling interface. So for code search, we also use Clang. So we now use the
same code to compile Chrome and to index Chrome for code search.
17:28 SHARON: Oh, cool. I didn't know that either, so very interesting. OK.
We're just going to keep going back. And as you mention more things, we'll
cover that, and then go back to something you previously mentioned. So next on
the list is gclient sync. So I think for everyone who's ever worked on Chrome,
ever, especially at the start, you're like, I'll build Chrome. You build your
target, and you get these weird errors. And you look at it, and you think, oh,
this isn't some random weird spot that I definitely didn't change. What's going
on? And you ask a senior team member, and they say to you, did you run gclient
sync? And you're like, oh, I did not. And then you run it, and suddenly, things
are OK. So what else is going - you mentioned a couple of things that happen.
So what exactly does gclient sync do?
18:13 NICO: Yeah. So as I said - there's this file at the source root called DEPS,
D-E-P-S, all capital letters. And when you update - if you git pull the Chrome
repository, then that also updates the DEPS file. And then this DEPS file
contains a long list of revisions of dependent projects. And then when you run
gclient sync, it basically syncs all these other git repositories that are
mentioned in the DEPS file. And after that, it runs so-called hooks, which do
things like download a new Clang compiler and download a bunch of other
binaries from something called CIPD - for example, GN. But yeah, it basically
makes sure
that all the dependencies that are in Chrome but that aren't in the Chrome
repository are also up to date. That's what it does.
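To make that concrete, DEPS entries and hooks look roughly like this. This is
an illustrative sketch - the `libfoo` repository and revision are made up, and
the Clang hook is only a plausible example of what a hook entry looks like:

```python
# Illustrative fragment of a DEPS file: pin a dependent repo to a revision.
deps = {
  'src/third_party/libfoo/src':
      'https://chromium.googlesource.com/external/libfoo.git@' +
      'deadbeefdeadbeefdeadbeefdeadbeefdeadbeef',
}

# Hooks run after syncing, e.g. to download prebuilt binaries.
hooks = [
  {
    'name': 'clang',
    'pattern': '.',
    'action': ['python3', 'src/tools/clang/scripts/update.py'],
  },
]
```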
19:06 SHARON: OK. Do you have a rough ballpark guess of how many dependencies
that includes?
19:13 NICO: Its operating system dependent. I think on Android we have way
more, but it's on the order of 200. Like, 150 to 250.
19:25 SHARON: Sounds like a lot. Very cool. OK. In terms of - speaking of other
dependencies, one of the top-level directories in Chrome is `//third_party`,
and that seems in the same kind of direction. So how does stuff in
`//third_party` work in terms of building? Can you just build them as targets?
What kind of stuff is in there? What can you and can you not build? Like, for
example, Blink is one of the things in `//third_party`, and lots of people -
that's a big part of it, right? But a lot of things in there are less active
and probably less big of a part of Chrome. So does `//third_party` just build
anything else, or what's different about it?
20:09 NICO: And that's a great question. So Blink being in `//third_party` is a
bit of a historical artifact. Like, most things - almost all of the things in
`//third_party` is basically third-party code. That's code that we
didn't write ourselves. And Chrome's secret mission is to depend on every other
library out there in the world. No, we depend on things like libpng for reading
PNG files, libjpeg for reading all of - libjpeg-turbo these days, I guess, for
reading JPEG files, libxml for reading XML, and so on. And, well, that's many
dependencies. I won't list them all. And some of these third-party dependencies
are just listed in the DEPS file that we talked about. And so they basically -
like, when gclient sync runs, it pulls the code from some git repository that
contains the third-party code and puts it into your source tree. And for other
third-party code, we actually check in the code into the Chrome main repository
instead of DEPSing it in. There are trade-offs to which approach to choose. We do
both from time to time. But yeah. Almost no third-party dependency has a GN
file upstream, so usually what you do is you have to write your own BUILD.gn
file for the third-party dependency you want to add. And then after that, it's
fairly normal. So for a library, if you want to add a dependency on libfoo,
usually what we do is you add - you create third-party libfoo, and you put
BUILD.gn in there. And then you add a DEPS entry that syncs the actual code to
a third-party libfoo source or something. Yes.
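A minimal sketch of what such a hand-written BUILD.gn might contain, using the
hypothetical `libfoo` from the discussion - the file names and include
directory are placeholders:

```gn
# Hypothetical third_party/libfoo/BUILD.gn wrapping DEPSed-in source.
static_library("libfoo") {
  sources = [
    "src/foo.c",
    "src/include/foo.h",
  ]
  include_dirs = [ "src/include" ]
}
```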
21:37 SHARON: All right. Sounds good. Again, you mentioned BUILD.gn files, and
that's, as expected, a big part of how building works. And that's probably the
part that most people have interacted more with, outside of just actually
running whatever command it is to build Chrome. Because if you create, delete,
rename any files, you have to update it in some BUILD.gn file. So can you walk
us through the different things contained in a BUILD.gn file? What are all the
different parts?
22:12 NICO: Sure. So there's a great presentation by Brett, who wrote GN, that
we can link to. But in a summary, it's - BUILD.gn contains build targets, and
the build target normally is like - it doesn't have to be, but usually, it's a
list of CC files that belong together and that either make up a static library
or a shared library or an executable. So those are the main target types for CC
code. But then you can also have custom build actions that run just arbitrary
Python code, which, for example, if you compile a protobuf - proto files into
CC and H - into C++ and header files, then we basically have a Python script
that runs protoc, the proto compiler, to produce those. And so in that case,
the action generates C++ files, and then those get built. But the other, simple
answer is libraries or executables.
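As a rough sketch of the target types just listed - all names here are
hypothetical, and a real action would usually also pass `args` telling the
script where to write its output:

```gn
# An executable made of C++ files, depending on a static library.
executable("demo") {
  sources = [ "main.cc" ]
  deps = [ ":demo_lib" ]
}

static_library("demo_lib") {
  sources = [
    "demo.cc",
    "demo.h",
  ]
}

# A custom action: run a Python script that generates a header.
action("generate_version") {
  script = "make_version.py"
  outputs = [ "$target_gen_dir/version.h" ]
}
```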
23:11 SHARON: OK. One part of GN files that has caused me personally some
confusion and difficulty - which I think is maybe, depending on the part of
Chrome you work on, less of an issue - is DEPS. So you have DEPS in your GN
files, and there's also something called external DEPS. And then you have
separate DEPS files that are just called capital D-E-P-S.
23:30 NICO: Yes. Yes, that's somewhat redundant - that's, again, I guess for
historical reasons. So in GN, `deps` just means: to build this target, you
first have to build these other targets. Like, this target depends on - uses
this other code. And in different contexts, it kind of means different things. So
for example - I think if an executable depends on some other target, then that
external executable is linked - that other target is also linked in. If base
unit test depends on the base library, which in a normal build is a static
library - like in a normal build? Like in a release build, by default, it's a
static library. And so if base unit test is built, it first creates a static
library and then links to it. And then base itself might depend on a bunch of
third-party things, libraries, which means when base unit tests is linked, it
links base, but then it also links against base's dependencies. So that's one
meaning of DEPS. Another meaning, like these capital DEPS files, that's
completely distinct. Has nothing to do with GN, I'm sad to say. And that's just
for enforcing layering. Those predate GN, and they are for enforcing layering
at a fairly coarse level. They say, code in this directory may include code
from this other directory but not from this third directory. For example -
like, Blink may include stuff from base, but must not include anything from, I
don't know, the Chrome layer or something.
25:18 SHARON: Right, the classic content Chrome layering, where Chrome -
25:18 NICO: Right. And I think -
25:18 SHARON: content, but -
25:18 NICO: Right. And there's a step called check-deps, and that checks the
capital DEPS files.
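For illustration, a capital-DEPS file is a short list of include rules along
these lines - the directory names are just examples:

```python
# A capital-DEPS file for checkdeps: directory-level include rules.
include_rules = [
  "+base",     # code in this directory may include headers from //base
  "-chrome",   # ...but must not include anything from //chrome
]
```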
25:24 SHARON: OK. Yeah, because before, I worked on some Fuchsia stuff, and
because we're adding a lot of new things, you're messing around with different
DEPS and stuff a lot more than I think if you worked in a typical part. Like,
now, I mostly just work within content. Unlikely that you're changing any
dependencies. But that was always a bit unclear because, for the most part, the
targets have very similar names - not exactly the same, but very similar. And
if you miss one, you get all these weird errors. And it was, yeah, generally
quite confusing.
25:55 NICO: Yeah, that's pretty confusing. One thing of the capital DEPS things
that they can do that the GN DEPS can't is if someone adds a DEPS on your
library and they add an entry to their DEPS file, that means that now at code
review time, you need to approve that they depend on you. And that's not
something we can do at the GN level. And the advantage there is, I don't know,
if you have some library and then 50 teams start depending on it without
telling you, and now you're on the hook for keeping all these 50 things
working, then with this system, you at least have to approve every time someone
adds a dependency on you, you have to say, this is fine with me. Or you can
say, actually, this is - we don't want this to be used by anyone else.
26:45 SHARON: Is there an ideal state where we don't have these DEPS files and
maybe that functionality is built into the BUILD.gn files, or is this something
that's probably going to be sticking around for a while?
26:52 NICO: That's a great question. I don't know. It seems weird, right? It's
redundant. So I think the current system isn't ideal, but it's also not
horrible enough that we have to fix it immediately. So maybe one day we'll get
around to it.
27:10 SHARON: Yeah. I think I've mostly just worked on Chrome, so I've gotten
pretty used to it. But a common complaint is people who work in Google internal
things or other, bigger - the main build system of whatever company they work
on, they come to Chrome and they're like, oh, everything's so confusing. But if
you - you just got to get used to it, but -
27:27 NICO: Right. I think if you're confused by anything, it's great if you
come to us and complain. Because you kind of become blind to these problems,
right? I've been doing this for a long time. I'm used to all the foot guns. I
know how to dodge them. And yeah. So if you're confused by anything, please
tell me personally. And then if enough people complain about something, maybe
we'll fix it.
27:55 SHARON: All right. Yeah. That's what you said. The outcome of that -
we'll see. We'll see how that goes. We'll see how many complaints you suddenly
get. Right. OK. So another thing I was interested in is right now there's a lot
of work around Rust, getting more Rust things, introducing that, memory safety,
that's good. We like it. What is involved from a build perspective for getting
a whole other language into Chrome and into the build? Because we have most of
the things C++. There's some Java in all of the Android stuff. And in some
areas, you see - you'll see a list of - you'll see a file name, and then you'll
see file name underscore and then all the different operating systems, right?
And most those are some version of C++. The Mac ones are .mm. And you have Java
ones for Android. But if you want to add an entirely different language and
still be able to build Chrome, at a high level, what goes into that?
29:00 NICO: Yeah, there's also some Swift on iOS. It's many different things.
So at first, you have to teach GN how to generate Ninja files for that
language. So when a CC file is built, then basically the compiler writes out a
file that says, here are all the header files I depend on. So if one of them
gets touched, the compiler - or Ninja knows how to rebuild those. So you need
to figure out how the Rust compiler or the Swift compiler track dependencies.
You need to get that information out of the compiler into the build system
somehow. And C++ is fairly easy to build. It's like a per-file basis. I think
most languages are more on a module or package base, where you build a few
files as a unit. Then you might want to think about, how can I make this work
with Goma so that the compilation can work remotely instead of locally? So
that's the build system part. Then also, especially for us, we want to use this
for some performance critical things, so it needs to be very fast. And we use a
bunch of toolchain optimization techniques to make Chrome very fast with
three-letter acronyms, such as PGO and LTO and whatnot. And LTO in particular,
that means Link Time Optimization. That means the C++ or the Rust code is
built - is compiled into something called "bitcode." And then all the bitcode
files at link time are analyzed together so you can do cross-file in-lining and
whatnot. And for that work, the bitcodes - all the bitcode versions need to be
compatible, which means Clang and Rust need to be built against the same
version of LLVM, which is some - it's some internal compiler machinery that
defines the bitcode. So that means you have to - if you want to do
cross-language LTO, you have to update your C++ compiler and your Rust compiler
at the same time. And you have to build them at the same time. And when you
update your LLVM revision, it must break neither the C++ compiler nor the Rust
compiler. Yeah. And then you kind of want to build the Rust library from
source, so you have bitcode for all of that. So it's a fairly involved thing - but
yeah, we've been doing a lot of work on that. Not me, but other people.
31:24 SHARON: Right. Sounds hard. And what does LTO stand for, since you used
it?
31:30 NICO: Link Time Optimization.
31:30 SHARON: All right.
31:30 NICO: And there's a blog post on the Chromium blog about this that we can
link to in the show notes that has a fairly understandable explanation what
this does.
31:43 SHARON: Yeah, all right. That sounds good. So linking, that was my next
question. As you build stuff, you sort out all of your just compile errors, you
got all your spelling mistakes out. The next type of error you might get is
linking error. So how does - can you tell us a bit more about linking in
general and how that fits into the build process?
32:01 NICO: I mean, linking - like, for C++, the compiler basically produces
one object file for every CC file. And then the linker takes, like, about
50,000 to 100,000 object files and produces a single executable. And every
object file has a list of functions that are defined in that object file and a
list of functions that are undefined in that object file that it calls that are
needed from elsewhere. And then the linker basically makes one long list of all
the functions it finds. And at the end, all of them should be defined, and all
the non-inline ones should be defined in exactly one object file. And if
they're not - if that doesn't happen, then it emits an error, and else, it
emits a binary. And the linker is kind of interesting because the only thing
you really care about is that it does its job very quickly. But it has to read
through gigabytes of data before it writes the executable. And currently, we
use a linker called `lld`, which was also written by people on the Chrome team,
and which is also fairly popular outside of Chrome nowadays. And so we wrote an
ELF linker, which is the file format used on Linux and Android, and a COFF
linker, which is the file format used on Windows, and our own Mach-O linker,
which is the file format on Apple - macOS and iOS. And our linkers are way,
way, way faster than the things that they replace. On Windows, we were, like,
10 times faster than the Windows linker. And on Mac, we're, like, four times
faster than the system linker and whatnot. The other linker vendors have caught
up a little bit, but we - I feel like Chrome has really advanced the state and
performance of linking binaries across the industry, which I think is really
cool.
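To make the defined/undefined-symbol bookkeeping described above concrete,
here's a tiny sketch. Any C++ compiler would do, and the exact error text
depends on which linker you're using:

```shell
# Sketch: f() is declared and called, but defined in no object file.
cat > main.cc <<'EOF'
void f();
int main() { f(); }
EOF

clang++ -c main.cc       # compiles fine; f is recorded as an undefined symbol
clang++ main.o -o main   # link step fails with an undefined-symbol error for f
```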
33:44 SHARON: Yeah, that is really cool. And in a kind of similar vein to the
different OSes and all that kind of stuff is 32- versus 64-bit. There's some
stuff happening. I've seen people talk about it. It seems pretty important. Can
you just tell us a bit more about this in general?
34:04 NICO: Well, I guess most processors sold in the last decade or so are
64-bit. So I think on some platforms, we only support 64-bit binaries, like -
and the bit just means how wide is a pointer and has some implications on which
instructions can the compiler use. But it's fairly transparent too, I think, at
the C++ level. You don't have to worry about it all that much. On macOS, we
only support 64-bit builds. Same on iOS. On Windows, we still have 32-bit and
64-bit builds. On Linux, we don't publicly support 32-bit, but I think some
people try to build it. But it's really on Windows where you have both 32-bit
and 64-bit builds. But the default is 64-bit, and if you say target_cpu equals
x86, I think, in your args.gn, then you get a 32-bit build.
But it should be fairly transparent to you as a developer, unless you write
assembly.
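Concretely, that args.gn line would look like this - a sketch, with the build
directory name up to you:

```gn
# In the build directory's args.gn: request a 32-bit build.
# (The default target_cpu is the 64-bit one for your host.)
target_cpu = "x86"
```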
35:02 SHARON: How big of an effort would it be to get rid of 32-bit on Windows?
Because Windows is probably the biggest Chrome-using platform, and also,
there's a lot of versions out there, right? So -
35:15 NICO: Oh, yeah.
35:15 SHARON: How doable?
35:15 NICO: I think that the biggest platform is probably Android. But yeah,
Android is also 32-bit, at least on some devices at the moment. That's true. I
don't know. I think we've looked into it and decided that we don't want to do
that at the moment. But I don't know details.
35:33 SHARON: And you mentioned ARM. So is there any - how much does the Chrome
build team - are they concerned with the architecture of these processors? Is
that something that, at the level that you and the build team have to worry
about, or is it far enough - a few layers down that that's -
35:47 NICO: It's something we have to worry about at the toolchain team. So we
update the Clang compiler every two weeks or so, which means we pull in all -
around 1,000 changes from upstream contributors that work on LLVM, spread
across many companies. And we have to make sure this doesn't break on 32-bit ARM,
64-bit ARM, 32-bit Intel, 64-bit Intel, across seven different operating
systems. And so fairly frequently, when we try to update Clang, tests start
failing on, I don't know, 32-bit Windows or on 64-bit iOS or some very specific
configuration. And then we have to go and debug and dissect and figure out
what's going on and work with upstream to get that fixed. So yeah. That's
something we have to deal with at the toolchain team, but hopefully, it's -
hopefully, like the normal Chrome developer is isolated from that for the most
part.
36:45 SHARON: I think so. It's not - if I weren't asking all these other
questions, it's something that almost never crosses my mind, right? So that
means you're all doing a very good job of that. Thank you very much. Much
appreciated. And jumping way back, you mentioned earlier indexing the code
base, code search. So I make a change. I submit it. I upload it. It eventually
ends up in code search. So how does that process work? And what goes into
indexing? Because before, when I was working on Fuchsia all the Fuchsia code
wasn't indexed, so you couldn't do the handy thing of clicking a thing and
seeing where it was defined. You had to actually look it up. And once you got
that, it was like, oh my gosh, so much better. So can you just tell us a bit
more about that process?
37:30 NICO: Sure, yeah. Chrome has a pretty good code search feature, I
think, codesearch.chromium.org or cs.chromium.org. Basically, we have a bot
that runs, I think, every six hours or so, pulls the latest code, bundles it
up, sends it to some indexer service that then also uses Clang to analyze the
code. Like, for C++, I think we also index Java. We probably don't index Rust
yet, but eventually we will. And then it generates - for every word, it
generates metadata that says, this is a class. This is an identifier. And so if
you click on it, if you click on a function, you have the option of jumping to
the definition of the function, to the declaration, to all the calls, all the
overrides, and so on. And that updates ideally several times a day and is
fairly up to date. And we built the index, I think, for most operating systems.
So you can see this is called here on Linux, here on Windows, and whatnot.
38:32 SHARON: OK. Sounds good. Very useful stuff. And I don't know if this is
part of the build team's jurisdiction, but when you are working on things
locally, you have some git commands, and then you have some git-cl commands.
38:43 NICO: Mm-hmm.
38:48 SHARON: So the git commands are your typical ones - git pull, git rebase,
git stash, that kind of thing. And then you have git-cl commands, which relate
more to your actual CL in Gerrit. So git-cl upload, git-cl status. That'll show
you all your local branches and if they have a Gerrit change associated with
them. So what's the difference between git and git-cl commands?
39:18 NICO: I'm sorry. So this is basically a git feature. If you call git-foo,
then git looks for git-foo on your path. So you can add arbitrary commands to
git if you want to. And git-cl is just something that's in `depot_tools`.
Again, there's git-cl in `depot_tools`, and you can open that and see what it
does. And it'll redirect to `git_cl.py`, I think, which is a fairly long and
hairy Python script. But yeah. It's basically Gerrit integration, as you say.
So you can use that to send try jobs, `git cl try`. To upload, as you say, you
can use `git cl issue` to associate your current branch with a remote Gerrit
review, `git cl patch` to get a patch off Gerrit and patch it into your local
thing, `git cl web` to open the current thing in a web browser. Yeah, git-cl is
basically - git-cl help to see all the git-cl commands, or - yeah. If you have
a change that touches, like, 1,000 files, you can run `git cl split`, and it'll
upload 500 reviews. But that's usually too granular, and I wouldn't recommend
doing that. But it's possible.
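A quick-reference sketch of the `git cl` subcommands mentioned here - the CL
number is a placeholder:

```shell
git cl upload          # upload the current branch to Gerrit for review
git cl try             # send try jobs for the current CL
git cl web             # open the current CL in a web browser
git cl issue 1234567   # associate this branch with an existing review
git cl patch 1234567   # patch a Gerrit review into your local checkout
git cl status          # list local branches and their associated reviews
git cl help            # list all subcommands
```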
40:25 SHARON: Right. Do you have a - [DOORBELL DINGS]
40:25 NICO: Oops, sorry.
40:25 SHARON: commonly - yeah.
40:30 NICO: Oh, sorry. There was - the door just rang. Maybe you didn't hear
it. Sorry.
40:30 SHARON: All right. It's all good. Do you have a lesser known git or
git-cl command that you use a lot or -
40:41 NICO: Well, I -
40:41 SHARON: is your favorite? [LAUGHS]
40:46 NICO: It's not lesser known to me, so I wouldn't know. I don't know. I
use `git cl upload` a lot.
40:53 SHARON: Right. Well, you have to use `git cl upload`, right?
40:53 NICO: I use -
40:53 SHARON: Well, you don't - maybe not but -
40:53 NICO: `git cl try` to send try jobs from my terminal, `git cl web` to see
what's going on, `git cl patch` a lot to patch stuff in locally. If I'm doing a
code review and I want to play with it, I patch it in, build a local, and see
how things are working.
41:12 SHARON: Yeah. When I patch in a thing, I go from the cl page on Gerrit
and then click the download patch thing, but -
41:21 NICO: No, even `git cl patch -b` and then some branch name, and then you
just patch - paste the Gerrit review URL.
41:28 SHARON: Oh, cool.
41:28 NICO: So it's just, yeah, Control-L to focus the URL bar. Control-C
Alt-Tab `git cl patch -b blah`, Paste, Enter, and then you have a local branch
with the thing.
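That workflow, spelled out - the branch name and review URL here are
placeholders:

```shell
# Create a fresh local branch and patch the Gerrit review into it in one step.
git cl patch -b some-branch-name \
    https://chromium-review.googlesource.com/c/chromium/src/+/1234567
```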
41:36 SHARON: All right. Yeah, a lot of these things, once you learn about
them - at first you're like, whoa, and then you use them, and then they're not
lesser known to you, but you tell other people also a common - so another one
would be `git cl archive`, which will -
41:47 NICO: Oh, yeah, yeah.
41:47 SHARON: get rid of any local branches associated with a closed Gerrit
branch, so that's very handy, too.
41:53 NICO: Yes.
41:53 SHARON: So it's always fun to learn about things like that.
41:59 NICO: Are you fairly tidy with your branches? How many open branches do
you usually have?
41:59 SHARON: [LAUGHS] I used to be more tidy. When I tried to do a cleanup
thing, I had more branches. I think right now I've got around 20-something
branches. I like having not very many. I think to some people, that's a lot. To
some people, that's not very many. I mean, ideally, I have under five, right?
[LAUGHS] But -
42:18 NICO: I don't know. I usually have a couple - 10, sometimes. Have a bunch
of machines. I think on some of them it's over 100, but yeah. Every now and
then, I run `git cl archive` and it removes half of them, but -
42:29 SHARON: Yes. All right, cool. Is there anything that we didn't cover so
far that you would like to share? So things that maybe you get asked all the
time, things that people could do better when it comes to build-related things?
Things that you can do that make the build better or don't make it worse, that
kind of thing? Or just anything else you would like to get out there?
42:58 NICO: I guess one thing that's maybe implicitly stated, but currently not
explicitly documented, as far as I know, but I'm hoping to change that, is - so
Chrome tries to have a quiet build. Like, if you build, there's zero build
output, except that one Ninja line that's changing, right? There's, well, other
code bases - I think it's fairly common - where there's many screenfuls of
warnings that scroll by. And we very explicitly try not to do that because if
the build emits lots of warnings, then people just learn to ignore warnings. So
we think something should either be a serious problem that people need to know
about - then it should be an error - or it should be not interesting. Then it
should be just quiet. So if you add a build step that runs a random script, the
script shouldn't print anything, not even progress. Shouldn't say, doing this,
doing this, doing this. Should either print something and say something's wrong
and fail the build step, or not say anything. So that's one thing.
43:51 SHARON: That's - yeah, that's true.
43:51 NICO: And the other thing -
43:51 SHARON: Like, you only really get a bunch of terminal output if you have
a compile or a linker error, whatever.
43:57 NICO: Right.
43:57 SHARON: I hadn't ever considered that. If you build something and it
works, you get very few lines of output. And I hadn't ever thought that was
intentional before, but you're right in that if it was a ton, you would just
not look at any of it. So yeah, that's very cool.
44:09 NICO: Yeah. And on that same note, we don't do deprecation warnings
because we don't do any warnings. So if people - like, people like deprecating
things, but people don't like tidying up calls to deprecated functions. So if
you want to deprecate something in Chrome, the idea is basically, you remove
all callers, and then you remove the deprecated thing. And we don't allow you
to say - to add a warning that tells everyone, hey, please, everyone, remove
your calls. The onus is on the person who wants to deprecate something instead
of punting that to everyone else.
44:46 SHARON: Yeah, I mean, the thing that I was working on has a deprecating
effect, so removing callers, which is why I have so many branches. But I've
also seen presubmit warnings for if you include something deprecated. So - oh,
yeah, and there's presubmit, too. OK, we'll get to that also. [LAUGHS] Tell us
more about all of this.
45:05 NICO: About presubmits? Yeah, presubmits - presubmits are terrible.
That's the short summary. So if you run a `git cl presubmit`, it'll look at a
file called PRESUBMIT.py, I think, in the current directory, and maybe in all
the directories that contain files you touched or something like that. But you
can just open the top-level PRESUBMIT.py file, and
there's a couple thousand lines of Python where basically everyone can add
anything they want without much oversight, so it's a fairly long - at least
historically, that used to be the case. I don't know if that's still the case
nowadays. But yeah, it's basically like a long list of things that random
people thought are good if they - like, presubmits are something that are run
before you upload, also, implicitly. And so you're supposed to clean them up.
And [INAUDIBLE] many useful things. For example, nowadays we require most code
to be autoformatted so that people don't argue about where semicolons should go
or something silly like that. So one of the things it checks is, did you run
`git cl format`, which runs, I guess, clang-format for C++ code and a bunch of
custom Python scripts for other files. But it's also - presubmits have grown
organically, and there isn't - they're kind of unowned and they're very, very
slow. And I think some people have tried to improve them recently, and they're
better than they used to be, but I don't love presubmits, I guess is the
summary. But yeah, it's another thing to check invariants that we would like to
be true about our code base.
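The two commands discussed here, for reference:

```shell
git cl format      # auto-format the files you changed (clang-format for C++)
git cl presubmit   # run the presubmit checks without uploading
```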
46:48 SHARON: Yeah. I mean, I think - yes, spelling is something I think it
also checks.
46:54 NICO: It checks spelling? OK.
46:54 SHARON: Or maybe that's a separate bot in Gerrit.
46:59 NICO: Oh, yeah, yeah, yeah, yeah. Like, there's this thing called -
what's its name?
47:06 SHARON: Trucium? Tricium?
47:06 NICO: Tricium, yeah. Tricium, right. Tricium is something that adds
comments to your - automatically adds comments to your change list when you
upload it. And Tricium can do spelling correction, but it can also - it runs
something called Clang Tidy, which is basically a static analysis engine which
has quite a few false positives, so sometimes it complains about something
that - but it's actually incorrect, and so we don't put that into the compiler
itself. So we've added a whole bunch of warnings to the compiler for things
that we think are fairly buggy. But Clang Tidy is - but these warnings have to
be - they have to have a very low false positive rate. Like, if they complain,
they should almost always be right. But sometimes, for static analysis, it's
hard to be right. Like, you can say this might be wrong. Please be sure. But
this is not something the compiler can say, so we have this other system called
Clang Tidy which also adds a comment to your C++ code which says, well, maybe
this should be a reference instead of a copy, and things like that.
48:04 SHARON: Yeah. And I think it - I've seen it - it checks for unused
variables and other - there's been useful stuff that's come from comments from
there, so definitely. All right. Very cool. So if people are interested in all
this build "infra-y" kind of stuff and they want to get more into it, what can
they do?
48:32 NICO: We have a public build@chromium.org mailing list. It's very low
volume, but if you want to reach out, you can send an email there and a few of
us will see your email and interact with you. And there's also I think the tech
build on crbug. So you can just look for build bugs and fix all our bugs for
us. That'd be cool.
48:51 SHARON: [LAUGHS]
48:51 NICO: And if there's anything specific, just talk to local OWNERS. Or if
you feel this is just something you're generally interested in and you're
looking for a project, you can talk to me, and I probably have a long list of -
I do have a long list of somewhat beginner-friendly projects that people could
help out with, I guess.
49:15 SHARON: Yeah. I mean, I think being able to - if you're looking for a
20%-y kind of project or something else. But knowing how things actually get
put together is always a good skill and definitely applicable to other things.
It's the kind of thing where the more low-level knowledge you have, the more - it
works - it applies to things higher up, but not necessarily the other way
around, right?
49:34 NICO: Mm-hmm.
49:34 SHARON: So having that kind of understanding is definitely a good thing.
All right. Any last things you'd like to mention or shout out or cool things
that you want people to know about? [LAUGHS]
49:48 NICO: I guess -
49:48 SHARON: Or what - yeah, quickly, what is the future of the whole build
thing? Like, what's the ideal situation if -
49:55 NICO: Ideally, it'll all be way faster, I guess is the main thing. But
yeah, yeah, I think build speed is a big problem. And I'm not sure we have the
best handle on that. We're working on many things, but - not many. A bunch of
things. But it's - like, people keep adding so much code, so if y'all
could delete some code, too, that would help us a lot. I mean, having -
supporting more languages is something we have to - this is something that's
happening. Like, Rust is happening. We are also, on iOS, using Swift.
Currently, we can't LTO Swift with the rest because that's on a different LLVM
version. There's this - in C++ - we keep upgrading C++ versions. So Peter
Kasting is working on moving us to C++20. And then C++23 will happen, and so
on. There's maybe C++ modules at some point, which may or may not help with
build speed. And there's a bunch of tech debt that we need to clean up, but
that's not super interesting.
51:24 SHARON: I don't know. I think people in Chrome in general are more
interested and care about reducing tech debt in general, right? A lot of people
I know would be happy to just do tech debt clean-up things only, right?
Unfortunately, it doesn't really work out for job reasons. But a lot of people,
I think, are interested in, I think, in higher proportions than maybe other
places.
51:47 NICO: It depends on the tech debt. Some of it might work out for job
reasons. But, yeah.
51:54 SHARON: Yeah. I mean, some of it is easier than others, too, right? Some
of it is like, yeah, so, OK, well, go delete some code. Go clean up some
deprecated calls. [LAUGHS] All that.
52:08 NICO: Yeah, and again, I think finishing migrations is way harder than
starting them, so finish more migrations, start fewer migrations. That'd be
also cool.
52:16 SHARON: All right. I am sure everyone listening will go and do that right
away.
52:21 NICO: Yep.
52:21 SHARON: And things will immediately be better.
52:27 NICO: They've just been waiting to hear that from me, and now they're
like, ah, yeah, right. That makes sense.
52:27 SHARON: Yeah, yeah. All right. Well, you all heard it here first. Go do
that. Things will be better, et cetera. So all right. Well, thank you very
much, Nico, for being here answering all these questions. I learned a lot. A
lot of this is stuff that - everyone who works on Chrome builds Chrome, right?
But you can get by with a very minimal understanding of how these things work.
Like, you see your - you follow the Intro to Building Chrome doc. You copy the
things. You're like, OK, this works. And then you just keep doing that until
you have a problem. And depending on where you work, you might not have
problems. So it's very easy to know very little about this. But obviously, it's
so important because if we didn't have any of this infrastructure, nothing
would work. So one, I guess, thank you for doing all the stuff behind the
scenes, determinism, OSes, all that, making it a lot easier for everyone else,
but also thank you for sharing about it so people understand what's actually
going on when they run the commands they do every day.
53:31 NICO: Sure. Anytime. Thanks for having me. And it's good to hear that
it's possible to work on Chrome without knowing much about the build because
that's the goal, right? It should just work.
53:44 SHARON: Yeah.
53:44 NICO: Sometimes it does.
53:44 SHARON: [LAUGHS] Yeah. Well, thank you for all of it, and see you next
time.
53:51 NICO: Yeah. See you on the internet. Bye.
54:03 SHARON: OK. So we will stop recording -
54:03 NICO: Wee. Time for the second take.
54:03 SHARON: [LAUGHS] Let's do that, yeah, all over again.
54:11 NICO: Let's do it.
54:11 SHARON: I will stop recording.

@ -0,0 +1,978 @@
# Whats Up With Open Source
This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 6, a 2023 video discussion between [Sharon (yangsharon@chromium.org)
and Elly
(ellyjones@chromium.org)](https://www.youtube.com/watch?v=zOr64ee7FV4).
The transcript was automatically generated by speech-to-text software. It may
contain minor errors.
---
What does it mean for Chrome to be open source? What exactly is Chromium? Can
anyone contribute to the browser? Answering all that is today's special guest,
Elly. She's worked all over Chrome and ChromeOS, and is passionate about
accessibility, the open web, and free and open-source software.
Notes:
- https://docs.google.com/document/d/1a6sdrspJgAHDdQMMNGV0t7zo8QWgqq0hgsyV55tc_dk/edit
Links:
- [What's Up With BUILD.gn](https://www.youtube.com/watch?v=NcvJG3MqquQ)
- [What's Up With //content](https://www.youtube.com/watch?v=SD3cjzZl25I)
- [What are Blink Intents?](https://www.youtube.com/watch?v=9cvzZ5J_DTg)
---
00:00 SHARON: Hello, and welcome to "What's Up With That?" the series that
demystifies all things Chrome. I'm your host, Sharon. And today, we're talking
about open source. What does it mean to be open source? I've heard of Chrome,
but what's Chromium? What are all the ways you can get involved? Answering
those questions and more is today's special guest, Elly. Elly currently works
on the Chrome content team, which is focused on making the web more fun and
interesting to use. Previously, she's worked all over Chrome and Chrome OS.
She's passionate about accessibility, the open web, and free and open-source
software. Welcome, Elly.
00:34 ELLY: Thank you, Sharon.
00:34 SHARON: All right. First question I think is pretty obvious. What is open
source? What does that mean?
00:40 ELLY: Yeah, so open source is a pretty old idea. And it basically just
means, in the purist sense, that the source code for a program is open to be
read by others.
00:51 SHARON: OK. And Chrome, the source code is available to be read by
anyone. What else is it? Open source - I've heard of open-source community. It
seems like there's a lot to it. So why don't you just tell us more about open
source, generally?
01:03 ELLY: Yeah, for sure. There's quite a bit of nuance here. And there's
been differing historical interpretations of some of these terms, so I'll -
there's two big camps that are important to talk about. One is open source,
which means what I said - the source is available to be viewed. There's also
the idea of free software, which is software that actually has license terms
that allow for people to modify it, to make their own derivative versions of
it, and that kind of thing. And so historically, there was a pretty big
difference between those things. These days, the two concepts are often talked
about pretty interchangeably because a lot of open-source projects are free
software, and all free software projects basically are open source. But the
distinction used to be very important and is still pretty important, I guess.
Chromium is both open source and free software. So we ship under a license that
allows for - not only for everyone to read and look at our code, but also for
other folks to make their own versions of Chromium. So, yeah, Chromium, both
open source and free software.
01:56 SHARON: OK, very cool. And you mentioned Chromium in there. But I think
for most people, when they think of the browser, they call it Chrome. So what
is the difference between Chrome and Chromium? Are they the same thing? I think
people, myself included, sometimes use those interchangeably, especially when
you work on it. So what is the difference there?
02:16 ELLY: Yeah, fantastic question. So Chromium is an open-source and free
software web browser that is made by the Chromium Foundation, which is like an
actual .org that exists somewhere on the internet. Chrome is a Google-branded
web browser that is basically made by taking Chromium, which is an open-source
and free software web browser, adding some kind of Google magic to it, like
integrations with some Google services, some kind of media codecs that maybe
aren't themselves free software, that kind of thing, bundling that up into a
more polished product which we call Google Chrome, and then shipping that as a
web browser. So Chromium is an open-source project. Google Chrome is a Google
product that is built on top of Chromium.
03:03 SHARON: OK. So Google Chrome is a Chromium-based browser, which is a term
I think that people who work in any browser stuff - it's a term that they've
all [INAUDIBLE] before.
03:08 ELLY: Yeah, exactly. And in fact, you alluded to the fact that we
sometimes use those terms interchangeably. And especially at Google, we
sometimes get a little confused about what we're talking about sometimes
because we're - the Google Chrome team are the biggest contributors to
Chromium, the open-source project. And so we tend to sometimes talk about the
two things as though they're the same. But there's a really important
difference for folks who are working on other Chromium-derived browsers. So if
you're working on a Chromium derivative that a Linux distribution ships, for
example, your browser is based on Chromium, and it's really not Chrome. It's
Chromium, right? It is the open-source browser that Chrome is based on. But
it's not the same thing at all.
03:52 SHARON: Yeah, if you want to learn a bit more about basing things on
Chromium, the content episode is a good one to check out. We talk a bit about
that and embedding Chrome in Chromium and what that means. So -
04:03 ELLY: Yeah, absolutely.
04:03 SHARON: check it out if you [INAUDIBLE]...
04:03 ELLY: And there's also, in the Chromium source tree, there's actually a
thing called Content Shell, which is a minimal demonstration browser. It's like
the rendering engine from Chromium wrapped in the least amount of browser
possible to make it work. And we use it for testing, but it's also a really
good starting point if you're trying to learn how to build a Chromium
derivative browser.
04:22 SHARON: OK, very neat. So I think a next very natural question to come
out of this is, why is Chrome or Chromium - Chromium rather - going to try to
be good about using those correctly here - but why is Chromium open source?
04:40 ELLY: Yeah, so this is the decision that we made right when we were
starting the project actually. And it's based on this really fundamental idea
that the web benefits when users have better browsers. So if we, like the
Chromium project, come up with some super clever way of doing something, or we
come up with some really ingenious optimization to our JavaScript Engine or
something like that, it's better for the web, better for everyone, and
ultimately even better for Google as a business if those improvements are
actually adopted by other people and taken by other people and used by them. So
it is better for us if other people make use of anything clever that we do. And
separately from that, there's this idea that's really prevalent in open-source
communities of, if people can read the code, they're more likely to find bugs
in it. And that's something that Chromium constantly benefits from, is folks
who are outside the project, just kind of looking through our code base,
reading and understanding it, spotting maybe security flaws that are in there.
That kind of research is so much easier to do when the source code is just
there, and you're not trying to reverse-engineer something you can't see the
source to. So we get a lot of benefit from being open-source like that. And
those are the reasons we had originally, and those still all hold totally true
today, I think.
05:51 SHARON: That makes sense. Yeah, it seems, at first, a bit odd for a big
company like Google to make something like this open source. But there are
other massive open-source things at Google - Android, I think, being the other
canonical example, which we don't know too much about, but we won't be getting
too into that. But there are other big open-source projects around.
06:08 ELLY: Yeah, absolutely. And there's also, like - there's Go. That's an
open-source programming environment, like a language and a compiler and a bunch
of tools around it that is open source built by Google. There are plenty of
other open-source and free software projects built by large corporations, often
for really the same reasons. We benefit because the entire web benefits from
better technology.
06:32 SHARON: Yeah, I think some of the Build stuff we do is open source. Check
out the previous episode for that. And that's, yeah, exactly - not strictly
only used by -
06:37 ELLY: Yeah, and by the way, partly because we're open source - like, for
example, the Chromium base library, which is part of our C++ software
environment - our base library is regularly used in other projects, even things
that are totally unrelated to browsers, because it provides a high-quality
implementation of a lot of basic things that you need to do. And so that code
is being used in so many places we would never have anticipated and has done
honestly more good in the world than it would do if it was just part of a
really excellent browser.
07:13 SHARON: Something that someone on my first team told me was, if you've
changed anything in base, that probably is going to get run any time the
internet gets run, somewhere in that stack, which, if you think about it, is so
crazy.
07:26 ELLY: Oh, Yeah. Absolutely. Early in my career, I added a particular
error message to part of the Chrome network stack. And that network stack, too,
is one of those components that gets reused in a lot of places. And so
occasionally, I'll be running some completely other program. Like, I'll be
running a video game or something, and I'll see that error message that I added
being emitted from this program. And I'm like, oh, my code is living on in a
place I would never have really thought of.
07:51 SHARON: Oh, that's very cool.
07:51 ELLY: Yeah.
07:51 SHARON: Yeah.
07:51 ELLY: It's one of those unique open-source experiences in my book, of
seeing your own work being used like that by other folks you wouldn't have
anticipated.
07:57 SHARON: Yeah, that's very cool. So something I think I've heard you say
before that I thought sounded very cool was the open-source dream. So can you
tell us a bit more about what that is. What is that vision? It sounds very
nice.
08:09 ELLY: Yeah, so I talked about this a little bit. And earlier, I cautioned
against conflating open-source and free software. But it really is more of the
free software dream than the open-source dream, in some sense. That dream is
this idea that if we have software that is made available for free, under
licenses that let people modify it and make derivative works and keep using it,
that over time, everyone will get access to really high-quality and
freely-available software. And we will have a situation where the software that
people need is built by their communities, built by the people who are in those
communities, instead of being something that they have to buy from a company
that makes it. It'll be something they can instead produce for themselves. And
over time, I think that this has really played out in that way. If you look at
the state of operating systems today, for example, there are these really
high-quality, freely-available open-source free software operating systems that
are readily available and anyone can use, and they really do meet the needs for
a lot of folks. And then, in fact, it kind of circles back to where Linux is a
high-quality, free software open-source operating system that Google can then
turn around and make really good use of to build something like Chromium OS,
which is another free software open-source project that uses Linux as one of
its major components. And then we get to produce a product that the Chromium OS
engineering team would otherwise have had to spend a lot of time building, if
we weren't able to make use of that existing Linux kernel work. So you get into
this cycle of
giving back and sharing and benefiting from the effects of other people
sharing. That's the free software dream to me.
09:57 SHARON: It does - yeah, that sounds great. And for sure - I try to use
open-source options when I can. When I edit these videos, I use something
open-source. It feels appropriate for what we're doing here. So, yeah, that
sounds like it would be - it's a good system that everyone contributes to and
everyone benefits from. And that's really nice.
10:10 ELLY: Yeah, absolutely.
10:16 SHARON: So going away from that towards the more less open-source part,
so what kind of things in Chrome, the browser, are not open source? You
mentioned a couple of things earlier. Can you tell us a bit more about some of
those things?
10:27 ELLY: Yeah, I'm going to caveat this by saying that I don't personally
work on the stuff I'm about to talk about. And so my knowledge is more
superficial. There's a couple things I'm pretty confident about. So one is, for
example, there's a few video formats that Chrome can play that Chromium cannot
play because Google has agreements with the companies that make those codecs
that allow us to basically license and embed their thing and ship it as part of
Chrome. But those agreements, we can't really extend them to everyone who might
make a Chromium browser. And so it ends up in a situation where there is a
closed-source component that's included in Chrome to make that possible. I'm
struggling to think of another example right off the top of my head. I believe
that there's also a couple things in Chrome that are integrating with Google
APIs, where they're features that are Chrome-specific because they're
Google-specific. And one of the things that is generally true between the two
products is that Chrome will have more Google integrations and more Google
magic and more Google smarts than Chromium will. And so I think some of those
are actually closed-source components that come from Google that get embedded
into Chrome. But because they're closed-source, we wouldn't want to put them
into Chromium.
11:37 SHARON: Right. It seems like, yeah, I can sign into Chrome. I don't
expect that I'd be able to sign in with my gmail.com into, say, Chromium. I'm
not sure it's actually part of it, but that's a guess.
11:49 ELLY: Yeah, so that does work, except that you need to - any Chromium
distributor needs to go and talk to - basically, talk to the sign-in team to
get an API key that allows their browser to sign in. There is a process for
doing that. It doesn't actually require any closed-source code components. But
there is still a thing where you have to talk to the accounts team and
basically be like, hey, we're a legitimate web browser, and we want to allow
users to sign in. Because we don't want a situation where bots or malware are
doing fake user sign-ins from - pretending to be Chromium. That's bad.
12:25 SHARON: Right. That makes sense. Yeah, and I think because of where
Chrome and Chromium are positioned, I think there will be some interesting
comparisons and differences between Chrome, Chromium, and other internal
google3 projects. So that's kind of the term for things that are closed-source
Google - the typical Maps, Search, all that stuff - and also comparing Chromium
to other open-source projects. So we've talked a bit about the similarities and
differences between Chrome and Google internal. Are there any other things you
can think of that are either similar or different between Chrome the project
and the people who work on it and how people do things internally at Google?
13:11 ELLY: Yeah. So internally at Google, there's this very powerful, very
custom-built whole technology stack around the projects. There is a continuous
integration system. There's an editor. There's a source control system. There's
all of this stuff. Within Google, all of that is custom. And it's all fitted to
Google's needs. And a lot of it is just built from scratch, frankly. Whereas
for Chromium, we're using essentially off-the-shelf open-source stuff to meet a
lot of those needs. So, for example, for version control, we're just using Git,
which is I think the most popular version control system in the world right
now. It's definitely open source. And our build system, for example, which is
like GN and Ninja put together, those are both free software open-source
projects. Admittedly, both of them were, I think, started as part of Chromium
because we had those needs. But they, themselves, are free software components
that anyone else can also use to build a Chromium. And the reason why that's
done that way - like, why doesn't - it's actually a really good question. Why
doesn't Chrome, which is a Google project, use all of this amazing
infrastructure for engineering that Google has? And the answer is, we want the
Chromium project to be possible to work on for people who don't work at Google.
And so we can't say, oh, hey, whenever you're going to make a change, you have
to commit it into Google's internal source control system. That wouldn't work
at all. So we're almost - because we want to be an open-source project, and
because we want to have contributors from outside of Google, we end up almost
pushed into using this pretty open free software stack, which I - to be honest,
from my perspective, has a lot of other benefits. When we have new folks
joining the team, we can actually offer them tools they're already pretty
familiar with. They don't have the feeling that new Googlers sometimes get,
where they're totally disoriented. Like, everything they know about programming
doesn't apply anymore. We can actually be like, hey, here's Git. You know how to
use this. Here's Gerrit, which is another piece of open-source software that we
use. They may not have used Gerrit before, but a lot of projects do. And so
they might have run into it previously. So it has pluses and minuses,
definitely. So that's a big difference. There's also a bit of what I would say
is a cultural difference more than anything else because most Google projects
that are not open source - so I'm not talking about things like Android or Go
or something like that - but projects that are really just not open source,
like Search, their ecosystem of discussion and culture and stuff is very much
inside Google. Whereas for Chromium, we constantly are getting ideas and
suggestions and code changes and stuff from outside of Google. And so we also
tend to have perspectives from outside of Google in our discussions more often
as we work on Chromium. So part of that is at the level of, if we're going to
make a change, we would have maybe input coming in on that change from Mozilla
even. They're a group we collaborate with a ton on web standards. And so we
would have their perspective in the discussion. Whereas if we were working
entirely within Google, we might not have those external perspectives. So
culture-wise, I feel like Chromium has more perspectives in the room sometimes
when we're thinking about stuff.
16:26 SHARON: That makes sense because browsers exist across other companies
too, and there's a lot of compatibility and standards and stuff. So just in
that nature of things, you have to have a lot more of this collaboration. If
you make a change, it'll affect all of the embedders maybe, and then you have
to think about this. And, yeah, there's a lot more discussion - [INTERPOSING
VOICES]
16:42 ELLY: Yeah, absolutely.
16:42 SHARON: If you're Search, you're like, OK, we're going to, I don't know,
do our thing.
16:47 ELLY: Yeah, you have more - I don't know if "autonomy" is the right word.
But, yeah. I want to caveat this by saying I'm not on Search. And so maybe it's
totally different. But that's how it looks to me as a person who works on
Chrome.
16:59 SHARON: Yeah. Yeah. And I think in terms of actual development and making
code changes and stuff, I think probably the biggest difference is that because
anyone can download the source repository and make changes and all that, the
actual programming and changes you do, you do those on a computer. Maybe that's
a machine you SSH into or a cloud top or whatever. But you have to actually
download all of the code. Whereas with all of the google3 stuff, everything
happens in a cloud somewhere. So everything is all connected, and you just do
things through the browser pretty much.
17:29 ELLY: That's very true. Actually, there's another important facet that
just occurred to me, which is, because Chromium is open source - and in
particular, some open-source projects will use this model where they send out a
release every so often. So they'll be like, we're shipping a new major release
of our program, and here's the source that corresponds to that. So there are
companies that do that. But we actually do what's called developing in the
open. So our main Git repository that stores our source is public. Which means
that as soon as you put in a commit, or even if you just put it up for code
review, that's public. Everyone on the internet can see what we're doing live,
which is really pretty interesting in terms of its effects on you. So for
example, if you're in - you're working inside google3, and you're like, I have
this really cool, wild idea, I'm going to go and make an experimental branch
and just make a prototype of it and see what happens, you can just go do that.
It's not a problem. But if you're working in Chromium, and you go and make your
wild prototype experimental branch, you have a pretty good chance that
someone's going to notice that. And then maybe you get a news story that's
like, hey, Chromium might be adding this amazing feature. And you're like, oh,
no, that was my wild, experimental idea. I didn't intend for this to happen.
But now people have really picked up on it, and people outside of the company
that you've never met are starting to get excited about something that you
never really intended to build and just wanted to try. So it's a different way
of working. You're sort of always in the public eye a little bit. And you want
to be a little bit more considerate about how something might look to people
way outside of your team and outside of your context. Whereas teams that are
inside google3 I don't think have to think about that as much.
19:07 SHARON: Yeah, I mean, for me, I've only really worked in Chromium full
time and all that. And I've just gotten used to the fact that all of my code
changes are fully public and anyone can look at them. Whereas I think people
who work in anything that's not like that - people in the company you work at
can see it, but not just anyone out there. So I don't know. I've gotten used to
it, but I think it's not a typical thing to [INAUDIBLE].
19:30 ELLY: Oh, yeah. Absolutely. And in fact, this is something that folks who
are transferring into Chrome from other parts of Google sometimes have a little
difficulty with, is if you're used to writing a commit message where maybe the
only description in the commit message is go/doc about my project, for Chromium
that doesn't fly because only Googlers can actually follow those links. And so
the commit message to a non-Googler doesn't say anything. And so you actually
have to start thinking, how am I going to explain this whole thing I'm doing to
a non - to a person who doesn't have any of this Google-specific context about
what it is. You go through this little mental - you cross this little mental
bridge where you actually are forced to reframe your own work away from, what
are Google's business goals, and towards, how does this fit Chromium, the
open-source project, that other people also use? It's interesting and
occasionally a little frustrating, but interesting and usually really
beneficial.
20:26 SHARON: Yeah, for sure. And I think from people I've talked to, it just
seems like another, briefly, difference between internal Google stuff and
Chromium is that internal Google just has a ton of tools you can use.
20:37 ELLY: Yes, absolutely.
20:37 SHARON: Which both means a lot of things that are maybe a bit challenging
in Chromium are probably easier, but also maybe finding the right tool is hard.
But -
20:42 ELLY: Oh, yeah. That is very much the case. I have only limited
experience working inside google3. But I definitely have experienced the
profusion of tools and also the fact that the tools are just honestly amazing.
And it makes total sense. Google has many, many engineers whose whole job is to
build great tools. And Chromium is just not that big of a project. We just
don't have that many folks that are working on it. The folks who do build
infrastructure work for Chromium do amazing work, but there's not hundreds of
them. And so it's not on the same level.
21:12 SHARON: Yeah. And what you said earlier makes me wonder - and this ties
us into the next thing - about other open-source projects: they just do a
release, and they don't maybe do development in the
open. And having not actually worked on other open-source projects really, I
kind of assumed that this development in the open was the norm. So how common
do you think or you know that that practice is?
21:45 ELLY: Gosh, I would really be guessing, to be honest with you. But I
would say the development in the open is by far the norm these days. And when
you see projects that follow the big release model instead, the way that looks
is they'll be like, hey, version 15 is out, and here's the source for
version 15. You can look at it. But the development, as it happens, happens
internally. I would tend to associate that with being maybe big company
projects that have a lot of confidentiality concerns. So for example, if you're
building the software that goes with some cool, new hardware for your company,
you don't want to start checking that software into Git publicly because then
people are going to read it and be like, ooh, this has support for a
billion-megapixel camera. That must be coming in the new thing. And so I think
that the big release model might be, these days, more prevalent when people are
doing hardware integrations, where there's other components that are shipping
at a fixed time and you don't want your source to be open until that point. But
honestly, the developing in the open model is, I think, much more common these
days. Historically, back in the '70s and '80s, when you would buy an operating
system and it would come with source, that was just a thing that you got as
part of the package, then it was much more of the source is released with the
OS model. Whereas these days, because distributed development is so easy with
modern version control systems, it's just so common to just develop in the open
like we do.
23:11 SHARON: Oh, cool. I didn't know that. So compared to other open-source
projects, what are some similarities and differences that Chromium has to
others that you may be familiar with?
23:25 ELLY: Ooh. All the ones I'm familiar with are quite a bit smaller than
Chromium. And so it's going to be hard to talk about it because, frankly -
23:32 SHARON: That's probably the common difference, though, right? Probably
very few are as big as Chromium.
23:32 ELLY: Oh, yeah. So in particular, one of the hardest problems in open
source - in running an open-source project is managing how humans relate to
other humans. The code problems are often relatively easy. The problems of how
do we make decisions about the direction of a project that maybe has a hundred
contributors who speak 10 different languages across a dozen time zones, that's
a hard problem. And so I often talk about the distinction between open source, open
development, and then open governance. And so open source is just, like, you
can see the source. Open development is you can see the development process. So
the Git repo is open. The bug tracker is open. The mailing lists, where we do a
lot of our discussion, are open. So we do open development. But then you have
this next step of open governance, where the big decisions about where the
project is going are made in the open. And for Chromium, some of those are made
in the open, especially when it's really about the web platform or that kind of
thing. But some of them are not. For example, if we're deciding that we're
going to do some cool new UI design, that design and the initial development of
it might not necessarily be - or sorry, the development would be done in the
open, but the designing of it might not. That might be a discussion between a
few UX designers who all work at Google in a Google internal place. And so
Chromium has a bit of open governance but not all the way. A lot of smaller
projects have super open governance. So they'll literally be like, hey, should
we rewrite this entire thing in Rust? And they'll make that decision by arguing
about it on a mailing list, where everyone can see. And that's totally, totally
fine. Because Chromium is so big, we can't make those kinds of decisions by
having every Chromium engineer have their opinion and just post. It would be
complete chaos. And because we're big and prominent, a lot of the work that we
do is very much in the public eye. And so even discussions that are maybe
relatively speculative - like that example I gave before, where you have an
idea and you're like, wouldn't it be neat if we did this? It's easy for that to
turn into people inferring what Google's intentions are with Title Case, like,
Big Important Thing, and turning that into a lot when you would not have
intended it to be that way. And so we do end up keeping our governance
relatively on the closed side compared to other open-source projects I've
worked on. Other than that, in terms of engineering practices and what we do to
get the code written, we uphold a super high standard of quality. And in
particular - which is not to say that most open-source projects don't, because
they totally do. But Chromium, in my opinion, is really, really thoughtful
about not just, hey, how should code review work, but really evolving stuff
like, how should we bring new developers into this project? What should that
feel like? Those are discussions that we have. And I often feel like those are
discussions that other open-source projects don't talk about as much. What else
is different for us? I'm not sure. I think that those are some of the big ones.
The differences in scale are such that it's almost hard to talk about. The
difference between an open-source project that maybe has 5 contributors and one
that has 500 is very, very large.
27:07 SHARON: With the open governance thing you mentioned, something that that
made me think of is maybe Blink Intents, where you submit a thing to a list and
then that gets discussed. So that's part of the Chromium project, I think,
right? That falls under that category.
27:20 ELLY: Yep. Yep.
27:20 SHARON: And so that's where, if you want to make a change to Blink, the
rendering engine, you do this process of posting it to a list, and then people
weigh in.
27:25 ELLY: Yeah, absolutely. So Blink really does do open governance in a way
that I, honestly, very much admire. Blink and the W3C and a lot of these groups
that are setting standards for the internet do do open governance. Because,
frankly, it's the only way for them to work. It would not be good or healthy
for the web if it was just like, we're going to do whatever - whatever we,
Google, have decided to do and good luck everyone else. That would be very bad.
So yeah, Blink definitely does do open governance. But when it gets to things
that are more part of the browsers' behavior and features, we tend to have the
governance a little more closed.
28:08 SHARON: Right. And I think an example of Blink being more open governance
is the fact that BlinkOn is open to anyone to participate in. And that's the
channel that we're posting this on right now. It just happened to make sense
that I figured most of the audience who is watching Blink [INAUDIBLE] already
are interested in these, too. So that's why - [INTERPOSING VOICES]
28:27 ELLY: Yeah, absolutely.
28:27 SHARON: And for people who may have found these videos but don't know
about BlinkOn - that's what that is.
28:34 ELLY: Yeah. And just in that vein of open governance for Blink,
especially, there's also this idea of being a standard and then having things
be compatible with it. So the web platform is a collection of standards. And
other browsers have to implement those standards, too. And so for example, if
we make up a standard that is very difficult or impossible for, like, Firefox
to implement, that's not good. That's fragmenting the web platform. That's a
bad thing. Whereas the Chromium UI, like how the omnibox works in Chromium, for
example, isn't a standard. It doesn't matter whether Firefox or Edge or Opera
or whoever have the same omnibox behavior as us, right? And so there's much
less of a need to all agree. And instead, it's almost a little bit better to
have some variety there so that users can get a little bit more of a choice and
that collectively more things get tried in that vein. So there's places where
agreement and standardization are really important. And then there's places
where it's actually OK for each individual browser to go off on its own a bit
and be like, hey, we thought of this cool, new way to do bookmarks. And so we
have built this. And it doesn't matter whether the other browsers agree about
it because bookmarks are not a thing that interoperates between browsers.
29:44 SHARON: Yeah, that makes sense. So now let's talk about some of the
actual details of what it's like to work on Chromium and make changes, write
code, and new ideas. So I think you mentioned a few things, like bug tracking.
That's all public, in the open, apart from, of course, security-sensitive
things and other [INAUDIBLE] are hidden. What else is there? Code review - that
was Gerrit. You mentioned that. So you can see all the comments that everyone
leaves on everyone's changes.
30:16 ELLY: Oh, Yeah. And for better or for worse, by the way. It's good to
bear in mind that if you're like - you're going to type like a slightly jerk
message to someone on a code review, that's going to be preserved for all time,
and everyone's going to be able to see it.
30:29 SHARON: Yeah. Yeah. Be nice to people. [CHUCKLES] Version control -
that's Git. Probably people will know about that. Something that might be worth
mentioning is that if you look at things like Gerrit and Chromium Code Search -
that's also public, of course, and looks a lot like Google's internal code
search, but obviously it's open source - a lot of people who contribute to
Chromium have @chromium.org emails.
31:00 ELLY: Yes.
31:00 SHARON: So why are there separate emails? Because you can use a
google.com or a Gmail or any email. So why have this @chromium.org email thing?
31:05 ELLY: Yeah, so there's a few different reasons for that. So chromium.org
emails are available to members of the project, which is a little bit
nebulously defined, but it's definitely not just Googlers. And so there's a
couple reasons why people like having those. So for some folks, it's sort of a
signal that you are acting as a member of the open-source project rather than
acting with your Google hat on, if you like. And so for example, I help run the
community moderation team for Chromium. And so when I'm doing work for that
team, I'm very careful to use my chromium.org account because I want it to be
clear that I'm enforcing the Chromium community guidelines, which are something
that was agreed upon by a whole bunch of Chromium members, not just Googlers.
And so I'm not enforcing Google's code of conduct. I'm enforcing Chromium's
code of conduct in my role as a Chromium project person. So sometimes you
deliberately put on your Chromium hat so that you can make it clear that you
are acting on behalf of the project. Some folks - and I'm also one of these
folks, by the way - just happen to really be big fans and supporters of free
software and of open source. And so if I have the choice between wearing my
corporate identity and wearing my open-source project member identity, I might
just wear my open-source project member identity and decide to actually
contribute that way. And so a lot of the folks who've been on Chromium - or
have been on Chrome, I should say, for a while, that's part of their reasoning.
They joined because they were excited to work on something that was open. And
so they have this open-source identity, this Chromium identity, that they use
for that. There's a third factor, and this touches on one of the sometimes less
pleasant parts of working in open source, which is our commit log and our bug
tracker and all of that stuff are public. And what that means is that everyone
on the internet can go see them. And that is often great, but it's occasionally
not great. So for example, if you go and make an unpopular UI change, people on
the internet know that that was you. And that might not be something that
you're necessarily super ready to deal with. So for example, way, way, way, way
early in my career, I made a change to Chromium OS because I was working - I
was on the Chrome OS team as a brand-new Noogler. So this is when I've been at Google
maybe five or six months. I made a change to Chrome OS. Somebody happened to
notice it and take issue with it. I don't even remember what the change was or
the issue. But they happened to notice it and take issue with it. They showed
up in our IRC channel, because we used IRC at the time, which was also public
because the whole project was very open like that, and really just started
yelling at me personally about it. And I'm like, this is not a cool experience.
This is something that if this was a Google coworker of mine, I would be
talking to HR about this. But it's actually just a random person on the
internet. And so there are some folks who use their Chromium username as a
little bit of a layer of insulation almost, where it's like, I want to work on
this project, but I don't - maybe my Google username has my full name in it. I
don't necessarily want every change I make to be done like that. And so if you
don't do that, you can end up in a situation where you make a change, and then
it's really attributed to you as though it was your personal idea and you did
this bad thing. And that's not a risk that everyone wants to take as part of
doing their work. And so sometimes people have a chromium.org account really
because they want an identity that's separate from their Google account - that
has a different name on it, that has different stuff like that. And so one of
the things that I'm always cautious to remind folks of on my team is, if you're
working with someone who has a chromium.org account, always use that
chromium.org account when you're speaking in public, always, always, always,
because you don't want to break that veil if someone is relying on it.
35:09 SHARON: Right. Yeah, that makes sense. And I think, in general, whenever
you are signing up for interacting in these public spaces, generally, I think
it's encouraged to use your chromium.org account. So for example, Slack, which
is the modern - current IRC often -
35:27 ELLY: It hurts my soul to hear you say that.
35:32 SHARON: Well - [LAUGHS]
35:32 ELLY: I'm a die-hard IRC user. I've been using IRC for 30 years. And I
was one of the few people who was I think very sad when we decided to move off
IRC. But you're right, that it is the modern IRC option.
35:44 SHARON: I think a lot of people are very die hard about IRC. So, you
know, but modern or not, that's what's currently being used.
35:49 ELLY: Absolutely.
35:55 SHARON: So Slack is where anyone can join and discuss Chromium stuff. And
generally, that kind of thing, you're encouraged to use your chromium.org
account.
36:01 ELLY: Yeah, absolutely. And to be fair to Slack also, the Slack has
probably 30 times as many people in it as the IRC channel ever did. So I think
that it's pretty clear that Slack is more popular than IRC was. But, yeah, no,
we use our Chromium identities a lot, really, really on purpose. And to be
honest, I would like it if we used them even more. Sometimes you will see folks
who actually have both identities signed up. So they'll have their google.com
and their Chromium, and that's always confusing for everyone. So if it was up
to me, I would say everyone has a Chromium identity, and they'd just all use it
when they're contributing.
36:39 SHARON: Yeah, that's definitely one of these unique-to-Chromium
[INAUDIBLE] pain points of someone [INAUDIBLE] use their maybe - often, they're
the same for most people. But sometimes they're different. Sometimes they're
very subtly different, and it's -
36:53 ELLY: Absolutely.
36:53 SHARON: you end up sending your [INAUDIBLE]...
36:53 ELLY: I also - I have met a couple folks who the Google username they
really wanted wasn't available, but it was available for chromium.org. And so
they picked a shorter, cooler username for chromium.org, which is totally -
totally fine to do. But then, every time you have to remember, oh, I know them
by this longer Google username, but actually they use this shorter username for
Chromium.
37:13 SHARON: Yeah, you have to remember their real life name. You have to
remember their work email. And then now you have to remember another work
email.
37:19 ELLY: Well, we have software that can help with that a bit.
37:25 SHARON: Yeah, for sure. So as part of that - and this is, in a way, a
thing that to me feels very related - there's a thing called being a committer
in Chromium. So what does it mean to be a committer? And what does it entail?
37:37 ELLY: Yeah, so committers are basically people who are trusted to commit
CLs, for want of a better way of putting it. So the way the project is
structured, anyone can upload a CL. And anyone anywhere on the internet can
upload a CL. It has to be reviewed by the OWNERS of the directories that it
touches or whatever. But there are some files that are actually, like, OWNERS
equals star. So for example, the build file in chrome/browser, because
everybody needs to edit it all the time, it just has OWNERS equal star. And
there's a comment that's like, hey, if you're making a huge change, ask one of
these people. But otherwise, you're just freely allowed to edit it. And so if
the committer system didn't exist, anyone on the internet would be allowed to
edit a bunch of parts of the project without any review, which is pretty bad.
And so there's this extra little speed bump where it's like, you have to send
in a few CLs to show that you're really a legit person who's contributing to
the project. And once you've done that, you get this committer status, which
actually allows you to push the button that makes Gerrit commit your change
into the tree. And that's what it does mechanically. We culturally tend to have
it mean something a little different than that, but it's - culturally, it's
like a sign of trust of the other project members in you. So getting that
committer status really means, we collectively trust you to not totally screw
things up. That's what it is. And so you have to be a committer to actually be
in an OWNERS file, for example. You can't be listed as an owner until you're a
committer. Because if you're not a committer yet, we're not really - if we're
not trusting you to commit code, we're not really going to trust you to review
other people's code. And, yeah, when you're new joining the project, it's
actually a pretty big milestone to become a committer. You become a committer
after you've been working for anywhere from three to six months, I would say.
And it's definitely this moment of being like, yeah, I've really arrived. I'm
no longer new on the project. I'm now a full committer.
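For concreteness, here is a minimal sketch of the kind of wildcard OWNERS file
Elly describes; the comment wording is illustrative, not copied from the
Chromium tree:

```
# OWNERS "equals star": any committer can approve changes here.
# If you're making a huge change, ask one of the usual reviewers first.
*
```

The `*` entry is what lets any committer approve a change, which is why
committer status has to act as the trust gate.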
39:51 SHARON: Can you briefly tell us what the steps, mechanically, to becoming
a committer are?
39:51 ELLY: Yeah, so you need to have landed enough CLs to convince people you
know what you're doing. And there is no hard and fast limit, but it's like - it
should be convincing. And so I often hear maybe 15 to 20 nontrivial CLs is a
pretty good number. Having done that, you need someone to propose you or
nominate you for committership. So there's actually - there's a mailing list
for having these discussions. And so whoever's going to nominate you, who has
to already be a committer, they'll send mail to that list, basically being
like, I would like to nominate this person for committer. There's a comment
period during which people can reply. And then if there's nobody who is raising
a big objection to you being a committer, after - I don't know what the actual
time period is - but after some amount of time, the motion carries with no
objections, and then your Chromium account becomes a committer. I think Google
accounts can also be committers as well, but I've only ever done this process
for Chromium accounts. And so those threads - what's going on in those threads
is mostly people endorsing the request. So let's say that I have someone who's
new on my team who I want to propose as a committer. I'll start the thread
nominating them as a committer, and then I'll go and talk to maybe two or three
of the people who have reviewed a lot of their changes, and I'll be like, hey,
would you endorse this person for a committer? If so, please post in this
thread. And so in the thread, there will actually be a couple of replies that
are like, plus 1, or, yes, this seems like a good fit. Very rarely, there might
be a reply, which is like, hey, I saw some - I saw some stuff on this CL that
shows that maybe this person isn't quite ready. We had a whole bunch of back
and forth comments, and eventually it really didn't seem like they understood
what I was asking for. And I feel like they're not really ready yet. Sometimes
that will happen. But usually the threads - by the time someone's nominating
you, you're already in good shape. So that's the mechanical process. And then
there is - it might actually just be Eric, individually, who goes through and
flips the bits on people being committers based on the threads. I'm not sure.
But there's some process by which those threads turn into people being
committers.
42:14 SHARON: OK, cool. Is there an analog of this either internally at Google
or in other open-source projects? Because internally at Google, there's the
concept of readability, which means you are vouched for that you know how to
code in this one language, which has some similarities. That's maybe a similar
thing. Are there any similar notions in other projects you've seen?
42:38 ELLY: Yeah, so many projects have this notion of being a member. And that
often combines our notions of committer and sometimes code owner. And so they
might - or for some open-source projects, you'll actually hear "maintainer" as
the thing. And so they'll be like, only people who are project members can
upload changes in the first place. And only people who are maintainers can
merge those changes. So that little speedbump on entry is pretty common.
Because it's a fact of life that if you are on the public internet and you have
no barriers to entry, you're going to have spam in your community no matter
what you do. And so that kind of split is super, super common. For some
projects that don't do open development, the entire thing might happen inside a
company or inside an organization anyway. And then there is no notion of
committer status because you're just hired onto that team and then you can
commit. But for projects that do open development and free software projects,
there is often a sense of, these are the people who are roughly trusted to land
code. And for a lot of projects, especially bigger ones, there's actually a
two-tiered model, where maybe you have people who are domain experts on a
specific thing, like, they maintain some subsystem. And they're trusted to make
whatever changes they need or approve other people's changes in that area. But
then at the wider scale, there's what's often called a steering committee or a
core group or something. And those groups have authority over the whole project
and the direction of everything that's going on. And so you'll often see that
kind of model in larger projects. At smaller scales, it's often literally a
list of one to five people who all have commit access to the same Git repo, and
there's no - no structure on top of that. But for bigger projects, governance
becomes a real concern. And so people start thinking about that.
44:35 SHARON: All right. Now, let's switch topics to talking about the more
day-to-day logistics of working on Chromium. So if you're not a Googler, don't
work at Google, to what extent can you effectively contribute to Chromium, the
project?
44:48 ELLY: Yeah, so that depends where you're coming from, both whether you're
part of another large organization, like maybe you work at Microsoft, you work
at Opera, Vivaldi, one of those companies, or if you're really an IC lone
contributor. If you're in a large organization, probably your org will have its
own structure around how you should contribute anyway. And so you might just
want to talk about that. So I'll really focus on the individual contributor
angle. And so for engineers specifically, like if you're a programmer who wants
to contribute to the code base, that's awesome. The best approach I think is
really to find an area that you're passionate about because it's so much more
fun and enjoyable to contribute when you're doing something you care about. So
find an area you care about. Get in touch with the team that works on that
area, either through their mailing lists or find their component in Monorail or
find them in the OWNERS files or whatever. Get in touch with those folks. Ask
them what are good places for you to contribute as a new person. That's often a
really great way to get started. And you'll have a person you can go to for
advice to be like, hey, how do I go about doing this thing? My experience has
been that Chromium contributors are pretty much all super helpful. And so
they're very willing to just give you guidance or do whatever. And you'll then
know who to send your code reviews to.
46:01 SHARON: Cool. Yeah. And if you're not an engineer, what are some ways you
can also contribute?
46:06 ELLY: Yeah, so there's a whole bunch of these. And by the way, these all
apply to basically every open-source project, so not just Chromium
specifically. So open-source projects, if you are a good writer, if you enjoy
doing technical writing or you enjoy doing UX writing or you want to do that
kind of thing, almost every open-source project out there is looking for people
to contribute documentation. And Chromium is no exception at all to that. So
high-quality documentation, we love that stuff. Or even if you're just honing
that craft and you want to practice, Chromium is not a bad spot to do that. If
you're a UX designer or a visual designer, a lot of open-source projects will
actually appreciate your contributions of you bringing in, like, hey, I thought
of a way that this user experience could feel or how the screen could look or
something like that. They'll often appreciate that kind of input or design
work. If you are someone who speaks multiple languages, translations are
another great way to contribute to open-source projects. A lot of open-source
projects don't have access to the same kind of - Chromium has access to a
translation team within Google who do a lot of our translations. A lot of
open-source projects don't have that. And so contributing translations of
documentation, of user-facing interface, stuff like that, can be super
valuable. And the last thing I'll say, which can be done by really anyone - you
don't even need special skills for this one - is try early releases of stuff.
So try development branches. If you're a Chrome user, try running Beta or Dev
or Canary. And then when something doesn't feel right or when it's - when it
doesn't work for you or it crashes or whatever, file bugs. And try to get
practiced at filing good bugs, with details and info and steps to reproduce the
bug and stuff like that. That's such a huge help as a developer of any
open-source projects - to get that early-user feedback and be able to correct
problems before they make it to the stable channel. And on Chromium, I've run
into a few folks who just - their main contribution to the project is really
just that they file great bugs all the time. There's a few folks who all they
really do is they run Canary on Mac, and they notice when something doesn't
feel quite right. And so they file stuff that's like, maybe the engineering
team wouldn't necessarily have noticed it. But when someone calls it out, we're
like, oh, that actually does feel kind of janky, and now we can go fix that.
And getting that feedback early is so, so valuable. So there's a lot of
different ways. Those are some, but there's plenty more, too.
48:21 SHARON: OK. Cool. Yeah, and a few things on that. If you want to really
try out random things, you can go to chrome://flags, play around there, see
what happens. In terms of going back a bit for being an engineer, there's other
web-adjacent stuff that you can do that we won't get into too much now. But
that can be things like adding web platform test, web standard stuff. And for
people who are into security, we have a VRP, Vulnerability Rewards Program. But
if you know about that, probably you're into the whole security space. This is
not how you're going to - maybe this is how you heard about it, and you want to
get into it. But, anyway -
48:59 ELLY: Yeah. I will say, if you're a security researcher and you aren't
familiar with the Chromium VRP, you should go take a look because it's -
Chromium is a really interesting project to audit for security. And the VRP can
make it very worth your while to do so if you find good bugs.
49:12 SHARON: Mm-hmm. Yeah. And going back a bit earlier to being an engineer,
like an IC, who is not at Google or any of these other big companies, there are
other barriers to entry to being a contributor, right?
49:28 ELLY: Oh, yeah.
49:28 SHARON: So I definitely encountered this after my internship. I worked on
Chrome. I was like, hey, I know what's going on now at the end of it. A couple
things we didn't finish. I'll go home, and I will keep working on this - good
intentions. And I got home, got my laptop, which was a pretty good laptop, but
still a laptop. I downloaded Chrome. That took a very long time. I built it for
the first time, which always takes a bit longer. But that took so long. And
even the incremental builds just took so long that I was like, OK, this is not
happening. I'm in school right now. I've got other things to worry about. So
how feasible is it for a typical person, let's say, to actually make changes in
Chromium?
50:05 ELLY: Yeah, that is unfortunately probably the biggest barrier to entry
for individuals who want to make technical contributions. Obviously, it doesn't
affect you if you're contributing documentation translations, whatever. But if
you're trying to modify the code, yeah, the initial build is going to be very
slow, and then the incremental builds are going to be very slow. And a lot of
the ancillary tasks are slow too, like running the test suite or running stuff
in a debugger. The project is just very big. And that's something that I think
a lot of folks on the Chromium team wish we could reduce. But Chromium is big
because the web is big and because what people want it to do is big. And so
it's not just big for no reason. But it does make it harder to get started as a
contributor. I've had this experience, too. I have a modern laptop sitting on
the desk over there. And it takes seven to eight hours to do a clean Chromium
build on that. Whereas on my work workstation, which has access to Goma,
Google's compile farm, it takes a few minutes. And the large organizations that
contribute also all have compile farms for the same reason. It's just so slow
to work when you're only doing local building and don't have access to a ton of
compilation power.
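For reference, the checkout-and-build flow being described looks roughly like
this - the standard depot_tools workflow; exact directories and targets vary by
platform:

```shell
fetch chromium                    # initial checkout; this is the slow download
cd src
gclient sync                      # pull third-party dependencies
gn gen out/Default                # generate build files
autoninja -C out/Default chrome   # the multi-hour part on a laptop
```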
51:12 SHARON: Mm-hmm. Yeah. I wonder if we could, I don't know, do a thing for
people who are individuals who contribute more. Probably that would be really
hard to do. Probably people have thought about it. But, yeah.
51:24 ELLY: It would be nice if we could. I don't know what the challenges
would be offhand, but it would be very cool if we could somehow make that
available.
51:30 SHARON: All right. That all sounds very cool. I know I learned a lot.
Hopefully some of you learned a lot, too. I think if you are working within
Google, it's really easy to not really interact with any of this more
open-source stuff, depending on which part you work on. Maybe you work on a
part that's very Google Chrome specific. I know before I was working on
Fuchsia, so that was before launch. So that was not really something we were
open to the public about anyway. And a lot of even the typical Chrome tools I
was unfamiliar with. So I think depending on which part you work on, this
stuff - it's all there, but you might not have had a chance to interact with it.
So thank you, Elly, for telling us about it and giving us some context about
free and open-source software in general.
52:08 ELLY: Yeah, of course.
52:08 SHARON: Is there anything you would like to give a shout out? Normally,
we shout out a specific Slack channel. I think in this case, the Slack in
general is the shout out. Anything else?
52:20 ELLY: The Slack, in general, definitely deserves it. Honestly, I'm going
to go a little bit larger scale here. I'm going to shout out all of the folks
who have contributed to Chromium, both at Google and elsewhere. It is the work
of many hands. And it would not be what it is without the contributions from
the folks at Google, the folks at Microsoft, folks at Yandex, folks at Naver.
All of these different browsers and projects and all of the different
individuals that have contributed, like everyone in the AUTHORS file - so shout
out to all of those folks. And also, I really want to shout out the open-source
projects not even part of Chromium that we use and rely on every day. So for
example, we use LLVM, which is a separate open-source project for our
compilation toolchain. And I think I would not be exaggerating to say that
Chromium couldn't exist in its current form without the efforts of a bunch of
other open-source projects that we're making use of. And so I'm really hopeful
and optimistic that Chromium can live up to that. We're standing on the
shoulders of a lot of other open-source projects to build the thing that we've
built. And I'm hopeful that, in turn, other projects are going to stand on our
shoulders to build yet cooler stuff and yet - yet better programs and build a
yet better open-source community. So shout out to all of the authors of all the
open-source software that Chromium uses, which is a lot of people. But they
deserve it.
53:37 SHARON: Yeah, for sure. It's very cool how it's very - all very related.
And even within Chrome, I think people stick around longer than typical other
projects. And it's cool to see people around, like a decent number of them,
from before Chrome launched. And that's probably [INAUDIBLE] to a generally
more positive engineering culture. So that's very good.
53:58 ELLY: I think so. But I'm biased, of course.
53:58 SHARON: Yeah, maybe. [LAUGHS] Cool. You mentioned mailing lists a bunch.
Any favorites that you have?
54:08 ELLY: Oh, yeah. chromium-dev is the mailing list of my heart, I would
say. It's the main open-source development mailing list for us. It's a great
place for all of your newbie questions. If you're just like, how the heck do I
even check out the source, that's a good place to ask. The topic-specific
mailing lists, especially net-dev and security-dev, are really good if you have
questions in those specific areas. But honestly, all of the mailing lists on
chromium.org are good. I haven't yet encountered one where I'm like, that
mailing list is bad. So check them all out.
54:33 SHARON: Cool. All right. Check out every single mailing list. Sounds
good.
54:38 ELLY: Yeah, every mailing list, every Slack channel.
54:38 SHARON: All right. Great.
54:38 ELLY: You're all good.
54:38 SHARON: Every Slack channel, I think - yeah, I'll add myself to the rest
of them. All right. Well, thank you very much, Elly.
54:45 ELLY: Of course.
54:45 SHARON: Thank you for chatting with us. And see you all next time.
54:51 ELLY: All right. Thank you, Sharon. Easter egg - in the second part of
this video, Elly is drinking soda.

@ -0,0 +1,691 @@
# What's Up With Site Isolation
This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 9, a 2023 video discussion between [Sharon (yangsharon@chromium.org)
and Charlie (creis@chromium.org)](https://www.youtube.com/watch?v=zOr64ee7FV4).
The transcript was automatically generated by speech-to-text software. It may
contain minor errors.
---
Site Isolation is a major part of Chrome's security. What exactly is it? How
does it fit into navigation? What about security? Today's special guest telling
us all about it is Charlie, who made it happen. He's also worked all over
navigation, making sure it works with all its complexities and remains secure.
Notes:
- https://docs.google.com/document/d/19LTLcwd2_JfiIklPXY0yu0ktpy-p8za2ZZXXzqBBVIY/edit
Links:
- [What's Up With Processes](https://www.youtube.com/watch?v=Qfy6T6KIWkI)
- [Life of a Navigation](https://www.youtube.com/watch?v=OFIvyc1y1ws)
---
0:00 SHARON: Hello, and welcome to "What's Up With That?" the series that
demystifies all things Chrome. I'm your host, Sharon, and today we're talking
about site isolation, what exactly is it? How does it fit into navigation? What
about security? Today's special guest telling us all about it is Charlie. He
helped make site isolation happen. He's worked on Chrome since before the
launch, though as an intern, and since then, he has worked all over navigation
including things like the process model, site isolation, and just making sure
that changes to that are all secure and that things still work. So welcome,
Charlie.
0:30 CHARLIE: Thank you for having me.
0:30 SHARON: OK, let's start off with what is site isolation?
0:36 CHARLIE: So site isolation is a way to use Chrome's sandbox to try to
protect websites from each other. So it's a way to improve the browser security
model.
0:43 SHARON: OK, we like security. And can you tell us a bit about what a
sandbox is?
0:50 CHARLIE: Yeah. So the sandbox is a mechanism that tries to keep web pages
contained within the renderer process even if something goes wrong. So if they
find a bug to exploit, it should still be hard for them to get out and install
malware on your computer or do things outside the renderer process.
1:05 SHARON: OK. Last video, we talked all about the different types of
processes and what they all do. So why are we particularly concerned about
renderer processes in this case?
1:17 CHARLIE: Sure. So renderer processes really have the most attack surface.
So the browser's job is to go out and get web pages from websites you don't
necessarily trust, pull down code, and run that on your machine. And most of
that code is running within this sandboxed renderer process. So an attacker
may be able to run code in there and try and find bugs to exploit. The renderer
process is where most of those bugs are going to be. It's where the attacker
has the most options and direct control. So we want that to be locked down as
much as possible.
1:55 SHARON: OK. Right. So how exactly does this work? How am I getting
attacked?
2:02 CHARLIE: Right. So all software tends to have bugs, and an attacker will
try to find ways to exercise those bugs in the code to let them accomplish
their goals. So maybe they find that there's some parsing error, and so the
code in the web browser does the wrong thing when you give it some input. And
for an attacker on the web, that input could be something in HTML or JavaScript
that makes the browser do something wrong, and maybe they can use that to their
advantage.
2:36 SHARON: So say I do get attacked. What's the worst that can happen? Should
I really be concerned about this?
2:42 CHARLIE: Well, that's exactly what we think about in the browser security
model is, what's the worst that can happen? How can we make that not be as bad
as it could be? So in the old days when browsers were first introduced, it was
basically just a program, it's all one process. And it would fetch content from
the web, and so if something went wrong, there was no sandbox. There was no
other protection. You were just relying on there not being bugs in the browser.
But if something did go wrong, that web page could then install malware in your
computer and your whole machine would be compromised. And so that might give
them access to files on your disk or other things that you have access to on
the network like your bank account or so on, which, obviously, is a big deal.
3:28 SHARON: Right. Yeah, we'd like to not have other people have that. OK,
cool. So can you tell us a bit about how site isolation actually works? What is
the mechanism behind it? What is going on?
3:41 CHARLIE: Sure. So when Chrome launched, we were using the sandbox to try
and prevent that first type of attack of installing malware in your machine or
having access to the file system or to network, but we wanted it to do more to
protect websites from each other. And to do that, you have to treat each
renderer process like it can only load pages from one website. And if you go to
visit a different website, that should be in a different process. And so
there's a bunch of aspects of site isolation for, well, OK, as you go from one
website to another, we need to use a different process, but the big one that
made this such a large change to the browser was making cross-site iframes run
in a different process.
4:30 SHARON: What is an iframe?
4:30 CHARLIE: So an iframe is basically a web page embedded inside of another
web page. So you can think about this as an ad or a YouTube video. It might be
from a different origin from the top level page that you're viewing, but it's
another web page embedded inside it. And so that has a different security
context that it's running on.
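To make that concrete, a small sketch of a cross-site iframe; the URLs and the
ad scenario are made up for illustration:

```typescript
// Imagine this runs on a page at https://news.example.
const ad = document.createElement('iframe');
ad.src = 'https://ads.other.example/banner.html';  // a cross-site document
document.body.appendChild(ad);
// The ad is a full web page with its own security context. With site
// isolation, it is rendered in a different renderer process than the
// embedding page.
```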
4:54 SHARON: You mentioned it might be from a different origin, and it might be
useful to know what the difference between a site and an origin is, especially
as it relates to what we call site isolation.
5:00 CHARLIE: Yeah, so we're being specific in using the word site isolation
instead of origin isolation. A site is a little broader, so it's a registered
domain name plus a scheme, so https://example.com would be an example of a
site, but you might have many origins within that as you get into subdomains.
So if you had foo.example.com and bar.example.com, those would be different
origins within the example.com site. The web security model is all about
origins. So foo.example.com and bar.example.com shouldn't be able to access
each other, but there are some old web APIs that have stuck with us, like being able
to modify something called document.domain, where two different origins in the
same site can sometimes access and modify each other, and we don't know in
advance if they're going to do this. So therefore, we have to put everything
from a site in the same process because we can't move things from one process
to another later. We hope that someday we can get rid of that. There is some
work in progress for that to go away. Maybe then we can do origin isolation.
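As a rough sketch of the site-versus-origin distinction: real browsers consult
the Public Suffix List to find the registrable domain, but this toy version
just keeps the last two host labels, which happens to work for example.com:

```typescript
function origin(url: URL): string {
  return `${url.protocol}//${url.host}`;  // scheme + host (+ port)
}

function site(url: URL): string {
  // Toy registrable-domain logic: keep the last two labels only.
  const labels = url.hostname.split('.');
  return `${url.protocol}//${labels.slice(-2).join('.')}`;
}

const a = new URL('https://foo.example.com/page');
const b = new URL('https://bar.example.com/page');
console.log(origin(a) === origin(b));  // false: different origins
console.log(site(a) === site(b));      // true: same site, so with site
                                       // isolation they share a process
```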
6:10 SHARON: Cool. So the site isolation stuff is all in the browser, so that's
the browser security model. What's the difference between that and the web
security model? Are these the same?
6:16 CHARLIE: They're certainly related to each other, but they're a little
different. So the web security model is conceptually what can web pages do, in
general, what are they allow to access for another website or for another
origin or for things on your machine, camera, and microphone, and things like
that. And the browser security model is more about how we build that and how do
we enforce the web security model, but also, provide some extra lines of
defense in case things go wrong. So that incorporates things like the sandbox,
the multi-process architecture, site isolation. What can we do to make it
harder for attackers to accomplish their goals, even if there are bugs.
7:04 SHARON: It seems like good stuff to have. So a couple other definitions,
maybe, to get through. So what is a security context?
7:10 CHARLIE: Yeah. So that's the environment where this code is running. In
the web, it's something like an HTML document or a worker, like a service
worker, someplace where code is running from what we would call security
principal, which is, for the web, something like an origin. So if you have an
HTML document you've gotten from example.com, that's running in a web page in
the browser that has a security context. And an ad from a different origin
would be a different security context.
7:49 SHARON: Are a security context and a security principal always the same,
or are there times where those are different?
7:55 CHARLIE: No, you can have two different security contexts, like two
different documents that have the same security principal, and they might be
able to access each other. Or they might be living in different processes, but
still have access to the same cookies or local storage, things on disk. So the
principal is, this is the entity that has access to something.
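One way to picture that distinction is as a toy data model; these types are
illustrative, not Chromium's actual classes:

```typescript
interface SecurityPrincipal {
  scheme: string;  // e.g. 'https'
  host: string;    // e.g. 'example.com'
}

interface SecurityContext {
  kind: 'document' | 'worker';
  principal: SecurityPrincipal;
}

// Two different contexts (say, two tabs showing example.com) that share
// one principal: they may live in different processes, yet both can
// access example.com's cookies and local storage.
const example: SecurityPrincipal = { scheme: 'https', host: 'example.com' };
const docA: SecurityContext = { kind: 'document', principal: example };
const docB: SecurityContext = { kind: 'document', principal: example };
```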
8:16 SHARON: When people think of site isolation, often, they think about
navigation as well, partly because that's how our teams are structured, so how
exactly do these relate, and where in the life of a navigation - name of a
talk, want to go watch - does site isolation stuff happen?
8:34 CHARLIE: Yeah, so they're definitely related. So navigation is about how
you get from one web page to another, and that might be a different security
context, different security principal. And I got interested and involved with
navigation because of site isolation, my interest in that. And as you think of
the web browser as an operating system for running programs, it's how you're
getting from one program to another. So it would make sense that as you go from
one website to another, you get a new container for that, a new process. So
that was one part of how I got involved with navigation was building what we
call a cross-process navigation. So you have to start in one renderer process
and then be able to end up in a different renderer process with all the various
parts of the life of a navigation, where you go out to the network and ask for
the web page. And maybe you have to run some beforeunload events first to see
if you're actually allowed to leave, or maybe the user has some unsaved
data. All the timing of that is tricky, and then switch to the new process at
the right time. So navigation has a lot of different corner cases and
complexity that then get involved with the process model so that you can do
this in any type of navigation, in any frame. And so that's where our team ends
up involved in both site isolation work and the navigation code in the browser.
10:06 SHARON: Right. What a cool team. So you mentioned the process model, and
that is related, but not the same as the multi-process architecture. So let's
just quickly mention what the differences there are, because in this case, it
is important.
10:22 CHARLIE: Yes. So the process model for the browser is how we decide what
goes into each process, and specifically, we're talking about renderer
processes and web pages here, where we can decide, as we create new tabs and we
visit websites on those tabs which renderer processes are we going to use. So
without site isolation, maybe it's that each newly created tab gets its own
process. But anything you visit within a given tab stays in the same process.
Or maybe you can do some cross-process transitions within that tab as long as
you're not breaking scripting between existing pages. So site isolation defines
a process model that says you can never put web pages from two different
websites in the same renderer process, and then that provides a bunch of
constraints for how navigation works.
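A toy version of that constraint, just to pin down the idea (this is a
conceptual sketch, not Chromium's actual process-assignment code):

```typescript
interface RendererProcess {
  pid: number;
  lockedToSite: string;  // e.g. 'https://example.com'
}

function pickProcess(
    targetSite: string,
    pool: RendererProcess[],
    spawn: (site: string) => RendererProcess,
): RendererProcess {
  // Site isolation's rule: a renderer may only ever be reused for the
  // one site it is locked to; any cross-site navigation must swap.
  const existing = pool.find(p => p.lockedToSite === targetSite);
  return existing ?? spawn(targetSite);
}
```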
11:16 SHARON: And then the multi-process architecture is more just the fact
that we have these different processes.
11:22 CHARLIE: Right. It makes this possible, because it gives us this ability
to run browser code and renderer code separately and plug-in code and other
utilities and network service that - yeah.
11:27 SHARON: Yeah, because back in the day, that wasn't the case. That's what
made Chrome different.
11:34 CHARLIE: Right. So when Chrome launched, we were moving from this more
monolithic browser architecture that was common at the time, where everything
ran in one process to separate browser process, renderer process that was
sandboxed, and we could play around with different process models. So when Chrome
launched, part of the internship that I was doing was looking at what should go
in each renderer process? What process model should we use? And we thought site
isolation would be great, but you can't really do that yet. It's too
complicated to get the iframe things to work. So maybe we can do a hybrid where
sometimes we swap to a new renderer process as you go from one website to
another at the top level, but then other times, you'll end up with multiple
sites in the same process. And it was like that until we were able to ship site
isolation much later.
12:23 SHARON: Cool. So this sounds, conceptually, like it makes sense. You want
to have different sites/different origins in different renderer processes, and
it sounds like it shouldn't be that hard, but it is/was/still is very hard. So
can you briefly just tell us about how and why navigation is hard? Because
other people who don't work on browsers at all or tech or even people in
Chrome, I feel like, they're just like, isn't navigation just done? This just
works, right? So why is there still a team doing this, and what is so hard
about it?
12:59 CHARLIE: That was often the most common question we would get when we
were explaining what work we were doing on site isolation was, oh, doesn't it
already work that way? And it's like, yeah, I wish. Yeah, so there's two parts
of that. There is, why is navigation hard, and why is site isolation hard? So
tying into any kind of navigation thing is tricky because of how many different
types of navigation and corner cases there are. As you're going from one page
to another, is it redirecting to a different website, or does it end up not
actually giving you a web page back? Maybe it's a download. Is it not moving to
a new document at all and it's just a navigation within the same document,
which has different properties. There's a lot of things that we need to keep
track of in the navigation system and how it affects the back-forward history
that makes it tricky. And then it continues to get more complicated over time,
as we add new fancy features to the browser. So there's lots of things that
we've layered on top of that with back-forward cache and pre-rendering and new
navigation APIs for interacting with session history, which make things faster
and nicer for web developers, but also, provide even more ways that navigation
can get into interesting corner cases, like, why didn't we think about
pre-rendering a page with a sandboxed iframe that might cause a different path to
happen? So that's where a lot of the complexity in navigation comes from and
why there's ongoing challenges, even though it's something that seems like it
has worked from the beginning. Site isolation being hard is related to the fact
that you can navigate in any frame in a page, and iframes being embedded is
something that we used to just handle entirely within the renderer process. So
this is a fun way to think about the multi-process architectures that shipped
around when Chrome was launched - and other browsers did similar things: we
could take the rendering engines that had already existed for a decade or so
from existing browsers and just run multiple copies of them. So as you
open up a new tab, we've got another copy of WebKit, which is the rendering
engine we were using at the time, and we had to make changes to make it work in
the renderer process talking to the browser process, but we didn't really need
to change fundamentally how it rendered a web page. And so it was in charge of
deciding what network requests it was going to make for getting iframe content
and then rendering the iframe and where a click was going to go, that kind of
thing. And to do out-of-process iframes, you need the iframe inside the page to
be rendered in an entirely separate renderer process. And that is a big change
to how the rendering engine works. And so that was what took all the time and
what made site isolation a multi-year project, where we had to fundamentally
introduce these new data structures, like render frame host and representations
of each frame in the browser process, change how the rendering engine worked,
and then change all the features in the browser that assumed the renderer would
take care of this. And now, we need to handle them spread across multiple
processes.
16:28 SHARON: How did that fit in with the forking of WebKit into Blink, which
is what the rendering engine in Chrome is now?
16:34 CHARLIE: Yeah, so the fork was absolutely necessary to do this. We pretty
much had to wait until that happened, because we didn't have as much
flexibility to make large, architectural changes to WebKit while we were
sharing it with other browsers, like Safari and so on. We were looking into
ways that we might be able to sort of approximate what we wanted, but when the
decision to fork WebKit into Blink was made, it opened the door and gave us a
chance to say, we can do this now. Let's go ahead and dive in and make site
isolation happen.
17:14 SHARON: That makes sense. In a quite early talk, probably from 10 years
ago now, Darin was saying how having each renderer host just one site was like
the Holy Grail, and he seemed very excited about it. So that makes sense
because of the -
17:34 CHARLIE: Yeah, and it feels like the natural use of a sandbox in a
browser. The same reason that we got all these questions, like isn't that how
it already works, is that it's such a natural fit: we have a container for
running a web page, so what is the unit that you want to put in the container?
It's a website that you're visiting. And the fact that we couldn't easily pull
them apart into different processes was totally an artifact of how web browsers
were originally built that didn't foresee this - oh, they're being used as
complicated programs with different security principals.
18:13 SHARON: Yeah, in a different talk, John from Episode 3, about //content,
had mentioned that site isolation was basically the biggest change to Chrome
since it launched, and that's probably still the case. So yeah, it was a
project.
18:29 CHARLIE: Yeah, it was a long project, and we had a lot of help from many
people across the Chrome team, but it was cool to get to this outcome, where we
could then say, now we have processes that are locked to a single security
principal, so it's nice to get to that outcome.
18:47 SHARON: So for people on the Chrome team now, what do you wish they knew
about site isolation/navigation in terms of as an engineer? Because before, I
was on a different team, and someone on my team said, oh, you should know how
navigation works. And I said, yeah, that sounds like a great idea, but how? So
what are things that people should just keep in mind when they're out and about
doing their stuff that usually isn't directly interacting with navigation even?
19:14 CHARLIE: Right. Yeah, so I think that the biggest thing to keep in mind
is to limit what we put into a renderer process or what a renderer process has
access to, to not include cross-site data. And we already have to have this
mentality in Chrome that we don't trust the renderer process. If it sends an
IPC or Mojo call to the browser process, we should assume that it might be
lying or asking for things that it shouldn't have access to. And I think it's
in the back of a lot of people's heads already that, OK, I shouldn't let it
like go get a file from disk, but also, we don't want it to mix data from
different sites. It shouldn't be able to ask for something from - to lie and
say, oh, I'm origin x, please give me data from there. Because that's often how
APIs used to work in Chrome: the renderer process would say what origin it's
asking for, and please give me the cookie for that.
20:12 SHARON: That sounds bananas.
20:12 CHARLIE: Yeah. Now, it sounds crazy. And so we think that the browser
process should already know based on who's asking what they have access to. So
in order to avoid site isolation bypasses, that's really the thing developers
should keep in mind. So for features like Autofill or something
where it's easy to think, oh it would be nice for me to just have that data on
hand in the renderer process and I can just put it in when it's needed. No, you
should keep it out of the renderer, and then only provide the data that's
needed.
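
(Editor's note: to make that concrete, here is a minimal, self-contained C++
sketch of the pattern Charlie describes. This is not real Chromium code, and
all the names in it are hypothetical, though Chromium does centralize checks
like this in classes such as ChildProcessSecurityPolicy. The point is that the
browser derives the origin from which process is asking - something it
recorded when it created the process - rather than trusting an origin named in
the message payload.)

```cpp
#include <map>
#include <string>

// Hypothetical sketch of browser-process bookkeeping, for illustration only.
class BrowserProcess {
 public:
  // Called when a renderer process is created and locked to a site.
  void LockProcessToOrigin(int process_id, const std::string& origin) {
    process_origin_[process_id] = origin;
  }

  // BAD (the old pattern): the renderer names an origin in the message.
  // A compromised renderer can lie and ask for another site's cookies.
  std::string GetCookiesUnsafe(const std::string& claimed_origin) {
    return cookies_[claimed_origin];
  }

  // GOOD: the browser already knows who is asking. `process_id` comes from
  // the IPC channel itself, not from data the renderer controls.
  std::string GetCookies(int process_id) {
    return cookies_[process_origin_[process_id]];
  }

 private:
  std::map<int, std::string> process_origin_;  // process -> locked origin
  std::map<std::string, std::string> cookies_;  // origin -> cookie data
};
```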
20:51 SHARON: In security-discuss circles, another term you hear often is a
renderer escape or renderer bypass or whatever. Is that the same as a site
isolation bypass, or are those different?
21:00 CHARLIE: Yeah, so sandbox escape is a common term that is used for when
an attacker has found some bug already, and then they are able to escalate
their privilege to affect the browser process or get out of the browser process
and to the operating system. So a sandbox escape is a lot worse than a site
isolation bypass. It would give the attacker control of your computer, the
ability to install malware, and things like that. So for sandbox escapes, we
want to have as many
boundaries as possible to try to prevent that from happening. A site isolation
bypass is not as bad as a full sandbox escape, but it would be a way that an
attacker could find some way to get access to another website's data or attack
that website. So maybe it's able to trick the browser into giving it cookies
from that site or using the permissions that have been granted to another
website. And then renderer compromise would be another type of exploit that
happens entirely within the renderer process. That's one where the attacker has
found some bug, they can run whatever native code they want within the renderer
process, and that's what we're trying to contain with the sandbox and what site
isolation tries to make even less useful to the attacker. Because even if you
can run any code you want within the renderer process, you shouldn't be able to
install malware because of the sandbox, and you shouldn't be able to access
other sites' data because of site isolation.
22:47 SHARON: Yeah, I think when I was learning about site isolation and stuff,
I was like, whoa, this is a lot going on, and most people just have no idea
about it. And in terms of other bugs and whatnot, something that is often
mentioned is Spectre, and that's still a thing. And on Wikipedia, in the
Mitigation section of the Spectre page, they mention site isolation, but I was
like, this should have its own page, so maybe one day -
23:20 CHARLIE: Maybe one day.
23:20 SHARON: one of us is going to write a thing about that. But yeah, that's
kind of the bug, right? So can you just talk about that?
23:25 CHARLIE: Yeah, so Spectre and Meltdown were certainly a big change to the
security landscape for browsers. At a high level, those are attacks that are
based on the micro-architectural parts of the CPU. The way that the basic CPU
hardware works, there are ways to leak data that weren't anticipated. And we
can view it as it gives attackers what we call an arbitrary read primitive,
something that can access anything in your address space in a process. You can
think about it as the CPU not wanting to stop and wait while it goes and
accesses data from RAM, so it thinks, well, I'll just guess what the answer is going to
be and then keep running some instructions. And if I was right in my guess, the
next several steps are done already, and I can just move on from there. And if
I was wrong, well, I just throw away that work, and I do the right thing, and
we move on, and everybody is fine. But attackers found that while you're doing
those extra steps ahead of time, you're also affecting the caches on the CPU,
and cache timing attacks let you find out what work was done there. So some
very clever researchers found that you can do some things in those extra steps
that happen in this speculative state to find out what data is in addresses you
don't have access to. And so places where we thought some check in the renderer
process could say, oh, you don't have access to this thing from another
website. We're fine. Now, you could get access to it, just based on how CPUs
work, without needing any bugs in the browser. So now, we're thinking, OK,
we're running JavaScript, and if it can leak things from the renderer process,
we can't have data worth stealing in the renderer process. You could try to
find ways to prevent those attacks, but that ended up being difficult. And
ultimately, we found that it wasn't really feasible to prevent the attacks in
all the forms in which they could happen. So site isolation became the first
line of defense to say, data from other websites, data worth stealing, should
not be in the renderer process where a Spectre attack could get access to it.
Now, that
was actually one of the big, exciting events that helped us accelerate the work
on site isolation and get it launched when that was discovered in 2017 or 2018.
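
(Editor's note: for a feel for the mechanics, below is a minimal C++ sketch of
the classic Spectre variant 1 pattern from the original paper - illustrative
only, not Chromium code. If the branch is mispredicted for an
attacker-chosen, out-of-bounds `x`, the speculatively executed load leaves a
cache footprint that depends on secret memory, which a later cache-timing
probe can recover even though the architectural results are discarded.)

```cpp
#include <cstddef>
#include <cstdint>

uint8_t array1[16];          // in-bounds data the code may legally read
uint8_t array2[256 * 512];   // probe array; one cache line per byte value
size_t array1_size = 16;

void victim_function(size_t x) {
  // The CPU may predict this branch as taken and run the body
  // speculatively even when x is out of bounds.
  if (x < array1_size) {
    // Speculatively reads a byte it shouldn't, and encodes it into which
    // line of array2 gets cached; the attacker later times accesses to
    // array2 to recover the byte.
    volatile uint8_t tmp = array2[array1[x] * 512];
    (void)tmp;
  }
}
```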
26:24 SHARON: So at that point, site isolation was mostly done, and it was just
getting it out?
26:24 CHARLIE: Yeah, it was really interesting. So we'd been working on it for
several years for a different reason: we wanted it to be a second line of
defense against compromised renderer processes. We assume
people are going to find bugs in the renderer process, in V8 or in Blink or
things like that, and we wanted that to not be as big of a problem. We wanted
to say, OK, whatever. There isn't data worth stealing in that process. We had
already shipped some initial uses of out-of-process iframes in 2017 for
extensions, and we were working on trying to do some sort of initial steps
towards using site isolation for some websites and see how that goes when we
found out about Spectre and Meltdown. And so that next six months or so was a
very accelerated, OK, we've got to get everything else working - the way that
site isolation interacted with DevTools and extensions and printing and a
bunch of other features in the browser that we needed to get working. And so it
was an interesting accelerated rollout, where we even had an optional mode and
an enterprise policy where you could say, I don't care if printing doesn't
work, turn on site isolation so that Spectre attacks won't find other data
worth stealing in the process. And then we got to where it was working well
enough that we could ship it for all desktop users in, I think it was Chrome
67 in mid 2018. So it was good that it was that far along, so that we were
able to ship the full thing within a few months.
28:19 SHARON: Very cool. Yeah, I mean, those are all the things that make
navigation hard, like extensions being part of it, and there's just all these
things, and all of them go through navigation and are affected, so that's very
exciting. So what is the state of site isolation now, and are there still going
to be changes? That was a few years ago, so are things still happening?
28:45 CHARLIE: Yeah, we're still trying to make several different improvements.
We've made several improvements since the launch, so that initial launch, since
it was mostly focused on Spectre, didn't have all the defenses we wanted
against compromised renderer processes, because a Spectre attack can't inject
actual running code. It can't go and lie to the browser process. It won't give
you full control over what's running in the renderer process, but it can leak
all the data that's in there. So anything that a web page can pull into a
renderer process can be leaked. So after that initial launch, we needed to go
and actually finish the compromised-renderer defenses and say, OK, all the IPCs
that come out of the renderer, make sure they can't lie and steal someone
else's data, so get all the browser process enforcements in place. Another big
thing after that was getting it to work on Android, where we wanted this
defense. We have a much different set of resource constraints on mobile
devices, where there's not nearly as much memory and renderer processes are
often killed or just discarded. So there, we couldn't isolate all websites from
each other. We had to use heuristics to say, here are the sites that need it
the most - sites where users log in, in general, or sites where this
particular user is logged in, or other signals that this site probably needs
some protection - we'll give those isolation, and then other ones can share a
renderer process. So we've tried to improve those heuristics and isolate as
many sites as we can there. And then things that we weren't initially isolating
from each other, we have been able to. So extensions was an example where we
started by just making sure extensions didn't share a process with web pages,
but now, we make sure that no two extensions can share a process with each other.
And we're trying to get to where we could isolate all origins from each other,
depending on what resources are available, but there's some changes with,
basically, deprecating document.domain that are in flight that might make that
possible.
30:57 SHARON: So say I have a fancy computer, and I just want maximum site
isolation because I care about security. How do I go get that?
31:03 CHARLIE: Yeah, so there are some experimental ways to do that. You can go
into the chrome://flags page, where you can turn on and off different features
and experiments that are in progress. And there's one there called strict
origin isolation, which will ensure that all origins within various sites are
isolated from each other, and that works on desktop and Android. It'll just
create slightly more processes than we do today. Similarly, on Android, if you
wanted to isolate all sites, there is an option for full site isolation there
called site-per-process. You could use that or strict origin isolation to get
maximum site isolation today.
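
(Editor's note: the same experiments can also be enabled from the command
line. A sketch, using flag names that were current around the time of this
discussion and may change:)

```
# Isolate every site into its own renderer process (full site isolation):
chrome --site-per-process

# Isolate every origin, not just every site (strict origin isolation):
chrome --enable-features=StrictOriginIsolation
```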
31:51 SHARON: So another platform that Chrome does exist on is iOS. So can we
do anything there? Why is that not in [INAUDIBLE]
31:58 CHARLIE: So Chrome for iOS has to use Apple's WebKit rendering engine
today, and current versions of it don't have site isolation, and we don't have the
ability to run our own rendering engine that has support for it. So we don't
have it today, but my understanding is that WebKit is working on site isolation
as well, and actually, Firefox has also shipped their version of site
isolation, which is pretty cool to see other browser vendors building this as
well. And so if that were made available to other third-party browsers on iOS,
then maybe it could be used there. But at the moment, we're constrained, and we
can't ship it on that platform.
32:47 SHARON: In terms of how the internet happens, this seems like a good
thing to just have generally. So is it possible that this could be a spec one
day that any browser should implement? Or is it - because it's under the hood
and it's not something that's maybe necessarily visible to websites - maybe
that's not part of it, but is this an option?
33:04 CHARLIE: Yeah. I think it ties back to the earlier question about web
security model versus browser security model, where the web visible parts of
this, it's meant to be transparent to the websites. There's no behavior changes
to the web platform by turning on site isolation. There's not meant to be. And
so it's not really a spec visible thing, it's more part of the browser's
architecture, the same way that there's no spec for sandboxes in a browser. You
could build a browser that doesn't have a sandbox, but today, the best practice
is to have better security by having a sandbox. So I think the relevant thing
for web specs is just that we don't introduce APIs that don't work when
different origins are in different processes. And that sounds like, well OK,
that makes sense, and thankfully, we were sort of in that state to begin with,
and in some places we got lucky. Like postMessage, a mechanism for sending a
message to another origin, is asynchronous, so the two origins don't need to
run in the same process, because the message will be delivered at a later
time. So
we can send it to a different process running on a different thread. Some
places we got unlucky, like document.domain, where web APIs said that different
origins can script each other if they agree that it's OK, as long as they're in
the same site, and that constrained us in the process model. So we're trying to
improve things about the web spec. You could almost say that deprecating
document.domain is a way of seeing the browser security model and the web
security model align with each other, to say, OK, we want to use processes.
We want this asynchronous boundary. You shouldn't be able to script other
origins from the same site. So I think that's the closest is making sure that
specced APIs fit well with this multi-process site isolation world.
35:12 SHARON: There are some headers and tags and whatever that websites can
use to alter how the browser handles things though, right?
35:23 CHARLIE: Yes, absolutely. And those are good ways that websites can more
effectively isolate themselves, both in terms of web-visible behavior and in
the browser's architecture. In browsers that don't have full site isolation,
that don't have out-of-process iframes in all cases, web pages might still be
able to get some of the isolation benefits using those APIs. And so those are
things like the cross-origin opener policy, which says, for
example, if I open a pop up to a different website, there's not going to be any
communication between me and that pop up. So it's OK to put them in different
processes, and they can be better isolated from each other. That's good from an
architecture perspective. It's also nice from a web perspective in that you
don't have to worry about whether the window.opener variable in the pop up can
be used to do sneaky things to the page that opened it. So there's nice,
web-visible reasons to use something like a cross-origin opener policy to keep
them protected from each other. So that's one example of that. There's others
as well.
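
(Editor's note: concretely, a site opts into the behavior Charlie describes
with an HTTP response header, roughly like this:)

```
Cross-Origin-Opener-Policy: same-origin
```

Roughly speaking, a page served with this header is placed in its own browsing
context group, so cross-origin pop ups it opens get a severed window.opener
and can safely be put in a different process.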
36:46 SHARON: Something I've seen around, that is a web spec, is content
security policy. Is that related to any of this at all?
36:52 CHARLIE: It kind of is. Yeah, so content security policy is another way
for websites to tell the browser better ways to secure that site. And so some
of it is useful for saying I want to do a better job preventing cross-site
scripting attacks on my page, so don't run a script if you find it in these
random places. It should only come from these URLs or in these contexts on my
page. So that's more about what happens in a given renderer process, but there
are some places where content security policy does overlap a bit with site
isolation. There is a sandbox value you can put into a content security policy
header that makes the page get treated like a sandboxed iframe. And while we
don't yet have support for putting sandboxed iframes in another process, that
work is in progress, and we're hoping to ship it before long. And so CSP headers
that say sandbox will also be able to be isolated from the rest of their site.
So if they have some kind of untrustworthy content in them, that won't be able
to attack the rest of the site.
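
(Editor's note: a minimal sketch of that header value, for illustration:)

```
Content-Security-Policy: sandbox
```

A response served this way is treated like the contents of a sandboxed iframe:
it gets an opaque, unique origin, so it can't touch cookies or storage for the
site that served it, and scripts are disabled unless the header also lists
values like allow-scripts.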
38:04 SHARON: OK. Yeah, so it's that difference between the web versus browser,
what's visible, what's an option versus how it's actually implemented.
38:11 CHARLIE: Right.
38:11 SHARON: Cool. So a lot of this, we've talked about security a lot, and I
think for people who don't know about security, the image you have is people
trying to break into things - like, "I'm in", that whole thing - and that's
very much not what's going on here, because we're not trying to break things.
So can you tell us just a bit about the difference between offensive and
defensive security, and how this is one of those?
38:38 CHARLIE: Yeah, so a lot of attention in the security space goes to big,
exciting, flashy attacks that are found. On the offensive side, look, I found a
way to break the security of this thing, and we have big vulnerability reward
bounties to reward when people find these things so we can get them fixed. So
even on the defensive side, you want people working on offensive security,
looking for these bugs, looking for things that need to be fixed so we can
defend users. But the defensive side is super important and I find it a
satisfying place to be, even if it isn't always as glamorous. It's like, you
have to have all the defenses in place and all of these different attacks that
are found, it's like, yeah, we need to fix them, and we need to find ways to
make that less likely. But ultimately, this is the real goal: we want to
have systems that we can trust, that are safe to use, and that we can go and
visit untrustworthy web content and not have to worry about it. You need these
extra lines of defense. You need all these different ways of defending the
product and shipping security fixes fast, all the things that security works on
in a defensive sense so that people can use these systems and depend on them in
their lives. So that's the fun and fulfilling part of this, even if it isn't
quite as glamorous as I found a sandbox escape, but those are fun to look at
too.
40:17 SHARON: I heard security described as a bunch of layers of Swiss cheese.
So you have all these different layers of mitigations to try to keep bad things
from happening, but each of them is not perfect. And if the holes in those
layers line up, then that's where you get a vulnerability. So in this very
approximate metaphor, what are the neighboring slices of cheese to site
isolation? What other defensive things are related to this and are trying to
achieve the same goal?
40:46 CHARLIE: Sure. Yeah, so there's going to be holes in any layer that you
build. We have bugs in software, and in site isolation's case, it's trying to
put this boundary between the renderer process, where we assume everything is
compromised already and the data that the attacker wants to get to, other
websites, data on your machine and so on. So the adjacent layers of Swiss
cheese would be: within the renderer process, we do have security checks -
same-origin policy checks, things that try to keep certain data opaque to a
web page so that JavaScript can't look at it. Those checks in the
renderer process do matter. Today, we do have multiple origins from the same
site in the same process. The renderer process' job is to make sure that they
don't attack each other. But there's some fairly large Swiss cheese holes in
that layer that we try to fix whenever we find them. And so site isolation's
job is to be the next layer, which won't have holes in the same places,
hopefully. Its holes, site isolation bypasses, might be, oh, there's some way
for the renderer process to ask the browser process for something it shouldn't
have access to, and it tricks it, and it gets access to that. We hope that it's
tough to line those holes up, that an attacker has to find both a bug in the
renderer process and a bug in site isolation and luck out in that those bugs
line up and you can get to one from the other in order to get access to another
website's data. And then the next layer of Swiss cheese would be all the things
that the browser process does to keep the renderer isolated from the user's
machine and the sandbox itself that you shouldn't have access to the OS APIs
and so on. So those would be other ways to try and get beyond site isolation to
other things.
42:48 SHARON: That makes sense. Yeah, when I first heard about it, I was like,
oh, that's such a fun way to think about it, really. It's a good visual for seeing,
OK, this is how things go wrong. All right, cool. Do you have any other fun
stories about site isolation, making it happen, stuff since then?
43:08 CHARLIE: I mean, it's been a really fun journey the whole way. There's
been different projects and different exploratory phases, where we weren't sure
what was going to work or what we needed to get done. I've worked with a bunch
of great interns and people who have been on the team on early phases like
getting postMessage to work across renderer processes, and later phases about
what it would look like to build out-of-process iframes using something like
the plugin infrastructure - just, is this feasible? Or what is it that we
could protect, in terms of what a particular renderer process is allowed to
ask for. Can we keep allowing JavaScript from other websites into a renderer
process, while blocking your bank account information from getting in? Those
both look like network responses from different websites, but one has to be
let through for compatibility reasons, and one has to be blocked. Can we build
that? Are we doing a good job of keeping that sensitive data out? These are
things that we had some great PhD interns working with us on, and ultimately,
they got us to where we could ship this and protect a lot of data. So it's fun
working with all those people along the way.
44:35 SHARON: Yeah, that sounds very cool. So earlier on, you mentioned people
whose questions were like, why doesn't this already happen? These days, it
does happen more or less like that. So what kind of questions or
misconceptions do you still see folks who typically work on Chrome still have
when it comes to this kind of stuff?
44:52 CHARLIE: I think it's often assuming that navigation is simpler than it
is and not realizing how many corner cases matter and how all of these
different features that have been built on top of navigation interact with
each other. So I think that's where we spend a lot of our time these days,
beyond wanting to improve site isolation: we want to make these abstractions easier
for other people to understand. So I think that's one of the big challenges now
is how many different directions the navigation code has been pulled and how
those things interact with each other.
45:24 SHARON: Right. And that's kind of - was intentional initially, right? You
don't want everyone who works on Chrome to have to know how all of this works,
but then when you hide it so well, they're like, oh, this is fine. I'll just do
my thing. It'll just be my one thing, but then everyone has such a thing, and
then it becomes too many things. Yeah, I used to work on a different part of
Chrome that was not related to this, and you see some of these big classes,
like WebContents or whatever. You're like, oh, I'll just get what I need from
that, and things will be fine, but you just don't even have any idea of all the
things that could go wrong. So it's cool that someone is out here trying to
keep that under control.
46:00 CHARLIE: And I'm glad there are a lot of efforts to try to improve the
APIs for how we expose these things - WebContents, WebContentsObserver (which
is growing into quite a large API with many users) - looking at ways to make these
APIs easier to use and harder to make mistakes with. So I think those are
worthwhile efforts.
46:20 SHARON: OK. Cool. Well, I think that covers all of it. Now, folks know
how site isolation works. Problem solved. This is great. All right, thank you very
much. Great.
46:34 CHARLIE: Thanks. Oh, no. What? OK, hold on.