[docs] add "What's Up With That" transcripts
Change-Id: Ie7f34cd19b5f97f9330e914d13de0f6e3ea2d7de
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4886394
Commit-Queue: Nigel Tao <nigeltao@chromium.org>
Reviewed-by: Sharon Yang <yangsharon@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1202896}

Committed by: Chromium LUCI CQ
Parent: 238a174121
Commit: 187a479a8a
@@ -438,6 +438,22 @@ used when committed.
### UI
* [Chromium UI Platform](ui/index.md) - All things user interface

### What's Up With That Transcripts

These are transcripts of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq),
a video series of interviews with Chromium software engineers.

* [What's Up With Pointers - Episode 1](transcripts/wuwt-e01-pointers.md)
* [What's Up With DCHECKs - Episode 2](transcripts/wuwt-e02-dchecks.md)
* [What's Up With //content - Episode 3](transcripts/wuwt-e03-content.md)
* [What's Up With Tests - Episode 4](transcripts/wuwt-e04-tests.md)
* [What's Up With BUILD.gn - Episode 5](transcripts/wuwt-e05-build-gn.md)
* [What's Up With Open Source - Episode 6](transcripts/wuwt-e06-open-source.md)
* [What's Up With Mojo - Episode 7](transcripts/wuwt-e07-mojo.md)
* [What's Up With Processes - Episode 8](transcripts/wuwt-e08-processes.md)
* [What's Up With Site Isolation - Episode 9](transcripts/wuwt-e09-site-isolation.md)

### Probably Obsolete
* [TPM Quick Reference](tpm_quick_ref.md) - Trusted Platform Module notes.
* [System Hardening Features](system_hardening_features.md) - A list of

601
docs/transcripts/wuwt-e01-pointers.md
Normal file
@@ -0,0 +1,601 @@
# What’s Up With Pointers

This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 1, a 2022 video discussion between [Sharon (yangsharon@chromium.org)
and Dana (danakj@chromium.org)](https://www.youtube.com/watch?v=MpwbWSEDfjM).

The transcript was automatically generated by speech-to-text software. It may
contain minor errors.

---

Welcome to the first episode of What’s Up With That, all about pointers! Our
special guest is C++ expert Dana. This talk covers smart pointer types we have
in Chrome, how to use them, and what can go wrong.

Notes:
- https://docs.google.com/document/d/1VRevv8JhlP4I8fIlvf87IrW2IRjE0PbkSfIcI6-UbJo/edit

Links:
- [Life of a Vulnerability](https://www.youtube.com/watch?v=HAJAEQrPUN0)
- [MiraclePtr](https://www.youtube.com/watch?v=WhI1NWbGvpE)

---

00:00 SHARON: Hi, everyone, and welcome to the first installment of "What's Up
With That", the series that demystifies all things Chrome. I'm your host,
Sharon, and today's inaugural episode will be all about pointers. There are so
many types of pointers - which one should I use? What can possibly go wrong? Our
guest today is Dana, who is one of our Base and C++ OWNERS and is currently
working on introducing Rust to Chromium. Previously, she was part of bringing
C++ 11 support to the Android NDK and then to Chrome. Today, she'll be telling
us what's up with pointers. Welcome, Dana!

00:31 DANA: Thank you, Sharon. It's super exciting to be here. Thank you for
letting me be on your podcast thingy.

00:36 SHARON: Yeah, thanks for being the first episode. So let's just jump
right in. So when you use pointers wrong, what can go wrong? What are the
problems? What can happen?

00:48 DANA: So pointers are a big cause of security problems for Chrome, and
that's what we mostly think about when things go wrong with pointers. So you
have a pointer to some thing, like you've pointed to a goat. And then you
delete the goat, and you allocate some new thing - a cow. And it gets stuck in
the same spot. Your pointer didn't change. It's still pointing to what it
thinks is a goat, but there's now a cow there. And so when you go to use that
pointer, you use something different. And this is a tool that malicious actors
use to exploit software, like Chrome, in order to gain access to your system,
your information, et cetera.

01:39 SHARON: And we want to avoid those. So what's that general type of attack
called?

01:39 DANA: That's a Use-After-Free because you have freed the goat and
replaced it with a cow. And you're using your pointer, but the thing it pointed
to was freed. There are other kinds of pointer badness that can happen. If you
take a pointer and you add to it some number, or you go to an offset off the
pointer, and you have an array of five things, and you go and read 20, or minus
2, or something, now you're reading out of bounds of that memory allocation.
And that's not good. These are both memory safety bugs that occur a lot with
pointers.

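The out-of-bounds case Dana describes (an array of five things, read at 20 or minus 2) can be sketched as below. This is only an illustration: `ReadOrDefault` is a hypothetical helper, not something from Chromium, and it shows one way to check an index before it is used instead of reading past the allocation.

```cpp
#include <array>
#include <cstddef>

// Returns values[index] when the index is inside the five-element array, and
// `fallback` otherwise. Indexing the array directly with 20 or -2 would read
// outside the allocation, which is undefined behavior.
int ReadOrDefault(const std::array<int, 5>& values,
                  std::ptrdiff_t index,
                  int fallback) {
  if (index < 0 || index >= static_cast<std::ptrdiff_t>(values.size())) {
    return fallback;
  }
  return values[static_cast<std::size_t>(index)];
}
```

Calling it with index 2 returns the third element, while index 20 or -2 returns the fallback rather than touching memory outside the array.
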
02:23 SHARON: Today, we'll be mostly looking at the Use-After-Free kind of
bugs. We definitely see a lot of those. And if you want to see an example of
one being used, Dana has previously done a talk called, "Life of a
Vulnerability." It'll be linked below. You can check that out. So that being
said, should we ever be using just a regular raw pointer in C++ in Chrome?

02:41 DANA: First of all, let's call them native pointers. You will see them
called raw pointers a lot in literature and stuff. But later on, we'll see why
that could be a bit ambiguous in this context. So we'll call them a native
pointer. So should you use a native pointer? If you don't want to
Use-After-Free, if you don't want a problem like that, no. However, there is a
performance implication with using smart pointers, and so the answer is yes.
The style guide that we have right now takes this pragmatic approach of saying
you should use raw pointers for giving access to an object. So if you're
passing them as a function parameter, you can share it as a pointer or a
reference, which is like a pointer with slightly different rules. But you
should not store native pointers as fields in objects because that is a place
where they go wrong a lot. And you should not use a native pointer to express
ownership. So before C++ 11, you would just say, this is my pointer, use a
comment, say this one is owning it. And then if you wanted to pass the
ownership, you just pass this native pointer over to something else as an
argument, and put a comment and say this is passing ownership. And you just
kind of hope it works out. But then it's very difficult. It requires the
programmer to understand the whole system to do it correctly. There is no help.
So in C++ 11, the type called `std::optional_ptr` - or sorry, `std::unique_ptr`
- was introduced. And this is expressing unique ownership. That's why it's
called `unique_ptr`. And it's just going to hold your pointer, and when it goes
out of scope, it gets deleted. It can't be copied because it's unique
ownership. But it can be moved around. And so if you're going to express
ownership to an object in the heap, you should use a `unique_ptr`.

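A minimal sketch of the ownership rules described above, using only the standard library. The `Goat` and `Barn` types and the helper function are hypothetical names made up for illustration: ownership is passed by moving a `std::unique_ptr`, while a native pointer is handed out only to give access.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical type, used only for illustration.
struct Goat {
  std::string name;
};

class Barn {
 public:
  // Takes ownership: the caller has to std::move its unique_ptr in.
  void Adopt(std::unique_ptr<Goat> goat) { goat_ = std::move(goat); }

  // Gives access without transferring ownership. May return nullptr.
  Goat* goat() const { return goat_.get(); }

 private:
  // The owning field is a unique_ptr, not a native pointer plus a comment.
  std::unique_ptr<Goat> goat_;
};

// Moves a newly allocated Goat into `barn` and reports whether the local
// handle is empty afterwards.
bool AdoptAndCheckMovedFrom(Barn& barn) {
  auto goat = std::make_unique<Goat>(Goat{"Billy"});
  barn.Adopt(std::move(goat));
  return goat == nullptr;  // A moved-from unique_ptr is guaranteed to be null.
}
```

Copying `goat` instead of moving it would not compile, which is the compiler help that the comment-based convention lacked. The `Goat` is deleted when the `Barn` that owns it is destroyed.
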
04:48 SHARON: That makes sense. And that sounds good. So you mentioned smart
pointers before. You want to tell us a bit more about what those are? It sounds
like `unique_ptr` is one of those.

04:55 DANA: Yes, so a smart pointer, which can also be referred to as a
pointer-like object, perhaps as a subset of them, is a class that holds inside
of it a pointer and mediates access to it in some way. So unique pointer
mediates access by saying I own this pointer, I will delete this pointer when I
go away, but I'll give you access to it. So you can use the arrow operator or
the star operator to get at the underlying pointer. And you can construct them
out of native pointers as well. So that's an example of a smart pointer.
There's a whole bunch of smart pointers, but that's the general idea. I'm going
to add something to what a native pointer is, while giving you access to it in
some way.

05:40 SHARON: That makes sense. That's kind of what our main thing is going to
be about today because you look around in Chrome. You'll see a lot of these
wrapper types. It'll be a `unique_ptr` and then a type. And you'll see so many
types of these, and talking to other people, myself, I find this all very
confusing. So we'll cover some of the more common types today. We just talked
about unique pointers. Next, let's talk about `absl::optional`. So why don't you
tell us about that.

06:10 DANA: So that's actually a really great example of a pointer-like object
that's not actually holding a pointer, so it's not a smart pointer. But it
looks like one. So this is this distinction. So `absl::optional`, also known as
`std::optional`, if you're not working in Chromium, and at some point, we will
hopefully migrate to it, `std::optional` and `absl::optional` hold an object
inside of it by value instead of by pointer. This means that the object is held
in that space allocated for the `optional`. So the size of the `optional` is
the size of the thing it's holding, plus some space for a presence flag.
Whereas a `unique_ptr` holds only a pointer. And its size is the size of a
pointer. And then the actual object lives elsewhere. So that's the difference
in how you can think about them. But otherwise, they do look quite similar. An
`optional` is unique ownership because it's literally holding the object
inside of it. However, an `optional` is copyable if the object inside is
copyable, for instance. So it doesn't have quite the same semantics. And it
doesn't require a heap allocation, the way unique pointer does, because it's
storing the memory in place. So if you have an `optional` on the stack, the
object inside is also right there on the stack. That's good or bad, depending
what you want. If you're worried about your object sizes, not so good. If
you're worried about the cost of memory allocation and free, good. So this is
the trade-off between the two.

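The size and copyability differences described above can be checked with a small sketch. It uses `std::optional` as a stand-in for `absl::optional`, and `Payload` is a hypothetical type. The size relations are not guaranteed by the C++ standard, but they are expected to hold on mainstream standard library implementations.

```cpp
#include <array>
#include <memory>
#include <optional>
#include <type_traits>

// Hypothetical 64-byte payload, used only for illustration.
struct Payload {
  std::array<char, 64> bytes;
};

// The optional stores the Payload in place, plus a presence flag.
constexpr bool OptionalIsLargerThanPayload() {
  return sizeof(std::optional<Payload>) > sizeof(Payload);
}

// The unique_ptr stores only a pointer; the Payload itself lives on the heap.
constexpr bool UniquePtrIsPointerSized() {
  return sizeof(std::unique_ptr<Payload>) == sizeof(Payload*);
}

// An optional is copyable when its contents are; a unique_ptr never is.
constexpr bool OnlyOptionalIsCopyable() {
  return std::is_copy_constructible_v<std::optional<Payload>> &&
         !std::is_copy_constructible_v<std::unique_ptr<Payload>>;
}
```
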
07:51 SHARON: Can you give any examples of when you might want to use one
versus the other? Like you mentioned some kind of general trade-offs, but any
specific examples? Because I've definitely seen use cases where `unique_ptr` is
used when maybe an `optional` makes more sense or vice versa. Maybe it's just
because someone didn't know about it or it was chosen that way. Do you have any
specific examples?

08:14 DANA: So one place where you might use a `unique_ptr`, even though
`optional` is maybe the better choice, is because of forward declarations. So
because an `optional` holds the type inside of it, it needs to know the type
size, which means it needs to know the full declaration of that type, or the
whole definition of that type. And a `unique_ptr` doesn't because it's just
holding a pointer, so it only needs to know the size of a pointer. And so if
you have a header file, and you don't want to include another header file, and
you just want to forward declare the types, you can't stick an optional of that
type right there because you don't know how big it's supposed to be. So that
might be a case where it's maybe not the right choice, but for other
constraining reasons, you choose to use a `unique_ptr` here. And you pay the
cost of a heap allocation and free as a result. But when would you use an
`optional`? So `optional` is fantastic for returning a value sometimes. I want
to do this thing, and I want to give you back a result, but I might fail. Or
sometimes there's no value to give you back. Typically, before C++ - what are
we on now, did it come in 14? I'm going to say it wrong. That's OK. Before we
had `absl::optional`, you would have to do different tricks. So you would pass
in a native pointer as a parameter and return a bool as the return value to say
did I populate the pointer. And yes, that works. But it's easy to mess it up.
It also generates less optimal code. Pointers cause the optimizer to have
troubles. And it doesn't express as nicely what your intention is. I return
this thing, sometimes. And so in place of using this pointer plus bool, you can
put that into a single type, return an `optional`. Similar for holding
something as a field, where you want it to be held in line in your class, but
you don't always have it present, you can do that with an `optional` now, where
you would have probably used a pointer before. Or a `union` or something, but
that gets even more tricky. And then another place you might use it is as a
function argument. However, that's usually not the right choice for a function
argument. Why? Because the `optional` holds the value inside of it.
Constructing an `optional` requires constructing the whole object inside of it.
And so that's not free. It can be arbitrarily expensive, depending on what your
type is. And if your caller to your function doesn't have already an
`optional`, they have to go and construct it to pass it to you. And that's a
copy or move of that inner type. So generally, if you're going to receive a
parameter, maybe sometimes, the right way to spell that is just to pass it as a
pointer, a native pointer, which can be null when it's not present.

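The two return styles contrasted above can be sketched side by side. This is an illustration using `std::optional` in place of `absl::optional`; `ParsePort`, `ParsePortLegacy`, and `PortOrDefault` are hypothetical functions, and the default of 80 is an arbitrary choice for the example.

```cpp
#include <optional>
#include <string>

// Newer style: a single return type says "a port, sometimes".
std::optional<int> ParsePort(const std::string& text) {
  if (text.empty() || text.size() > 5) {
    return std::nullopt;
  }
  int value = 0;
  for (char c : text) {
    if (c < '0' || c > '9') {
      return std::nullopt;
    }
    value = value * 10 + (c - '0');
  }
  if (value > 65535) {
    return std::nullopt;
  }
  return value;
}

// Older style: a bool result plus a native pointer out-parameter. The caller
// has to remember to check the bool before reading `*port`.
bool ParsePortLegacy(const std::string& text, int* port) {
  std::optional<int> parsed = ParsePort(text);
  if (!parsed) {
    return false;
  }
  *port = *parsed;
  return true;
}

// For an argument that is only sometimes present, a nullable native pointer
// avoids making the caller construct an optional.
int PortOrDefault(const int* port) {
  return port ? *port : 80;
}
```

With the `optional` version, "no result" and "result" travel together in one value, so there is no separate flag to forget.
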
11:29 SHARON: Hopefully that clarifies some things for people who are trying to
decide which one best suits their use case. So moving on from that, some people
might remember from a couple of years ago that instead of being called
`absl::optional`, it used to be called `base::optional`. And do you want to
quickly mention why we switched from `base` to `absl`? And you mentioned even
switching to `std::optional`. Why this transition?

11:53 DANA: Yeah, absolutely. So as the C++ standards come out, we want to use
them, but we can't until our toolchain is ready. What's our toolchain? So our
compiler, our standard library, and unfortunately, we have more than one
compiler that we need to worry about. So we have the NaCl compiler. Luckily, we
just have Clang for the compiler choice we really have to worry about. But we
do have to wait for these things to be ready, and for a code base to be ready
to turn on the new standard because sometimes there are some non-backwards
compatible changes. But we can backport stuff out of the standard library
into base. And so we've done that. We have a bunch of C++ 20 backports in base
now. We had 17 backports before. We turned on 17, now they should hopefully be
gone. And so `base::optional` was an example of a backport, while `optional`
was still considered experimental in the standard library. We adopted use of
`absl` since then, and `absl` had also, essentially, a backport of the
`optional` type inside of it for presumably the same reasons. And so why have
two when you can have one? That's a pretty good rule. And so we deprecated the
`base` one, removed it, and moved everything to the `absl` one. One thing to
note here, possibly of interest, is we often add security hardening to things in
`base`. And so sometimes there is available in the standard library something.
But we choose not to use it and use something in `base` or `absl`, but we use
it in `base` instead, because we have extra hardening checks. And so part of
the process of removing `base::optional` and moving to `absl::optional` was
ensuring those same security hardening checks are present in `absl`. And we're
going to have to do the same thing to stop using `absl` and start using the
standard one. And that's currently a work in progress.

13:48 SHARON: So let's go through some of the `base` types because that's
definitely where most of these kind of wrapper types live. So let's just
start with one that I learned about recently, and that's a `scoped_refptr`.
What's that? When should we use it?

13:59 DANA: So `scoped_refptr` is kind of your Chromium equivalent to
`shared_ptr` in the standard library. So if you're familiar with that, it's
quite similar, but it has some slight differences. So what is `scoped_refptr`?
It gives you shared ownership of the underlying object. And it's a smart
pointer. It holds a pointer to an object that's allocated in the heap. When all
`scoped_refptr`s that point to the same object are gone, it'll be deleted. So
it's like `unique_ptr`, except it can be copied to add to your ref count,
basically. And when all of them are gone, it's destroyed. And it gives access
to the underlying pointer in exactly the same ways. Oh, but why is it different
than `shared_ptr`? I did say it is. `scoped_refptr` requires the type that is
held inside of it to inherit from `RefCounted` or `RefCountedThreadSafe`.
`shared_ptr` doesn't require this. Why? So `shared_ptr` sticks an allocation
beside your object and then puts your object here. So the ref count is
externalized to your object being stored and owned by the shared pointer.
Chromium took this position to be doing intrusive ref counting. So because we
inherit from a known type, we stick the ref count in that base class,
`RefCounted` or `RefCountedThreadSafe`. And so that is enforced by the
compiler. You must inherit from one of these two in order to be stored and
owned in a `scoped_refptr`. What's the difference? `RefCounted` is the default
choice, but it's not thread safe. So the ref counting is cheap. It's the more
performant one, but if you have a `scoped_refptr` on two different threads
owning the same object, their ref counting will race, can be wrong, you can end
up with a double free - which is another way that pointers can go wrong, two
things freeing the same thing - or you could end up with potentially not
freeing it at all, probably. I guess I've never checked if that's possible. But
they can race, and then bad things happen. Whereas, `RefCountedThreadSafe`
gives you atomic ref counting. So atomic means that across all threads, they're
all going to have the same view of the state. And so it can be used across
threads and be owned across threads. And the tricky part there is the last
thread that owns that object is where it's going to be destroyed. So if your
object's destructor does things that you expect to happen on a specific thread,
you have to be super careful that you synchronize which thread that last
reference is going away on, or it could explode in a really flaky way.

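To make the idea of intrusive ref counting concrete, here is a deliberately tiny sketch. `SimpleRefCounted`, `SimpleRefPtr`, and `Texture` are hypothetical names invented for this example. They are not Chromium's `RefCounted` or `scoped_refptr`, and the real classes do considerably more. The count lives inside the object through a base class, and it is a plain `int`, so like the non-atomic flavor described above it is not thread safe.

```cpp
// Hypothetical base class that keeps the reference count inside the object.
class SimpleRefCounted {
 public:
  void AddRef() { ++ref_count_; }
  // Returns true when the last reference was just dropped.
  bool Release() { return --ref_count_ == 0; }
  int ref_count() const { return ref_count_; }

 protected:
  virtual ~SimpleRefCounted() = default;

 private:
  int ref_count_ = 0;  // Plain int: cheap, but would race across threads.
};

// Hypothetical smart pointer: copying adds a reference, destruction drops one,
// and whichever copy drops the last reference deletes the object.
template <typename T>
class SimpleRefPtr {
 public:
  explicit SimpleRefPtr(T* ptr) : ptr_(ptr) {
    if (ptr_) ptr_->AddRef();
  }
  SimpleRefPtr(const SimpleRefPtr& other) : ptr_(other.ptr_) {
    if (ptr_) ptr_->AddRef();
  }
  SimpleRefPtr& operator=(const SimpleRefPtr&) = delete;  // Omitted for brevity.
  ~SimpleRefPtr() {
    if (ptr_ && ptr_->Release()) delete ptr_;
  }
  T* operator->() const { return ptr_; }
  T& operator*() const { return *ptr_; }

 private:
  T* ptr_;
};

int g_live_textures = 0;  // Counts Texture objects that are currently alive.

struct Texture : SimpleRefCounted {
  Texture() { ++g_live_textures; }
  ~Texture() override { --g_live_textures; }
};

// Returns true if the Texture stays alive while any reference exists and is
// destroyed once the last one goes away.
bool LivesUntilLastReferenceIsGone() {
  {
    SimpleRefPtr<Texture> first(new Texture());
    {
      SimpleRefPtr<Texture> second = first;  // The count goes from 1 to 2.
      if (second->ref_count() != 2) return false;
    }
    if (first->ref_count() != 1 || g_live_textures != 1) return false;
  }
  return g_live_textures == 0;
}
```

Because the count is stored in the base class, a type that does not inherit from it cannot be held by `SimpleRefPtr` at all, which mirrors the compiler enforcement mentioned above.
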
17:02 SHARON: This sounds useful in other ways. What are some kind of more
design things to consider, in terms of when a `scoped_refptr` is useful and
does help enforce things that you want to enforce, like relative lifetimes of
certain objects?

17:15 DANA: Generally, we recommend that you don't use ref counting if you can
help it. And that's because it's hard to understand when it's going to be
destroyed, like I kind of alluded to with the thread situation. Even in a
single thread situation, how do you know which one is the last reference? And
is this object going to outlive that other object? Maybe sometimes. It's not
super obvious. It's a little more clear with a `unique_ptr`, at least local to
where that `unique_ptr`'s destruction is. But there's usually no
`scoped_refptr` where you can say this is the last one. So I know it's gone
after this thing is gone. Maybe it is, maybe it's not often. So it's a bit
tricky. However, there are scenarios when you truly want a bunch of things to
have access to a piece of data. And you want that data to go away when nobody
needs it anymore. And so that is your use case for a `scoped_refptr`. It is
nicer when that thing with shared ownership is not doing a lot of interesting
things, especially in its destructor because of the complexity that's involved
in shared ownership. But you're welcome to shoot yourself in the foot with this
one if you need to.

18:33 SHARON: We're hoping to help people not shoot themselves in the foot. So
use `scoped_refptr` carefully, is the lesson there. So you mentioned
`shared_ptr`. Is that something we see much of in Chrome, or is that something
that we generally try to avoid in terms of things from the standard library?

18:51 DANA: That is something that is banned in Chrome. And that's just
basically because we already have `scoped_refptr`, and we don't want two of the
same thing. There's been various times where people have brought up why do we
need to have both? Can we just use `shared_ptr` now? And nobody's ever done the
kind of analysis needed to make that kind of decision. And so we stay with what
we're at.

19:18 SHARON: If you want to do that, there's someone that'll tell you what to
do. So something that when I was using `scoped_refptr`, I came across that you
need a WeakPtrFactory to create such a pointer. So weak pointers and WeakPtr
factories are one of those things that you see a lot in Chrome and one of these
base things. So tell us a bit about weak pointers and their factories.

19:42 DANA: So WeakPtr and WeakPtrFactory have a bit of an interesting history.
Their major purpose is for asynchronous work. Chrome is basically a large
asynchronous machine, and what does that mean? It means that we break all of
the work of Chrome up into small pieces of work. And every time you've done a
piece, you go and say, OK, I'm done. And when the next piece is ready, run this
thing. And maybe that next thing is like a user input event, maybe that's a
reply from the network, whatever it might be. And there's just a ton of steps
in things that happen in Chrome. Like, a navigation has a request, a response,
maybe another request - some redirects, whatever. That's an example of tons of
smaller asynchronous tasks that all happen independently. So what goes on with
asynchronous tasks? You don't have a continuous stack frame. What does that
mean? So if you're just running some synchronous code, you make a variable, you
go off and you do some things, you come back. Your variable is still here,
right? You're in this stack frame. And you can keep using it. You have
asynchronous tasks. You make a variable, you go and do some work, and you are
done with your task. Boop, your stack's gone. You come back later, you're going
to continue. You don't have your variable anymore. So any state that you want to
keep across your various tasks has to be stored and what we call bound in with
that task. If that's a pointer, that's especially risky. So we talked earlier
about Use-After-Frees. Well, you can, I hope, imagine how easy it is to stick a
pointer into your state. This pointer is valid, I'm using it. I go away, I come
back when? I don't know, sometime in the future. And I'm going to go use this
pointer. Is it still around? I don't own it. I didn't use a `unique_ptr`. So
who owns it? How do they know that I have a task waiting to use it? Well,
unless we have some side channel communicating that, they don't. And how do I
know if they've destroyed it if we don't have some side channel communicating
that? I don't know. And so I'm just going to use this pointer and bad things
happen. Your bank account is gone.

22:06 SHARON: No! My bank account!

22:06 DANA: I know. So what's the side channel? The side channel that we have
is WeakPtr. So a WeakPtr and WeakPtrFactory provide this communication
mechanism where WeakPtrFactory watches an object, and when the object gets
destroyed, the WeakPtrFactory inside of it is destroyed. And that sends this
little bit that says, I'm gone. And then when your asynchronous task comes back
with its pointer, but it's a WeakPtr inside of it and tries to run, it can be
like, am I still here? If the WeakPtrFactory was destroyed, no, I'm not. And
then you have a choice of what to do at that point. Typically, we're like,
abandon ship. Don't do anything here. This whole task is aborted. But maybe you
do something more subtle. That's totally possible.

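The "am I still here?" check can be illustrated with the standard library. This sketch is only an analogy: it uses `std::weak_ptr` and `std::shared_ptr` (the latter is banned in Chromium, as mentioned above), not `base::WeakPtr` or `base::WeakPtrFactory`, and the `Account` type and the vector of pending tasks are hypothetical stand-ins for real asynchronous work.

```cpp
#include <functional>
#include <memory>
#include <vector>

// Hypothetical object that a delayed task wants to touch.
struct Account {
  int balance = 100;
};

// Queues a task that deposits into an account, optionally destroys the account
// before the task runs, then runs the task. Returns how many deposits were
// actually applied.
int RunDeposit(bool destroy_before_task_runs) {
  std::vector<std::function<void()>> pending_tasks;
  int deposits_applied = 0;

  auto account = std::make_shared<Account>();
  std::weak_ptr<Account> weak = account;  // Non-owning handle bound into the task.

  pending_tasks.push_back([weak, &deposits_applied] {
    // The side channel: ask whether the target still exists before using it.
    if (std::shared_ptr<Account> alive = weak.lock()) {
      alive->balance += 10;
      ++deposits_applied;
    }
    // Otherwise the account is gone, so the task is abandoned.
  });

  if (destroy_before_task_runs) {
    account.reset();  // The owner goes away while the task is still queued.
  }

  for (auto& task : pending_tasks) {
    task();
  }
  return deposits_applied;
}
```

If the task had captured a native pointer to the account instead, the second scenario would be exactly the Use-After-Free described earlier.
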
22:59 SHARON: I think the example I actually meant to say that uses a
WeakPtrFactory is a SafeRef, which is another base type. So tell us a bit about
SafeRefs.

23:13 DANA: WeakPtr is cool because of the side channel that you can examine.
So you can say are you still alive, dear object? And it can tell you, no, it's
gone. Or yeah, it's here. And then you can use it. The problem with this is
that in places where you as the code author want to believe that this object is
actually always there, but you don't want a security bug if you're wrong. And
it doesn't mean that you're wrong now, even. Sometime later, someone can change
code, unrelated to where this is, where the ownership happens, and break you.
And maybe they don't know all the users of a given object and change its
lifetime in some subtle way, maybe not even realizing they are. Suddenly you're
eventually seeing security bugs. And so that's why native pointers can be
pretty scary. And so SafeRef is something we can use instead of a native
pointer to protect you against this type of bug. It's built on top of WeakPtr
and WeakPtrFactory. That's its relationship, but its purpose is not the same.
So what SafeRef does is it says - SafePtr?

24:31 SHARON: SafeRef.

24:31 DANA: SafeRef.

24:31 SHARON: I think there's also a safe pointer, but there -

24:38 DANA: We were going to add it. I'm not sure if it's there yet. But so two
differences between SafeRef and WeakPtr then, ref versus ptr, it can't be null.
So it's like a reference wrapper. But the other difference is you can't observe
whether the object is actually alive or not. So it has the side channel, but it
doesn't show it to you. Why would you want that? If the information is there
anyway, why wouldn't you want to expose it? And the reason is because you are
documenting that you as the author understand and expect that this pointer is
always valid at this time. It turns out it's not valid. What do you do? If it's
a WeakPtr, people tend to say, we don't know if it's valid. It's a WeakPtr.
Let's check. Am I valid? And if I'm not, return. And what does that result in?
It results in adding a branch to your code. You do that over, and over, and
over, and over, and static analysis, which is what we as humans have to do -
we're not running the program, we're reading the code - can't really tell what
will happen because there's so many things that could happen. We could exit
here, we could exit there, we could exit here. Who knows. And that makes it
increasingly hard to maintain and refactor the code. So SafeRef gives you the
option to say this is always going to be valid. You can't check it. So if it's
not valid, go fix that bug somewhere else. It should be valid here.

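The contrast with WeakPtr can be sketched as a wrapper that has the liveness information but does not expose it. `CheckedRef` and `Counter` below are hypothetical, built on `std::weak_ptr` purely for illustration. This is not how `base::SafeRef` is implemented, and the sketch assumes single-threaded use.

```cpp
#include <cstdlib>
#include <memory>

// Hypothetical SafeRef-like wrapper. It cannot be constructed from null, it
// offers no "is it alive?" query, and it terminates the program instead of
// touching an object that has been destroyed.
template <typename T>
class CheckedRef {
 public:
  explicit CheckedRef(const std::shared_ptr<T>& target) : target_(target) {
    if (!target) std::abort();  // Like a reference, never null.
  }

  T* operator->() const {
    std::shared_ptr<T> alive = target_.lock();
    if (!alive) std::abort();  // A deterministic crash, not a use-after-free.
    return alive.get();
  }

  T& operator*() const { return *operator->(); }

 private:
  std::weak_ptr<T> target_;
};

// Hypothetical target type.
struct Counter {
  int value = 0;
};

// The caller simply uses the reference. There is no validity branch to write,
// because the author's claim is that the target is always alive here.
int IncrementThrough(const CheckedRef<Counter>& ref) {
  ref->value += 1;
  return ref->value;
}
```

Leaving out a validity check is the point of the design: code that uses the wrapper has a single path, and a broken lifetime assumption shows up as a crash to be fixed at its source.
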
26:16 SHARON: So what kind of -
|
||||
|
||||
26:16 DANA: The assumptions are broken.
|
||||
|
||||
26:16 SHARON: So what kind of errors happen when that assumption is broken? Is
|
||||
that a crash? Is that a DCHECK kind of thing?
|
||||
|
||||
26:22 DANA: For SafeRef and for WeakPtr, if you try to use it without checking
|
||||
it, or write it incorrectly, they will crash. And crashing in this case means a
|
||||
safe crash. It's not going to lead to a security bug. It's literally just
|
||||
terminating the program.
|
||||
|
||||
26:41 SHARON: Does that also mean you get a sad tab as a user? Like when the
|
||||
little sad file comes up?
|
||||
|
||||
26:47 DANA: Yep. It would. If you're in the render process, you take it down.
|
||||
It's a sad tab. So that's not great. It's better than a security bug. Because
|
||||
your options here are don't write bugs. Ideal. I love that idea, but we know
|
||||
that bugs happen. Use a native pointer, security problem. Use a WeakPtr, that
|
||||
makes sense if you wanted it to sometimes not be there. But if you want it to
|
||||
always be there - because you have to make a choice now of what you're supposed
|
||||
to do if it's not, and it makes the code very hard to understand. And you're
|
||||
only going to find out it can't be there through a crash anyhow. Or use a
|
||||
SafeRef. And it's going to just give you the option to crash. You're going to
|
||||
figure out what's wrong and make it no longer do that.
|
||||
|
||||
27:38 SHARON: I think wanting to guarantee the lifetime of some other things
|
||||
seems like a pretty common thing that you might come across. So I'm sure there
|
||||
are many cases for many people to be adding SafeRefs to make their code a bit
|
||||
safer, and also ensure that if something does go wrong, it's not leading to a
|
||||
memory bug that could be exploited in who knows how long. Because we don't
|
||||
always hear about those. If it crashes, and they can reliably crash, at least
|
||||
you know it's there. You can fix it. If it's not, we're hoping that one of our
|
||||
VRP vulnerability researchers find it and report it, but that doesn't always
|
||||
happen. So if we can know about these things, that's good. So another new type
|
||||
in base that people might have been seeing recently is a `raw_ptr` which is
|
||||
maybe why earlier we were saying let's call them native pointers, not raw
|
||||
pointers. Because the difference between `raw_ptr` and raw pointer, very easy
|
||||
to mix those up. So why don't you tell us a bit about `raw_ptr`s?
|
||||
|
||||
28:40 DANA: So `raw_ptr` is really cool. It's a non-owning smart pointer. So
|
||||
that's kind of WeakPtr or SafeRef. These are also non-owning. And it's actually
|
||||
very similar in inspiration to what WeakPtr is. So it has a side channel where
|
||||
it can see if the thing It's pointing to is alive or gone. So for WeakPtr, it
|
||||
talks to the WeakPtrFactory and says am I deleted? And for `raw_ptr`, what it
|
||||
does is it keeps a reference count, kind of like `scoped_refptr`, but it's a
|
||||
weak reference count. It's not owning. And it keeps this reference count in the
|
||||
memory allocator. So Chrome has its own memory allocator for new and delete
|
||||
called PartitionAlloc. And that lets us do some interesting stuff. And this is
|
||||
one of them. And so what happens is as long as there is a `raw_ptr` around, this
|
||||
reference count is non-zero. So even if you go and you delete the object, the
|
||||
allocator knows there is some pointer to it. It's still out there. And so it
|
||||
doesn't free it. It holds it. And it poisons the memory, so that just means
|
||||
it's going to write some bit pattern over it, so it's not really useful
|
||||
anymore. It's basically re-initialized the memory. And so later, if you go and
|
||||
use this `raw_ptr`, you get access to just dead memory. It's there, but it's
|
||||
not useful anymore. You're not going to be able to create security bugs in the
|
||||
same way. Because when we first started talking about a Use-After-Free - you
|
||||
have your goat, you free it, a cow is there, and now your pointer is pointing
|
||||
at the wrong thing - you can't do that because as long as there's this
|
||||
`raw_ptr` to your goat, the goat can be gone, but nothing else is going to come
|
||||
back here. It's still taken by that poisoned memory until all the `raw_ptr`s
|
||||
are gone. So that's their job, to protect us from a Use-After-Free being
|
||||
exploitable. It doesn't necessarily crash when you use it incorrectly, you just
|
||||
get to use this bad memory inside of it. If you try to use it as a pointer,
|
||||
then you're using a bad pointer, you're going to probably crash. But it's a
|
||||
little bit different than a WeakPtr, which is going to deterministically crash
|
||||
as soon as you try to use it when it's gone. It's really just a protection or a
|
||||
mitigation against security exploits through Use-After-Free. And then we
|
||||
recently just added `raw_ref`, which is really the same as `raw_ptr`, except
|
||||
addressing nullability. So smart pointers in C++ have historically all allowed
|
||||
a null state. That's representative of what native pointers did in C and C++.
|
||||
And so this is kind of just bringing this along in this obvious, historical
|
||||
way. But if you look at other languages that have been able to break with
|
||||
history and make their own choices kind of fresh, we see that they make choices
|
||||
like not having null pointers, not having null smart pointers. And that
|
||||
increases the readability and the understanding of your code greatly. So just
|
||||
like for WeakPtr, how we said, we just check if it's there or not. And if it's
|
||||
not, we return, and so on. It's every time you have a WeakPtr, if you were
|
||||
thinking of a timeline, every time you touch a WeakPtr, your timeline splits.
|
||||
And so you get this exponential timeline of possible states that your
|
||||
software's in. That's really intense. Whereas every time you cannot do that,
|
||||
say this can't be null, so instead of WeakPtr, you're using SafeRef. This can't
|
||||
be not here or null, actually. WeakPtr can just be straight up null. This is
|
||||
always present. Then you don't have a split in your timeline, and that makes it
|
||||
a lot easier to understand what your software is doing. And so for `raw_ptr`,
|
||||
it followed this historical precedent. It lets you have a null value inside of
|
||||
it. And `raw_ref` is our kind of modern answer to this new take on nullability.
|
||||
And so `raw_ref` is a reference wrapper, meaning it holds a reference inside of
|
||||
it, conceptually, meaning it just can't be null. That is just basically - it's
|
||||
a pointer, but it can't be null.
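
The "poison and hold" behavior described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: `ToySlot`, `ToyAllocate`, `ToyAddWeakRef`, `ToyFree`, `ToyReleaseWeakRef` and `kPoisonByte` are hypothetical names, not PartitionAlloc or `raw_ptr` APIs, and a real allocator tracks far more state.

```cpp
#include <cstring>

// Arbitrary byte pattern written over freed memory in this toy model.
constexpr unsigned char kPoisonByte = 0xEF;

// Hypothetical stand-in for one allocator slot.
struct ToySlot {
  unsigned char bytes[16] = {};
  int weak_refs = 0;     // How many raw_ptr-like handles point at the slot.
  bool freed = false;    // True once the owner has deleted the object.
  bool reusable = true;  // True when the allocator may hand the slot out.
};

// Hands the slot out for a new object.
inline void ToyAllocate(ToySlot& slot) {
  slot.freed = false;
  slot.reusable = false;
}

// Called when a raw_ptr-like handle starts pointing at the slot.
inline void ToyAddWeakRef(ToySlot& slot) { ++slot.weak_refs; }

// Called when the owner deletes the object. The memory is poisoned right
// away, but the slot only becomes reusable if no handle still points at it.
inline void ToyFree(ToySlot& slot) {
  slot.freed = true;
  std::memset(slot.bytes, kPoisonByte, sizeof(slot.bytes));
  slot.reusable = (slot.weak_refs == 0);
}

// Called when a raw_ptr-like handle goes away. The last handle to leave a
// freed slot releases it back to the allocator.
inline void ToyReleaseWeakRef(ToySlot& slot) {
  --slot.weak_refs;
  if (slot.freed && slot.weak_refs == 0) {
    slot.reusable = true;
  }
}
```

In this model, freeing a slot while a handle is outstanding leaves it poisoned and not reusable, so a cow cannot move in where the goat used to be until the last handle is released.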
|
||||
|
||||
33:24 SHARON: So these do sound the most straightforward to use. So basically,
|
||||
if you're not sure - for your class members at least - any time you would use a
|
||||
native pointer or an ampersand, basically you should always just put those in
|
||||
either a `raw_ptr` or a `raw_ref`, right?
|
||||
|
||||
33:45 DANA: Yeah, that's what our style guide recommends, with one nuance. So
|
||||
because `raw_ptr` and `raw_ref` interact with the memory allocator, they have
|
||||
the ability to be, like, turned on or off dynamically at runtime. And there's a
|
||||
performance hit on keeping this reference count around. And so at the moment,
|
||||
they are not turned on in the renderer process because it's a really
|
||||
performance-critical place. And the impact of security bugs there is a little
|
||||
less than in the browser process where you just immediately get access to the
|
||||
whole system. And so we're working on turning it on there. But if you're
|
||||
writing code that's only in the renderer process, then there's no point to use
|
||||
it. And we don't recommend that you use it. But the default rule is yes. Don't
|
||||
use a native pointer, don't use a native reference. As a field to an object,
|
||||
use a `raw_ptr`, use a `raw_ref`. Prefer something with fewer states, always,
|
||||
because you get fewer branches in your timeline. And then you can make it const
|
||||
if you don't want it to be able to rebind to another object, if you don't want the
|
||||
pointer to change. Or you can make it mutable if you want it to be able to.
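
The difference between a nullable pointer field and a non-null reference field can be sketched with two small stand-in wrappers. `ToyPtr`, `ToyRef`, `Goat` and `Farm` are hypothetical names made up for this illustration. They are not `raw_ptr` or `raw_ref` and provide none of their memory protection; the sketch only shows how the non-null field removes a branch.

```cpp
#include <string>

// Hypothetical stand-in for a raw_ptr-like field: it may be null.
template <typename T>
class ToyPtr {
 public:
  ToyPtr() = default;
  explicit ToyPtr(T* ptr) : ptr_(ptr) {}
  explicit operator bool() const { return ptr_ != nullptr; }
  T* operator->() const { return ptr_; }

 private:
  T* ptr_ = nullptr;
};

// Hypothetical stand-in for a raw_ref-like field: it is built from a
// reference, so it always points at an object.
template <typename T>
class ToyRef {
 public:
  explicit ToyRef(T& ref) : ptr_(&ref) {}
  T* operator->() const { return ptr_; }
  T& operator*() const { return *ptr_; }

 private:
  T* ptr_;
};

struct Goat {
  std::string name;
};

class Farm {
 public:
  explicit Farm(Goat& mascot) : mascot_(mascot) {}

  void set_guest(Goat* guest) { guest_ = ToyPtr<Goat>(guest); }

  std::string Describe() const {
    // No branch needed: the mascot is always present.
    std::string text = mascot_->name;
    // The guest may be absent, so every use has to handle both cases.
    if (guest_) {
      text += " and " + guest_->name;
    }
    return text;
  }

 private:
  const ToyRef<Goat> mascot_;  // const: cannot be rebound after construction.
  ToyPtr<Goat> guest_;         // Nullable and rebindable.
};
```

A `Farm` built around one goat describes just that goat until `set_guest` is called with a second one; only the nullable field needs the `if`.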
|
||||
|
||||
34:58 SHARON: So you did mention that these types are ref counted, but earlier
|
||||
you said that you should avoid ref counting things. So
|
||||
|
||||
35:04 DANA: Yes.
|
||||
|
||||
35:11 SHARON: So what's the balance there? Is it because with a
|
||||
`scoped_refptr`, you're a bit more involved in the ref counting, or is it just,
|
||||
this is we've done it for you, you can use it. This is OK.
|
||||
|
||||
35:19 DANA: No, this is a really good question. Thank you for asking that. So
|
||||
there's two kinds of ref counts going on here. I tried to kind of allude to it,
|
||||
but it's great to make it clear. So `scoped_refptr` is a strong ref count,
|
||||
meaning the ref count owns the object. So the destructor runs, the object is
|
||||
gone and deleted when that ref count goes to 0. `raw_ref` and `raw_ptr` are a
|
||||
weak ref count. They could be pointing to something owned in a
|
||||
`scoped_refptr` even. So they can exist at the same time. You can have both
|
||||
kinds of ref counts going at the same time. A weak ref count, in this case, is
|
||||
holding the memory alive so that it doesn't get re-used. But it's not keeping
|
||||
the object in that memory alive. And so from a programming state point-of-view,
|
||||
the weak refs don't matter. They're helping protect you from security bugs.
|
||||
They're helping to make - when things go wrong, when a bug happens, they're
|
||||
helping to make it less impactful. But they don't change your program in a
|
||||
visible way. Whereas strong references do. The destructor's timing is based on when
|
||||
the ref count goes to 0 for a strong reference. So that's the difference
|
||||
between these two.
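
The strong versus weak distinction can be sketched with the standard library as an analogy. This assumes `std::shared_ptr` stands in for a strong count and `std::weak_ptr` for a weak one. It is not how `scoped_refptr` or `raw_ptr` are implemented (the `raw_ptr` count lives in the allocator), and `Tracked`, `RefCountDemo` and `RunRefCountDemo` are hypothetical names.

```cpp
#include <memory>

// Sets a flag when its destructor runs, so the demo can observe destruction.
struct Tracked {
  explicit Tracked(bool* destroyed) : destroyed_(destroyed) {}
  ~Tracked() { *destroyed_ = true; }
  bool* destroyed_;
};

struct RefCountDemo {
  long strong_count_with_two_owners = 0;
  bool destroyed_with_one_owner_left = false;
  bool destroyed_after_last_owner = false;
  bool observer_expired = false;
};

inline RefCountDemo RunRefCountDemo() {
  bool destroyed = false;
  RefCountDemo result;

  std::shared_ptr<Tracked> owner_a = std::make_shared<Tracked>(&destroyed);
  std::shared_ptr<Tracked> owner_b = owner_a;  // Second strong reference.
  std::weak_ptr<Tracked> observer = owner_a;   // Weak: does not own.

  // Only strong references are counted as owners.
  result.strong_count_with_two_owners = owner_a.use_count();

  owner_a.reset();
  result.destroyed_with_one_owner_left = destroyed;

  // The destructor runs when the last strong reference goes away, even
  // though a weak reference still exists.
  owner_b.reset();
  result.destroyed_after_last_owner = destroyed;
  result.observer_expired = observer.expired();
  return result;
}
```

The weak reference never delays the destructor; only the strong references decide when the object dies, which is the part that changes program behavior.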
|
||||
|
||||
36:46 SHARON: So when you say don't use ref counting, you mean don't use strong
|
||||
ref counting.
|
||||
|
||||
36:46 DANA: I do, yes.
|
||||
|
||||
36:51 SHARON: And if you want to learn more about the raw pointer, `raw_ptr`,
|
||||
`raw_ref`, that's all part of the MiraclePtr project, and there's a talk about
|
||||
that from BlinkOn. I'll link that below also. So in terms of other base types,
|
||||
there's a new one that's called `base::expected`. I haven't even really seen
|
||||
this around. So can you tell us a bit more about how we use that, and what
|
||||
that's for?
|
||||
|
||||
37:09 DANA: `base::expected` is a backport from C++23, I want to say. So the
|
||||
proposal for `std::expected` actually cites a Rust type as inspiration, which
|
||||
is called `std::result::Result` in Rust. And it's a lot like `optional`, so it's used
|
||||
for return values. And it's more or less kind of a replacement for exceptions.
|
||||
So Chrome doesn't compile with exceptions enabled even, so we've never relied
|
||||
on exceptions to report errors. But we have to do complicated things, like with
|
||||
`optional` to return a bool or an enum. And then maybe some value. And so this
|
||||
kind of compresses all that down into a single type, but it's got more state
|
||||
than just an option. So `expected` gives you two choices. It either returns
|
||||
your value, like `optional` can, or it returns an error. And so that's the
|
||||
difference between `optional` and `expected`. You can give a full error type.
|
||||
And so this is really useful when you want to give more context on what went
|
||||
wrong, or why you're not returning the value. So it makes a lot of sense in
|
||||
stuff like File IO. So you're opening a file, and it can fail for various
|
||||
reasons, like I don't have permission, it doesn't exist, whatever. And so in
|
||||
that case, the way you would express that in a modern way would be to return
|
||||
`base::expected` of your file handle or file class. And as an error, some
|
||||
enumerator, perhaps, or even an object that has additional state beyond just I
|
||||
couldn't open the file. But maybe a string about why you couldn't open the file
|
||||
or something like this. And so it gives you a way to return a structured error
|
||||
result.
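
The "value or structured error" idea can be sketched without any Chromium headers. This is a minimal C++17 stand-in built on `std::variant`, not `base::expected` itself, and its interface differs from the real type. `ToyExpected`, `OpenError`, `OpenFailure`, `FileResult` and `ToyOpenFile` are hypothetical names, and the paths are made up.

```cpp
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in: holds either a value of type T or an error of type E.
template <typename T, typename E>
class ToyExpected {
  using Storage = std::variant<T, E>;

 public:
  static ToyExpected Ok(T value) {
    return ToyExpected(Storage(std::in_place_index<0>, std::move(value)));
  }
  static ToyExpected Err(E error) {
    return ToyExpected(Storage(std::in_place_index<1>, std::move(error)));
  }

  bool has_value() const { return storage_.index() == 0; }
  const T& value() const { return std::get<0>(storage_); }
  const E& error() const { return std::get<1>(storage_); }

 private:
  explicit ToyExpected(Storage storage) : storage_(std::move(storage)) {}
  Storage storage_;
};

enum class OpenError { kNotFound, kPermissionDenied };

// A structured error: an enumerator plus extra context as a string.
struct OpenFailure {
  OpenError code;
  std::string detail;
};

using FileResult = ToyExpected<int, OpenFailure>;

// Pretend file open. The int stands in for a file handle.
inline FileResult ToyOpenFile(const std::string& path) {
  if (path == "/secret") {
    return FileResult::Err(
        {OpenError::kPermissionDenied, "no read permission for " + path});
  }
  if (path != "/exists") {
    return FileResult::Err({OpenError::kNotFound, path + " does not exist"});
  }
  return FileResult::Ok(3);
}
```

A caller checks `has_value()` and then reads either `value()` or `error()`, so the reason for a failure travels with the return value instead of through a separate bool or out-parameter.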
|
||||
|
||||
39:05 SHARON: That sounds useful in lots of cases. So all of these types are
|
||||
making up for basically what is lacking in C++, which is memory safety. C++, it
|
||||
does a lot. It's been around for a long time. Most of Chrome is written in it.
|
||||
But there are all these memory issues. And a lot of our security bugs are a
|
||||
result of this. So you are working on bringing Rust to Chromium. Why is that a
|
||||
good next step? Why does that solve these problems we're currently facing?
|
||||
|
||||
39:33 DANA: So Rust has some very cool properties to it. Its first property
|
||||
that is really important to this conversation is the way that it handles
|
||||
pointers, which in Rust would be treated pretty much exclusively as references.
|
||||
And what Rust does is it requires you to tell the compiler the relationships
|
||||
between the lifetimes of your references. And the outcome of this additional
|
||||
knowledge to the compiler is memory safety. And so what does that mean? It
|
||||
means that you can't write a Use-After-Free bug in Rust unless you're going
|
||||
into the unsafe part of the language, which is where scariness exists. But you
|
||||
don't need to go there to write a normal program. So we'll ignore it. And so
|
||||
what that means is you can't write the bug. And so that doesn't just mean I
|
||||
also like to believe I can write C++ without a bug. That's not true. But I
|
||||
would love to believe that. But it means that later, when I come back and
|
||||
refactor my code, or someone comes who's never seen this before and fixes some
|
||||
random bug somewhere related to it, they can't introduce a Use-After-Free
|
||||
either. Because if they do, the compiler is like, hey - it's going to outlive
|
||||
it. You can't use it. Sorry. And so there's this whole class of bugs that you
|
||||
never have to debug, you never ship, they never affect users. And so this is a
|
||||
really nice promise, really appealing for a piece of software like Chrome,
|
||||
where our basic purpose is to handle arbitrary and adversarial data. You want
|
||||
to be able to go on some web page, maybe it's hostile, maybe not. You just get
|
||||
a link. You want to be able to click that link and trust that even if it's
|
||||
really hostile and wanting to destroy you, it can't. Chrome is that safety net
|
||||
for you. And so Rust is that kind of safety net for our code, to say no matter
|
||||
how you change it over time, it's got your back. You can't introduce this kind
|
||||
of bug.
|
||||
|
||||
42:03 SHARON: So the Rust project sounds really cool. If people want to learn
|
||||
more or get involved - if you're into the whole languages, memory kind of thing
|
||||
- where can people go to learn more?
|
||||
|
||||
42:09 DANA: So if you're interested in helping out with our Rust experiment,
|
||||
then you can look for us in the Rust channel on Slack. If you're interested in
|
||||
C++ language stuff, you can find us in the CXX channel on Slack, as well. As
|
||||
well as the same CXX@chromium.org mailing list. And there is, of course, the
|
||||
rust-dev@chromium.org mailing list if you want to use email to reach us as
|
||||
well.
|
||||
|
||||
42:44 SHARON: Thank you very much, Dana. There will be notes from all of this
|
||||
also linked in the description box. And thank you very much for this first
|
||||
episode.
|
||||
|
||||
42:52 DANA: Thanks, Sharon. This was fun.
|
453
docs/transcripts/wuwt-e02-dchecks.md
Normal file
@ -0,0 +1,453 @@
|
||||
# What’s Up With DCHECKs
|
||||
|
||||
This is a transcript of [What's Up With
|
||||
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
|
||||
Episode 2, a 2022 video discussion between [Sharon (yangsharon@chromium.org)
|
||||
and Peter (pbos@chromium.org)](https://www.youtube.com/watch?v=MpwbWSEDfjM).
|
||||
|
||||
The transcript was automatically generated by speech-to-text software. It may
|
||||
contain minor errors.
|
||||
|
||||
---
|
||||
|
||||
You've seen DCHECKs around and been asked to use them in code review, but what
|
||||
are they? What's the difference between a CHECK and a DCHECK? How do you use
|
||||
them? Here to answer that is special guest Peter, who works on UI and
|
||||
improving crash reports.
|
||||
|
||||
Notes:
|
||||
- https://docs.google.com/document/d/146LoJ1E3N3E6fb4zDh92HPQc6yhRpNI7DSKlJjaYlLw/edit
|
||||
|
||||
Links:
|
||||
- [What's Up With Pointers](https://www.youtube.com/watch?v=MpwbWSEDfjM)
|
||||
|
||||
---
|
||||
|
||||
00:00 SHARON: Hello, and welcome to What's Up With That?, the series that
|
||||
demystifies all things Chrome. I'm your host, Sharon. And today, we're talking
|
||||
about DCHECKs. You've seen them around. You've probably been told to add one in
|
||||
code review before. But what are they? What are they for, and what do they do?
|
||||
Our guest today is Peter, who works on desktop UI and Core UI. He's also
|
||||
working on improving Chrome's crash reports, which includes DCHECKs. Today
|
||||
he'll help us answer, what's up with DCHECKs? Welcome, Peter.
|
||||
|
||||
00:30 PETER: Thanks for having me.
|
||||
|
||||
00:32 SHARON: Yeah. Thanks for being here. So the most obvious question to
|
||||
start with, what is a DCHECK?
|
||||
|
||||
00:39 PETER: So a CHECK and a DCHECK are both sort of things that make sure
|
||||
that what you think is true is true. Right? So this should never be called with
|
||||
an empty vector. You might add a CHECK for it, or you might add a DCHECK for
|
||||
it. And it's sort of similar to asserts, which you may have hit during earlier
|
||||
programming outside of Chrome. And what it means is when this line gets hit, we
|
||||
check and see if it's true. And if it's not true, we crash. DCHECKs differ from
|
||||
CHECKs in that they are traditionally only in debug builds, or local
|
||||
development builds, or on our try-bots. So they have zero overhead when Chrome
|
||||
hits stable, because the CHECK just won't be there.
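
The difference described above can be sketched with two toy macros. These are hypothetical stand-ins, not Chromium's `CHECK` and `DCHECK` from `base/`, and the `TOY_DCHECK_IS_ON` switch is an assumption made for this illustration.

```cpp
#include <cstdio>
#include <cstdlib>
#include <vector>

// Always enforced: if the condition is false, report it and crash.
#define TOY_CHECK(condition)                                  \
  do {                                                        \
    if (!(condition)) {                                       \
      std::fprintf(stderr, "Check failed: %s\n", #condition); \
      std::abort();                                           \
    }                                                         \
  } while (0)

#if defined(TOY_DCHECK_IS_ON)
// Developer-style builds: behaves exactly like TOY_CHECK.
#define TOY_DCHECK(condition) TOY_CHECK(condition)
#else
// Other builds: the condition still has to compile, but it is never run.
#define TOY_DCHECK(condition) \
  do {                        \
    if (false) {              \
      (void)(condition);      \
    }                         \
  } while (0)
#endif

// "This should never be called with an empty vector."
inline int FirstElement(const std::vector<int>& values) {
  TOY_CHECK(!values.empty());
  return values.front();
}

// The even-input assumption is only enforced when TOY_DCHECK_IS_ON is set.
inline int HalveEven(int value) {
  TOY_DCHECK(value % 2 == 0);
  return value / 2;
}
```

Without `TOY_DCHECK_IS_ON`, calling `HalveEven(7)` silently returns a result for an input the author assumed was impossible, which is the gap discussed later in the conversation.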
|
||||
|
||||
01:24 SHARON: OK. So the D stands for Debug. That makes sense.
|
||||
|
||||
01:28 PETER: Yeah. I want debug to turn into developer, because now we have
|
||||
them by default if you're no longer - if you're doing a release build, and
|
||||
you're not turning them off, and you're not doing an official build, you get
|
||||
them.
|
||||
|
||||
01:42 SHARON: OK. Well, you heard it here first, or maybe you heard it before.
|
||||
I heard it here first. So you mentioned asserts. So something that I've seen a
|
||||
couple times in Chrome, and also is part of the standard library, is
|
||||
`static_assert`. So how is that similar or different to DCHECKs? And why do we
|
||||
use or not use them?
|
||||
|
||||
02:00 PETER: Right. So `static_assert`s are - and you're going to have to ask
|
||||
C++ experts, who can probably take some of the sharp edges off of this - but
|
||||
it's basically, if you can assert something in compile time, then you can use a
|
||||
`static_assert`, which means that you don't have to hit a code path where it's
|
||||
wrong. It sort of has to always hold true. And whenever you can use a
|
||||
`static_assert`, use a `static_assert`, because it's free. And basically, you
|
||||
can't compile the program if it's not true.
|
||||
|
||||
02:31 SHARON: OK. That's good to know, because I definitely thought that was
|
||||
one of the C++ standard library things we should avoid, because we have a
|
||||
similar thing in Chromium. But I guess that's not the case.
|
||||
|
||||
02:41 PETER: Yeah. Assert is the one that is - OK, so this is a little
|
||||
complicated, right? `static_assert` is a language feature, not a library
|
||||
feature. And someone will tell me that I'm wrong about something about this.
|
||||
Asserts are just sort of a poorer version of DCHECKs. So they won't go through
|
||||
our crash handling. It won't print the pretty stacks, et cetera.
|
||||
`static_assert`s, on the other hand, are a compile time feature. And we don't,
|
||||
as far as I know, have our own wrapper around it. We just use `static_assert`.
|
||||
So what you would maybe use this for is like if you have a constant - like, say
|
||||
you have an array, and the code makes an assumption that some constant is the
|
||||
size of this array, you can assert that in compile time, and that would be a
|
||||
good use of a `static_assert`.
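
The array-size case mentioned above can be sketched like this. `Channel`, `kChannelNames` and `ChannelName` are hypothetical names chosen for the illustration; they are not taken from Chromium.

```cpp
#include <cstddef>

enum class Channel { kCanary, kDev, kBeta, kStable, kCount };

// One display name per Channel value, in the same order as the enum.
constexpr const char* kChannelNames[] = {"canary", "dev", "beta", "stable"};

// Compile-time assertion: adding a Channel without adding a name (or the
// other way around) stops the build instead of failing at runtime.
static_assert(sizeof(kChannelNames) / sizeof(kChannelNames[0]) ==
                  static_cast<std::size_t>(Channel::kCount),
              "kChannelNames must have one entry per Channel value");

constexpr const char* ChannelName(Channel channel) {
  return kChannelNames[static_cast<std::size_t>(channel)];
}
```

Because the assertion is evaluated by the compiler, it costs nothing at runtime and does not depend on any code path being exercised.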
|
||||
|
||||
03:26 SHARON: OK. Cool. So you mentioned that some things have changed with how
|
||||
DCHECKs work. So can you give us a brief overview of the history of DCHECKs -
|
||||
what they used to be, people who have been using them for a while, how might
|
||||
they have changed from the idea of what they have as a DCHECK in their mind?
|
||||
|
||||
03:43 PETER: Sure. So this is as best I know. I'm just sort of extrapolating
|
||||
from what I've seen. And what I think originally was true is that a CHECK used
|
||||
to be this logging statement, where you essentially compile the file name and
|
||||
the line number. And if this ever hits, then we'll log some stuff and then
|
||||
crash. Right? Which comes with a little bit of overhead, especially on size,
|
||||
that you basically take the file name and line number for every instance, and
|
||||
that generates a bunch of strings and numbers that essentially add to Chrome's
|
||||
binary size. I don't know how many steps between that and where we currently
|
||||
are. But right now, our CHECKs are just, if condition is false, crash, which
|
||||
means that you won't, out of the CHECK, get file name and line number. We'll
|
||||
get those out of debugging symbols. And you also won't get any of the logging
|
||||
messages that you can add to the end of a CHECK, which means that your debug
|
||||
info will be poorer, but it will be cheaper to use. So they've gotten from
|
||||
being pretty heavy CHECKs to being really cheap.
|
||||
|
||||
05:01 SHARON: OK. So that kind of leads us into the question that I think most
|
||||
people want to have answered, which is, when should I use a DCHECK? When should
|
||||
I use a CHECK? When should I use neither?
|
||||
|
||||
05:13 PETER: I would say that historically, we've said CHECKs are expensive.
|
||||
Don't use them unless you sort of have to. And I don't think that holds true
|
||||
anymore. So basically, unless you are in really performance-critical code, then
|
||||
use a CHECK. If there's anything that you care about where the program state
|
||||
will be unpredictable from this point on if it's not true, CHECK it. It's not
|
||||
that expensive. Right? We have a lot of code where we push a string onto a
|
||||
vector, and that never gets flagged in code review. And it's probably like 10
|
||||
times more expensive, if not 100 times more expensive, than adding a CHECK. The
|
||||
exception to that is if you're in a really hot loop where you don't want to
|
||||
dereference a pointer, then a CHECK might add some cost. And the other is if
|
||||
the condition that you're trying to validate is really expensive. It's not the
|
||||
CHECK itself that's expensive. It's the thing you're evaluating. And if that's
|
||||
expensive, then you might not be able to afford doing a CHECK. If you don't know that it's
|
||||
expensive, it's probably not expensive.
|
||||
|
||||
06:20 SHARON: Can you give us an example of something expensive to evaluate for
|
||||
a CHECK?
|
||||
|
||||
06:24 PETER: Right. So say that you have something in video code that for every
|
||||
video frame, for every pixel validates the alpha value as opaque, or something.
|
||||
That would probably make video conferencing performance a little bit worse.
|
||||
Another thing would just be if you have to traverse a graph on every frame, and
|
||||
it will sort of jump all over memory to see if some reachability problem in
|
||||
your graph is true, that's going to be a lot more expensive. But CHECKing that
|
||||
index is less than some vector bounds, I think that should fall under cheap.
|
||||
And -
|
||||
|
||||
07:02 SHARON: OK.
|
||||
|
||||
07:02 PETER: culturally, we've tried to avoid doing a lot of these. And I think
|
||||
it's just hurting us.
|
||||
|
||||
07:09 SHARON: OK. So since most places we should use CHECKs, are there any
|
||||
places where a DCHECK would be better then? Or any time you would have normally
|
||||
previously used a DCHECK, you should just make that a CHECK?
|
||||
|
||||
07:23 PETER: So we have a new construct that's called `EXPENSIVE_DCHECK`s, or
|
||||
if `EXPENSIVE_DCHECK`s are on, I think we should add a corresponding macro for
|
||||
`EXPENSIVE_DCHECK`. And then you should be able to just say, either it's
|
||||
expensive and has to be a DCHECK, so use `EXPENSIVE_DCHECK`; otherwise, use
|
||||
CHECK. And my hunch would be like 95% of what we have as DCHECKs would probably
|
||||
serve us better as CHECKs. But your code owner and reviewer might disagree with
|
||||
that. And it's not yet documented policy that we say CHECKs are cheap; just add
|
||||
a billion of them. But I would like to get there eventually.
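
The split between a cheap condition and an expensive one can be sketched as follows. This is an illustration under stated assumptions: `ToyCheck`, `kToyExpensiveChecksOn`, `Frame`, `AllPixelsOpaque` and `AlphaAt` are hypothetical names, and the compile-time switch merely stands in for whatever gating the real macros use.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical switch: true only in builds that opt in to slow validation.
constexpr bool kToyExpensiveChecksOn = false;

// Crashes deterministically when the condition is false.
inline void ToyCheck(bool condition) {
  if (!condition) {
    std::abort();
  }
}

// One alpha value per pixel.
struct Frame {
  std::vector<std::uint8_t> alpha;
};

// Expensive to evaluate: touches every pixel of the frame.
inline bool AllPixelsOpaque(const Frame& frame) {
  for (std::uint8_t value : frame.alpha) {
    if (value != 255) {
      return false;
    }
  }
  return true;
}

inline std::uint8_t AlphaAt(const Frame& frame, std::size_t index) {
  // Cheap: a single comparison, so it stays on in every build.
  ToyCheck(index < frame.alpha.size());
  // Expensive: a full pass over the frame, so it runs only when opted in.
  if (kToyExpensiveChecksOn) {
    ToyCheck(AllPixelsOpaque(frame));
  }
  return frame.alpha[index];
}
```

The cost in the second check comes from evaluating `AllPixelsOpaque`, not from the check itself, which matches the point made earlier about where the expense actually lies.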
|
||||
|
||||
08:04 SHARON: OK. So if you put in a CHECK, and your reviewer tells you this
|
||||
should be a DCHECK, the person writing the CL can point them to this video, and
|
||||
then they can discuss from there.
|
||||
|
||||
08:13 PETER: I mean, yeah, you can either say Peter disagrees with you, or I
|
||||
can get further along this and say we make policy that CHECKs are cheap, so
|
||||
they are preferable. So a lot of foot-shooters with DCHECKs is that you expect
|
||||
this property to hold true, but you never effectively CHECK it. And that can
|
||||
lead to all sorts of bad stuff, right? Like if you're trying to DCHECK that
|
||||
some origin for some frame makes some assumptions of site iso - I don't know
|
||||
site isolation well enough to say this. But basically, if you're DCHECKing that
|
||||
the code that you're running runs under some sort of permissions, then that is
|
||||
effectively unchecked in stable, right? And we do care about those properties,
|
||||
and it would be really good if we crashed rather than leaked information
|
||||
between sites.
|
||||
|
||||
09:12 SHARON: Right.
|
||||
|
||||
09:14 PETER: Yeah.
|
||||
|
||||
09:16 SHARON: So that seems like a good tie-in for the fact that within some
|
||||
security people, they don't have the most positive impression of DCHECKs, shall
|
||||
we say? So a couple examples of this, for listeners who maybe aren't familiar
|
||||
with this, is one person previously on security saying DCHECKs are pronounced
|
||||
as "code that's not tested". Someone else I told about this episode - I said,
|
||||
we're going to talk about DCHECKs - they immediately said, is it going to be
|
||||
about why DCHECKs are bad? So amongst the Chrome security folks, they are not a
|
||||
huge fan of DCHECKs. Can you tell us maybe why that is?
|
||||
|
||||
09:51 PETER: So if we go back a little bit in time, it used to be that DCHECKs
|
||||
were only built for developers if they do a debug build. And Chrome has gotten
|
||||
so big that you don't want to do a debug build or the UI is incredibly slow.
|
||||
Unfortunately, it's sort of not that great an experience to work in a debug
|
||||
build. So people work in a release build. That doesn't mean that they don't
|
||||
care about the things they put under DCHECK. It just means they want to go on
|
||||
with their lives and not wait x minutes for the browser to launch, or however
|
||||
bad it is nowadays. And that means that they, unfortunately, lose coverage for
|
||||
the DCHECKs. So this means that if your code is not exercised well under tests,
|
||||
then this is completely not enforced. But it's slightly better than a comment,
|
||||
in that you're really expecting this thing to hold true, and that's clearly an
|
||||
expectation. But how good is the expectation if you don't look at it? So last
|
||||
year, I believe, we made it so that DCHECKs are on by default if you're not
|
||||
doing an official build. And this included release builds. So now, it's like at
|
||||
least if you're doing development and you hit this condition, it's going to
|
||||
explode, which is really good, because then you can find a lot of issues, and
|
||||
we can prevent a lot of issues from ever happening in the first place. It is
|
||||
really hard for you, as a developer, to make the assumption that if this
|
||||
invariant is ever false, I will find it during development, and it will never
|
||||
happen in the wild. And DCHECKs are essentially either, I will find this
|
||||
locally before I submit it, or all bets are off; or it is I don't care that
|
||||
much if this thing doesn't hold true, which is sort of a weird assertion to
|
||||
make. So I think we're in this little awkward in-between state. And this
|
||||
in-between state, remember, mostly exists as a performance optimization from
|
||||
when CHECKs used to be a lot more expensive, in terms of code size. So did I
|
||||
cover most of this?
|
||||
|
||||
12:06 SHARON: Yeah. I think, based on that, I think it's pretty easy to see why
|
||||
people who are more concerned about security are not a fan of this.
|
||||
|
||||
12:13 PETER: I mean, if you care about it, especially if it causes privacy or
|
||||
security or user-harm sort of things, just CHECK. Just CHECK, right? If it
|
||||
makes your code animate a thing slightly weirder, like it will just jump to the
|
||||
end position instead of going through your fence load, whatever. Maybe you can
|
||||
make that a DCHECK. Maybe it doesn't matter. Like it's wrong, but it's not that
|
||||
bad. But most of the cases, you DCHECK something, where it's like the program
|
||||
is going to be in some indeterminate state, and we actually care about if it's
|
||||
ever false. So maybe we can afford to make it a CHECK. Maybe we should look
|
||||
more at our sort of vector pushbacks than we should look at our CHECKs, and
|
||||
then just have more CHECKs. More CHECKs. Because it's also like when things
|
||||
break, it's a lot cheaper to debug a DCHECK than a program that is in some
|
||||
indeterminate state, because it was allowed to pass through a DCHECK that you
|
||||
thought was - and when you read the code, unless you're used to reading it as
|
||||
DCHECKs - oh, that just didn't get enforced - it's sort of hard to try to
|
||||
figure out why the thing was doing the wrong thing in the first place.
|
||||
|
||||
13:22 SHARON: OK. How is this as a summary? When in doubt, CHECK it out.
|
||||
|
||||
13:27 PETER: I like that. I like that. And you might get pushback by reviewers,
|
||||
who aren't on my side of the fence yet. And then you can decide on which hill
|
||||
you want to die on, at least until we've made policy to just not complain about
|
||||
DCHECKs, or not complain about CHECKs.
|
||||
|
||||
13:45 SHARON: All right. That sounds good. So you mentioned stuff failing in
|
||||
the wild. And for people who might not know, do you want to just briefly
|
||||
explain what failing in the wild means?
|
||||
|
||||
13:54 PETER: OK. So there's two things. Just failing in the wild just means
|
||||
that when this thing rolls out to Canary, Dev, Beta, Stable, if you have a
|
||||
CHECK that will crash and generate a crash report as if you had a memory bug,
|
||||
but it crashes in a deterministic way, at a deterministic spot - so you can
|
||||
find out exactly what assumption was violated. Say that this should never be
|
||||
called with a null pointer. Then you can say, look at this line where it
|
||||
crashed. It clearly got hit with a null pointer. And then you can try to figure
|
||||
out, from the stack, why that happened, rather than after you post this pointer
|
||||
to a task, it crashes somewhere completely irrelevant from the actual call
|
||||
site. Well, so in the wild specifically means it generates a crash report so
|
||||
you can look at it, or in the wild means it crashes on a user's computer rather
|
||||
than - in the wildness outside of development. And as for the other part of in
|
||||
the wild, it's that we have started running non-crashy DCHECKs for a percentage
|
||||
of Windows Canary. And we're looking to expand that. And we're gathering
|
||||
information, basically, about which assertions or invariants that we have are
|
||||
violated in practice in the wild, even though we don't think that they should
|
||||
be. And that will sort of also culturally move the needle so that we do care
|
||||
about DCHECKs. And when we care about DCHECKs, sort of similarly to how we care
|
||||
about CHECKs, is it really that important to make the big distinction between
|
||||
the two? Except for the case where you have really expensive DCHECKs, they
|
||||
might still be worth keeping separate. And those will be things like, if you do
|
||||
things for - say that you zero out memory or something for every memory block
|
||||
that you allocate and free, or you do things for every audio sample, or for
|
||||
every video frame pixel, those sort of things. And then we can sort of keep
|
||||
expensive stuff gated out from CHECKs. And then maybe we don't need this
|
||||
in-between where people don't know whether they can trust a DCHECK or not.
|
||||
|
||||
16:04 SHARON: So you mentioned that certain release builds now have DCHECKs
|
||||
enabled. So for those in the wild versus regular CHECKs in the wild, if those
|
||||
happen to fail, do the reports for those look the same? Are they in the same
|
||||
place? Can they be treated the same?
|
||||
|
||||
16:20 PETER: Yeah. Well, they are uploaded to the same crash-reporting thing.
|
||||
They show up under a special branch. And you likely will get bugs filed to you
|
||||
if they hit very frequently, just like you would with crashes. There's a sort
|
||||
of slight difference, in that they say dump without crashing. And that's just
|
||||
sort of a rollout strategy for us. Because if we made DCHECK builds incredibly
|
||||
crashy, because they hit more than CHECKs, then we can never roll this thing
|
||||
out. Or it gets a lot scarier for us to put this on 5% of a new platform that
|
||||
we haven't tested. But as it is right now, the first DCHECK that gets hit for
|
||||
every process gets a crash dump uploaded.
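
The "first failure per process uploads a dump, and nothing crashes" behavior can be sketched with a small class. `ToyDumpReporter` is a hypothetical stand-in for illustration only. It is not Chromium's crash-reporting code, and it stores messages in a vector instead of uploading anything.

```cpp
#include <atomic>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-process reporter of non-crashy failures.
class ToyDumpReporter {
 public:
  // Records a report for the first failure only and never terminates the
  // process. Returns true if this call produced the report.
  bool ReportFailure(const std::string& message) {
    const bool already_reported = reported_.exchange(true);
    if (already_reported) {
      return false;
    }
    uploaded_.push_back(message);
    return true;
  }

  const std::vector<std::string>& uploaded() const { return uploaded_; }

 private:
  std::atomic<bool> reported_{false};
  std::vector<std::string> uploaded_;
};
```

Later failures in the same process are dropped in this sketch, so a frequently violated assumption produces one report per process rather than a flood.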
|
||||
|
||||
17:07 SHARON: OK. So I've been definitely told to use dump without crashing at
certain points in CLs, where it's like, OK, we think that this shouldn't
happen. But if it does, we don't necessarily want to crash the browser because
of it. With the changes you've mentioned to DCHECKs happening, should those
just be CHECKs instead now or should those still be dump without crashing?

17:29 PETER: So if you want dump without crashing, and you made those a DCHECK,
then you would only have coverage in the Canary channels that we are testing.
Right? So if you want to get dump reports from the platforms that we're not
currently testing, including all the way up to Stable, you probably still want
to keep that a dump without crashing. You want to make sure that you're not
using the sort of - you want to make sure that you triage these, because you
don't want to keep these generating crash dumps forever. You should still
treat them as if they were crashes. And I think the same thing should hold true
for DCHECKs. You should only add them for an invariant that you care about
being violated, right? So as it is violated, you should either figure out why
your invariant was wrong, or you should try to fix the breakage. And you can
probably add more information to logging to figure out why that happened.

18:41 SHARON: So when you have a CHECK, and it crashes in the wild, you get a
stack trace. And that's what you have to work on to figure out what went wrong
for debugging. Right? So what are some things that you can do, as a developer,
to make these CHECKs a bit more useful for you - ways to incorporate other
information that you can use to help yourself debug?

19:01 PETER: So some of the stuff that we have is we have something called
crash keys, which are essentially, you can write a piece of string data,
essentially - there's probably some other data types - and if you write those
before you're running dump without crashing, or before you hit a CHECK, or
before you hit a DCHECK, then those will be uploaded along with the crash dump.
And if you talk to someone who knows where to find them, you can basically go
in under a crash report, and then under field product data, or something like
that, you should be able to find your key-value pair. And if you have
information in there, you'll be able to look at it. The other thing that I like
to do, which is probably the more obvious thing, is if you have somewhat of a
hypothesis that this thing should only fail if a or b or c is not true, then
you can add CHECKs for those. Like, if a CHECK is failing, you can add more
CHECKs to see why the CHECK was failing. In general, you're not going to get as
much out of a mini-dump as you want. You're not going to have the full heap
available to you, because that would be a mega-dump. You can usually find
whatever is on the stack if you go in with a debugger. And I know that you
wanted to lead me into talking about CHECK\_GT and CHECK\_EQ, which are
essentially, if you want to check that x is greater than y, then you should use
CHECK\_GT(x,y). The problem with those, in this sort of context, is that,
similarly to CHECKs - so CHECK\_GT gets compiled into, basically, if not x is
greater than y, crash. So unfortunately, the values of x and y are optimized
out when you're doing an official build.

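To make the crash key idea concrete, here is a small standalone sketch. It
does not use Chromium's real crash key helpers or the real `CHECK_GT` macro:
`CrashKeys`, `SetCrashKey`, `SketchCheckGT`, `SketchCrashReport`, and
`ProcessFrames` are all hypothetical names, and throwing an exception stands in
for crashing so that the behavior can be exercised in a test. The point it
illustrates is that the failure message carries no operand values, so the
values have to be recorded separately before the check runs.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

using CrashKeyMap = std::map<std::string, std::string>;

// Hypothetical stand-in for process-wide crash key storage.
CrashKeyMap& CrashKeys() {
  static CrashKeyMap keys;
  return keys;
}

void SetCrashKey(const std::string& name, const std::string& value) {
  assert(!name.empty());
  CrashKeys()[name] = value;
}

// Hypothetical "crash report": a fixed message plus a snapshot of the keys.
struct SketchCrashReport : std::runtime_error {
  SketchCrashReport(const std::string& message, CrashKeyMap captured_keys)
      : std::runtime_error(message), keys(std::move(captured_keys)) {}
  CrashKeyMap keys;
};

// Sketch of a CHECK_GT-style helper: "if not x > y, crash". Note that the
// message does not include the values of x and y.
template <typename T>
void SketchCheckGT(const T& x, const T& y) {
  if (!(x > y)) {
    throw SketchCrashReport("Check failed: x > y", CrashKeys());
  }
}

// Example caller: records the interesting values before checking them.
void ProcessFrames(int frame_count, int min_frames) {
  SetCrashKey("frame_count", std::to_string(frame_count));
  SetCrashKey("min_frames", std::to_string(min_frames));
  SketchCheckGT(frame_count, min_frames);
}
```

With this shape, a report produced by `ProcessFrames(2, 5)` contains the keys
`frame_count` and `min_frames`, which is the information the bare failure
message lacks.
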
21:02 SHARON: So this makes me think of some stuff we mentioned in the last
episode, which was with Dana. Check it out if you haven't. But one of the types
we mentioned there was SafeRef, which enforces a certain condition. And if that
fails - so in the case of a SafeRef, it ensures that the value you have there
is not null. And if that's ever not true, then you do get a crash similar to if
a CHECK fails. So in general, would you say it's better practice to enforce and
make sure your assumptions are held in these other, more structural ways than
relying on CHECKs instead?

21:41 PETER: So let me see if I can get at what you actually want out of that
one. So if we look at - there's a `raw_ref` type, right? So what's good with
the `raw_ref` is that you have a type that annotates that this thing cannot
possibly be null. So if you assign to it, and you're assigning a null pointer,
your program is going to crash, and you don't need to think about whether you
throw a null pointer in or not. If you keep passing a `raw_ref` around, then
that's essentially you passing around a non-null pointer. And therefore, you
don't have to check that it's not null pointer in every step of the way. You
only need to do it when you're - I mean, the type will do it for you, but it
only needs to happen when you're converting from a pointer to a ref,
essentially, or a `raw_ref`. And what's so good about that is now you have
the - previously, you might just CHECK that this isn't called with null pointer
or whatever. But then you would do that for four or five arguments. And you'd
be like, null pointer CHECKs are this part of the function body. And then it
just gets super-noisy. But if you're using the `raw_ref` types, then the
semantics of the type will enforce that for you. And you don't have to think
about that when reading the code, because usually when you read the code,
you're going to be like, it's a pointer. Can it be null or not? What does it
point to? And this thing will at least tell you, it can't be null. And you
still have the question of, what does it point to? And that's fine. So I like
enforcing this through types more than checking those assumptions, and then
checking inside of what happens. If you were to assign to this `raw_ref`, then
it's going to crash in the constructor if you have a null pointer. And then
based on that stack trace, if we have good stack data, you're going to know at
what line you created the `raw_ref`. And therefore, it's equivalent to checking
for not null pointer, because you can trust the type to do the checking. And
since I know Dana made this, I can probably with 200% certainty say that it's a
CHECK and not a DCHECK. But we do have a couple of other places where you have
a WeakPtr that shouldn't be dereferenced on the wrong sequence. And those are
complicated words. And that, unfortunately, is a DCHECK. So we're hitting some
sort of - I don't know if that CHECK is actually expensive, or if it should be
a CHECK, or if it could be a CHECK. I think, especially, if you're in core
types, the size overhead of adding a CHECK is negligible, because all of the
users of it benefit from that CHECK. So unless it's incredibly -

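The idea of letting a type carry the "cannot be null" guarantee can be shown
with a deliberately simplified wrapper. This is only a sketch under stated
assumptions: `NonNullRef`, `FromPtr`, `Widget`, and `TotalSize` are hypothetical
names, the class is far smaller than any real Chromium type, and it throws
where production code would crash.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, simplified non-null reference wrapper.
template <typename T>
class NonNullRef {
 public:
  // Converting from a raw pointer is the only place a null check is needed.
  static NonNullRef FromPtr(T* ptr) {
    if (ptr == nullptr) {
      throw std::invalid_argument("NonNullRef created from a null pointer");
    }
    return NonNullRef(*ptr);
  }

  // A C++ reference cannot be null, so no check is needed here.
  explicit NonNullRef(T& ref) : ptr_(&ref) {}

  T& operator*() const { return *ptr_; }
  T* operator->() const {
    assert(ptr_ != nullptr);  // Invariant established at construction.
    return ptr_;
  }

 private:
  T* ptr_;
};

struct Widget {
  int size = 0;
};

// No null checks in the body: the parameter types already rule null out.
int TotalSize(NonNullRef<Widget> a, NonNullRef<Widget> b) {
  return a->size + b->size;
}
```

Because the failure happens where the wrapper is created, the resulting stack
points at the caller that supplied the null pointer rather than at some later
use, and function bodies like `TotalSize` stay free of repeated null checks.
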
24:28 SHARON: What do you mean by core types?

24:30 PETER: Say that you make a `scoped_refptr` something, that ref pointer is
used everywhere. So if you CHECKed in the destructor, then you're validating
all of the clients of your `scoped_refptr`. So for one CHECK, you get the
price of a lot of CHECKing. Whereas if in your client code you're validating
some parameters of an API call that only gets called once, then that's one
CHECK you add for one case. But if your code gets re-used, then your CHECK gets
a lot more value. And it's also easier to get parameters wrong sometimes if you
have 500 clients that are calling your API. You can't trust all of them to get
it right. Whereas if you're just developing your feature, and it's only used by
your feature, then you can be a little bit more certain with how it's being
called. I would say, still add CHECKs, because code evolves over time. It's
sort of like how you can add unit tests to make sure that no one breaks your
code in the future. If you add CHECKs, then no one can break your code in the
future.

25:37 SHARON: Mm-hmm. OK. So you mentioned a few things about how CHECKs and
DCHECKs are changing. [AUDIO OUT] what is currently in the works, and what is
the long-term goal and plan for CHECKs and DCHECKs.

25:53 PETER: So currently what's in the works is we've made sure that some
libraries that we use, like Abseil and WebRTC, which is a first-party
third-party library, that they both use Chrome's crash reporting system, which
means that you get more predictable crash stacks because it's using the
immediate crash macro. But also, you get the fatal logging field that I talked
about. That gets logged as part of crash dumps. So you hopefully have more
glanceable, actionable crash reports whenever a CHECK is violated inside of
Abseil, or in WebRTC, as it were. And then upcoming is we want to make sure
that we keep an eye out for our DCHECKs on other platforms, such as Mac. I know
that there's some issues with getting that fatal log field in the GPU process,
and I'm working on fixing that as well. So hopefully, it just means more
reports for the things you care about and easier to action on reports. That's
what we're hoping.

27:03 SHARON: If people think that this sounds really cool, want to have some
more involvement, or want to ask more questions, what's a good place for them
to do that?

27:11 PETER: I like Slack as a thing for this. So the #cxx channel on Slack,
the #base channel on Slack, the #halp channel on Slack is really good. #halp is
really, I think, unintimidating. You can just throw whatever question you have
in there, and I happen to be around there. If you can find out what my last
name is through sheer force of will, you can send me an email to my Chromium
username. What else would we have? I think if they want to get involved, just
add CHECKs to your code. That's a really good way to do it. Just make sure that
your code does what you expect it to in more cases.

27:48 SHARON: Maybe if you have a CL, and you're just doing some drive-by
cleanup, you can turn some DCHECKs into CHECKs also?

27:56 PETER: If your reviewer is cool with that, I'm cool with that. Otherwise,
you can just try to hope for us making that policy that we use CHECKs - if it's
something we care about, we use a CHECK instead of a DCHECK, unless we have a
really good reason to use a DCHECK. And that would be performance.

28:15 SHARON: That sounds good. And one last question is, what do you want
people to take away as their main takeaway from this discussion?

28:26 PETER: I think validating code assumptions is really valuable. So you
think that you're pretty smart when you're writing something, or you remember -
I mean, you're sometimes kind of smart when you're writing something. And
you're like, this can't possibly be wrong. And in practice, looking at crash
reports, these things are wrong all the time. So please validate any
assumptions that you make. It's also, I would say, better than a comment,
because it's a comment that doesn't get outdated without you noticing it. So, I
think, validate your assumptions to make sure that your code is more robust.
And validate properties you care about. And don't be afraid to use CHECKs.

29:13 SHARON: All right. That sounds like a good summary. Thank you very much
for being here, Peter. It was great to learn about DCHECKs.

29:18 PETER: Yeah. Thanks for having me.

29:24 SHARON: Action. Hello.

29:26 PETER: Oh. Take four.

29:29 SHARON: [LAUGHS] Take four. And action.

488
docs/transcripts/wuwt-e03-content.md
Normal file
@ -0,0 +1,488 @@
# What’s Up With //content

This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 3, a 2022 video discussion between [Sharon (yangsharon@chromium.org)
and John (jam@chromium.org)](https://www.youtube.com/watch?v=SD3cjzZl25I).

The transcript was automatically generated by speech-to-text software. It may
contain minor errors.

---

What lives in the content directory? What is the content layer? How does it fit
into Chrome and the web at large? Here to answer all that and more is today’s
special guest, John, who not only is a Content owner, but actually split the
codebase to create the Content layer.

Notes:
- https://docs.google.com/document/d/1EJnG5gK8rQwHkdZTKl8vIwx9oScP8TaKBgwzBafIh9M/edit

Links:
- [//content/README.md](https://crsrc.org/c/content/README.md)
- [//content/public/README.md](https://crsrc.org/c/content/public/README.md)
- [What's Up With Pointers](https://www.youtube.com/watch?v=MpwbWSEDfjM)

---

00:00 SHARON: Hello, and welcome to "What's Up with That", the series that
demystifies all things Chrome. I'm your host, Sharon, and today, we're talking
about content. What lives in the content directory? What is the content layer?
How does it fit into Chrome and the web at large? Here to answer all of that
and more is today's special guest, John. He's not only a content owner, but
actually split the code base to create the content layer. Since then, a theme
of his work has been Chrome's architecture, and how to make it usable by
others. He's been involved far and wide across Chrome, but today, we're
focusing on content. John, welcome to the program.

00:33 JOHN: Hi, everyone, and thanks for setting this up, Sharon. My name's
John, and I'm happy to try to shed some light and history on this part of the
Chrome codebase. I've had the pleasure of working on a lot of different parts
of Chrome over a number of years I've worked on it. A theme of my work has been
on the architecture of Chrome and making it reusable by other products. And one
of the projects has been splitting up the codebase and helping create this
content layer.

01:02 SHARON: So, can you tell us what the content layer is? Because content is
a very overloaded term, and we're going to say it a lot today. So you mentioned
the content layer. Can you tell us what that is?

01:10 JOHN: Yes. The content layer is a part of the Chrome codebase that's
responsible for the multiprocess sandbox implementation of our platform.

01:24 SHARON: And another term that I had heard a lot tossed around before I
really understood what was going on was the content public API. So is that the
same as the content layer, or is that different?

01:36 JOHN: It's part of it. So the content component is very large, and so,
we've surrounded it by this small public API. So that you hide the
implementation details and the private directories, and then, embedders just
only have access to a small public layer.

01:56 SHARON: How did we end up with this content layer? Can you give us a bit
of history of how we came up with it? And also, maybe why it's called content?

02:02 JOHN: Sure. The history is - in the beginning, Chrome, like all software
projects, begins nice and easy to understand. But over time, as you add a lot
more features to go from zero users to billions of users, it becomes harder to
understand. Small files and small classes become much larger. Small functions
kind of get numerous hooks to talk to every feature, because they want to know
when something happens. And so, this idea started that let's separate the
product, the things that make Google Chrome what it is, from the platform,
which is what any browser, any minimal browser doing the latest HTML specs,
would need to implement them in a sandboxed, multiprocess way. And so, content
was the lower part, and that's how it started.

02:58 SHARON: How did we get the name content?

02:58 JOHN: The name is like a pun. And when we started Chrome, one of the
ideas was, we'll focus on content and not Chrome, and so, the browser will get
out of the way. Chrome is a term used to refer to all the user interface parts
of the browser. And so, we said, it's going to be content and not Chrome. And
so, when you open Chrome, you just see a very small UI. Most of what you see is
the content. And so, when we split the directory, it was originally called
src/chrome, and so, the content part, that's the pun. That's where it came
from.

03:34 SHARON: That's fun. Earlier, you mentioned embedders of content. Can you
tell us what an embedder of content is? And this is part of why I was very
excited about this episode, because I was working on a team where we were
embedders of content for a long time, well over a year, and it took me a long
time to really understand what that was. Because, as you mentioned now,
Chrome's grown a lot. When you work on a very specific thing, understanding
these more general concepts, like what is content and what is a content
embedder, is less important to what you do day-to-day. But can you tell us what
an embedder of content is?

04:13 JOHN: Sure. An embedder of content is simply anybody who chooses to use
that code to build a browser on top of it. And so, in the beginning, right when
we did this, the goal was just to have one embedder. Or not the goal, what we
had was just one embedder. It was Chrome. But then, right away, we were like,
you know what? It would be nice for people who work on content and not the
feature part to build a smaller binary. It builds faster. It debugs faster,
runs faster. And so, we built this minimal example, also for other people,
called content shell. And then, we started running tests against that, and that
was the first - or the second embedder of content. And then since then, what
was unexpected, what we started for code health reasons turned out to be very
useful for other projects to restart - or start building their browser from.
And so, things like Android WebView, which was using its own fork of WebKit,
then started using content. That was one first-party example. But then, other
projects came along. Things like Electron and Chromium Embedded Framework, all
started building not just products on top of it, but other frameworks.

05:30 SHARON: That was really surprising to learn about, because it seems
unsurprising that you would build another browser based on Chromium. And people
have heard about this when Edge switched over to Chromium. But to learn that
things like Electron are built around content seems really surprising, because
that's very different from what a browser is.

05:52 JOHN: But they have common needs. They have some HTML data, and they want
to render it and do so in a safe, and stable, and secure way. And that's not
their value add, working on that code. So it's better for them to use something
else.

06:11 SHARON: That makes sense. You also mentioned that Chrome is dependent on
content. And when I first started working on Chrome as an intern, I had it
told to me so many times because I couldn't remember that Chrome can depend on
content, but not the other way around. So can you tell us a bit about this
layering, and why it's there?

06:31 JOHN: I should also start by saying, content is not just - when we say
content, often what we mean, you embed content. You embed content and
everything that sits below it in the layer tree. So that includes things like
Blink, our rendering engine. V8, our JavaScript engine. Net, our networking
library, and so on. And also, you can talk to the content public APIs, but
also, sometimes, you talk to the Blink API and the files, and V8, and so on.

07:07 SHARON: So you have this many-layered API or product? And, at the bottom,
we have things like Net, Blink, and those probably have dependencies on them
that I don't know about. And on top of that, we have content, and then, on top
of that, we have Chrome?

07:23 JOHN: Right. And so, Chrome, as an embedder of content, can include
directly from the content public API. But since content can have multiple
embedders, it can't include Chrome. If content reached out directly to Chrome,
then other people wouldn't be able to use it. Because if you try to bring in
this code, it includes files from a directory that you're not using. So,
instead, the content public API, it has APIs going two different directions.
One direction is going into content, and then, one direction are these abstract
interfaces that go out from content. And any embedder has to implement them.
And so, these usually end in terms like client or delegate. And these are
implemented by Chrome, and that's how content is able to call back to it. But
then, any other, of course, product or embedder can also implement these same
interfaces.

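The two directions of the API described above can be sketched with one
abstract interface and one concrete class. All names here (`EmbedderClient`,
`ContentLayer`, `MyBrowserClient`) are hypothetical and chosen for
illustration; the real interfaces in //content/public, such as
`ContentBrowserClient`, are much larger. The lower layer only ever sees the
abstract interface, so it never needs to include an embedder's headers.

```cpp
#include <cassert>
#include <string>

// --- Lower ("content") layer: knows nothing about any specific embedder. ---

// Interface going OUT of the lower layer; every embedder implements it.
class EmbedderClient {
 public:
  virtual ~EmbedderClient() = default;
  virtual std::string GetProductName() const = 0;
  virtual bool ShouldAllowNavigation(const std::string& url) = 0;
};

// API going INTO the lower layer; it calls back through the client.
class ContentLayer {
 public:
  explicit ContentLayer(EmbedderClient* client) : client_(client) {
    assert(client_ != nullptr);
  }

  // Returns a status line describing what happened.
  std::string Navigate(const std::string& url) {
    if (!client_->ShouldAllowNavigation(url)) {
      return client_->GetProductName() + " blocked " + url;
    }
    return client_->GetProductName() + " loaded " + url;
  }

 private:
  EmbedderClient* client_;
};

// --- Embedder layer: depends on the layer below, never the reverse. ---

class MyBrowserClient : public EmbedderClient {
 public:
  std::string GetProductName() const override { return "MyBrowser"; }
  bool ShouldAllowNavigation(const std::string& url) override {
    return url.rfind("https://", 0) == 0;  // Allow only https:// URLs.
  }
};
```

A second embedder would add its own `EmbedderClient` subclass without any
change to `ContentLayer`, which is the property the layering rule protects.
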
08:23 SHARON: You mentioned Blink and also some things called delegate and
whatever. So we have a lot of things called something something host in
content. Can you talk a bit about what the relationship between content and
Blink is? Because there's a lot of mirroring in terms of how they might be set
up, and how they relate to each other.

08:37 JOHN: So Blink was the rendering engine that originally started as
WebKit. And we forked, and we named it Blink a number of years ago. And that did
not have any concept of processes. So it was something that you call it in one
process, and it does its job. And you give it whatever data it needs, and it
gives you back the rendered data. And you can poke at it or whatever you want
to do with it. But you needed to wrap that with some - you needed a bunch of
code around it to make it multi-process. And also, to figure out when it needs
something that's not available in the sandbox that it runs in, you have to
provide that data. And so, this is where the content layer comes in. It's the
one that wraps the rendering engine and uses the networking library and other
things to be able to create a fully working browser.

09:33 SHARON: More about processes. So it's easy to think, maybe, that the
content - the relationship between the content layer and the browser process.
So can you just talk a bit about how processes work in content? And what the
content API provides in terms of accessing these processes?

09:54 JOHN: So the content code runs in - it's the initial process that runs.
Content starts up, and then - and so, it's in the browser process. But it also
creates the render processes for where Blink runs. It creates a GPU process
that talks to the GPU and where a bunch of the compositing happens. It creates
a network process where we do networking. It creates other processes, things
like audio on some platforms, storage process to isolate storage. And then, a
lot of short lived processes for security and stability reasons. And so, you
can have processes that run content code, but, sometimes, an embedder wants to
run its own code in a different process. So it could re-use the same helpers
that content has for creating a process, and we'll use that. And then, I think
I didn't fully answer your previous question yet, which was the host part. So,
often, you'll have classes in Blink that are running in the renderer process,
and you need an equivalent class to drive it from the browser process. And
that's where we often have the host suffix. So it'd be like a class for -

11:11 SHARON: Can you give an example of -

11:11 JOHN: Yes. So, for example, every renderer process has a class in content
browser called render process host. And then, every tab object in Blink will
have this class called render view, and then, in content browser, it will have
this class called render view host.

11:36 SHARON: Those are classes that, depending on what you work on, you might
see pop up quite a bit. And there's a lot of them. They're all called render
something host, and it's a bit tough to keep them straight. But that makes
sense as to why they're called render and - why render and host are in the
names for them. So you just listed a bunch of different process types. The GPU
process, the browser process, render processes. And, usually, whenever we have
different processes, we have some security boundary between them. Can you talk
a bit about how security and the content layer overlap? Is the content API a
security boundary? What happens if someone calls it maliciously? What could go
wrong if they do and do it successfully?

12:26 JOHN: So the security boundaries in any browser built on top of content
are the processes. We separate things to not just have render processes per
tab, but there are multiple render processes per tab thanks to the amazing work
of the Site Isolation project. And that's what split up different iframes into
different processes. And so, how they talk, all these processes talk through
IPC, and our current IPC system's called Mojo. And so, any time you talk, you
use Mojo between processes. You're usually talking between processes of
different privileges. And so, one could be sandboxed and the other one not
sandboxed. Or one could be sandboxed, and the other one only partially
sandboxed. So you have to scrutinize any time you use these Mojo calls to make
sure that they can't inadvertently lead to a security vulnerability. Now, even
so, as hard as you try, people could still misuse code. Or, also, embedders
like Chrome or other content embedders can add their own IPCs. So content
obviously doesn't know about the IPCs from other layers, and so, it's possible
that it could be an embedder of content that has a security vulnerability in
their own Mojo calls. And so, content doesn't know about them, so it can't do
anything about them. You could write insecure code in content. You can also
write insecure code in an embedder, and if someone finds a vulnerability - so
let's say someone finds a vulnerability in Blink, and maybe they're only
running their code in a minimal content shell. Maybe they can't find any other
Mojo calls that they can abuse to be able to get access to the browser process.
But maybe someone else, an embedder, is a more full-featured browser. It has
more IPC surface, and that could be more of an attack surface for that - to
start with that Blink vulnerability and then to hop into the browser process.

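One way to picture "scrutinizing" a call that crosses a privilege boundary is
a privileged handler that treats every field of an incoming message as
untrusted. The sketch below is not Mojo and does not use any Chromium API:
`ReadFileRequest`, `FileAccessHost`, `GrantAccess`, and `OnReadFile` are
hypothetical names, and the validation rules are only examples of the kind of
checks a browser-side handler might apply to input from a sandboxed process.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical message a sandboxed renderer sends to the browser process.
struct ReadFileRequest {
  int renderer_id;
  std::string path;
};

enum class IpcResult { kOk, kRejected };

// Hypothetical browser-side (privileged) handler.
class FileAccessHost {
 public:
  // Called by trusted browser code, for example after the user picks a file.
  void GrantAccess(int renderer_id, const std::string& path) {
    assert(!path.empty());
    grants_.push_back({renderer_id, path});
  }

  // Called with data that came from a less privileged process. A compromised
  // renderer can send anything, so nothing in `request` is assumed valid.
  IpcResult OnReadFile(const ReadFileRequest& request) const {
    // Reject path traversal outright.
    if (request.path.find("..") != std::string::npos) {
      return IpcResult::kRejected;
    }
    // Only allow exactly what this renderer was previously granted.
    for (const ReadFileRequest& grant : grants_) {
      if (grant.renderer_id == request.renderer_id &&
          grant.path == request.path) {
        return IpcResult::kOk;
      }
    }
    return IpcResult::kRejected;
  }

 private:
  std::vector<ReadFileRequest> grants_;
};
```

Each additional handler like this is more surface that a compromised renderer
can probe, which is why an embedder with more IPCs has more to review.
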
14:38 SHARON: And if you gain control of the browser process, that's a very
highly privileged process.

14:44 JOHN: Because that has full access to your system. So that's the point
where you can leave persistent changes to the user's system, which is pretty
bad.

14:55 SHARON: That sounds not great. So if you're an average, say, Chrome
engineer, that could be anyone. This is probably not too much of a concern. All
the stuff we mentioned, this is good to know. How would a Chrome engineer who
doesn't directly work on content or in the content directory interact with the
content layer?

15:20 JOHN: Well, they might need a signal from Blink, for example. That's
often how someone will do that. They'll be working on a feature in the browser,
and everything works great. But then, they'll be like, I just need something
from Blink. But it's not there. And so, sometimes, they'll have to add an IPC
between processes, and that might interact. They'll be like, how do I get it?
It's in Blink. It's in the render view class. So I need an interface that talks
between each render view host and each render view. And that's how they might
get - well, that would be how they get interaction with the multiprocess part
of it. But if someone is just working on something only in a browser process,
they might still be trying to get information about the current tab. And that's
represented by a WebContents class in content. So they'll look in
content/public/browser, and they'll see WebContents. And there will be a lot of
interfaces that hang off it. So they'll be looking at it, going through a trail
of interfaces and classes to be able to get more information on what's going on
in the current tab.

16:29 SHARON: Can you give us a quick overview of the WebContents class?
Because it is one, massive, and two, called something like web contents. Which
suggests it's important because content plus the web, and it's also something
you see all over the place. So can you just give us a quick overview of what
that class does? What it's for? What it represents?

16:46 JOHN: Yes. Things now are a lot more complicated than before, but if you
go back in a time machine and see how these things started, you can roughly
think that, in initial Chrome, every tab had a class to represent the content
in that tab, and that was called web contents. And then, it was called web
contents because we had other classes. We used to be able to put native stuff
in a tab. And so, that would be called tab contents. But that's gone now, and
we just have web contents. So that's where the name comes from. And then even,
for example, there was render process host, which I mentioned earlier. And
then, each tab, each web contents roughly translated into one render process.
And so, now, it's a bit more complicated. There are examples where you can have
web contents inside of web contents, and that's more esoteric, and most people
don't have to deal with it. And then, so that's what web contents is for. It
will do things like take input and feed it to the page. Every time there's a
permission prompt, you usually go through that, if a page wants access to a
microphone, or video, and so on. It keeps track of the navigation going on.
What's the current URL? What's the pending URL? It uses other classes to drive
all that stuff as you send out the network request and get it back. And that's
not inside of web contents itself, but it's driven by other helper classes.

18:28 SHARON: I tend to think of content as being the home of navigation, which
I think is a decent way to think about it and also is maybe biased because of
the stuff I've been working on. But you have Chrome, and navigation, and
content, and all the stuff here. And then, separately, you have the actual web,
the internet. And that has things like actual websites. And there are web
standards, and there's things like HTML. And these two things somehow have to
intersect. But being on the Chrome side, working on Chrome, apart from writing
some browser tests, maybe, you never really interact with any of the more web
things. JavaScript, you don't really touch. That's more Blink and HTML only in
a test kind of thing. So how do these web standards - there's navigation web
standards and all that. How do we actually make sure that they're implemented
in Chrome? And where does that happen?

19:32 JOHN: So that happens all over the code, but there's a few critical
|
||||
directories. If you look at net at a low level, a lot of IETF - and some
|
||||
aspects will be implemented there at that layer. Either net or in the network
|
||||
service, which is the code that runs inside the network process. Then you've got
|
||||
V8, of course, our JavaScript engine, and that has to follow the ECMAScript
|
||||
standards. And then, there's a lot of the platform standards. Either some of
|
||||
them don't need multiple processes to implement them, so they'll
|
||||
just be completely inside Blink. But some of them require multiple processes,
|
||||
things that need access to devices and so on. And so, that implementation will
|
||||
be split across Blink and content browser. But then, how do you ensure that,
|
||||
not only do you implement this correctly, but also that you don't regress it?
|
||||
So there's a whole slew of tests. There's the Blink tests, which used to be
|
||||
called the layout tests. And those run across the simple, simple test cases for
|
||||
many features to make sure that each one works. And there's also this cool
|
||||
thing where we share now a lot of these tests with other embedders, and that
|
||||
way, you run the same test in every browser. And so, when you write a test, you
|
||||
don't have to write it n times. You can just write it once. So that's how we
|
||||
ensure that we meet the specs.
|
||||
|
||||
21:10 SHARON: That makes sense. Because I've been pointed - when I was looking
|
||||
into a class. What does this do? I've been linked to, say, one of the HTML
|
||||
specs or web specs. But the whole time, I'm just thinking, how do we make
|
||||
sure - or who's checking that we're actually implementing this and correctly?
|
||||
But these tests seem like a good way to do it and also ensure some level of
|
||||
consistency across browsers. Assuming you know whether or not the browser you
|
||||
use chooses to run these tests or not, I guess.
|
||||
|
||||
21:41 JOHN: And as an engineer on a project like that, the first time you'll
|
||||
hit them is when you're breaking them. You'll make a change, and I think this
|
||||
is fine. And then, you send it to the commit queue, and you break some layout
|
||||
tests. What's happening to me today? And then, you have to drill into it. And
|
||||
the nice thing about layout test is because each one is small, you - it's
|
||||
faster to figure out what you broke because it's just like, hopefully, you only
|
||||
broke a small number of tests.
|
||||
|
||||
22:06 SHARON: For sure, and it's a good example of why we have all these tests,
|
||||
is to make sure things don't break. So that is pretty much all the questions I
|
||||
have written down. Is there anything else generally content layer, content
|
||||
public API-ish related that is interesting that maybe we didn't get a chance to
|
||||
cover?
|
||||
|
||||
22:31 JOHN: Yes. The most common question is people will be like, well, does
|
||||
this belong in content or not? So I can have a chance to point people towards
|
||||
the README files and content/README that describes what's supposed to go in
|
||||
or not. And then, there's also a content/public/README that describes the
|
||||
guidelines we have for the API to make it consistent.
|
||||
|
||||
22:59 SHARON: I've definitely seen those questions before. You're updating one
|
||||
of the content public APIs. Does this belong? While we're here, can you give us
|
||||
a quick breakdown heuristic of what things generally would belong in the
|
||||
content public API versus you put it up for review, and the reviewer's like,
|
||||
no. This does not belong in content public?
|
||||
|
||||
23:24 JOHN: So sometimes, for example, for convenience, maybe the Chrome layer
|
||||
wants to call other parts of Chrome layer, but they don't have a direct
|
||||
connection. Or maybe a Chrome layer wants to talk to a different component. And
|
||||
so, they'll be like, we'll add something to the content API, and then, that
|
||||
way, Chrome can talk to this other part of Chrome or this other component
|
||||
through content as a shortcut. We don't allow that, and the reason for that is
|
||||
anybody who's gone through the content public directory, it's already huge. And
|
||||
so, we feel that if Chrome wants to talk to Chrome or to another layer, they
|
||||
should have their own API to each other directly instead of hopping through
|
||||
content. Just because the content API's already very large, very complex, hard
|
||||
to understand. So we don't want to add things that are absolutely not necessary
|
||||
to it. And another thing we try to do is to not add multiple ways of doing
|
||||
something. We only add something to the content API when there's no other way
|
||||
of getting this data from inside content, or there's no other way of getting
|
||||
this data from the embedder to content. But if there's something similar that
|
||||
can do the same thing, we push back on that.
|
||||
|
||||
24:39 SHARON: And also, test-only things? Are those generally OK, or do you
|
||||
want to generally avoid those?
|
||||
|
||||
24:45 JOHN: Well, yes. Test-only methods, we try really hard to avoid - not just for the
|
||||
public API, but inside, because we don't want to bloat the binary. But we do
|
||||
have content public tests, which is - gives you a lot more leeway to poke at
|
||||
things in your browser test, for example, or your unit tests. Another thing is,
|
||||
we also have guidelines for how the API should be. We don't have, really,
|
||||
concrete classes. It's mostly abstract interfaces. And so, there's a bunch of
|
||||
rules there, and they're all listed in content/public/README. Just so people
|
||||
know the guidelines we have for interfaces there.
|
||||
|
||||
25:28 SHARON: On the Chrome binary point, how much is the size of the binary
|
||||
dependent on the size of the content public API? Is that a big part of the
|
||||
binary, or is it small enough where, sure, we want to keep it from being
|
||||
unnecessarily large but not too much of an issue?
|
||||
|
||||
25:48 JOHN: The size is not going to come as much from the content/public API
|
||||
but just from the entire content and all its dependencies. And those are in the
|
||||
tens of megabytes. So, sometimes, for example, if you're bundling the content
|
||||
layer, you're not going to be a small binary. You'll just start off in the 30
|
||||
megabyte range or 40 megabyte range once you put everything together.
|
||||
|
||||
26:12 SHARON: And I guess that's something you have to be more conscious of if
|
||||
you're working in content versus another directory even in Chrome, is that you
|
||||
have to be wary of your dependencies more so than anywhere else. Not only for
|
||||
Chrome, but also, any other embedders who might want to use content.
|
||||
|
||||
26:31 JOHN: Yes. And so, for example, if someone's trying to add something in
|
||||
Chrome, we also ask, does this have to be in content? Or can this be part of
|
||||
Chrome, so that not every embedder has to pay that cost if they don't need it?
|
||||
Maybe we'll have an interface, and the embedder can plug the data in through
|
||||
that way but still not have it in content. Another problem, of course, with
|
||||
having data inside content is that not all embedders update at the same speed.
|
||||
So if you're putting something in content, it can quickly go stale, the
|
||||
content, whatever the data is if you're not updating quickly.
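The "have an interface, and the embedder can plug the data in" idea can be
sketched roughly as below. This is only an illustration: every class and method
name is invented for the example and is not part of the real content API, and
the "blocked hosts" data is a made-up stand-in for whatever data an embedder
might supply.

```cpp
#include <string>
#include <vector>

// Hypothetical interface owned by the lower layer. The names here are
// invented for illustration and are not part of the real content API.
class EmbedderDataProvider {
 public:
  virtual ~EmbedderDataProvider() = default;
  virtual std::vector<std::string> GetBlockedHosts() const = 0;
};

// Lower-layer code depends only on the abstract interface. It ships no data
// of its own, so an embedder that does not need the feature passes nullptr
// and pays nothing for it.
class LowerLayerFilter {
 public:
  explicit LowerLayerFilter(const EmbedderDataProvider* provider)
      : provider_(provider) {}

  bool IsBlocked(const std::string& host) const {
    if (!provider_) return false;
    for (const std::string& blocked : provider_->GetBlockedHosts()) {
      if (blocked == host) return true;
    }
    return false;
  }

 private:
  const EmbedderDataProvider* provider_;  // Not owned; may be null.
};

// One embedder's implementation. It can refresh its data on its own release
// schedule without the lower layer changing.
class ExampleEmbedderProvider : public EmbedderDataProvider {
 public:
  std::vector<std::string> GetBlockedHosts() const override {
    return {"blocked.example"};
  }
};
```

The point of the split is that the data lives with the embedder that wants it,
while the lower layer only knows the shape of the question it can ask.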
|
||||
|
||||
27:08 SHARON: That makes sense. So we mentioned a bit of what content is, a bit
|
||||
of the history of it. Can you tell us anything about what are upcoming changes
|
||||
that might happen in content? What is the future of the content directory, the
|
||||
layer, the API?
|
||||
|
||||
27:28 JOHN: Well, it's always changing. It's not static, driven by the needs of
|
||||
the product. And so, you look at big changes happening today like MPArch to
|
||||
support various use cases that we didn't have, or we never thought about
|
||||
initially. And that's where the web contents, inside web content, some of that
|
||||
comes in. There are big changes like banning, for example, raw pointers and
replacing them with `raw_ptr`. So we can try to address some of the
|
||||
security problems we have with Use-After-Frees. So that's where, when you look
|
||||
at the content code or the Chrome code in general, too, you might see a little
|
||||
bit different than that average C++ project that you see. You'll be like, I'm
|
||||
getting errors if I try to have a raw pointer, and that's why.
|
||||
|
||||
28:15 SHARON: Check out episode one for more on that. We'll link it below.
|
||||
Anything else random content-related or otherwise you would like to share with
|
||||
us?
|
||||
|
||||
28:27 JOHN: I think the only other thing I would add is familiarize yourself
|
||||
with the READMEs in content/README and content/public/README before making
|
||||
changes. That will make the author and reviewer's time more efficient. And if
|
||||
you're working on content and below, you can build Content Shell instead of
|
||||
Chrome. That would be faster to build and debug and hopefully make you more
|
||||
productive.
|
||||
|
||||
28:52 SHARON: Good tips. Hopefully, our viewers follow them. They would never
|
||||
try to change a content/public API without reading the READMEs first. Well,
|
||||
thank you so much, John, for sitting down and chatting with me about content.
|
||||
This was great, and, hopefully, people find it useful.
|
||||
|
||||
29:14 JOHN: And thank you for hosting me, Sharon.
|
||||
|
||||
29:23 SHARON: Did you start working on Chrome from the very start, or just -
|
||||
obviously, pre-launch. Because, I think, based on your profile pictures, the
|
||||
picture of that comic book that released when Chrome did - which I was lucky
|
||||
enough to get a copy of when I was an intern. Shout-out Peter. But that
|
||||
obviously suggests you were a major contributor before the public launch of
|
||||
Chrome. So were you working on Chrome from the very beginning?
|
||||
|
||||
29:47 JOHN: I was not. It took about six months. I tried to join from the
|
||||
beginning, but I couldn't join right at the beginning. So my sneaky way was I
|
||||
found another project under that same director who was running Chrome, and
|
||||
then, once that project finished in six months, then I jumped into Chrome.
|
||||
|
||||
30:09 SHARON: And do you ever think about how crazy it is from this thing that
|
||||
you worked on, effectively, from the start before the public launch? To what it
|
||||
is now where Chrome is one of the foundational pieces of the internet at large?
|
||||
Any time the internet gets run period, probably something in Chrome is running
|
||||
like the net stack, if not, obviously, the browser? Do you ever think about
|
||||
that, and how crazy that is? And your place in that?
|
||||
|
||||
30:38 JOHN: Yes. It's amazing how far Chrome has come, and it's really humbling
|
||||
to see it be the number one browser, the most widely-used browser. Because when
|
||||
we were working on Chrome at the beginning, we were just trying to guess what
|
||||
market share it would have. And people would be like, it'll be 10%, and we're
|
||||
like, no way. Even the people working on it, we didn't think that was going to
|
||||
be possible. So to see users really enjoy using it, and for us to keep
|
||||
demonstrating value by sticking to our four principles, security and stability,
|
||||
simplicity and speed. And seeing people not just adopt Chrome as a product, but
|
||||
Chromium as a platform is - it's beyond our wildest dreams. And it's a
|
||||
responsibility that we have every time we make a change to Chrome to all these
|
||||
users and developers using it. You were asking earlier, how does it feel to be
|
||||
here from the start? There's almost a sense of feeling super lucky. But also
|
||||
this humbling feeling where we started in Chrome when it was really small, and
|
||||
our knowledge built up incrementally as it got more complicated. But so, it's
|
||||
like, well, what if I was to jump in Chrome today? It seems like way too many -
|
||||
the code is so complicated now compared to before. This almost responsibility
|
||||
we have as being in Chrome for a long time to share knowledge, to help people
|
||||
pick it up. Because we would ourselves struggle if we were to jump in now.
|
||||
|
||||
32:22 SHARON: Yes. As those people, we certainly did struggle. But people are
|
||||
pretty smart, I think, and they can figure it out. But that doesn't mean you
|
||||
can't make it easier for the people in the future figuring it out. Or even
|
||||
people who - you just work on a different part. If I were to do anything in
|
||||
Blink, I'm just like -
|
||||
|
||||
32:44 JOHN: Same. I've been on it for a long time. I don't touch Blink.
|
||||
|
||||
32:50 SHARON: Yes. Yes.
|
968
docs/transcripts/wuwt-e04-tests.md
Normal file
@ -0,0 +1,968 @@
|
||||
# What’s Up With Tests
|
||||
|
||||
This is a transcript of [What's Up With
|
||||
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
|
||||
Episode 4, a 2022 video discussion between [Sharon (yangsharon@chromium.org)
|
||||
and Stephen
|
||||
(smcgruer@chromium.org)](https://www.youtube.com/watch?v=KePsimOPSro).
|
||||
|
||||
The transcript was automatically generated by speech-to-text software. It may
|
||||
contain minor errors.
|
||||
|
||||
---
|
||||
|
||||
Testing is important! What kinds of tests do we have in Chromium? What are they
|
||||
all about? Join in as Stephen, who led Chrome's involvement in web platform
|
||||
tests, tells us all about them.
|
||||
|
||||
Notes:
|
||||
- https://docs.google.com/document/d/1SRoNMdPn78vwZVX7YzcdpF4cJdHTIV6JLGiVC2dJUaI/edit
|
||||
|
||||
---
|
||||
|
||||
00:00 SHARON: Hello, everyone, and welcome to "What's Up With That," the series
|
||||
that demystifies all things Chrome. I'm your host, Sharon. And today we're
|
||||
talking testing. Within Chrome, there are so many types of tests. What are they
|
||||
all? What's the difference? What are the Chromium-specific quirks? Today's
|
||||
guest is Stephen. He previously led Chrome's involvement in web platform tests.
|
||||
Since then, he's worked on rendering, payments, and interoperability. As a fun
|
||||
aside, he's one of the first people I met who worked on Chrome and is maybe
|
||||
part of why I'm here today. So welcome, Stephen.
|
||||
|
||||
00:33 STEPHEN: Well, thank you very much for having me, Sharon, I'm excited to
|
||||
be here.
|
||||
|
||||
00:33 SHARON: Yeah, I'm excited to have you here. So today, we're in for maybe
|
||||
a longer episode. Testing is a huge topic, especially for something like
|
||||
Chrome. So grab a snack, grab a drink, and let's start. We'll start with what
|
||||
are all of the things that we have testing for in Chrome. What's the purpose of
|
||||
all these tests we have?
|
||||
|
||||
00:51 STEPHEN: Yeah. It's a great question. It's also an interesting one
|
||||
because I wanted to put one caveat on this whole episode, which is that there
|
||||
is no right answer in testing. Testing, even in the literature, never mind in
|
||||
Chromium itself, is not a solved problem. And so you'll hear a lot of different
|
||||
opinions. People will have different thoughts. And I'm sure that no matter how
|
||||
hard we try, by the end of this episode, our inbox will be filled with angry
|
||||
emails from people being like, no, you are wrong. So all of the stuff we're
|
||||
saying here today is my opinion, albeit I'll try and be as useful as possible.
|
||||
But yeah, so why do we test was the question, right? So there's a lot of
|
||||
different reasons that we write tests. Obviously, correctness is the big one.
|
||||
You're writing some code, you're creating a feature, you want it to be correct.
|
||||
Other reasons we write them, I mean, tests can be useful as a form of
|
||||
documentation in itself. If you're ever looking at a class and you're like,
|
||||
what does - why is this doing this, why is the code doing this, the test can
|
||||
help inform that. They're also useful - I think a topic of this podcast is sort
|
||||
of security. Tests can be very useful for security. Often when we have a
|
||||
security bug, we go back and we write what are called regression tests, so at
|
||||
least we try and never do that security failure again. And then there are other
|
||||
reasons. We have tests for performance. We have tests for - our launch process
|
||||
uses tests. There's lots and lots of reasons we have tests.
|
||||
|
||||
02:15 SHARON: Now that you've covered all of the different reasons why we test,
|
||||
how do we do each of these types of tests in Chromium? What are the test types
|
||||
we have?
|
||||
|
||||
02:27 STEPHEN: Yeah. So main test types we have in Chromium, unit tests,
|
||||
browser tests, what we call web tests, and then there's a bunch of more
|
||||
specialized ones, performance tests, testing on Android, and of course manual
|
||||
testing.
|
||||
|
||||
02:43 SHARON: We will get into each of these types now, I guess. The first type
|
||||
of test you mentioned is unit tests. Why don't you tell us a quick rundown of
|
||||
what unit tests are. I'm sure most people have encountered them or heard of
|
||||
them before. But just a quick refresher for those who might not.
|
||||
|
||||
02:55 STEPHEN: Yeah, absolutely. So as the name implies, a unit test is all
|
||||
about testing a unit of code. And what that is, is not very well defined. But you
|
||||
can usually think of it as just a class, a file, a small isolated component
|
||||
that doesn't have to talk to all the other bits of the code to work. Really,
|
||||
the goal is on writing something that's testing just the code under test - so
|
||||
that new method you've added or whatever. And it should be quick and easy to
|
||||
run.
|
||||
|
||||
03:22 SHARON: So on the screen now we have an example of a pretty typical unit
|
||||
test we see in Chrome. So there's three parts here. Let's go through each of
|
||||
them. So the first type - the first part of this is `TEST_P`. What is that
|
||||
telling us?
|
||||
|
||||
03:38 STEPHEN: Yeah. So that is - in Chromium we use a unit testing framework
|
||||
called Google test. It's very commonly used for C++. You'll see it all over the
|
||||
place. You can go look up documentation. The test macros, that's what this is,
|
||||
are essentially the hook into Google test to say, hey, the thing that's coming
|
||||
here is a test. There's three types. There is just `TEST`, which just says
|
||||
here is a function. It is a test function. `TEST_F` says that you basically
|
||||
have a wrapper class. It's often called a test fixture, which can do some
|
||||
common setup across multiple different tests, common teardown, and that sort of
|
||||
thing. And finally, `TEST_P` is what we call a parameterized test. And what
|
||||
this means is that the test can take some input parameters, and it will run the
|
||||
same test with each of those values. Very useful for things like when you want
|
||||
to test a new flag. What happens if the flag is on or off?
|
||||
|
||||
04:34 SHARON: That's cool. And a lot of the things we're mentioning for unit
|
||||
test also apply to browser test, which we'll cover next. But the
|
||||
parameterization is an example of something that carries over to both. So
|
||||
that's the first part. That's the `TEST_P`, the macro. What's the second part,
|
||||
PendingBeaconHostTest? What is that?
|
||||
|
||||
04:54 STEPHEN: Yeah. So that is the fixture class, the test container class I
|
||||
was talking about. So in this case, we're assuming that in order to write a
|
||||
beacon test, whatever that is, they have some set up, some teardown they need
|
||||
to do. They might want to encapsulate some common functionality. So all you
|
||||
have to do to write one of these classes is, you declare a C++ class and you
|
||||
subclass from the Google test class name.
|
||||
|
||||
05:23 SHARON: So this is a `TEST_P`, but you mentioned that this is a fixture.
|
||||
So are fixture tests a subset of parameterized tests?
|
||||
|
||||
05:35 STEPHEN: Parameterized tests are a subset of fixture tests, is that the
|
||||
right way around to put it? All parameterized tests are fixture tests. Yes.
|
||||
|
||||
05:41 SHARON: OK.
|
||||
|
||||
05:41 STEPHEN: You cannot have a parameterized test that does not have a
|
||||
fixture class. And the reason for that is how Google test actually works under
|
||||
the covers is it passes those parameters to your test class. You will have to
|
||||
additionally extend from the `testing::WithParamInterface`. And that says, hey,
|
||||
I'm going to take parameters.
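To make the fixture and parameter relationship concrete, here is a small model
written without GoogleTest. It is a sketch under assumptions, not how
GoogleTest is implemented: `MiniFixture`, `MiniParamFixture`, and
`RunParameterized` are invented names. It shows the two ideas from the
discussion: a fixture supplies shared setup and teardown, and a parameterized
test is a fixture that is additionally handed one parameter value per run.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a test fixture: common setup and teardown that
// several tests can share. This is NOT GoogleTest, only a model of the idea.
class MiniFixture {
 public:
  virtual ~MiniFixture() = default;
  virtual void SetUp() { log.push_back("SetUp"); }
  virtual void TearDown() { log.push_back("TearDown"); }
  std::vector<std::string> log;
};

// A parameterized fixture is still a fixture; it additionally receives the
// parameter value for the current run.
class MiniParamFixture : public MiniFixture {
 public:
  explicit MiniParamFixture(bool flag_enabled) : flag_enabled_(flag_enabled) {}
  bool GetParam() const { return flag_enabled_; }

 private:
  bool flag_enabled_;
};

// Runs the same test body once per parameter value, with a fresh fixture
// (and therefore fresh setup and teardown) each time. Returns the combined
// log of everything that happened.
std::vector<std::string> RunParameterized(
    const std::vector<bool>& params,
    const std::function<void(MiniParamFixture&)>& body) {
  std::vector<std::string> combined;
  for (bool param : params) {
    MiniParamFixture fixture(param);
    fixture.SetUp();
    body(fixture);
    fixture.TearDown();
    combined.insert(combined.end(), fixture.log.begin(), fixture.log.end());
  }
  return combined;
}

// Example: the same body runs with the flag off, then with the flag on.
std::vector<std::string> RunFlagOnAndOff() {
  return RunParameterized({false, true}, [](MiniParamFixture& fixture) {
    fixture.log.push_back(fixture.GetParam() ? "body(flag on)"
                                             : "body(flag off)");
  });
}
```

In real GoogleTest code the fixture would derive from `testing::TestWithParam<T>`
(or from `testing::Test` plus `testing::WithParamInterface<T>`), and the list of
values would be supplied with `INSTANTIATE_TEST_SUITE_P`.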
|
||||
|
||||
06:04 SHARON: OK. But not all fixture tests are parameterized tests.
|
||||
|
||||
06:04 STEPHEN: Correct.
|
||||
|
||||
06:04 SHARON: OK. And the third part of this, SendOneOfBeacons. What is that?
|
||||
|
||||
06:10 STEPHEN: That is your test name. Whatever you want to call your test,
|
||||
whatever you're testing, put it here. Again, naming tests is as hard as naming
|
||||
anything. A lot of yak shaving, finding out what exactly you should call the
|
||||
test. I particularly enjoy when you see test names that themselves have
|
||||
underscores in them. It's great.
|
||||
|
||||
06:30 SHARON: Uh-huh. What do you mean by yak shaving?
|
||||
|
||||
06:35 STEPHEN: Oh, also known as painting a bike shed? Bike shed, is that the
|
||||
right word? Anyway, generally speaking -
|
||||
|
||||
06:40 SHARON: Yeah, I've heard -
|
||||
|
||||
06:40 STEPHEN: arguing about pointless things because at the end of the day,
|
||||
most of the time it doesn't matter what you call it.
|
||||
|
||||
06:46 SHARON: OK, yeah. So I've written this test. I've decided it's going to
|
||||
be parameterized. I've come up with a test fixture for it. I have finally named
|
||||
my test. How do I run my tests now?
|
||||
|
||||
06:57 STEPHEN: Yeah. So all of the tests in Chromium are built into different
|
||||
test binaries. And these are usually named after the top level directory that
|
||||
they're under. So we have `components_unittests`, `content_unittests`. I think
|
||||
the Chrome one is just called `unit_tests` because it's special. We should
|
||||
really rename that. But I'm going to assume a bunch of legacy things depend on
|
||||
it. Once you have built whichever the appropriate binary is, you can just run
|
||||
that from your `out` directory, so `out/release/components_unittests`, for
|
||||
example. And then that, if you don't pass any flags, will run every single
|
||||
components unit test. You probably don't want to do that. They're not that
|
||||
slow, but they're not that fast. So there is a flag `--gtest_filter`, which
|
||||
allows you to filter. And then it takes a test name after that. The format of
|
||||
test names is always test class dot test name. So for example, here
|
||||
PendingBeaconHostTest dot SendOneOfBeacons.
|
||||
|
||||
08:04 SHARON: Mm-hmm. And just a fun aside for that one, if you do have
|
||||
parameterized tests, it'll have an extra slash and a number at the end. So
|
||||
normally, whenever I use it, I just put a star before and after. And that
|
||||
generally does - covers the cases.
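The "star before and after" habit is easier to see with a toy matcher. The
function below is a simplified model of wildcard filtering, assuming only `*`
(any run of characters) and `?` (exactly one character); the real
`--gtest_filter` supports more than this. `MatchesFilter` is an invented name,
and the `All/` prefix in the usage notes is a hypothetical instantiation name
for a parameterized test.

```cpp
#include <cstddef>
#include <string>

// Simplified model of wildcard matching for a test filter: '*' matches any
// run of characters (including none) and '?' matches exactly one character.
bool MatchesFilter(const std::string& pattern, const std::string& name) {
  std::size_t p = 0;
  std::size_t n = 0;
  std::size_t star = std::string::npos;  // Position of the last '*' seen.
  std::size_t retry = 0;  // Where in `name` to resume after that '*'.
  while (n < name.size()) {
    if (p < pattern.size() && pattern[p] == '*') {
      // Remember the star and first try matching it against nothing.
      star = p++;
      retry = n;
    } else if (p < pattern.size() &&
               (pattern[p] == '?' || pattern[p] == name[n])) {
      ++p;
      ++n;
    } else if (star != std::string::npos) {
      // Mismatch: let the last '*' swallow one more character and retry.
      p = star + 1;
      n = ++retry;
    } else {
      return false;
    }
  }
  // Any trailing stars can match the empty remainder.
  while (p < pattern.size() && pattern[p] == '*') ++p;
  return p == pattern.size();
}
```

With this model, the exact pattern `PendingBeaconHostTest.SendOneOfBeacons` does
not match a parameterized name such as
`All/PendingBeaconHostTest.SendOneOfBeacons/0`, while
`*PendingBeaconHostTest.SendOneOfBeacons*` does. On the command line that
corresponds to something like
`out/release/components_unittests --gtest_filter="*PendingBeaconHostTest.SendOneOfBeacons*"`,
quoted so the shell does not expand the stars itself.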
|
||||
|
||||
08:17 STEPHEN: Yeah, absolutely.
|
||||
|
||||
08:23 SHARON: Cool. So with the actual test names, you will often see them
|
||||
prefixed with either `MAYBE_` or `DISABLED_`, or before the test, there will be
|
||||
an ifdef with usually a platform and then depending on the cases, it'll prefix
|
||||
the test name with something. So I think it's pretty clear what these are
|
||||
doing. Maybe is a bit less clear. Disabled pretty clear what that is. But can
|
||||
you tell us a bit about these prefixes?
|
||||
|
||||
08:51 STEPHEN: Yeah, absolutely. So this is our way of trying to deal with that
|
||||
dreaded thing in testing, flake. So when a test is flaky, when it doesn't
|
||||
produce a consistent result, sometimes it fails. We have in Chromium a whole
|
||||
continuous integration waterfall. That is a bunch of bots on different
|
||||
platforms that are constantly building and running Chrome tests to make sure
|
||||
that nothing breaks, that bad changes don't come in. And flaky tests make that
|
||||
very hard. When something fails, was that a real failure? And so when a test is
|
||||
particularly flaky and is causing sheriffs, the build sheriffs trouble, they
|
||||
will come in and they will disable that test. Basically say, hey, sorry, but
|
||||
this test is causing too much pain. Now, as you said, the `DISABLED_` prefix,
|
||||
that's pretty obvious. If you put that in front of a test, Google test knows
|
||||
about it and it says, nope, will not run this test. It will be compiled, but it
|
||||
will not be run. `MAYBE_` doesn't actually mean anything. It has no meaning to
|
||||
Google test. But that's where you'll see, as you said, you see these ifdefs.
|
||||
And that's so that we can disable it on just one platform. So maybe your test
|
||||
is flaky only on Mac OS, and you'll see basically, oh, if Mac OS, change the
|
||||
name from maybe to disabled. Otherwise, define maybe as the normal test name.
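The `MAYBE_` trick is plain preprocessor renaming, which the sketch below
models without GoogleTest. Two assumptions to flag: it checks the
compiler-defined `__APPLE__` macro so that it is self-contained, whereas
Chromium code would normally check one of its own build flags, and
`MINI_STRINGIFY`, `kRegisteredName`, and `IsDisabled` are invented helpers that
only exist to show which name the test framework would end up seeing.

```cpp
#include <string>

// Conditionally rename the test at preprocessing time. On the "flaky"
// platform the name gains the DISABLED_ prefix; everywhere else MAYBE_
// simply expands to the normal test name.
#if defined(__APPLE__)
#define MAYBE_SendOneOfBeacons DISABLED_SendOneOfBeacons
#else
#define MAYBE_SendOneOfBeacons SendOneOfBeacons
#endif

// Two-step stringification so the macro is expanded before being quoted.
#define MINI_STRINGIFY_IMPL(x) #x
#define MINI_STRINGIFY(x) MINI_STRINGIFY_IMPL(x)

// The name the test framework would actually see on this platform.
const char kRegisteredName[] = MINI_STRINGIFY(MAYBE_SendOneOfBeacons);

// Model of the framework's rule: names starting with "DISABLED_" are
// compiled but skipped at run time. "MAYBE_" itself has no special meaning.
bool IsDisabled(const std::string& test_name) {
  const std::string prefix = "DISABLED_";
  return test_name.compare(0, prefix.size(), prefix) == 0;
}
```

In a real test file the macro is used directly as the test name, for example
`TEST_P(PendingBeaconHostTest, MAYBE_SendOneOfBeacons)`.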
|
||||
|
||||
10:14 SHARON: Makes sense. We'll cover flakiness a bit later. But yeah, that's
|
||||
a huge problem. And we'll talk about that for sure. So these prefixes, the
|
||||
parameterization and stuff, this applies to both unit and browser tests.
|
||||
|
||||
10:27 STEPHEN: Yeah.
|
||||
|
||||
10:27 SHARON: Right? OK. So what are browser tests? Chrome's a browser. Browser
|
||||
test, seems like there's a relation.
|
||||
|
||||
10:34 STEPHEN: Yeah. They test the browser. Isn't it obvious? Yeah. Browser
|
||||
tests are our version - our sort of version of an integration or a functional
|
||||
test depending on how you look at things. What that really means is they're
|
||||
testing larger chunks of the browser at once. They are integrating multiple
|
||||
components. And this is somewhere that I think Chrome's a bit weird because in
|
||||
many large projects, you can have an integration test that doesn't bring your
|
||||
entire product up in order to run. Unfortunately, or fortunately, I guess
|
||||
it depends on your viewpoint, Chrome is so interconnected, it's so
|
||||
interdependent, that more or less we have to bring up a huge chunk of the
|
||||
browser in order to connect any components together. And so that's what browser
|
||||
tests are. When you run one of these, there's a massive amount of machinery in
|
||||
the background that goes ahead, and basically brings up the browser, and
|
||||
actually runs it for some definition of what a browser is. And then you can
|
||||
write a test that pokes at things within that running browser.
|
||||
|
||||
11:42 SHARON: Yeah. I think I've heard before multiple times is that browser
|
||||
tests launch the whole browser. And that's -
|
||||
|
||||
11:47 STEPHEN: More or less true. It's - yeah.
|
||||
|
||||
11:47 SHARON: Yes. OK. Does that also mean that because you're running all this
|
||||
stuff that all browser tests have fixtures? Is that the case?
|
||||
|
||||
11:59 STEPHEN: Yes, that is the case. Absolutely. So there is only - I think
|
||||
it's - oh my goodness, probably on the screen here somewhere. But it's
|
||||
`IN_PROC_BROWSER_TEST_F` and `IN_PROC_BROWSER_TEST_P`. There is no version that
|
||||
doesn't have a fixture.
|
||||
|
||||
12:15 SHARON: And what does the in proc part of that macro mean?
|
||||
|
||||
12:15 STEPHEN: So that's, as far as I know - and I might get corrected on this.
|
||||
I'll be interested to learn. But it refers to the fact that we've run these in
|
||||
the same process. Normally, the whole Chromium is a multi-process architecture.
|
||||
For the case of testing, we put that aside and just run everything in the same
|
||||
process so that it doesn't leak, basically.
|
||||
|
||||
12:38 SHARON: Yeah. There's flags when you run them, like `--single-process`.
|
||||
And then there's `--single-process-tests`. And they do slightly different
|
||||
things. But if you do run into that, probably you will be working with people
|
||||
who can answer and explain the differences between those more. So something
|
||||
that I've seen quite a bit in browser and unit tests, and only in these, are
|
||||
run loops. Can you just briefly touch on what those are and what we use them
|
||||
for in tests?
|
||||
|
||||
13:05 STEPHEN: Oh, yeah. That's a fun one. I think actually on a previous
|
||||
episode of this very program, you and Dana talked a little bit around the fact
|
||||
that Chrome is not a completely synchronous program, that we do task
|
||||
splitting. We have a task scheduler. And so run loops are part of that,
|
||||
basically. They're part of our stack for handling asynchronous tasks. And so
|
||||
this comes up in testing because sometimes you might be testing something
|
||||
that's not synchronous. It takes a callback, for example, rather than returning
|
||||
a value. And so if you just wrote your test as normal, you call the function,
|
||||
and you don't - you pass a callback, but then your test function ends. Your
|
||||
test function ends before that callback ever runs. Run loop gives you the
|
||||
ability to say, hey, put this callback into some controlled run loop. And then
|
||||
after that, you can basically say, hey, wait on this run loop. I think it's
|
||||
often called quit when idle, which basically says keep running until you have
|
||||
no more tasks to run, including our callback, and then finish. They're
|
||||
powerful. They're very useful, obviously, with asynchronous code. They're also
|
||||
a source of a lot of flake and pain. So handle with care.
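The "test function ends before the callback ever runs" problem can be
reproduced with a toy task queue. This is a single-threaded model under stated
assumptions, not Chromium's `base::RunLoop`: `MiniRunLoop`, `AddAsync`, and
`AddAndWait` are invented for the example. It shows that an asynchronous result
only becomes visible once the test explicitly runs the loop until it is idle.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Hypothetical, single-threaded stand-in for a run loop: a queue of tasks
// that only run when the loop is explicitly pumped.
class MiniRunLoop {
 public:
  void PostTask(std::function<void()> task) {
    tasks_.push_back(std::move(task));
  }

  // Runs tasks (including any posted while running) until none remain.
  void RunUntilIdle() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// An asynchronous API: instead of returning the sum, it posts a task that
// later hands the sum to `callback`.
void AddAsync(MiniRunLoop& loop, int a, int b,
              std::function<void(int)> callback) {
  loop.PostTask([a, b, callback = std::move(callback)] { callback(a + b); });
}

// What a test has to do: start the async work, pump the loop, then check.
// Returns -1 if the callback never ran.
int AddAndWait(bool pump_loop) {
  MiniRunLoop loop;
  int result = -1;
  AddAsync(loop, 2, 3, [&result](int sum) { result = sum; });
  if (pump_loop) loop.RunUntilIdle();
  return result;
}
```

Calling `AddAndWait(false)` models the broken test, where the check happens
before the callback has run; `AddAndWait(true)` models the fixed one.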
|
||||
|
||||
14:24 SHARON: Yeah. One tip is maybe using the `--gtest_repeat` flag.
|
||||
So that one lets you run your test however many times you tell it to.
|
||||
|
||||
14:30 STEPHEN: Yeah.
|
||||
|
||||
14:36 SHARON: And that can help with testing for flakiness or if you're trying
|
||||
to debug something flaky. In tests, we have a variety of macros that we use. In
|
||||
the unit test and the browser tests, you see a lot of macros, like `EXPECT_EQ`,
|
||||
`EXPECT_GT`. These seem like they're part of maybe Google test. Is that true?
|
||||
|
||||
14:54 STEPHEN: Yeah. They come from Google test itself. So they're not
technically Chromium-specific. But they basically come in two flavors. There's
the `EXPECT_SOMETHING` macros. And there's the `ASSERT_SOMETHING` macros. And
the biggest thing to know about them is that expect doesn't actually cause - it
causes a test to fail, but it doesn't stop the test from executing. The test
will continue to execute the rest of the code. Assert actually throws an
exception and stops the test right there. And so this can be useful, for
example, if you want to line up a bunch of expects. And your code still makes
sense. You're like, OK, I expect to return object, and it's got these fields.
And I'm just going to expect each one of the fields. That's probably fine to
do. And it may be nice to have output that's like, no, actually, both of these
fields are wrong. Assert is used when you're like, OK, if this fails, the rest
of the test makes no sense. Very common thing you'll see. Call an API, get back
some sort of pointer, hopefully a smart pointer, hey. And you're going to be
like, assert that this pointer is non-null because if this pointer is null,
everything else is just going to be useless.

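The difference between the two flavors can be imitated in a few lines. The
macros below are hypothetical miniatures, not the real GoogleTest macros, and
`Widget` and `CheckWidget` are invented for the sketch. One detail worth
knowing: by default, a failed GoogleTest `ASSERT_*` records a fatal failure and
returns from the current function rather than throwing, which is what the
imitation does too.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of what one test run did.
struct MiniTestResult {
  std::vector<std::string> failures;
  std::vector<std::string> steps_reached;
};

// Non-fatal flavor: record the failure and keep executing the test body.
#define MINI_EXPECT_EQ(result, a, b)                  \
  do {                                                \
    if (!((a) == (b))) {                              \
      (result).failures.push_back(#a " == " #b);      \
    }                                                 \
  } while (0)

// Fatal flavor: record the failure and return from the current function.
#define MINI_ASSERT_TRUE(result, cond)                \
  do {                                                \
    if (!(cond)) {                                    \
      (result).failures.push_back(#cond);             \
      return;                                         \
    }                                                 \
  } while (0)

struct Widget {
  int width;
  int height;
};

void CheckWidget(const Widget* widget, MiniTestResult& result) {
  // If the pointer is null, nothing below makes sense, so stop here.
  MINI_ASSERT_TRUE(result, widget != nullptr);
  result.steps_reached.push_back("past null check");
  // Each field is checked independently; both failures can be reported.
  MINI_EXPECT_EQ(result, widget->width, 10);
  MINI_EXPECT_EQ(result, widget->height, 20);
  result.steps_reached.push_back("end of test");
}
```
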
15:57 SHARON: I think we see a lot more expects than asserts in general
anecdotally from looking at the tests. Do you think, in your opinion, that
people should be using asserts more generously rather than expects, or do we
maybe want to see what happens - what does go wrong if things continue beyond a
certain point?

16:15 STEPHEN: Yeah. I mean, general guidance would be just keep using expect.
That's fine. It's also not a big deal if your test actually just crashes. It's
a test. It can crash. It's OK. So use expects. Use an assert if, like I said,
the rest of the test doesn't make any sense. So most often if you're like, hey,
is this pointer null or not and I'm going to go do something with this pointer,
assert it there. That's probably the main time you'd use it.

16:45 SHARON: A lot of the browser test classes, like the fixture classes
themselves, are subclassed from other base classes.

16:53 STEPHEN: Mm-hmm.

16:53 SHARON: Can you tell us about that?

16:53 STEPHEN: Yeah. So basically, we have one base class for browser tests. I
think it's `BrowserTestBase`, I think it's literally called, which sits at the
bottom and does a lot of the very low level setup of bringing up a browser. But
as folks know, there's more than one browser in the Chromium project. There is
Chrome, the Chrome browser that is the more full-fledged version. But there's
also content shell, which people might have seen. It's built out of content.
It's a very simple browser. And then there are other things. We have a headless
mode. There is a headless Chrome you can build which doesn't show any UI. You
can run it entirely from the command line.

17:32 SHARON: What's the difference between headless and content shell?

17:39 STEPHEN: So content shell does have a UI. If you run content shell, you
will actually see a little UI pop up. What content shell doesn't have is all of
those features from Chrome that make Chrome Chrome, if you will. So I mean,
everything from bookmarks, to integration with having an account profile, that
sort of stuff is not there. I don't think content shell even supports tabs. I
think it's just one page you get. It's almost entirely used for testing. But
then, headless, sorry, as I was saying, it's just literally there is no UI
rendered. It's just headless.

18:13 SHARON: That sounds like it would make -

18:13 STEPHEN: And so, yeah. And so - sorry.

18:13 SHARON: testing faster and easier. Go on.

18:18 STEPHEN: Yeah. That's a large part of the point, as well as when you want
to deploy a browser in an environment where you don't see the UI. So for
example, if you're running on a server or something like that. But yeah. So for
each of these, we then subclass that `BrowserTestBase` in order to provide
specific types. So there's content browser test. There's headless browser test.
And then of course, Chrome has to be special, and they called their version in
process browser test because it wasn't confusing enough. But again, it's sort
of straightforward. If you're in Chrome, `/chrome`, use
`in_process_browser_test`. If you're in `/content`, use `content_browsertest`.
It's pretty straightforward most of the time.

18:58 SHARON: That makes sense. Common functions you see overridden from those
base classes are these set up functions. So there's set up, set up on main
thread, there seems to be a lot of different set up options. Is there anything
we should know about any of those?

19:13 STEPHEN: I don't think that - I mean, most of it's fairly
straightforward. I believe you should mostly be using setup on main thread. I
can't say that for sure. But generally speaking, setup on main thread, teardown
on main thread - or is it shutdown main thread? I can't remember - whichever
the one is for afterwards, are what you should be usually using in a browser
test. You can also usually do most of your work in a constructor. That's
something that people often don't know about testing. I think it's something
that's changed over time. Even with unit tests, people use the setup function a
lot. You can just do it in the constructor a lot of the time. Most of the
background initialization has already happened.

19:45 SHARON: I've definitely wondered that, especially when you have things in
the constructor as well as in a setup method. It's one of those things where
you just kind of think, I'm not going to touch this because eh, but -

19:57 STEPHEN: Yeah. There are some rough edges, I believe. Set up on main
thread, some things have been initialized that aren't around when your class is
being constructed. So it is fair. I'm not sure I have any great advice unless -
other than you may need to dig in if it happens.

20:19 SHARON: One last thing there. Which one gets run first, the setup
functions or the constructor?

20:19 STEPHEN: The constructor always happens first. You have to construct the
object before you can use it.

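The ordering can be seen in a small sketch. `MiniFixture` and `RunOneTest` are
hypothetical; the runner only imitates the order in which a test framework
drives a fixture object, and it is not GoogleTest or any Chromium base class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical fixture that logs each phase of its lifetime.
class MiniFixture {
 public:
  explicit MiniFixture(std::vector<std::string>& log) : log_(log) {
    log_.push_back("constructor");
  }
  ~MiniFixture() { log_.push_back("destructor"); }

  void SetUp() { log_.push_back("SetUp"); }
  void TestBody() { log_.push_back("TestBody"); }
  void TearDown() { log_.push_back("TearDown"); }

 private:
  std::vector<std::string>& log_;
};

// Imitates a framework running one test and returns the order of the phases.
std::vector<std::string> RunOneTest() {
  std::vector<std::string> log;
  {
    MiniFixture fixture(log);  // The object must exist before any method runs.
    fixture.SetUp();
    fixture.TestBody();
    fixture.TearDown();
  }  // The destructor runs here, after TearDown.
  return log;
}
```
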
20:25 SHARON: Makes sense. This doesn't specifically relate to a browser test
or unit test, but it does seem like it's worth mentioning, which is the content
public test API. So if you want to learn more about content and content public,
check out episode three with John. But today we're talking about testing. So
we're talking about content public test. What is in that directory? And how
does that - how can people use what's in there?

20:48 STEPHEN: Yeah. It's basically just a bunch of useful helper functions and
classes for when you are doing mostly browser tests. So for example, there are
methods in there that will automatically handle navigating the browser to a URL
and actually waiting till it's finished loading. There are other methods for
essentially accessing the tab strip of a browser. So if you have multiple tabs
and you're testing some cross tab thing, methods in there to do that. I think
that's probably where the content browser test - like base class lives there as
well. So take a look at it. If you're doing something that you're like, someone
should write - it's the basic - it's the equivalent of base in many ways for
testing. It's like, if you're like, someone should have written a library
function for this, possibly someone has already. And you should take a look.
And if they haven't, you should write one.

21:43 SHARON: Yeah. I've definitely heard people, code reviewers, say when you
want to add something that seems a bit test only to content public, put that in
content public test because that doesn't get compiled into the actual release
binaries. So if things are a bit less than ideal there, it's a bit more
forgiving for a place for that.

22:02 STEPHEN: Yeah, absolutely. I mean, one of the big things about all of our
test code is that you can actually make it so that it's in many cases not
compiled into the binary. And that is both useful for binary size as well as
you said in case it's concerning. One thing you can do actually in test, by the
way, for code that you cannot avoid putting into the binary - so let's say
you've got a class, and for the reasons of testing it because you've not
written your class properly to do a dependency injection, you need to access a
member. You need to set a member. But you only want that to happen from test
code. No real code should ever do this. You can actually name methods blah,
blah, blah `ForTest` or `ForTesting`. And this doesn't have any - there's no
code impact to this. But we have pre-submits that actually go ahead and check,
hey, are you calling this from code that's not marked as test code? And it will
then refuse to - it will fail the pre-submit on upload if that happens. So it
could be useful.

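As a minimal sketch of that naming convention, assuming a made-up class:
`RetryPolicy` and its members are hypothetical, and the pre-submit check itself
is not shown.

```cpp
#include <cassert>

// Hypothetical production class; the names are invented for illustration.
class RetryPolicy {
 public:
  int max_attempts() const { return max_attempts_; }

  // Test-only hook. The suffix changes nothing for the compiler; it is a
  // naming convention that tooling can search for in non-test callers.
  void SetMaxAttemptsForTesting(int attempts) { max_attempts_ = attempts; }

 private:
  int max_attempts_ = 3;
};
```
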
23:03 SHARON: And another thing that relates to that would be the friend test
or friend something macro that you see in classes. Is that a gtest thing also?

23:15 STEPHEN: It's not a gtest thing. It's just a C++ thing. So C++ has the
concept of friending another class. It's very cute. It basically just says,
this other class and I, we can access each other's internal states. Don't
worry, we're friends. Generally speaking, that's a bad idea. We write classes
for a reason to have encapsulation. The entire goal of a class is to
encapsulate behavior and to hide the implementation details that you don't want
to be exposed. But obviously, again, when you're writing tests, sometimes it is
the correct thing to do to poke a hole in the test and get at something. Very
much in the schools of thought here, some people would be like, you should be
doing dependency injection. Some people are like, no, just friend your class.
It's OK. If folks want to look up more, go look up the difference between open
box and closed box testing.

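A plain C++ friend declaration, of the kind a friend-test macro ultimately
produces, might look like the sketch below. `Counter` and `CounterTestPeer` are
hypothetical names. Note that friendship in C++ is one-directional: the class
that writes the `friend` declaration grants access, it does not gain any.

```cpp
#include <cassert>

class CounterTestPeer;  // Hypothetical test helper, defined below.

class Counter {
 public:
  void Increment() { ++count_; }

 private:
  friend class CounterTestPeer;  // Grants the helper access to private state.
  int count_ = 0;
};

// Lives with the test code and reads state that production callers cannot.
class CounterTestPeer {
 public:
  static int count(const Counter& counter) { return counter.count_; }
};
```
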
24:00 SHARON: For those of you who are like, oh, this sounds really cool, I
will learn more.

24:00 STEPHEN: Yeah, for my test nerds out there.

24:06 SHARON: [LAUGHS] Yeah, Stephen's got a club. Feel free to join.

24:06 STEPHEN: Yeah. [LAUGHTER]

24:11 SHARON: You get a card. Moving on to our next type of test, which is your
wheelhouse, which is web tests. This is something I don't know much about. So
tell us all about it.

24:22 STEPHEN: [LAUGHS] Yeah. This is my - this is where hopefully I'll shine.
It's the area I should know most about. But web tests are - they're an
interesting one. So I would describe them as our version of an end-to-end test
in that a web test really is just an HTML file, a JavaScript file that is when
you run it, you literally bring up - you'll remember I said that browser tests
are most of a whole browser. Web tests bring up a whole browser. It's just the
same browser as content shell or Chrome. And it runs that whole browser. And
the test does something, either in HTML or JavaScript, that then is asserted
and checked. And the reason I say that I would call them this, I have heard
people argue that they're technically unit tests, where the unit is the
JavaScript file and the entire browser is just, like, an abstraction that you
don't care about. I guess it's how you view them really. I view the browser as
something that is big and flaky, and therefore these are end-to-end tests. Some
people disagree.

25:22 SHARON: In our last episode, John touched on these tests and how the
scope that each test covers is very small. But how you run them is not. And I
guess you can pick a side that you feel that you like more and go with that. So
what are examples of things we test with these kind of tests?

25:49 STEPHEN: Yeah. So the two big categories of things that we test with web
tests are basically web APIs, so JavaScript APIs, provided by the browser to do
something. There are so many of those, everything from the fetch API for
fetching stuff to the web serial API for talking to devices over serial ports.
The web is huge. But anything you can talk to via JavaScript API, we call those
JavaScript tests. It's nice and straightforward. The other thing that web tests
usually encompass are what are called rendering tests or sometimes referred to
as ref tests for reference tests. And these are checking the actual, as the
first name implies, the rendering of some HTML, some CSS by the browser. The
reason they're called reference tests is that usually the way you do this to
check whether a rendering is correct is you set up your test, and then you
compare it to some image or some other reference rendering that you're like,
OK, this should look like that. If it does look like that, great. If it
doesn't, I failed.

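The comparison at the heart of a reference test can be sketched as an exact
pixel match. This is a simplified assumption: `Bitmap`, `CountMismatchedPixels`,
and `MatchesReference` are hypothetical, and a real harness may compare
renderings differently (for example, with tolerances).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bitmap with row-major pixels; not a real Chromium type.
struct Bitmap {
  int width;
  int height;
  std::vector<std::uint32_t> pixels;
};

// Returns how many pixels differ, or -1 if the dimensions do not match.
int CountMismatchedPixels(const Bitmap& actual, const Bitmap& reference) {
  if (actual.width != reference.width || actual.height != reference.height ||
      actual.pixels.size() != reference.pixels.size()) {
    return -1;
  }
  int mismatches = 0;
  for (std::size_t i = 0; i < actual.pixels.size(); ++i) {
    if (actual.pixels[i] != reference.pixels[i]) {
      ++mismatches;
    }
  }
  return mismatches;
}

// The test passes only if the rendering is identical to the reference.
bool MatchesReference(const Bitmap& actual, const Bitmap& reference) {
  return CountMismatchedPixels(actual, reference) == 0;
}
```
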
26:54 SHARON: Ah-ha. And are these the same as - so there's a few other test
names that are all kind of similar. And as someone who doesn't work in them,
they all kind of blur together. So I've also heard web platform tests. I've
heard layout tests. I've heard Blink tests, all of which do - all of which are
JavaScript HTML-like and have some level of images in them. So are these all
the same thing? And if not, what's different?

27:19 STEPHEN: Yeah. So yes and no, I guess, is my answer. So a long time ago,
there were layout tests basically. And that was something we inherited from the
WebKit project when we forked there, when we forked Chromium from WebKit all
those years ago. And they're exactly what I've described. They were both
JavaScript-based tests and they were also HTML-based tests for just doing
reference renderings. However, web platform test came up as an external project
actually. Web platform test is not a Chromium project. It is external upstream.
You can find them on GitHub. And their goal was to create a set of - a test
suite shared between all browsers so that all browsers could test - run the
same tests and we could actually tell, hey, is the web interoperable? Does it
work the same way no matter what browser you're on? The answer is, no. But
we're trying. And so inside of Chromium we said, that's great. We love this
idea. And so what we did was we actually import web platform test into our
layout tests. So web platform test now becomes a subdirectory of layout tests.
OK?

28:30 SHARON: OK. [LAUGHS]

28:30 STEPHEN: To make things more confusing, we don't just import them, but we
also export them. We run a continuous two-way sync. And this means that
Chromium developers don't have to worry about that upstream web platform test
project most of the time. They just land their code in Chromium, and a magic
process happens, and it goes up into the GitHub project. So that's where we
were for many years - layout tests, which are a whole bunch of legacy tests,
and then also web platform tests. But fairly recently - and I say that knowing
that COVID means that might be anything within the last three years because who
knows where time went - we decided to rename layout test. And partly, the name
we chose was web tests. So now you have web tests, of which web platform tests
are a subset, or a - yeah, subset of web test. Easy.

29:20 SHARON: Cool.

29:20 STEPHEN: [LAUGHS]

29:20 SHARON: Cool. And what about Blink tests? Are those separate, or are
those these altogether?

29:27 STEPHEN: I mean, if they're talking about the JavaScript and HTML, that's
going to just be another name for the web tests. I find that term confusing
because there is also the Blink tests target, which builds the infrastructure
that is used to run web tests. So that's probably what you're referring to,
like `blink_tests`. It is the target that you build to run these tests.

29:50 SHARON: I see. So `blink_tests` is a target. These other ones, web test
and web platform tests, are actual test suites.

29:57 STEPHEN: Correct. Yes. That's exactly right.

30:02 SHARON: OK. All right.

30:02 STEPHEN: Simple.

30:02 SHARON: Yeah. So easy. So you mentioned that the web platform tests are
cross-browser. But a lot of browsers are based on Chromium. Is it one of the
things where it's open source and stuff but the majority of people contributing
to these and maintaining it are Chrome engineers?

30:23 STEPHEN: I must admit, I don't know what that stat is nowadays. Back when
I was working on interoperability, we did measure this. And it was certainly
the case that Chromium is a large project. There were a lot of tests being
contributed by Chromium developers. But we also saw historically - I would like
to recognize Mozilla, most of all, who were a huge contributor to the web
platform test project over the years and are probably the reason that it
succeeded. And we also - web platform test also has a fairly healthy community
of completely outside developers. So people that just want to come along. And
maybe they're not able to or willing to go into a browser, and actually build a
browser, and muck with code. But they could write a test for something. They
can find a broken behavior and be like, hey, there's a test here, Chrome and
Firefox do different things.

31:08 SHARON: What are examples of the interoperability things that you're
testing for in these cross-browser tests?

31:17 STEPHEN: Oh, wow, that's a big question. I mean, really everything and
anything. So on the ref test side, the rendering test, it actually does matter
that a web page renders the same in different browsers. And that is very hard
to achieve. It's hard to make two completely different engines render some HTML
and CSS exactly the same way. But it also matters. We often see bugs where you
have a lovely - you've got a lovely website. It's got this beautiful header at
the top and some content. And then on one browser, there's a two-pixel gap
here, and you can see the background, and it's not a great experience for your
users. So ref tests, for example, are used to try and track those down. And
then, on the JavaScript side, I mean really, web platform APIs are complicated.
They're very powerful. There's a reason they are in the browser and you cannot
do them in JavaScript. And that is because they are so powerful. So for
example, web USB to talk to USB devices, you can't just do that from
JavaScript. But because they're so powerful, because they're so complicated,
it's also fairly easy for two browsers to have slightly different behavior. And
again, it comes down to what is the web developer's experience. When I try and
use the web USB API, for example, am I going to have to write code that's like,
if Chrome, call it this way, if Fire - we don't want that. That is what we do
not want for the web. And so that's the goal.

32:46 SHARON: Yeah. What a team effort, making the whole web work is. All
right. That's cool. So in your time working on these web platform tests, do you
have any fun stories you'd like to share or any fun things that might be
interesting to know?

33:02 STEPHEN: Oh, wow. [LAUGHS] One thing I like to bring up - I'm afraid it's
not that fun, but I like to repeat it a lot of times because it's weird and
people get tripped up by it - is that inside of Chromium, we don't run web
platform tests using the Chrome browser. We run them using content shell. And
this is partially historical. That's how layout tests run. We always ran them
under content shell. And it's partially for I guess what I will call
feasibility. As I talked about earlier, content shell is much simpler than
Chrome. And that means that if you want to just run one test, it is faster, it
is more stable, it is more reliable I guess I would say, than trying to bring
up the behemoth that is Chrome and making sure everything goes correctly. And
this often trips people up because in the upstream world of this web platform
test project, they run the test using the proper Chrome binary. And so they're
different. And different things do happen. Sometimes it's rendering
differences. Sometimes it's because web APIs are not always implemented in both
Chrome and content shell. So yeah, fun fact.

34:19 SHARON: Oh, boy. [LAUGHTER]

34:19 STEPHEN: Oh, yeah.

34:19 SHARON: And we wonder why flakiness is a problem. Ah. [LAUGHS]

34:19 STEPHEN: Yeah. It's a really sort of fun but also scary fact that even if
we put aside web platform test and we just look at layout test, we don't test
what we ship. Layout tests run in content shell, and then we turn around and
we're like, here's a Chrome binary. Like uh, those are different. But, hey, we
do the best we can.

34:43 SHARON: Yeah. We're out here trying our best. So that all sounds very
cool. Let's move on to our next type of test, which is performance. You might
have heard the term telemetry thrown around. Can you tell us what telemetry is
and what these performance tests are?

34:54 STEPHEN: I mean, I can try. We've certainly gone straight from the thing
I know a lot about into the thing I know very little about. But -

35:05 SHARON: I mean, to Stephen's credit, this is a very hard episode to find
one single guest for. People who are working extensively usually in content
aren't working a ton in performance or web platform stuff. And there's no one
who is - just does testing and does every kind of testing. So we're trying our
best. [INAUDIBLE]

35:24 STEPHEN: Yeah, absolutely. You just need to find someone arrogant enough
that he's like, yeah, I'll talk about all of those. I don't need to know the
details. It's fine. But yeah, performance test, I mean, the name is
self-explanatory. These are tests that are trying to ensure the performance of
Chromium. And this goes back to the four S's when we first started Chrome as a
project - speed, simplicity, security, and I've forgotten the fourth S now.
Speed, simplicity, security - OK, let's not reference the four S's then.
[LAUGHTER] You have the Comet. You tell me.

36:01 SHARON: Ah. Oh, I mean, I don't read it every day. Stability. Stability.

36:08 STEPHEN: Stability. God damn it. That's literally what the rest of this
is about. OK, where were we?

36:13 SHARON: We're leaving this in, don't worry. [LAUGHTER]

36:19 STEPHEN: Yeah. So the basic idea of performance test is to test
performance because as much as you can view behavior as a correctness thing, in
Chromium we also consider performance a correctness thing. It is not a good
thing if a change lands and performance regresses. So obviously, testing
performance is also hard to do absolutely. There's a lot of noise in any sort
of performance testing. And so, we do it essentially heuristically,
probabilistically. We run whatever the tests are, which I'll talk about in a
second. And then we look at the results and we try and say, hey, OK, is there a
statistically significant difference here? And there's actually a whole
performance sheriffing rotation to try and track these down. But in terms of,
yeah, you mentioned telemetry. That weird word. You're like, what is a
telemetry test? Well, telemetry is the name of the framework that Chromium
uses. It's part of the wider catapult project, which is all about different
performance tools. And none of the names, as far as I know, mean anything.
They're just like, hey, catapult, that's a cool name. I'm sure someone will
explain to me now the entire history behind the name catapult and why it's
absolutely vital. But anyway, so telemetry basically is a framework that when
you give it some input, which I'll talk about in a second, it launches a
browser, performs some actions on a web page, and records metrics about those
actions. So the input, the test essentially, is basically a collection of go to
this web page, do these actions, record these metrics. And I believe in
telemetry that's called a story, the story of someone visiting a page, I guess,
is the idea. One important thing to know is that because it's sort of insane to
actually visit real websites, they keep doing things like changing - strange.
We actually cache the websites. We download a version of the websites once and
actually check that in. And when you go run a telemetry test, it's not running
against literally the real Reddit.com or something. It's running against a
version we saved at some point.

38:31 SHARON: And how often - so I haven't really heard of anyone who actually
works on this and that we can't - you don't interact with everyone. But how -
as new web features get added and things in the browser change, how often are
these tests specifically getting updated to reflect that?

38:44 STEPHEN: I would have to plead some ignorance there. It's certainly also
been my experience as a browser engineer who has worked on many web APIs that
I've never written a telemetry test myself. I've never seen one added. My
understanding is that they are - a lot of the use cases are fairly general with
the hope that if you land some performance problematic feature, it will regress
on some general test. And then we can be like, oh, you've regressed. Let's
figure out why. Let's dig in and debug. But it certainly might be the case if
you are working on some feature and you think that it might have performance
implications that aren't captured by those tests, there is an entire team that
works on the speed of Chromium. I cannot remember their email address right
now. But hopefully we will get that and put that somewhere below. But you can
certainly reach out to them and be like, hey, I think we should test the
performance of this. How do I go about and do that?

39:41 SHARON: Yeah. That sounds useful. I've definitely gotten bugs filed
against me for performance stuff. [LAUGHS] Cool. So that makes sense. Sounds
like good stuff. And in talking to some people in preparation for this episode,
I had a few people mention Android testing specifically. Not any of the other
platforms, just Android. So do you want to tell us why that might be? What are
they doing over there that warrants additional mention?

40:15 STEPHEN: Yeah. I mean, I think probably the answer would just be that
Android is such a huge part of our code base. Chrome is a browser, a
multi-platform browser, runs on multiple desktop platforms, but it also runs on
Android. And it runs on iOS. And so I assume that iOS has its own testing
framework. I must admit, I don't know much about that at all. But certainly on
Android, we have a significant amount of testing framework built up around it.
And so there's the option, the ability for you to test your Java code as well
as your C++ code.

40:44 SHARON: That makes sense. And yeah, with iOS, because they don't use
Blink, I guess there's - that reduces the amount of test that they might need
to add, whereas on Android they're still using Blink. But there's a lot of
differences because it is mobile, so they're just, OK, we actually can test
those things. So let's go more general now. At almost every stage, you've
mentioned flakiness. So let's briefly run down, what is flakiness in a test?

41:14 STEPHEN: Yes. So flakiness for a test is just - the definition is just
that the test does not consistently produce the same output. When you're
talking about flakiness, you actually don't care what the output is. A test
that always fails, that's fine. It always fails. But a test that passes 90% of
the time and fails 10%, that's not good. That test is not consistent. And it
will cause problems.

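That definition can be written down directly. The sketch below is hypothetical
(`Verdict` and `Classify` are invented names) and assumes the runs are repeats
of the same test against unchanged code.

```cpp
#include <cassert>
#include <vector>

enum class Verdict { kNoData, kAlwaysPasses, kAlwaysFails, kFlaky };

// Classifies repeated runs of one unchanged test. Each entry is true if that
// run passed. Only disagreement between runs counts as flaky; a test that
// fails every time is consistent, so it is not flaky.
Verdict Classify(const std::vector<bool>& run_passed) {
  if (run_passed.empty()) return Verdict::kNoData;
  bool saw_pass = false;
  bool saw_fail = false;
  for (bool passed : run_passed) {
    if (passed) {
      saw_pass = true;
    } else {
      saw_fail = true;
    }
  }
  if (saw_pass && saw_fail) return Verdict::kFlaky;
  return saw_pass ? Verdict::kAlwaysPasses : Verdict::kAlwaysFails;
}
```
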
41:46 SHARON: What are common causes of this?

41:46 STEPHEN: I mean, part of the cause is, as I've said, we write a lot of
integration tests in Chromium. Whether those are browser tests, or whether
those are web tests, we write these massive tests that span huge stacks. And
what comes implicitly with that is timing. Timing is almost always the
problem - timing and asynchronicity. Whether that is in the same thread or
multiple threads, you write your test, you run it on your developer machine,
and it works. And you're like, cool, my test works. But what you don't realize
is that you're assuming that in some part of the browser, this function ran,
then this function ran. And that always happens in your developer machine
because you have this CPU, and this much memory, and et cetera, et cetera. Then
you commit your code, you land your code, and somewhere a bot runs. And that
bot is slower than your machine. And on that bot, those two functions run in
the opposite order, and something goes horribly wrong.

42:50 SHARON: What can the typical Chrome engineer writing these tests do in
the face of this? What are some practices that you generally should avoid or
generally should try to do more often that will keep this from happening in
your test?

43:02 STEPHEN: Yeah. So first of all, write more unit tests, write less browser
|
||||
tests, please. Unit tests are - as I've talked about, they're small. They're
|
||||
compact. They focus just on the class that you're testing. And too often, in my
|
||||
opinion - again, I'm sure we'll get some nice emails stating I'm wrong - but
|
||||
too often, in my opinion people go straight to a browser test. And they bring
|
||||
up a whole browser just to test functionality in their class. This sometimes
|
||||
requires writing your class differently so that it can be tested by a unit
|
||||
test. That's worth doing. Beyond that, though, when you are writing a browser
|
||||
test or a web test, something that is more integration, more end to end, be
|
||||
aware of where timing might be creeping in. So to give an example, in a browser
|
||||
test, you often do things like start by loading some web contents. And then you
|
||||
will try and poke at those web contents. Well, so one thing that people often
|
||||
don't realize is that loading web contents, that's not a synchronous process.
|
||||
Actually knowing when a page is finished loading is slightly difficult. It's
|
||||
quite interesting. And so there are helper functions to try and let you wait
|
||||
for this to happen, sort of event waiters. And you should - unfortunately, the
|
||||
first part is you have to be aware of this, which is just hard to be. But the
|
||||
second part is, once you are aware of where these can creep in, make sure
|
||||
you're waiting for the right events. And make sure that once those events have
|
||||
happened, you are in a state where the next call makes sense.
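
A minimal, single-threaded sketch of the "wait for the right event" advice, in plain C++ rather than real Chromium test code. `TaskQueue`, `FakePage`, and `RunUntil` are hypothetical names made up for illustration; in an actual browser test the event waiters Stephen mentions would play this role. The point is only that loading finishes some time after `StartLoad()` returns, so the test waits on a condition instead of assuming an ordering.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for an asynchronous task queue (illustration only).
class TaskQueue {
 public:
  void Post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

  // Runs queued tasks in order until `done` reports true or nothing is left.
  // Returns whether the condition was reached.
  bool RunUntil(const std::function<bool()>& done) {
    assert(done != nullptr);
    while (!done() && !tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();  // A task may post further tasks while it runs.
    }
    return done();
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Hypothetical page whose load completes only after two queued steps.
class FakePage {
 public:
  explicit FakePage(TaskQueue* queue) : queue_(queue) {}

  void StartLoad() {
    queue_->Post([this] {
      // The first step schedules the step that actually finishes the load.
      queue_->Post([this] {
        title_ = "Example";
        loaded_ = true;
      });
    });
  }

  bool loaded() const { return loaded_; }
  const std::string& title() const { return title_; }

 private:
  TaskQueue* queue_;
  bool loaded_ = false;
  std::string title_;
};
```

A test that read `page.title()` immediately after `page.StartLoad()` would see an empty string. Calling `queue.RunUntil([&page] { return page.loaded(); })` first is what puts the page in a state where the next call makes sense.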
|
||||
|
||||
44:28 SHARON: That makes sense. You mentioned rewriting your classes so they're
|
||||
more easily testable by a unit test. So what are common things you can do in
|
||||
terms of how you write or structure your classes that make them more testable?
|
||||
And just that seems like a general good software engineering practice to do.
|
||||
|
||||
44:50 STEPHEN: Yeah, absolutely. So one of the biggest ones I think we see in
|
||||
Chromium is to not use singleton accessors to get at state. And what I mean by
|
||||
that is, you'll see a lot of code in Chromium that just goes ahead and through
|
||||
some mechanism that says, hey, get the current web contents. And as you, I
|
||||
think, you've talked about on this program before, web contents is this massive
|
||||
class with all these methods. And so if you just go ahead and get the current
|
||||
web contents and then go do stuff on that web contents, whatever, when it comes
|
||||
to running a test, well, it's like, hold on. That's trying to fetch a real web
|
||||
contents. But we're writing a unit test. What does that even look like? And so
|
||||
the way around this is to do what we call dependency injection. And I'm sure as
|
||||
I've said that word, a bunch of listeners or viewers have just recoiled in
|
||||
fear. But we don't lean heavily into dependency injection in Chromium. But it
|
||||
is useful for things like this. Instead of saying, go get the web contents,
|
||||
pass a web contents into your class. Make a web contents available as an input.
|
||||
And that means when you create the test, you can use a fake or a mock web
|
||||
contents. We can talk about the difference between fakes and mocks as well. And
|
||||
then, instead of having it go do real things in real code, you can just be
|
||||
like, no, no, no. I'm testing my class. When you call web contents do a
|
||||
thing, just return this value. I don't care about web contents. Someone else is
|
||||
going to test that.
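
A minimal sketch of the dependency-injection idea described above, under stated assumptions: `PageSource`, `TabLabel`, and `StubPageSource` are hypothetical names, and `PageSource` stands in for the small slice of web contents that a class might need. It is not the real `content::WebContents` API.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, tiny stand-in for the part of web contents a class needs.
class PageSource {
 public:
  virtual ~PageSource() = default;
  virtual std::string GetTitle() const = 0;
};

// Class under test: the dependency is passed in rather than fetched through
// a singleton accessor, so a test can supply its own implementation.
class TabLabel {
 public:
  explicit TabLabel(const PageSource* source) : source_(source) {
    assert(source_ != nullptr);
  }

  std::string Text() const {
    const std::string title = source_->GetTitle();
    return title.empty() ? std::string("Untitled") : title;
  }

 private:
  const PageSource* source_;
};

// Test double: "when you ask for the title, just return this value".
class StubPageSource : public PageSource {
 public:
  explicit StubPageSource(std::string title) : title_(std::move(title)) {}
  std::string GetTitle() const override { return title_; }

 private:
  std::string title_;
};
```

A unit test can then construct `TabLabel` with a `StubPageSource` and check `Text()` without bringing up a browser.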
|
||||
|
||||
46:19 SHARON: Something else I've either seen or been told in code review is to
|
||||
add delegates and whatnot.
|
||||
|
||||
46:25 STEPHEN: Mm-hmm.
|
||||
|
||||
46:25 SHARON: Is that a good general strategy for making things more testable?
|
||||
|
||||
46:25 STEPHEN: Yeah. It's similar to the idea of doing dependency injection by
|
||||
passing in your web contents. Instead of passing in your web contents, pass in
|
||||
a class that can provide things. And it's sort of a balance. It's a way to
|
||||
balance, if you have a lot of dependencies, do you really want to add 25
|
||||
different inputs to your class? Probably not. But you define a delegate
|
||||
interface, and then you can mock out that delegate. You pass in that one
|
||||
delegate, and then when `delegate.GetWebContents()` is called, you can mock
|
||||
that out. So very much the same goal, another way to do it.
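
The delegate variant can be sketched the same way. Everything here is hypothetical (`DownloadPromptDelegate`, `DownloadPrompt`, `TestDelegate`, and the message strings); the sketch only shows several dependencies collected behind one interface that a test can replace.

```cpp
#include <cassert>
#include <string>

// Hypothetical delegate bundling several dependencies behind one interface.
class DownloadPromptDelegate {
 public:
  virtual ~DownloadPromptDelegate() = default;
  virtual std::string GetPageUrl() const = 0;
  virtual bool IsIncognito() const = 0;
  virtual void ShowMessage(const std::string& text) = 0;
};

// Class under test: takes one delegate instead of many separate inputs.
class DownloadPrompt {
 public:
  explicit DownloadPrompt(DownloadPromptDelegate* delegate)
      : delegate_(delegate) {
    assert(delegate_ != nullptr);
  }

  void Show() {
    std::string text = "Download from " + delegate_->GetPageUrl() + "?";
    if (delegate_->IsIncognito())
      text += " (not saved to history)";
    delegate_->ShowMessage(text);
  }

 private:
  DownloadPromptDelegate* delegate_;
};

// Test implementation of the delegate: canned answers plus a record of
// what the class under test asked it to show.
class TestDelegate : public DownloadPromptDelegate {
 public:
  std::string GetPageUrl() const override { return "https://example.test"; }
  bool IsIncognito() const override { return incognito; }
  void ShowMessage(const std::string& text) override { last_message = text; }

  bool incognito = false;
  std::string last_message;
};
```

The production code would provide a delegate backed by real browser state, while the test flips `incognito` and inspects `last_message`.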
|
||||
|
||||
47:04 SHARON: That sounds good. Yeah, I think in general, in terms of Chrome
|
||||
specifically, a lot of these testing best practices, making things testable,
|
||||
these aren't Chrome-specific. These are general software engineering-specific,
|
||||
C++-specific, and those you can look more into separately. Here we're mostly
|
||||
talking about what are the Chrome things. Right?
|
||||
|
||||
47:24 STEPHEN: Yeah.
|
||||
|
||||
47:24 SHARON: Things that you can't just find as easily on Stack Overflow and
|
||||
such. So you mentioned fakes and mocks just now. Do you want to tell us a bit
|
||||
about the difference there?
|
||||
|
||||
47:32 STEPHEN: I certainly can do it. Though I want to caveat that you can also
|
||||
just go look up those on Stack Overflow. But yeah. So just to go briefly into
|
||||
it, there is - in testing you'll often see the concept of a fake version of a
|
||||
class and also a mock version of a class. And the difference is just that a
|
||||
fake version of the class is a, what I'm going to call a real class that you
|
||||
write in C++. And you will probably write some code to be like, hey, when it
|
||||
calls this function, maybe you keep some state internally. But you're not using
|
||||
the real web contents, for example. You're using a fake. A mock is actually a
|
||||
thing out of the Google test support library. It's part of a - Google mock is
|
||||
the name of the sub-library, I guess, the sub-framework that provides this. And
|
||||
it is basically a bunch of magic that makes that fake stuff happen
|
||||
automatically. So you can basically say, hey, instead of a web contents, just
|
||||
mock that web contents out. And the nice part about mock is, you don't have to
|
||||
define behavior for any method you don't care about. So if there are, as we've
|
||||
discussed, 100 methods inside web contents, you don't have to implement them
|
||||
all. You can be like, OK, I only care about the `DoFoobar` method. When that is
|
||||
called, do this.
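
To make the fake/mock distinction concrete, here is a small sketch with a hypothetical `PrefStore` interface. `FakePrefStore` is a fake in the sense described above: a real, simplified class with its own state. `MockLikePrefStore` is a hand-rolled imitation of a mock, written out only so the example stays self-contained; with GoogleMock the equivalent would normally be generated from `MOCK_METHOD` declarations instead of being written by hand.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical interface that some class under test depends on.
class PrefStore {
 public:
  virtual ~PrefStore() = default;
  virtual void Set(const std::string& key, const std::string& value) = 0;
  virtual std::string Get(const std::string& key) const = 0;
};

// A fake: an ordinary hand-written class with real, simplified behavior
// and its own internal state.
class FakePrefStore : public PrefStore {
 public:
  void Set(const std::string& key, const std::string& value) override {
    values_[key] = value;
  }
  std::string Get(const std::string& key) const override {
    auto it = values_.find(key);
    return it == values_.end() ? std::string() : it->second;
  }

 private:
  std::map<std::string, std::string> values_;
};

// Imitation of a mock: no real behavior, just a canned return value and
// a call count for the test to inspect.
class MockLikePrefStore : public PrefStore {
 public:
  void Set(const std::string&, const std::string&) override { ++set_calls; }
  std::string Get(const std::string&) const override { return get_result; }

  int set_calls = 0;
  std::string get_result;
};

// Code under test, written against the interface only.
inline std::string RememberAndGreet(PrefStore* store, const std::string& name) {
  assert(store != nullptr);
  store->Set("name", name);
  return "Hello, " + store->Get("name");
}
```

With the fake, `RememberAndGreet` reads back what it stored. With the mock-like version, the test decides what `Get` returns and afterwards checks how often `Set` was called.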
|
||||
|
||||
48:51 SHARON: Makes sense. One last type of test, which we don't hear about
|
||||
that often in Chrome but does exist quite a bit in other areas, is manual
|
||||
testing. So do we actually have manual testing in Chrome? And if so, how does
|
||||
that work?
|
||||
|
||||
49:03 STEPHEN: Yeah, we actually do. We're slightly crossing the boundary here
|
||||
from the open Chromium into the product that is Google Chrome. But we do have
|
||||
manual tests. And they are useful. They are a thing. Most often, you will see
|
||||
this in two cases as a Chrome engineer. You basically work with the test team.
|
||||
As I said, all a little bit internal now. But you work with the test team to
|
||||
define a set of test cases for your feature. And these are almost always
|
||||
end-to-end tests. So go to this website, click on this button, you should see
|
||||
this flow, this should happen, et cetera. And sometimes we run these just as
|
||||
part of the launch process. So when you're first launching a new feature, you
|
||||
can be like, hey, I would love for some people to basically go through this and
|
||||
smoke test it, make sure that everything is correct. Some things we test every
|
||||
release. They're so important that we need to have them tested. We need to be
|
||||
sure they work. But obviously, all of the caveats about manual testing out
|
||||
there in the real world, they apply equally to Chromium or to Chrome. Manual
|
||||
testing is slow. It's expensive. We require people - specialized people that we
|
||||
have to pay and that they have to sit there, and click on things, and that sort
|
||||
of thing, and file bugs when it doesn't work. So wherever possible, please do
|
||||
not write manual tests. Please write automated testing. Test your code, please.
|
||||
But then, yeah, it can be used.
|
||||
|
||||
50:33 SHARON: In my limited experience working on Chrome, the only place that
|
||||
I've seen there actually be any level of dependency on manual test has been in
|
||||
accessibility stuff -
|
||||
|
||||
50:38 STEPHEN: Yeah.
|
||||
|
||||
50:38 SHARON: which kind of makes sense. A lot of that stuff is not
|
||||
necessarily - it is stuff that you would want to have a person check because,
|
||||
sure, we can think that the speaker is saying this, but we should make sure
|
||||
that that's the case.
|
||||
|
||||
50:57 STEPHEN: Exactly. I mean, that's really where manual test shines, where
|
||||
we can't integration test accessibility because you can't test the screen
|
||||
reader device or the speaker device. Whatever you're using, we can't test that
|
||||
part. So yes, you have to then have a manual test team that checks that things
|
||||
are actually working.
|
||||
|
||||
51:19 SHARON: That's about all of our written down points to cover. Do you have
|
||||
any general thoughts, things that you think people should know about tests,
|
||||
things that people maybe ask you about tests quite frequently, anything else
|
||||
you'd like to share with our lovely listeners?
|
||||
|
||||
51:30 STEPHEN: I mean, I think I've covered most of them. Please write tests.
|
||||
Write tests not just for code you're adding but for code you're modifying, for
|
||||
code that you wander into a directory and you say, how could this possibly
|
||||
work? Go write a test for it. Figure out how it could work or how it couldn't
|
||||
work. Writing tests is good.
|
||||
|
||||
51:50 SHARON: All right. And we like to shout-out a Slack channel of interest.
|
||||
Which one would be the - which one or ones would be a good Slack channel to
|
||||
post in if you have questions or want to get more into testing?
|
||||
|
||||
52:03 STEPHEN: Yeah. It's a great question. I mean, I always like to - I think
|
||||
it's been called out before, but the hashtag #halp channel is very useful for
|
||||
getting help in general. There is a hashtag #wpt channel. If you want to go ask
|
||||
about web platform tests, that's there. There's probably a hashtag #testing.
|
||||
But I'm going to admit, I'm not in it, so I don't know.
|
||||
|
||||
52:27 SHARON: Somewhat related is there's a hashtag #debugging channel.
|
||||
|
||||
52:27 STEPHEN: Oh.
|
||||
|
||||
52:27 SHARON: So if you want to learn about how to actually do debugging and
|
||||
not just do log print debugging.
|
||||
|
||||
52:34 STEPHEN: Oh, I was about to say, do you mean by printf'ing everywhere in
|
||||
your code?
|
||||
|
||||
52:41 SHARON: [LAUGHS] So there are a certain few people who like to do things
|
||||
in an actual debugger or enjoy doing that. And for a test, that can be a useful
|
||||
thing too - a tool to have. So that also might be something of interest. All
|
||||
right, yeah. And kind of generally, as you mentioned a lot of things are your
|
||||
opinion. And it seems like we currently don't have a style guide for tests or
|
||||
best practices kind of thing. So how can we -
|
||||
|
||||
53:13 STEPHEN: [LAUGHS] How can we get there? How do we achieve that?
|
||||
|
||||
53:19 SHARON: How do we get one?
|
||||
|
||||
53:19 STEPHEN: Yeah.
|
||||
|
||||
53:19 SHARON: How do we make that happen?
|
||||
|
||||
53:19 STEPHEN: It's a hard question. We do - there is documentation for
|
||||
testing, but it's everywhere. I think there's `/docs/testing`, which has some
|
||||
general information. But so often, there's just random READMEs around the code
|
||||
base that are like, oh, hey, here's the content public test API surface. Here's
|
||||
a bunch of useful information you might want to know. I hope you knew to look
|
||||
in this location. Yeah, it's a good question. Should we have some sort of
|
||||
process for - like you said, like a style guide but for testing? Yeah, I don't
|
||||
know. Maybe we should enforce that people dependency inject their code.
|
||||
|
||||
54:04 SHARON: Yeah. Well, if any aspiring test nerds want to really get into
|
||||
it, let me know. I have people who are also interested in this and maybe can
|
||||
give you some tips to get started. But yeah, this is a hard problem and
|
||||
especially with so many types of tests everywhere. I mean, even just getting
|
||||
one for each type of test would be useful, let alone all of them together. So
|
||||
anyway - well, that takes us to the end of our testing episode. Thank you very
|
||||
much for being here, Stephen. I think this was very useful. I learned some
|
||||
stuff. So that's cool. So hopefully other people did too. And, yeah, thanks for
|
||||
sitting and answering all these questions.
|
||||
|
||||
54:45 STEPHEN: Yeah, absolutely. I mean, I learned some things too. And
|
||||
hopefully we don't have too many angry emails in our inbox now.
|
||||
|
||||
54:52 SHARON: Well, there is no email list, so people can't email in if they
|
||||
have issues. [LAUGHTER]
|
||||
|
||||
54:58 STEPHEN: If you have opinions, keep them to yourself -
|
||||
|
||||
54:58 SHARON: Yeah. [INAUDIBLE]
|
||||
|
||||
54:58 STEPHEN: until Sharon invites you on her show.
|
||||
|
||||
55:05 SHARON: Yeah, exactly. Yeah. Get on the show, and then you can air your
|
||||
grievances at that point. [LAUGHS] All right. Thank you.
|
923
docs/transcripts/wuwt-e05-build-gn.md
Normal file
@ -0,0 +1,923 @@
|
||||
# What’s Up With BUILD.gn
|
||||
|
||||
This is a transcript of [What's Up With
|
||||
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
|
||||
Episode 5, a 2023 video discussion between [Sharon (yangsharon@chromium.org)
|
||||
and Nico (thakis@chromium.org)](https://www.youtube.com/watch?v=NcvJG3MqquQ).
|
||||
|
||||
The transcript was automatically generated by speech-to-text software. It may
|
||||
contain minor errors.
|
||||
|
||||
---
|
||||
|
||||
Building Chrome is an integral part of being a Chrome engineer. What actually
|
||||
happens when you build Chrome, and what exactly happens when you run those
|
||||
build commands? Today, we have Nico, who was responsible for making Ninja the
|
||||
Chrome default build system, to tell us more.
|
||||
|
||||
Notes:
|
||||
- https://docs.google.com/document/d/1iDFqA3cZAUo0TUFA69cu5wEKL4HjSoIGfcoLIrH3v4M/edit
|
||||
|
||||
---
|
||||
|
||||
00:00 SHARON: Hello, and welcome to "What's Up With That," the series that
|
||||
demystifies all things Chrome. I'm your host, Sharon, and today, we're talking
|
||||
about building Chrome. How do you go from a bunch of files on your computer to
|
||||
running a browser? What are all the steps involved? Our special guest today is
|
||||
Nico. He's responsible for making Ninja the Chrome default build system, and
|
||||
he's worked on Clang and all sorts of areas of the Chrome build. If you don't
|
||||
know what some of those things are, don't worry. We'll get into it. Welcome,
|
||||
Nico.
|
||||
|
||||
00:29 NICO: Hello, Sharon, and hello, internet.
|
||||
|
||||
00:29 SHARON: Hello. We have lots to cover, so let's get right into it. If I
|
||||
want to build Chrome at a really quick overview, what are all the steps that I
|
||||
need to do?
|
||||
|
||||
00:41 NICO: It's very easy. First, you download `depot_tools` and add that to
|
||||
your path. Then you run `fetch chromium`. Then you type `cd src`, run `gclient
|
||||
sync`, `gn gen out/GN`, and `ninja -C out/GN chrome`. And that's it.
|
||||
|
||||
00:53 SHARON: Wow. Sounds so easy. All right. We can wrap that up. See you guys
|
||||
next time. OK. All right. Let's take it from the start, then, and go over in
|
||||
more detail what some of those things are. So the first thing you mentioned is
|
||||
`depot_tools`. What is that?
|
||||
|
||||
01:11 NICO: `depot_tools` is just a collection of random utilities for - like,
|
||||
back in the day, for managing subversion repositories, nowadays for pulling
|
||||
things from git. It contains Ninja and GN. Just adds a bunch of stuff to your
|
||||
path that you need for working on Chrome.
|
||||
|
||||
01:25 SHARON: OK. Is this a Chrome-specific thing, or is this used elsewhere,
|
||||
too?
|
||||
|
||||
01:33 NICO: In theory, it's fairly flexible. In practice, I think it's mostly
|
||||
used by Chromium projects.
|
||||
|
||||
01:39 SHARON: OK, all right. And there, you mentioned Ninja and GN. And for
|
||||
people - I think most people who are watching this have built Chrome at some
|
||||
point. But what is the difference between Ninja and GN? Because you have your
|
||||
build files, which are generally called BUILD.gn, and then you run a command
|
||||
that has Ninja in it. So are those the same thing? Are those related?
|
||||
|
||||
01:57 NICO: Yes. So GN is short for Generate Ninja. So Ninja is a build system.
|
||||
It's similar to Make. It basically gets a list of source files and a list of
|
||||
build outputs. And then when you run Ninja, Ninja figures out which build steps
|
||||
do I have to run, and then it runs them. So it's kind of like Make but simpler
|
||||
and faster. And then GN - and Ninja doesn't have any conditionals or anything,
|
||||
so GN is - just a built - it describes the build. And then it generates Ninja
|
||||
files.
|
||||
|
||||
02:34 SHARON: OK.
|
||||
|
||||
02:34 NICO: So if you want to do, like, add these files only if you're building
|
||||
for Windows, this is something you can do, say, in GN. But then it only
|
||||
generates a Windows-specific Ninja file.
|
||||
|
||||
02:46 SHARON: All right. And in terms of when you mention OS, so there's a
|
||||
couple places that you can specify different arguments for how you build
|
||||
Chrome. So you have your gclient sync - sorry, your gclient file, and then you
|
||||
have a separate args.gn. And in both of these places, you can specify different
|
||||
arguments. And for example, the operating system you use - that can be
|
||||
specified in both places. There's an OS option in both. So what is the purpose
|
||||
of the gclient file, and what is the purpose of the args.gn file?
|
||||
|
||||
03:25 NICO: Yes. So gclient reads the DEPS file that is at the root of the
|
||||
directory, and the DEPS file basically specifies dependencies that Chrome pulls
|
||||
in. It's kind of similar to git submodules, but it predates git, so we don't
|
||||
use git submodules also for other reasons. And so if you run gclient sync, that
|
||||
reads the DEPS file at the Chrome root, and that downloads a couple hundred
|
||||
repositories that Chrome depends on. And then it executes a bunch of so-called
|
||||
hooks, which are just Python scripts, which also download a bunch of more
|
||||
stuff. And the hooks and the dependencies are operating system dependent, so
|
||||
gclient needs to know the operating system. But the build also needs to know
|
||||
the operating system. And GN args are basic things that are needed for the
|
||||
builds. So the OS is something that's needed in both places, but many GN args
|
||||
gclient doesn't need to know about. For example, if you enable DCHECKs, like
|
||||
Peter discussed a few episodes ago, that's a GN-only thing.
|
||||
|
||||
04:26 SHARON: All right. That sounds good. So let's see. When you actually -
|
||||
OK. So when you run Chrome and you - say you build Chrome, right? A typical
|
||||
example of a command to do that would be, say, `autoninja -C out/default
|
||||
content`, right? And let's just go through each part of that and say what each
|
||||
of those things is doing and what happens there. Because I think that's just an
|
||||
example from one of the starter docs. That's just the copy and paste command
|
||||
that they give you. So autoninja seems like it's based on Ninja. What is the
|
||||
auto there doing for us?
|
||||
|
||||
05:15 NICO: Yeah. So autoninja is also one of the things that's just in
|
||||
`depot_tools`. It's a very - or it used to be a very thin wrapper around Ninja. Now
|
||||
it's maybe a little thicker, but it's optional. You don't have to use autoninja
|
||||
if you don't want to. But what it does is basically - like, it helps - So
|
||||
Chrome contains a lot of code. So we have this system called Goma, which can
|
||||
run all the C++ compilations in a remote data center. And if you do use the
|
||||
system, then you want to build with a very high build parallelism. You want to,
|
||||
say, `-j 1000` or what and run, like, a thousand build processes in parallel. But
|
||||
if you're building locally, you don't want to do that. So what autoninja
|
||||
basically does - it looks at your args.gn file, sees if you have enabled Goma,
|
||||
and if so, it runs Ninja with many processes, and else, it runs it with just
|
||||
one process per core, or something like that. So that's originally all that
|
||||
autoninja does. Nowadays, I think it also uploads a bunch of stuff. But you can
|
||||
just run `which autoninja`, and that prints some path, and you can just open that
|
||||
in the editor and read it. I think it's still short enough to fairly quickly
|
||||
figure out what it does.
|
||||
|
||||
06:17 SHARON: OK. What does `-C` do? Because I think I've been using that this
|
||||
whole time because I copied and pasted it from somewhere, and I've just always
|
||||
had it.
|
||||
|
||||
06:28 NICO: It says - it changes the current directory where Ninja runs, like
|
||||
in Make. So it basically says, change the current directory to out/GN, or
|
||||
whatever your build directory is, and then run the build from there. So for
|
||||
Chrome, the build always - the current directory during the build is always the
|
||||
build directory. And then Ninja looks for a file called build.ninja in the
|
||||
current directory, so GN writes build.ninja to out/GN, or whatever your build
|
||||
directory is. And then Ninja finds it there and reads it and does its thing.
|
||||
|
||||
06:57 SHARON: All right. So the next part of this would be out/default, or out
|
||||
slash something else. So what are out directories, and how do we make use of
|
||||
them?
|
||||
|
||||
07:11 NICO: An out directory - it's just a build directory. That's where all
|
||||
the build artifacts go to, all the generated object files, executables, random
|
||||
things that are generated during the build. So it can be any directory, really.
|
||||
You can make up any directory name that you like. You can build your Chrome in,
|
||||
I don't know, fluffy/kitten, or whatever. But I think most people use out just
|
||||
because it's in the global `.gitignore` file already. Then you want to use
|
||||
something that's two directories deep so that the path from the directory to
|
||||
the source is always `../..`. And that makes sure that this is deterministic.
|
||||
We try to have a so-called deterministic build, where you get exactly the same
|
||||
binary when you build Chrome at the same revision, independent of the host
|
||||
machine, more or less. And the path from the build directory to the source file
|
||||
is something that goes into debug info. So if you want to have the same build
|
||||
output as everyone else, you want a build directory path that's two directories
|
||||
deep. And the names of those two directories don't really matter. So what
|
||||
some people do is they use out/debug for the debug builds and out/release for
|
||||
their release builds. But it's really up to you.
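
The "two directories deep" point can be illustrated with a small sketch. `PathToSourceRoot` is a hypothetical helper, not part of GN or Ninja; it simply counts path components to show which relative path leads from a build directory back to the source root.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns the relative path from `build_dir` (given relative to the source
// root, e.g. "out/Debug") back up to the source root.
inline std::string PathToSourceRoot(const std::string& build_dir) {
  std::istringstream parts(build_dir);
  std::string component;
  std::string result;
  while (std::getline(parts, component, '/')) {
    if (component.empty() || component == ".")
      continue;  // Ignore empty segments such as a trailing slash.
    assert(component != "..");  // Keep the sketch simple: no parent hops.
    result += result.empty() ? ".." : "/..";
  }
  return result.empty() ? "." : result;
}
```

`out/Debug`, `out/Release`, and `fluffy/kitten` all yield `../..`, whereas `out` alone yields `..`. Since that relative path ends up in debug info, a build directory at a different depth would produce different output.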
|
||||
|
||||
08:26 SHARON: Right. Other common ones are, like, yeah. ASan is a common one,
|
||||
different -
|
||||
|
||||
08:33 NICO: Right.
|
||||
|
||||
08:33 SHARON: OSes. Right. So you mentioned having a deterministic build. And
|
||||
assuming you're on the same version of Chrome, at the same checkout,
|
||||
tip-of-tree, or whatever as someone else, I would have expected that all of the
|
||||
builds are just deterministic, but maybe that's because of work that people
|
||||
like you and the build team have done. But what are things that could cause
|
||||
that to be nondeterministic? Because you have all the same files. Where is the
|
||||
actual nondeterminism coming from? Or is it just different configurations and
|
||||
setups you have on your local machine?
|
||||
|
||||
09:09 NICO: Yeah, that's a great question. I always thought this would be very
|
||||
easy to - but turns out it mostly isn't. We wrote a very long blog post that we
|
||||
can link to from the show notes about this. But there's many things that can
|
||||
go wrong. Like for example, in C++, there's the preprocessor macro `__DATE__`,
|
||||
which embeds the current date into the build output. So if you do that, then
|
||||
you're time dependent already. By default, I think you end up with absolute
|
||||
paths to everything in debug information. So if you build under
|
||||
`/home/sharon/blah`, then that's already different from all the people who are
|
||||
not called Sharon. Then there's - we run tools as part of the build that
|
||||
produce output. For example, the protobuf compiler or whatnot. And so if that
|
||||
binary iterates over some map, some hash map, and that doesn't have
|
||||
deterministic iteration order, then the output might be different. And there's
|
||||
a long, long, long, long, long list of things. Making the build deterministic
|
||||
was a very big project, and there's still a few open things.
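
The hash-map case can be sketched as follows. `EmitConstants` is a hypothetical code-generation step, not something taken from the Chromium build; it shows one common way to avoid order-dependent output, assuming the tool controls the order in which it writes things: sort the keys before emitting anything.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical generator step: emits one line per entry. Iterating the
// unordered_map directly would make the line order depend on the hash
// table's internal layout, so the keys are sorted first.
inline std::string EmitConstants(
    const std::unordered_map<std::string, int>& constants) {
  std::vector<std::string> names;
  names.reserve(constants.size());
  for (const auto& entry : constants)
    names.push_back(entry.first);
  assert(names.size() == constants.size());
  std::sort(names.begin(), names.end());

  std::string output;
  for (const std::string& name : names) {
    output += "const int " + name + " = " +
              std::to_string(constants.at(name)) + ";\n";
  }
  return output;
}
```

Two maps holding the same entries, filled in different orders, then produce byte-identical text.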
|
||||
|
||||
10:08 SHARON: OK, cool. So I guess it's - yeah, it's not true nondeterminism,
|
||||
maybe, but there's enough factors that go into it that to a typical person
|
||||
interacting with it, it does seem -
|
||||
|
||||
10:21 NICO: Yeah, but there's also true nondeterminism. Like, every now and
|
||||
then, when we update the compiler, the compiler will write different object
|
||||
files on every run just because the compiler internally iterates about some -
|
||||
over some hash map. And then we have to complain upstream, and then they fix
|
||||
it.
|
||||
|
||||
10:34 SHARON: OK. Oh, wow. OK. That's very cool. Well, thank you for dealing
|
||||
with this kind of stuff so people like us don't have to worry about it. OK. And
|
||||
the last part of our typical build thing is content. So what is content in this
|
||||
context? If you want to learn about content more in general, check out
|
||||
episode 3. But in this case, what does that mean?
|
||||
|
||||
10:58 NICO: So just a build target. So I think people - at least I usually
|
||||
build some executable. I usually build, I don't know, `base_unittests` or
|
||||
`unit_tests` or Chrome or content shell or what. And it's just - so in the
|
||||
Ninja files, there's basically - there's many, many lines that go, if you want
|
||||
to build this file, you need to have these inputs and then run this command. If
|
||||
you want to build this file, instead, you need these other files. You need to
|
||||
run this other command. So for example, if you want to build `base_unittests`,
|
||||
you need a couple thousand object files, and then you need to run the linkers,
|
||||
what's in there. And so if you tell Ninja - the last thing you give it -
|
||||
basically, it tells Ninja, what do you want to build? So if you say, `ninja -C
|
||||
out/GN content_shell` or what, then Ninja is like, let's look at the line that
|
||||
says `content_shell`. And then it checks - I need these files, so it builds all
|
||||
the prerequisites, which usually means compiling a whole bunch of files. And
|
||||
then it runs the final command and runs the linker. So Ninja basically decides
|
||||
what it needs to do and then invokes other commands to do the actual work.
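
A toy model of that idea, under stated assumptions: `BuildStep` and `ToyBuild` are hypothetical names, and this is neither Ninja's implementation nor its file format. Each output lists its inputs and a command; building a target first builds its prerequisites and then records the command that would be run.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One build statement: to produce an output, first have these inputs,
// then run this command.
struct BuildStep {
  std::vector<std::string> inputs;
  std::string command;
};

// Toy build graph (illustration only; it ignores timestamps, parallelism,
// and everything else a real build system handles).
class ToyBuild {
 public:
  void AddStep(const std::string& output, BuildStep step) {
    steps_[output] = std::move(step);
  }

  // Builds `target` after all of its prerequisites. Files without a build
  // step are treated as existing source files.
  void Build(const std::string& target) {
    auto it = steps_.find(target);
    if (it == steps_.end() || done_.count(target) > 0)
      return;
    assert(in_progress_.count(target) == 0);  // A cycle would be an error.
    in_progress_.insert(target);
    for (const std::string& input : it->second.inputs)
      Build(input);
    in_progress_.erase(target);
    done_.insert(target);
    commands_run_.push_back(it->second.command);
  }

  const std::vector<std::string>& commands_run() const { return commands_run_; }

 private:
  std::map<std::string, BuildStep> steps_;
  std::set<std::string> done_;
  std::set<std::string> in_progress_;
  std::vector<std::string> commands_run_;
};
```

Asking for the final target pulls in the compile steps before the link step, and asking again does nothing because every step is already marked done.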
|
||||
|
||||
12:08 SHARON: OK, makes sense. So say I run the build - so say I built the
|
||||
target Chrome, which is the one that actually is an executable, and that's
|
||||
what - if you run that, the browser is built from it. So say I've built the
|
||||
Chrome build target. How do I run that now?
|
||||
|
||||
12:31 NICO: Well, it's written - so normally, the thing you give to Ninja is
|
||||
actually a file name. And the `-C` changes the current directory. So if you say, `-C
|
||||
out/release chrome`, then this creates the file `out/release/chrome`. It just
|
||||
creates that file in the out directory. So to run that, you just run
|
||||
`out/release/chrome`, and hopefully it'll start up and work.
|
||||
|
||||
12:54 SHARON: Great. Sounds so easy. So you mentioned earlier something called
|
||||
Goma, which had remote data centers and stuff. Is this something that's
|
||||
available to people who don't work at Google, or is this one of the
|
||||
Google-specific things? Because I think so far, everything mentioned is anyone,
|
||||
anywhere can do all this. Is that the case with Goma, also?
|
||||
|
||||
13:14 NICO: Yeah. For the other things - so Ninja is actually something that
|
||||
started in Chrome land, but that's been fairly widely adopted across the world.
|
||||
Like, that's used by many projects. But yeah, Goma - I think it's kind of like
|
||||
distcc. Like, it's a distributed compiler thing. I think the source code for
|
||||
both the client and the server are open source. And we can link to that. But
|
||||
the access to the service, I think, isn't public. So you have to work at
|
||||
Google or at a partner company. I think we hand out access to a few partners.
|
||||
And as far as I know, there's a few independent implementations of the
|
||||
protocol, so other people also use something like Goma. But as far as I know,
|
||||
these other services also aren't public.
|
||||
|
||||
13:53 SHARON: OK. Right. Yeah, because I think one of the main things is - I
|
||||
mean, as someone who did an internship on Chrome, after, I was like, I'll
|
||||
finish some of these remaining to-do items once I go back to school, right? And
|
||||
then I started to build Chrome on my laptop, just a decent laptop, but still a
|
||||
laptop, and I was like, no, I guess I won't be doing that.
|
||||
|
||||
14:17 NICO: No, it's doable. You just need to be patient and strategic. Like, I
|
||||
used to do that every now and then. You have to start the build at night, and
|
||||
then when you get up, it's done. And if you only change one or two CC files,
|
||||
it's reasonably fast. It's just, full builds take a very long time.
|
||||
|
||||
14:29 SHARON: Yeah, well, yeah. There was enough stuff going on that I was
|
||||
like, OK. We maybe won't do this. Right. Going back to another thing you
|
||||
mentioned is the compiler and Clang. So can you tell us a bit more about Clang
|
||||
and how compiling fits into the build process?
|
||||
|
||||
14:50 NICO: Yeah, sure. I mean, compiling just means - almost all of Chrome
|
||||
currently is written in C++, and compiling just means taking a CC file, like a
|
||||
C++ file, and turning it into - turning that into an object file. And there are
|
||||
a whole bunch of C++ compilers. And back in the day, we used to use many, many
|
||||
different C++ compilers, and they're all slightly different, so that was a
|
||||
little bit painful. And then the C++ language started changing more frequently,
|
||||
like with C++ 11, 14, 17, 20, and so on. And so that was a huge drain on
|
||||
productivity. Updating compilers was always a year-long project, and we had to
|
||||
update, like, seven different compilers, one on Android, iOS, Windows, macOS,
|
||||
Android, Fuchsia, whatnot. So over time, we moved to - we moved from using
|
||||
basically the system compiler to using a hermetically built Clang that we
|
||||
download as a gclient DEPS hook. So when you run gclient sync, that downloads a
|
||||
prebuilt Clang binary. And we use that Clang binary to build Chrome on all
|
||||
operating systems. So if one file builds for you on your OS, then chances are
|
||||
it'll build on all the other OSes because it's built by the same compiler. And
|
||||
that also enables things like cross builds, so you can build Chrome for Windows
|
||||
on Linux if you want to because your compiler is right there.
|
||||
|
||||
16:11 SHARON: Oh, cool. All right. I didn't know that. Is there any reason,
|
||||
historically, that Clang beat out these other compilers as the compiler of
|
||||
choice?
|
||||
|
||||
16:24 NICO: Yes. So it's basically - I think when we looked at this - so Clang
|
||||
is basically the native compiler on macOS and iOS, and GCC is kind of the
|
||||
system compiler on Linux, I suppose. But Clang has always had very good GCC
|
||||
compatibility. And then on Windows, the default choice is Visual Studio. And we
|
||||
still want to link against the normal Microsoft library, so we need a compiler
|
||||
that's ABI-compatible with the Microsoft ABI. And GCC couldn't do that. And
|
||||
Clang also couldn't do that, but we thought if we teach Clang to do that, then
|
||||
Clang basically can target all the operating systems we care about. And so we
|
||||
made Clang work on Windows, also with others. But there was a team funded by
|
||||
Chrome that worked on that for a few years. And also, Clang has a pretty good
|
||||
tooling interface. So for code search, we also use Clang. So we now use the
|
||||
same code to compile Chrome and to index Chrome for code search.
|
||||
|
||||
17:28 SHARON: Oh, cool. I didn't know that either, so very interesting. OK.
|
||||
We're just going to keep going back. And as you mention more things, we'll
|
||||
cover that, and then go back to something you previously mentioned. So next on
|
||||
the list is gclient sync. So I think for everyone who's ever worked on Chrome,
|
||||
ever, especially at the start, you're like, I'll build Chrome. You build your
|
||||
target, and you get these weird errors. And you look at it, and you think, oh,
|
||||
this is in some random weird spot that I definitely didn't change. What's going
|
||||
on? And you ask a senior team member, and they say to you, did you run gclient
|
||||
sync? And you're like, oh, I did not. And then you run it, and suddenly, things
|
||||
are OK. So what else is going - you mentioned a couple of things that happen.
|
||||
So what exactly does gclient sync do?
|
||||
|
||||
18:13 NICO: Yeah. So as I - there's this file at the source root called DEPS,
|
||||
D-E-P-S, all capital letters. And when you update - if you git pull the Chrome
|
||||
repository, then that also updates the DEPS file. And then this DEPS file
|
||||
contains a long list of revisions of dependent projects. And then when you run
|
||||
gclient sync, it basically syncs all these other git repositories that are
|
||||
mentioned in the DEPS file. And after that, it runs so-called hooks, which do
things like download a new Clang compiler and download a bunch of other binaries
from something called CIPD, for example, GN. But yeah, basically makes sure
|
||||
that all the dependencies that are in Chrome but that aren't in the Chrome
|
||||
repository are also up to date. That's what it does.
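
The two phases described here (sync the pinned repositories, then run hooks) can be sketched roughly as follows. This is a simplified model, not gclient's actual implementation: the dictionary layout, the example URLs, the revisions, and the `fetch_clang.py` hook are all made up for illustration, and nothing is actually checked out or executed.

```python
# Hypothetical, simplified model of a DEPS file. Paths, URLs, revisions, and
# the hook script name are invented for illustration only.
DEPS_EXAMPLE = {
    "deps": {
        "src/third_party/libfoo/src": "https://example.com/libfoo.git@1111111",
        "src/third_party/libbar/src": "https://example.com/libbar.git@2222222",
    },
    "hooks": [
        {"name": "fetch_clang", "action": ["python3", "fetch_clang.py"]},
    ],
}


def plan_sync(deps_file):
    """Return the ordered steps a sync would perform; nothing is executed."""
    steps = []
    # Phase 1: bring every dependent repository to its pinned revision.
    for path, pinned in sorted(deps_file["deps"].items()):
        url, revision = pinned.rsplit("@", 1)
        steps.append(f"checkout {url} at {revision} into {path}")
    # Phase 2: run the hooks, which download prebuilt binaries and similar.
    for hook in deps_file["hooks"]:
        steps.append(f"run hook {hook['name']}: {' '.join(hook['action'])}")
    return steps


if __name__ == "__main__":
    for step in plan_sync(DEPS_EXAMPLE):
        print(step)
```

The point of the model is the ordering: hooks run only after every pinned revision is in place, which is why a stale checkout tends to produce errors in code you never touched.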
|
||||
|
||||
19:06 SHARON: OK. Do you have a rough ballpark guess of how many dependencies
|
||||
that includes?
|
||||
|
||||
19:13 NICO: It's operating-system dependent. I think on Android we have way
|
||||
more, but it's on the order of 200. Like, 150 to 250.
|
||||
|
||||
19:25 SHARON: Sounds like a lot. Very cool. OK. In terms of - speaking of other
|
||||
dependencies, one of the top-level directories in Chrome is `//third_party`,
|
||||
and that seems in the same kind of direction. So how does stuff in
|
||||
`//third_party` work in terms of building? Can you just build them as targets?
|
||||
What kind of stuff is in there? What can you and can you not build? Like, for
|
||||
example, Blink is one of the things in `//third_party`, and lots of people -
|
||||
that's a big part of it, right? But a lot of things in there are less active
|
||||
and probably less big of a part of Chrome. So does `//third_party` just build
like anything else, or what's different about it?
|
||||
|
||||
20:09 NICO: And that's a great question. So Blink being in `//third_party` is a
|
||||
bit of a historical artifact. Like, most things - almost all of the things in
|
||||
`//third_party` are basically third-party code. That's code that we
|
||||
didn't write ourselves. And Chrome's secret mission is to depend on every other
|
||||
library out there in the world. No, we depend on things like libpng for reading
|
||||
PNG files, libjpeg for reading all of - libjpeg-turbo these days, I guess, for
|
||||
reading JPEG files, libxml for reading XML, and so on. And, well, that's many
|
||||
dependencies. I won't list them all. And some of these third-party dependencies
|
||||
are just listed in the DEPS file that we talked about. And so they basically -
|
||||
like, when gclient sync runs, it pulls the code from some git repository that
|
||||
contains the third-party code and puts it into your source tree. And for other
|
||||
third-party code, we actually check in the code into the Chrome main repository
|
||||
instead of DEPSing it in. There are trade-offs, which approach to choose. We do
|
||||
both from time to time. But yeah. Almost no third-party dependency has a GN
|
||||
file upstream, so usually what you do is you have to write your own BUILD.gn
|
||||
file for the third-party dependency you want to add. And then after that, it's
|
||||
fairly normal. So for a library, if you want to add a dependency on libfoo,
|
||||
usually what we do is you add - you create `third_party/libfoo`, and you put
BUILD.gn in there. And then you add a DEPS entry that syncs the actual code to
`third_party/libfoo/src` or something. Yes.
|
||||
|
||||
21:37 SHARON: All right. Sounds good. Again, you mentioned BUILD.gn files, and
|
||||
that's, as expected, a big part of how building works. And that's probably the
|
||||
part that most people have interacted more with, outside of just actually
|
||||
running whatever command it is to build Chrome. Because if you create, delete,
|
||||
rename any files, you have to update it in some BUILD.gn file. So can you walk
|
||||
us through the different things contained in a BUILD.gn file? What are all the
|
||||
different parts?
|
||||
|
||||
22:12 NICO: Sure. So there's a great presentation by Brett, who wrote GN, that
|
||||
we can link to. But in summary, it's - BUILD.gn contains build targets, and
|
||||
the build target normally is like - it doesn't have to be, but usually, it's a
|
||||
list of CC files that belong together and that either make up a static library
|
||||
or a shared library or an executable. So those are the main target types for CC
|
||||
code. But then you can also have custom build actions that run just arbitrary
|
||||
Python code, which, for example, if you compile a protobuf - proto files into
|
||||
CC and H - into C++ and header files, then we basically have a Python script
|
||||
that runs protoc, the proto compiler, to produce those. And so in that case,
|
||||
the action generates C++ files, and then those get built. But the other, simple
|
||||
answer is libraries or executables.
|
||||
|
||||
23:11 SHARON: OK. One part of GN files that has caused me personally some
|
||||
confusion and difficulty - which I think is maybe, depending on the part of
|
||||
Chrome you work on, less of an issue - is DEPS. So you have DEPS in your GN
|
||||
files, and there's also something called external DEPS. And then you have
|
||||
separate DEPS files that are just called capital D-E-P-S.
|
||||
|
||||
23:30 NICO: Yes. Yes, there, that's some redundant - that's, again, I guess for
|
||||
historical reasons. So in GN, deps just means to build this target, you
|
||||
first have to build these other targets. Like, this target depends - uses this
|
||||
other code. And in different contexts, it kind of means different things. So
|
||||
for example - I think if an executable depends on some other target, then that
|
||||
external executable is linked - that other target is also linked in. If
`base_unittests` depends on the base library, which in a normal build is a
static library - like in a normal build? Like in a release build, by default,
it's a static library. And so if `base_unittests` is built, it first creates a
static library and then links to it. And then base itself might depend on a
bunch of third-party things, libraries, which means when `base_unittests` is
linked, it links base, but then it also links against base's dependencies. So
that's one
|
||||
meaning of DEPS. Another meaning, like these capital DEPS files, that's
|
||||
completely distinct. Has nothing to do with GN, I'm sad to say. And that's just
|
||||
for enforcing layering. Those predate GN, and they are for enforcing layering
|
||||
at a fairly coarse level. They say, code in this directory may include code
|
||||
from this other directory but not from this third directory. For example, a
|
||||
third - like, Blink must not - may include stuff from base, but must not
|
||||
include anything from, I don't know, the Chrome layer or something.
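
The first meaning of deps described above (a target pulls in its dependencies, and their dependencies in turn) amounts to a walk over the target graph. A minimal sketch, assuming a made-up graph: the `third_party_a` and `third_party_b` names are placeholders, and real GN distinguishes several kinds of deps that this ignores.

```python
# Hypothetical target graph. Only the base / base_unittests relationship comes
# from the conversation; the third-party names are placeholders.
GN_DEPS = {
    "base_unittests": ["base"],
    "base": ["third_party_a", "third_party_b"],
    "third_party_a": [],
    "third_party_b": [],
}


def link_inputs(target, deps=GN_DEPS):
    """Return every target that must be built before `target` can be linked.

    Dependencies are listed before the targets that need them. The graph is
    assumed to be acyclic.
    """
    ordered = []

    def visit(name):
        for dep in deps[name]:
            if dep not in ordered:
                visit(dep)
                ordered.append(dep)

    visit(target)
    return ordered


if __name__ == "__main__":
    print(link_inputs("base_unittests"))
```

Linking `base_unittests` therefore involves base and everything base itself depends on, even though the test target only names base directly.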
|
||||
|
||||
25:18 SHARON: Right, the classic content Chrome layering, where Chrome -
|
||||
|
||||
25:18 NICO: Right. And I think -
|
||||
|
||||
25:18 SHARON: content, but -
|
||||
|
||||
25:18 NICO: Right. And there's a step called checkdeps, and that checks the
|
||||
capital DEPS files.
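
The layering check can be pictured as matching each include path against a list of allow and forbid rules. The sketch below is only an illustration: the rule list is invented, and the "longest matching prefix wins, unlisted paths are rejected" policy is a simplifying assumption rather than the real tool's exact behavior.

```python
# Hypothetical include rules in the style of a capital-letters DEPS file:
# "+" allows includes from a directory, "-" forbids them.
INCLUDE_RULES = ["+base", "-chrome", "+third_party/libfoo"]


def include_allowed(include_path, rules=INCLUDE_RULES):
    """Return True if `include_path` may be included under `rules`.

    Assumption for this sketch: the longest matching directory prefix decides,
    and paths that match no rule are rejected.
    """
    best_len, allowed = -1, False
    for rule in rules:
        sign, prefix = rule[0], rule[1:]
        if include_path == prefix or include_path.startswith(prefix + "/"):
            if len(prefix) > best_len:
                best_len, allowed = len(prefix), sign == "+"
    return allowed


if __name__ == "__main__":
    for path in ["base/logging.h", "chrome/browser/ui/browser.h"]:
        print(path, "allowed" if include_allowed(path) else "forbidden")
```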
|
||||
|
||||
25:24 SHARON: OK. Yeah, because before, I worked on some Fuchsia stuff, and
|
||||
because we're adding a lot of new things, you're messing around with different
|
||||
DEPS and stuff a lot more than I think if you worked in a typical part. Like,
|
||||
now, I mostly just work within content. Unlikely that you're changing any
|
||||
dependencies. But that was always a bit unclear because, for the most part, the
|
||||
targets have very similar names - not exactly the same, but very similar. And
|
||||
if you miss one, you get all these weird errors. And it was, yeah, generally
|
||||
quite confusing.
|
||||
|
||||
25:55 NICO: Yeah, that's pretty confusing. One thing the capital DEPS files
can do that the GN deps can't is if someone adds a dependency on your
|
||||
library and they add an entry to their DEPS file, that means that now at code
|
||||
review time, you need to approve that they depend on you. And that's not
|
||||
something we can do at the GN level. And the advantage there is, I don't know,
|
||||
if you have some library and then 50 teams start depending on it without
|
||||
telling you, and now you're on the hook for keeping all these 50 things
|
||||
working, then with this system, you at least have to approve every time someone
|
||||
adds a dependency on you, you have to say, this is fine with me. Or you can
|
||||
say, actually, this is - we don't want this to be used by anyone else.
|
||||
|
||||
26:45 SHARON: Is there an ideal state where we don't have these DEPS files and
|
||||
maybe that functionality is built into the BUILD.gn files, or is this something
|
||||
that's probably going to be sticking around for a while?
|
||||
|
||||
26:52 NICO: That's a great question. I don't know. It seems weird, right? It's
|
||||
redundant. So I think the current system isn't ideal, but it's also not
|
||||
horrible enough that we have to fix it immediately. So maybe one day we'll get
|
||||
around to it.
|
||||
|
||||
27:10 SHARON: Yeah. I think I've mostly just worked on Chrome, so I've gotten
|
||||
pretty used to it. But a common complaint is people who work in Google internal
|
||||
things or other, bigger - the main build system of whatever company they work
|
||||
on, they come to Chrome and they're like, oh, everything's so confusing. But if
|
||||
you - you just got to get used to it, but -
|
||||
|
||||
27:27 NICO: Right. I think if you're confused by anything, it's great if you
|
||||
come to us and complain. Because you kind of become blind to these problems,
|
||||
right? I've been doing this for a long time. I'm used to all the footguns. I
|
||||
know how to dodge them. And yeah. So if you're confused by anything, please
|
||||
tell me personally. And then if enough people complain about something, maybe
|
||||
we'll fix it.
|
||||
|
||||
27:55 SHARON: All right. Yeah. That's what you said. The outcome of that -
|
||||
we'll see. We'll see how that goes. We'll see how many complaints you suddenly
|
||||
get. Right. OK. So another thing I was interested in is right now there's a lot
|
||||
of work around Rust, getting more Rust things, introducing that, memory safety,
|
||||
that's good. We like it. What is involved from a build perspective for getting
|
||||
a whole other language into Chrome and into the build? Because we have most of
|
||||
the things C++. There's some Java in all of the Android stuff. And in some
|
||||
areas, you see - you'll see a list of - you'll see a file name, and then you'll
|
||||
see file name underscore and then all the different operating systems, right?
|
||||
And most of those are some version of C++. The Mac ones are .mm. And you have Java
|
||||
ones for Android. But if you want to add an entirely different language and
|
||||
still be able to build Chrome, at a high level, what goes into that?
|
||||
|
||||
29:00 NICO: Yeah, there's also some Swift on iOS. It's many different things.
|
||||
So at first, you have to teach GN how to generate Ninja files for that
|
||||
language. So when a CC file is built, then basically the compiler writes out a
|
||||
file that says, here are all the header files I depend on. So if one of them
|
||||
gets touched, the compiler - or Ninja knows how to rebuild those. So you need
|
||||
to figure out how the Rust compiler or the Swift compiler track dependencies.
|
||||
You need to get that information out of the compiler into the build system
|
||||
somehow. And C++ is fairly easy to build. It's like a per-file basis. I think
|
||||
most languages are more on a module or package basis, where you build a few
|
||||
files as a unit. Then you might want to think about, how can I make this work
|
||||
with Goma so that the compilation can work remotely instead of locally? So
|
||||
that's the build system part. Then also, especially for us, we want to use this
|
||||
for some performance critical things, so it needs to be very fast. And we use a
|
||||
bunch of toolchain optimization techniques to make Chrome very fast with
|
||||
three-letter acronyms, such as PGO and LTO and whatnot. And LTO in particular,
|
||||
that means Link Time Optimization. That means the C++ or the Rust code is
|
||||
built - is compiled into something called "bitcode." And then all the bitcode
|
||||
files at link time are analyzed together so you can do cross-file inlining and
whatnot. And for that to work, the bitcodes - all the bitcode versions need to be
|
||||
compatible, which means Clang and Rust need to be built against the same
|
||||
version of LLVM, which is some - it's some internal compiler machinery that
|
||||
defines the bitcode. So that means you have to - if you want to do
|
||||
cross-language LTO, you have to update your C++ compiler and your Rust compiler
|
||||
at the same time. And you have to build them at the same time. And when you
|
||||
update your LLVM revision, it must break neither the C++ compiler nor the Rust
|
||||
compiler. Yeah. And then you kind of want to build the Rust library from
|
||||
source, so you have bitcode for all of that. So it's a fairly involved - but
|
||||
yeah, we've been doing a lot of work on that. Not me, but other people.
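
The dependency tracking mentioned at the start of this answer (the compiler writes out which headers a file used, and the build tool uses that to decide what to rebuild) can be sketched like this. It assumes a Makefile-style dependency file and a plain timestamp comparison; the file names are invented, and Ninja's real bookkeeping is more involved than this.

```python
import os


def parse_depfile(text):
    """Split a Makefile-style dependency listing into (output, inputs).

    Lines may be continued with a trailing backslash. Paths that contain
    colons or spaces are not handled in this sketch.
    """
    output, _, rest = text.partition(":")
    inputs = rest.replace("\\\n", " ").split()
    return output.strip(), inputs


def needs_rebuild(output, inputs, mtime=os.path.getmtime, exists=os.path.exists):
    """An output is stale if it is missing or older than any of its inputs."""
    if not exists(output):
        return True
    out_time = mtime(output)
    return any(mtime(path) > out_time for path in inputs)


if __name__ == "__main__":
    example = "obj/foo.o: foo.cc foo.h \\\n  base/bar.h\n"
    print(parse_depfile(example))
```

A new language has to provide equivalent information somehow, which is the first hurdle described above.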
|
||||
|
||||
31:24 SHARON: Right. Sounds hard. And what does LTO stand for, since you used
|
||||
it?
|
||||
|
||||
31:30 NICO: Link Time Optimization.
|
||||
|
||||
31:30 SHARON: All right.
|
||||
|
||||
31:30 NICO: And there's a blog post on the Chromium blog about this that we can
|
||||
link to in the show notes that has a fairly understandable explanation what
|
||||
this does.
|
||||
|
||||
31:43 SHARON: Yeah, all right. That sounds good. So linking, that was my next
|
||||
question. As you build stuff, you sort out all of your just compile errors, you
|
||||
got all your spelling mistakes out. The next type of error you might get is
|
||||
a linking error. So how does - can you tell us a bit more about linking in
|
||||
general and how that fits into the build process?
|
||||
|
||||
32:01 NICO: I mean, linking - like, for C++, the compiler basically produces
|
||||
one object file for every CC file. And then the linker takes, like, about
|
||||
50,000 to 100,000 object files and produces a single executable. And every
|
||||
object file has a list of functions that are defined in that object file and a
|
||||
list of functions that are undefined in that object file that it calls that are
|
||||
needed from elsewhere. And then the linker basically makes one long list of all
|
||||
the functions it finds. And at the end, all of them should be defined, and all
|
||||
the non-inline ones should be defined in exactly one object file. And if
|
||||
they're not - if that doesn't happen, then it emits an error, and else, it
|
||||
emits a binary. And the linker is kind of interesting because the only thing
|
||||
you really care about is that it does its job very quickly. But it has to read
|
||||
through gigabytes of data before it writes the executable. And currently, we
|
||||
use a linker called `lld`, which was also written by people on the Chrome team,
|
||||
and which is also fairly popular outside of Chrome nowadays. And so we wrote an
ELF linker, which is the file format used on Linux and Android, and a COFF
linker, which is the file format used on Windows, and our own Mach-O linker,
which is the file format on Apple - macOS and iOS. And our linkers are way,
|
||||
way, way faster than the things that they replace. On Windows, we were, like,
|
||||
10 times faster than the Windows linker. And on Mac, we're, like, four times
|
||||
faster than the system linker and whatnot. The other linker vendors have caught
|
||||
up a little bit, but we - I feel like Chrome has really advanced the state and
|
||||
performance of linking binaries across the industry, which I think is really
|
||||
cool.
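
The bookkeeping described in this answer (each object file defines some functions and needs others, and in the end every needed function must be defined exactly once) can be modeled in a few lines. This is a toy model with invented file and symbol names; it ignores inline functions, static libraries, and everything that makes a real linker fast.

```python
def link(object_files):
    """Resolve symbols across object files.

    `object_files` maps a file name to {"defined": [...], "undefined": [...]}.
    Returns a table of symbol -> defining file, or raises ValueError for a
    duplicate or missing definition.
    """
    table = {}
    # Pass 1: collect definitions; each symbol may be defined only once.
    for name, obj in object_files.items():
        for symbol in obj["defined"]:
            if symbol in table:
                raise ValueError(
                    f"duplicate symbol {symbol}: {table[symbol]} and {name}")
            table[symbol] = name
    # Pass 2: every symbol a file needs from elsewhere must now be defined.
    for name, obj in object_files.items():
        for symbol in obj["undefined"]:
            if symbol not in table:
                raise ValueError(
                    f"undefined symbol {symbol} referenced by {name}")
    return table


if __name__ == "__main__":
    objects = {
        "main.o": {"defined": ["main"], "undefined": ["helper"]},
        "util.o": {"defined": ["helper"], "undefined": []},
    }
    print(link(objects))
```

The two `ValueError` cases correspond to the two classic linker errors: a duplicate symbol and an undefined symbol.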
|
||||
|
||||
33:44 SHARON: Yeah, that is really cool. And in a kind of similar vein to the
|
||||
different OSes and all that kind of stuff is 32- versus 64-bit. There's some
|
||||
stuff happening. I've seen people talk about it. It seems pretty important. Can
|
||||
you just tell us a bit more about this in general?
|
||||
|
||||
34:04 NICO: Well, I guess most processors sold in the last decade or so are
|
||||
64-bit. So I think on some platforms, we only support 64-bit binaries, like -
|
||||
and the bit just means how wide is a pointer and has some implications on which
|
||||
instructions can the compiler use. But it's fairly transparent too, I think, at
|
||||
the C++ level. You don't have to worry about it all that much. On macOS, we
|
||||
only support 64-bit builds. Same on iOS. On Windows, we still have 32-bit and
|
||||
64-bit builds. On Linux, we don't publicly support 32-bit, but I think some
|
||||
people try to build it. But it's really on Windows where you have both 32-bit
|
||||
and 64-bit builds. But the default bits is 64-bit, and you can say, if you say
|
||||
`target_cpu = "x86"`, I think, in your args.gn, then you get a 32-bit build.
|
||||
But it should be fairly transparent to you as a developer, unless you write
|
||||
assembly.
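
"How wide is a pointer" can be checked directly on whatever machine you are on. The snippet below is only a small illustration using Python's standard library; it reports the pointer width of the Python interpreter running it, not of any Chrome build.

```python
import struct


def pointer_bits():
    """Width in bits of a native pointer in the running Python interpreter."""
    # "P" is the struct format code for a C `void *`.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(f"This interpreter uses {pointer_bits()}-bit pointers")
```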
|
||||
|
||||
35:02 SHARON: How big of an effort would it be to get rid of 32-bit on Windows?
|
||||
Because Windows is probably the biggest Chrome-using platform, and also,
|
||||
there's a lot of versions out there, right? So -
|
||||
|
||||
35:15 NICO: Oh, yeah.
|
||||
|
||||
35:15 SHARON: How doable?
|
||||
|
||||
35:15 NICO: I think that the biggest platform is probably Android. But yeah,
|
||||
Android is also 32-bit, at least on some devices at the moment. That's true. I
|
||||
don't know. I think we've looked into it and decided that we don't want to do
|
||||
that at the moment. But I don't know details.
|
||||
|
||||
35:33 SHARON: And you mentioned ARM. So is there any - how much does the Chrome
|
||||
build team - are they concerned with the architecture of these processors? Is
|
||||
that something that, at the level that you and the build team have to worry
|
||||
about, or is it far enough - a few layers down that that's -
|
||||
|
||||
35:47 NICO: It's something we have to worry about at the toolchain team. So we
|
||||
update the Clang compiler every two weeks or so, which means we pull in all -
around 1,000 changes from upstream contributors that work on LLVM spread across
many companies. And we have to make sure this doesn't break Chrome on 32-bit ARM,
|
||||
64-bit ARM, 32-bit Intel, 64-bit Intel, across seven different operating
|
||||
systems. And so fairly frequently, when we try to update Clang, tests start
|
||||
failing on, I don't know, 32-bit Windows or on 64-bit iOS or some very specific
|
||||
configuration. And then we have to go and debug and dissect and figure out
|
||||
what's going on and work with upstream to get that fixed. So yeah. That's
|
||||
something we have to deal with at the toolchain team, but hopefully, it's -
|
||||
hopefully, like the normal Chrome developer is isolated from that for the most
|
||||
part.
|
||||
|
||||
36:45 SHARON: I think so. It's not - if I weren't asking all these other
|
||||
questions, it's something that almost never crosses my mind, right? So that
|
||||
means you're all doing a very good job of that. Thank you very much. Much
|
||||
appreciated. And jumping way back, you mentioned earlier indexing the code
|
||||
base, code search. So I make a change. I submit it. I upload it. It eventually
|
||||
ends up in code search. So how does that process work? And what goes into
|
||||
indexing? Because before, when I was working on Fuchsia, all the Fuchsia code
|
||||
wasn't indexed, so you couldn't do the handy thing of clicking a thing and
|
||||
seeing where it was defined. You had to actually look it up. And once you got
|
||||
that, it was like, oh my gosh, so much better. So can you just tell us a bit
|
||||
more about that process?
|
||||
|
||||
37:30 NICO: Sure, yeah. Chrome has a pretty good code search feature, I
|
||||
think, codesearch.chromium.org or cs.chromium.org. Basically, we have a bot
|
||||
that runs, I think, every six hours or so, pulls the latest code, bundles it
|
||||
up, sends it to some indexer service that then also uses Clang to analyze the
|
||||
code. Like, for C++. I think we also index Java. We probably don't index Rust
|
||||
yet, but eventually we will. And then it generates - for every word, it
|
||||
generates metadata that says, this is a class. This is an identifier. And so if
|
||||
you click on it, if you click on a function, you have the option of jumping to
|
||||
the definition of the function, to the declaration, to all the calls, all the
|
||||
overrides, and so on. And that updates ideally several times a day and is
|
||||
fairly up to date. And we build the index, I think, for most operating systems.
So you can see this is called here on Linux, here on Windows, and whatnot.
|
||||
|
||||
38:32 SHARON: OK. Sounds good. Very useful stuff. And I don't know if this is
|
||||
part of the build team's jurisdiction, but when you are working on things
|
||||
locally, you have some git commands, and then you have some git-cl commands.
|
||||
|
||||
38:43 NICO: Mm-hmm.
|
||||
|
||||
38:48 SHARON: So the git commands are your typical ones - git pull, git rebase,
|
||||
git stash, that kind of thing. And then you have git-cl commands, which relate
|
||||
more to your actual CL in Gerrit. So git-cl upload, git-cl status. That'll show
|
||||
you all your local branches and if they have a Gerrit change associated with
|
||||
them. So what's the difference between git and git-cl commands?
|
||||
|
||||
39:18 NICO: I'm sorry. So this is basically a git feature. If you call git-foo,
|
||||
then git looks for git-foo on your path. So you can add arbitrary commands to
|
||||
git if you want to. And git-cl is just something that's in `depot_tools`.
|
||||
Again, there's git-cl in `depot_tools`, and you can open that and see what it
|
||||
does. And it'll redirect to `git_cl.py`, I think, which is a fairly long and
|
||||
hairy Python script. But yeah. It's basically Gerrit integration, as you say.
|
||||
So you can use that to send try jobs, `git cl try`. To upload, as you say, you
|
||||
can use `git cl issue` to associate your current branch with a remote Gerrit
|
||||
review, `git cl patch` to get a patch off Gerrit and patch it into your local
|
||||
thing, `git cl web` to open the current thing in a web browser. Yeah, git-cl is
|
||||
basically - git-cl help to see all the git-cl commands, or - yeah. If you have
|
||||
a change that touches, like, 1,000 files, you can run `git cl split`, and it'll
|
||||
upload 500 reviews. But that's usually too granular, and I wouldn't recommend
|
||||
doing that. But it's possible.
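
The lookup described at the start of this answer (an unknown `git foo` makes git look for a program called `git-foo` on your path) can be mimicked in a couple of lines. This is a sketch of the idea only: real git checks its built-in commands and its own exec path first, and the `which` parameter is there just so the lookup can be swapped out when experimenting.

```python
import shutil


def resolve_git_subcommand(subcommand, which=shutil.which):
    """Return the path of the external program for `git <subcommand>`, or None.

    Simplified: only the PATH lookup for `git-<subcommand>` is modeled.
    """
    return which(f"git-{subcommand}")


if __name__ == "__main__":
    # Prints a path if depot_tools (or anything else providing git-cl) is on
    # PATH, and None otherwise.
    print(resolve_git_subcommand("cl"))
```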
|
||||
|
||||
40:25 SHARON: Right. Do you have a - [DOORBELL DINGS]
|
||||
|
||||
40:25 NICO: Oops, sorry.
|
||||
|
||||
40:25 SHARON: commonly - yeah.
|
||||
|
||||
40:30 NICO: Oh, sorry. There was - the door just rang. Maybe you didn't hear
|
||||
it. Sorry.
|
||||
|
||||
40:30 SHARON: All right. It's all good. Do you have a lesser known git or
|
||||
git-cl command that you use a lot or -
|
||||
|
||||
40:41 NICO: Well, I -
|
||||
|
||||
40:41 SHARON: is your favorite? [LAUGHS]
|
||||
|
||||
40:46 NICO: It's not lesser known to me, so I wouldn't know. I don't know. I
|
||||
use `git cl upload` a lot.
|
||||
|
||||
40:53 SHARON: Right. Well, you have to use `git cl upload`, right?
|
||||
|
||||
40:53 NICO: I use -
|
||||
|
||||
40:53 SHARON: Well, you don't - maybe not but -
|
||||
|
||||
40:53 NICO: `git cl try` to send try jobs from my terminal, `git cl web` to see
|
||||
what's going on, `git cl patch` a lot to patch stuff in locally. If I'm doing a
|
||||
code review and I want to play with it, I patch it in, build it locally, and see
|
||||
how things are working.
|
||||
|
||||
41:12 SHARON: Yeah. When I patch in a thing, I go from the CL page on Gerrit
and then click the download patch thing, but -
|
||||
|
||||
41:21 NICO: No, even `git cl patch -b` and then some branch name, and then you
|
||||
just patch - paste the Gerrit review URL.
|
||||
|
||||
41:28 SHARON: Oh, cool.
|
||||
|
||||
41:28 NICO: So it's just, yeah, Control-L to focus the URL bar, Control-C,
Alt-Tab, `git cl patch -b blah`, Paste, Enter, and then you have a local branch
|
||||
with the thing.
|
||||
|
||||
41:36 SHARON: All right. Yeah, a lot of these things, once you learn about
|
||||
them - at first you're like, whoa, and then you use them, and then they're not
|
||||
lesser known to you, but you tell other people also a common - so another one
|
||||
would be `git cl archive`, which will -
|
||||
|
||||
41:47 NICO: Oh, yeah, yeah.
|
||||
|
||||
41:47 SHARON: get rid of any local branches associated with a closed Gerrit
|
||||
branch, so that's very handy, too.
|
||||
|
||||
41:53 NICO: Yes.
|
||||
|
||||
41:53 SHARON: So it's always fun to learn about things like that.
|
||||
|
||||
41:59 NICO: Are you fairly tidy with your branches? How many open branches do
|
||||
you usually have?
|
||||
|
||||
41:59 SHARON: [LAUGHS] I used to be more tidy. When I tried to do a cleanup
|
||||
thing, I had more branches. I think right now I've got around 20-something
|
||||
branches. I like having not very many. I think to some people, that's a lot. To
|
||||
some people, that's not very many. I mean, ideally, I have under five, right?
|
||||
[LAUGHS] But -
|
||||
|
||||
42:18 NICO: I don't know. I usually have a couple 10, sometimes. Have a bunch
|
||||
of machines. I think on some of them it's over 100, but yeah. Every now and
|
||||
then, I run `git cl archive` and it removes half of them, but -
|
||||
|
||||
42:29 SHARON: Yes. All right, cool. Is there anything that we didn't cover so
|
||||
far that you would like to share? So things that maybe you get asked all the
|
||||
time, things that people could do better when it comes to build-related things?
|
||||
Things that you can do that make the build better or don't make it worse, that
|
||||
kind of thing? Or just anything else you would like to get out there?
|
||||
|
||||
42:58 NICO: I guess one thing that's maybe implicitly stated, but currently not
|
||||
explicitly documented, as far as I know, but I'm hoping to change that, is - so
|
||||
Chrome tries to have a quiet build. Like, if you build, there's zero build output,
except that one Ninja file, Ninja line that's changing, right? Whereas, well, in
other code bases - I think it's fairly common - there are many screenfuls
of warnings that scroll by. And we very explicitly try not to do that because if
|
||||
the build emits lots of warnings, then people just learn to ignore warnings. So
|
||||
we think something should either be a serious problem that people need to know
|
||||
about, then it should be an error, or it should be not interesting. Then it
|
||||
should be just quiet. So if you add a build step that runs a random script, the
script shouldn't print anything just about progress. Shouldn't say, doing
this, doing this, doing this. Should either print something and say something's
wrong and fail the build step or not say anything. So that's one thing.
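
One way to follow this rule when wrapping a tool in a build step is to capture its output and only show it when the step fails. A minimal sketch, assuming the wrapped command is passed in as an argument list; the function name is made up, and this is not how Chrome's own build scripts are necessarily written.

```python
import subprocess
import sys


def run_quietly(command):
    """Run a build step; show its output only if it fails.

    Returns the command's exit code so the caller can fail the build step.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Something is wrong: surface everything the tool said.
        sys.stderr.write(result.stdout)
        sys.stderr.write(result.stderr)
    return result.returncode


if __name__ == "__main__":
    # A chatty but successful step produces no build output at all.
    sys.exit(run_quietly([sys.executable, "-c", "print('doing this...')"]))
```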
|
||||
|
||||
43:51 SHARON: That's - yeah, that's true.
|
||||
|
||||
43:51 NICO: And the other thing -
|
||||
|
||||
43:51 SHARON: Like, you only really get a bunch of terminal output if you have
|
||||
a compile or a linker error, whatever.
|
||||
|
||||
43:57 NICO: Right.
|
||||
|
||||
43:57 SHARON: I hadn't ever considered that. If you build something and it
|
||||
works, you get very few lines of output. And I hadn't ever thought that was
|
||||
intentional before, but you're right in that if it was a ton, you would just
|
||||
not look at any of it. So yeah, that's very cool.
|
||||
|
||||
44:09 NICO: Yeah. And on that same note, we don't do deprecation warnings
|
||||
because we don't do any warnings. So if people - like, people like deprecating
|
||||
things, but people don't like tidying up calls to deprecated functions. So if
|
||||
you want to deprecate something in Chrome, the idea is basically, you remove
|
||||
all callers, and then you remove the deprecated thing. And we don't allow you
|
||||
to say - to add a warning that tells everyone, hey, please, everyone, remove
|
||||
your calls. The onus is on the person who wants to deprecate something instead
|
||||
of punting that to everyone else.
|
||||
|
||||
44:46 SHARON: Yeah, I mean, the thing that I was working on has a deprecating
|
||||
effect, so removing callers, which is why I have so many branches. But I've
|
||||
also seen presubmit warnings for if you include something deprecated. So - oh,
|
||||
yeah, and there's presubmit, too. OK, we'll get to that also. [LAUGHS] Tell us
|
||||
more about all of this.
|
||||
|
||||
45:05 NICO: About presubmits? Yeah, presubmits - presubmits are terrible.
|
||||
That's the short summary. So if you run a `git cl presubmit`, it'll look at a
|
||||
file called PRESUBMIT.py, I think, in the current directory, and maybe in all
the directories of files - of directories that contain files you touched or
something like that. But you can just open the top-level PRESUBMIT.py file, and
|
||||
there's a couple thousand lines of Python where basically everyone can add
|
||||
anything they want without much oversight, so it's a fairly long - at least
|
||||
historically, that used to be the case. I don't know if that's still the case
|
||||
nowadays. But yeah, it's basically like a long list of things that random
|
||||
people thought are good if they - like, presubmits are something that are run
|
||||
before you upload, also, implicitly. And so you're supposed to clean them up.
|
||||
And [INAUDIBLE] many useful things. For example, nowadays we require most code
|
||||
to be autoformatted so that people don't argue about where semicolons should go
|
||||
or something silly like that. So one of the things it checks is, did you run
|
||||
`git cl format`, which runs, I guess, clang-format for C++ code and a bunch of
|
||||
custom Python scripts for other files. But it's also - presubmits have grown
|
||||
organically, and there isn't - they're kind of unowned and they're very, very
|
||||
slow. And I think some people have tried to improve them recently, and they're
|
||||
better than they used to be, but I don't love presubmits, I guess is the
|
||||
summary. But yeah, it's another thing to check invariants that we would like to
|
||||
be true about our code base.
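
As a rough illustration of the kind of check described above, here is a
minimal, self-contained Python sketch that warns when a touched file includes a
deprecated header. It is not Chromium's actual presubmit code: the header
names, the `PresubmitWarning` class, and `check_deprecated_includes` are all
hypothetical, and real checks in `PRESUBMIT.py` work with helper objects
supplied by the presubmit framework rather than a plain dictionary of file
contents.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping of deprecated headers to their replacements.
# These names are made up for illustration only.
DEPRECATED_INCLUDES = {
    "base/old_thing.h": "base/new_thing.h",
}

# Matches lines such as: #include "base/old_thing.h"
INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"')


@dataclass
class PresubmitWarning:
    """A stand-in for whatever result type a real presubmit would return."""
    path: str
    line_number: int
    message: str


def check_deprecated_includes(touched_files):
    """Returns one warning per deprecated #include found.

    touched_files maps a file path to the new contents of that file.
    """
    warnings = []
    for path, contents in sorted(touched_files.items()):
        for line_number, line in enumerate(contents.splitlines(), start=1):
            match = INCLUDE_RE.match(line)
            if match and match.group(1) in DEPRECATED_INCLUDES:
                header = match.group(1)
                replacement = DEPRECATED_INCLUDES[header]
                warnings.append(PresubmitWarning(
                    path, line_number,
                    f"{header} is deprecated; use {replacement}"))
    return warnings
```

The sketch returns warnings rather than failing outright, which roughly matches
the "please clean this up" spirit of a presubmit warning as opposed to a
compiler error.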
|
||||
|
||||
46:48 SHARON: Yeah. I mean, I think - yes, spelling is something I think it
|
||||
also checks.
|
||||
|
||||
46:54 NICO: It checks spelling? OK.
|
||||
|
||||
46:54 SHARON: Or maybe that's a separate bot in Gerrit.
|
||||
|
||||
46:59 NICO: Oh, yeah, yeah, yeah, yeah. Like, there's this thing called -
|
||||
what's its name?
|
||||
|
||||
47:06 SHARON: Trucium? Tricium?
|
||||
|
||||
47:06 NICO: Tricium, yeah. Tricium, right. Tricium is something that adds
comments to your - automatically adds comments to your change list when you
upload it. And Tricium can do spelling correction, but it can also - it runs
something called Clang Tidy, which is basically a static analysis engine which
has quite a few false positives, so sometimes it complains about something
that - but it's actually incorrect, and so we don't put that into the compiler
itself. So we've added a whole bunch of warnings to the compiler for things
that we think are fairly buggy. But Clang Tidy is - but these warnings have to
be - they have to have a very low false positive rate. Like, if they complain,
they should almost always be right. But sometimes, for static analysis, it's
hard to be right. Like, you can say this might be wrong. Please be sure. But
this is not something the compiler can say, so we have this other system called
Clang Tidy which also adds a comment to your C++ code which says, well, maybe
this should be a reference instead of a copy, and things like that.
|
||||
|
||||
48:04 SHARON: Yeah. And I think it - I've seen it - it checks for unused
|
||||
variables and other - there's been useful stuff that's come from comments from
|
||||
there, so definitely. All right. Very cool. So if people are interested in all
|
||||
this build "infra-y" kind of stuff and they want to get more into it, what can
|
||||
they do?
|
||||
|
||||
48:32 NICO: We have a public build@chromium.org mailing list. It's very low
volume, but if you want to reach out, you can send an email there and a few of
us will see your email and interact with you. And there's also I think the tag
build on crbug. So you can just look for build bugs and fix all our bugs for
us. That'd be cool.
|
||||
|
||||
48:51 SHARON: [LAUGHS]
|
||||
|
||||
48:51 NICO: And if there's anything specific, just talk to local OWNERS. Or if
|
||||
you feel this is just something you're generally interested in and you're
|
||||
looking for a project, you can talk to me, and I probably have a long list of -
|
||||
I do have a long list of somewhat beginner-friendly projects that people could
|
||||
help out with, I guess.
|
||||
|
||||
49:15 SHARON: Yeah. I mean, I think being able to - if you're looking for a
20%-y kind of project or something else. But knowing how things actually get put
together is always a good skill and definitely applicable to other things. It's
the kind of thing where the more low-level knowledge you have, the more - it
works - it applies to things higher up, but not necessarily the other way
around, right?
|
||||
|
||||
49:34 NICO: Mm-hmm.
|
||||
|
||||
49:34 SHARON: So having that kind of understanding is definitely a good thing.
|
||||
All right. Any last things you'd like to mention or shout out or cool things
|
||||
that you want people to know about? [LAUGHS]
|
||||
|
||||
49:48 NICO: I guess -
|
||||
|
||||
49:48 SHARON: Or what - yeah, quickly, what is the future of the whole build
|
||||
thing? Like, what's the ideal situation if -
|
||||
|
||||
49:55 NICO: Ideally, it'll all be way faster, I guess is the main thing. But
yeah, yeah, I think build speed is a big problem. And I'm not sure we have the
best handle on that. We're working on many things, but - not many. A bunch of
things. But it's - like, people keep adding all that much code, so if y'all
could delete some code, too, that would help us a lot. I mean, having -
supporting more languages is something we have to - this is something that's
happening. Like, Rust is happening. We are also on iOS also using Swift.
Currently, we can't LTO Swift with the rest because that's on a different LLVM
version. There's this - in C++ - we keep upgrading C++ versions. So Peter
Kasting is working on moving us to C++20. And then 23, we'll have them, and so
on. There's maybe C++ modules at some point, which may or may not help with
build speed. And there's a bunch of tech debt that we need to clean up, but
that's not super interesting.
|
||||
|
||||
51:24 SHARON: I don't know. I think people in Chrome in general are more
|
||||
interested and care about reducing tech debt in general, right? A lot of people
|
||||
I know would be happy to just do tech debt clean-up things only, right?
|
||||
Unfortunately, it doesn't really work out for job reasons. But a lot of people,
|
||||
I think, are interested in, I think, in higher proportions than maybe other
|
||||
places.
|
||||
|
||||
51:47 NICO: It depends on the tech debt. Some of it might work out for job
|
||||
reasons. But, yeah.
|
||||
|
||||
51:54 SHARON: Yeah. I mean, some of it is easier than others, too, right? Some
|
||||
of it is like, yeah, so, OK, well, go delete some code. Go clean up some
|
||||
deprecated calls. [LAUGHS] All that.
|
||||
|
||||
52:08 NICO: Yeah, and again, I think finishing migrations is way harder than
|
||||
starting them, so finish more migrations, start fewer migrations. That'd be
|
||||
also cool.
|
||||
|
||||
52:16 SHARON: All right. I am sure everyone listening will go and do that right
|
||||
away.
|
||||
|
||||
52:21 NICO: Yep.
|
||||
|
||||
52:21 SHARON: And things will immediately be better.
|
||||
|
||||
52:27 NICO: They've just been waiting to hear that from me, and now they're
|
||||
like, ah, yeah, right. That makes sense.
|
||||
|
||||
52:27 SHARON: Yeah, yeah. All right. Well, you all heard it here first. Go do
|
||||
that. Things will be better, et cetera. So all right. Well, thank you very
|
||||
much, Nico, for being here answering all these questions. I learned a lot. A
|
||||
lot of this is stuff that - everyone who works on Chrome builds Chrome, right?
|
||||
But you can get by with a very minimal understanding of how these things are.
|
||||
Like, you see your - you follow the Intro to Building Chrome doc. You copy the
|
||||
things. You're like, OK, this works. And then you just keep doing that until
|
||||
you have a problem. And depending on where you work, you might not have
|
||||
problems. So it's very easy to know very little about this. But obviously, it's
|
||||
so important because if we didn't have any of this infrastructure, nothing
|
||||
would work. So one, I guess, thank you for doing all the stuff behind the
|
||||
scenes, determinism, OSes, all that, making it a lot easier for everyone else,
|
||||
but also thank you for sharing about it so people understand what's actually
|
||||
going on when they run the commands they do every day.
|
||||
|
||||
53:31 NICO: Sure. Anytime. Thanks for having me. And it's good to hear that
|
||||
it's possible to work on Chrome without knowing much about the build because
|
||||
that's the goal, right? It should just work.
|
||||
|
||||
53:44 SHARON: Yeah.
|
||||
|
||||
53:44 NICO: Sometimes it does.
|
||||
|
||||
53:44 SHARON: [LAUGHS] Yeah. Well, thank you for all of it, and see you next
|
||||
time.
|
||||
|
||||
53:51 NICO: Yeah. See you on the internet. Bye.
|
||||
|
||||
54:03 SHARON: OK. So we will stop recording -
|
||||
|
||||
54:03 NICO: Wee. Time for the second take.
|
||||
|
||||
54:03 SHARON: [LAUGHS] Let's do that, yeah, all over again.
|
||||
|
||||
54:11 NICO: Let's do it.
|
||||
|
||||
54:11 SHARON: I will stop recording.
|
978
docs/transcripts/wuwt-e06-open-source.md
Normal file
@ -0,0 +1,978 @@
|
||||
# What’s Up With Open Source
|
||||
|
||||
This is a transcript of [What's Up With
|
||||
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
|
||||
Episode 6, a 2023 video discussion between [Sharon (yangsharon@chromium.org)
|
||||
and Elly
|
||||
(ellyjones@chromium.org)](https://www.youtube.com/watch?v=zOr64ee7FV4).
|
||||
|
||||
The transcript was automatically generated by speech-to-text software. It may
|
||||
contain minor errors.
|
||||
|
||||
---
|
||||
|
||||
What does it mean for Chrome to be open source? What exactly is Chromium? Can
|
||||
anyone contribute to the browser? Answering all that is today's special guest,
|
||||
Elly. She worked all over Chrome and ChromeOS, and is passionate about
|
||||
accessibility, the open web, and free and open-source software.
|
||||
|
||||
Notes:
|
||||
- https://docs.google.com/document/d/1a6sdrspJgAHDdQMMNGV0t7zo8QWgqq0hgsyV55tc_dk/edit
|
||||
|
||||
Links:
|
||||
- [What's Up With BUILD.gn](https://www.youtube.com/watch?v=NcvJG3MqquQ)
|
||||
- [What's Up With //content](https://www.youtube.com/watch?v=SD3cjzZl25I)
|
||||
- [What are Blink Intents?](https://www.youtube.com/watch?v=9cvzZ5J_DTg)
|
||||
|
||||
---
|
||||
|
||||
00:00 SHARON: Hello, and welcome to "What's Up With That?" the series that
|
||||
demystifies all things Chrome. I'm your host, Sharon. And today, we're talking
|
||||
about open source. What does it mean to be open source? I've heard of Chrome,
|
||||
but what's Chromium? What are all the ways you can get involved? Answering
|
||||
those questions and more is today's special guest, Elly. Elly currently works
|
||||
on the Chrome content team, which is focused on making the web more fun and
|
||||
interesting to use. Previously, she's worked all over Chrome and Chrome OS.
|
||||
She's passionate about accessibility, the open web, and free and open-source
|
||||
software. Welcome, Elly.
|
||||
|
||||
00:34 ELLY: Thank you, Sharon.
|
||||
|
||||
00:34 SHARON: All right. First question I think is pretty obvious. What is open
|
||||
source? What does that mean?
|
||||
|
||||
00:40 ELLY: Yeah, so open source is a pretty old idea. And it basically just
means, in the purest sense, that the source code for a program is open to be
read by others.
|
||||
|
||||
00:51 SHARON: OK. And Chrome, the source code is available to be read by
|
||||
anyone. What else is it? Open source - I've heard of open-source community. It
|
||||
seems like there's a lot to it. So why don't you just tell us more about open
|
||||
source, generally?
|
||||
|
||||
01:03 ELLY: Yeah, for sure. There's quite a bit of nuance here. And there's
|
||||
been differing historical interpretations of some of these terms, so I'll -
|
||||
there's two big camps that are important to talk about. One is open source,
|
||||
which means what I said - the source is available to be viewed. There's also
|
||||
the idea of free software, which is software that actually has license terms
|
||||
that allow for people to modify it, to make their own derivative versions of
|
||||
it, and that kind of thing. And so historically, there was a pretty big
|
||||
difference between those things. These days, the two concepts are often talked
|
||||
about pretty interchangeably because a lot of open-source projects are free
|
||||
software, and all free software projects basically are open source. But the
|
||||
distinction used to be very important and is still pretty important, I guess.
|
||||
Chromium is both open source and free software. So we ship under a license that
|
||||
allows for - not only for everyone to read and look at our code, but also for
|
||||
other folks to make their own versions of Chromium. So, yeah, Chromium, both
|
||||
open source and free software.
|
||||
|
||||
01:56 SHARON: OK, very cool. And you mentioned Chromium in there. But I think
|
||||
for most people, when they think of the browser, they call it Chrome. So what
|
||||
is the difference between Chrome and Chromium? Are they the same thing? I think
|
||||
people, myself included, sometimes use those interchangeably, especially when
|
||||
you work on it. So what is the difference there?
|
||||
|
||||
02:16 ELLY: Yeah, fantastic question. So Chromium is an open-source and free
|
||||
software web browser that is made by the Chromium Foundation, which is like an
|
||||
actual .org that exists somewhere on the internet. Chrome is a Google-branded
|
||||
web browser that is basically made by taking Chromium, which is an open-source
|
||||
and free software web browser, adding some kind of Google magic to it, like
|
||||
integrations with some Google services, some kind of media codecs that maybe
|
||||
aren't themselves free software, that kind of thing, bundling that up into a
|
||||
more polished product which we call Google Chrome, and then shipping that as a
|
||||
web browser. So Chromium is an open-source project. Google Chrome is a Google
|
||||
product that is built on top of Chromium.
|
||||
|
||||
03:03 SHARON: OK. So Google Chrome is a Chromium-based browser, which is a term
|
||||
I think that people who work in any browser stuff - it's a term that they've
|
||||
all [INAUDIBLE] before.
|
||||
|
||||
03:08 ELLY: Yeah, exactly. And in fact, you alluded to the fact that we
|
||||
sometimes use those terms interchangeably. And especially at Google, we
|
||||
sometimes get a little confused about what we're talking about sometimes
|
||||
because we're - the Google Chrome team are the biggest contributors to
|
||||
Chromium, the open-source project. And so we tend to sometimes talk about the
|
||||
two things as though they're the same. But there's a really important
|
||||
difference for folks who are working on other Chromium-derived browsers. So if
|
||||
you're working on a Chromium derivative that a Linux distribution ships, for
|
||||
example, your browser is based on Chromium, and it's really not Chrome. It's
|
||||
Chromium, right? It is the open-source browser that Chrome is based on. But
|
||||
it's not the same thing at all.
|
||||
|
||||
03:52 SHARON: Yeah, if you want to learn a bit more about basing things on
|
||||
Chromium, the content episode is a good one to check out. We talk a bit about
|
||||
that and embedding Chrome in Chromium and what that means. So -
|
||||
|
||||
04:03 ELLY: Yeah, absolutely.
|
||||
|
||||
04:03 SHARON: check it out if you [INAUDIBLE]...
|
||||
|
||||
04:03 ELLY: And there's also, in the Chromium source tree, there's actually a
|
||||
thing called Content Shell, which is a minimal demonstration browser. It's like
|
||||
the rendering engine from Chromium wrapped in the least amount of browser
|
||||
possible to make it work. And we use it for testing, but it's also a really
|
||||
good starting point if you're trying to learn how to build a Chromium
|
||||
derivative browser.
|
||||
|
||||
04:22 SHARON: OK, very neat. So I think a next very natural question to come
|
||||
out of this is, why is Chrome or Chromium - Chromium rather - going to try to
|
||||
be good about using those correctly here - but why is Chromium open source?
|
||||
|
||||
04:40 ELLY: Yeah, so this is the decision that we made right when we were
|
||||
starting the project actually. And it's based on this really fundamental idea
|
||||
that the web benefits when users have better browsers. So if we, like the
|
||||
Chromium project, come up with some super clever way of doing something, or we
|
||||
come up with some really ingenious optimization to our JavaScript engine or
|
||||
something like that, it's better for the web, better for everyone, and
|
||||
ultimately even better for Google as a business if those improvements are
|
||||
actually adopted by other people and taken by other people and used by them. So
|
||||
it is better for us if other people make use of anything clever that we do. And
|
||||
separately from that, there's this idea that's really prevalent in open-source
|
||||
communities of, if people can read the code, they're more likely to find bugs
|
||||
in it. And that's something that Chromium constantly benefits from, is folks
|
||||
who are outside the project, just kind of looking through our code base,
|
||||
reading and understanding it, spotting maybe security flaws that are in there.
|
||||
That kind of research is so much easier to do when the source code is just
|
||||
there, and you're not trying to reverse-engineer something you can't see the
|
||||
source to. So we get a lot of benefit from being open-source like that. And
|
||||
those are the reasons we had originally, and those still all hold totally true
|
||||
today, I think.
|
||||
|
||||
05:51 SHARON: That makes sense. Yeah, it seems, at first, a bit odd for a big
|
||||
company like Google to make something like this open source. But there are
|
||||
other massive open-source things at Google - Android, I think, being the other
|
||||
canonical example, which we don't know too much about, but we won't be getting
|
||||
too into that. But there are other big open-source projects around.
|
||||
|
||||
06:08 ELLY: Yeah, absolutely. And there's also, like - there's Go. That's an
|
||||
open-source programming environment, like a language and a compiler and a bunch
|
||||
of tools around it that is open source built by Google. There are plenty of
|
||||
other open-source and free software projects built by large corporations, often
|
||||
for really the same reasons. We benefit because the entire web benefits from
|
||||
better technology.
|
||||
|
||||
06:32 SHARON: Yeah, I think some of the Build stuff we do is open source. Check
|
||||
out the previous episode for that. And that's, yeah, exactly - not strictly
|
||||
only used by -
|
||||
|
||||
06:37 ELLY: Yeah, and by the way, partly because we're open source - like, for
|
||||
example, the Chromium base library, which is part of our C++ software
|
||||
environment - our base library is regularly used in other projects, even things
|
||||
that are totally unrelated to browsers, because it provides a high-quality
|
||||
implementation of a lot of basic things that you need to do. And so that code
|
||||
is being used in so many places we would never have anticipated and has done
|
||||
honestly more good in the world than it would do if it was just part of a
|
||||
really excellent browser.
|
||||
|
||||
07:13 SHARON: Something that someone on my first team told me was, if you've
|
||||
changed anything in base, that probably is going to get run any time the
|
||||
internet gets run, somewhere in that stack, which, if you think about it, is so
|
||||
crazy.
|
||||
|
||||
07:26 ELLY: Oh, yeah. Absolutely. Early in my career, I added a particular
error message to part of the Chrome network stack. And that network stack, too,
is one of those components that gets reused in a lot of places. And so
occasionally, I'll be running some completely other program. Like, I'll be
running a video game or something, and I'll see that error message that I added
being emitted from this program. And I'm like, oh, my code is living on in a
place I would never have really thought of.
|
||||
|
||||
07:51 SHARON: Oh, that's very cool.
|
||||
|
||||
07:51 ELLY: Yeah.
|
||||
|
||||
07:51 SHARON: Yeah.
|
||||
|
||||
07:51 ELLY: It's one of those unique open-source experiences in my book, of
|
||||
seeing your own work being used like that by other folks you wouldn't have
|
||||
anticipated.
|
||||
|
||||
07:57 SHARON: Yeah, that's very cool. So something I think I've heard you say
|
||||
before that I thought sounded very cool was the open-source dream. So can you
|
||||
tell us a bit more about what that is. What is that vision? It sounds very
|
||||
nice.
|
||||
|
||||
08:09 ELLY: Yeah, so I talked about this a little bit. And earlier, I cautioned
|
||||
against conflating open-source and free software. But it really is more of the
|
||||
free software dream than the open-source dream, in some sense. That dream is
|
||||
this idea that if we have software that is made available for free, under
|
||||
licenses that let people modify it and make derivative works and keep using it,
|
||||
that over time, everyone will get access to really high-quality and
|
||||
freely-available software. And we will have a situation where the software that
|
||||
people need is built by their communities, built by the people who are in those
|
||||
communities, instead of being something that they have to buy from a company
|
||||
that makes it. It'll be something they can instead produce for themselves. And
|
||||
over time, I think that this has really played out in that way. If you look at
|
||||
the state of operating systems today, for example, there are these really
|
||||
high-quality, freely-available open-source free software operating systems that
|
||||
are readily available and anyone can use, and they really do meet the needs for
|
||||
a lot of folks. And then, in fact, it kind of circles back to where Linux is a
|
||||
high-quality, free software open-source operating system that Google can then
|
||||
turn around and make really good use of to build something like Chromium OS,
|
||||
which is another free software open-source project that uses Linux as one of
|
||||
its major components. And then we get to produce a product that the Chromium OS
|
||||
engineering team would have had to spend a lot of time if we weren't able to
|
||||
make use of that existing Linux kernel work. So you get into this cycle of
|
||||
giving back and sharing and benefiting from the effects of other people
|
||||
sharing. That's the free software dream to me.
|
||||
|
||||
09:57 SHARON: It does - yeah, that sounds great. And for sure - I try to use
|
||||
open-source options when I can. When I edit these videos, I use something
|
||||
open-source. It feels appropriate for what we're doing here. So, yeah, that
|
||||
sounds like it would be - it's a good system that everyone contributes to and
|
||||
everyone benefits from. And that's really nice.
|
||||
|
||||
10:10 ELLY: Yeah, absolutely.
|
||||
|
||||
10:16 SHARON: So going away from that towards the less open-source part,
so what kind of things in Chrome, the browser, are not open source? You
mentioned a couple of things earlier. Can you tell us a bit more about some of
those things?
|
||||
|
||||
10:27 ELLY: Yeah, I'm going to caveat this by saying that I don't personally
|
||||
work on the stuff I'm about to talk about. And so my knowledge is more
|
||||
superficial. There's a couple things I'm pretty confident about. So one is, for
|
||||
example, there's a few video formats that Chrome can play that Chromium cannot
|
||||
play because Google has agreements with the companies that make those codecs
|
||||
that allow us to basically license and embed their thing and ship it as part of
|
||||
Chrome. But those agreements, we can't really extend them to everyone who might
|
||||
make a Chromium browser. And so it ends up in a situation where there is a
|
||||
closed-source component that's included in Chrome to make that possible. I'm
|
||||
struggling to think of another example right off the top of my head. I believe
|
||||
that there's also a couple things in Chrome that are integrating with Google
|
||||
APIs, where they're features that are Chrome-specific because they're
|
||||
Google-specific. And one of the things that is generally true between the two
|
||||
products is that Chrome will have more Google integrations and more Google
|
||||
magic and more Google smarts than Chromium will. And so I think some of those
|
||||
are actually closed-source components that come from Google that get embedded
|
||||
into Chrome. But because they're closed-source, we wouldn't want to put them
|
||||
into Chromium.
|
||||
|
||||
11:37 SHARON: Right. It seems like, yeah, I can sign into Chrome. I don't
|
||||
expect that I'd be able to sign in with my gmail.com into, say, Chromium. I'm
|
||||
not sure it's actually part of it, but that's a guess.
|
||||
|
||||
11:49 ELLY: Yeah, so that does work, except that you need to - any Chromium
|
||||
distributor needs to go and talk to - basically, talk to the sign-in team to
|
||||
get an API key that allows their browser to sign in. There is a process for
|
||||
doing that. It doesn't actually require any closed-source code components. But
|
||||
there is still a thing where you have to talk to the accounts team and
|
||||
basically be like, hey, we're a legitimate web browser, and we want to allow
|
||||
users to sign in. Because we don't want a situation where bots or malware are
|
||||
doing fake user sign-ins from - pretending to be Chromium. That's bad.
|
||||
|
||||
12:25 SHARON: Right. That makes sense. Yeah, and I think because of where
|
||||
Chrome and Chromium are positioned, I think there will be some interesting
|
||||
comparisons and differences between Chrome, Chromium, and other internal
|
||||
google3 projects. So that's kind of the term for things that are closed-source
|
||||
Google - the typical Maps, Search, all that stuff - and also comparing Chromium
|
||||
to other open-source projects. So we've talked a bit about the similarities and
|
||||
differences between Chrome and Google internal. Are there any other things you
|
||||
can think of that are either similar or different between Chrome the project
|
||||
and the people who work on it and how people do things internally at Google?
|
||||
|
||||
13:11 ELLY: Yeah. So internally at Google, there's this very powerful, very
|
||||
custom-built whole technology stack around the projects. There is a continuous
|
||||
integration system. There's an editor. There's a source control system. There's
|
||||
all of this stuff. Within Google, all of that is custom. And it's all fitted to
|
||||
Google's needs. And a lot of it is just built from scratch, frankly. Whereas
|
||||
for Chromium, we're using essentially off-the-shelf open-source stuff to meet a
|
||||
lot of those needs. So, for example, for version control, we're just using Git,
|
||||
which is I think the most popular version control system in the world right
|
||||
now. It's definitely open source. And our build system, for example, which is
|
||||
like GN and Ninja put together, those are both free software open-source
|
||||
projects. Admittedly, both of them were, I think, started as part of Chromium
|
||||
because we had those needs. But they, themselves, are free software components
|
||||
that anyone else can also use to build a Chromium. And the reason why that's
|
||||
done that way - like, why doesn't - it's actually a really good question. Why
|
||||
doesn't Chrome, which is a Google project, use all of this amazing
|
||||
infrastructure for engineering that Google has? And the answer is, we want the
|
||||
Chromium project to be possible to work on for people who don't work at Google.
|
||||
And so we can't say, oh, hey, whenever you're going to make a change, you have
|
||||
to commit it into Google's internal source control system. That wouldn't work
|
||||
at all. So we're almost - because we want to be an open-source project, and
|
||||
because we want to have contributors from outside of Google, we end up almost
|
||||
pushed into using this pretty open free software stack, which I - to be honest,
|
||||
from my perspective, has a lot of other benefits. When we have new folks
|
||||
joining the team, we can actually offer them tools they're already pretty
|
||||
familiar with. They don't have the feeling that new Googlers sometimes get,
|
||||
where they're totally disoriented. Like, everything they know about programming
|
||||
doesn't apply anymore. We can actually be like, hey, here's Git. You know how to
|
||||
use this. Here's Gerrit, which is another piece of open-source software that we
|
||||
use. They may not have used Gerrit before, but a lot of projects do. And so
|
||||
they might have run into it previously. So it has pluses and minuses,
|
||||
definitely. So that's a big difference. There's also a bit of what I would say
|
||||
is a cultural difference more than anything else because most Google projects
|
||||
that are not open source - so I'm not talking about things like Android or Go
|
||||
or something like that - but projects that are really just not open source,
|
||||
like Search, their ecosystem of discussion and culture and stuff is very much
|
||||
inside Google. Whereas for Chromium, we constantly are getting ideas and
|
||||
suggestions and code changes and stuff from outside of Google. And so we also
|
||||
tend to have perspectives from outside of Google in our discussions more often
|
||||
as we work on Chromium. So part of that is at the level of, if we're going to
|
||||
make a change, we would have maybe input coming in on that change from Mozilla
|
||||
even. They're a group we collaborate with a ton on web standards. And so we
|
||||
would have their perspective in the discussion. Whereas if we were working
|
||||
entirely within Google, we might not have those external perspectives. So
|
||||
culture-wise, I feel like Chromium has more perspectives in the room sometimes
|
||||
when we're thinking about stuff.
|
||||
|
||||
16:26 SHARON: That makes sense because browsers exist across other companies
|
||||
too, and there's a lot of compatibility and standards and stuff. So just in
|
||||
that nature of things, you have to have a lot more of this collaboration. If
|
||||
you make a change, it'll affect all of the embedders maybe, and then you have
|
||||
to think about this. And, yeah, there's a lot more discussion - [INTERPOSING
|
||||
VOICES]
|
||||
|
||||
16:42 ELLY: Yeah, absolutely.
|
||||
|
||||
16:42 SHARON: If you're Search, you're like, OK, we're going to, I don't know,
|
||||
do our thing.
|
||||
|
||||
16:47 ELLY: Yeah, you have more - I don't know if "autonomy" is the right word.
|
||||
But, yeah. I want to caveat this by saying I'm not on Search. And so maybe it's
|
||||
totally different. But that's how it looks to me as a person who works on
|
||||
Chrome.
|
||||
|
||||
16:59 SHARON: Yeah. Yeah. And I think in terms of actual development and making
|
||||
code changes and stuff, I think probably the biggest difference is that because
|
||||
anyone can download the source repository and make changes and all that, the
|
||||
actual programming and changes you do, you do those on a computer. Maybe that's
|
||||
a machine you SSH into or a Cloudtop or whatever. But you have to actually
|
||||
download all of the code. Whereas with all of the google3 stuff, everything
|
||||
happens in a cloud somewhere. So everything is all connected, and you just do
|
||||
things through the browser pretty much.
|
||||
|
||||
17:29 ELLY: That's very true. Actually, there's another important facet that
just occurred to me, which is, because Chromium is open source - and in
particular, some open-source projects will use this model where they send out a
release every so often. So they'll be like, we're shipping a new major release
of our program, and here's the source that corresponds to that. So there are
companies that do that. But we actually do what's called developing in the
open. So our main Git repository that stores our source is public. Which means
that as soon as you put in a commit, or even if you just put it up for code
review, that's public. Everyone on the internet can see what we're doing live,
which is really pretty interesting in terms of its effects on you. So for
example, if you're in - you're working inside google3, and you're like, I have
this really cool, wild idea, I'm going to go and make an experimental branch
and just make a prototype of it and see what happens, you can just go do that.
It's not a problem. But if you're working in Chromium, and you go and make your
wild prototype experimental branch, you have a pretty good chance that
someone's going to notice that. And then maybe you get a news story that's
like, hey, Chromium might be adding this amazing feature. And you're like, oh,
no, that was my wild, experimental idea. I didn't intend for this to happen.
But now people have really picked up on it, and people outside of the company
that you've never met are starting to get excited about something that you
never really intended to build and just wanted to try. So it's a different way
of working. You're sort of always in the public eye a little bit. And you want
to be a little bit more considerate about how something might look to people
way outside of your team and outside of your context. Whereas teams that are
inside google3 I don't think have to think about that as much.

19:07 SHARON: Yeah, I mean, for me, I've only really worked in Chromium full
time and all that. And I've just gotten used to the fact that all of my code
changes are fully public and anyone can look at them. Whereas I think people
who work in anything that's not like that - people in the company you work at
can see it. But not just anyone out there. So I don't know. I've gotten used to
it, but I think it's not a typical thing to [INAUDIBLE].

19:30 ELLY: Oh, yeah. Absolutely. And in fact, this is something that folks who
are transferring into Chrome from other parts of Google sometimes have a little
difficulty with, is if you're used to writing a commit message where maybe the
only description in the commit message is go/doc about my project, for Chromium
that doesn't fly because only Googlers can actually follow those links. And so
the commit message to a non-Googler doesn't say anything. And so you actually
have to start thinking, how am I going to explain this whole thing I'm doing to
a non - to a person who doesn't have any of this Google-specific context about
what it is. You go through this little mental - you cross this little mental
bridge where you actually are forced to reframe your own work away from, what
are Google's business goals, and towards, how does this fit Chromium, the
open-source project, that other people also use? It's interesting and
occasionally a little frustrating, but interesting and usually really
beneficial.

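Editor's sketch, not part of the interview: the point above about commit
messages that consist only of an internal short link could be checked
mechanically. The Python below is a minimal illustration under stated
assumptions. The function names, the `go/` link pattern, and the word-count
threshold are all hypothetical, and this is not an actual Chromium presubmit
check.

```python
import re
from typing import List

# Hypothetical pattern for internal-only short links such as "go/my-doc".
# The lookbehind avoids matching inside ordinary URLs or words,
# e.g. "example.com/go/x" or "cargo/foo".
INTERNAL_LINK = re.compile(r"(?<![\w./])go/[\w-]+")


def find_internal_links(commit_message: str) -> List[str]:
    """Return the internal-only short links found in a commit message."""
    return INTERNAL_LINK.findall(commit_message)


def is_self_explanatory(commit_message: str, min_words: int = 8) -> bool:
    """Rough heuristic: does the message still say something without links?

    The threshold of 8 words is an arbitrary assumption for illustration.
    """
    without_links = INTERNAL_LINK.sub("", commit_message)
    return len(without_links.split()) >= min_words
```

A message like `go/my-project-doc` on its own would be flagged by both
helpers, while a message that explains the change in plain words passes
whether or not it also carries a link.
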
20:26 SHARON: Yeah, for sure. And I think from people I've talked to, it just
seems like another, briefly, difference between internal Google stuff and
Chromium is that internal Google just has a ton of tools you can use.

20:37 ELLY: Yes, absolutely.

20:37 SHARON: Which both means a lot of things that are maybe a bit challenging
in Chromium are probably easier, but also maybe finding the right tool is hard.
But -

20:42 ELLY: Oh, yeah. That is very much the case. I have only limited
experience working inside google3. But I definitely have experienced the
profusion of tools and also the fact that the tools are just honestly amazing.
And it makes total sense. Google has many, many engineers whose whole job is to
build great tools. And Chromium is just not that big of a project. We just
don't have that many folks that are working on it. The folks who do build
infrastructure work for Chromium do amazing work, but there's not hundreds of
them. And so it's not on the same level.

21:12 SHARON: Yeah. And what you said earlier makes me have - gives me - has -
makes me wonder - and this ties us into the next thing - of other open-source
projects, they just do a release, and they don't maybe do development in the
open. And having not actually worked on other open-source projects really, I
kind of assumed that this development in the open was the norm. So how common
do you think or you know that that practice is?

21:45 ELLY: Gosh, I would really be guessing, to be honest with you. But I
would say the development in the open is by far the norm these days. And when
you see projects that follow the big release model instead, the way that looks
is they'll be like, hey, version 15 is out, and here's the source for
version 15. You can look at it. But the development, as it happens, happens
internally. I would tend to associate that with being maybe big company
projects that have a lot of confidentiality concerns. So for example, if you're
building the software that goes with some cool, new hardware for your company,
you don't want to start checking that software into Git publicly because then
people are going to read it and be like, ooh, this has support for a
billion-megapixel camera. That must be coming in the new thing. And so I think
that the big release model might be, these days, more prevalent when people are
doing hardware integrations, where there's other components that are shipping
at a fixed time and you don't want your source to be open until that point. But
honestly, the developing in the open model is, I think, much more common these
days. Historically, back in the '70s and '80s, when you would buy an operating
system and it would come with source, that was just a thing that you got as
part of the package, then it was much more of the source is released with the
OS model. Whereas these days, because distributed development is so easy with
modern version control systems, it's just so common to just develop in the open
like we do.

23:11 SHARON: Oh, cool. I didn't know that. So compared to other open-source
projects, what are some similarities and differences that Chromium has to
others that you may be familiar with?

23:25 ELLY: Ooh. All the ones I'm familiar with are quite a bit smaller than
Chromium. And so it's going to be hard to talk about it because, frankly -

23:32 SHARON: That's probably the common difference, though, right? Probably
very few are as big as Chromium.

23:32 ELLY: Oh, yeah. So in particular, one of the hardest problems in open
source - in running an open-source project is managing how humans relate to
other humans. The code problems are often relatively easy. The problems of how
do we make decisions about the direction of a project that maybe has a hundred
contributors who speak 10 different languages across a dozen time zones, that's
a hard problem. And so I often talk about the idea between open source, open
development, and then open governance. And so open source is just, like, you
can see the source. Open development is you can see the development process. So
the Git repo is open. The bug tracker is open. The mailing lists, where we do a
lot of our discussion, are open. So we do open development. But then you have
this next step of open governance, where the big decisions about where the
project is going are made in the open. And for Chromium, some of those are made
in the open, especially when it's really about the web platform or that kind of
thing. But some of them are not. For example, if we're deciding that we're
going to do some cool new UI design, that design and the initial development of
it might not necessarily be - or sorry, the development would be done in the
open, but the designing of it might not. That might be a discussion between a
few UX designers who all work at Google in a Google internal place. And so
Chromium has a bit of open governance but not all the way. A lot of smaller
projects have super open governance. So they'll literally be like, hey, should
we rewrite this entire thing in Rust? And they'll make that decision by arguing
about it on a mailing list, where everyone can see. And that's totally, totally
fine. Because Chromium is so big, we can't make those kinds of decisions by
having every Chromium engineer have their opinion and just post. It would be
complete chaos. And because we're big and prominent, a lot of the work that we
do is very much in the public eye. And so even discussions that are maybe
relatively speculative - like that example I gave before, where you have an
idea and you're like, wouldn't it be neat if we did this? It's easy for that to
turn into people inferring what Google's intentions are with Title Case, like,
Big Important Thing, and turning that into a lot when you would not have
intended it to be that way. And so we do end up keeping our governance
relatively on the closed side compared to other open-source projects I've
worked on. Other than that, in terms of engineering practices and what we do to
get the code written, we uphold a super high standard of quality. And in
particular - which is not to say that most open-source projects don't, because
they totally do. But Chromium, in my opinion, is really, really thoughtful
about not just, hey, how should code review work, but really evolving stuff
like, how should we bring new developers into this project? What should that
feel like? Those are discussions that we have. And I often feel like those are
discussions that other open-source projects don't talk about as much. What else
is different for us? I'm not sure. I think that those are some of the big ones.
The differences in scale are such that it's almost hard to talk about. The
difference between an open-source project that maybe has 5 contributors and one
that has 500 is very, very large.

27:07 SHARON: With the open governance thing you mentioned, something that that
made me think of is maybe Blink Intents, where you submit a thing to a list and
then that gets discussed. So that's part of the Chromium project, I think,
right? That falls under that category.

27:20 ELLY: Yep. Yep.

27:20 SHARON: And so that's where, if you want to make a change to Blink, the
rendering engine, you do this process of posting it to a list, and then people
weigh in.

27:25 ELLY: Yeah, absolutely. So Blink really does do open governance in a way
that I, honestly, very much admire. Blink and the W3C and a lot of these groups
that are setting standards for the internet do do open governance. Because,
frankly, it's the only way for them to work. It would not be good or healthy
for the web if it was just like, we're going to do whatever - whatever we,
Google, have decided to do and good luck everyone else. That would be very bad.
So yeah, Blink definitely does do open governance. But when it gets to things
that are more part of the browser's behavior and features, we tend to have the
governance a little more closed.

28:08 SHARON: Right. And I think an example of Blink being more open governance
is the fact that BlinkOn is open to anyone to participate in. And that's the
channel that we're posting this on right now. It just happened to make sense
that I figured most of the audience who is watching Blink [INAUDIBLE] already
are interested in these, too. So that's why - [INTERPOSING VOICES]

28:27 ELLY: Yeah, absolutely.

28:27 SHARON: And for people who may not have - may have found these videos
that don't know about BlinkOn. That's what that is.

28:34 ELLY: Yeah. And just in that vein of open governance for Blink,
especially, there's also this idea of being a standard and then having things
be compatible with it. So the web platform is a collection of standards. And
other browsers have to implement those standards, too. And so for example, if
we make up a standard that is very difficult or impossible for, like, Firefox
to implement, that's not good. That's fragmenting the web platform. That's a
bad thing. Whereas the Chromium UI, like how the omnibox works in Chromium, for
example, isn't a standard. It doesn't matter whether Firefox or Edge or Opera
or whoever have the same omnibox behavior as us, right? And so there's much
less of a need to all agree. And instead, it's almost a little bit better to
have some variety there so that users can get a little bit more of a choice and
that collectively more things get tried in that vein. So there's places where
agreement and standardization are really important. And then there's places
where it's actually OK for each individual browser to go off on its own a bit
and be like, hey, we thought of this cool, new way to do bookmarks. And so we
have built this. And it doesn't matter whether the other browsers agree about
it because bookmarks are not a thing that interoperates between browsers.

29:44 SHARON: Yeah, that makes sense. So now let's talk about some of the
actual details of what it's like to work on Chromium and make changes, write
code, and new ideas. So I think you mentioned a few things, like bug tracking.
That's all public, in the open, apart from, of course, security-sensitive
things and other [INAUDIBLE] are hidden. What else is there? Code review - that
was Gerrit. You mentioned that. So you can see all the comments that everyone
leaves on everyone's changes.

30:16 ELLY: Oh, yeah. And for better or for worse, by the way. It's good to
bear in mind that if you're like - you're going to type like a slightly jerk
message to someone on a code review, that's going to be preserved for all time,
and everyone's going to be able to see it.

30:29 SHARON: Yeah. Yeah. Be nice to people. [CHUCKLES] Version control -
that's Git. Probably people will know about that. Something that might be worth
mentioning is that a lot of people who contribute to Chromium, and if you look
at things like Gerrit and Chromium Code Search - that's also public, of
course - looks a lot like Google internal code search, but obviously it's open
source. So a lot of people have @chromium.org emails.

31:00 ELLY: Yes.

31:00 SHARON: So why are there separate emails? Because you can use an
@google.com or a Gmail or any email. So why have this @chromium.org email thing?

31:05 ELLY: Yeah, so there's a few different reasons for that. So chromium.org
emails are available to members of the project, which is a little bit
nebulously defined, but it's definitely not just Googlers. And so there's a
couple reasons why people like having those. So for some folks, it's sort of a
signal that you are acting as a member of the open-source project rather than
acting with your Google hat on, if you like. And so for example, I help run the
community moderation team for Chromium. And so when I'm doing work for that
team, I'm very careful to use my chromium.org account because I want it to be
clear that I'm enforcing the Chromium community guidelines, which are something
that was agreed upon by a whole bunch of Chromium members, not just Googlers.
And so I'm not enforcing Google's code of conduct. I'm enforcing Chromium's
code of conduct in my role as a Chromium project person. So sometimes you
deliberately put on your Chromium hat so that you can make it clear that you
are acting on behalf of the project. Some folks - and I'm also one of these
folks, by the way - just happen to really be big fans and supporters of free
software and of open source. And so if I have the choice between wearing my
corporate identity and wearing my open-source project member identity, I might
just wear my open-source project member identity and decide to actually
contribute that way. And so a lot of the folks who've been on Chromium - or
have been on Chrome, I should say, for a while, that's part of their reasoning.
They joined because they were excited to work on something that was open. And
so they have this open-source identity, this Chromium identity, that they use
for that. There's a third factor, and this touches on one of the sometimes less
pleasant parts of working in open source, which is our commit log and our bug
tracker and all of that stuff are public. And what that means is that everyone
on the internet can go see them. And that is often great, but it's occasionally
not great. So for example, if you go and make an unpopular UI change, people on
the internet know that that was you. And that might not be something that
you're necessarily super ready to deal with. So for example, way, way, way, way
early in my career, I made a change to Chromium OS because I was working - I
was on the Chrome OS team as a brand-new Noogler. So this is I've been at Google
maybe five or six months. I made a change to Chrome OS. Somebody happened to
notice it and take issue with it. I don't even remember what the change was or
the issue. But they happened to notice it and take issue with it. They showed
up in our IRC channel, because we used IRC at the time, which was also public
because the whole project was very open like that, and really just started
yelling at me personally about it. And I'm like, this is not a cool experience.
This is something that if this was a Google coworker of mine, I would be
talking to HR about this. But it's actually just a random person on the
internet. And so there are some folks who use their Chromium username as a
little bit of a layer of insulation almost, where it's like, I want to work on
this project, but I don't - maybe my Google username has my full name in it. I
don't necessarily want every change I make to be done like that. And so if you
don't do that, you can end up in a situation where you make a change, and then
it's really attributed to you as though it was your personal idea and you did
this bad thing. And that's not a risk that everyone wants to take as part of
doing their work. And so sometimes people have a chromium.org account really
because they want an identity that's separate from their Google account - that
has a different name on it, that has different stuff like that. And so one of
the things that I'm always cautious to remind folks of on my team is, if you're
working with someone who has a chromium.org account, always use that
chromium.org account when you're speaking in public, always, always, always,
because you don't want to break that veil if someone is relying on it.

35:09 SHARON: Right. Yeah, that makes sense. And I think, in general, whenever
you are signing up for interacting in these public spaces, generally, I think
it's encouraged to use your chromium.org account. So for example, Slack, which
is the modern - current IRC often -

35:27 ELLY: It hurts my soul to hear you say that.

35:32 SHARON: Well - [LAUGHS]

35:32 ELLY: I'm a die-hard IRC user. I've been using IRC for 30 years. And I
was one of the few people who was I think very sad when we decided to move off
IRC. But you're right, that it is the modern IRC option.

35:44 SHARON: I think a lot of people are very die-hard about IRC. So, you
know, but modern or not, that's what's currently being used.

35:49 ELLY: Absolutely.

35:55 SHARON: So Slack is where anyone can join and discuss Chromium stuff. And
generally, that kind of thing, you're encouraged to use your chromium.org
account.

36:01 ELLY: Yeah, absolutely. And to be fair to Slack also, the Slack has
probably 30 times as many people in it as the IRC channel ever did. So I think
that it's pretty clear that Slack is more popular than IRC was. But, yeah, no,
we use our Chromium identities a lot, really, really on purpose. And to be
honest, I would like it if we used them even more. Sometimes you will see folks
who actually have both identities signed up. So they'll have their google.com
and their Chromium, and that's always confusing for everyone. So if it was up
to me, I would say everyone has a Chromium identity, and they'd just all use it
when they're contributing.

36:39 SHARON: Yeah, that's definitely one of these unique-to-Chromium
[INAUDIBLE] pain points of someone [INAUDIBLE] use their maybe - often, they're
the same for most people. But sometimes they're different. Sometimes they're
very subtly different, and it's -

36:53 ELLY: Absolutely.

36:53 SHARON: you end up sending your [INAUDIBLE]...

36:53 ELLY: I also - I have met a couple folks who the Google username they
really wanted wasn't available, but it was available for chromium.org. And so
they picked a shorter, cooler username for chromium.org, which is totally -
totally fine to do. But then, every time you have to remember, oh, I know them
by this longer Google username, but actually they use this shorter username for
Chromium.

37:13 SHARON: Yeah, you have to remember their real life name. You have to
remember their work email. And then now you have to remember another work
email.

37:19 ELLY: Well, we have software that can help with that a bit.

37:25 SHARON: Yeah, for sure. So as part of that, and that's, in a way - a
thing that to me feels very related is there's a thing called being a committer
in Chromium. So what does it mean to be a committer? And what does it entail?

37:37 ELLY: Yeah, so committers are basically people who are trusted to commit
CLs, for want of a better way of putting it. So the way the project is
structured, anyone can upload a CL. And anyone anywhere on the internet can
upload a CL. It has to be reviewed by the OWNERS of the directories that it
touches or whatever. But there are some files that are actually, like, OWNERS
equals star. So for example, the build file in Chrome browser, because
everybody needs to edit it all the time, it just has OWNERS equal star. And
there's a comment that's like, hey, if you're making a huge change, ask one of
these people. But otherwise, you're just freely allowed to edit it. And so if
the committer system didn't exist, anyone on the internet would be allowed to
edit a bunch of parts of the project without any review, which is pretty bad.
And so there's this extra little speed bump where it's like, you have to send
in a few CLs to show that you're really a legit person who's contributing to
the project. And once you've done that, you get this committer status, which
actually allows you to push the button that makes Gerrit commit your change
into the tree. And that's what it does mechanically. We culturally tend to have
it mean something a little different than that, but it's - culturally, it's
like a sign of trust of the other project members in you. So getting that
committer status really means, we collectively trust you to not totally screw
things up. That's what it is. And so you have to be a committer to actually be
in an OWNERS file, for example. You can't be listed as an owner until you're a
committer. Because if you're not a committer yet, we're not really - if we're
not trusting you to commit code, we're not really going to trust you to review
other people's code. And, yeah, when you're new joining the project, it's
actually a pretty big milestone to become a committer. You become a committer
after you've been working for anywhere from three to six months, I would say.
And it's definitely this moment of being like, yeah, I've really arrived. I'm
no longer new on the project. I'm now a full committer.

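Editor's sketch, not part of the interview: the interaction described above
between OWNERS review and committer status can be modeled in a few lines. The
Python below is a deliberately simplified illustration. The `Change` class,
the function names, and the example addresses are all hypothetical, and the
real Gerrit and OWNERS rules are more detailed than this.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Change:
    """A hypothetical CL: who wrote it and who has approved it so far."""
    author: str
    approvers: Set[str] = field(default_factory=set)


def has_owner_approval(change: Change, owners: Set[str]) -> bool:
    """True if the review requirement for the touched files is satisfied."""
    # An owners set containing "*" models the "OWNERS equals star" case:
    # no specific owner has to sign off.
    if "*" in owners:
        return True
    return bool(change.approvers & owners)


def can_land(change: Change, owners: Set[str],
             committers: Set[str], submitter: str) -> bool:
    """Model the extra speed bump: only a committer may press submit."""
    return has_owner_approval(change, owners) and submitter in committers
```

With `owners = {"*"}`, a brand-new contributor's change passes the review
check but still cannot land until someone in `committers` submits it, which
is the speed bump the committer system adds.
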
39:51 SHARON: Can you briefly tell us what the steps, mechanically, to becoming
a committer are?

39:51 ELLY: Yeah, so you need to have landed enough CLs to convince people you
know what you're doing. And there is no hard and fast limit, but it's like - it
should be convincing. And so I often hear maybe 15 to 20 nontrivial CLs is a
pretty good number. Having done that, you need someone to propose you or
nominate you for committership. So there's actually - there's a mailing list
for having these discussions. And so whoever's going to nominate you, who has
to already be a committer, they'll send mail to that list, basically being
like, I would like to nominate this person for committer. There's a comment
period during which people can reply. And then if there's nobody who is raising
a big objection to you being a committer, after - I don't know what the actual
time period is - but after some amount of time, the motion carries with no
objections, and then your Chromium account becomes a committer. I think Google
accounts can also be committers as well, but I've only ever done this process
for Chromium accounts. And so those threads - what's going on in those threads
is mostly people endorsing the request. So let's say that I have someone who's
new on my team who I want to propose as a committer. I'll start the thread
nominating them as a committer, and then I'll go and talk to maybe two or three
of the people who have reviewed a lot of their changes, and I'll be like, hey,
would you endorse this person for a committer? If so, please post in this
thread. And so in the thread, there will actually be a couple of replies that
are like, plus 1, or, yes, this seems like a good fit. Very rarely, there might
be a reply, which is like, hey, I saw some - I saw some stuff on this CL that
shows that maybe this person isn't quite ready. We had a whole bunch of back
and forth comments, and eventually it really didn't seem like they understood
what I was asking for. And I feel like they're not really ready yet. Sometimes
that will happen. But usually the threads - by the time someone's nominating
you, you're already in good shape. So that's the mechanical process. And then
there is - it might actually just be Eric, individually, who goes through and
flips the bits on people being committers based on the threads. I'm not sure.
But there's some process by which those threads turn into people being
committers.

42:14 SHARON: OK, cool. Is there an analog of this either internally at Google
or in other open-source projects? Because internally at Google, there's the
concept of readability, which means you are vouched for that you know how to
code in this one language, which has some similarities. That's maybe a similar
thing. Are there any similar notions in other projects you've seen?

42:38 ELLY: Yeah, so many projects have this notion of being a member. And that
often combines our notions of committer and sometimes code owner. And so they
might - or for some open-source projects, you'll actually hear "maintainer" as
the thing. And so they'll be like, only people who are project members can
upload changes in the first place. And only people who are maintainers can
merge those changes. So that little speedbump on entry is pretty common.
Because it's a fact of life that if you are on the public internet and you have
no barriers to entry, you're going to have spam in your community no matter
what you do. And so that kind of split is super, super common. For some
projects that don't do open development, the entire thing might happen inside a
company or inside an organization anyway. And then there is no notion of
committer status because you're just hired onto that team and then you can
commit. But for projects that do open development and free software projects,
there is often a sense of, these are the people who are roughly trusted to land
code. And for a lot of projects, especially bigger ones, there's actually a
two-tiered model, where maybe you have people who are domain experts on a
specific thing, like, they maintain some subsystem. And they're trusted to make
whatever changes they need or approve other people's changes in that area. But
then at the wider scale, there's what's often called a steering committee or a
core group or something. And those groups have authority over the whole project
and the direction of everything that's going on. And so you'll often see that
kind of model in larger projects. At smaller scales, it's often literally a
list of one to five people who all have commit access to the same Git repo, and
there's no - no structure on top of that. But for bigger projects, governance
becomes a real concern. And so people start thinking about that.

44:35 SHARON: All right. Now, let's switch topics to talking about the more
day-to-day logistics of working on Chromium. So if you're not a Googler, don't
work at Google, to what extent can you effectively contribute to Chromium, the
project?

44:48 ELLY: Yeah, so that depends where you're coming from, both whether you're
part of another large organization, like maybe you work at Microsoft, you work
at Opera, Vivaldi, one of those companies, or if you're really an IC lone
contributor. If you're in a large organization, probably your org will have its
own structure around how you should contribute anyway. And so you might just
want to talk about that. So I'll really focus on the individual contributor
angle. And so for engineers specifically, like if you're a programmer who wants
to contribute to the code base, that's awesome. The best approach I think is
really to find an area that you're passionate about because it's so much more
fun and enjoyable to contribute when you're doing something you care about. So
find an area you care about. Get in touch with the team that works on that
area, either through their mailing lists or find their component in Monorail or
find them in the OWNERS files or whatever. Get in touch with those folks. Ask
them what are good places for you to contribute as a new person. That's often a
really great way to get started. And you'll have a person you can go to for
advice to be like, hey, how do I go about doing this thing? My experience has
been that Chromium contributors are pretty much all super helpful. And so
they're very willing to just give you guidance or do whatever. And you'll then
know who to send your code reviews to.

46:01 SHARON: Cool. Yeah. And if you're not an engineer, what are some ways you
can also contribute?

46:06 ELLY: Yeah, so there's a whole bunch of these. And by the way, these all
apply to basically every open-source project, so not just Chromium
specifically. So open-source projects, if you are a good writer, if you enjoy
doing technical writing or you enjoy doing UX writing or you want to do that
kind of thing, almost every open-source project out there is looking for people
to contribute documentation. And Chromium is no exception at all to that. So
high-quality documentation, we love that stuff. Or even if you're just honing
that craft and you want to practice, Chromium is not a bad spot to do that. If
you're a UX designer or a visual designer, a lot of open-source projects will
actually appreciate your contributions of you bringing in, like, hey, I thought
of a way that this user experience could feel or how the screen could look or
something like that. They'll often appreciate that kind of input or design
work. If you are someone who speaks multiple languages, translations are
another great way to contribute to open-source projects. A lot of open-source
projects don't have access to the same kind of - Chromium has access to a
translation team within Google who do a lot of our translations. A lot of
open-source projects don't have that. And so contributing translations of
documentation, of user-facing interface, stuff like that, can be super
valuable. And the last thing I'll say, which can be done by really anyone - you
don't even need special skills for this one - is try early releases of stuff.
So try development branches. If you're a Chrome user, try running Beta or Dev
or Canary. And then when something doesn't feel right or when it's - when it
doesn't work for you or it crashes or whatever, file bugs. And try to get
practiced at filing good bugs, with details and info and steps to reproduce the
bug and stuff like that. That's such a huge help as a developer of any
open-source project - to get that early-user feedback and be able to correct
problems before they make it to the stable channel. And on Chromium, I've run
into a few folks who just - their main contribution to the project is really
just that they file great bugs all the time. There's a few folks who all they
really do is they run Canary on Mac, and they notice when something doesn't
feel quite right. And so they file stuff that's like, maybe the engineering
team wouldn't necessarily have noticed it. But when someone calls it out, we're
like, oh, that actually does feel kind of janky, and now we can go fix that.
And getting that feedback early is so, so valuable. So there's a lot of
different ways. Those are some, but there's plenty more, too.
|
||||
|
||||
48:21 SHARON: OK. Cool. Yeah, and a few things on that. If you want to really
try out random things, you can go to chrome://flags, play around there, see
what happens. In terms of going back a bit for being an engineer, there's other
web-adjacent stuff that you can do that we won't get into too much now. But
that can be things like adding web platform tests, web standards stuff. And for
people who are into security, we have a VRP, Vulnerability Rewards Program. But
if you know about that, probably you're into the whole security space. This is
not how you're going to - maybe this is how you heard about it, and you want to
get into it. But, anyway -

48:59 ELLY: Yeah. I will say, if you're a security researcher and you aren't
familiar with the Chromium VRP, you should go take a look because it's -
Chromium is a really interesting project to audit for security. And the VRP can
make it very worth your while to do so if you find good bugs.
|
||||
|
||||
49:12 SHARON: Mm-hmm. Yeah. And going back a bit earlier to being an engineer,
like an IC, who is not at Google or any of these other big companies, there are
other barriers to entry to being a contributor, right?

49:28 ELLY: Oh, yeah.

49:28 SHARON: So I definitely encountered this after my internship. I worked on
Chrome. I was like, hey, I know what's going on now at the end of it. A couple
things we didn't finish. I'll go home, and I will keep working on this - good
intentions. And I got home, got my laptop, which was a pretty good laptop, but
still a laptop. I downloaded Chrome. That took a very long time. I built it for
the first time, which always takes a bit longer. But that took so long. And
even the incremental builds just took so long that I was like, OK, this is not
happening. I'm in school right now. I've got other things to worry about. So
how feasible is it for a typical person, let's say, to actually make changes in
Chromium?
|
||||
|
||||
50:05 ELLY: Yeah, that is unfortunately probably the biggest barrier to entry
for individuals who want to make technical contributions. Obviously, it doesn't
affect you if you're contributing documentation, translations, whatever. But if
you're trying to modify the code, yeah, the initial build is going to be very
slow, and then the incremental builds are going to be very slow. And a lot of
the ancillary tasks are slow too, like running the test suite or running stuff
in a debugger. The project is just very big. And that's something that I think
a lot of folks on the Chromium team wish we could reduce. But Chromium is big
because the web is big and because what people want it to do is big. And so
it's not just big for no reason. But it does make it harder to get started as a
contributor. I've had this experience, too. I have a modern laptop sitting on
the desk over there. And it takes seven to eight hours to do a clean Chromium
build on that. Whereas on my work workstation, which has access to Goma,
Google's compile farm, it takes a few minutes. And the large organizations that
contribute also all have compile farms for the same reason. It's just so slow
to work when you're only doing local building and don't have access to a ton of
compilation power.

51:12 SHARON: Mm-hmm. Yeah. I wonder if we could, I don't know, do a thing for
people who are individuals who contribute more. Probably that would be really
hard to do. Probably people have thought about it. But, yeah.

51:24 ELLY: It would be nice if we could. I don't know what the challenges
would be offhand, but it would be very cool if we could somehow make that
available.
|
||||
|
||||
51:30 SHARON: All right. That all sounds very cool. I know I learned a lot.
Hopefully some of you learned a lot, too. I think if you are working within
Google, it's really easy to not really interact with any of this more
open-source stuff, depending on which part you work on. Maybe you work on a
part that's very Google Chrome specific. I know before I was working on
Fuchsia, so that was before launch. So that was not really something we were
open to the public about anyway. And a lot of even the typical Chrome tools I
was unfamiliar with. So I think depending on which part you work on, this
stuff - it's all there, but you might not have had a chance to interact with
it. So thank you, Elly, for telling us about it and giving us some context
about free and open-source software in general.

52:08 ELLY: Yeah, of course.

52:08 SHARON: Is there anything you would like to give a shout out? Normally,
we shout out a specific Slack channel. I think in this case, the Slack in
general is the shout out. Anything else?
|
||||
|
||||
52:20 ELLY: The Slack, in general, definitely deserves it. Honestly, I'm going
to go a little bit larger scale here. I'm going to shout out all of the folks
who have contributed to Chromium, both at Google and elsewhere. It is the work
of many hands. And it would not be what it is without the contributions from
the folks at Google, the folks at Microsoft, folks at Yandex, folks at Naver.
All of these different browsers and projects and all of the different
individuals that have contributed, like everyone in the AUTHORS file - so shout
out to all of those folks. And also, I really want to shout out the open-source
projects not even part of Chromium that we use and rely on every day. So for
example, we use LLVM, which is a separate open-source project for our
compilation toolchain. And I think I would not be exaggerating to say that
Chromium couldn't exist in its current form without the efforts of a bunch of
other open-source projects that we're making use of. And so I'm really hopeful
and optimistic that Chromium can live up to that. We're standing on the
shoulders of a lot of other open-source projects to build the thing that we've
built. And I'm hopeful that, in turn, other projects are going to stand on our
shoulders to build yet cooler stuff and yet - yet better programs and build a
yet better open-source community. So shout out to all of the authors of all the
open-source software that Chromium uses, which is a lot of people. But they
deserve it.
|
||||
|
||||
53:37 SHARON: Yeah, for sure. It's very cool how it's very - all very related.
And even within Chrome, I think people stick around longer than typical other
projects. And it's cool to see people around, like a decent number of them,
from before Chrome launched. And that's probably [INAUDIBLE] to a generally
more positive engineering culture. So that's very good.

53:58 ELLY: I think so. But I'm biased, of course.

53:58 SHARON: Yeah, maybe. [LAUGHS] Cool. You mentioned mailing lists a bunch.
Any favorites that you have?

54:08 ELLY: Oh, yeah. chromium-dev is the mailing list of my heart, I would
say. It's the main open-source development mailing list for us. It's a great
place for all of your newbie questions. If you're just like, how the heck do I
even check out the source, that's a good place to ask. The topic-specific
mailing lists, especially net-dev and security-dev, are really good if you have
questions in those specific areas. But honestly, all of the mailing lists on
chromium.org are good. I haven't yet encountered one where I'm like, that
mailing list is bad. So check them all out.

54:33 SHARON: Cool. All right. Check out every single mailing list. Sounds
good.
|
||||
|
||||
54:38 ELLY: Yeah, every mailing list, every Slack channel.

54:38 SHARON: All right. Great.

54:38 ELLY: You're all good.

54:38 SHARON: Every Slack channel, I think - yeah, I'll add myself to the rest
of them. All right. Well, thank you very much, Elly.

54:45 ELLY: Of course.

54:45 SHARON: Thank you for chatting with us. And see you all next time.

54:51 ELLY: All right. Thank you, Sharon. Easter egg - in the second part of
this video, Elly is drinking soda.
|
1057
docs/transcripts/wuwt-e07-mojo.md
Normal file
File diff suppressed because it is too large
1091
docs/transcripts/wuwt-e08-processes.md
Normal file
File diff suppressed because it is too large
691
docs/transcripts/wuwt-e09-site-isolation.md
Normal file
@ -0,0 +1,691 @@
|
||||
# What’s Up With Site Isolation

This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 9, a 2023 video discussion between [Sharon (yangsharon@chromium.org)
and Charlie (creis@chromium.org)](https://www.youtube.com/watch?v=zOr64ee7FV4).

The transcript was automatically generated by speech-to-text software. It may
contain minor errors.

---

Site Isolation is a major part of Chrome's security. What exactly is it? How
does it fit into navigation? What about security? Today’s special guest telling
us all about it is Charlie, who made it happen. He's also worked all over
navigation, making sure it works with all its complexities and remains secure.

Notes:
- https://docs.google.com/document/d/19LTLcwd2_JfiIklPXY0yu0ktpy-p8za2ZZXXzqBBVIY/edit

Links:
- [What's Up With Processes](https://www.youtube.com/watch?v=Qfy6T6KIWkI)
- [Life of a Navigation](https://www.youtube.com/watch?v=OFIvyc1y1ws)

---
|
||||
|
||||
0:00 SHARON: Hello, and welcome to "What's Up With That?", the series that
demystifies all things Chrome. I'm your host, Sharon, and today we're talking
about site isolation. What exactly is it? How does it fit into navigation? What
about security? Today's special guest telling us all about it is Charlie. He
helped make site isolation happen. He's worked on Chrome since before the
launch, though as an intern, and since then, he has worked all over navigation
including things like the process model, site isolation, and just making sure
that changes to that are all secure and that things still work. So welcome,
Charlie.

0:30 CHARLIE: Thank you for having me.

0:30 SHARON: OK, let's start off with what is site isolation?

0:36 CHARLIE: So site isolation is a way to use Chrome's sandbox to try to
protect websites from each other. So it's a way to improve the browser security
model.

0:43 SHARON: OK, we like security. And can you tell us a bit about what a
sandbox is?

0:50 CHARLIE: Yeah. So the sandbox is a mechanism that tries to keep web pages
contained within the renderer process even if something goes wrong. So if they
find a bug to exploit, it should still be hard for them to get out and install
malware on your computer or do things outside the renderer process.
|
||||
|
||||
1:05 SHARON: OK. Last video, we talked all about the different types of
processes and what they all do. So why are we particularly concerned about
renderer processes in this case?

1:17 CHARLIE: Sure. So renderer processes really have the most attack
surface. So the browser's job is to go out and get web pages from websites you
don't necessarily trust, pull down code, and run that on your machine. And most
of that code is running within this sandboxed renderer process. So an attacker
may be able to run code in there and try and find bugs to exploit. The renderer
process is where most of those bugs are going to be. It's where the attacker
has the most options and direct control. So we want that to be locked down as
much as possible.

1:55 SHARON: OK. Right. So how exactly does this work? How am I getting
attacked?

2:02 CHARLIE: Right. So all software tends to have bugs, and an attacker will
try to find ways to exercise those bugs in the code to let them accomplish
their goals. So maybe they find that there's some parsing error, and so the
code in the web browser does the wrong thing when you give it some input. And
for an attacker on the web, that input could be something in HTML or JavaScript
that makes the browser do something wrong, and maybe they can use that to their
advantage.
|
||||
|
||||
2:36 SHARON: So say I do get attacked. What's the worst that can happen? Should
I really be concerned about this?

2:42 CHARLIE: Well, that's exactly what we think about in the browser security
model is, what's the worst that can happen? How can we make that not be as bad
as it could be? So in the old days when browsers were first introduced, it was
basically just a program, it's all one process. And it would fetch content from
the web, and so if something went wrong, there was no sandbox. There was no
other protection. You were just relying on there not being bugs in the browser.
But if something did go wrong, that web page could then install malware on your
computer and your whole machine would be compromised. And so that might give
them access to files on your disk or other things that you have access to on
the network like your bank account or so on, which, obviously, is a big deal.

3:28 SHARON: Right. Yeah, I would like to not have other people have that. OK,
cool. So can you tell us a bit about how site isolation actually works? What is
the mechanism behind it? What is going on?

3:41 CHARLIE: Sure. So when Chrome launched, we were using the sandbox to try
and prevent that first type of attack of installing malware on your machine or
having access to the file system or to the network, but we wanted it to do more
to protect websites from each other. And to do that, you have to treat each
renderer process like it can only load pages from one website. And if you go to
visit a different website, that should be in a different process. And so
there's a bunch of aspects of site isolation for, well, OK, as you go from one
website to another, we need to use a different process, but the big one that
made this such a large change to the browser was making cross-site iframes run
in a different process.
|
||||
|
||||
4:30 SHARON: What is an iframe?

4:30 CHARLIE: So an iframe is basically a web page embedded inside of another
web page. So you can think about this as an ad or a YouTube video. It might be
from a different origin from the top-level page that you're viewing, but it's
another web page embedded inside it. And so that has a different security
context that it's running on.

4:54 SHARON: You mentioned it might be from a different origin, and it might be
useful to know what the difference between a site and an origin is, especially
as it relates to what we call site isolation.

5:00 CHARLIE: Yeah, so we're being specific in using the word site isolation
instead of origin isolation. A site is a little broader, so it's a registered
domain name plus a scheme, so https://example.com would be an example of a
site, but you might have many origins within that as you get into subdomains.
So if you had foo.example.com and bar.example.com, those would be different
origins within the example.com site. The web security model is all about
origins. Those foo.example.com and bar.example.com shouldn't be able to access
each other, but there are some old web APIs that have stuck with us like being
able to modify something called document.domain, where two different origins in
the same site can sometimes access and modify each other, and we don't know in
advance if they're going to do this. So therefore, we have to put everything
from a site in the same process because we can't move things from one process
to another later. We hope that someday we can get rid of that. There is some
work in progress for that to go away. Maybe we can do origins.
|
||||
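The site versus origin distinction described above can be sketched in a few lines of Python. This is only an illustration, not Chromium's implementation: the helper names `origin_of` and `site_of` are hypothetical, and the registered domain is approximated as the last two host labels, whereas a real browser consults the Public Suffix List.

```python
from urllib.parse import urlsplit

# Default ports, so that an explicit ":443" and an omitted port compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> str:
    """Return scheme, full host, and port: the unit the web security model uses."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return f"{parts.scheme}://{parts.hostname}:{port}"


def site_of(url: str) -> str:
    """Return scheme plus registered domain: the unit site isolation uses.

    ASSUMPTION: the registered domain is taken to be the last two host
    labels. That is wrong for suffixes such as "co.uk"; a real browser
    looks the suffix up in the Public Suffix List instead.
    """
    parts = urlsplit(url)
    registered_domain = ".".join(parts.hostname.split(".")[-2:])
    return f"{parts.scheme}://{registered_domain}"


foo = "https://foo.example.com/inbox"
bar = "https://bar.example.com/docs"

print(origin_of(foo))  # https://foo.example.com:443
print(origin_of(bar))  # https://bar.example.com:443
print(site_of(foo))    # https://example.com
print(site_of(bar))    # https://example.com
```

The two URLs are different origins but the same site, which is why, under the constraint Charlie describes, they would have to be allowed to share a renderer process.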
|
||||
6:10 SHARON: Cool. So the site isolation stuff is all in the browser, so that's
the browser security model. What's the difference between that and the web
security model? Are these the same?

6:16 CHARLIE: They're certainly related to each other, but they're a little
different. So the web security model is conceptually what can web pages do, in
general, what are they allowed to access for another website or for another
origin or for things on your machine, camera, and microphone, and things like
that. And the browser security model is more about how we build that and how do
we enforce the web security model, but also, provide some extra lines of
defense in case things go wrong. So that incorporates things like the sandbox,
the multi-process architecture, site isolation. What can we do to make it
harder for attackers to accomplish their goals, even if there are bugs.

7:04 SHARON: It seems like good stuff to have. So a couple other, maybe
definitions to get through. So what is a security context?

7:10 CHARLIE: Yeah. So that's the environment where this code is running. In
the web, it's something like an HTML document or a worker, like a service
worker, someplace where code is running from what we would call a security
principal, which is, for the web, something like an origin. So if you have an
HTML document you've gotten from example.com, that's running in a web page in
the browser that has a security context. And an ad from a different origin
would be a different security context.

7:49 SHARON: Are a security context and a security principal always the same,
or are there times where those are different?

7:55 CHARLIE: No, you can have two different security contexts, like two
different documents that have the same security principal, and they might be
able to access each other. Or they might be living in different processes, but
still have access to the same cookies or local storage, things on disk. So the
principal is, this is the entity that has access to something.
|
||||
|
||||
8:16 SHARON: When people think of site isolation, often, they think about
navigation as well, partly because that's how our teams are structured, so how
exactly do these relate, and where in the life of a navigation - name of a
talk, want to go watch - does site isolation stuff happen?

8:34 CHARLIE: Yeah, so they're definitely related. So navigation is about how
you get from one web page to another, and that might be a different security
context, different security principal. And I got interested and involved with
navigation because of site isolation, my interest in that. And as you think of
the web browser as an operating system for running programs, it's how you're
getting from one program to another. So it would make sense that as you go from
one website to another, you get a new container for that, a new process. So
that was one part of how I got involved with navigation was building what we
call a cross-process navigation. So you have to start in one renderer process
and then be able to end up in a different renderer process with all the various
parts of the life of a navigation, where you go out to the network and ask for
the web page. And maybe you have to run some beforeunload events first to
see if you were actually allowed to leave, or maybe the user has some unsaved
data. All the timing of that is tricky, and then switch to the new process at
the right time. So navigation has a lot of different corner cases and
complexity that then get involved with the process model so that you can do
this in any type of navigation, in any frame. And so that's where our team ends
up involved in both site isolation work and the navigation code in the
browser.
|
||||
|
||||
10:06 SHARON: Right. What a cool team. So you mentioned the process model, and
that is related, but not the same as the multi-process architecture. So let's
just quickly mention what the differences are, because in this case, it
is important.

10:22 CHARLIE: Yes. So the process model for the browser is how we decide what
goes into each process, and specifically, we're talking about renderer
processes and web pages here, where we can decide, as we create new tabs and we
visit websites on those tabs, which renderer processes are we going to use. So
without site isolation, maybe it's that each newly created tab gets its own
process. But anything you visit within a given tab stays in the same process.
Or maybe you can do some cross-process transitions within that tab as long as
you're not breaking scripting between existing pages. So site isolation defines
a process model that says you can never put web pages from two different
websites in the same renderer process, and then that provides a bunch of
constraints for how navigation works.
|
||||
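The two process models contrasted above can be sketched as a toy simulation. Everything here is hypothetical and much simpler than Chromium: the class names `ProcessPerTab` and `SitePerProcess` are made up, a process is just an integer, the registered domain is assumed to be the last two host labels, and the site isolation model keys a process on a (tab, site) pair as a stand-in for the real bookkeeping.

```python
from itertools import count
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Scheme plus registered domain (ASSUMPTION: last two host labels)."""
    parts = urlsplit(url)
    registered_domain = ".".join(parts.hostname.split(".")[-2:])
    return f"{parts.scheme}://{registered_domain}"


class ProcessPerTab:
    """Toy model: a tab keeps one renderer process for everything it visits."""

    def __init__(self):
        self._ids = count(1)
        self._process_by_tab = {}  # tab id -> process id

    def process_for(self, tab, url):
        if tab not in self._process_by_tab:
            self._process_by_tab[tab] = next(self._ids)
        return self._process_by_tab[tab]


class SitePerProcess:
    """Toy model: a renderer process only ever holds pages from one site."""

    def __init__(self):
        self._ids = count(1)
        self._process_by_key = {}  # (tab id, site) -> process id

    def process_for(self, tab, url):
        key = (tab, site_of(url))
        if key not in self._process_by_key:
            self._process_by_key[key] = next(self._ids)
        return self._process_by_key[key]


def sites_per_process(model, visits):
    """Replay the visits and record which sites ended up in each process."""
    result = {}
    for tab, url in visits:
        result.setdefault(model.process_for(tab, url), set()).add(site_of(url))
    return result


visits = [
    (1, "https://mail.example.com/inbox"),
    (1, "https://docs.example.com/report"),  # same site, different origin
    (1, "https://attacker.test/"),           # cross-site navigation in the same tab
]

for model in (ProcessPerTab(), SitePerProcess()):
    print(type(model).__name__, sites_per_process(model, visits))
```

In the per-tab model a single process ends up holding both sites, while in the site-per-process model every process holds exactly one, which is the invariant the rest of navigation then has to respect.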
|
||||
11:16 SHARON: And then the multi-process architecture is more just the fact
that we have these different processes.

11:22 CHARLIE: Right. It makes this possible, because it gives us this ability
to run browser code and renderer code separately and plug-in code and other
utilities and network service that - yeah.

11:27 SHARON: Yeah, because back in the day, that wasn't the case. That's what
made Chrome different.

11:34 CHARLIE: Right. So when Chrome launched, we were moving from this more
monolithic browser architecture that was common at the time, where everything
ran in one process, to a separate browser process and a renderer process that
was sandboxed, and we could play around with different process models. So when
Chrome launched, part of the internship that I was doing was looking at what
should go in each renderer process? What process model should we use? And we
thought site isolation would be great, but you can't really do that yet. It's
too complicated to get the iframe things to work. So maybe we can do a hybrid
where sometimes we swap to a new renderer process as you go from one website to
another at the top level, but then other times, you'll end up with multiple
sites in the same process. And it was like that until we were able to ship site
isolation much later.
|
||||
|
||||
12:23 SHARON: Cool. So this sounds, conceptually, like it makes sense. You want
to have different sites/different origins in different renderer processes, and
it sounds like it shouldn't be that hard, but it is/was/still is very hard. So
can you briefly just tell us about how and why navigation is hard? Because
other people who don't work on browsers at all or tech or even people in
Chrome, I feel like, they're just like, isn't navigation just done? This just
works, right? So why is there still a team doing this, and what is so hard
about it?

12:59 CHARLIE: That was often the most common question we would get when we
were explaining what work we were doing on site isolation was, oh, doesn't it
already work that way? And it's like, yeah, I wish. Yeah, so there's two parts
of that. There is, why is navigation hard, and why is site isolation hard? So
tying into any kind of navigation thing is tricky because of how many different
types of navigation and corner cases there are. As you're going from one page
to another, is it redirecting to a different website, or does it end up not
actually giving you a web page back? Maybe it's a download. Is it not moving to
a new document at all and it's just a navigation within the same document,
which has different properties. There's a lot of things that we need to keep
track of in the navigation system and how it affects the back-forward history
that makes it tricky. And then it continues to get more complicated over time,
as we add new fancy features to the browser. So there's lots of things that
we've layered on top of that with back-forward cache and pre-rendering and new
navigation APIs for interacting with session history, which make things faster
and nicer for web developers, but also, provide even more ways that navigation
can get into interesting corner cases, like why didn't we think about
pre-rendering a page with a sandboxed iframe that might cause a different path
to happen? So that's where a lot of the complexity in navigation comes from and
why there's ongoing challenges, even though it's something that seems like it
has worked from the beginning. Site isolation being hard is related to the fact
that you can navigate in any frame in a page, and iframes being embedded is
something that we used to just handle entirely within the renderer process. So
this is a fun way to think about the multi-process architectures that shipped
around when Chrome was launched and then other browsers that did similar things
was we could take the rendering engines that had existed already for a decade
or so from existing browsers and just run multiple copies of them. So as you
open up a new tab, we've got another copy of WebKit, which is the rendering
engine we were using at the time, and we had to make changes to make it work in
the renderer process talking to the browser process, but we didn't really need
to change fundamentally how it rendered a web page. And so it was in charge of
deciding what network requests it was going to make for getting iframe content
and then rendering the iframe and where a click was going to go, that kind of
thing. And to do out-of-process iframes, you need the iframe inside the page to
be rendered in an entirely separate renderer process. And that is a big change
to how the rendering engine works. And so that was what took all the time and
what made site isolation a multi-year project, where we had to fundamentally
introduce these new data structures, like RenderFrameHost and representations
of each frame in the browser process, change how the rendering engine worked,
and then change all the features in the browser that assumed the renderer would
take care of this. And now, we need to handle them spread across multiple
processes.
|
||||
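The idea of out-of-process iframes, with the browser process keeping a record of every frame, can be illustrated with a small frame tree. This is a sketch under stated assumptions, not how Chromium's data structures look: `Frame` and `frames_by_site` are hypothetical names, the URLs are made up, and the registered domain is again approximated as the last two host labels.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Scheme plus registered domain (ASSUMPTION: last two host labels)."""
    parts = urlsplit(url)
    registered_domain = ".".join(parts.hostname.split(".")[-2:])
    return f"{parts.scheme}://{registered_domain}"


@dataclass
class Frame:
    """One node of a page's frame tree, as a toy browser process might track it."""

    url: str
    children: list = field(default_factory=list)


def frames_by_site(root: Frame) -> dict:
    """Walk the frame tree and group frame URLs by site.

    Each group stands for one renderer process, so a page with cross-site
    iframes is rendered by several processes at once.
    """
    groups = {}
    pending = [root]
    while pending:
        frame = pending.pop()
        groups.setdefault(site_of(frame.url), []).append(frame.url)
        pending.extend(frame.children)
    return groups


page = Frame(
    "https://news.example/story",
    children=[
        Frame("https://comments.news.example/thread"),  # same-site iframe
        Frame("https://ads.test/banner"),               # cross-site iframe
        Frame(
            "https://video.test/embed",                 # cross-site iframe
            children=[Frame("https://ads.test/overlay")],
        ),
    ],
)

for site, urls in frames_by_site(page).items():
    print(site, sorted(urls))
```

Here one visible page needs three renderer processes, and the nested ad frame joins the process of the other ad frame rather than that of its parent. Features that used to assume a single renderer owned the whole page, such as routing a click to the right frame, are what had to be reworked.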
|
||||
16:28 SHARON: How did that fit in with the forking of WebKit into Blink, which
is what the rendering engine in Chrome is now?

16:34 CHARLIE: Yeah, so the fork was absolutely necessary to do this. We pretty
much had to wait until that happened, because we didn't have as much
flexibility to make large, architectural changes to WebKit as we were sharing
it with other browsers, like Safari and so on. We were looking into ways that
we might be able to approximate what we want, but as the decision to fork
WebKit into Blink was made, it opened the door and gave us a chance to say, we
can do this now. Let's go ahead and dive in and make site isolation happen.
|
||||
|
||||
17:14 SHARON: That makes sense. In a quite early talk, it was probably from 10
|
||||
years ago now, Darin gave a talk, and he was saying how having per site, having
|
||||
each renderer have just one site in it was like the Holy Grail, and he seemed
|
||||
very excited about it. So that makes sense because of the -
|
||||
|
||||
17:34 CHARLIE: Yeah, and it feels like the natural use of a sandbox in a
|
||||
browser. The same reason that we got all these questions, like isn't that how
|
||||
it already works? Is that it's such a natural fit for we have a container for
|
||||
running a web page, what is this unit that you want to put in the container?
|
||||
It's a website that you're visiting. And the fact that we couldn't easily pull
|
||||
them apart into different processes was totally an artifact of how web browsers
|
||||
were originally built that didn't foresee this - oh, they're being used as
|
||||
complicated programs with different security principals.
|
||||
|
||||
18:13 SHARON: Yeah, in a different talk, John from Episode 3 (//content) had
|
||||
mentioned that site isolation was basically the biggest change to Chrome since
|
||||
it launched and probably is still the case. So yeah, it was a project.
|
||||
|
||||
18:29 CHARLIE: Yeah, it was a long project, and we had a lot of help from many
|
||||
people across the Chrome team, but it was cool to get to this outcome, where we
|
||||
could then say, now we have processes that are locked to a single security
|
||||
principal, so it's nice to get to that outcome.
|
||||
|
||||
18:47 SHARON: So for people on the Chrome team now, what do you wish they knew
|
||||
about site isolation/navigation in terms of as an engineer? Because before, I
|
||||
was on a different team, and someone on my team said, oh, you should know how
|
||||
navigation works. And I said, yeah, that sounds like a great idea, but how? So
|
||||
what are things that people should just keep in mind when they're out and about
|
||||
doing their stuff that usually isn't directly interacting with navigation even?
|
||||
|
||||
19:14 CHARLIE: Right. Yeah, so I think that the biggest thing to keep in mind
|
||||
is to limit what we put into a renderer process or what a renderer process has
|
||||
access to, to not include cross-site data. And we already have to have this
|
||||
mentality in Chrome that we don't trust the renderer process. If it sends an
|
||||
IPC or Mojo call to the browser process, we should assume that it might be
|
||||
lying or asking for things that it shouldn't have access to. And I think it's
|
||||
in the back of a lot of people's heads already that, OK, I shouldn't let it
|
||||
like go get a file from disk, but also, we don't want it to mix data from
|
||||
different sites. It shouldn't be able to ask for something from - to lie and
|
||||
say, oh, I'm origin x, please give me data from there. Because that's often how
|
||||
APIs used to work in Chrome was, the renderer process would say what origin
|
||||
it's asking for, and please give me the cookie for that.
|
||||
|
||||
20:12 SHARON: That sounds bananas.
|
||||
|
||||
20:12 CHARLIE: Yeah. Now, it sounds crazy. And so we think that the browser
|
||||
process should already know based on who's asking what they have access to. So
|
||||
that's really the thing that, in order to avoid site isolation bypasses, that's
|
||||
what developers should keep in mind. So for features like Autofill or something
|
||||
where it's easy to think, oh it would be nice for me to just have that data on
|
||||
hand in the renderer process and I can just put it in when it's needed. No, you
|
||||
should keep it out of the renderer, and then only provide the data that's
|
||||
needed.
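
A minimal sketch of the rule Charlie describes, where the browser process decides what a renderer may access based on who is asking and not on what the renderer claims. The class, method names, and `.example` origins are hypothetical and are not Chromium's real API; this only illustrates the idea.

```python
class BrowserProcess:
    """Toy model of a browser process that does not trust renderers."""

    def __init__(self):
        # Hypothetical bookkeeping: renderer process id -> origin it is locked to.
        self._process_lock = {}
        # Hypothetical cookie store keyed by origin.
        self._cookies = {}

    def lock_process(self, process_id, origin):
        self._process_lock[process_id] = origin

    def set_cookie(self, origin, value):
        self._cookies[origin] = value

    def get_cookie_trusting(self, claimed_origin):
        # Old style: believe whatever origin the renderer names.
        return self._cookies.get(claimed_origin)

    def get_cookie(self, process_id, claimed_origin):
        # Site-isolation style: the browser already knows which origin this
        # process is locked to, so a mismatched claim is treated as a lie.
        locked = self._process_lock.get(process_id)
        if locked is None or claimed_origin != locked:
            raise PermissionError("renderer asked for data outside its lock")
        return self._cookies.get(locked)


browser = BrowserProcess()
browser.set_cookie("https://bank.example", "bank-session")
browser.set_cookie("https://evil.example", "evil-session")
browser.lock_process(1, "https://evil.example")
```

With this setup, `browser.get_cookie_trusting("https://bank.example")` hands the bank cookie to any caller, while `browser.get_cookie(1, "https://bank.example")` raises because process 1 is locked to a different origin.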
|
||||
|
||||
20:51 SHARON: In security-discuss circles, another term you hear often is a
|
||||
renderer escape or renderer bypass or whatever. Is that the same as a site
|
||||
isolation bypass, or are those different?
|
||||
|
||||
21:00 CHARLIE: Yeah, so sandbox escape is a common term that is used for when
|
||||
an attacker has found some bug already, and then they are able to escalate
|
||||
their privilege to affect the browser process or get out of the browser process
|
||||
and to the operating system. So a sandbox escape is a lot worse than a site
|
||||
isolation bypass. It would give the attacker control of your computer and
|
||||
installing malware and things. So sandbox escapes, we want to have as many
|
||||
boundaries as possible to try to prevent that from happening. A site isolation
|
||||
bypass is not as bad as a full sandbox escape, but it would be a way that an
|
||||
attacker could find some way to get access to another website's data or attack
|
||||
that website. So maybe it's able to trick the browser into giving it cookies
|
||||
from that site or using the permissions that have been granted to another
|
||||
website. And then renderer compromise would be another type of exploit that
|
||||
happens entirely within the renderer process. That's one where the attacker has
|
||||
found some bug, they can run whatever native code they want within the renderer
|
||||
process, and that's what we're trying to contain with the sandbox and what site
|
||||
isolation tries to make even less useful to the attacker. Because even if you
|
||||
can run any code you want within the renderer process, you shouldn't be able to
|
||||
install malware because of the sandbox, and you shouldn't be able to access
|
||||
other sites' data because of site isolation.
|
||||
|
||||
22:47 SHARON: Yeah, I think when I was learning about site isolation and stuff,
|
||||
I was like, whoa, this is a lot going on, and most people just have no idea
|
||||
about it. And in terms of how other bugs and whatnot, something that is often
|
||||
mentioned is Spectre and that still affect thing. And the only mention, on
|
||||
Wikipedia in the Mitigation section of Spectre, they mentioned site isolation,
|
||||
but I was like, this should have its own page, so maybe one day -
|
||||
|
||||
23:20 CHARLIE: Maybe one day.
|
||||
|
||||
23:20 SHARON: one of us is going to write a thing about that. But yeah, that's
|
||||
kind of the bug, right? So can you just talk about that?
|
||||
|
||||
23:25 CHARLIE: Yeah, so Spectre and Meltdown were certainly a big change to the
|
||||
security landscape for browsers. At a high level, those are attacks that are
|
||||
based on the micro-architectural parts of the CPU. The way that the basic CPU
|
||||
hardware works, there are ways to leak data that weren't anticipated. And we
|
||||
can view it as it gives attackers what we call an arbitrary read primitive,
|
||||
something that can access anything in your address space in a process. You can
|
||||
think about it as the CPU wants to not stop and wait for going and accessing
|
||||
data from RAM, so it thinks, well, I'll just guess what the answer is going to
|
||||
be and then keep running some instructions. And if I was right in my guess, the
|
||||
next several steps are done already, and I can just move on from there. And if
|
||||
I was wrong, well, I just throw away that work, and I do the right thing, and
|
||||
we move on, and everybody is fine. But attackers found that while you're doing
|
||||
those extra steps ahead of time, you're also affecting the caches on the CPU,
|
||||
and cache timing attacks let you find out what work was done there. So some
|
||||
very clever researchers found that you can do some things in those extra steps
|
||||
that happen in this speculative state to find out what data is in addresses you
|
||||
don't have access to. And so places where we thought some check in the renderer
|
||||
process could say, oh, you don't have access to this thing from another
|
||||
website. We're fine. Now, you could get access to it, just based on how CPUs
|
||||
work, without needing any bugs in the browser. So now, we're thinking, OK,
|
||||
we're running JavaScript, and if it can leak things from the renderer process,
|
||||
we can't have data worth stealing in the renderer process. You could try to
|
||||
find ways to prevent those attacks, but those ended up being difficult. And
|
||||
ultimately, we found that it wasn't really feasible to prevent the attacks in
|
||||
all the forms that they could happen. So site isolation became the first line
|
||||
of defense to say, data from other websites, data worth stealing should not be
|
||||
in the renderer process where a Spectre attack could get access to it. Now, that
|
||||
was actually one of the big, exciting events that helped us accelerate the work
|
||||
on site isolation and get it launched when that was discovered in 2017 or 2018.
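
A toy simulation of the speculation-plus-cache-timing idea described above. It is not a real Spectre exploit and does not measure real timings: the `ToyCpu` class, its "cache" set, and the memory layout are all assumptions made for illustration. It only shows why throwing away a speculative result does not undo its effect on the cache.

```python
class ToyCpu:
    """Hypothetical CPU model: one flat address space plus a set of cached lines."""

    def __init__(self, memory):
        self.memory = memory
        self.cached_lines = set()

    def speculative_read(self, index, limit):
        # Models `if index < limit: touch(probe[memory[index]])`.
        # The CPU guesses the check passes and runs the body first.
        value = self.memory[index]
        self.cached_lines.add(value)  # side effect: probe line `value` is now cached
        if index < limit:
            return value  # guess was right, the result is kept
        return None  # guess was wrong: result discarded, cache state is not

    def is_fast(self, line):
        # Stand-in for timing an access: cached lines would be "fast".
        return line in self.cached_lines


def hot_lines(cpu):
    """Attacker step: probe all 256 possible byte values and see which is fast."""
    return [line for line in range(256) if cpu.is_fast(line)]


# Indexes 0..3 are data the code may read; index 4 holds a secret byte.
cpu = ToyCpu(memory=[1, 2, 3, 4, ord("K")])
result = cpu.speculative_read(index=4, limit=4)
```

Here `result` is `None`, so the bounds check appears to have worked, yet `hot_lines(cpu)` still reveals the secret byte. That is the sense in which an in-process check stops being a security boundary.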
|
||||
|
||||
26:24 SHARON: So at that point, site isolation was mostly done, and it was just
|
||||
getting it out?
|
||||
|
||||
26:24 CHARLIE: Yeah, it was really interesting. So we'd been working on it for
|
||||
several years for a different reason for the fact that we wanted it to be a
|
||||
second line of defense against compromised rendering processes. We assume
|
||||
people are going to find bugs in the renderer process, in V8 or in Blink or
|
||||
things like that, and we wanted that to not be as big of a problem. We wanted
|
||||
to say, OK, whatever. There isn't data worth stealing in that process. We had
|
||||
already shipped some initial uses of out-of-process iframes in 2017 for
|
||||
extensions, and we were working on trying to do some sort of initial steps
|
||||
towards using site isolation for some websites and see how that goes when we
|
||||
found out about Spectre and Meltdown. And so that next six months or so was a
|
||||
very accelerated, OK, we've got to get everything else working with the way
|
||||
that site isolation interacted with DevTools and extensions and printing and a
|
||||
bunch of other features in the browser that we needed to get working. And so it
|
||||
was an interesting accelerated rollout, where we even had an optional mode and
|
||||
an enterprise policy where you could say, I don't care if printing doesn't
|
||||
work, turn on site isolation so that Spectre attacks won't find other data
|
||||
worth stealing in the process. And then we got to where it was working well
|
||||
enough we could ship it for all desktop users in, I think it was Chrome 67 in
|
||||
mid 2018. So it was good that we were far enough along that we were able to ship the full
|
||||
thing within a few months.
|
||||
|
||||
28:19 SHARON: Very cool. Yeah, I mean, those are all the things that make
|
||||
navigation hard, like extensions as part of it, and there's just all these
|
||||
things and all of these go through navigation and affect it, so that's very
|
||||
exciting. So what is the state of site isolation now, and are there still going
|
||||
to be changes? That was a few years ago, so are things still happening?
|
||||
|
||||
28:45 CHARLIE: Yeah, we're still trying to make several different improvements.
|
||||
We've made several improvements since the launch, so that initial launch, since
|
||||
it was mostly focused on Spectre, didn't have all the defenses we wanted
|
||||
against compromised renderer processes, because the Spectre attack can't affect
|
||||
actual running code. It can't go and lie to the browser process. It won't give
|
||||
you full control over what's running in the renderer process, but it can leak
|
||||
all the data that's in there. So anything that a web page can pull into a
|
||||
renderer process can be leaked. So after that initial launch, we needed to go
|
||||
and actually finish the compromised renderer defenses and say, OK, all the IPCs
|
||||
that come out of the renderer, make sure they can't lie and steal someone
|
||||
else's data, so get all the browser process enforcements in place. Another big
|
||||
thing after that was getting it to work on Android, where we wanted this
|
||||
defense. We have a much different set of resource constraints on mobile
|
||||
devices, where there's not nearly as much memory and renderer processes are
|
||||
often killed or just discarded. So there, we couldn't isolate all websites from
|
||||
each other. We had to use heuristics to say, here are the sites that need it
|
||||
the most, so sites where users log in, in general, or sites where this
|
||||
particular user is logged in or other signals that this site probably needs
|
||||
some protection, we'll give those isolation, and then other ones can share a
|
||||
renderer process. So we've tried to improve those heuristics and isolate as
|
||||
many sites as we can there. And then things that we weren't initially isolating
|
||||
from each other, we have been able to. So extensions was an example where we
|
||||
started by just making sure extensions didn't share a process with web pages,
|
||||
but now, we make sure that no extensions can share a process with each other.
|
||||
And we're trying to get to where we could isolate all origins from each other,
|
||||
depending on what resources are available, but there's some changes with,
|
||||
basically, deprecating document.domain that are in flight that might make that
|
||||
possible.
|
||||
|
||||
30:57 SHARON: So say I have a fancy computer, and I just want maximum site
|
||||
isolation because I care about security. How do I go get that?
|
||||
|
||||
31:03 CHARLIE: Yeah, so there are some experimental ways to do that. You can go
|
||||
into the chrome://flags page, where you can turn on and off different features
|
||||
and experiments that are in progress. And there's one there called strict
|
||||
origin isolation, which will ensure that all origins within various sites are
|
||||
isolated from each other, and that works on desktop and Android. It'll just
|
||||
create slightly more processes than we do today. Similarly, on Android, if you
|
||||
wanted to isolate all sites, there is an option for full site isolation there
|
||||
called site-per-process, which you could use that or strict origin isolation to
|
||||
get maximum site isolation today.
|
||||
|
||||
31:51 SHARON: So another platform that Chrome does exist on is iOS. So can we
|
||||
do anything there? Why is that not in [INAUDIBLE]
|
||||
|
||||
31:58 CHARLIE: So Chrome for iOS has to use Apple's WebKit rendering engine
|
||||
today, and current versions don't have site isolation, and we don't have the
|
||||
ability to run our own rendering engine that has support for it. So we don't
|
||||
have it today, but my understanding is that WebKit is working on site isolation
|
||||
as well, and actually, Firefox has also shipped their version of site
|
||||
isolation, which is pretty cool to see other browser vendors building this as
|
||||
well. And so if that were made available to other third-party browsers on iOS,
|
||||
then maybe it could be used there. But at the moment, we're constrained, and we
|
||||
can't ship it on that platform.
|
||||
|
||||
32:47 SHARON: In terms of how the internet happens, this seems like a good
|
||||
thing to just have generally. So is it possible that this could be a spec one
|
||||
day that any browser should implement, or is it - because it's under the hood
|
||||
and it's not something that's maybe necessarily visible to websites, maybe
|
||||
that's not part of it, but is this an option?
|
||||
|
||||
33:04 CHARLIE: Yeah. I think it ties back to the earlier question about web
|
||||
security model versus browser security model, where the web visible parts of
|
||||
this, it's meant to be transparent to the websites. There's no behavior changes
|
||||
to the web platform by turning on site isolation. There's not meant to be. And
|
||||
so it's not really a spec visible thing, it's more part of the browser's
|
||||
architecture, the same way that there's no spec for sandboxes in a browser. You
|
||||
could build a browser that doesn't have a sandbox, but today, the best practice
|
||||
is to have better security by having a sandbox. So I think the relevant thing
|
||||
for web specs is just that we don't introduce APIs that don't work when
|
||||
different origins are in different processes. And that sounds like, well OK,
|
||||
that makes sense, and thankfully, we were sort of in that state to begin with,
|
||||
and in some places we got lucky. Like postMessage is asynchronous, which is a
|
||||
mechanism for sending a message to another origin, but they don't need to run
|
||||
in the same process because that message will be delivered at a later time. So
|
||||
we can send it to a different process running on a different thread. Some
|
||||
places we got unlucky, like document.domain, where web APIs said that different
|
||||
origins can script each other if they agree that it's OK, as long as they're in
|
||||
the same site, and that constrained us in the process model. So we're trying to
|
||||
improve things about the web spec. You could almost say that deprecating
|
||||
document.domain is a way of seeing the browser security model and the web
|
||||
security model aligning with each other to say, OK, we want to use processes.
|
||||
We want this asynchronous boundary. You shouldn't be able to script other
|
||||
origins from the same site. So I think that's the closest is making sure that
|
||||
specced APIs fit well with this multi-process site isolation world.
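
To make the origin versus site distinction behind the document.domain discussion concrete, here is a rough sketch. The `site` helper below simply takes the last two host labels, which is an assumption for illustration only; real browsers consult the Public Suffix List to find the registrable domain. The example URLs are made up.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Origin = (scheme, host, port)."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def site(url):
    """Site = scheme plus registrable domain (naively: last two host labels)."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    return (parts.scheme, ".".join(labels[-2:]))


mail = "https://mail.example.com/inbox"
docs = "https://docs.example.com/edit"
other = "https://example.org/"
```

`mail` and `docs` are different origins but the same site, which is the case document.domain allowed to script each other and the case that strict origin isolation would put in separate processes. `other` is a different site altogether.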
|
||||
|
||||
35:12 SHARON: There are some headers and tags and whatever that websites can
|
||||
use to alter how the browser handles things though, right?
|
||||
|
||||
35:23 CHARLIE: Yes, absolutely. And those are both good ways that websites can
|
||||
more effectively isolate themselves, in general, both from web visible behavior
|
||||
and from the browser's architecture and ways that browsers that don't have
|
||||
full-site isolation, that don't have out-of-process iframes in all cases, web
|
||||
pages might still be able to get some of the isolation benefits using those
|
||||
APIs. And so those are things like cross-origin opener policy, which says, for
|
||||
example, if I open a pop up to a different website, there's not going to be any
|
||||
communication between me and that pop up. So it's OK to put them in different
|
||||
processes, and they can be better isolated from each other. That's good from an
|
||||
architecture perspective. It's also nice from a web perspective in that you
|
||||
don't have to worry about is the window.opener variable in the pop up able to
|
||||
be used to do sneaky things to the page that opened it. So there's nice,
|
||||
web-visible reasons to use something like a cross-origin opener policy to keep
|
||||
them protected from each other. So that's one example of that. There's others
|
||||
as well.
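
As a sketch of how a site might opt in, the snippet below adds a `Cross-Origin-Opener-Policy: same-origin` response header using a small WSGI wrapper. The header name and value are the standard ones; the wrapper `isolate_opener` and the `hello` app are hypothetical names chosen for this example.

```python
def isolate_opener(app):
    """WSGI middleware that appends a Cross-Origin-Opener-Policy header."""

    def wrapped(environ, start_response):
        def start_with_coop(status, headers, exc_info=None):
            # Copy the headers and add the opener policy before responding.
            headers = list(headers) + [("Cross-Origin-Opener-Policy", "same-origin")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_coop)

    return wrapped


def hello(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


app = isolate_opener(hello)
```

Any WSGI server can serve `app`. Pages it returns ask the browser to cut the `window.opener` relationship with cross-origin pop-ups, which is what makes it safe to place them in different processes.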
|
||||
|
||||
36:46 SHARON: Something I've seen around that is a web spec is content security
|
||||
policy. Is that related to any of this at all?
|
||||
|
||||
36:52 CHARLIE: It kind of is. Yeah, so content security policy is another way
|
||||
for websites to tell the browser better ways to secure that site. And so some
|
||||
of it is useful for saying I want to do a better job preventing cross-site
|
||||
scripting attacks on my page, so don't run a script if you find it in these
|
||||
random places. It should only come from these URLs or in these contexts on my
|
||||
page. So that's more about what happens in a given renderer process, but there
|
||||
are some places where content security policy does overlap a bit with site
|
||||
isolation. There is a sandbox value you can put into a content security policy
|
||||
header that makes it get treated like a sandboxed iframe. And while we don't yet
|
||||
have support for putting sandboxed iframes in another process, that is work
|
||||
that's in progress and we're hoping to ship before long. And so CSP headers
|
||||
that say sandbox will also be able to be isolated from the rest of their site.
|
||||
So if they have some kind of untrustworthy content in them, that won't be able
|
||||
to attack the rest of the site.
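
A small sketch of what such a header value can look like, combining a script source allowlist with the `sandbox` directive. `script-src` and `sandbox` are standard CSP directives; the helper `build_csp` and the `cdn.example` URL are assumptions for illustration.

```python
def build_csp(script_sources, sandbox_flags=None):
    """Build a Content-Security-Policy header value.

    script_sources: allowed script origins, e.g. ["'self'"].
    sandbox_flags: None for no sandbox directive, or a list of allow-* flags
    (an empty list means the most restrictive sandbox).
    """
    directives = ["script-src " + " ".join(script_sources)]
    if sandbox_flags is not None:
        directives.append(" ".join(["sandbox"] + list(sandbox_flags)))
    return "; ".join(directives)


policy = build_csp(["'self'", "https://cdn.example"], ["allow-scripts"])
```

The resulting string would be sent as the value of the `Content-Security-Policy` response header. The `script-src` part restricts where scripts may come from, and the `sandbox` part asks for the document to be treated like a sandboxed iframe.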
|
||||
|
||||
38:04 SHARON: OK. Yeah, so it's that difference between the web versus browser,
|
||||
what's visible, what's an option versus how it's actually implemented.
|
||||
|
||||
38:11 CHARLIE: Right.
|
||||
|
||||
38:11 SHARON: Cool. So a lot of this, we've talked about security a lot, and I
|
||||
think for people who don't know about security, the image you have is people
|
||||
trying to break into - like I'm in, that whole thing, and that's very much not
|
||||
what's going on here, because we're not trying to break things. So can you tell
|
||||
us just a bit about the difference between offensive and defensive security and
|
||||
how this is one of those.
|
||||
|
||||
38:38 CHARLIE: Yeah, so a lot of attention in the security space goes to big,
|
||||
exciting, flashy attacks that are found. On the offensive side, look, I found a
|
||||
way to break the security of this thing, and we have big vulnerability reward
|
||||
bounties to reward when people find these things so we can get them fixed. So
|
||||
even on the defensive side, you want people working on offensive security,
|
||||
looking for these bugs, looking for things that need to be fixed so we can
|
||||
defend users. But the defensive side is super important and I find it a
|
||||
satisfying place to be, even if it isn't always as glamorous. It's like, you
|
||||
have to have all the defenses in place and all of these different attacks that
|
||||
are found, it's like, yeah, we need to fix them, and we need to find ways to
|
||||
make that less likely. But ultimately, this is the real goal, is we want to
|
||||
have systems that we can trust, that are safe to use, and that we can go and
|
||||
visit untrustworthy web content and not have to worry about it. You need these
|
||||
extra lines of defense. You need all these different ways of defending the
|
||||
product and shipping security fixes fast, all the things that security works on
|
||||
in a defensive sense so that people can use these systems and depend on them in
|
||||
their lives. So that's the fun and fulfilling part of this, even if it isn't
|
||||
quite as glamorous as I found a sandbox escape, but those are fun to look at
|
||||
too.
|
||||
|
||||
40:17 SHARON: I heard security described as a bunch of layers of Swiss cheese.
|
||||
So you have all these different layers of mitigations to try to keep bad things
|
||||
from happening, but each of them is not perfect. And if the holes in those
|
||||
layers line up, then that's where you get a vulnerability. So in this very
|
||||
approximate metaphor, what are the neighboring slices of cheese to site
|
||||
isolation? What other defensive things are related to this and are trying to
|
||||
achieve the same goal?
|
||||
|
||||
40:46 CHARLIE: Sure. Yeah, so there's going to be holes in any layer that you
|
||||
build. We have bugs in software, and in site isolation's case, it's trying to
|
||||
put this boundary between the renderer process, where we assume everything is
|
||||
compromised already and the data that the attacker wants to get to, other
|
||||
websites, data on your machine and so on. So the adjacent layers of Swiss
|
||||
cheese would be within the renderer process, we do have security checks that try
|
||||
to say we have same origin policy checks, things that try to keep certain data
|
||||
opaque to a web page so the JavaScript can't look at it. Those checks in the
|
||||
renderer process do matter. Today, we do have multiple origins from the same
|
||||
site in the same process. The renderer process' job is to make sure that they
|
||||
don't attack each other. But there's some fairly large Swiss cheese holes in
|
||||
that layer that we try to fix whenever we find them. And so site isolation's
|
||||
job is to be the next layer, which won't have holes in the same places,
|
||||
hopefully. Its holes, site isolation bypasses, might be, oh, there's some way
|
||||
for the renderer process to ask the browser process for something it shouldn't
|
||||
have access to, and it tricks it, and it gets access to that. We hope that it's
|
||||
tough to line those holes up, that an attacker has to find both a bug in the
|
||||
renderer process and a bug in site isolation and luck out in that those bugs
|
||||
line up and you can get to one from the other in order to get access to another
|
||||
website's data. And then the next layer of Swiss cheese would be all the things
|
||||
that the browser process does to keep the renderer isolated from the user's
|
||||
machine and the sandbox itself that you shouldn't have access to the OS APIs
|
||||
and so on. So those would be other ways to try and get beyond site isolation to
|
||||
other things.
|
||||
|
||||
42:48 SHARON: That makes sense. Yeah, when I first heard about it, I was like,
|
||||
oh, that's such a fun way to think about it, really. It's a good visual seeing,
|
||||
OK, this is how things go wrong. All right, cool. Do you have any other fun
|
||||
stories about site isolation, making it happen, stuff since then?
|
||||
|
||||
43:08 CHARLIE: I mean, it's been a really fun journey the whole way. There's
|
||||
been different projects and different exploratory phases, where we weren't sure
|
||||
what was going to work or what we needed to get done. I've worked with a bunch
|
||||
of great interns and people who have been on the team on early phases like
|
||||
getting postMessage to work across renderer processes, later phases about what
|
||||
would it look like to build out-of-process iframes using something like the
|
||||
plugin infrastructure, just is this feasible? Or what is it that we could
|
||||
protect that a particular renderer process is allowed to ask for. Can we
|
||||
keep allowing JavaScript data from other websites into a renderer process,
|
||||
while blocking your bank account information from getting it, those both look
|
||||
like network responses from different websites, but one has to be let through
|
||||
for compatibility reasons, and one has to be blocked. Can we build that? Are we
|
||||
doing a good job of keeping that sensitive data out? These are things that we
|
||||
had some great PhD interns working with us on, and ultimately, got us to where
|
||||
we could ship this and protect a lot of data. So it's fun working with all
|
||||
those people along the way.
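
The "let scripts through, keep bank data out" question above can be sketched as a filter on cross-site network responses. This is a heavily simplified illustration and not Chromium's actual logic: the protected type list, the function name, and the `.example` sites are assumptions, and a real implementation would also need to handle mislabeled content and explicit cross-origin opt-ins.

```python
# Hypothetical list of response types treated as sensitive documents.
PROTECTED_TYPES = {"text/html", "application/json", "text/xml", "application/xml"}


def allow_into_renderer(requesting_site, response_site, content_type):
    """Decide whether a network response may enter a renderer process."""
    if requesting_site == response_site:
        return True  # a site may always read its own data
    mime = content_type.split(";")[0].strip().lower()
    # Cross-site scripts, images, and so on must keep working for compatibility;
    # cross-site documents and data are kept out of the process.
    return mime not in PROTECTED_TYPES
```

For example, a page on `https://news.example` may still load `text/javascript` from `https://cdn.example`, but an `application/json` response from `https://bank.example` would be blocked before it reaches that renderer.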
|
||||
|
||||
44:35 SHARON: Yeah, that sounds very cool. These days, so earlier on, you
|
||||
mentioned people whose questions were like, why doesn't this already happen? So
|
||||
these days, it does happen more or less like that. So what kind of questions or
|
||||
misconceptions do you still see folks who typically work on Chrome still have
|
||||
when it comes to this kind of stuff?
|
||||
|
||||
44:52 CHARLIE: I think it's often assuming that navigation is simpler than it
|
||||
is and not realizing how many corner cases matter and how all of these
|
||||
different features that have built on top of navigation interact with each
|
||||
other. So I think that's where we spend a lot of our time these days beyond the
|
||||
we want to improve site isolation. We want to make these abstractions easier
|
||||
for other people to understand. So I think that's one of the big challenges now
|
||||
is how many different directions the navigation code has been pulled and how
|
||||
those things interact with each other.
|
||||
|
||||
45:24 SHARON: Right. And that's kind of - was intentional initially, right? You
|
||||
don't want everyone who works on Chrome to have to know how all of this works,
|
||||
but then when you hide it so well, they're like, oh, this is fine. I'll just do
|
||||
my thing. It'll just be my one thing, but then everyone has such a thing, and
|
||||
then it becomes too many things. Yeah, I used to work on a different part of
|
||||
Chrome that was not related to this, and you see some of these big classes,
|
||||
like WebContents or whatever. You're like, oh, I'll just get what I need from
|
||||
that, and things will be fine, but you just don't even have any idea of all the
|
||||
things that could go wrong. So it's cool that someone is out here trying to
|
||||
keep that under control.
|
||||
|
||||
46:00 CHARLIE: And I'm glad there's a lot of efforts to try to improve the APIs
|
||||
for how we expose these things, from WebContents to WebContentsObserver, which is
|
||||
growing into quite a large API with many users, looking at ways to make these
|
||||
APIs easier to use and harder to make mistakes with. So I think those are
|
||||
worthwhile efforts.
|
||||
|
||||
46:20 SHARON: OK. Cool. Well, I think that covers all of it. Now, folks know
|
||||
how isolation works. Problem solved. This is great. All right, thank you very
|
||||
much. Great.
|
||||
|
||||
46:34 CHARLIE: Thanks. Oh, no. What? OK, hold on.
|