
[docs] add "What's Up With That" transcripts

Change-Id: Ie7f34cd19b5f97f9330e914d13de0f6e3ea2d7de
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4886394
Commit-Queue: Nigel Tao <nigeltao@chromium.org>
Reviewed-by: Sharon Yang <yangsharon@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1202896}

@@ -438,6 +438,22 @@ used when committed.
### UI
* [Chromium UI Platform](ui/index.md) - All things user interface
### What's Up With That Transcripts
These are transcripts of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq),
a video series of interviews with Chromium software engineers.
* [What's Up With Pointers - Episode 1](transcripts/wuwt-e01-pointers.md)
* [What's Up With DCHECKs - Episode 2](transcripts/wuwt-e02-dchecks.md)
* [What's Up With //content - Episode 3](transcripts/wuwt-e03-content.md)
* [What's Up With Tests - Episode 4](transcripts/wuwt-e04-tests.md)
* [What's Up With BUILD.gn - Episode 5](transcripts/wuwt-e05-build-gn.md)
* [What's Up With Open Source - Episode 6](transcripts/wuwt-e06-open-source.md)
* [What's Up With Mojo - Episode 7](transcripts/wuwt-e07-mojo.md)
* [What's Up With Processes - Episode 8](transcripts/wuwt-e08-processes.md)
* [What's Up With Site Isolation - Episode 9](transcripts/wuwt-e09-site-isolation.md)
### Probably Obsolete
* [TPM Quick Reference](tpm_quick_ref.md) - Trusted Platform Module notes.
* [System Hardening Features](system_hardening_features.md) - A list of

@@ -0,0 +1,601 @@
# What's Up With Pointers
This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 1, a 2022 video discussion between [Sharon (yangsharon@chromium.org)
and Dana (danakj@chromium.org)](https://www.youtube.com/watch?v=MpwbWSEDfjM).
The transcript was automatically generated by speech-to-text software. It may
contain minor errors.
---
Welcome to the first episode of What's Up With That, all about pointers! Our
special guest is C++ expert Dana. This talk covers smart pointer types we have
in Chrome, how to use them, and what can go wrong.
Notes:
- https://docs.google.com/document/d/1VRevv8JhlP4I8fIlvf87IrW2IRjE0PbkSfIcI6-UbJo/edit
Links:
- [Life of a Vulnerability](https://www.youtube.com/watch?v=HAJAEQrPUN0)
- [MiraclePtr](https://www.youtube.com/watch?v=WhI1NWbGvpE)
---
0:00 SHARON: Hi, everyone, and welcome to the first installment of "What's Up
With That", the series that demystifies all things Chrome. I'm your host,
Sharon, and today's inaugural episode will be all about pointers. There are so
many types of types - which one should I use? What can possibly go wrong? Our
guest today is Dana, who is one of our Base and C++ OWNERS and is currently
working on introducing Rust to Chromium. Previously, she was part of bringing
C++ 11 support to the Android NDK and then to Chrome. Today, she'll be telling
us what's up with pointers. Welcome, Dana!
00:31 DANA: Thank you, Sharon. It's super exciting to be here. Thank you for
letting me be on your podcast thingy.
00:36 SHARON: Yeah, thanks for being the first episode. So let's just jump
right in. So when you use pointers wrong, what can go wrong? What are the
problems? What can happen?
00:48 DANA: So pointers are a big cause of security problems for Chrome, and
that's what we mostly think about when things go wrong with pointers. So you
have a pointer to some thing, like you've pointed to a goat. And then you
delete the goat, and you allocate some new thing - a cow. And it gets stuck in
the same spot. Your pointer didn't change. It's still pointing to what it
thinks is a goat, but there's now a cow there. And so when you go to use that
pointer, you use something different. And this is a tool that malicious actors
use to exploit software, like Chrome, in order to gain access to your system,
your information, et cetera.
01:39 SHARON: And we want to avoid those. So what's that general type of attack
called?
01:39 DANA: That's a Use-After-Free because you have freed the goat and
replaced it with a cow. And you're using your pointer, but the thing it pointed
to was freed. There are other kinds of pointer badness that can happen. If you
take a pointer and you add to it some number, or you go to an offset off the
pointer, and you have an array of five things, and you go and read 20, or minus
2, or something, now you're reading out of bounds of that memory allocation.
And that's not good. These are both memory safety bugs that occur a lot with
pointers.
02:23 SHARON: Today, we'll be mostly looking at the Use-After-Free kind of
bugs. We definitely see a lot of those. And if you want to see an example of
one being used, Dana has previously done a talk called, "Life of a
Vulnerability." It'll be linked below. You can check that out. So that being
said, should we ever be using just a regular raw pointer in C++ in Chrome?
02:41 DANA: First of all, let's call them native pointers. You will see them
called raw pointers a lot in literature and stuff. But later on, we'll see why
that could be a bit ambiguous in this context. So we'll call them a native
pointer. So should you use a native pointer? If you don't want to
Use-After-Free, if you don't want a problem like that, no. However, there is a
performance implication with using smart pointers, and so the answer is yes.
The style guide that we have right now takes this pragmatic approach of saying
you should use raw pointers for giving access to an object. So if you're
passing them as a function parameter, you can share it as a pointer or a
reference, which is like a pointer with slightly different rules. But you
should not store native pointers as fields in objects because that is a place
where they go wrong a lot. And you should not use a native pointer to express
ownership. So before C++ 11, you would just say, this is my pointer, use a
comment, say this one is owning it. And then if you wanted to pass the
ownership, you just pass this native pointer over to something else as an
argument, and put a comment and say this is passing ownership. And you just
kind of hope it works out. But then it's very difficult. It requires the
programmer to understand the whole system to do it correctly. There is no help.
So in C++ 11, the type called `std::optional_ptr` - or sorry, `std::unique_ptr`
- was introduced. And this is expressing unique ownership. That's why it's
called `unique_ptr`. And it's just going to hold your pointer, and when it goes
out of scope, it gets deleted. It can't be copied because it's unique
ownership. But it can be moved around. And so if you're going to express
ownership to an object in the heap, you should use a `unique_ptr`.
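As an illustrative sketch, not from the video: this is roughly what unique
ownership with `std::unique_ptr` looks like in plain C++ (the `Goat` type is
made up).

```cpp
#include <memory>
#include <utility>

struct Goat {};

// The returned unique_ptr expresses that the caller now owns the Goat.
std::unique_ptr<Goat> MakeGoat() {
  return std::make_unique<Goat>();
}

void TakeOwnership(std::unique_ptr<Goat> goat) {
  // The Goat is deleted here, when the owning parameter goes out of scope.
}

int main() {
  std::unique_ptr<Goat> goat = MakeGoat();
  // unique_ptr can't be copied, only moved, because ownership is unique.
  TakeOwnership(std::move(goat));
  // `goat` is now null; the Goat was deleted inside TakeOwnership().
}
```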
04:48 SHARON: That makes sense. And that sounds good. So you mentioned smart
pointers before. You want to tell us a bit more about what those are? It sounds
like `unique_ptr` is one of those.
04:55 DANA: Yes, so a smart pointer, which can also be referred to as a
pointer-like object, perhaps as a subset of them, is a class that holds inside
of it a pointer and mediates access to it in some way. So unique pointer
mediates access by saying I own this pointer, I will delete this pointer when I
go away, but I'll give you access to it. So you can use the arrow operator or
the star operator to get at the underlying pointer. And you can construct them
out of native pointers as well. So that's an example of a smart pointer.
There's a whole bunch of smart pointers, but that's the general idea. I'm going
to add something to what a native pointer is, while giving you access to it in
some way.
05:40 SHARON: That makes sense. That's kind of what our main thing is going to
be about today, because if you look around in Chrome, you'll see a lot of these
wrapper types. It'll be a `unique_ptr` and then a type. And you'll see so many
types of these, and talking to other people, myself, I find this all very
confusing. So we'll cover some of the more common types today. We just talked
about unique pointers. Next, let's talk about `absl::optional`. So why don't you tell
us about that.
06:10 DANA: So that's actually a really great example of a pointer-like object
that's not actually holding a pointer, so it's not a smart pointer. But it
looks like one. So this is this distinction. So `absl::optional`, also known as
`std::optional`, if you're not working in Chromium, and at some point, we will
hopefully migrate to it, `std::optional` and `absl::optional` hold an object
inside of it by value instead of by pointer. This means that the object is held
in that space allocated for the `optional`. So the size of the `optional` is
the size of the thing it's holding, plus some space for a presence flag.
Whereas a `unique_ptr` holds only a pointer. And its size is the size of a
pointer. And then the actual object lives elsewhere. So that's the difference
in how you can think about them. But otherwise, they do look quite similar. An
`optional` is a unique ownership because it's literally holding the object
inside of it. However, an `optional` is copyable if the object inside is
copyable, for instance. So it doesn't have quite the same semantics. And it
doesn't require a heap allocation, the way unique pointer does because it's
storing the memory in place. So if you have an `optional` on the stack, the
object inside is also right there on the stack. That's good or bad, depending
what you want. If you're worried about your object sizes, not so good. If
you're worried about the cost of memory allocation and free, good. So this is
the trade-off between the two.
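As a rough illustration of that difference (not from the video; `Goat` is a
made-up type), the `optional` stores the object in place while the
`unique_ptr` points at the heap:

```cpp
#include <memory>

#include "third_party/abseil-cpp/absl/types/optional.h"

struct Goat {
  char fur[64];
};

void Compare() {
  // The Goat lives inside the optional itself: no heap allocation, and
  // sizeof(maybe_goat) is roughly sizeof(Goat) plus a presence flag.
  absl::optional<Goat> maybe_goat = Goat();

  // The Goat lives on the heap; the unique_ptr itself is pointer-sized.
  std::unique_ptr<Goat> heap_goat = std::make_unique<Goat>();
}
```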
07:51 SHARON: Can you give any examples of when you might want to use one
versus the other? Like you mentioned some kind of general trade-offs, but any
specific examples? Because I've definitely seen use cases where `unique_ptr` is
used when maybe an `optional` makes more sense or vice versa. Maybe it's just
because someone didn't know about it or it was chosen that way. Do you have any
specific examples?
08:14 DANA: So one place where you might use a `unique_ptr`, even though
`optional` is maybe the better choice, is because of forward declarations. So
because an `optional` holds the type inside of it, it needs to know the type
size, which means it needs to know the full declaration of that type, or the
whole definition of that type. And a `unique_ptr` doesn't because it's just
holding a pointer, so it only needs to know the size of a pointer. And so if
you have a header file, and you don't want to include another header file, and
you just want to forward declare the types, you can't stick an optional of that
type right there because you don't know how big it's supposed to be. So that
might be a case where it's maybe not the right choice, but for other
constraining reasons, you choose to use a `unique_ptr` here. And you pay the
cost of a heap allocation and free as a result. But when would you use an
`optional`? So `optional` is fantastic for returning a value sometimes. I want
to do this thing, and I want to give you back a result, but I might fail. Or
sometimes there's no value to give you back. Typically, before C++ - what are
we on now, was it came in 14? I'm going to say it wrong. That's OK. Before we
had `absl::optional`, you would have to do different tricks. So you would pass
in a native pointer as a parameter and return a bool as the return value to say
did I populate the pointer. And yes, that works. But it's easy to mess it up.
It also generates less optimal code. Pointers cause the optimizer to have
troubles. And it doesn't express as nicely what your intention is: I return
this thing, sometimes. And so in place of using this pointer plus bool, you can
put that into a single type, return an `optional`. Similar for holding
something as a field, where you want it to be held in line in your class, but
you don't always have it present, you can do that with an `optional` now, where
you would have probably used a pointer before. Or a `union` or something, but
that gets even more tricky. And then another place you might use it is as a
function argument. However, that's usually not the right choice for a function
argument. Why? Because the `optional` holds the value inside of it.
Constructing an `optional` requires constructing the whole object inside of it.
And so that's not free. It can be arbitrarily expensive, depending on what your
type is. And if your caller to your function doesn't have already an
`optional`, they have to go and construct it to pass it to you. And that's a
copy or move of that inner type. So generally, if you're going to receive a
parameter, maybe sometimes, the right way to spell that is just to pass it as a
pointer, because a native pointer can be null when it's not present.
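A hedged sketch of the two return-value patterns contrasted here
(illustrative only; `ParseAge` is hypothetical, and `base::StringToInt`
happens to be a real example of the bool-plus-out-parameter style being
wrapped):

```cpp
#include <string>

#include "base/strings/string_number_conversions.h"
#include "third_party/abseil-cpp/absl/types/optional.h"

// The old pattern would be: bool ParseAge(const std::string& in, int* out);

// The modern pattern: the return type itself says "an int, sometimes".
absl::optional<int> ParseAge(const std::string& input) {
  int age = 0;
  if (!base::StringToInt(input, &age))
    return absl::nullopt;  // No value to give back.
  return age;
}

void Use() {
  absl::optional<int> age = ParseAge("42");
  if (age.has_value()) {
    // Use *age here.
  }
}
```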
11:29 SHARON: Hopefully that clarifies some things for people who are trying to
decide which one best suits their use case. So moving on from that, some people
might remember from a couple of years ago that instead of being called
`absl::optional`, it used to be called `base::optional`. And do you want to
quickly mention why we switched from `base` to `absl`? And you mentioned even
switching to `std::optional`. Why this transition?
11:53 DANA: Yeah, absolutely. So as the C++ standards come out, we want to use
them, but we can't until our toolchain is ready. What's our toolchain? So our
compiler, our standard library, and unfortunately, we have more than one
compiler that we need to worry about. So we have the NaCl compiler. Luckily, we
just have Clang for the compiler choice we really have to worry about. But we
do have to wait for these things to be ready, and for a code base to be ready
to turn on the new standard because sometimes there are some non-backwards
compatible changes. But we can forward port stuff out of the standard library
into base. And so we've done that. We have a bunch of C++ 20 backports in base
now. We had 17 backports before. We turned on 17, now they should hopefully be
gone. And so `base::optional` was an example of a backport, while `optional`
was still considered experimental in the standard library. We adopted use of
`absl` since then, and `absl` had also, essentially, a backport of the
`optional` type inside of it for presumably the same reasons. And so why have
two when you can have one? That's a pretty good rule. And so we deprecated the
`base` one, removed it, and moved everything to the `absl` one. One thing to
note here, possibly of interest, is we often add security hardening to things
in `base`. And so sometimes there is something available in the standard
library, but we choose to use the version in `base` or `absl` instead, because
we have extra hardening checks. And so part of
the process of removing `base::optional` and moving to `absl::optional` was
ensuring those same security hardening checks are present in `absl`. And we're
going to have to do the same thing to stop using `absl` and start using the
standard one. And that's currently a work in progress.
13:48 SHARON: So let's go through some of the `base` types because that's
definitely where the most of these kind of wrapper types live. So let's just
start with one that I learned about recently, and that's a `scoped_refptr`.
What's that? When should we use it?
13:59 DANA: So `scoped_refptr` is kind of your Chromium equivalent to
`shared_ptr` in the standard library. So if you're familiar with that, it's
quite similar, but it has some slight differences. So what is `scoped_refptr`?
It gives you shared ownership of the underlying object. And it's a smart
pointer. It holds a pointer to an object that's allocated in the heap. When all
`scoped_refptr`s that point to the same object are gone, it'll be deleted. So
it's like `unique_ptr`, except it can be copied to add to your ref count,
basically. And when all of them are gone, it's destroyed. And it gives access
to the underlying pointer in exactly the same ways. Oh, but why is it different
than `shared_ptr`? I did say it is. `scoped_refptr` requires the type that is
held inside of it to inherit from `RefCounted` or `RefCountedThreadSafe`.
`shared_ptr` doesn't require this. Why? So `shared_ptr` sticks an allocation
beside your object and puts the ref count there. So the ref count is external
to your object, being stored and owned by the shared pointer. Chromium instead
took the position of doing intrusive ref counting. So because we
inherit from a known type, we stick the ref count in that base class,
`RefCounted` or `RefCountedThreadSafe`. And so that is enforced by the
compiler. You must inherit from one of these two in order to be stored and
owned in a `scoped_refptr`. What's the difference? `RefCounted` is the default
choice, but it's not thread safe. So the ref counting is cheap. It's the more
performant one, but if you have a `scoped_refptr` on two different threads
owning the same object, their ref counting will race, can be wrong, you can end
up with a double free - which is another way that pointers can go wrong, two
things free in the same thing - or you could end up with potentially not
freeing it at all, probably. I guess I've never checked if that's possible. But
they can race, and then bad things happen. Whereas `RefCountedThreadSafe`
gives you atomic ref counting. So atomic means that across all threads, they're
all going to have the same view of the state. And so it can be used across
threads and be owned across threads. And the tricky part there is the last
thread that owns that object is where it's going to be destroyed. So if your
object's destructor does things that you expect to happen on a specific thread,
you have to be super careful that you synchronize which thread that last
reference is going away on, or it could explode in a really flakey way.
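An illustrative sketch of the intrusive ref counting described above (not
from the video; `Settings` is a made-up class):

```cpp
#include "base/memory/ref_counted.h"
#include "base/memory/scoped_refptr.h"

class Settings : public base::RefCounted<Settings> {
 private:
  // RefCounted deletes the object, so the destructor is kept private.
  friend class base::RefCounted<Settings>;
  ~Settings() = default;
};

void Share() {
  scoped_refptr<Settings> a = base::MakeRefCounted<Settings>();
  scoped_refptr<Settings> b = a;  // Copying bumps the ref count to 2.
  a.reset();                      // Still alive: b holds a reference.
  b.reset();                      // Last reference gone: ~Settings() runs.
}
```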
17:02 SHARON: This sounds useful in other ways. What are some kind of more
design things to consider, in terms of when a `scoped_refptr` is useful and
does help enforce things that you want to enforce, like relative lifetimes of
certain objects?
17:15 DANA: Generally, we recommend that you don't use ref counting if you can
help it. And that's because it's hard to understand when it's going to be
destroyed, like I kind of alluded to with the thread situation. Even in a
single thread situation, how do you know which one is the last reference? And
is this object going to outlive that other object? Maybe sometimes. It's not
super obvious. It's a little more clear with a `unique_ptr`, at least local to
where that `unique_ptr`'s destruction is. But there's usually no
`scoped_refptr` you can point to and say, this is the last one, so I know it's
gone after this thing is gone. Maybe it is, maybe it's not. So it's often a bit tricky.
However, there are scenarios when you truly want a bunch of things to have
access to a piece of data. And you want that data to go away when nobody needs
it anymore. And so that is your use case for a `scoped_refptr`. It is nicer
when that thing with shared ownership is not doing a lot of interesting
things, especially in its destructor because of the complexity that's involved
in shared ownership. But you're welcome to shoot yourself in the foot with this
one if you need to.
18:33 SHARON: We're hoping to help people not shoot themselves in the foot. So
use `scoped_refptr` carefully, is the lesson there. So you mentioned
`shared_ptr`. Is that something we see much of in Chrome, or is that something
that we generally try to avoid in terms of things from the standard library?
18:51 DANA: That is something that is banned in Chrome. And that's just
basically because we already have `scoped_refptr`, and we don't want two of the
same thing. There's been various times where people have brought up why do we
need to have both? Can we just use `shared_ptr` now? And nobody's ever done the
kind of analysis needed to make that kind of decision. And so we stay with what
we're at.
19:18 SHARON: If you want to do that, there's someone that'll tell you what to
do. So something that when I was using `scoped_refptr`, I came across that you
need a WeakPtrFactory to create such a pointer. So weak pointers and WeakPtr
factories are one of those things that you see a lot in Chrome and one of these
base things. So tell us a bit about weak pointers and their factories.
19:42 DANA: So WeakPtr and WeakPtrFactory have a bit of an interesting history.
Their major purpose is for asynchronous work. Chrome is basically a large
asynchronous machine, and what does that mean? It means that we break all of
the work of Chrome up into small pieces of work. And every time you've done a
piece, you go and say, OK, I'm done. And when the next piece is ready, run this
thing. And maybe that next thing is like a user input event, maybe that's a
reply from the network, whatever it might be. And there's just a ton of steps
in things that happen in Chrome. Like, a navigation has a request, a response,
maybe another request - some redirects, whatever. That's an example of tons of
smaller asynchronous tasks that all happen independently. So what goes on with
asynchronous tasks? You don't have a continuous stack frame. What does that
mean? So if you're just running some synchronous code, you make a variable, you
go off and you do some things, you come back. Your variable is still here,
right? You're in this stack frame. And you can keep using it. You have
asynchronous tasks. You make a variable, you go and do some work, and you are
done your task. Boop, your stack's gone. You come back later, you're going to
continue. You don't have your variable anymore. So any state that you want to
keep across your various tasks has to be stored and what we call bound in with
that task. If that's a pointer, that's especially risky. So we talked earlier
about Use-After-Frees. Well, you can, I hope, imagine how easy it is to stick a
pointer into your state. This pointer is valid, I'm using it. I go away, I come
back when? I don't know, sometime in the future. And I'm going to go use this
pointer. Is it still around? I don't own it. I didn't use a `unique_ptr`. So
who owns it? How do they know that I have a task waiting to use it? Well,
unless we have some side channel communicating that, they don't. And how do I
know if they've destroyed it if we don't have some side channel communicating
that? I don't know. And so I'm just going to use this pointer and bad things
happen. Your bank account is gone.
22:06 SHARON: No! My bank account!
22:06 DANA: I know. So what's the side channel? The side channel that we have
is WeakPtr. So a WeakPtr and WeakPtrFactory provide this communication
mechanism where WeakPtrFactory watches an object, and when the object gets
destroyed, the WeakPtrFactory inside of it is destroyed. And that sends this
little bit that says, I'm gone. And then when your asynchronous task comes back
with its pointer, but it's a WeakPtr inside of it and tries to run, it can be
like, am I still here? If the WeakPtrFactory was destroyed, no, I'm not. And
then you have a choice of what to do at that point. Typically, we're like,
abandon ship. Don't do anything here. This whole task is aborted. But maybe you
do something more subtle. That's totally possible.
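A minimal sketch of that side channel in code (illustrative; `Downloader`
and `FetchAsync` are hypothetical):

```cpp
#include "base/functional/bind.h"
#include "base/functional/callback.h"
#include "base/memory/weak_ptr.h"

class Downloader {
 public:
  void Start() {
    // Bind the reply against a WeakPtr instead of a native `this`. If the
    // Downloader is destroyed before the reply runs, the factory flips the
    // "I'm gone" bit and the bound task is skipped instead of touching
    // freed memory.
    FetchAsync(base::BindOnce(&Downloader::OnFetched,
                              weak_factory_.GetWeakPtr()));
  }

 private:
  void OnFetched(int result) {}

  // Hypothetical async API that runs the callback later.
  void FetchAsync(base::OnceCallback<void(int)> callback);

  // Conventionally the last member, so it's destroyed first and
  // invalidates outstanding WeakPtrs before the rest of the object.
  base::WeakPtrFactory<Downloader> weak_factory_{this};
};
```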
22:59 SHARON: I think the example I actually meant to say that uses a
WeakPtrFactory is a SafeRef, which is another base type. So tell us a bit about
SafeRefs.
23:13 DANA: WeakPtr is cool because of the side channel that you can examine.
So you can say are you still alive, dear object? And it can tell you, no, it's
gone. Or yeah, it's here. And then you can use it. The problem with this is
that there are places where you as the code author want to believe that this
object is actually always there, but you don't want a security bug if you're
wrong. And it doesn't mean that you're wrong now, even. Sometime later, someone
can change code, unrelated to where this is, where the ownership happens, and
break you. And maybe they don't know all the users of a given object and change
its lifetime in some subtle way, maybe not even realizing they are. Suddenly you're
eventually seeing security bugs. And so that's why native pointers can be
pretty scary. And so SafeRef is something we can use instead of a native
pointer to protect you against this type of bug. It's built on top of WeakPtr
and WeakPtrFactory. That's its relationship, but its purpose is not the same.
So what SafeRef does is it says - SafePtr?
24:31 SHARON: SafeRef.
24:31 DANA: SafeRef.
24:31 SHARON: I think there's also a safe pointer, but there -
24:38 DANA: We were going to add it. I'm not sure if it's there yet. But so two
differences between SafeRef and WeakPtr then, ref versus ptr, it can't be null.
So it's like a reference wrapper. But the other difference is you can't observe
whether the object is actually alive or not. So it has the side channel, but it
doesn't show it to you. Why would you want that? If the information is there
anyway, why wouldn't you want to expose it? And the reason is because you are
documenting that you as the author understand and expect that this pointer is
always valid at this time. Say it turns out it's not valid. What do you do? If it's
a WeakPtr, people tend to say, we don't know if it's valid. It's a WeakPtr.
Let's check. Am I valid? And if I'm not, return. And what does that result in?
It results in adding a branch to your code. You do that over, and over, and
over, and over, and static analysis, which is what we as humans have to do -
we're not running the program, we're reading the code - can't really tell what
will happen because there's so many things that could happen. We could exit
here, we could exit there, we could exit here. Who knows. And that makes it
increasingly hard to maintain and refactor the code. So SafeRef gives you the option
to say this is always going to be valid. You can't check it. So if it's not
valid, go fix that bug somewhere else. It should be valid here.
26:16 SHARON: So what kind of -
26:16 DANA: The assumptions are broken.
26:16 SHARON: So what kind of errors happen when that assumption is broken? Is
that a crash? Is that a DCHECK kind of thing?
26:22 DANA: For SafeRef and for WeakPtr, if you try to use it without checking
it, or write it incorrectly, they will crash. And crashing in this case means a
safe crash. It's not going to lead to a security bug. It's literally just
terminating the program.
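Putting that together, a hedged sketch of how a `SafeRef` might be handed out
and stored (illustrative; `Tab` and `Toolbar` are made-up classes):

```cpp
#include "base/memory/safe_ref.h"
#include "base/memory/weak_ptr.h"

class Tab {
 public:
  base::SafeRef<Tab> GetSafeRef() { return weak_factory_.GetSafeRef(); }

 private:
  base::WeakPtrFactory<Tab> weak_factory_{this};
};

class Toolbar {
 public:
  // Documents the lifetime assumption: the Tab outlives this Toolbar.
  // There is no way to ask tab_ whether the Tab is alive; if the
  // assumption is ever broken, using tab_ crashes safely instead of
  // becoming an exploitable Use-After-Free.
  explicit Toolbar(base::SafeRef<Tab> tab) : tab_(tab) {}

 private:
  base::SafeRef<Tab> tab_;
};
```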
26:41 SHARON: Does that also mean you get a sad tab as a user? Like when the
little sad file comes up?
26:47 DANA: Yep. It would. If you're in the renderer process, you take it down.
It's a sad tab. So that's not great. It's better than a security bug. Because
your options here are don't write bugs. Ideal. I love that idea, but we know
that bugs happen. Use a native pointer, security problem. Use a WeakPtr, that
makes sense if you wanted it to sometimes not be there. But if you want it to
always be there - because you have to make a choice now of what you're supposed
to do if it's not, and it makes the code very hard to understand. And you're
only going to find out it can't be there through a crash anyhow. Or use a
SafeRef. And it's going to just give you the option to crash. You're going to
figure out what's wrong and make it no longer do that.
27:38 SHARON: I think wanting to guarantee the lifetime of some other things
seems like a pretty common thing that you might come across. So I'm sure there
are many cases for many people to be adding SafeRefs to make their code a bit
safer, and also ensure that if something does go wrong, it's not leading to a
memory bug that could be exploited for who knows how long. Because we don't
always hear about those. If it crashes, and they can reliably crash, at least
you know it's there. You can fix it. If it's not, we're hoping that one of our
VRP vulnerability researchers finds it and reports it, but that doesn't always
happen. So if we can know about these things, that's good. So another new type
in base that people might have been seeing recently is a `raw_ptr` which is
maybe why earlier we were saying let's call them native pointers, not raw
pointers. Because the difference between `raw_ptr` and raw pointer, very easy
to mix those up. So why don't you tell us a bit about `raw_ptr`s?
28:40 DANA: So `raw_ptr` is really cool. It's a non-owning smart pointer. So
that's kind of like WeakPtr or SafeRef. These are also non-owning. And it's actually
very similar in inspiration to what WeakPtr is. So it has a side channel where
it can see if the thing it's pointing to is alive or gone. So for WeakPtr, it
talks to the WeakPtrFactory and says am I deleted? And for `raw_ptr`, what it
does is it keeps a reference count, kind of like `scoped_refptr`, but it's a
weak reference count. It's not owning. And it keeps this reference count in the
memory allocator. So Chrome has its own memory allocator for new and delete
called PartitionAlloc. And that lets us do some interesting stuff. And this is
one of them. And so what happens is as long as there is a `raw_ptr` around, this
reference count is non-zero. So even if you go and you delete the object, the
allocator knows there is some pointer to it. It's still out there. And so it
doesn't free it. It holds it. And it poisons the memory, so that just means
it's going to write some bit pattern over it, so it's not really useful
anymore. It basically re-initializes the memory. And so later, if you go and
use this `raw_ptr`, you get access to just dead memory. It's there, but it's
not useful anymore. You're not going to be able to create security bugs in the
same way. Because when we first started talking about a Use-After-Free - you
have your goat, you free it, a cow is there, and now your pointer is pointing
at the wrong thing - you can't do that because as long as there's this
`raw_ptr` to your goat, the goat can be gone, but nothing else is going to come
back here. It's still taken by that poisoned memory until all the `raw_ptr`s
are gone. So that's their job, to protect us from a Use-After-Free being
exploitable. It doesn't necessarily crash when you use it incorrectly, you just
get to use this bad memory inside of it. If you try to use it as a pointer,
then you're using a bad pointer, you're going to probably crash. But it's a
little bit different than a WeakPtr, which is going to deterministically crash
as soon as you try to use it when it's gone. It's really just a protection or a
mitigation against security exploits through Use-After-Free. And then we
recently just added `raw_ref`, which is really the same as `raw_ptr`, except
addressing nullability. So smart pointers in C++ have historically all allowed
a null state. That's representative of what native pointers did in C and C++.
And so this is kind of just bringing this along in this obvious, historical
way. But if you look at other languages that have been able to break with
history and make their own choices kind of fresh, we see that they make choices
like not having null pointers, not having null smart pointers. And that
increases the readability and the understanding of your code greatly. So just
like for WeakPtr, how we said, we just check if it's there or not. And if it's
not, we return, and so on. It's every time you have a WeakPtr, if you were
thinking of a timeline, every time you touch a WeakPtr, your timeline splits.
And so you get this exponential timeline of possible states that your
software's in. That's really intense. Whereas every time you cannot do that,
say this can't be null, so instead of WeakPtr, you're using SafeRef. This can't
be not here or null, actually. WeakPtr can just be straight up null. This is
always present. Then you don't have a split in your timeline, and that makes it
a lot easier to understand what your software is doing. And so for `raw_ptr`,
it followed this historical precedent. It lets you have a null value inside of
it. And `raw_ref` is our kind of modern answer to this new take on nullability.
And so `raw_ref` is a reference wrapper, meaning it holds a reference inside of
it, conceptually, meaning it just can't be null. That is just basically - it's
a pointer, but it can't be null.
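As a rough sketch of the field rule described here (illustrative; `Profile`,
`Delegate`, and `Controller` are made-up names):

```cpp
#include "base/memory/raw_ptr.h"
#include "base/memory/raw_ref.h"

class Profile;
class Delegate;

class Controller {
 public:
  Controller(Profile& profile, Delegate* delegate)
      : profile_(profile), delegate_(delegate) {}

 private:
  // Non-owning, can never be null, and const so it can't rebind.
  const raw_ref<Profile> profile_;
  // Non-owning, but may be null.
  raw_ptr<Delegate> delegate_;
};
```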
33:24 SHARON: So these do sound the most straightforward to use. So basically,
if you're not sure - or your class members at least - any time you would use a
native pointer or an ampersand, basically you should always just put those in
either a `raw_ptr` or a `raw_ref`, right?
33:45 DANA: Yeah, that's what our style guide recommends, with one nuance. So
because `raw_ptr` and `raw_ref` interact with the memory allocator, they have
the ability to be, like, turned on or off dynamically at runtime. And there's a
performance hit on keeping this reference count around. And so at the moment,
they are not turned on in the renderer process because it's a really
performance-critical place. And the impact of security bugs there is a little
less than in the browser process where you just immediately get access to the
whole system. And so we're working on turning it on there. But if you're
writing code that's only in the renderer process, then there's no point to use
it. And we don't recommend that you use it. But the default rule is yes. Don't
use a native pointer, don't use a native reference. As a field in an object,
use a `raw_ptr`, use a `raw_ref`. Prefer something with fewer states, always,
because you get fewer branches in your timeline. And then you can make it const
if you don't want it to be able to rebind to an object, if you don't want the
pointer to change. Or you can make it mutable if you want it to be able to.
34:58 SHARON: So you did mention that these types are ref counted, but earlier
you said that you should avoid ref counting things. So
35:04 DANA: Yes.
35:11 SHARON: So what's the balance there? Is it because with a
`scoped_refptr`, you're a bit more involved in the ref counting, or is it just,
this is - we've done it for you, you can use it. This is OK.
35:19 DANA: No, this is a really good question. Thank you for asking that. So
there's two kinds of ref counts going on here. I tried to kind of allude to it,
but it's great to make it clear. So `scoped_refptr` is a strong ref count,
meaning the ref count owns the object. So the destructor runs, the object is
gone and deleted when that ref count goes to 0. `raw_ref` and `raw_ptr` are a
weak ref count. They could be pointing to something owned in a
`scoped_refptr` even. So they can exist at the same time. You can have both
kind of ref counts going at the same time. A weak ref count, in this case, is
holding the memory alive so that it doesn't get re-used. But it's not keeping
the object in that memory alive. And so from a programming state point-of-view,
the weak refs don't matter. They're helping protect you from security bugs.
They're helping to make - when things go wrong, when a bug happens, they're
helping to make it less impactful. But they don't change your program in a
visible way. Whereas strong references do. The destructor runs based on when
the ref count goes to 0 for a strong reference. So that's the difference
between these two.
36:46 SHARON: So when you say don't use ref counting, you mean don't use strong
ref counting.
36:46 DANA: I do, yes.
36:51 SHARON: And if you want to learn more about the raw pointer, `raw_ptr`,
`raw_ref`, that's all part of the MiraclePtr project, and there's a talk about
that from BlinkOn. I'll link that below also. So in terms of other base types,
there's a new one that's called `base::expected`. I haven't even really seen
this around. So can you tell us a bit more about how we use that, and what
that's for?
37:09 DANA: `base::expected` is a backport from C++ 23, I want to say. So the
proposal for `base::expected` actually cites a Rust type as inspiration, which
is called `std::result` in Rust. And it's a lot like `optional`, so it's used
for return values. And it's more or less kind of a replacement for exceptions.
So Chrome doesn't compile with exceptions enabled even, so we've never relied
on exceptions to report errors. But we have to do complicated things, like with
`optional` to return a bool or an enum. And then maybe some value. And so this
kind of compresses all that down into a single type, but it's got more state
than just an `optional`. So `expected` gives you two choices. It either returns
your value, like `optional` can, or it returns an error. And so that's the
difference between `optional` and `expected`. You can give a full error type.
And so this is really useful when you want to give more context on what went
wrong, or why you're not returning the value. So it makes a lot of sense in
stuff like File IO. So you're opening a file, and it can fail for various
reasons, like I don't have permission, it doesn't exist, whatever. And so in
that case, the way you would express that in a modern way would be to return
`base::expected` of your file handle or file class. And as an error, some
enumerator, perhaps, or even an object that has additional state beyond just I
couldn't open the file. But maybe a string about why you couldn't open the file
or something like this. And so it gives you a way to return a structured error
result.
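An illustrative sketch of that pattern (not from the video; `OpenFile` and
its error enum are hypothetical, and it returns a pretend file descriptor
rather than a real file class):

```cpp
#include <string>

#include "base/types/expected.h"

enum class OpenError { kNotFound, kPermissionDenied };

// Either a file descriptor, or a structured error saying why it failed -
// more context than optional's bare "no value".
base::expected<int, OpenError> OpenFile(const std::string& path) {
  if (path.empty())
    return base::unexpected(OpenError::kNotFound);
  return 3;  // Pretend we opened the file and got descriptor 3.
}

void Use() {
  auto result = OpenFile("/tmp/goat.txt");
  if (result.has_value()) {
    // Use result.value() - the open file descriptor.
  } else {
    // Inspect result.error() to see why it failed.
  }
}
```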
39:05 SHARON: That sounds useful in lots of cases. So all of these types are
making up for basically what is lacking in C++, which is memory safety. C++, it
does a lot. It's been around for a long time. Most of Chrome is written in it.
But there are all these memory issues. And a lot of our security bugs are a
result of this. So you are working on bringing Rust to Chromium. Why is that a
good next step? Why does that solve these problems we're currently facing?
39:33 DANA: So Rust has some very cool properties to it. Its first property
that is really important to this conversation is the way that it handles
pointers, which in Rust would be treated pretty much exclusively as references.
And what Rust does is it requires you to tell the compiler the relationships
between the lifetimes of your references. And the outcome of this additional
knowledge to the compiler is memory safety. And so what does that mean? It
means that you can't write a Use-After-Free bug in Rust unless you're going
into the unsafe part of the language, which is where scariness exists. But you
don't need to go there to write a normal program. So we'll ignore it. And so
what that means is you can't write the bug. And so that doesn't just mean - I'd
also like to believe I can write C++ without a bug. That's not true. But I
would love to believe that. But it means that later, when I come back and
refactor my code, or someone comes who's never seen this before and fixes some
random bug somewhere related to it, they can't introduce a Use-After-Free
either. Because if they do, the compiler is like, hey - it's going to outlive
it. You can't use it. Sorry. And so there's this whole class of bugs that you
never have to debug, you never ship, they never affect users. And so this is a
really nice promise, really appealing for a piece of software like Chrome,
where our basic purpose is to handle arbitrary and adversarial data. You want
to be able to go on some web page, maybe it's hostile, maybe not. You just get
a link. You want to be able to click that link and trust that even if it's
really hostile and wanting to destroy you, it can't. Chrome is that safety net
for you. And so Rust is that kind of safety net for our code, to say no matter
how you change it over time, it's got your back. You can't introduce this kind
of bug.
42:03 SHARON: So the Rust project sounds really cool. If people want to learn
more or get involved - if you're into the whole languages, memory kind of thing
- where can people go to learn more?
42:09 DANA: So if you're interested in helping out with our Rust experiment,
then you can look for us in the Rust channel on Slack. If you're interested in
C++ language stuff, you can find us in the CXX channel on Slack, as well as
the cxx@chromium.org mailing list. And there is, of course, the
rust-dev@chromium.org mailing list if you want to use email to reach us as
well.
42:44 SHARON: Thank you very much, Dana. There will be notes from all of this
also linked in the description box. And thank you very much for this first
episode.
42:52 DANA: Thanks, Sharon. This was fun.

@@ -0,0 +1,453 @@
# What's Up With DCHECKs
This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 2, a 2022 video discussion between [Sharon (yangsharon@chromium.org)
and Peter (pbos@chromium.org)](https://www.youtube.com/watch?v=MpwbWSEDfjM).
The transcript was automatically generated by speech-to-text software. It may
contain minor errors.
---
You've seen DCHECKs around and been asked to use them in code review, but what
are they? What's the difference between a CHECK and a DCHECK? How do you use
them? Here to answer that is special guest Peter, who works on UI and
improving crash reports.
Notes:
- https://docs.google.com/document/d/146LoJ1E3N3E6fb4zDh92HPQc6yhRpNI7DSKlJjaYlLw/edit
Links:
- [What's Up With Pointers](https://www.youtube.com/watch?v=MpwbWSEDfjM)
---
00:00 SHARON: Hello, and welcome to What's Up With That?, the series that
demystifies all things Chrome. I'm your host, Sharon. And today, we're talking
about DCHECKs. You've seen them around. You've probably been told to add one in
code review before. But what are they? What are they for, and what do they do?
Our guest today is Peter, who works on desktop UI and Core UI. He's also
working on improving Chrome's crash reports, which includes DCHECKs. Today
he'll help us answer, what's up with DCHECKs? Welcome, Peter.
00:30 PETER: Thanks for having me.
00:32 SHARON: Yeah. Thanks for being here. So the most obvious question to
start with, what is a DCHECK?
00:39 PETER: So a CHECK and a DCHECK are both sort of things that make sure
that what you think is true is true. Right? So this should never be called with
an empty vector. You might add a CHECK for it, or you might add a DCHECK for
it. And it's sort of similar to an assert, which you may have hit during earlier
programming outside of Chrome. And what it means is when this line gets hit, we
check and see if it's true. And if it's not true, we crash. DCHECKs differ from
CHECKs in that they are traditionally only in debug builds, or local
development builds, or on our try-bots. So they have zero overhead when Chrome
hits stable, because the CHECK just won't be there.
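A quick illustrative sketch of the difference (not from the video;
`ProcessFirst` is a made-up function):

```cpp
#include <vector>

#include "base/check.h"
#include "base/check_op.h"

void ProcessFirst(const std::vector<int>& items) {
  // Enforced in every build: crashes safely and deterministically here
  // if the invariant is violated.
  CHECK(!items.empty());

  // Traditionally enforced only in debug/developer builds; compiled out
  // of official builds, so it costs nothing on stable.
  DCHECK_EQ(items.size() % 2, 0u);

  // ... items[0] can now be used knowing the invariants held ...
}
```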
01:24 SHARON: OK. So, like, the D stands for Debug. That makes sense.
01:28 PETER: Yeah. I want debug to turn into developer, because now we have
them by default if you're no longer - if you're doing a release build, and
you're not turning them off, and you're not doing an official build, you get
them.
01:42 SHARON: OK. Well, you heard it here first, or maybe you heard it before.
I heard it here first. So you mentioned asserts. So something that I've seen a
couple times in Chrome, and also is part of the standard library, is
`static_assert`. So how is that similar or different to DCHECKs? And why do we
use or not use them?
02:00 PETER: Right. So `static_assert`s are - and you're going to have to ask
C++ experts, who can probably take some of the sharp edges off of this - but
it's basically, if you can assert something in compile time, then you can use a
`static_assert`, which means that you don't have to hit a code path where it's
wrong. It sort of has to always hold true. And whenever you can use a
`static_assert`, use a `static_assert`, because it's free. And basically, you
can't compile the program if it's not true.
02:31 SHARON: OK. That's good to know, because I definitely thought that was
one of the C++ standard library things we should avoid, because we have a
similar thing in Chromium. But I guess that's not the case.
02:41 PETER: Yeah. Assert is the one that is - OK, so this is a little
complicated, right? `static_assert` is a language feature, not a library
feature. And someone will tell me that I'm wrong about something about this.
Asserts are just sort of a poorer version of DCHECKs. So they won't go through
our crash handling. It won't print the pretty stacks, et cetera.
`static_assert`s, on the other hand, are a compile time feature. And we don't,
as far as I know, have our own wrapper around it. We just use `static_assert`.
So what you would maybe use this for is like if you have a constant - like, say
you have an array, and the code makes an assumption that some constant is the
size of this array, you can assert that in compile time, and that would be a
good use of a `static_assert`.
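For illustration, a `static_assert` along the lines of that array example
(the names are made up):

```cpp
#include <iterator>

constexpr int kNumColors = 3;
const char* kColorNames[] = {"red", "green", "blue"};

// Checked at compile time, for free: the program won't build if someone
// adds a color name without updating the constant (or vice versa).
static_assert(std::size(kColorNames) == kNumColors,
              "kColorNames must have kNumColors entries");
```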
03:26 SHARON: OK. Cool. So you mentioned that some things have changed with how
DCHECKs work. So can you give us a brief overview of the history of DCHECKs -
what they used to be - for people who have been using them for a while, how might
they have changed from the idea of what they have as a DCHECK in their mind?
03:43 PETER: Sure. So this is as best I know. I'm just sort of extrapolating
from what I've seen. And what I think originally was true is that a CHECK used
to be this logging statement, where you essentially compile the file name and
the line number. And if this ever hits, then we'll log some stuff and then
crash. Right? Which comes with a little bit of overhead, especially on size,
that you basically take the file name and line number for every instance, and
that generates a bunch of strings and numbers that essentially add to Chrome's
binary size. I don't know how many steps between that and where we currently
are. But right now, our CHECKs are just, if condition is false, crash, which
means that you won't, out of the CHECK, get file name and line number. We'll
get those out of debugging symbols. And you also won't get any of the logging
messages that you can add to the end of a CHECK, which means that your debug
info will be poorer, but it will be cheaper to use. So they've gotten from
being pretty heavy CHECKs to being really cheap.
05:01 SHARON: OK. So that kind of leads us into the question that I think most
people want to have answered, which is, when should I use a DCHECK? When should
I use a CHECK? When should I use neither?
05:13 PETER: I would say that historically, we've said CHECKs are expensive.
Don't use them unless you sort of have to. And I don't think that holds true
anymore. So basically, unless you are in really performance-critical code, then
use a CHECK. If there's anything that you care about where the program state
will be unpredictable from this point on if it's not true, CHECK it. It's not
that expensive. Right? We have a lot of code where we push a string onto a
vector, and that never gets flagged in code review. And it's probably like 10
times more expensive, if not 100 times more expensive, than adding a CHECK. The
exception to that is if you're in a really hot loop where you don't want to
dereference a pointer, then a CHECK might add some cost. And the other is if
the condition that you're trying to validate is really expensive. It's not the
CHECK itself that's expensive. It's the thing you're evaluating. And if that's
expensive, then you might not afford doing a CHECK. If you don't know that it's
expensive, it's probably not expensive.
06:20 SHARON: Can you give us an example of something expensive to evaluate for
a CHECK?
06:24 PETER: Right. So say that you have something in video code that for every
video frame, for every pixel validates the alpha value as opaque, or something.
That would probably make video conferencing a little bit worse performance.
Another thing would just be if you have to traverse a graph on every frame, and
it will sort of jump all over memory to see if some reachability problem in
your graph is true, that's going to be a lot more expensive. But CHECKing that
index is less than some vector bounds, I think that should fall under cheap.
And -
07:02 SHARON: OK.
07:02 PETER: culturally, we've tried to avoid doing a lot of these. And I think
it's just hurting us.
07:09 SHARON: OK. So since most places we should use CHECKs, are there any
places where a DCHECK would be better then? Or any time you would have normally
previously used a DCHECK, you should just make that a check?
07:23 PETER: So we have a new construct that's called `EXPENSIVE_DCHECKS_ARE_ON` -
or, if expensive DCHECKs are on - and I think we should add a corresponding macro,
`EXPENSIVE_DCHECK`. And then you should be able to just say, either it's
expensive and has to be a DCHECK, so use `EXPENSIVE_DCHECK`; otherwise, use
CHECK. And my hunch would be like 95% of what we have as DCHECKs would probably
serve us better as CHECKs. But your code owner and reviewer might disagree with
that. And it's not yet documented policy that we say CHECKs are cheap; just add
a billion of them. But I would like to get there eventually.
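The exact macro naming discussed above is approximate; as one hedged sketch
of the gating pattern, using the `DCHECK_IS_ON()` macro from
`base/dcheck_is_on.h` (`Frame` and `AllPixelsOpaque` are hypothetical):

```cpp
#include "base/check.h"
#include "base/dcheck_is_on.h"

struct Frame {
  bool IsValid() const;
};

// Hypothetical expensive validation that touches every pixel.
bool AllPixelsOpaque(const Frame& frame);

void Paint(const Frame& frame) {
  // Cheap invariant: fine as a CHECK in every build.
  CHECK(frame.IsValid());

#if DCHECK_IS_ON()
  // Expensive invariant: only evaluate it at all in builds where
  // DCHECKs are enabled, so stable doesn't pay for it.
  DCHECK(AllPixelsOpaque(frame));
#endif
}
```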
08:04 SHARON: OK. So if you put in a CHECK, and your reviewer tells you this
should be a DCHECK, the person writing the CL can point them to this video, and
then they can discuss from there.
08:13 PETER: I mean, yeah, you can either say Peter disagrees with you, or I
can get further along this and say we make policy that CHECKs are cheap, so
they are preferable. So a lot of the foot-shooting with DCHECKs is that you expect
this property to hold true, but you never effectively CHECK it. And that can
lead to all sorts of bad stuff, right? Like if you're trying to DCHECK that
some origin for some frame makes some assumptions of site iso - I don't know
site isolation well enough to say this. But basically, if you're DCHECKing that
the code that you're running runs under some sort of permissions, then that is
effectively unchecked in stable, right? And we do care about those properties,
and it would be really good if we crashed rather than leaked information
between sites.
09:12 SHARON: Right.
09:14 PETER: Yeah.
09:16 SHARON: So that seems like a good tie-in for the fact that within some
security people, they don't have the most positive impression of DCHECKs, shall
we say? So a couple examples of this, for listeners who maybe aren't familiar
with this, is one person previously on security saying DCHECKs are pronounced
as "code that's not tested". Someone else I told about this episode - I said,
we're going to talk about DCHECKs - they immediately said, is it going to be
about why DCHECKs are bad? So amongst the Chrome security folks, they are not a
huge fan of DCHECKs. Can you tell us maybe why that is?
09:51 PETER: So if we go back a little bit in time, it used to be that DCHECKs
were only built for developers if they do a debug build. And Chrome has gotten
so big that you don't want to do a debug build - the UI is incredibly slow.
Unfortunately, it's sort of not that great an experience to work in a debug
build. So people work in a release build. That doesn't mean that they don't
care about the things they put under DCHECK. It just means they want to go on
with their lives and not wait x minutes for the browser to launch, or however
bad it is nowadays. And that means that they, unfortunately, lose coverage for
the DCHECKs. So this means that if your code is not exercised well under tests,
then this is completely not enforced. But it's slightly better than a comment,
in that you're really expecting this thing to hold true, and that's clearly an
expectation. But how good is the expectation if you don't look at it? So last
year, I believe, we made it so that DCHECKs are on by default if you're not
doing an official build. And this included release builds. So now, it's like at
least if you're doing development and you hit this condition, it's going to
explode, which is really good, because then you can find a lot of issues, and
we can prevent a lot of issues from ever happening in the first place. It is
really hard for you, as a developer, to make the assumption that if this
invariant is ever false, I will find it during development, and it will never
happen in the wild. And DCHECKs are essentially either, I will find this
locally before I submit it, or all bets are off; or it is I don't care that
much if this thing doesn't hold true, which is sort of a weird assertion to
make. So I think we're in this little awkward in-between state. And this
in-between state, remember, mostly exists as a performance optimization from
when CHECKs used to be a lot more expensive, in terms of code size. So did I
cover most of this?
12:06 SHARON: Yeah. I think, based on that, I think it's pretty easy to see why
people who are more concerned about security are not a fan of this.
12:13 PETER: I mean, if you care about it, especially if it causes privacy or
security or user-harm sort of things, just CHECK. Just CHECK, right? If it
makes your code animate a thing slightly weirder, like it will just jump to the
end position instead of going through your fence load, whatever. Maybe you can
make that a DCHECK. Maybe it doesn't matter. Like it's wrong, but it's not that
bad. But most of the cases, you DCHECK something, where it's like the program
is going to be in some indeterminate state, and we actually care about if it's
ever false. So maybe we can afford to make it a CHECK. Maybe we should look
more at our sort of vector push_backs than we should look at our CHECKs, and
then just have more CHECKs. More CHECKs. Because it's also like when things
break, it's a lot cheaper to debug a DCHECK than your program is in some
indeterminate state, because it was allowed to pass through a DCHECK that you
thought was - and when you read the code, unless you're used to reading it as
DCHECKs - oh, that just didn't get enforced - it's sort of hard to try to
figure out why the thing was doing the wrong thing in the first place.
13:22 SHARON: OK. How is this as a summary? When in doubt, CHECK it out.
13:27 PETER: I like that. I like that. And you might get pushback by reviewers,
who aren't on my side of the fence yet. And then you can decide on which hill
you want to die on, at least until we've made policy to just not complain about
DCHECKs, or not complain about CHECKs.
13:45 SHARON: All right. That sounds good. So you mentioned stuff failing in
the wild. And for people who might not know, do you want to just briefly
explain what failing in the wild means?
13:54 PETER: OK. So there's two things. Just failing in the wild just means
that when this thing rolls out to Canary, Dev, Beta, Stable, if you have a
CHECK that will crash and generate a crash report as if you had a memory bug,
but it crashes in a deterministic way, at a deterministic spot - so you can
find out exactly what assumption was violated. Say that this should never be
called with a null pointer. Then you can say, look at this line where it
crashed. It clearly got hit with a null pointer. And then you can try to figure
out, from the stack, why that happened, rather than after you post this pointer
to a task, it crashes somewhere completely irrelevant from the actual call
site. Well, so in the wild specifically means it generates a crash report so
you can look at it, or in the wild means it crashes on a user's computer rather
than - in the wildness outside of development. And as for the other part of in
the wild, it's that we have started running non-crashy DCHECKs for a percentage
of Windows Canary. And we're looking to expand that. And we're gathering
information, basically, about which assertions or invariants that we have are
violated in practice in the wild, even though we don't think that they should
be. And that will sort of also culturally move the needle so that we do care
about DCHECKs. And when we care about DCHECKs, sort of similarly to how we care
about CHECKs, is it really that important to make the big distinction between
the two? Except for the case where you have really expensive DCHECKs, they
might still be worth keeping separate. And those will be things like, if you do
things for - say that you zero out memory or something for every memory block
that you allocate and free, or you do things for every audio sample, or for
every video frame pixel, those sort of things. And then we can sort of keep
expensive stuff gated out from CHECKs. And then maybe we don't need this
in-between where people don't know whether they can trust a DCHECK or not.
16:04 SHARON: So you mentioned that certain release builds now have DCHECKs
enabled. So for those in the wild versus regular CHECKs in the wild, if those
happen to fail, do the reports for those look the same? Are they in the same
place? Can they be treated the same?
16:20 PETER: Yeah. Well, they are uploaded to the same crash-reporting thing.
They show up under a special branch. And you likely will get bugs filed to you
if they hit very frequently, just like you would with crashes. There's a sort
of slight difference, in that they say dump without crashing. And that's just
sort of a rollout strategy for us. Because if we made DCHECK builds incredibly
crashy, because they hit more than CHECKs, then we can never roll this thing
out. Or it gets a lot scarier for us to put this on 5% of a new platform that
we haven't tested. But as it is right now, the first DCHECK that gets hit for
every process gets a crash dump uploaded.
17:07 SHARON: OK. So I've been definitely told to use dump without crashing at
certain points in CLs, where it's like, OK, we think that this shouldn't
happen. But if it does, we don't necessarily want to crash the browser because
of it. With the changes you've mentioned to DCHECKs happening, should those
just be CHECKs instead now or should those still be dump without crashing?
17:29 PETER: So if you want dump without crashing, and you made those a DCHECK,
then you would only have coverage in the Canary channels that we are testing.
Right? So if you want to get dump reports from the platforms that we're not
currently testing, including all the way up to Stable, you probably still want
to keep that a dump without crashing. You want to make sure that you're not
using the sort of - you want to make sure that you triage these, because you
don't want to keep these generating crash dumps forever. You should still
treat them as if they were crashes. And I think the same thing should hold true
for DCHECKs. You should only add them for an invariant that you care about
being violated, right? So as it is violated, you should either figure out why
your invariant was wrong, or you should try to fix the breakage. And you can
probably add more information to logging to figure out why that happened.
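As a rough sketch (the surrounding function is hypothetical; the call itself is
the one in `base/debug/dump_without_crashing.h`), dump without crashing looks
like this:

```cpp
#include "base/debug/dump_without_crashing.h"

// Hypothetical example: we believe this state is unreachable, but if it does
// happen we want a report from every channel, including Stable, without
// killing the process.
void OnUnexpectedState() {
  // Uploads a crash report from this point; execution then continues.
  // Reports should be triaged like crashes, not left to pile up.
  base::debug::DumpWithoutCrashing();
}
```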
18:41 SHARON: So when you have a CHECK, and it crashes in the wild, you get a
stack trace. And that's what you have to work on to figure out what went wrong
for debugging. Right? So what are some things that you can do, as a developer,
to make these CHECKs a bit more useful for you - ways to incorporate other
information that you can use to help yourself debug?
19:01 PETER: So some of the stuff that we have is we have something called
crash keys, which are essentially, you can write a piece of string data,
essentially - there's probably some other data types - and if you write those
before you're running dump without crashing, or before you hit a CHECK, or
before you hit a DCHECK, then those will be uploaded along the crash dump. And
if you talk to someone who knows where to find them, you can basically go in
under a crash report, and then under field product data, or something like
that, you should be able to find your key-value pair. And if you have
information in there, you'll be able to look at it. The other thing that I like
to do, which is probably the more obvious thing, is if you have somewhat of a
hypothesis that this thing should only fail if a or b or c is not true, then
you can add CHECKs for those. Like, if a CHECK is failing, you can add more
CHECKs to see why the CHECK was failing. In general, you're not going to get as
much out of a mini-dump as you want. You're not going to have the full heap
available to you, because that would be a mega-dump. You can usually find
whatever is on the stack if you go in with a debugger. And I know that you
wanted to lead me into talking about CHECK\_GT and CHECK\_EQ, which are
essentially, if you want to check that x is greater than y, then you should use
CHECK\_GT(x,y). The problem with those, in this sort of context, is that,
similarly to CHECKs - so CHECK\_GT gets compiled into, basically, if not x is
greater than y, crash. So unfortunately, the values of x and y are optimized
out when you're doing an official build.
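A hedged sketch of both ideas Peter mentions - crash keys riding along with a
report, and CHECK\_GT losing its operand values in official builds (the
function and key names here are hypothetical; the macros are assumed to be the
ones in `base/debug/crash_logging.h`):

```cpp
#include "base/check_op.h"
#include "base/debug/crash_logging.h"

void Resize(int new_size, int capacity) {
  // These scoped crash keys are attached to any crash dump (or dump without
  // crashing report) uploaded while they are in scope.
  SCOPED_CRASH_KEY_NUMBER("Resize", "new_size", new_size);
  SCOPED_CRASH_KEY_NUMBER("Resize", "capacity", capacity);

  // Compiles to roughly: if (!(capacity > new_size)) crash. In official
  // builds the operand values are optimized out of the failure message,
  // which is why the crash keys above can be so useful.
  CHECK_GT(capacity, new_size);
}
```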
21:02 SHARON: So this makes me think of some stuff we mentioned in the last
episode, which was with Dana. Check it out if you haven't. But one of the types
we mentioned there was SafeRef, which enforces a certain condition. And if that
fails - so in the case of a SafeRef, it ensures that the value you have there
is not null. And if that's ever not true, then you do get a crash similar to if
a CHECK fails. So in general, would you say it's better practice to enforce and
make sure your assumptions are held in these other, more structural ways than
relying on CHECKs instead?
21:41 PETER: So let me see if I can get at what you actually want out of that
one. So if we look at - there's a RawRef type, right? So what's good with the
RawRef is that you have a type that annotates that this thing cannot possibly
be null. So if you assign to it, and you're assigning a null pointer, your
program is going to crash, and you don't need to think about whether you throw
a null pointer in or not. If you keep passing a RawRef around, then that's
essentially you passing around a non-null pointer. And therefore, you don't
have to check that it's not null pointer in every step of the way. You only
need to do it when you're - I mean, the type will do it for you, but it only
needs to happen when you're converting from a pointer to a ref, essentially, or
a RawRef. And what's so good about that is now you have the - previously, you
might just CHECK that this isn't called with null pointer or whatever. But then
you would do that for four or five arguments. And you'd be like, null pointer
CHECKs are this part of the function body. And then it just gets super-noisy.
But if you're using the RawRef types, then the semantics of the type will
enforce that for you. And you don't have to think about that when reading the
code, because usually when you read the code, you're going to be like, it's a
pointer. Can it be null or not? What does it point to? And this thing will at
least tell you, it can't be null. And you still have the question of, what does
it point to? And that's fine. So I like enforcing this through types more than
checking those assumptions, and then checking inside of what happens. If you
were assigned to this RawRef, then it's going to crash in the constructor if
you have a null pointer. And then based on that stack trace, if we have good
stack data, you're going to know at what line you created the RawRef. And
therefore, it's equivalent to checking for not null pointer, because you can
trust the type to do the checking. And since I know Dana made this, I can
probably with 200% certainty say that it's a CHECK and not a DCHECK. But we do
have a couple of other places where you have a WeakPtr that shouldn't be
dereferenced on the wrong sequence. And those are complicated words. And that,
unfortunately, is a DCHECK. So we're hitting some sort of - I don't know if
that CHECK is actually expensive, or if it should be a CHECK, or if it could be
a CHECK. I think, especially, if you're in core types, the size overhead of
adding a CHECK is negligible, because all of the users of it benefit from that
CHECK. So unless it's incredibly -
24:28 SHARON: What do you mean by core types?
24:30 PETER: Say that you make a `scoped_refptr` something, that ref pointer is
used everywhere. So if you CHECKed in the destructor, then you're validating
all of the clients of your scope ref pointer. So for one CHECK, you get the
price of a lot of CHECKing. Whereas if in your client code you're validating
some parameters of an API call that only gets called once, then that's one
CHECK you add for one case. But if your code is re-used, then your CHECK gets a lot
more value. And it's also easier to get parameters wrong sometimes if you have
500 clients that are calling your API. You can't trust all of them to get it
right. Whereas if you're just developing your feature, and it's only used by
your feature, then you can be a little bit more certain with how it's being
called. I would say, still add CHECKs, because code evolves over time. It's
sort of like how you can add unit tests to make sure that no one breaks your
code in the future. If you add CHECKs, then no one can break your code in the
future.
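A minimal sketch of the RawRef pattern Peter describes, assuming Chromium's
`base/memory/raw_ref.h` (the `Widget` and `Frobber` names are made up):

```cpp
#include "base/memory/raw_ref.h"

class Widget;

class Frobber {
 public:
  // from_ptr() CHECKs that the pointer is non-null, so the null check
  // happens exactly once, at the pointer-to-ref conversion...
  explicit Frobber(Widget* widget)
      : widget_(raw_ref<Widget>::from_ptr(widget)) {}

  // ...and every method can use widget_ without re-checking for null,
  // because the type guarantees it.
 private:
  raw_ref<Widget> widget_;
};
```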
25:37 SHARON: Mm-hmm. OK. So you mentioned a few things about how CHECKs and
DCHECKs are changing. [AUDIO OUT] what is currently in the works, and what is
the long-term goal and plan for CHECKs and DCHECKs.
25:53 PETER: So currently what's in the works is we've made sure that some
libraries that we use, like Abseil and WebRTC, which is a first-party
third-party library, that they both use Chrome's crashing report system, which
means that you get more predictable crash stacks because it's using the
immediate crash macro. But also, you get the fatal logging field that I talked
about. That gets logged as part of crash dumps. So you hopefully have more
glanceable, actionable crash reports whenever a CHECK is violated inside of
Abseil, or in WebRTC, as it were. And then upcoming is we want to make sure
that we keep an eye out for our DCHECKs on other platforms, such as Mac. I know
that there's some issues with getting that fatal log field in the GPU process,
and I'm working on fixing that as well. So hopefully, it just means more
reports for the things you care about and easier to action on reports. That's
what we're hoping.
27:03 SHARON: If people think that this sounds really cool, want to have some
more involvement, or want to ask more questions, what's a good place for them
to do that?
27:11 PETER: I like Slack as a thing for this. So the #cxx channel on Slack,
the #base channel on Slack, the #halp channel on Slack is really good. #halp is
really, I think, unintimidating. You can just throw whatever question you have
in there, and I happen to be around there. If you can find out what my last
name is through sheer force of will, you can send me an email to my Chromium
username. What else would we have? I think if they want to get involved, just
add CHECKs to your code. That's a really good way to do it. Just make sure that
your code does what you expect it to in more cases.
27:48 SHARON: Maybe if you have a CL, and you're just doing some drive-by
cleanup, you can turn some DCHECKs into CHECKs also?
27:56 PETER: If your reviewer is cool with that, I'm cool with that. Otherwise,
you can just try to hope for us making that policy that we use CHECKs - if it's
something we care about, we use a CHECK instead of a DCHECK, unless we have a
really good reason to use a DCHECK. And that would be performance.
28:15 SHARON: That sounds good. And one last question is, what do you want
people to take away as their main takeaway from this discussion?
28:26 PETER: I think validating code assumptions is really valuable. So you
think that you're pretty smart when you're writing something, or you remember -
I mean, you're sometimes kind of smart when you're writing something. And
you're like, this can't possibly be wrong. And in practice, looking at crash
reports, these things are wrong all the time. So please validate any
assumptions that you make. It's also, I would say, better than a comment,
because it's a comment that doesn't get outdated without you noticing it. So, I
think, validate your assumptions to make sure that your code is more robust.
And validate properties you care about. And don't be afraid to use CHECKs.
29:13 SHARON: All right. That sounds like a good summary. Thank you very much
for being here, Peter. It was great to learn about DCHECKs.
29:18 PETER: Yeah. Thanks for having me.
29:24 SHARON: Action. Hello.
29:26 PETER: Oh. Take four.
29:29 SHARON: [LAUGHS] Take four. And action.

@ -0,0 +1,488 @@
# Whats Up With //content
This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 3, a 2022 video discussion between [Sharon (yangsharon@chromium.org)
and John (jam@chromium.org)](https://www.youtube.com/watch?v=SD3cjzZl25I).
The transcript was automatically generated by speech-to-text software. It may
contain minor errors.
---
What lives in the content directory? What is the content layer? How does it fit
into Chrome and the web at large? Here to answer all that and more is todays
special guest, John, who not only is a Content owner, but actually split the
codebase to create the Content layer.
Notes:
- https://docs.google.com/document/d/1EJnG5gK8rQwHkdZTKl8vIwx9oScP8TaKBgwzBafIh9M/edit
Links:
- [//content/README.md](https://crsrc.org/c/content/README.md)
- [//content/public/README.md](https://crsrc.org/c/content/public/README.md)
- [What's Up With Pointers](https://www.youtube.com/watch?v=MpwbWSEDfjM)
---
00:00 SHARON: Hello, and welcome to "What's Up with That", the series that
demystifies all things Chrome. I'm your host, Sharon, and today, we're talking
about content. What lives in the content directory? What is the content layer?
How does it fit into Chrome and the web at large? Here to answer all of that
and more is today's special guest, John. He's not only a content owner, but
actually split the code base to create the content layer. Since then, a theme
of his work has been Chrome's architecture, and how to make it usable by
others. He's been involved far and wide across Chrome, but today, we're
focusing on content. John, welcome to the program.
00:33 JOHN: Hi, everyone, and thanks for setting this up, Sharon. My name's
John, and I'm happy to try to shed some light and history on this part of the
Chrome codebase. I've had the pleasure of working on a lot of different parts
of Chrome over a number of years I've worked on it. A theme of my work has been
on the architecture of Chrome and making it reusable by other products. And one
of the projects has been splitting up the codebase and helping create this
content layer.
01:02 SHARON: So, can you tell us what the content layer is? Because content is
a very overloaded term, and we're going to say it a lot today. So you mentioned
the content layer. Can you tell us what that is?
01:10 JOHN: Yes. The content layer is a part of the Chrome codebase that's
responsible for the multiprocess sandbox implementation of our platform.
01:24 SHARON: And another term that I had heard a lot tossed around before I
really understood what was going on was the content public API. So is that the
same as the content layer, or is that different?
01:36 JOHN: It's part of it. So the content component is very large, and so,
we've surrounded it by this small public API. So that you hide the
implementation details and the private directories, and then, embedders just
only have access to a small public layer.
01:56 SHARON: How did we end up with this content layer? Can you give us a bit
of history of how we came up with it? And also, maybe why it's called content?
02:02 JOHN: Sure. The history is - in the beginning, Chrome, like all software
projects, began nice and easy to understand. But over time, as you add a lot
more features to go from zero users to billions of users, it becomes harder to
understand. Small files and small classes become much larger; small functions
kind of get numerous hooks to talk to every feature, because they want to know
when something happens. And so, this idea started: let's separate the product -
the things that make Google Chrome what it is - from the platform, which is
what any browser, any minimal browser doing the latest HTML specs, would need
to implement them in a sandboxed, multiprocess way. And so, content was the
lower part, and that's how it started.
02:58 SHARON: How did we get the name content?
02:58 JOHN: The name is like a pun. And when we started Chrome, one of the
ideas was, we'll focus on content and not Chrome, and so, the browser will get
out of the way. Chrome is a term used to refer to all the user interface parts
of the browser. And so, we said, it's going to be content and not Chrome. And
so, when you open Chrome, you just see a very small UI. Most of what you see is
the content. And so, when we split the directory, it was originally called
Source Chrome, and so, the content part, that's the pun. That's where it came
from.
03:34 SHARON: That's fun. Earlier, you mentioned embedders of content. Can you
tell us what an embedder of content is? And this is part of why I was very
excited about this episode, because I was working on a team where we were
embedders of content for a long time. Well over a year, and it took me a long
time to really understand what that was. Because, as you mentioned now,
Chrome's grown a lot. You work on a very specific thing, so understanding these
more general concepts - what is content? what is a content embedder? - is less
important to what you do day-to-day. But can you tell us what an embedder of
content is?
04:13 JOHN: Sure. An embedder of content is simply anybody who chooses to use
that code to build a browser on top of it. And so, in the beginning, right when
we did this, the goal was just to have one embedder. Or not the goal, what we
had was just one embedder. It was Chrome. But then, right away, we were like,
you know what? It would be nice for people who work on content and not the
feature part to build a smaller binary. It builds faster. It debugs faster,
runs faster. And so, we built this minimal example, also as an example for
other people, called content shell. And then, we started running tests against
that, and that was
the first - or the second embedder of content. And then since then, what was
unexpected, what we started for code health reasons turned out to be very
useful for other projects to restart - or start building their browser from.
And so, things like Android WebView, which was using its own fork of WebKit,
then started using content. That was one first-party example. But then, other
projects came along. Things like Electron and the Chromium Embedded Framework
all started building not just products on top of it, but other frameworks.
05:30 SHARON: That was really surprising to learn about, because it seems
unsurprising that you would build another browser based on Chromium. And people
have heard about this when Edge switched over to Chromium. But to learn that
things like Electron are built around content seem really surprising, because
that's very different from what a browser is.
05:52 JOHN: But they have common needs. They have some HTML data, and they want
to render it and do so in a safe, and stable, and secure way. And that's not
their value add, working on that code. So it's better for them to use something
else.
06:11 SHARON: That makes sense. You also mentioned that Chrome is dependent on
content. And when I first started working on Chrome as an intern, I had it
told to me so many times - because I couldn't remember - that Chrome can depend on
content, but not the other way around. So can you tell us a bit about this
layering, and why it's there?
06:31 JOHN: I should also start by saying, content is not just - when we say
you embed content, what we often mean is you embed content and everything that
sits below it in the layer tree. So that includes things like Blink, our
rendering engine; V8, our JavaScript engine; Net, our networking library; and
so on. And so you can talk to the content public APIs, but also, sometimes,
you talk to the Blink API files, and V8, and so on.
07:07 SHARON: So you have this many-layered API or product? And, at the bottom,
we have things like Net, Blink, and those probably have dependencies on them
that I don't know about. And on top of that, we have content, and then, on top
of that, we have Chrome?
07:23 JOHN: Right. And so, Chrome, as an embedder of content, can include the
content public API directory. But since content can have multiple embedders, it
can't include Chrome. If content reached out directly to Chrome, then other
people wouldn't be able to use it. Because if you try to bring in this code, it
includes files from a directory that you're not using. So, instead, the content
public API, it has APIs going two different directions. One direction is going
into content, and then, one direction are these abstract interfaces that go out
from content. And any embedder has to implement them. And so, these usually end
up in terms like client or delegate. And these are implemented by Chrome, and
that's how content is able to call back to it. But then, any other, of course,
product or embedder can also implement these same interfaces.
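A simplified sketch of that pattern (the real interface in content/public is
`content::ContentBrowserClient`; the method shown here is invented for
illustration):

```cpp
// In content/public/: an abstract interface pointing "out" of content.
// Content calls it, but never knows which embedder implemented it.
namespace content {
class ContentBrowserClient {
 public:
  virtual ~ContentBrowserClient() = default;
  // Hypothetical hook that an embedder can override.
  virtual bool ShouldAllowNavigation() = 0;
};
}  // namespace content

// In chrome/: Chrome's implementation. Content never includes chrome/
// headers, so Content Shell, WebView, and others can implement their own.
class ChromeContentBrowserClient : public content::ContentBrowserClient {
 public:
  bool ShouldAllowNavigation() override { return true; }
};
```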
08:23 SHARON: You mentioned Blink and also some things called delegate and
whatever. So we have a lot of things called something something host in
content. Can you talk a bit about what the relationship between content and
Blink is? Because there's a lot of mirroring in terms of how they might be set
up, and how they relate to each other.
08:37 JOHN: So Blink was the rendering engine that originally started as
WebKit. And we forked, and we named it Blink a number of years ago. And that did
not have any concept of processes. So it was something that you call it in one
process, and it does its job. And you give it whatever data it needs, and it
gives you back the rendered data. And you can poke at it or whatever you want
to do with it. But you needed to wrap that with some - you needed a bunch of
code around it to make it multi-process. And also, to figure out when it needs
something that's not available in the sandbox that it runs in, you have to
provide that data. And so, this is where the content layer comes in. It's the
one that wraps the rendering engine and uses the networking library and other
things to be able to create a fully working browser.
09:33 SHARON: More about processes. So it's easy to think, maybe, that the
content layer just is the browser process. So can you just talk a bit about how
processes work in content? And what the content API provides in terms of
accessing these processes?
09:54 JOHN: So the content code runs in - it's the initial process that runs.
Content starts up, and then - and so, it's in the browser process. But it also
creates the render processes for where Blink runs. It creates a GPU process
that talks to the GPU and where a bunch of the compositing happens. It creates
a network process where we do networking. It creates other processes, things
like audio on some platforms, storage process to isolate storage. And then, a
lot of short lived processes for security and stability reasons. And so, you
can have processes that run content code, but, sometimes, an embedder wants to
run its own code in a different process. So it could re-use the same helpers
that content has for creating a process, and we'll use that. And then, I think
I didn't fully answer your previous question yet, which was the host part. So,
often, you'll have classes in Blink that are running in the renderer process,
and you need an equivalent class to drive it from the browser process. And
that's where we often have the host suffix. So it'd be like a class for -
11:11 SHARON: Can you give an example of -
11:11 JOHN: Yes. So, for example, every renderer process has a class in content
browser called render process host. And then, every tab object in Blink will
have this class called render view, and then, in content browser, it will have
this class called render view host.
11:36 SHARON: Those are classes that, depending on what you work on, you might
see pop up quite a bit. And there's a lot of them. They're all called render
something host, and it's a bit tough to keep them straight. But that makes
sense as to why they're called render and - why render and host are in the
names for them. So you just listed a bunch of different process types. The GPU
process, the browser process, render processes. And, usually, whenever we have
different processes, we have some security boundary between them. Can you talk
a bit about how security and the content layer overlap? Is the content API a
security boundary? What happens if someone calls it maliciously? What could go
wrong if they do and do it successfully?
12:26 JOHN: So the security boundaries in any browser built on top of content
is the processes. We separate things to not just have render processes per tab,
but there are multiple render processes per tab thanks to the amazing work of
the Site Isolation project. And that's what split up different iframes into
different processes. And so, how they talk, all these processes talk through
IPC, and our current IPC system's called Mojo. And so, any time you talk, you
use Mojo between processes. You're usually talking from between processes of
different privileges. And so, one could be sandboxed and the other one not
sandboxed. Or one could be sandboxed, and the other one only partially
sandboxed. So you have to scrutinize any time you use these Mojo calls to make
sure that they can't inadvertently lead to a security vulnerability. Now, even
though, as hard as you try, people could still misuse code. Or, also, embedders
like Chrome or other content embedders can add their own IPCs. So content
obviously doesn't know about the IPCs from other layers, and so, it's possible
that it could be an embedder of content that has security vulnerability in
their own Mojo calls. And so, content doesn't know about them, so it can't do
anything about them. You could write insecure code in content. You can also
write insecure code in an embedder, and if someone finds a vulnerability - so
let's say someone finds a vulnerability in Blink, and maybe they're only
running their code in a minimal content shell. Maybe they can't find any other
Mojo calls that they can abuse to be able to get access to the browser process.
But maybe someone else, an embedder, is a more full-featured browser. It has
more IPC surface, and that could be more of an attack surface for that - to
start with that Blink vulnerability and then to hop into the browser process.
14:38 SHARON: And if you gain control of the browser process, that's a very
highly privileged process.
14:44 JOHN: Because that has full access to your system. So that's the point
where you can leave persistent changes to the user system, which is pretty bad.
14:55 SHARON: That sounds not great. So if you're an average, say, Chrome
engineer, that could be anyone. This is probably not too much of a concern. All
the stuff we mentioned, this is good to know. How would a Chrome engineer who
doesn't directly work on content or in the content directory interact with the
content layer?
15:20 JOHN: Well, they might need a signal from Blink, for example. That's
often how someone will do that. They'll be working on a feature in the browser,
and everything works great. But then, they'll be like, I just need something
from Blink. But it's not there. And so, sometimes, they'll have to add an IPC
between processes, and that might interact. They'll be like, how do I get it?
It's in Blink. It's in the render view class. So I need an interface that talks
between each render view host and each render view. And that's how they might
get - well, that would be how they get interaction with the multiprocessor part
of it. But if someone is just working on something only in a browser process,
they might still be trying to get information about the current tab. And that's
represented by the web contents class in content. So they'll look in content
public browser, and they'll see web contents. And there will be a lot of
interfaces that hang off it. So they'll be looking at it, going through a trail
of interfaces and classes to be able to get more information on what's going on
in the current tab.
16:29 SHARON: Can you give us a quick overview of the web contents class?
Because it is one, massive, and two, called something like web contents. Which
suggests it's important because content plus the web, and it's also something
you see all over the place. So can you just give us a quick overview of what
that class does? What it's for? What it represents?
16:46 JOHN: Yes. Things now are a lot more complicated than before, but if you
go back in a time machine and see how these things started, you can roughly
think in initial Chrome. Every tab had a class to represent the content in that
tab, and that was called web contents. And then, it was called web contents
because we had other classes. We used to be able to put native stuff in a tab.
And so, that would be called tab contents. But that's gone now, and we just
have web contents. So that's where the name comes from. And then even, for
example, there was render process host, which I mentioned earlier. And then,
each tab, each web contents roughly translated into one render process. And so,
now, it's a bit more complicated. There are examples where you can have web
contents inside of web contents, and that's more esoteric - most people
don't have to deal with it. And then, so that's what web contents is for. It will
do things like take input and feed it to the page. Every time there's a
permission prompt, you usually go through that. If a page wants access to a
microphone, or video, and so on. It keeps track of this navigation going on.
What's the current URL? What's the pending URL? It uses other classes to drive
all that stuff as you send out the network request and get it back. And that's
not inside of web contents itself, but it's driven by other helper classes.
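For a concrete flavor, a hedged sketch of hanging code off a tab's web contents
using a `WebContentsObserver` (a real content/public class, though `TabLogger`
and the logging here are invented):

```cpp
#include "base/logging.h"
#include "content/public/browser/navigation_handle.h"
#include "content/public/browser/web_contents.h"
#include "content/public/browser/web_contents_observer.h"

// Hypothetical observer that watches one tab's WebContents.
class TabLogger : public content::WebContentsObserver {
 public:
  explicit TabLogger(content::WebContents* contents)
      : content::WebContentsObserver(contents) {}

  // Called by content as navigations in this tab finish.
  void DidFinishNavigation(content::NavigationHandle* handle) override {
    VLOG(1) << "Tab is now at " << web_contents()->GetLastCommittedURL();
  }
};
```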
18:28 SHARON: I tend to think of content as being the home of navigation, which
I think is a decent way to think about it and also is maybe biased because of
the stuff I've been working on. But you have Chrome, and navigation, and
content, and all the stuff here. And then, separately, you have the actual web,
the internet. And that has things like actual websites. And there are web
standards, and there's things like HTML. And these two things somehow have to
intersect. But being on the Chrome side, working on Chrome, apart from writing
some browser tests, maybe, you never really interact with any of the more web
things. JavaScript, you don't really touch. That's more Blink and HTML only in
a test kind of thing. So how do these web standards - there's navigation web
standards and all that. How do we actually make sure that they're implemented
in Chrome? And where does that happen?
19:32 JOHN: So that happens all over the code, but there's a few critical
directories. If you look at net at a low level, a lot of IETF specs - some
aspects will be implemented there at that layer. Either in net or in the network
service, which is code that runs inside the network process. Then you've got
V8, of course, our JavaScript engine, and that has to follow the ECMAScript
standards. And then, there's a lot of the platform standards. Some of
them don't need multiple processes to implement them, so they'll
just be completely inside Blink. But some of them require multiple processes,
things that need access to devices and so on. And so, that implementation will
be split across Blink and content browser. But then, how do you ensure that,
not only do you implement this correctly, but also that you don't regress it?
So there's a whole slew of tests. There's the Blink tests, which used to be
called the layout tests. And those run across the simple, simple test cases for
many features to make sure that each one works. And there's also this cool
thing where we share now a lot of these tests with other embedders, and that
way, you run the same test in every browser. And so, when you write a test, you
don't have to write it n times. You can just write it once. So that's how we
ensure that we meet the specs.
21:10 SHARON: That makes sense. Because I've been pointed - when I was looking
into a class, like, what does this do? - I've been linked to, say, one of the HTML
specs or web specs. But the whole time, I'm just thinking, how do we make
sure - or who's checking that we're actually implementing this and correctly?
But these tests seem like a good way to do it and also ensure some level of
consistency across browsers. Assuming you know whether or not the browser you
use chooses to run these tests or not, I guess.
21:41 JOHN: And as an engineer on a project like that, the first time you'll
hit them is when you're breaking them. You'll make a change, and I think this
is fine. And then, you send it to the commit queue, and you break some layout
tests. What's happening to me today? And then, you have to drill into it. And
the nice thing about layout test is because each one is small, you - it's
faster to figure out what you broke because it's just like, hopefully, you only
broke a small number of tests.
22:06 SHARON: For sure, and it's a good example of why we have all these tests,
is to make sure things don't break. So that is pretty much all the questions I
have written down. Is there anything else generally content layer, content
public API-ish related that is interesting that maybe we didn't get a chance to
cover?
22:31 JOHN: Yes. The most common question is people will be like, well, does
this belong in content or not? So I can have a chance to point people towards
the README files - content/README, which describes what's supposed to go in
or not. And then, there's also a content/public/README that describes the
guidelines we have for the API to make it consistent.
22:59 SHARON: I've definitely seen those questions before. You're updating one
of the content public APIs. Does this belong? While we're here, can you give us
a quick breakdown heuristic of what things generally would belong in the
content public API versus you put it up for review, and the reviewer's like,
no. This does not belong in content public?
23:24 JOHN: So sometimes, for example, for convenience, maybe the Chrome layer
wants to call other parts of Chrome layer, but they don't have a direct
connection. Or maybe a Chrome layer wants to talk to a different component. And
so, they'll be like, we'll add something to the content API, and then, that
way, Chrome can talk to this other part of Chrome or this other component
through content as a shortcut. We don't allow that, and the reason for that is
anybody who's gone through the content public directory, it's already huge. And
so, we feel that if Chrome wants to talk to Chrome or to another layer, they
should have their own API to each other directly instead of hopping through
content. Just because the content API's already very large, very complex, hard
to understand. So we don't want to add things that are absolutely not necessary
to it. And another thing we try to do is to not add multiple ways of doing
something. We only add something to the content API when there's no other way
of getting this data from inside content, or there's no other way of getting
this data from the embedder to content. But if there's something similar that
can do the same thing, we push back on that.
24:39 SHARON: And also, test-only things? Are those generally OK, or do you
want to generally avoid those?
24:45 JOHN: Well, yes. Test-only methods, we try really hard to avoid - not
just for the public API, but inside, because we don't want to bloat the binary.
But we do have content/public/test, which gives you a lot more leeway to poke at
things in your browser test, for example, or your unit tests. Another thing is,
we also have guidelines for how the API should be. We don't have, really,
concrete classes. It's mostly abstract interfaces. And so, there's a bunch of
rules there, and they're all listed in content/public/README. Just so people
know the guidelines we have for interfaces there.
25:28 SHARON: On the Chrome binary point, how much is the size of the binary
dependent on the size of the content public API? Is that a big part of the
binary, or is it small enough where, sure, we want to keep it from being
unnecessarily large but not too much of an issue?
25:48 JOHN: The size is not going to come as much from the content/public API
but just from the entire content and all its dependencies. And those are in the
tens of megabytes. So, sometimes, for example, if you're bundling the content
layer, you're not going to be a small binary. You'll just start off in the 30
megabyte range or 40 megabyte range once you put everything together.
26:12 SHARON: And I guess that's something you have to be more conscious of if
you're working in content versus another directory, even in Chrome - that you
have to be wary of your dependencies more so than anywhere else. Not only for
Chrome, but also, any other embedders who might want to use content.
26:31 JOHN: Yes. And so, for example, if someone's trying to add something in
Chrome, we also ask, does this have to be in content? Or can this be part of
Chrome, so that not every embedder has to pay that cost if they don't need it?
Maybe we'll have an interface, and the embedder can plug the data in through
that way but still not have it in content. Another problem, of course, with
having data inside content is that not all embedders update at the same speed.
So if you're putting something in content, it can quickly go stale - the
content, whatever the data is - if you're not updating it quickly.
27:08 SHARON: That makes sense. So we mentioned a bit of what content is, a bit
of the history of it. Can you tell us anything about what are upcoming changes
that might happen in content? What is the future of the content directory, the
layer, the API?
27:28 JOHN: Well, it's always changing. It's not static; it's driven by the
needs of the product. And so, you look at big changes happening today like MPArch to
support various use cases that we didn't have, or we never thought about
initially. And that's where the web contents inside web contents, some of that
comes in. There are big changes like banning, for example, native pointers and
replacing them with `raw_ptr`. So we can try to address some of the
security problems we have with Use-After-Frees. So that's where, when you look
at the content code or the Chrome code in general, too, it might look a little
bit different than the average C++ project that you see. You'll be like, I'm
getting errors if I try to have a raw pointer, and that's why.
28:15 SHARON: Check out episode one for more on that. We'll link it below.
Anything else random content-related or otherwise you would like to share with
us?
28:27 JOHN: I think the only other thing I would add is familiarize yourself
with the READMEs in content/README and content/public/README before making
changes. That will make the author and reviewer's time more efficient. And if
you're working on content and below, you can build Content Shell instead of
Chrome. That would be faster to build and debug and hopefully make you more
productive.
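For example (assuming a typical GN setup with an `out/Default` build
directory):

```sh
# Build and run Content Shell instead of all of Chrome.
autoninja -C out/Default content_shell
out/Default/content_shell
```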
28:52 SHARON: Good tips. Hopefully, our viewers follow them. They would never
try to change a content/public API without reading the READMEs first. Well,
thank you so much, John, for sitting down and chatting with me about content.
This was great, and, hopefully, people find it useful.
29:14 JOHN: And thank you for hosting me, Sharon.
29:23 SHARON: Did you start working on Chrome from the very start, or just -
obviously, pre-launch. Because, I think, based on your profile picture - the
picture from that comic book that released when Chrome did - which I was lucky
enough to get a copy of when I was an intern. Shout-out Peter. But that
obviously suggests you were a major contributor before the public launch of
Chrome. So were you working on Chrome from the very beginning?
29:47 JOHN: I was not. It took about six months. I tried to join from the
beginning, but I couldn't join right at the beginning. So my sneaky way was I
found another project under that same director who was running Chrome, and
then, once that project finished in six months, then I jumped into Chrome.
30:09 SHARON: And do you ever think about how crazy it is from this thing that
you worked on, effectively, from the start before the public launch? To what it
is now where Chrome is one of the foundational pieces of the internet at large?
Any time the internet gets used, period, probably something in Chrome is
running, like the net stack, if not, obviously, the browser? Do you ever think about
that, and how crazy that is? And your place in that?
30:38 JOHN: Yes. It's amazing how far Chrome has come, and it's really humbling
to see it be the number one browser, the most widely-used browser. Because when
we were working on Chrome at the beginning, we were just trying to guess what
market share it would have. And people would be like, it'll be 10%, and we're
like, no way. Even the people working on it, we didn't think that was going to
be possible. So to see users really enjoy using it, and for us to keep
demonstrating value by sticking to our four principles, security and stability,
simplicity and speed. And seeing people not just adopt Chrome as a product, but
Chromium as a platform is - it's beyond our wildest dreams. And it's a
responsibility that we have every time we make a change to Chrome to all these
users and developers using it. You were asking earlier, how does it feel to be
here from the start? There's almost a sense of feeling super lucky. But also
this humbling feeling where we started in Chrome when it was really small, and
our knowledge built up incrementally as it got more complicated. But so, it's
like, well, what if I was to jump into Chrome today? It seems like way too
much - the code is so complicated now compared to before. There's almost this
responsibility we have, having been in Chrome for a long time, to share
knowledge, to help people
pick it up. Because we would ourselves struggle if we were to jump in now.
32:22 SHARON: Yes. As those people, we certainly did struggle. But people are
pretty smart, I think, and they can figure it out. But that doesn't mean you
can't make it easier for the people in the future figuring it out. Or even
people who - you just work on a different part. If I were to do anything in
Blink, I'm just like -
32:44 JOHN: Same. I've been on it for a long time. I don't touch Blink.
32:50 SHARON: Yes. Yes.

@ -0,0 +1,968 @@
# Whats Up With Tests
This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 4, a 2022 video discussion between [Sharon (yangsharon@chromium.org)
and Stephen
(smcgruer@chromium.org)](https://www.youtube.com/watch?v=KePsimOPSro).
The transcript was automatically generated by speech-to-text software. It may
contain minor errors.
---
Testing is important! What kinds of tests do we have in Chromium? What are they
all about? Join in as Stephen, who led Chrome's involvement in web platform
tests, tells us all about them.
Notes:
- https://docs.google.com/document/d/1SRoNMdPn78vwZVX7YzcdpF4cJdHTIV6JLGiVC2dJUaI/edit
---
00:00 SHARON: Hello, everyone, and welcome to "What's Up With That," the series
that demystifies all things Chrome. I'm your host, Sharon. And today we're
talking testing. Within Chrome, there are so many types of tests. What are they
all? What's the difference? What are the Chromium-specific quirks? Today's
guest is Stephen. He previously led Chrome's involvement in web platform tests.
Since then, he's worked on rendering, payments, and interoperability. As a fun
aside, he's one of the first people I met who worked on Chrome and is maybe
part of why I'm here today. So welcome, Stephen.
00:33 STEPHEN: Well, thank you very much for having me, Sharon, I'm excited to
be here.
00:33 SHARON: Yeah, I'm excited to have you here. So today, we're in for maybe
a longer episode. Testing is a huge topic, especially for something like
Chrome. So grab a snack, grab a drink, and let's start. We'll start with what
are all of the things that we have testing for in Chrome. What's the purpose of
all these tests we have?
00:51 STEPHEN: Yeah. It's a great question. It's also an interesting one
because I wanted to put one caveat on this whole episode, which is that there
is no right answer in testing. Testing, even in the literature, never mind in
Chromium itself, is not a solved problem. And so you'll hear a lot of different
opinions. People will have different thoughts. And I'm sure that no matter how
hard we try, by the end of this episode, our inbox will be filled with angry
emails from people being like, no, you are wrong. So all of the stuff we're
saying here today is my opinion, albeit I'll try and be as useful as possible.
But yeah, so why do we test was the question, right? So there's a lot of
different reasons that we write tests. Obviously, correctness is the big one.
You're writing some code, you're creating a feature, you want it to be correct.
Other reasons we write them, I mean, tests can be useful as a form of
documentation in itself. If you're ever looking at a class and you're like,
what does - why is this doing this, why is the code doing this, the test can
help inform that. They're also useful - I think a topic of this podcast is sort
of security. Tests can be very useful for security. Often when we have a
security bug, we go back and we write what are called regression tests, so at
least we try and never do that security failure again. And then there are other
reasons. We have tests for performance. We have tests for - our launch process
uses tests. There's lots and lots of reasons we have tests.
02:15 SHARON: Now that you've covered all of the different reasons why we test,
how do we do each of these types of tests in Chromium? What are the test types
we have?
02:27 STEPHEN: Yeah. So main test types we have in Chromium, unit tests,
browser tests, what we call web tests, and then there's a bunch of more
specialized ones, performance tests, testing on Android, and of course manual
testing.
02:43 SHARON: We will get into each of these types now, I guess. The first type
of test you mentioned is unit tests. Why don't you give us a quick rundown of
what unit tests are. I'm sure most people have encountered them or heard of
them before. But just a quick refresher for those who might not.
02:55 STEPHEN: Yeah, absolutely. So as the name implies, a unit test is all
about testing a unit of code. And what that is not very well defined. But you
can usually think of it as just a class, a file, a small isolated component
that doesn't have to talk to all the other bits of the code to work. Really,
the goal is on writing something that's testing just the code under test - so
that new method you've added or whatever. And it should be quick and easy to
run.
03:22 SHARON: So on the screen now we have an example of a pretty typical unit
test we see in Chrome. So there's three parts here. Let's go through each of
them. So the first type - the first part of this is `TEST_P`. What is that
telling us?
03:38 STEPHEN: Yeah. So that is - in Chromium we use a unit testing framework
called Google test. It's very commonly used for C++. You'll see it all over the
place. You can go look up documentation. The test macros, that's what this is,
are essentially the hook into Google test to say, hey, the thing that's coming
here is a test. There's three types. There is just test, which it just says
here is a function. It is a test function. `TEST_F` says that you basically
have a wrapper class. It's often called a test fixture, which can do some
common setup across multiple different tests, common teardown, and that sort of
thing. And finally, `TEST_P` is what we call a parameterized test. And what
this means is that the test can take some input parameters, and it will run the
same test with each of those values. Very useful for things like when you want
to test a new flag. What happens if the flag is on or off?
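As a hedged sketch of the first two flavors (all names here are hypothetical):

```cpp
#include "testing/gtest/include/gtest/gtest.h"

// Plain TEST: just a test function, no fixture.
TEST(MathTest, AddsTwoNumbers) {
  EXPECT_EQ(4, 2 + 2);
}

// TEST_F: a fixture class provides shared setup and teardown.
class BeaconHostTest : public testing::Test {
 protected:
  void SetUp() override { /* common setup for every test in the suite */ }
};

TEST_F(BeaconHostTest, StoresBeacon) {
  // The body can use anything the fixture set up.
}
```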
04:34 SHARON: That's cool. And a lot of the things we're mentioning for unit
test also apply to browser test, which we'll cover next. But the
parameterization is an example of something that carries over to both. So
that's the first part. That's the `TEST_P`, the macro. What's the second part,
PendingBeaconHostTest? What is that?
04:54 STEPHEN: Yeah. So that is the fixture class, the test container class I
was talking about. So in this case, we're assuming that in order to write a
beacon test, whatever that is, they have some set up, some teardown they need
to do. They might want to encapsulate some common functionality. So all you
have to do to write one of these classes is, you declare a C++ class and you
subclass from the Google test class name.
05:23 SHARON: So this is a `TEST_P`, but you mentioned that this is a fixture.
So are fixture tests a subset of parameterized tests?
05:35 STEPHEN: Parameterized tests are a subset of fixture tests, is that the
right way around to put it? All parameterized tests are fixture tests. Yes.
05:41 SHARON: OK.
05:41 STEPHEN: You cannot have a parameterized test that does not have a
fixture class. And the reason for that is how Google test actually works under
the covers is it passes those parameters to your test class. You will have to
additionally extend from the `testing::WithParamInterface`. And that says, hey,
I'm going to take parameters.
06:04 SHARON: OK. But not all fixture tests are parameterized tests.
06:04 STEPHEN: Correct.
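Putting that together, a sketch of a parameterized test (the names are made
up):

```cpp
#include "testing/gtest/include/gtest/gtest.h"

// A parameterized fixture extends both testing::Test and
// testing::WithParamInterface<T> - here T is a bool, like a feature flag.
class BeaconFlagTest : public testing::Test,
                       public testing::WithParamInterface<bool> {};

TEST_P(BeaconFlagTest, SendsBeacon) {
  const bool flag_enabled = GetParam();
  // ... exercise the code under test with the flag on or off ...
  EXPECT_TRUE(flag_enabled || !flag_enabled);
}

// Instantiates SendsBeacon once per parameter value.
INSTANTIATE_TEST_SUITE_P(All, BeaconFlagTest, testing::Bool());
```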
06:04 SHARON: OK. And the third part of this, SendOneOfBeacons. What is that?
06:10 STEPHEN: That is your test name. Whatever you want to call your test,
whatever you're testing, put it here. Again, naming tests is as hard as naming
anything. A lot of yak shaving, finding out what exactly you should call the
test. I particularly enjoy when you see test names that themselves have
underscores in them. It's great.
06:30 SHARON: Uh-huh. What do you mean by yak shaving?
06:35 STEPHEN: Oh, also known as painting a bike shed? Bike shed, is that the
right word? Anyway, generally speaking -
06:40 SHARON: Yeah, I've heard -
06:40 STEPHEN: arguing about pointless things because at the end of the day,
most of the time it doesn't matter what you call it.
06:46 SHARON: OK, yeah. So I've written this test. I've decided it's going to
be parameterized. I've come up with a test fixture for it. I have finally named
my test. How do I run my tests now?
06:57 STEPHEN: Yeah. So all of the tests in Chromium are built into different
test binaries. And these are usually named after the top level directory that
they're under. So we have `components_unittests`, `content_unittests`. I think
the Chrome one is just called `unit_tests` because it's special. We should
really rename that. But I'm going to assume a bunch of legacy things depend on
it. Once you have built whichever the appropriate binary is, you can just run
that from your `out` directory, so `out/release/components_unittests`, for
example. And then that, if you don't pass any flags, will run every single
components unit test. You probably don't want to do that. They're not that
slow, but they're not that fast. So there is a flag `--gtest_filter`, which
allows you to filter. And then it takes a test name after that. The format of
test names is always test class dot test name. So for example, here
PendingBeaconHostTest dot SendOneOfBeacons.
08:04 SHARON: Mm-hmm. And just a fun aside for that one, if you do have
parameterized tests, it'll have an extra slash and a number at the end. So
normally, whenever I use it, I just put a star before and after. And that
generally does - covers the cases.
08:17 STEPHEN: Yeah, absolutely.
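Putting that together (assuming the `out/release` build directory from above):

```sh
# Build the test binary, then run just the matching tests.
autoninja -C out/release components_unittests
out/release/components_unittests \
    --gtest_filter='PendingBeaconHostTest.SendOneOfBeacons*'
```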
08:23 SHARON: Cool. So with the actual test names, you will often see them
prefixed with either `MAYBE_` or `DISABLED_`, or before the test, there will be
an ifdef with usually a platform and then depending on the cases, it'll prefix
the test name with something. So I think it's pretty clear what these are
doing. Maybe is a bit less clear. Disabled pretty clear what that is. But can
you tell us a bit about these prefixes?
08:51 STEPHEN: Yeah, absolutely. So this is our way of trying to deal with that
dreaded thing in testing, flake. So when a test is flaky, when it doesn't
produce a consistent result, sometimes it fails. We have in Chromium a whole
continuous integration waterfall. That is a bunch of bots on different
platforms that are constantly building and running Chrome tests to make sure
that nothing breaks, that bad changes don't come in. And flaky tests make that
very hard. When something fails, was that a real failure? And so when a test is
particularly flaky and is causing sheriffs - the build sheriffs - trouble, they
will come in and they will disable that test. Basically say, hey, sorry, but
this test is causing too much pain. Now, as you said, the `DISABLED_` prefix,
that's pretty obvious. If you put that in front of a test, Google test knows
about it and it says, nope, will not run this test. It will be compiled, but it
will not be run. `MAYBE_` doesn't actually mean anything. It has no meaning to
Google test. But that's where you'll see, as you said, you see these ifdefs.
And that's so that we can disable it on just one platform. So maybe your test
is flaky only on Mac OS, and you'll see basically, oh, if Mac OS, change the
name from maybe to disabled. Otherwise, define maybe as the normal test name.
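A sketch of that pattern (hypothetical test and fixture names):

```cpp
#include "build/build_config.h"
#include "testing/gtest/include/gtest/gtest.h"

class ReportTest : public testing::Test {};

// Flaky only on macOS: disable it there, run it everywhere else.
#if BUILDFLAG(IS_MAC)
#define MAYBE_UploadsReport DISABLED_UploadsReport
#else
#define MAYBE_UploadsReport UploadsReport
#endif
TEST_F(ReportTest, MAYBE_UploadsReport) {
  // ... the test body ...
}
```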
10:14 SHARON: Makes sense. We'll cover flakiness a bit later. But yeah, that's
a huge problem. And we'll talk about that for sure. So these prefixes, the
parameterization and stuff, this applies to both unit and browser tests.
10:27 STEPHEN: Yeah.
10:27 SHARON: Right? OK. So what are browser tests? Chrome's a browser. Browser
test, seems like there's a relation.
10:34 STEPHEN: Yeah. They test the browser. Isn't it obvious? Yeah. Browser
tests are our version - our sort of version of an integration or a functional
test depending on how you look at things. What that really means is they're
testing larger chunks of the browser at once. They are integrating multiple
components. And this is somewhere that I think Chrome's a bit weird because in
many large projects, you can have an integration test that doesn't bring your
entire product up in order to run. Unfortunately, or fortunately, I guess
it depends on your viewpoint, Chrome is so interconnected, it's so
interdependent, that more or less we have to bring up a huge chunk of the
browser in order to connect any components together. And so that's what browser
tests are. When you run one of these, there's a massive amount of machinery in
the background that goes ahead, and basically brings up the browser, and
actually runs it for some definition of what a browser is. And then you can
write a test that pokes at things within that running browser.
11:42 SHARON: Yeah. Something I've heard multiple times before is that browser
tests launch the whole browser. And that's -
11:47 STEPHEN: More or less true. It's - yeah.
11:47 SHARON: Yes. OK. Does that also mean that because you're running all this
stuff that all browser tests have fixtures? Is that the case?
11:59 STEPHEN: Yes, that is the case. Absolutely. So there is only - I think
it's - oh my goodness, probably on the screen here somewhere. But it's
`IN_PROC_BROWSER_TEST_F` and `IN_PROC_BROWSER_TEST_P`. There is no version that
doesn't have a fixture.
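A hedged sketch of a Chrome browser test (the fixture base class and macro are
real; the test itself is hypothetical):

```cpp
#include "chrome/test/base/in_process_browser_test.h"
#include "content/public/test/browser_test.h"

class MyFeatureBrowserTest : public InProcessBrowserTest {};

IN_PROC_BROWSER_TEST_F(MyFeatureBrowserTest, BrowserComesUp) {
  // browser() is provided by the fixture and points at the browser that
  // the test machinery brought up before this body ran.
  EXPECT_TRUE(browser());
}
```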
12:15 SHARON: And what does the in proc part of that macro mean?
12:15 STEPHEN: So that's, as far as I know - and I might get corrected on this.
I'll be interested to learn. But it refers to the fact that we've run these in
the same process. Normally, the whole Chromium is a multi-process architecture.
For the case of testing, we put that aside and just run everything in the same
process so that it doesn't leak, basically.
12:38 SHARON: Yeah. There's flags when you run them, like `--single-process`.
And then there's `--single-process-test`. And they do slightly different
things. But if you do run into that, probably you will be working with people
who can answer and explain the differences between those more. So something
that I've seen quite a bit in browser and unit tests, and only in these, are
run loops. Can you just briefly touch on what those are and what we use them
for in tests?
13:05 STEPHEN: Oh, yeah. That's a fun one. I think actually on a previous
episode of this very program, you and Dana talked a little bit around the fact
that Chrome is not a completely synchronous program, that we do task
splitting. We have a task scheduler. And so run loops are part of that,
basically. They're part of our stack for handling asynchronous tasks. And so
this comes up in testing because sometimes you might be testing something
that's not synchronous. It takes a callback, for example, rather than returning
a value. And so if you just wrote your test as normal, you call the function,
and you don't - you pass a callback, but then your test function ends. Your
test function ends before that callback ever runs. Run loop gives you the
ability to say, hey, put this callback into some controlled run loop. And then
after that, you can basically say, hey, wait on this run loop. I think it's
often called quit when idle, which basically says keep running until you have
no more tasks to run, including our callback, and then finish. They're
powerful. They're very useful, obviously, with asynchronous code. They're also
a source of a lot of flake and pain. So handle with care.
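A sketch of the pattern, assuming a hypothetical async API `DoAsyncWork`:

```cpp
#include "base/functional/callback.h"
#include "base/run_loop.h"
#include "base/test/bind.h"
#include "base/test/task_environment.h"
#include "testing/gtest/include/gtest/gtest.h"

// Hypothetical asynchronous API under test: reports its result via callback.
void DoAsyncWork(base::OnceCallback<void(int)> done);

class AsyncTest : public testing::Test {
 protected:
  // Gives the test a task runner, so posted tasks and RunLoop work.
  base::test::TaskEnvironment task_environment_;
};

TEST_F(AsyncTest, WaitsForCallback) {
  base::RunLoop run_loop;
  int result = 0;
  DoAsyncWork(base::BindLambdaForTesting([&](int value) {
    result = value;
    run_loop.Quit();  // Lets Run() below return.
  }));
  run_loop.Run();  // Pumps tasks until Quit() is called.
  EXPECT_EQ(42, result);
}
```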
14:24 SHARON: Yeah. A tip is maybe using the `--gtest_repeat` flag. So that
one lets you run your test however many times you tell it to.
14:30 STEPHEN: Yeah.
14:36 SHARON: And that can help with testing for flakiness or if you're trying
to debug something flaky. In tests, we have a variety of macros that we use. In
the unit test and the browser tests, you see a lot of macros, like `EXPECT_EQ`,
`EXPECT_GT`. These seem like they're part of maybe Google test. Is that true?
14:54 STEPHEN: Yeah. They come from Google test itself. So they're not
technically Chromium-specific. But they basically come in two flavors. There's
the `EXPECT_SOMETHING` macros. And there's the `ASSERT_SOMETHING` macros. And
the biggest thing to know about them is that expect doesn't actually cause - it
causes a test to fail, but it doesn't stop the test from executing. The test
will continue to execute the rest of the code. Assert actually bails out and
stops the test right there. And so this can be useful, for
example, if you want to line up a bunch of expects. And your code still makes
sense. You're like, OK, I expect to return object, and it's got these fields.
And I'm just going to expect each one of the fields. That's probably fine to
do. And it may be nice to have output that's like, no, actually, both of these
fields are wrong. Assert is used when you're like, OK, if this fails, the rest
of the test makes no sense. Very common thing you'll see. Call an API, get back
some sort of pointer, hopefully a smart pointer, hey. And you're going to be
like, assert that this pointer is non-null because if this pointer is null,
everything else is just going to be useless.
15:57 SHARON: I think we see a lot more expects than asserts in general
anecdotally from looking at the tests. Do you think, in your opinion, that
people should be using asserts more generously rather than expects, or do we
maybe want to see what happens - what does go wrong if things continue beyond a
certain point?
16:15 STEPHEN: Yeah. I mean, general guidance would be just keep using expect.
That's fine. It's also not a big deal if your test actually just crashes. It's
a test. It can crash. It's OK. So use expects. Use an assert if, like I said,
that the test doesn't make any sense. So most often if you're like, hey, is
this pointer null or not and I'm going to go do something with this pointer,
assert it there. That's probably the main time you'd use it.
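A sketch of that advice, with a made-up `Widget` and `CreateWidget`:

```cpp
TEST(WidgetTest, HasExpectedFields) {
  std::unique_ptr<Widget> widget = CreateWidget();
  // If this is null, nothing below makes sense, so stop the test here.
  ASSERT_NE(widget, nullptr);
  // These keep running even if an earlier one fails, so a single run
  // reports every field that is wrong.
  EXPECT_EQ(widget->width(), 100);
  EXPECT_EQ(widget->height(), 50);
}
```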
16:45 SHARON: A lot of the browser test classes, like the fixture classes
themselves, are subclassed from other base classes.
16:53 STEPHEN: Mm-hmm.
16:53 SHARON: Can you tell us about that?
16:53 STEPHEN: Yeah. So basically, we have one base class for browser tests. I
think it's literally called `BrowserTestBase`, which sits at the bottom and
does a lot of the very low-level setup of bringing up a browser. But
as folks know, there's more than one browser in the Chromium project. There is
Chrome, the Chrome browser that is the more full-fledged version. But there's
also content shell, which people might have seen. It's built out of content.
It's a very simple browser. And then there are other things. We have a headless
mode. There is a headless Chrome you can build which doesn't show any UI. You
can run it entirely from the command line.
17:32 SHARON: What's the difference between headless and content shell?
17:39 STEPHEN: So content shell does have a UI. If you run content shell, you
will actually see a little UI pop up. What content shell doesn't have is all of
those features from Chrome that make Chrome Chrome, if you will. So I mean,
everything from bookmarks, to integration with having an account profile, that
sort of stuff is not there. I don't think content shell even supports tabs. I
think it's just one page you get. It's almost entirely used for testing. But
then, headless, sorry, as I was saying, it's just literally there is no UI
rendered. It's just headless.
18:13 SHARON: That sounds like it would make -
18:13 STEPHEN: And so, yeah. And so - sorry.
18:13 SHARON: testing faster and easier. Go on.
18:18 STEPHEN: Yeah. That's a large part of the point, as well as when you want
to deploy a browser in an environment where you don't see the UI. So for
example, if you're running on a server or something like that. But yeah. So for
each of these, we then subclass that `BrowserTestBase` in order to provide
specific types. So there's content browser test. There's headless browser test.
And then of course, Chrome has to be special, and they called their version in
process browser test because it wasn't confusing enough. But again, it's sort
of straightforward. If you're in `/chrome`, use `in_process_browser_test`. If
you're in `/content`, use `content_browsertest`. It's pretty straightforward
most of the time.
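Putting that together, a minimal Chrome-layer browser test might look like
this (the fixture and test names are invented); a content-layer test would
subclass `ContentBrowserTest` instead:

```cpp
#include "chrome/test/base/in_process_browser_test.h"
#include "content/public/test/browser_test.h"

class MyFeatureBrowserTest : public InProcessBrowserTest {};

// By the time the test body runs, a real browser is up and running.
IN_PROC_BROWSER_TEST_F(MyFeatureBrowserTest, BrowserIsRunning) {
  EXPECT_NE(browser(), nullptr);
}
```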
18:58 SHARON: That makes sense. Common functions you see overridden from those
base classes are these setup functions. So there's `SetUp`,
`SetUpOnMainThread` - there seems to be a lot of different setup options. Is
there anything we should know about any of those?
19:13 STEPHEN: I don't think that - I mean, most of it's fairly
straightforward. I believe you should mostly be using `SetUpOnMainThread`. I
can't say that for sure. But generally speaking, `SetUpOnMainThread` and
`TearDownOnMainThread` - or is it shutdown on main thread? I can't remember -
whichever one is for afterwards, are what you should usually be using in a
browser test. You can also usually do most of your work in a constructor.
That's something that people often don't know about testing. I think it's
something that's changed over time. Even with unit tests, people use the setup
function a lot. You can just do it in the constructor a lot of the time. Most
of the background initialization has already happened.
19:45 SHARON: I've definitely wondered that, especially when you have things in
the constructor as well as in a setup method. It's one of those things where
you just kind of think, I'm not going to touch this because eh, but -
19:57 STEPHEN: Yeah. There are some rough edges, I believe. By
`SetUpOnMainThread`, some things have been initialized that aren't around when
your class is being constructed. So it is fair. I'm not sure I have any great
advice other than you may need to dig in if it happens.
20:19 SHARON: One last thing there. Which one gets run first, the setup
functions or the constructor?
20:19 STEPHEN: The constructor always happens first. You have to construct the
object before you can use it.
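So, roughly, the pieces of a fixture run in this order (a sketch, not
exhaustive - the inherited `SetUp`/`TearDown` machinery runs in between):

```cpp
class LifecycleBrowserTest : public InProcessBrowserTest {
 public:
  LifecycleBrowserTest() {
    // 1. Runs first, before any browser machinery exists. Fine for plain
    //    member initialization; don't touch browser state here.
  }

  void SetUpOnMainThread() override {
    InProcessBrowserTest::SetUpOnMainThread();
    // 2. Runs once the browser is up; helpers like browser() are usable.
  }

  void TearDownOnMainThread() override {
    // 3. Runs after the test body, while the browser is still alive.
    InProcessBrowserTest::TearDownOnMainThread();
  }
};
```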
20:25 SHARON: Makes sense. This doesn't specifically relate to a browser test
or unit test, but it does seem like it's worth mentioning, which is the content
public test API. So if you want to learn more about content and content public,
check out episode three with John. But today we're talking about testing. So
we're talking about content public test. What is in that directory? And how
does that - how can people use what's in there?
20:48 STEPHEN: Yeah. It's basically just a bunch of useful helper functions and
classes for when you are doing mostly browser tests. So for example, there are
methods in there that will automatically handle navigating the browser to a URL
and actually waiting till it's finished loading. There are other methods for
essentially accessing the tab strip of a browser. So if you have multiple tabs
and you're testing some cross-tab thing, there are methods in there to do
that. I think that's probably where the content browser test base class lives
as well. So take a look at it. It's the equivalent of base in many ways for
testing. If you're thinking, someone should have written a library function
for this, possibly someone has already. And you should take a look. And if
they haven't, you should write one.
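For example, `content::NavigateToURL` from
`content/public/test/browser_test_utils.h` navigates and blocks until the load
finishes. A sketch, reusing the hypothetical fixture from above (other
includes trimmed):

```cpp
#include "content/public/test/browser_test_utils.h"

IN_PROC_BROWSER_TEST_F(MyFeatureBrowserTest, LoadsPage) {
  ASSERT_TRUE(embedded_test_server()->Start());
  GURL url = embedded_test_server()->GetURL("/title1.html");
  // Navigates the active tab and waits for the navigation to complete.
  EXPECT_TRUE(content::NavigateToURL(
      browser()->tab_strip_model()->GetActiveWebContents(), url));
}
```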
21:43 SHARON: Yeah. I've definitely heard people, code reviewers, say when you
want to add something that seems a bit test only to content public, put that in
content public test because that doesn't get compiled into the actual release
binaries. So if things are a bit less than ideal there, it's a bit more
forgiving for a place for that.
22:02 STEPHEN: Yeah, absolutely. I mean, one of the big things about all of our
test code is that you can actually make it so that it's in many cases not
compiled into the binary. And that is both useful for binary size as well as
you said in case it's concerning. One thing you can do actually in test, by the
way, for code that you cannot avoid putting into the binary - so let's say
you've got a class, and for the reasons of testing it because you've not
written your class properly to do a dependency injection, you need to access a
member. You need to set a member. But you only want that to happen from test
code. No real code should ever do this. You can actually name methods
blah-blah-blah `ForTest` or `ForTesting`. And this doesn't have any - there's
no code impact to this. But we have presubmits that actually go ahead and
check, hey, are you calling this from code that's not marked as test code? And
it will then refuse to - it will fail the presubmit on upload if that happens.
So it could be useful.
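A sketch of that naming convention (the class and member are made up):

```cpp
class ThingManager {
 public:
  // Compiled into the real binary, but a presubmit check rejects uploads
  // that call this from anywhere not marked as test code.
  void set_state_for_testing(State state) { state_ = state; }

 private:
  State state_;
};
```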
23:03 SHARON: And another thing that relates to that would be the friend test
or friend something macro that you see in classes. Is that a gtest thing also?
23:15 STEPHEN: It's not a gtest thing. It's just a C++ thing. So C++ has the
concept of friending another class. It's very cute. It basically just says,
this other class and I, we can access each other's internal states. Don't
worry, we're friends. Generally speaking, that's a bad idea. We write classes
for a reason to have encapsulation. The entire goal of a class is to
encapsulate behavior and to hide the implementation details that you don't want
to be exposed. But obviously, again, when you're writing tests, sometimes it is
the correct thing to do to poke a hole in the class and get at something. Very
much in the schools of thought here, some people would be like, you should be
doing dependency injection. Some people are like, no, just friend your class.
It's OK. If folks want to look up more, go look up the difference between open
box and closed box testing.
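In Chromium this usually looks like the following, using
`FRIEND_TEST_ALL_PREFIXES` from `base/gtest_prod_util.h` (class and test names
invented):

```cpp
#include "base/gtest_prod_util.h"

class Cache {
 private:
  // Lets CacheTest.EvictsOldestEntry reach private members, whatever
  // prefix (DISABLED_, FLAKY_, ...) the test currently carries.
  FRIEND_TEST_ALL_PREFIXES(CacheTest, EvictsOldestEntry);

  void EvictOldest();
};
```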
24:00 SHARON: For those of you who are like, oh, this sounds really cool, I
will learn more.
24:00 STEPHEN: Yeah, for my test nerds out there.
24:06 SHARON: [LAUGHS] Yeah, Stephen's got a club. Feel free to join.
24:06 STEPHEN: Yeah. [LAUGHTER]
24:11 SHARON: You get a card. Moving on to our next type of test, which is your
wheelhouse, which is web tests. This is something I don't know much about. So
tell us all about it.
24:22 STEPHEN: [LAUGHS] Yeah. This is my - this is where hopefully I'll shine.
It's the area I should know most about. But web tests are - they're an
interesting one. So I would describe them is our version of an end-to-end test
in that a web test really is just an HTML file, a JavaScript file that is when
you run it, you literally bring up - you'll remember I said that browser tests
are most of a whole browser. Web tests bring up a whole browser. It's just the
same browser as content shell or Chrome. And it runs that whole browser. And
the test does something, either in HTML or JavaScript, that then is asserted
and checked. And the reason I say that I would call them this, I have heard
people argue that they're technically unit tests, where the unit is the
JavaScript file and the entire browser is just, like, an abstraction that you
don't care about. I guess it's how you view them really. I view the browser as
something that is big and flaky, and therefore these are end-to-end tests. Some
people disagree.
25:22 SHARON: In our last episode, John touched on these tests and how the
scope that each test covers is very small. But how you run them is not. And I
guess you can pick whichever side you like more and go with that. So what are
examples of things we test with these kinds of
tests?
25:49 STEPHEN: Yeah. So the two big categories of things that we test with web
tests are basically web APIs, so JavaScript APIs, provided by the browser to do
something. There are so many of those, everything from the fetch API for
fetching stuff to the web serial API for talking to devices over serial ports.
The web is huge. But anything you can talk to via JavaScript API, we call those
JavaScript tests. It's nice and straightforward. The other thing that web tests
usually encompass are what are called rendering tests or sometimes referred to
as ref tests for reference tests. And these are checking the actual, as the
first name implies, the rendering of some HTML, some CSS by the browser. The
reason they're called reference tests is that usually the way you check
whether a rendering is correct is you set up your test, and then you compare
it to some image or some other reference rendering that you're like, OK, this
should look like that. If it does look like that, great. If it doesn't, it
fails.
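A sketch of what a reference test pair can look like in web-platform-tests
style (file names invented): the test page declares its reference with
`<link rel="match">`, and the harness compares the two renderings.

```html
<!-- green-square.html: the test. It must render identically to the ref. -->
<!DOCTYPE html>
<link rel="match" href="green-square-ref.html">
<style>div { width: 100px; height: 100px; background: green; }</style>
<div></div>

<!-- green-square-ref.html: the reference, producing the same pixels
     through different, usually simpler, markup. -->
```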
26:54 SHARON: Ah-ha. And are these the same as - so there's a few other test
names that are all kind of similar. And as someone who doesn't work in them,
they all kind of blur together. So I've also heard web platform tests. I've
heard layout tests. I've heard Blink tests, all of which do - all of which are
JavaScript HTML-like and have some level of images in them. So are these all
the same thing? And if not, what's different?
27:19 STEPHEN: Yeah. So yes and no, I guess, is my answer. So a long time ago,
there were layout tests basically. And that was something we inherited from the
WebKit project when we forked there, when we forked Chromium from WebKit all
those years ago. And they're exactly what I've described. They were both
JavaScript-based tests and they were also HTML-based tests for just doing
reference renderings. However, web platform test came up as an external project
actually. Web platform test is not a Chromium project. It is external upstream.
You can find them on GitHub. And their goal was to create a set of - a test
suite shared between all browsers so that all browsers could test - run the
same tests and we could actually tell, hey, is the web interoperable? Does it
work the same way no matter what browser you're on? The answer is, no. But
we're trying. And so inside of Chromium we said, that's great. We love this
idea. And so what we did was we actually import web platform test into our
layout tests. So web platform test now becomes a subdirectory of layout tests.
OK?
28:30 SHARON: OK. [LAUGHS]
28:30 STEPHEN: To make things more confusing, we don't just import them, but we
also export them. We run a continuous two-way sync. And this means that
Chromium developers don't have to worry about that upstream web platform test
project most of the time. They just land their code in Chromium, and a magic
process happens, and it goes up into the GitHub project. So that's where we
were for many years - layout tests, which are a whole bunch of legacy tests,
and then also web platform tests. But fairly recently - and I say that knowing
that COVID means that might be anything within the last three years because who
knows where time went - we decided to rename layout tests. And the name we
chose was web tests. So now you have web tests, of which web platform tests
are a subset. Easy.
29:20 SHARON: Cool.
29:20 STEPHEN: [LAUGHS]
29:20 SHARON: Cool. And what about Blink tests? Are those separate, or are
those these altogether?
29:27 STEPHEN: I mean, if they're talking about the JavaScript and HTML, that's
going to just be another name for the web tests. I find that term confusing
because there is also the Blink tests target, which builds the infrastructure
that is used to run web tests. So that's probably what you're referring to,
like `blink_tests`. It is the target that you build to run these tests.
29:50 SHARON: I see. So `blink_tests` is a target. These other ones, web tests
and web platform tests, are actual test suites.
29:57 STEPHEN: Correct. Yes. That's exactly right.
30:02 SHARON: OK. All right.
30:02 STEPHEN: Simple.
30:02 SHARON: Yeah. So easy. So you mentioned that the web platform tests are
cross-browser. But a lot of browsers are based on Chromium. Is it one of the
things where it's open source and stuff but majority of people contributing to
these and maintaining it are Chrome engineers?
30:23 STEPHEN: I must admit, I don't know what that stat is nowadays. Back when
I was working on interoperability, we did measure this. And it was certainly
the case that Chromium is a large project. There were a lot of tests being
contributed by Chromium developers. But we also saw historically - I would like
to recognize Mozilla, most of all, who were a huge contributor to the web
platform test project over the years and are probably the reason that it
succeeded. And we also - web platform test also has a fairly healthy community
of completely outside developers. So people that just want to come along. And
maybe they're not able to or willing to go into a browser, and actually build a
browser, and muck with code. But they could write a test for something. They
can find a broken behavior and be like, hey, there's a test here, Chrome and
Firefox do different things.
31:08 SHARON: What are examples of the interoperability things that you're
testing for in these cross-browser tests?
31:17 STEPHEN: Oh, wow, that's a big question. I mean, really everything and
anything. So on the ref test side, the rendering test, it actually does matter
that a web page renders the same in different browsers. And that is very hard
to achieve. It's hard to make two completely different engines render some HTML
and CSS exactly the same way. But it also matters. We often see bugs where you
have a lovely - you've got a lovely website. It's got this beautiful header at
the top and some content. And then on one browser, there's a two-pixel gap
here, and you can see the background, and it's not a great experience for your
users. So ref tests, for example, are used to try and track those down. And
then, on the JavaScript side, I mean really, web platform APIs are complicated.
They're very powerful. There's a reason they are in the browser and you cannot
do them in JavaScript. And that is because they are so powerful. So for
example, web USB to talk to USB devices, you can't just do that from
JavaScript. But because they're so powerful, because they're so complicated,
it's also fairly easy for two browsers to have slightly different behavior. And
again, it comes down to what is the web developer's experience. When I try and
use the web USB API, for example, am I going to have to write code that's like,
if Chrome, call it this way, if Fire - we don't want that. That is what we do
not want for the web. And so that's the goal.
32:46 SHARON: Yeah. What a team effort, making the whole web work is. All
right. That's cool. So in your time working on these web platform tests, do you
have any fun stories you'd like to share or any fun things that might be
interesting to know?
33:02 STEPHEN: Oh, wow. [LAUGHS] One thing I like to bring up - I'm afraid it's
not that fun, but I like to repeat it a lot of times because it's weird and
people get tripped up by it - is that inside of Chromium, we don't run web
platform tests using the Chrome browser. We run them using content shell. And
this is partially historical. That's how layout tests run. We always ran them
under content shell. And it's partially for I guess what I will call
feasibility. As I talked about earlier, content shell is much simpler than
Chrome. And that means that if you want to just run one test, it is faster, it
is more stable, it is more reliable I guess I would say, than trying to bring
up the behemoth that is Chrome and making sure everything goes correctly. And
this often trips people up because in the upstream world of this web platform
test project, they run the test using the proper Chrome binary. And so they're
different. And different things do happen. Sometimes it's rendering
differences. Sometimes it's because web APIs are not always implemented in both
Chrome and content shell. So yeah, fun fact.
34:19 SHARON: Oh, boy. [LAUGHTER]
34:19 STEPHEN: Oh, yeah.
34:19 SHARON: And we wonder why flakiness is a problem. Ah. [LAUGHS]
34:19 STEPHEN: Yeah. It's a really sort of fun but also scary fact that even if
we put aside web platform tests and we just look at layout tests, we don't
test what we ship. Layout tests run in content shell, and then we turn around and
we're like, here's a Chrome binary. Like uh, those are different. But, hey, we
do the best we can.
34:43 SHARON: Yeah. We're out here trying our best. So that all sounds very
cool. Let's move on to our next type of test, which is performance. You might
have heard the term telemetry thrown around. Can you tell us what telemetry is
and what these performance tests are?
34:54 STEPHEN: I mean, I can try. We've certainly gone straight from the thing
I know a lot about into the thing I know very little about. But -
35:05 SHARON: I mean, to Stephen's credit, this is a very hard episode to find
one single guest for. People who are working extensively usually in content
aren't working a ton in performance or web platform stuff. And there's no one
who is - just does testing and does every kind of testing. So we're trying our
best. [INAUDIBLE]
35:24 STEPHEN: Yeah, absolutely. You just need to find someone arrogant enough
that he's like, yeah, I'll talk about all of those. I don't need to know the
details. It's fine. But yeah, performance test, I mean, the name is self
explanatory. These are tests that are trying to ensure the performance of
Chromium. And this goes back to the four S's when we first started Chrome as a
project - speed, simplicity, security, and I've forgotten the fourth S now.
Speed, simplicity, security - OK, let's not reference the four S's then.
[LAUGHTER] You have the Comet. You tell me.
36:01 SHARON: Ah. Oh, I mean, I don't read it every day. Stability. Stability.
36:08 STEPHEN: Stability. God damn it. That's literally what the rest of this
is about. OK, where were we?
36:13 SHARON: We're leaving this in, don't worry. [LAUGHTER]
36:19 STEPHEN: Yeah. So the basic idea of performance test is to test
performance because as much as you can view behavior as a correctness thing, in
Chromium we also consider performance a correctness thing. It is not a good
thing if a change lands and performance regresses. So obviously, testing
performance is also hard to do absolutely. There's a lot of noise in any sort
of performance testing. And so, we do it essentially heuristically,
probabilistically. We run whatever the tests are, which I'll talk about in a
second. And then we look at the results and we try and say, hey, OK, is there a
statistically significant difference here? And there's actually a whole
performance sheriffing rotation to try and track these down. But in terms of,
yeah, you mentioned telemetry. That weird word. You're like, what is a
telemetry test? Well, telemetry is the name of the framework that Chromium
uses. It's part of the wider catapult project, which is all about different
performance tools. And none of the names, as far as I know, mean anything.
They're just like, hey, catapult, that's a cool name. I'm sure someone will
explain to me now the entire history behind the name catapult and why it's
absolutely vital. But anyway, so telemetry basically is a framework that when
you give it some input, which I'll talk about in a second, it launches a
browser, performs some actions on a web page, and records metrics about those
actions. So the input, the test essentially, is basically a collection of go to
this web page, do these actions, record these metrics. And I believe in
telemetry that's called a story, the story of someone visiting a page, I guess,
is the idea. One important thing to know is that because it's sort of insane
to actually visit real websites - they keep doing things like changing - we
actually cache the websites. We download a version of the websites once and
actually check that in. And when you go run a telemetry test, it's not running
against literally the real Reddit.com or something. It's running against a
version we saved at some point.
38:31 SHARON: And how often - so I haven't really heard of anyone who actually
works on this, but then, you don't interact with everyone. As new web features
get added and things in the browser change, how often are these tests
specifically getting updated to reflect that?
38:44 STEPHEN: I would have to plead some ignorance there. It's certainly also
been my experience as a browser engineer who has worked on many web APIs that
I've never written a telemetry test myself. I've never seen one added. My
understanding is that they are - a lot of the use cases are fairly general with
the hope that if you land some performance problematic feature, it will regress
on some general test. And then we can be like, oh, you've regressed. Let's
figure out why. Let's dig in and debug. But it certainly might be the case if
you are working on some feature and you think that it might have performance
implications that aren't captured by those tests, there is an entire team that
works on the speed of Chromium. I cannot remember their email address right
now. But hopefully we will get that and put that somewhere below. But you can
certainly reach out to them and be like, hey, I think we should test the
performance of this. How do I go about and do that?
39:41 SHARON: Yeah. That sounds useful. I've definitely gotten bugs filed
against me for performance stuff. [LAUGHS] Cool. So that makes sense. Sounds
like good stuff. And in talking to some people in preparation for this episode,
I had a few people mention Android testing specifically. Not any of the other
platforms, just Android. So do you want to tell us why that might be? What are
they doing over there that warrants additional mention?
40:15 STEPHEN: Yeah. I mean, I think probably the answer would just be that
Android is such a huge part of our code base. Chrome is a browser, a
multi-platform browser, runs on multiple desktop platforms, but it also runs on
Android. And it runs on iOS. And so I assume that iOS has its own testing
framework. I must admit, I don't know much about that at all. But certainly on
Android, we have a significant amount of testing framework built up around it.
And so there's the option, the ability for you to test your Java code as well
as your C++ code.
40:44 SHARON: That makes sense. And yeah, with iOS, because they don't use
Blink, I guess there's - that reduces the amount of tests that they might need
to add, whereas on Android they're still using Blink. But there's a lot of
differences because it is mobile, so they're just, OK, we actually can test
those things. So let's go more general now. At almost every stage, you've
mentioned flakiness. So let's briefly run down, what is flakiness in a test?
41:14 STEPHEN: Yes. So flakiness for a test is just - the definition is just
that the test does not consistently produce the same output. When you're
talking about flakiness, you actually don't care what the output is. A test
that always fails, that's fine. It always fails. But a test that passes 90% of
the time and fails 10%, that's not good. That test is not consistent. And it
will cause problems.
41:46 SHARON: What are common causes of this?
41:46 STEPHEN: I mean, part of the cause is, as I've said, we write a lot of
integration tests in Chromium. Whether those are browser tests, or whether
those are web tests, we write these massive tests that span huge stacks. And
what comes implicitly with that is timing. Timing is almost always the
problem - timing and asynchronicity. Whether that is in the same thread or
multiple threads, you write your test, you run it on your developer machine,
and it works. And you're like, cool, my test works. But what you don't realize
is that you're assuming that in some part of the browser, this function ran,
then this function ran. And that always happens on your developer machine
because you have this CPU, and this much memory, and et cetera, et cetera. Then
you commit your code, you land your code, and somewhere a bot runs. And that
bot is slower than your machine. And on that bot, those two functions run in
the opposite order, and something goes horribly wrong.
42:50 SHARON: What can the typical Chrome engineer writing these tests do in
the face of this? What are some practices that you generally should avoid or
generally should try to do more often that will keep this from happening in
your test?
43:02 STEPHEN: Yeah. So first of all, write more unit tests, write fewer
browser tests, please. Unit tests are - as I've talked about, they're small. They're
compact. They focus just on the class that you're testing. And too often, in my
opinion - again, I'm sure we'll get some nice emails stating I'm wrong - but
too often, in my opinion, people go straight to a browser test. And they bring
up a whole browser just to test functionality in their class. This sometimes
requires writing your class differently so that it can be tested by a unit
test. That's worth doing. Beyond that, though, when you are writing a browser
test or a web test, something that is more integration, more end to end, be
aware of where timing might be creeping in. So to give an example, in a browser
test, you often do things like start by loading some web contents. And then you
will try and poke at those web contents. Well, so one thing that people often
don't realize is that loading web contents, that's not a synchronous process.
Actually knowing when a page is finished loading is slightly difficult. It's
quite interesting. And so there are helper functions to try and let you wait
for this to happen, sort of event waiters. And you should - unfortunately, the
first part is you have to be aware of this, which is just hard to be. But the
second part is, once you are aware of where these can creep in, make sure
you're waiting for the right events. And make sure that once those events have
happened, you are in a state where the next call makes sense.
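For instance, `content::TestNavigationObserver` (a real helper in
`content/public/test`) wraps that waiting for you. A fragment, assuming
`web_contents` and `url` are already in hand:

```cpp
#include "content/public/test/test_navigation_observer.h"

// Wait for the navigation to finish instead of assuming it already has.
content::TestNavigationObserver observer(web_contents);
web_contents->GetController().LoadURL(url, content::Referrer(),
                                      ui::PAGE_TRANSITION_TYPED,
                                      std::string());
observer.Wait();  // Spins a RunLoop until the navigation completes.
```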
44:28 SHARON: That makes sense. You mentioned rewriting your classes so they're
more easily testable by a unit test. So what are common things you can do in
terms of how you write or structure your classes that make them more testable?
And just that seems like a general good software engineering practice to do.
44:50 STEPHEN: Yeah, absolutely. So one of the biggest ones I think we see in
Chromium is to not use singleton accessors to get at state. And what I mean by
that is, you'll see a lot of code in Chromium that just goes ahead and threw
some mechanism that says, hey, get the current web contents. And as you, I
think, you've talked about on this program before, web contents is this massive
class with all these methods. And so if you just go ahead and get the current
web contents and then go do stuff on that web contents, whatever, when it comes
to running a test, well, it's like, hold on. That's trying to fetch a real web
contents. But we're writing a unit test. What does that even look like? And so
the way around this is to do what we call dependency injection. And I'm sure as
I've said that word, a bunch of listeners or viewers have just recoiled in
fear. But we don't lean heavily into dependency injection in Chromium. But it
is useful for things like this. Instead of saying, go get the web contents,
pass a web contents into your class. Make a web contents available as an input.
And that means when you create the test, you can use a fake or a mock web
contents. We can talk about difference between fakes and mocks as well. And
then, instead of having it go do real things in real code, you can just be
like, no, no, no. I'm testing my class. When you call it web contents do a
thing, just return this value. I don't care about web contents. Someone else is
going to test that.
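A sketch of that injection, with an invented class:

```cpp
class PageTitleChecker {
 public:
  // The WebContents is an input rather than something fetched through a
  // global accessor, so a test can hand in a test double.
  explicit PageTitleChecker(content::WebContents* web_contents)
      : web_contents_(web_contents) {}

  bool TitleIsEmpty() const { return web_contents_->GetTitle().empty(); }

 private:
  content::WebContents* web_contents_;
};
```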
46:19 SHARON: Something else I've either seen or been told in code review is to
add delegates and whatnot.
46:25 STEPHEN: Mm-hmm.
46:25 SHARON: Is that a good general strategy for making things more testable?
46:25 STEPHEN: Yeah. It's similar to the idea of doing dependency injection by
passing in your web contents. Instead of passing in your web contents, pass in
a class that can provide things. And it's sort of a balance. It's a way to
balance, if you have a lot of dependencies, do you really want to add 25
different inputs to your class? Probably not. But you define a delegate
interface, and then you can mock out that delegate. You pass in that one
delegate, and then when delegate dot get web content is called, you can mock
that out. So very much the same goal, another way to do it.
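Sketched out with the same invented class, reworked to take a delegate:

```cpp
// One small interface in place of many constructor parameters.
class PageTitleCheckerDelegate {
 public:
  virtual ~PageTitleCheckerDelegate() = default;
  virtual content::WebContents* GetWebContents() = 0;
};

class PageTitleChecker {
 public:
  // Production passes a real delegate; tests pass a fake or mock whose
  // GetWebContents() returns whatever the test wants.
  explicit PageTitleChecker(PageTitleCheckerDelegate* delegate)
      : delegate_(delegate) {}

 private:
  PageTitleCheckerDelegate* delegate_;
};
```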
47:04 SHARON: That sounds good. Yeah, I think in general, in terms of Chrome
specifically, a lot of these testing best practices, making things testable,
these aren't Chrome-specific. These are general software engineering-specific,
C++-specific, and those you can look more into separately. Here we're mostly
talking about what are the Chrome things. Right?
47:24 STEPHEN: Yeah.
47:24 SHARON: Things that you can't just find as easily on Stack Overflow and
such. So you mentioned fakes and mocks just now. Do you want to tell us a bit
about the difference there?
47:32 STEPHEN: I certainly can do it. Though I want to caveat that you can also
just go look up those on Stack Overflow. But yeah. So just to go briefly into
it, there is - in testing you'll often see the concept of a fake version of a
class and also a mock version of a class. And the difference is just that a
fake version of the class is, what I'm going to call, a real class that you
write in C++. And you will probably write some code to be like, hey, when it
calls this function, maybe you keep some state internally. But you're not using
the real web contents, for example. You're using a fake. A mock is actually a
thing out of the Google test support library. It's part of a - Google mock is
the name of the sub-library, I guess, the sub-framework that provides this. And
it is basically a bunch of magic that makes that fake stuff happen
automatically. So you can basically say, hey, instead of a web contents, just
mock that web contents out. And the nice part about mock is, you don't have to
define behavior for any method you don't care about. So if there are, as we've
discussed, 100 methods inside web contents, you don't have to implement them
all. You can be like, OK, I only care about the do Foobar method. When that is
called, do this.
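A sketch of both flavors for an invented `Backend` interface - the fake is
hand-written, the mock comes from gMock:

```cpp
#include <map>
#include <string>

#include "testing/gmock/include/gmock/gmock.h"
#include "testing/gtest/include/gtest/gtest.h"

class Backend {
 public:
  virtual ~Backend() = default;
  virtual int Fetch(const std::string& key) = 0;
};

// Fake: a real implementation with just enough behavior for tests.
class FakeBackend : public Backend {
 public:
  int Fetch(const std::string& key) override { return values_[key]; }
  std::map<std::string, int> values_;
};

// Mock: gMock generates the machinery; stub only what the test cares about.
class MockBackend : public Backend {
 public:
  MOCK_METHOD(int, Fetch, (const std::string& key), (override));
};

TEST(BackendTest, MockReturnsCannedValue) {
  MockBackend backend;
  EXPECT_CALL(backend, Fetch("answer")).WillOnce(testing::Return(42));
  EXPECT_EQ(backend.Fetch("answer"), 42);
}
```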
48:51 SHARON: Makes sense. One last type of test, which we don't hear about
that often in Chrome but does exist quite a bit in other areas, is manual
testing. So do we actually have manual testing in Chrome? And if so, how does
that work?
49:03 STEPHEN: Yeah, we actually do. We're slightly crossing the boundary here
from the open Chromium into the product that is Google Chrome. But we do have
manual tests. And they are useful. They are a thing. Most often, you will see
this in two cases as a Chrome engineer. You basically work with the test team.
As I said, it's all a little bit internal now. But you work with the test team to
define a set of test cases for your feature. And these are almost always
end-to-end tests. So go to this website, click on this button, you should see
this flow, this should happen, et cetera. And sometimes we run these just as
part of the launch process. So when you're first launching a new feature, you
can be like, hey, I would love for some people to basically go through this and
smoke test it, make sure that everything is correct. Some things we test every
release. They're so important that we need to have them tested. We need to be
sure they work. But obviously, all of the caveats about manual testing out
there in the real world, they apply equally to Chromium or to Chrome. Manual
testing is slow. It's expensive. We require people - specialized people that we
have to pay and who have to sit there, and click on things, and that sort
of thing, and file bugs when it doesn't work. So wherever possible, please do
not write manual tests. Please write automated testing. Test your code, please.
But then, yeah, it can be used.
50:33 SHARON: In my limited experience working on Chrome, the only place that
I've seen there actually be any level of dependency on manual test has been in
accessibility stuff -
50:38 STEPHEN: Yeah.
50:38 SHARON: which kind of makes sense. A lot of that stuff is not
necessarily - it is stuff that you would want to have a person check because,
sure, we can think that the speaker is saying this, but we should make sure
that that's the case.
50:57 STEPHEN: Exactly. I mean, that's really where manual test shines, where
we can't integration test accessibility because you can't test the screen
reader device or the speaker device. Whatever you're using, we can't test that
part. So yes, you have to then have a manual test team that checks that things
are actually working.
51:19 SHARON: That's about all of our written down points to cover. Do you have
any general thoughts, things that you think people should know about tests,
things that people maybe ask you about tests quite frequently, anything else
you'd like to share with our lovely listeners?
51:30 STEPHEN: I mean, I think I've covered most of them. Please write tests.
Write tests not just for code you're adding but for code you're modifying, for
code that you wander into a directory and you say, how could this possibly
work? Go write a test for it. Figure out how it could work or how it couldn't
work. Writing tests is good.
51:50 SHARON: All right. And we like to shout out a Slack channel of interest.
Which one would be the - which one or ones would be a good Slack channel to
post in if you have questions or want to get more into testing?
52:03 STEPHEN: Yeah. It's a great question. I mean, I always like to - I think
it's been called out before, but the hashtag #halp channel is very useful for
getting help in general. There is a hashtag #wpt channel. If you want to go ask
about web platform tests, that's there. There's probably a hashtag #testing.
But I'm going to admit, I'm not in it, so I don't know.
52:27 SHARON: Somewhat related is there's a hashtag #debugging channel.
52:27 STEPHEN: Oh.
52:27 SHARON: So if you want to learn about how to actually do debugging and
not just do log print debugging.
52:34 STEPHEN: Oh, I was about to say, do you mean by printf'ing everywhere in
your code?
52:41 SHARON: [LAUGHS] So there are a certain few people who like to do things
in an actual debugger or enjoy doing that. And for a test, that can be a useful
thing too - a tool to have. So that also might be something of interest. All
right, yeah. And kind of generally, as you mentioned, a lot of things are your
opinion. And it seems like we currently don't have a style guide for tests or
best practices kind of thing. So how can we -
53:13 STEPHEN: [LAUGHS] How can we get there? How do we achieve that?
53:19 SHARON: How do we get one?
53:19 STEPHEN: Yeah.
53:19 SHARON: How do we make that happen?
53:19 STEPHEN: It's a hard question. We do - there is documentation for
testing, but it's everywhere. I think there's `/docs/testing`, which has some
general information. But so often, there's just random READMEs around the code
base that are like, oh, hey, here's the content public test API surface. Here's
a bunch of useful information you might want to know. I hope you knew to look
in this location. Yeah, it's a good question. Should we have some sort of
process for - like you said, like a style guide but for testing? Yeah, I don't
know. Maybe we should enforce that people dependency inject their code.
54:04 SHARON: Yeah. Well, if any aspiring test nerds want to really get into
it, let me know. I have people who are also interested in this and maybe can
give you some tips to get started. But yeah, this is a hard problem and
especially with so many types of tests everywhere. I mean, even just getting
one for each type of test would be useful, let alone all of them together. So
anyway - well, that takes us to the end of our testing episode. Thank you very
much for being here, Stephen. I think this was very useful. I learned some
stuff. So that's cool. So hopefully other people did too. And, yeah, thanks for
sitting and answering all these questions.
54:45 STEPHEN: Yeah, absolutely. I mean, I learned some things too. And
hopefully we don't have too many angry emails in our inbox now.
54:52 SHARON: Well, there is no email list, so people can't email in if they
have issues. [LAUGHTER]
54:58 STEPHEN: If you have opinions, keep them to yourself -
54:58 SHARON: Yeah. [INAUDIBLE]
54:58 STEPHEN: until Sharon invites you on her show.
55:05 SHARON: Yeah, exactly. Yeah. Get on the show, and then you can air your
grievances at that point. [LAUGHS] All right. Thank you.

@ -0,0 +1,923 @@
# Whats Up With BUILD.gn
This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 5, a 2023 video discussion between [Sharon (yangsharon@chromium.org)
and Nico (thakis@chromium.org)](https://www.youtube.com/watch?v=NcvJG3MqquQ).
The transcript was automatically generated by speech-to-text software. It may
contain minor errors.
---
Building Chrome is an integral part of being a Chrome engineer. What actually
happens when you build Chrome, and what exactly happens when you run those
build commands? Today, we have Nico, who was responsible for making Ninja the
Chrome default build system, to tell us more.
Notes:
- https://docs.google.com/document/d/1iDFqA3cZAUo0TUFA69cu5wEKL4HjSoIGfcoLIrH3v4M/edit
---
00:00 SHARON: Hello, and welcome to "What's Up With That," the series that
demystifies all things Chrome. I'm your host, Sharon, and today, we're talking
about building Chrome. How do you go from a bunch of files on your computer to
running a browser? What are all the steps involved? Our special guest today is
Nico. He's responsible for making Ninja the Chrome default build system, and
he's worked on Clang and all sorts of areas of the Chrome build. If you don't
know what some of those things are, don't worry. We'll get into it. Welcome,
Nico.
00:29 NICO: Hello, Sharon, and hello, internet.
00:29 SHARON: Hello. We have lots to cover, so let's get right into it. If I
want to build Chrome at a really quick overview, what are all the steps that I
need to do?
00:41 NICO: It's very easy. First, you download `depot_tools` and add that to
your path. Then you run `fetch chromium`. Then you type `cd src`, run `gclient
sync`, `gn gen out/GN`, and `ninja -C out/GN chrome`. And that's it.
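Spelled out as commands (the build directory name is up to you):

```sh
fetch chromium              # from depot_tools; downloads the source
cd src
gclient sync                # pull the dependencies listed in DEPS, run hooks
gn gen out/GN               # generate ninja files into the build directory
ninja -C out/GN chrome      # build the "chrome" target
```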
00:53 SHARON: Wow. Sounds so easy. All right. We can wrap that up. See you guys
next time. OK. All right. Let's take it from the start, then, and go over in
more detail what some of those things are. So the first thing you mentioned is
`depot_tools`. What is that?
01:11 NICO: `depot_tools` is just a collection of random utilities for - like,
back in the day, for managing subversion repositories, nowadays for pulling
things from git. It contains Ninja and GN. Just adds a bunch of stuff to your
path that you need for working on Chrome.
01:25 SHARON: OK. Is this a Chrome-specific thing, or is this used elsewhere,
too?
01:33 NICO: In theory, it's fairly flexible. In practice, I think it's mostly
used by Chromium projects.
01:39 SHARON: OK, all right. And there, you mentioned Ninja and GN. And for
people - I think most people who are watching this have built Chrome at some
point. But what is the difference between Ninja and GN? Because you have your
build files, which are generally called Build.gn, and then you run a command
that has Ninja in it. So are those the same thing? Are those related?
01:57 NICO: Yes. So GN is short for Generate Ninja. So Ninja is a build system.
It's similar to Make. It basically gets a list of source files and a list of
build outputs. And then when you run Ninja, Ninja figures out which build steps
do I have to run, and then it runs them. So it's kind of like Make but simpler
and faster. And then GN - Ninja doesn't have any conditionals or anything, so
GN describes the build. And then it generates Ninja files.
02:34 SHARON: OK.
02:34 NICO: So if you want to do, like, add these files only if you're building
for Windows, this is something you can do, say, in GN. But then it only
generates a Windows-specific Ninja file.
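For example, a `BUILD.gn` snippet like this (target and file names invented)
only adds the extra source when targeting Windows:

```gn
source_set("my_feature") {
  sources = [ "my_feature.cc" ]
  if (is_win) {
    sources += [ "my_feature_win.cc" ]
  }
}
```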
02:46 SHARON: All right. And in terms of when you mention OS, so there's a
couple places that you can specify different arguments for how you build
Chrome. So you have your gclient sync - sorry, your gclient file, and then you
have a separate args.gn. And in both of these places, you can specify different
arguments. And for example, the operating system you use - that can be
specified in both places. There's an OS option in both. So what is the purpose
of the gclient file, and what is the purpose of the args.gn file?
03:25 NICO: Yes. So gclient reads the DEPS file that is at the root of the
directory, and the DEPS file basically specifies dependencies that Chrome pulls
in. It's kind of similar to git submodules, but it predates git, so we don't
use git submodules also for other reasons. And so if you run gclient sync, that
reads the DEPS file at the Chrome root, and that downloads a couple hundred
repositories that Chrome depends on. And then it executes a bunch of so-called
hooks, which are just Python scripts, which also download a bunch of more
stuff. And the hooks and the dependencies are operating system dependent, so
gclient needs to know the operating system. But the build also needs to know
the operating system. And GN args are basic things that are needed for the
builds. So the OS is something that's needed in both places, but many GN args
gclient doesn't need to know about. For example, if you enable DCHECKs, like
Peter discussed a few episodes ago, that's a GN-only thing.
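A sketch of an `args.gn` (it lives in the build directory, e.g.
`out/GN/args.gn`):

```gn
target_os = "android"    # the OS knob; gclient has its own copy for DEPS/hooks
is_debug = false
dcheck_always_on = true  # the DCHECK flag from the episode mentioned above
use_goma = true          # send compiles to the remote build service
```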
04:26 SHARON: All right. That sounds good. So let's see. When you actually -
OK. So when you run Chrome and you - say you build Chrome, right? A typical
example of a command to do that would be, say, `autoninja -C out/default
content`, right? And let's just go through each part of that and say what each
of those things is doing and what happens there. Because I think that's just an
example from one of the starter docs. That's just the copy and paste command
that they give you. So autoninja seems like it's based on Ninja. What is the
auto they're doing for us?
05:15 NICO: Yeah. So autoninja is also one of the things that's just
`depot_tools`. It's a very - or it used to be a very thin wraparound Ninja. Now
it's maybe a little thicker, but it's optional. You don't have to use autoninja
if you don't want to. But what it does is basically - like, it helps - So
Chrome contains a lot of code. So we have this system called Goma, which can
run all the C++ compilations in a remote data center. And if you do use the
system, then you want to build with a very high build parallelism. You want to,
say, `-j 1000` or what and run, like, a thousand build processes in parallel. But
if you're building locally, you don't want to do that. So what autoninja
basically does - it looks at your args.gn file, sees if you have enabled Goma,
and if so, it runs Ninja with many processes, and else, it runs it with just
one process per core, or something like that. So that's originally all that
autoninja does. Nowadays, I think it also uploads a bunch of stuff. But you can
just run `which autoninja`, and that prints some path, and you can just open that
in the editor and read it. I think it's still short enough to fairly quickly
figure out what it does.
06:17 SHARON: OK. What does `-C` do? Because I think I've been using that this
whole time because I copied and pasted it from somewhere, and I've just always
had it.
06:28 NICO: It says - it changes the current directory where Ninja runs, like
in Make. So it basically says, change the current directory to out/GN, or
whatever your build directory is, and then run the build from there. So for
Chrome, the build always - the current directory during the build is always the
build directory. And then Ninja looks for a file called build.ninja in the
current directory, so GN writes build.ninja to out/GN, or whatever your build
directory is. And then Ninja finds it there and reads it and does its thing.
06:57 SHARON: All right. So the next part of this would be out/default, or out
slash something else. So what are out directories, and how do we make use of
them?
07:11 NICO: An out directory - it's just a build directory. That's where all
the build artifacts go to, all the generated object files, executables, random
things that are generated during the build. So it can be any directory, really.
You can make up any directory name that you like. You can build your Chrome in,
I don't know, fluffy/kitten, or whatever. But I think most people use out just
because it's in the global `.gitignore` file already. Then you want to use
something that's two directories deep so that the path from the directory to
the source is always `../..`. And that makes sure that this is deterministic.
We try to have a so-called deterministic build, where you get exactly the same
binary when you build Chrome at the same revision, independent of the host
machine, more or less. And the path from the build directory to the source file
is something that goes into debug info. So if you want to have the same build
output as everyone else, you want a build directory path that's two directories
deep. And the names of those two directories don't really matter. So what
some people do is they use out/debug for the debug builds and out/release for
their release builds. But it's really up to you.
08:26 SHARON: Right. Other common ones are, like, yeah. ASan is a common one,
different -
08:33 NICO: Right.
08:33 SHARON: OSes. Right. So you mentioned having a deterministic build. And
assuming you're on the same version of Chrome, at the same checkout,
tip-of-tree, or whatever as someone else, I would have expected that all of the
builds are just deterministic, but maybe that's because of work that people
like you and the build team have done. But what are things that could cause
that to be nondeterministic? Because you have all the same files. Where is the
actual nondeterminism coming from? Or is it just different configurations and
setups you have on your local machine?
09:09 NICO: Yeah, that's a great question. I always thought this would be very
easy to - but turns out it mostly isn't. We wrote a very long blog post that we
can link to from the show notes about this. But there's many things that can
go wrong. Like for example, in C++, there's the preprocessor macro `__DATE__`,
which embeds the current date into the build output. So if you do that, then
you're time dependent already. By default, I think you end up with absolute
paths to everything in debug information. So if you build under
`/home/sharon/blah`, then that's already different from all the people who are
not called Sharon. Then there's - we run tools as part of the build that
produce output. For example, the protobuf compiler or whatnot. And so if that
binary iterates over some map, some hash map, and that doesn't have
deterministic iteration order, then the output might be different. And there's
a long, long, long, long, long list of things. Making the build deterministic
was a very big project, and there's still a few open things.
10:08 SHARON: OK, cool. So I guess it's - yeah, it's not true nondeterminism,
maybe, but there's enough factors that go into it that to a typical person
interacting with it, it does seem -
10:21 NICO: Yeah, but there's also true nondeterminism. Like, every now and
then, when we update the compiler, the compiler will write different object
files on every run just because the compiler internally iterates about some -
over some hash map. And then we have to complain upstream, and then they fix
it.
10:34 SHARON: OK. Oh, wow. OK. That's very cool. Well, thank you for dealing
with this kind of stuff so people like us don't have to worry about it. OK. And
the last part of our typical build thing is content. So what is content in this
context? If you want to learn about content more in general, check out
episode 3. But in this case, what does that mean?
10:58 NICO: So just a build target. So I think people - at least I usually
build some executable. I usually build, I don't know, `base_unittests` or
`unit_tests` or Chrome or content shell or what. And it's just - so in the
Ninja files, there's basically - there's many, many lines that go, if you want
to build this file, you need to have these inputs and then run this command. If
you want to build this file, instead, you need these other files. You need to
run this other command. So for example, if you want to build `base_unittests`,
you need a couple thousand object files, and then you need to run the linker
on what's in there. And so if you tell Ninja - the last thing you give it -
basically, it tells Ninja, what do you want to build? So if you say, `ninja -C
out/GN content_shell` or what, then Ninja is like, let's look at the line that
says `content_shell`. And then it checks - I need these files, so it builds all
the prerequisites, which usually means compiling a whole bunch of files. And
then it runs the final command and runs the linker. So Ninja basically decides
what it needs to do and then invokes other commands to do the actual work.
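Heavily simplified, the generated build.ninja entries look roughly like this;
Ninja walks the graph backwards from whatever target you name on the command
line:

```ninja
build obj/foo/foo.o: cxx ../../foo/foo.cc
build obj/foo/bar.o: cxx ../../foo/bar.cc
build chrome: link obj/foo/foo.o obj/foo/bar.o
```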
12:08 SHARON: OK, makes sense. So say I run the build - so say I built the
target Chrome, which is the one that actually is an executable - if you run
that, you get the browser. So say I've built the Chrome build target. How do I
run that now?
12:31 NICO: Well, it's written - so normally, the thing you give to Ninja is
actually a file name. And the `-C` changes the current directory. So if you say, `-C
out/release chrome`, then this creates the file `out/release/chrome`. It just
creates that file in the out directory. So to run that, you just run
`out/release/chrome`, and hopefully it'll start up and work.
12:54 SHARON: Great. Sounds so easy. So you mentioned earlier something called
Goma, which had remote data centers and stuff. Is this something that's
available to people who don't work at Google, or is this one of the
Google-specific things? Because I think so far, everything mentioned is anyone,
anywhere can do all this. Is that the case with Goma, also?
13:14 NICO: Yeah. For the other things - so Ninja is actually something that
started in Chrome land, but that's been fairly widely adopted across the world.
Like, that's used by many projects. But yeah, Goma - I think it's kind of like
distcc. Like, it's a distributed compiler thing. I think the source code for
both the client and the server are open source. And we can link to that. But
the access to the service, I think, isn't public. So they have to work at
Google or at a partner company. I think we hand out access to a few partners.
And as far as I know, there's a few independent implementations of the
protocol, so other people also use something like Goma. But as far as I know,
these other services also aren't public.
13:53 SHARON: OK. Right. Yeah, because I think one of the main things is - I
mean, as someone who did an internship on Chrome, after, I was like, I'll
finish some of these remaining to do items once I go back to school, right? And
then I started to build Chrome on my laptop, just a decent laptop, but still a
laptop, and I was like, no, I guess I won't be doing that.
14:17 NICO: No, it's doable. You just need to be patient and strategic. Like, I
used to do that every now and then. You have to start the build at night, and
then when you get up, it's done. And if you only change one or two CC files,
it's reasonably fast. It's just, full builds take a very long time.
14:29 SHARON: Yeah, well, yeah. There was enough stuff going on that I was
like, OK. We maybe won't do this. Right. Going back to another thing you
mentioned is the compiler and Clang. So can you tell us a bit more about Clang
and how compiling fits into the build process?
14:50 NICO: Yeah, sure. I mean, compiling just means - almost all of Chrome
currently is written in C++, and compiling just means taking a CC file, like a
C++ file, and turning it into - turning that into an object file. And there are
a whole bunch of C++ compilers. And back in the day, we used to use many, many
different C++ compilers, and they're all slightly different, so that was a
little bit painful. And then the C++ language started changing more frequently,
like with C++ 11, 14, 17, 20, and so on. And so that was a huge drain on
productivity. Updating compilers was always a year-long project, and we had to
update, like, seven different compilers, one each on Android, iOS, Windows,
macOS, Fuchsia, whatnot. So over time, we moved to - we moved from using
basically the system compiler to using a hermetically built Clang that we
download as a gclient DEPS hook. So when you run gclient sync, that downloads a
prebuilt Clang binary. And we use that Clang binary to build Chrome on all
operating systems. So if one file builds for you on your OS, then chances are
it'll build on all the other OSes because it's built by the same compiler. And
that also enables things like cross builds, so you can build Chrome for Windows
on Linux if you want to because your compiler is right there.
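As a rough illustration of the cross-build idea mentioned here, an `args.gn`
along these lines asks for a Windows build from a Linux host. Treat the exact
values as an example rather than a recommended configuration:

```gn
# Example args.gn for building Chrome for Windows on a Linux machine.
target_os = "win"
target_cpu = "x64"
```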
16:11 SHARON: Oh, cool. All right. I didn't know that. Is there any reason,
historically, that Clang beat out these other compilers as the compiler of
choice?
16:24 NICO: Yes. So it's basically - I think when we looked at this - so Clang
is basically the native compiler on macOS and iOS, and GCC is kind of the
system compiler on Linux, I suppose. But Clang has always had very good GCC
compatibility. And then on Windows, the default choice is Visual Studio. And we
still want to link against the normal Microsoft library, so we need a compiler
that's ABI-compatible with the Microsoft ABI. And GCC couldn't do that. And
Clang also couldn't do that, but we thought if we teach Clang to do that, then
Clang basically can target all the operating systems we care about. And so we
made Clang work on Windows, also with others. But there was a team funded by
Chrome that worked on that for a few years. And also, Clang has a pretty good
tooling interface. So for code search, we also use Clang. So we now use the
same code to compile Chrome and to index Chrome for code search.
17:28 SHARON: Oh, cool. I didn't know that either, so very interesting. OK.
We're just going to keep going back. And as you mention more things, we'll
cover that, and then go back to something you previously mentioned. So next on
the list is gclient sync. So I think for everyone who's ever worked on Chrome,
ever, especially at the start, you're like, I'll build Chrome. You build your
target, and you get these weird errors. And you look at it, and you think, oh,
this isn't some random weird spot that I definitely didn't change. What's going
on? And you ask a senior team member, and they say to you, did you run gclient
sync? And you're like, oh, I did not. And then you run it, and suddenly, things
are OK. So what else is going - you mentioned a couple of things that happen.
So what exactly does gclient sync do?
18:13 NICO: Yeah. So as I said - there's this file at the source root called DEPS,
D-E-P-S, all capital letters. And when you update - if you git pull the Chrome
repository, then that also updates the DEPS file. And then this DEPS file
contains a long list of revisions of dependent projects. And then when you run
gclient sync, it basically syncs all these other git repositories that are
mentioned in the DEPS file. And after that, it runs so-called hooks, which do
things like download a new Clang compiler and download a bunch of other
binaries from something called CIPD - for example, GN. But yeah, it basically
makes sure
that all the dependencies that are in Chrome but that aren't in the Chrome
repository are also up to date. That's what it does.
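To make that concrete, DEPS entries and hooks look roughly like this. This is
an illustrative sketch - the `libfoo` repository and revision are made up, and
the Clang hook is only a plausible example of what a hook entry looks like:

```python
# Illustrative fragment of a DEPS file: pin a dependent repo to a revision.
deps = {
  'src/third_party/libfoo/src':
      'https://chromium.googlesource.com/external/libfoo.git@' +
      'deadbeefdeadbeefdeadbeefdeadbeefdeadbeef',
}

# Hooks run after syncing, e.g. to download prebuilt binaries.
hooks = [
  {
    'name': 'clang',
    'pattern': '.',
    'action': ['python3', 'src/tools/clang/scripts/update.py'],
  },
]
```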
19:06 SHARON: OK. Do you have a rough ballpark guess of how many dependencies
that includes?
19:13 NICO: Its operating system dependent. I think on Android we have way
more, but it's on the order of 200. Like, 150 to 250.
19:25 SHARON: Sounds like a lot. Very cool. OK. In terms of - speaking of other
dependencies, one of the top-level directories in Chrome is `//third_party`,
and that seems in the same kind of direction. So how does stuff in
`//third_party` work in terms of building? Can you just build them as targets?
What kind of stuff is in there? What can you and can you not build? Like, for
example, Blink is one of the things in `//third_party`, and lots of people -
that's a big part of it, right? But a lot of things in there are less active
and probably less big of a part of Chrome. So does `//third_party` just build
anything else, or what's different about it?
20:09 NICO: And that's a great question. So Blink being in `//third_party` is a
bit of a historical artifact. Like, most things - almost all of the things in
`//third_party` is basically third-party code. That's code that we
didn't write ourselves. And Chrome's secret mission is to depend on every other
library out there in the world. No, we depend on things like libpng for reading
PNG files, libjpeg for reading all of - libjpeg-turbo these days, I guess, for
reading JPEG files, libxml for reading XML, and so on. And, well, that's many
dependencies. I won't list them all. And some of these third-party dependencies
are just listed in the DEPS file that we talked about. And so they basically -
like, when gclient sync runs, it pulls the code from some git repository that
contains the third-party code and puts it into your source tree. And for other
third-party code, we actually check in the code into the Chrome main repository
instead of DEPSing it in. There are trade-offs to which approach to choose. We do
both from time to time. But yeah. Almost no third-party dependency has a GN
file upstream, so usually what you do is you have to write your own BUILD.gn
file for the third-party dependency you want to add. And then after that, it's
fairly normal. So for a library, if you want to add a dependency on libfoo,
usually what we do is you add - you create third-party libfoo, and you put
BUILD.gn in there. And then you add a DEPS entry that syncs the actual code to
a third-party libfoo source or something. Yes.
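A minimal sketch of what such a hand-written BUILD.gn might contain, using the
hypothetical `libfoo` from the discussion - the file names and include
directory are placeholders:

```gn
# Hypothetical third_party/libfoo/BUILD.gn wrapping DEPSed-in source.
static_library("libfoo") {
  sources = [
    "src/foo.c",
    "src/include/foo.h",
  ]
  include_dirs = [ "src/include" ]
}
```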
21:37 SHARON: All right. Sounds good. Again, you mentioned BUILD.gn files, and
that's, as expected, a big part of how building works. And that's probably the
part that most people have interacted more with, outside of just actually
running whatever command it is to build Chrome. Because if you create, delete,
rename any files, you have to update it in some BUILD.gn file. So can you walk
us through the different things contained in a BUILD.gn file? What are all the
different parts?
22:12 NICO: Sure. So there's a great presentation by Brett, who wrote GN, that
we can link to. But in a summary, it's - BUILD.gn contains build targets, and
the build target normally is like - it doesn't have to be, but usually, it's a
list of CC files that belong together and that either make up a static library
or a shared library or an executable. So those are the main target types for CC
code. But then you can also have custom build actions that run just arbitrary
Python code, which, for example, if you compile a protobuf - proto files into
CC and H - into C++ and header files, then we basically have a Python script
that runs protoc, the proto compiler, to produce those. And so in that case,
the action generates C++ files, and then those get built. But the other, simple
answer is libraries or executables.
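As a rough sketch of the target types just listed - all names here are
hypothetical, and a real action would usually also pass `args` telling the
script where to write its output:

```gn
# An executable made of C++ files, depending on a static library.
executable("demo") {
  sources = [ "main.cc" ]
  deps = [ ":demo_lib" ]
}

static_library("demo_lib") {
  sources = [
    "demo.cc",
    "demo.h",
  ]
}

# A custom action: run a Python script that generates a header.
action("generate_version") {
  script = "make_version.py"
  outputs = [ "$target_gen_dir/version.h" ]
}
```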
23:11 SHARON: OK. One part of GN files that has caused me personally some
confusion and difficulty - which I think is maybe, depending on the part of
Chrome you work on, less of an issue - is DEPS. So you have DEPS in your GN
files, and there's also something called external DEPS. And then you have
separate DEPS files that are just called capital D-E-P-S.
23:30 NICO: Yes. Yes, that's somewhat redundant - that's, again, I guess for
historical reasons. So in GN, `deps` just means: to build this target, you
first have to build these other targets. Like, this target depends on - uses
this other code. And in different contexts, it kind of means different things. So
for example - I think if an executable depends on some other target, then that
external executable is linked - that other target is also linked in. If base
unit test depends on the base library, which in a normal build is a static
library - like in a normal build? Like in a release build, by default, it's a
static library. And so if base unit test is built, it first creates a static
library and then links to it. And then base itself might depend on a bunch of
third-party things, libraries, which means when base unit tests is linked, it
links base, but then it also links against base's dependencies. So that's one
meaning of DEPS. Another meaning, like these capital DEPS files, that's
completely distinct. Has nothing to do with GN, I'm sad to say. And that's just
for enforcing layering. Those predate GN, and they are for enforcing layering
at a fairly coarse level. They say, code in this directory may include code
from this other directory but not from this third directory. For example -
like, Blink may include stuff from base, but must not include anything from, I
don't know, the Chrome layer or something.
25:18 SHARON: Right, the classic content Chrome layering, where Chrome -
25:18 NICO: Right. And I think -
25:18 SHARON: content, but -
25:18 NICO: Right. And there's a step called check-deps, and that checks the
capital DEPS files.
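For illustration, a capital-DEPS file is a short list of include rules along
these lines - the directory names are just examples:

```python
# A capital-DEPS file for checkdeps: directory-level include rules.
include_rules = [
  "+base",     # code in this directory may include headers from //base
  "-chrome",   # ...but must not include anything from //chrome
]
```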
25:24 SHARON: OK. Yeah, because before, I worked on some Fuchsia stuff, and
because we're adding a lot of new things, you're messing around with different
DEPS and stuff a lot more than I think if you worked in a typical part. Like,
now, I mostly just work within content. Unlikely that you're changing any
dependencies. But that was always a bit unclear because, for the most part, the
targets have very similar names - not exactly the same, but very similar. And
if you miss one, you get all these weird errors. And it was, yeah, generally
quite confusing.
25:55 NICO: Yeah, that's pretty confusing. One thing of the capital DEPS things
that they can do that the GN DEPS can't is if someone adds a DEPS on your
library and they add an entry to their DEPS file, that means that now at code
review time, you need to approve that they depend on you. And that's not
something we can do at the GN level. And the advantage there is, I don't know,
if you have some library and then 50 teams start depending on it without
telling you, and now you're on the hook for keeping all these 50 things
working, then with this system, you at least have to approve every time someone
adds a dependency on you, you have to say, this is fine with me. Or you can
say, actually, this is - we don't want this to be used by anyone else.
26:45 SHARON: Is there an ideal state where we don't have these DEPS files and
maybe that functionality is built into the BUILD.gn files, or is this something
that's probably going to be sticking around for a while?
26:52 NICO: That's a great question. I don't know. It seems weird, right? It's
redundant. So I think the current system isn't ideal, but it's also not
horrible enough that we have to fix it immediately. So maybe one day we'll get
around to it.
27:10 SHARON: Yeah. I think I've mostly just worked on Chrome, so I've gotten
pretty used to it. But a common complaint is people who work in Google internal
things or other, bigger - the main build system of whatever company they work
on, they come to Chrome and they're like, oh, everything's so confusing. But if
you - you just got to get used to it, but -
27:27 NICO: Right. I think if you're confused by anything, it's great if you
come to us and complain. Because you kind of become blind to these problems,
right? I've been doing this for a long time. I'm used to all the foot guns. I
know how to dodge them. And yeah. So if you're confused by anything, please
tell me personally. And then if enough people complain about something, maybe
we'll fix it.
27:55 SHARON: All right. Yeah. That's what you said. The outcome of that -
we'll see. We'll see how that goes. We'll see how many complaints you suddenly
get. Right. OK. So another thing I was interested in is right now there's a lot
of work around Rust, getting more Rust things, introducing that, memory safety,
that's good. We like it. What is involved from a build perspective for getting
a whole other language into Chrome and into the build? Because we have most of
the things C++. There's some Java in all of the Android stuff. And in some
areas, you see - you'll see a list of - you'll see a file name, and then you'll
see file name underscore and then all the different operating systems, right?
And most those are some version of C++. The Mac ones are .mm. And you have Java
ones for Android. But if you want to add an entirely different language and
still be able to build Chrome, at a high level, what goes into that?
29:00 NICO: Yeah, there's also some Swift on iOS. It's many different things.
So at first, you have to teach GN how to generate Ninja files for that
language. So when a CC file is built, then basically the compiler writes out a
file that says, here are all the header files I depend on. So if one of them
gets touched, the compiler - or Ninja knows how to rebuild those. So you need
to figure out how the Rust compiler or the Swift compiler track dependencies.
You need to get that information out of the compiler into the build system
somehow. And C++ is fairly easy to build. It's like a per-file basis. I think
most languages are more on a module or package base, where you build a few
files as a unit. Then you might want to think about, how can I make this work
with Goma so that the compilation can work remotely instead of locally? So
that's the build system part. Then also, especially for us, we want to use this
for some performance critical things, so it needs to be very fast. And we use a
bunch of toolchain optimization techniques to make Chrome very fast with
three-letter acronyms, such as PGO and LTO and whatnot. And LTO in particular,
that means Link Time Optimization. That means the C++ or the Rust code is
built - is compiled into something called "bitcode." And then all the bitcode
files at link time are analyzed together so you can do cross-file in-lining and
whatnot. And for that work, the bitcodes - all the bitcode versions need to be
compatible, which means Clang and Rust need to be built against the same
version of LLVM, which is some - it's some internal compiler machinery that
defines the bitcode. So that means you have to - if you want to do
cross-language LTO, you have to update your C++ compiler and your Rust compiler
at the same time. And you have to build them at the same time. And when you
update your LLVM revision, it must break neither the C++ compiler nor the Rust
compiler. Yeah. And then you kind of want to build the Rust library from
source, so you have bitcode for all of that. So it's a fairly involved thing - but
yeah, we've been doing a lot of work on that. Not me, but other people.
31:24 SHARON: Right. Sounds hard. And what does LTO stand for, since you used
it?
31:30 NICO: Link Time Optimization.
31:30 SHARON: All right.
31:30 NICO: And there's a blog post on the Chromium blog about this that we can
link to in the show notes that has a fairly understandable explanation what
this does.
31:43 SHARON: Yeah, all right. That sounds good. So linking, that was my next
question. As you build stuff, you sort out all of your just compile errors, you
got all your spelling mistakes out. The next type of error you might get is
linking error. So how does - can you tell us a bit more about linking in
general and how that fits into the build process?
32:01 NICO: I mean, linking - like, for C++, the compiler basically produces
one object file for every CC file. And then the linker takes, like, about
50,000 to 100,000 object files and produces a single executable. And every
object file has a list of functions that are defined in that object file and a
list of functions that are undefined in that object file that it calls that are
needed from elsewhere. And then the linker basically makes one long list of all
the functions it finds. And at the end, all of them should be defined, and all
the non-inline ones should be defined in exactly one object file. And if
they're not - if that doesn't happen, then it emits an error, and else, it
emits a binary. And the linker is kind of interesting because the only thing
you really care about is that it does its job very quickly. But it has to read
through gigabytes of data before it writes the executable. And currently, we
use a linker called `lld`, which was also written by people on the Chrome team,
and which is also fairly popular outside of Chrome nowadays. And so we wrote an
ELF linker, which is the file format used on Linux and Android, and a COFF
linker, which is the file format used on Windows, and our own Mach-O linker,
which is the file format on Apple - macOS and iOS. And our linkers are way,
way, way faster than the things that they replace. On Windows, we were, like,
10 times faster than the Windows linker. And on Mac, we're, like, four times
faster than the system linker and whatnot. The other linker vendors have caught
up a little bit, but we - I feel like Chrome has really advanced the state and
performance of linking binaries across the industry, which I think is really
cool.
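To make the defined/undefined-symbol bookkeeping described above concrete,
here's a tiny sketch. Any C++ compiler would do, and the exact error text
depends on which linker you're using:

```shell
# Sketch: f() is declared and called, but defined in no object file.
cat > main.cc <<'EOF'
void f();
int main() { f(); }
EOF

clang++ -c main.cc       # compiles fine; f is recorded as an undefined symbol
clang++ main.o -o main   # link step fails with an undefined-symbol error for f
```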
33:44 SHARON: Yeah, that is really cool. And in a kind of similar vein to the
different OSes and all that kind of stuff is 32- versus 64-bit. There's some
stuff happening. I've seen people talk about it. It seems pretty important. Can
you just tell us a bit more about this in general?
34:04 NICO: Well, I guess most processors sold in the last decade or so are
64-bit. So I think on some platforms, we only support 64-bit binaries, like -
and the bit just means how wide is a pointer and has some implications on which
instructions can the compiler use. But it's fairly transparent too, I think, at
the C++ level. You don't have to worry about it all that much. On macOS, we
only support 64-bit builds. Same on iOS. On Windows, we still have 32-bit and
64-bit builds. On Linux, we don't publicly support 32-bit, but I think some
people try to build it. But it's really on Windows where you have both 32-bit
and 64-bit builds. But the default is 64-bit, and if you say target_cpu equals
x86, I think, in your args.gn, then you get a 32-bit build.
But it should be fairly transparent to you as a developer, unless you write
assembly.
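Concretely, that args.gn line would look like this - a sketch, with the build
directory name up to you:

```gn
# In the build directory's args.gn: request a 32-bit build.
# (The default target_cpu is the 64-bit one for your host.)
target_cpu = "x86"
```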
35:02 SHARON: How big of an effort would it be to get rid of 32-bit on Windows?
Because Windows is probably the biggest Chrome-using platform, and also,
there's a lot of versions out there, right? So -
35:15 NICO: Oh, yeah.
35:15 SHARON: How doable?
35:15 NICO: I think that the biggest platform is probably Android. But yeah,
Android is also 32-bit, at least on some devices at the moment. That's true. I
don't know. I think we've looked into it and decided that we don't want to do
that at the moment. But I don't know details.
35:33 SHARON: And you mentioned ARM. So is there any - how much does the Chrome
build team - are they concerned with the architecture of these processors? Is
that something that, at the level that you and the build team have to worry
about, or is it far enough - a few layers down that that's -
35:47 NICO: It's something we have to worry about at the toolchain team. So we
update the Clang compiler every two weeks or so, which means we pull in all -
around 1,000 changes from upstream contributors that work on LLVM, spread
across many companies. And we have to make sure this doesn't break on 32-bit ARM,
64-bit ARM, 32-bit Intel, 64-bit Intel, across seven different operating
systems. And so fairly frequently, when we try to update Clang, tests start
failing on, I don't know, 32-bit Windows or on 64-bit iOS or some very specific
configuration. And then we have to go and debug and dissect and figure out
what's going on and work with upstream to get that fixed. So yeah. That's
something we have to deal with at the toolchain team, but hopefully, it's -
hopefully, like the normal Chrome developer is isolated from that for the most
part.
36:45 SHARON: I think so. It's not - if I weren't asking all these other
questions, it's something that almost never crosses my mind, right? So that
means you're all doing a very good job of that. Thank you very much. Much
appreciated. And jumping way back, you mentioned earlier indexing the code
base, code search. So I make a change. I submit it. I upload it. It eventually
ends up in code search. So how does that process work? And what goes into
indexing? Because before, when I was working on Fuchsia all the Fuchsia code
wasn't indexed, so you couldn't do the handy thing of clicking a thing and
seeing where it was defined. You had to actually look it up. And once you got
that, it was like, oh my gosh, so much better. So can you just tell us a bit
more about that process?
37:30 NICO: Sure, yeah. Chrome has a pretty good code search feature, I
think, codesearch.chromium.org or cs.chromium.org. Basically, we have a bot
that runs, I think, every six hours or so, pulls the latest code, bundles it
up, sends it to some indexer service that then also uses Clang to analyze the
code. Like, for C++, I think we also index Java. We probably don't index Rust
yet, but eventually we will. And then it generates - for every word, it
generates metadata that says, this is a class. This is an identifier. And so if
you click on it, if you click on a function, you have the option of jumping to
the definition of the function, to the declaration, to all the calls, all the
overrides, and so on. And that updates ideally several times a day and is
fairly up to date. And we built the index, I think, for most operating systems.
So you can see this is called here on Linux, here on Windows, and whatnot.
38:32 SHARON: OK. Sounds good. Very useful stuff. And I don't know if this is
part of the build team's jurisdiction, but when you are working on things
locally, you have some git commands, and then you have some git-cl commands.
38:43 NICO: Mm-hmm.
38:48 SHARON: So the git commands are your typical ones - git pull, git rebase,
git stash, that kind of thing. And then you have git-cl commands, which relate
more to your actual CL in Gerrit. So git-cl upload, git-cl status. That'll show
you all your local branches and if they have a Gerrit change associated with
them. So what's the difference between git and git-cl commands?
39:18 NICO: I'm sorry. So this is basically a git feature. If you call git-foo,
then git looks for git-foo on your path. So you can add arbitrary commands to
git if you want to. And git-cl is just something that's in `depot_tools`.
Again, there's git-cl in `depot_tools`, and you can open that and see what it
does. And it'll redirect to `git_cl.py`, I think, which is a fairly long and
hairy Python script. But yeah. It's basically Gerrit integration, as you say.
So you can use that to send try jobs, `git cl try`. To upload, as you say, you
can use `git cl issue` to associate your current branch with a remote Gerrit
review, `git cl patch` to get a patch off Gerrit and patch it into your local
thing, `git cl web` to open the current thing in a web browser. Yeah, git-cl is
basically - git-cl help to see all the git-cl commands, or - yeah. If you have
a change that touches, like, 1,000 files, you can run `git cl split`, and it'll
upload 500 reviews. But that's usually too granular, and I wouldn't recommend
doing that. But it's possible.
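A quick-reference sketch of the `git cl` subcommands mentioned here - the CL
number is a placeholder:

```shell
git cl upload          # upload the current branch to Gerrit for review
git cl try             # send try jobs for the current CL
git cl web             # open the current CL in a web browser
git cl issue 1234567   # associate this branch with an existing review
git cl patch 1234567   # patch a Gerrit review into your local checkout
git cl status          # list local branches and their associated reviews
git cl help            # list all subcommands
```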
40:25 SHARON: Right. Do you have a - [DOORBELL DINGS]
40:25 NICO: Oops, sorry.
40:25 SHARON: commonly - yeah.
40:30 NICO: Oh, sorry. There was - the door just rang. Maybe you didn't hear
it. Sorry.
40:30 SHARON: All right. It's all good. Do you have a lesser known git or
git-cl command that you use a lot or -
40:41 NICO: Well, I -
40:41 SHARON: is your favorite? [LAUGHS]
40:46 NICO: It's not lesser known to me, so I wouldn't know. I don't know. I
use `git cl upload` a lot.
40:53 SHARON: Right. Well, you have to use `git cl upload`, right?
40:53 NICO: I use -
40:53 SHARON: Well, you don't - maybe not but -
40:53 NICO: `git cl try` to send try jobs from my terminal, `git cl web` to see
what's going on, `git cl patch` a lot to patch stuff in locally. If I'm doing a
code review and I want to play with it, I patch it in, build a local, and see
how things are working.
41:12 SHARON: Yeah. When I patch in a thing, I go from the cl page on Gerrit
and then click the download patch thing, but -
41:21 NICO: No, even `git cl patch -b` and then some branch name, and then you
just patch - paste the Gerrit review URL.
41:28 SHARON: Oh, cool.
41:28 NICO: So it's just, yeah, Control-L to focus the URL bar. Control-C
Alt-Tab `git cl patch -b blah`, Paste, Enter, and then you have a local branch
with the thing.
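That workflow, spelled out - the branch name and review URL here are
placeholders:

```shell
# Create a fresh local branch and patch the Gerrit review into it in one step.
git cl patch -b some-branch-name \
    https://chromium-review.googlesource.com/c/chromium/src/+/1234567
```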
41:36 SHARON: All right. Yeah, a lot of these things, once you learn about
them - at first you're like, whoa, and then you use them, and then they're not
lesser known to you, but you tell other people also a common - so another one
would be `git cl archive`, which will -
41:47 NICO: Oh, yeah, yeah.
41:47 SHARON: get rid of any local branches associated with a closed Gerrit
branch, so that's very handy, too.
41:53 NICO: Yes.
41:53 SHARON: So it's always fun to learn about things like that.
41:59 NICO: Are you fairly tidy with your branches? How many open branches do
you usually have?
41:59 SHARON: [LAUGHS] I used to be more tidy. When I tried to do a cleanup
thing, I had more branches. I think right now I've got around 20-something
branches. I like having not very many. I think to some people, that's a lot. To
some people, that's not very many. I mean, ideally, I have under five, right?
[LAUGHS] But -
42:18 NICO: I don't know. I usually have a couple - 10, sometimes. Have a bunch
of machines. I think on some of them it's over 100, but yeah. Every now and
then, I run `git cl archive` and it removes half of them, but -
42:29 SHARON: Yes. All right, cool. Is there anything that we didn't cover so
far that you would like to share? So things that maybe you get asked all the
time, things that people could do better when it comes to build-related things?
Things that you can do that make the build better or don't make it worse, that
kind of thing? Or just anything else you would like to get out there?
42:58 NICO: I guess one thing that's maybe implicitly stated, but currently not
explicitly documented, as far as I know, but I'm hoping to change that, is - so
Chrome tries to have a quiet build. Like, if you build, there's zero build
output, except that one Ninja line that's changing, right? There's, well, other
code bases - I think it's fairly common - where there's many screenfuls of
warnings that scroll by. And we very explicitly try not to do that because if
the build emits lots of warnings, then people just learn to ignore warnings. So
we think something should either be a serious problem that people need to know
about - then it should be an error - or it should be not interesting. Then it
should be just quiet. So if you add a build step that runs a random script, the
script shouldn't print anything, not even progress. Shouldn't say, doing this,
doing this, doing this. Should either print something and say something's wrong
and fail the build step, or not say anything. So that's one thing.
43:51 SHARON: That's - yeah, that's true.
43:51 NICO: And the other thing -
43:51 SHARON: Like, you only really get a bunch of terminal output if you have
a compile or a linker error, whatever.
43:57 NICO: Right.
43:57 SHARON: I hadn't ever considered that. If you build something and it
works, you get very few lines of output. And I hadn't ever thought that was
intentional before, but you're right in that if it was a ton, you would just
not look at any of it. So yeah, that's very cool.
44:09 NICO: Yeah. And on that same note, we don't do deprecation warnings
because we don't do any warnings. So if people - like, people like deprecating
things, but people don't like tidying up calls to deprecated functions. So if
you want to deprecate something in Chrome, the idea is basically, you remove
all callers, and then you remove the deprecated thing. And we don't allow you
to say - to add a warning that tells everyone, hey, please, everyone, remove
your calls. The onus is on the person who wants to deprecate something instead
of punting that to everyone else.
44:46 SHARON: Yeah, I mean, the thing that I was working on has a deprecating
effect, so removing callers, which is why I have so many branches. But I've
also seen presubmit warnings for if you include something deprecated. So - oh,
yeah, and there's presubmit, too. OK, we'll get to that also. [LAUGHS] Tell us
more about all of this.
45:05 NICO: About presubmits? Yeah, presubmits - presubmits are terrible.
That's the short summary. So if you run a `git cl presubmit`, it'll look at a
file called PRESUBMIT.py, I think, in the current directory, and maybe in all
the directories that contain files you touched or something like that. But you
can just open the top-level PRESUBMIT.py file, and
there's a couple thousand lines of Python where basically everyone can add
anything they want without much oversight, so it's a fairly long - at least
historically, that used to be the case. I don't know if that's still the case
nowadays. But yeah, it's basically like a long list of things that random
people thought are good if they - like, presubmits are something that are run
before you upload, also, implicitly. And so you're supposed to clean them up.
And [INAUDIBLE] many useful things. For example, nowadays we require most code
to be autoformatted so that people don't argue about where semicolons should go
or something silly like that. So one of the things it checks is, did you run
`git cl format`, which runs, I guess, clang-format for C++ code and a bunch of
custom Python scripts for other files. But it's also - presubmits have grown
organically, and there isn't - they're kind of unowned and they're very, very
slow. And I think some people have tried to improve them recently, and they're
better than they used to be, but I don't love presubmits, I guess is the
summary. But yeah, it's another thing to check invariants that we would like to
be true about our code base.
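The two commands discussed here, for reference:

```shell
git cl format      # auto-format the files you changed (clang-format for C++)
git cl presubmit   # run the presubmit checks without uploading
```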
46:48 SHARON: Yeah. I mean, I think - yes, spelling is something I think it
also checks.
46:54 NICO: It checks spelling? OK.
46:54 SHARON: Or maybe that's a separate bot in Gerrit.
46:59 NICO: Oh, yeah, yeah, yeah, yeah. Like, there's this thing called -
what's its name?
47:06 SHARON: Trucium? Tricium?
47:06 NICO: Tricium, yeah. Tricium, right. Tricium is something that adds
comments to your - automatically adds comments to your change list when you
upload it. And Tricium can do spelling correction, but it can also - it runs
something called Clang Tidy, which is basically a static analysis engine which
has quite a few false positives, so sometimes it complains about something
that - but it's actually incorrect, and so we don't put that into the compiler
itself. So we've added a whole bunch of warnings to the compiler for things
that we think are fairly buggy. But Clang Tidy is - but these warnings have to
be - they have to have a very low false positive rate. Like, if they complain,
they should almost always be right. But sometimes, for static analysis, it's
hard to be right. Like, you can say this might be wrong. Please be sure. But
this is not something the compiler can say, so we have this other system called
Clang Tidy which also adds a comment to your C++ code which says, well, maybe
this should be a reference instead of a copy, and things like that.
48:04 SHARON: Yeah. And I think it - I've seen it - it checks for unused
variables and other - there's been useful stuff that's come from comments from
there, so definitely. All right. Very cool. So if people are interested in all
this build "infra-y" kind of stuff and they want to get more into it, what can
they do?
48:32 NICO: We have a public build@chromium.org mailing list. It's very low
volume, but if you want to reach out, you can send an email there and a few of
us will see your email and interact with you. And there's also I think the tech
build on crbug. So you can just look for build bugs and fix all our bugs for
us. That'd be cool.
48:51 SHARON: [LAUGHS]
48:51 NICO: And if there's anything specific, just talk to local OWNERS. Or if
you feel this is just something you're generally interested in and you're
looking for a project, you can talk to me, and I probably have a long list of -
I do have a long list of somewhat beginner-friendly projects that people could
help out with, I guess.
49:15 SHARON: Yeah. I mean, I think being able to - if you're looking for a
20%-y kind of project or something else. But knowing how things actually get
put together is always a good skill and definitely applicable to other things.
It's the kind of thing where the more low-level knowledge you have, the more - it
works - it applies to things higher up, but not necessarily the other way
around, right?
49:34 NICO: Mm-hmm.
49:34 SHARON: So having that kind of understanding is definitely a good thing.
All right. Any last things you'd like to mention or shout out or cool things
that you want people to know about? [LAUGHS]
49:48 NICO: I guess -
49:48 SHARON: Or what - yeah, quickly, what is the future of the whole build
thing? Like, what's the ideal situation if -
49:55 NICO: Ideally, it'll all be way faster, I guess is the main thing. But
yeah, yeah, I think build speed is a big problem. And I'm not sure we have the
best handle on that. We're working on many things, but - not many. A bunch of
things. But it's - like, people keep adding so much code, so if y'all
could delete some code, too, that would help us a lot. I mean, having -
supporting more languages is something we have to - this is something that's
happening. Like, Rust is happening. We are also, on iOS, using Swift.
Currently, we can't LTO Swift with the rest because that's on a different LLVM
version. There's this - in C++ - we keep upgrading C++ versions. So Peter
Kasting is working on moving us to C++20. And then C++23 will happen, and so
on. There's maybe C++ modules at some point, which may or may not help with
build speed. And there's a bunch of tech debt that we need to clean up, but
that's not super interesting.
51:24 SHARON: I don't know. I think people in Chrome in general are more
interested and care about reducing tech debt in general, right? A lot of people
I know would be happy to just do tech debt clean-up things only, right?
Unfortunately, it doesn't really work out for job reasons. But a lot of people,
I think, are interested in, I think, in higher proportions than maybe other
places.
51:47 NICO: It depends on the tech debt. Some of it might work out for job
reasons. But, yeah.
51:54 SHARON: Yeah. I mean, some of it is easier than others, too, right? Some
of it is like, yeah, so, OK, well, go delete some code. Go clean up some
deprecated calls. [LAUGHS] All that.
52:08 NICO: Yeah, and again, I think finishing migrations is way harder than
starting them, so finish more migrations, start fewer migrations. That'd be
also cool.
52:16 SHARON: All right. I am sure everyone listening will go and do that right
away.
52:21 NICO: Yep.
52:21 SHARON: And things will immediately be better.
52:27 NICO: They've just been waiting to hear that from me, and now they're
like, ah, yeah, right. That makes sense.
52:27 SHARON: Yeah, yeah. All right. Well, you all heard it here first. Go do
that. Things will be better, et cetera. So all right. Well, thank you very
much, Nico, for being here answering all these questions. I learned a lot. A
lot of this is stuff that - everyone who works on Chrome builds Chrome, right?
But you can get by with a very minimal understanding of how these things work.
Like, you see your - you follow the Intro to Building Chrome doc. You copy the
things. You're like, OK, this works. And then you just keep doing that until
you have a problem. And depending on where you work, you might not have
problems. So it's very easy to know very little about this. But obviously, it's
so important because if we didn't have any of this infrastructure, nothing
would work. So one, I guess, thank you for doing all the stuff behind the
scenes, determinism, OSes, all that, making it a lot easier for everyone else,
but also thank you for sharing about it so people understand what's actually
going on when they run the commands they do every day.
53:31 NICO: Sure. Anytime. Thanks for having me. And it's good to hear that
it's possible to work on Chrome without knowing much about the build because
that's the goal, right? It should just work.
53:44 SHARON: Yeah.
53:44 NICO: Sometimes it does.
53:44 SHARON: [LAUGHS] Yeah. Well, thank you for all of it, and see you next
time.
53:51 NICO: Yeah. See you on the internet. Bye.
54:03 SHARON: OK. So we will stop recording -
54:03 NICO: Wee. Time for the second take.
54:03 SHARON: [LAUGHS] Let's do that, yeah, all over again.
54:11 NICO: Let's do it.
54:11 SHARON: I will stop recording.

@ -0,0 +1,978 @@
# Whats Up With Open Source
This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 6, a 2023 video discussion between [Sharon (yangsharon@chromium.org)
and Elly
(ellyjones@chromium.org)](https://www.youtube.com/watch?v=zOr64ee7FV4).
The transcript was automatically generated by speech-to-text software. It may
contain minor errors.
---
What does it mean for Chrome to be open source? What exactly is Chromium? Can
anyone contribute to the browser? Answering all that is today's special guest,
Elly. She's worked all over Chrome and ChromeOS, and is passionate about
accessibility, the open web, and free and open-source software.
Notes:
- https://docs.google.com/document/d/1a6sdrspJgAHDdQMMNGV0t7zo8QWgqq0hgsyV55tc_dk/edit
Links:
- [What's Up With BUILD.gn](https://www.youtube.com/watch?v=NcvJG3MqquQ)
- [What's Up With //content](https://www.youtube.com/watch?v=SD3cjzZl25I)
- [What are Blink Intents?](https://www.youtube.com/watch?v=9cvzZ5J_DTg)
---
00:00 SHARON: Hello, and welcome to "What's Up With That?" the series that
demystifies all things Chrome. I'm your host, Sharon. And today, we're talking
about open source. What does it mean to be open source? I've heard of Chrome,
but what's Chromium? What are all the ways you can get involved? Answering
those questions and more is today's special guest, Elly. Elly currently works
on the Chrome content team, which is focused on making the web more fun and
interesting to use. Previously, she's worked all over Chrome and Chrome OS.
She's passionate about accessibility, the open web, and free and open-source
software. Welcome, Elly.
00:34 ELLY: Thank you, Sharon.
00:34 SHARON: All right. First question I think is pretty obvious. What is open
source? What does that mean?
00:40 ELLY: Yeah, so open source is a pretty old idea. And it basically just
means, in the purist sense, that the source code for a program is open to be
read by others.
00:51 SHARON: OK. And Chrome, the source code is available to be read by
anyone. What else is it? Open source - I've heard of open-source community. It
seems like there's a lot to it. So why don't you just tell us more about open
source, generally?
01:03 ELLY: Yeah, for sure. There's quite a bit of nuance here. And there's
been differing historical interpretations of some of these terms, so I'll -
there's two big camps that are important to talk about. One is open source,
which means what I said - the source is available to be viewed. There's also
the idea of free software, which is software that actually has license terms
that allow for people to modify it, to make their own derivative versions of
it, and that kind of thing. And so historically, there was a pretty big
difference between those things. These days, the two concepts are often talked
about pretty interchangeably because a lot of open-source projects are free
software, and all free software projects basically are open source. But the
distinction used to be very important and is still pretty important, I guess.
Chromium is both open source and free software. So we ship under a license that
allows for - not only for everyone to read and look at our code, but also for
other folks to make their own versions of Chromium. So, yeah, Chromium, both
open source and free software.
01:56 SHARON: OK, very cool. And you mentioned Chromium in there. But I think
for most people, when they think of the browser, they call it Chrome. So what
is the difference between Chrome and Chromium? Are they the same thing? I think
people, myself included, sometimes use those interchangeably, especially when
you work on it. So what is the difference there?
02:16 ELLY: Yeah, fantastic question. So Chromium is an open-source and free
software web browser that is made by the Chromium Foundation, which is like an
actual .org that exists somewhere on the internet. Chrome is a Google-branded
web browser that is basically made by taking Chromium, which is an open-source
and free software web browser, adding some kind of Google magic to it, like
integrations with some Google services, some kind of media codecs that maybe
aren't themselves free software, that kind of thing, bundling that up into a
more polished product which we call Google Chrome, and then shipping that as a
web browser. So Chromium is an open-source project. Google Chrome is a Google
product that is built on top of Chromium.
03:03 SHARON: OK. So Google Chrome is a Chromium-based browser, which is a term
I think that people who work in any browser stuff - it's a term that they've
all [INAUDIBLE] before.
03:08 ELLY: Yeah, exactly. And in fact, you alluded to the fact that we
sometimes use those terms interchangeably. And especially at Google, we
sometimes get a little confused about what we're talking about sometimes
because we're - the Google Chrome team are the biggest contributors to
Chromium, the open-source project. And so we tend to sometimes talk about the
two things as though they're the same. But there's a really important
difference for folks who are working on other Chromium-derived browsers. So if
you're working on a Chromium derivative that a Linux distribution ships, for
example, your browser is based on Chromium, and it's really not Chrome. It's
Chromium, right? It is the open-source browser that Chrome is based on. But
it's not the same thing at all.
03:52 SHARON: Yeah, if you want to learn a bit more about basing things on
Chromium, the content episode is a good one to check out. We talk a bit about
that and embedding Chrome in Chromium and what that means. So -
04:03 ELLY: Yeah, absolutely.
04:03 SHARON: check it out if you [INAUDIBLE]...
04:03 ELLY: And there's also, in the Chromium source tree, there's actually a
thing called Content Shell, which is a minimal demonstration browser. It's like
the rendering engine from Chromium wrapped in the least amount of browser
possible to make it work. And we use it for testing, but it's also a really
good starting point if you're trying to learn how to build a Chromium
derivative browser.
04:22 SHARON: OK, very neat. So I think a next very natural question to come
out of this is, why is Chrome or Chromium - Chromium rather - going to try to
be good about using those correctly here - but why is Chromium open source?
04:40 ELLY: Yeah, so this is the decision that we made right when we were
starting the project actually. And it's based on this really fundamental idea
that the web benefits when users have better browsers. So if we, like the
Chromium project, come up with some super clever way of doing something, or we
come up with some really ingenious optimization to our JavaScript Engine or
something like that, it's better for the web, better for everyone, and
ultimately even better for Google as a business if those improvements are
actually adopted by other people and taken by other people and used by them. So
it is better for us if other people make use of anything clever that we do. And
separately from that, there's this idea that's really prevalent in open-source
communities of, if people can read the code, they're more likely to find bugs
in it. And that's something that Chromium constantly benefits from, is folks
who are outside the project, just kind of looking through our code base,
reading and understanding it, spotting maybe security flaws that are in there.
That kind of research is so much easier to do when the source code is just
there, and you're not trying to reverse-engineer something you can't see the
source to. So we get a lot of benefit from being open-source like that. And
those are the reasons we had originally, and those still all hold totally true
today, I think.
05:51 SHARON: That makes sense. Yeah, it seems, at first, a bit odd for a big
company like Google to make something like this open source. But there are
other massive open-source things at Google - Android, I think, being the other
canonical example, which we don't know too much about, but we won't be getting
too into that. But there are other big open-source projects around.
06:08 ELLY: Yeah, absolutely. And there's also, like - there's Go. That's an
open-source programming environment, like a language and a compiler and a bunch
of tools around it that is open source built by Google. There are plenty of
other open-source and free software projects built by large corporations, often
for really the same reasons. We benefit because the entire web benefits from
better technology.
06:32 SHARON: Yeah, I think some of the Build stuff we do is open source. Check
out the previous episode for that. And that's, yeah, exactly - not strictly
only used by -
06:37 ELLY: Yeah, and by the way, partly because we're open source - like, for
example, the Chromium base library, which is part of our C++ software
environment - our base library is regularly used in other projects, even things
that are totally unrelated to browsers, because it provides a high-quality
implementation of a lot of basic things that you need to do. And so that code
is being used in so many places we would never have anticipated and has done
honestly more good in the world than it would do if it was just part of a
really excellent browser.
07:13 SHARON: Something that someone on my first team told me was, if you've
changed anything in base, that probably is going to get run any time the
internet gets run, somewhere in that stack, which, if you think about it, is so
crazy.
07:26 ELLY: Oh, Yeah. Absolutely. Early in my career, I added a particular
error message to part of the Chrome network stack. And that network stack, too,
is one of those components that gets reused in a lot of places. And so
occasionally, I'll be running some completely other program. Like, I'll be
running a video game or something, and I'll see that error message that I added
being emitted from this program. And I'm like, oh, my code is living on in a
place I would never have really thought of.
07:51 SHARON: Oh, that's very cool.
07:51 ELLY: Yeah.
07:51 SHARON: Yeah.
07:51 ELLY: It's one of those unique open-source experiences in my book, of
seeing your own work being used like that by other folks you wouldn't have
anticipated.
07:57 SHARON: Yeah, that's very cool. So something I think I've heard you say
before that I thought sounded very cool was the open-source dream. So can you
tell us a bit more about what that is. What is that vision? It sounds very
nice.
08:09 ELLY: Yeah, so I talked about this a little bit. And earlier, I cautioned
against conflating open-source and free software. But it really is more of the
free software dream than the open-source dream, in some sense. That dream is
this idea that if we have software that is made available for free, under
licenses that let people modify it and make derivative works and keep using it,
that over time, everyone will get access to really high-quality and
freely-available software. And we will have a situation where the software that
people need is built by their communities, built by the people who are in those
communities, instead of being something that they have to buy from a company
that makes it. It'll be something they can instead produce for themselves. And
over time, I think that this has really played out in that way. If you look at
the state of operating systems today, for example, there are these really
high-quality, freely-available open-source free software operating systems that
are readily available and anyone can use, and they really do meet the needs for
a lot of folks. And then, in fact, it kind of circles back to where Linux is a
high-quality, free software open-source operating system that Google can then
turn around and make really good use of to build something like Chromium OS,
which is another free software open-source project that uses Linux as one of
its major components. And then we get to produce a product that the Chromium OS
engineering team would otherwise have had to spend a lot of time building, if
we weren't able to make use of that existing Linux kernel work. So you get into
this cycle of
giving back and sharing and benefiting from the effects of other people
sharing. That's the free software dream to me.
09:57 SHARON: It does - yeah, that sounds great. And for sure - I try to use
open-source options when I can. When I edit these videos, I use something
open-source. It feels appropriate for what we're doing here. So, yeah, that
sounds like it would be - it's a good system that everyone contributes to and
everyone benefits from. And that's really nice.
10:10 ELLY: Yeah, absolutely.
10:16 SHARON: So going away from that towards the more less open-source part,
so what kind of things in Chrome, the browser, are not open source? You
mentioned a couple of things earlier. Can you tell us a bit more about some of
those things?
10:27 ELLY: Yeah, I'm going to caveat this by saying that I don't personally
work on the stuff I'm about to talk about. And so my knowledge is more
superficial. There's a couple things I'm pretty confident about. So one is, for
example, there's a few video formats that Chrome can play that Chromium cannot
play because Google has agreements with the companies that make those codecs
that allow us to basically license and embed their thing and ship it as part of
Chrome. But those agreements, we can't really extend them to everyone who might
make a Chromium browser. And so it ends up in a situation where there is a
closed-source component that's included in Chrome to make that possible. I'm
struggling to think of another example right off the top of my head. I believe
that there's also a couple things in Chrome that are integrating with Google
APIs, where they're features that are Chrome-specific because they're
Google-specific. And one of the things that is generally true between the two
products is that Chrome will have more Google integrations and more Google
magic and more Google smarts than Chromium will. And so I think some of those
are actually closed-source components that come from Google that get embedded
into Chrome. But because they're closed-source, we wouldn't want to put them
into Chromium.
11:37 SHARON: Right. It seems like, yeah, I can sign into Chrome. I don't
expect that I'd be able to sign in with my gmail.com into, say, Chromium. I'm
not sure it's actually part of it, but that's a guess.
11:49 ELLY: Yeah, so that does work, except that you need to - any Chromium
distributor needs to go and talk to - basically, talk to the sign-in team to
get an API key that allows their browser to sign in. There is a process for
doing that. It doesn't actually require any closed-source code components. But
there is still a thing where you have to talk to the accounts team and
basically be like, hey, we're a legitimate web browser, and we want to allow
users to sign in. Because we don't want a situation where bots or malware are
doing fake user sign-ins from - pretending to be Chromium. That's bad.
12:25 SHARON: Right. That makes sense. Yeah, and I think because of where
Chrome and Chromium are positioned, I think there will be some interesting
comparisons and differences between Chrome, Chromium, and other internal
google3 projects. So that's kind of the term for things that are closed-source
Google - the typical Maps, Search, all that stuff - and also comparing Chromium
to other open-source projects. So we've talked a bit about the similarities and
differences between Chrome and Google internal. Are there any other things you
can think of that are either similar or different between Chrome the project
and the people who work on it and how people do things internally at Google?
13:11 ELLY: Yeah. So internally at Google, there's this very powerful, very
custom-built whole technology stack around the projects. There is a continuous
integration system. There's an editor. There's a source control system. There's
all of this stuff. Within Google, all of that is custom. And it's all fitted to
Google's needs. And a lot of it is just built from scratch, frankly. Whereas
for Chromium, we're using essentially off-the-shelf open-source stuff to meet a
lot of those needs. So, for example, for version control, we're just using Git,
which is I think the most popular version control system in the world right
now. It's definitely open source. And our build system, for example, which is
like GN and Ninja put together, those are both free software open-source
projects. Admittedly, both of them were, I think, started as part of Chromium
because we had those needs. But they, themselves, are free software components
that anyone else can also use to build a Chromium. And the reason why that's
done that way - like, why doesn't - it's actually a really good question. Why
doesn't Chrome, which is a Google project, use all of this amazing
infrastructure for engineering that Google has? And the answer is, we want the
Chromium project to be possible to work on for people who don't work at Google.
And so we can't say, oh, hey, whenever you're going to make a change, you have
to commit it into Google's internal source control system. That wouldn't work
at all. So we're almost - because we want to be an open-source project, and
because we want to have contributors from outside of Google, we end up almost
pushed into using this pretty open free software stack, which I - to be honest,
from my perspective, has a lot of other benefits. When we have new folks
joining the team, we can actually offer them tools they're already pretty
familiar with. They don't have the feeling that new Googlers sometimes get,
where they're totally disoriented. Like, everything they know about programming
doesn't apply anymore. We can actually be like, hey, here's Git. You know how to
use this. Here's Gerrit, which is another piece of open-source software that we
use. They may not have used Gerrit before, but a lot of projects do. And so
they might have run into it previously. So it has pluses and minuses,
definitely. So that's a big difference. There's also a bit of what I would say
is a cultural difference more than anything else because most Google projects
that are not open source - so I'm not talking about things like Android or Go
or something like that - but projects that are really just not open source,
like Search, their ecosystem of discussion and culture and stuff is very much
inside Google. Whereas for Chromium, we constantly are getting ideas and
suggestions and code changes and stuff from outside of Google. And so we also
tend to have perspectives from outside of Google in our discussions more often
as we work on Chromium. So part of that is at the level of, if we're going to
make a change, we would have maybe input coming in on that change from Mozilla
even. They're a group we collaborate with a ton on web standards. And so we
would have their perspective in the discussion. Whereas if we were working
entirely within Google, we might not have those external perspectives. So
culture-wise, I feel like Chromium has more perspectives in the room sometimes
when we're thinking about stuff.
16:26 SHARON: That makes sense because browsers exist across other companies
too, and there's a lot of compatibility and standards and stuff. So just in
that nature of things, you have to have a lot more of this collaboration. If
you make a change, it'll affect all of the embedders maybe, and then you have
to think about this. And, yeah, there's a lot more discussion - [INTERPOSING
VOICES]
16:42 ELLY: Yeah, absolutely.
16:42 SHARON: If you're Search, you're like, OK, we're going to, I don't know,
do our thing.
16:47 ELLY: Yeah, you have more - I don't know if "autonomy" is the right word.
But, yeah. I want to caveat this by saying I'm not on Search. And so maybe it's
totally different. But that's how it looks to me as a person who works on
Chrome.
16:59 SHARON: Yeah. Yeah. And I think in terms of actual development and making
code changes and stuff, I think probably the biggest difference is that because
anyone can download the source repository and make changes and all that, the
actual programming and changes you do, you do those on a computer. Maybe that's
a machine you SSH into or a cloud top or whatever. But you have to actually
download all of the code. Whereas with all of the google3 stuff, everything
happens in a cloud somewhere. So everything is all connected, and you just do
things through the browser pretty much.
17:29 ELLY: That's very true. Actually, there's another important facet that
just occurred to me, which is, because Chromium is open source - and in
particular, some open-source projects will use this model where they send out a
release every so often. So they'll be like, we're shipping a new major release
of our program, and here's the source that corresponds to that. So there are
companies that do that. But we actually do what's called developing in the
open. So our main Git repository that stores our source is public. Which means
that as soon as you put in a commit, or even if you just put it up for code
review, that's public. Everyone on the internet can see what we're doing live,
which is really pretty interesting in terms of its effects on you. So for
example, if you're in - you're working inside google3, and you're like, I have
this really cool, wild idea, I'm going to go and make an experimental branch
and just make a prototype of it and see what happens, you can just go do that.
It's not a problem. But if you're working in Chromium, and you go and make your
wild prototype experimental branch, you have a pretty good chance that
someone's going to notice that. And then maybe you get a news story that's
like, hey, Chromium might be adding this amazing feature. And you're like, oh,
no, that was my wild, experimental idea. I didn't intend for this to happen.
But now people have really picked up on it, and people outside of the company
that you've never met are starting to get excited about something that you
never really intended to build and just wanted to try. So it's a different way
of working. You're sort of always in the public eye a little bit. And you want
to be a little bit more considerate about how something might look to people
way outside of your team and outside of your context. Whereas teams that are
inside google3 I don't think have to think about that as much.
19:07 SHARON: Yeah, I mean, for me, I've only really worked in Chromium full
time and all that. And I've just gotten used to the fact that all of my code
changes are fully public and anyone can look at them. Whereas I think people
who work in anything that's not like that - people in the company you work at
can see it, but not just anyone out there. So I don't know. I've gotten used to
it, but I think it's not a typical thing to [INAUDIBLE].
19:30 ELLY: Oh, yeah. Absolutely. And in fact, this is something that folks who
are transferring into Chrome from other parts of Google sometimes have a little
difficulty with, is if you're used to writing a commit message where maybe the
only description in the commit message is go/doc about my project, for Chromium
that doesn't fly because only Googlers can actually follow those links. And so
the commit message to a non-Googler doesn't say anything. And so you actually
have to start thinking, how am I going to explain this whole thing I'm doing to
a non - to a person who doesn't have any of this Google-specific context about
what it is. You go through this little mental - you cross this little mental
bridge where you actually are forced to reframe your own work away from, what
are Google's business goals, and towards, how does this fit Chromium, the
open-source project, that other people also use? It's interesting and
occasionally a little frustrating, but interesting and usually really
beneficial.
20:26 SHARON: Yeah, for sure. And I think from people I've talked to, it just
seems like another, briefly, difference between internal Google stuff and
Chromium is that internal Google just has a ton of tools you can use.
20:37 ELLY: Yes, absolutely.
20:37 SHARON: Which both means a lot of things that are maybe a bit challenging
in Chromium are probably easier, but also maybe finding the right tool is hard.
But -
20:42 ELLY: Oh, yeah. That is very much the case. I have only limited
experience working inside google3. But I definitely have experienced the
profusion of tools and also the fact that the tools are just honestly amazing.
And it makes total sense. Google has many, many engineers whose whole job is to
build great tools. And Chromium is just not that big of a project. We just
don't have that many folks that are working on it. The folks who do build
infrastructure work for Chromium do amazing work, but there's not hundreds of
them. And so it's not on the same level.
21:12 SHARON: Yeah. And what you said earlier makes me wonder - and this ties
us into the next thing - about other open-source projects: they just do a
release, and they don't maybe do development in the
open. And having not actually worked on other open-source projects really, I
kind of assumed that this development in the open was the norm. So how common
do you think or you know that that practice is?
21:45 ELLY: Gosh, I would really be guessing, to be honest with you. But I
would say the development in the open is by far the norm these days. And when
you see projects that follow the big release model instead, the way that looks
is they'll be like, hey, version 15 is out, and here's the source for
version 15. You can look at it. But the development, as it happens, happens
internally. I would tend to associate that with being maybe big company
projects that have a lot of confidentiality concerns. So for example, if you're
building the software that goes with some cool, new hardware for your company,
you don't want to start checking that software into Git publicly because then
people are going to read it and be like, ooh, this has support for a
billion-megapixel camera. That must be coming in the new thing. And so I think
that the big release model might be, these days, more prevalent when people are
doing hardware integrations, where there's other components that are shipping
at a fixed time and you don't want your source to be open until that point. But
honestly, the developing in the open model is, I think, much more common these
days. Historically, back in the '70s and '80s, when you would buy an operating
system and it would come with source, that was just a thing that you got as
part of the package, then it was much more of the source is released with the
OS model. Whereas these days, because distributed development is so easy with
modern version control systems, it's just so common to just develop in the open
like we do.
23:11 SHARON: Oh, cool. I didn't know that. So compared to other open-source
projects, what are some similarities and differences that Chromium has to
others that you may be familiar with?
23:25 ELLY: Ooh. All the ones I'm familiar with are quite a bit smaller than
Chromium. And so it's going to be hard to talk about it because, frankly -
23:32 SHARON: That's probably the common difference, though, right? Probably
very few are as big as Chromium.
23:32 ELLY: Oh, yeah. So in particular, one of the hardest problems in open
source - in running an open-source project is managing how humans relate to
other humans. The code problems are often relatively easy. The problems of how
do we make decisions about the direction of a project that maybe has a hundred
contributors who speak 10 different languages across a dozen time zones, that's
a hard problem. And so I often talk about the distinction between open source, open
development, and then open governance. And so open source is just, like, you
can see the source. Open development is you can see the development process. So
the Git repo is open. The bug tracker is open. The mailing lists, where we do a
lot of our discussion, are open. So we do open development. But then you have
this next step of open governance, where the big decisions about where the
project is going are made in the open. And for Chromium, some of those are made
in the open, especially when it's really about the web platform or that kind of
thing. But some of them are not. For example, if we're deciding that we're
going to do some cool new UI design, that design and the initial development of
it might not necessarily be - or sorry, the development would be done in the
open, but the designing of it might not. That might be a discussion between a
few UX designers who all work at Google in a Google internal place. And so
Chromium has a bit of open governance but not all the way. A lot of smaller
projects have super open governance. So they'll literally be like, hey, should
we rewrite this entire thing in Rust? And they'll make that decision by arguing
about it on a mailing list, where everyone can see. And that's totally, totally
fine. Because Chromium is so big, we can't make those kinds of decisions by
having every Chromium engineer have their opinion and just post. It would be
complete chaos. And because we're big and prominent, a lot of the work that we
do is very much in the public eye. And so even discussions that are maybe
relatively speculative - like that example I gave before, where you have an
idea and you're like, wouldn't it be neat if we did this? It's easy for that to
turn into people inferring what Google's intentions are with Title Case, like,
Big Important Thing, and turning that into a lot when you would not have
intended it to be that way. And so we do end up keeping our governance
relatively on the closed side compared to other open-source projects I've
worked on. Other than that, in terms of engineering practices and what we do to
get the code written, we uphold a super high standard of quality. And in
particular - which is not to say that most open-source projects don't, because
they totally do. But Chromium, in my opinion, is really, really thoughtful
about not just, hey, how should code review work, but really evolving stuff
like, how should we bring new developers into this project? What should that
feel like? Those are discussions that we have. And I often feel like those are
discussions that other open-source projects don't talk about as much. What else
is different for us? I'm not sure. I think that those are some of the big ones.
The differences in scale are such that it's almost hard to talk about. The
difference between an open-source project that maybe has 5 contributors and one
that has 500 is very, very large.
27:07 SHARON: With the open governance thing you mentioned, something that that
made me think of is maybe Blink Intents, where you submit a thing to a list and
then that gets discussed. So that's part of the Chromium project, I think,
right? That falls under that category.
27:20 ELLY: Yep. Yep.
27:20 SHARON: And so that's where, if you want to make a change to Blink, the
rendering engine, you do this process of posting it to a list, and then people
weigh in.
27:25 ELLY: Yeah, absolutely. So Blink really does do open governance in a way
that I, honestly, very much admire. Blink and the W3C and a lot of these groups
that are setting standards for the internet do do open governance. Because,
frankly, it's the only way for them to work. It would not be good or healthy
for the web if it was just like, we're going to do whatever - whatever we,
Google, have decided to do and good luck everyone else. That would be very bad.
So yeah, Blink definitely does do open governance. But when it gets to things
that are more part of the browsers' behavior and features, we tend to have the
governance a little more closed.
28:08 SHARON: Right. And I think an example of Blink being more open governance
is the fact that BlinkOn is open to anyone to participate in. And that's the
channel that we're posting this on right now. It just happened to make sense
that I figured most of the audience who is watching Blink [INAUDIBLE] already
are interested in these, too. So that's why - [INTERPOSING VOICES]
28:27 ELLY: Yeah, absolutely.
28:27 SHARON: And for people who may have found these videos but don't know
about BlinkOn - that's what that is.
28:34 ELLY: Yeah. And just in that vein of open governance for Blink,
especially, there's also this idea of being a standard and then having things
be compatible with it. So the web platform is a collection of standards. And
other browsers have to implement those standards, too. And so for example, if
we make up a standard that is very difficult or impossible for, like, Firefox
to implement, that's not good. That's fragmenting the web platform. That's a
bad thing. Whereas the Chromium UI, like how the omnibox works in Chromium, for
example, isn't a standard. It doesn't matter whether Firefox or Edge or Opera
or whoever have the same omnibox behavior as us, right? And so there's much
less of a need to all agree. And instead, it's almost a little bit better to
have some variety there so that users can get a little bit more of a choice and
that collectively more things get tried in that vein. So there's places where
agreement and standardization are really important. And then there's places
where it's actually OK for each individual browser to go off on its own a bit
and be like, hey, we thought of this cool, new way to do bookmarks. And so we
have built this. And it doesn't matter whether the other browsers agree about
it because bookmarks are not a thing that interoperates between browsers.
29:44 SHARON: Yeah, that makes sense. So now let's talk about some of the
actual details of what it's like to work on Chromium and make changes, write
code, and new ideas. So I think you mentioned a few things, like bug tracking.
That's all public, in the open, apart from, of course, security-sensitive
things and other [INAUDIBLE] are hidden. What else is there? Code review - that
was Gerrit. You mentioned that. So you can see all the comments that everyone
leaves on everyone's changes.
30:16 ELLY: Oh, Yeah. And for better or for worse, by the way. It's good to
bear in mind that if you're like - you're going to type like a slightly jerk
message to someone on a code review, that's going to be preserved for all time,
and everyone's going to be able to see it.
30:29 SHARON: Yeah. Yeah. Be nice to people. [CHUCKLES] Version control -
that's Git. Probably people will know about that. Something that might be worth
mentioning is that if you look at things like Gerrit and Chromium Code Search -
that's also public, of course, and looks a lot like Google's internal code
search, but obviously it's open source - a lot of people who contribute to
Chromium have @chromium.org emails.
31:00 ELLY: Yes.
31:00 SHARON: So why are there separate emails? Because you can use a
google.com or a Gmail or any email. So why have this @chromium.org email thing?
31:05 ELLY: Yeah, so there's a few different reasons for that. So chromium.org
emails are available to members of the project, which is a little bit
nebulously defined, but it's definitely not just Googlers. And so there's a
couple reasons why people like having those. So for some folks, it's sort of a
signal that you are acting as a member of the open-source project rather than
acting with your Google hat on, if you like. And so for example, I help run the
community moderation team for Chromium. And so when I'm doing work for that
team, I'm very careful to use my chromium.org account because I want it to be
clear that I'm enforcing the Chromium community guidelines, which are something
that was agreed upon by a whole bunch of Chromium members, not just Googlers.
And so I'm not enforcing Google's code of conduct. I'm enforcing Chromium's
code of conduct in my role as a Chromium project person. So sometimes you
deliberately put on your Chromium hat so that you can make it clear that you
are acting on behalf of the project. Some folks - and I'm also one of these
folks, by the way - just happen to really be big fans and supporters of free
software and of open source. And so if I have the choice between wearing my
corporate identity and wearing my open-source project member identity, I might
just wear my open-source project member identity and decide to actually
contribute that way. And so a lot of the folks who've been on Chromium - or
have been on Chrome, I should say, for a while, that's part of their reasoning.
They joined because they were excited to work on something that was open. And
so they have this open-source identity, this Chromium identity, that they use
for that. There's a third factor, and this touches on one of the sometimes less
pleasant parts of working in open source, which is our commit log and our bug
tracker and all of that stuff are public. And what that means is that everyone
on the internet can go see them. And that is often great, but it's occasionally
not great. So for example, if you go and make an unpopular UI change, people on
the internet know that that was you. And that might not be something that
you're necessarily super ready to deal with. So for example, way, way, way, way
early in my career, I made a change to Chromium OS because I was working - I
was on the Chrome OS team as a brand-new Noogler. So this is when I've been at Google
maybe five or six months. I made a change to Chrome OS. Somebody happened to
notice it and take issue with it. I don't even remember what the change was or
the issue. But they happened to notice it and take issue with it. They showed
up in our IRC channel, because we used IRC at the time, which was also public
because the whole project was very open like that, and really just started
yelling at me personally about it. And I'm like, this is not a cool experience.
This is something that if this was a Google coworker of mine, I would be
talking to HR about this. But it's actually just a random person on the
internet. And so there are some folks who use their Chromium username as a
little bit of a layer of insulation almost, where it's like, I want to work on
this project, but I don't - maybe my Google username has my full name in it. I
don't necessarily want every change I make to be done like that. And so if you
don't do that, you can end up in a situation where you make a change, and then
it's really attributed to you as though it was your personal idea and you did
this bad thing. And that's not a risk that everyone wants to take as part of
doing their work. And so sometimes people have a chromium.org account really
because they want an identity that's separate from their Google account - that
has a different name on it, that has different stuff like that. And so one of
the things that I'm always cautious to remind folks of on my team is, if you're
working with someone who has a chromium.org account, always use that
chromium.org account when you're speaking in public, always, always, always,
because you don't want to break that veil if someone is relying on it.
35:09 SHARON: Right. Yeah, that makes sense. And I think, in general, whenever
you are signing up for interacting in these public spaces, generally, I think
it's encouraged to use your chromium.org account. So for example, Slack, which
is the modern - current IRC often -
35:27 ELLY: It hurts my soul to hear you say that.
35:32 SHARON: Well - [LAUGHS]
35:32 ELLY: I'm a die-hard IRC user. I've been using IRC for 30 years. And I
was one of the few people who was I think very sad when we decided to move off
IRC. But you're right, that it is the modern IRC option.
35:44 SHARON: I think a lot of people are very die hard about IRC. So, you
know, but modern or not, that's what's currently being used.
35:49 ELLY: Absolutely.
35:55 SHARON: So Slack is where anyone can join and discuss Chromium stuff. And
generally, that kind of thing, you're encouraged to use your chromium.org
account.
36:01 ELLY: Yeah, absolutely. And to be fair to Slack also, the Slack has
probably 30 times as many people in it as the IRC channel ever did. So I think
that it's pretty clear that Slack is more popular than IRC was. But, yeah, no,
we use our Chromium identities a lot, really, really on purpose. And to be
honest, I would like it if we used them even more. Sometimes you will see folks
who actually have both identities signed up. So they'll have their google.com
and their Chromium, and that's always confusing for everyone. So if it was up
to me, I would say everyone has a Chromium identity, and they'd just all use it
when they're contributing.
36:39 SHARON: Yeah, that's definitely one of these unique-to-Chromium
[INAUDIBLE] pain points of someone [INAUDIBLE] use their maybe - often, they're
the same for most people. But sometimes they're different. Sometimes they're
very subtly different, and it's -
36:53 ELLY: Absolutely.
36:53 SHARON: you end up sending your [INAUDIBLE]...
36:53 ELLY: I also - I have met a couple folks who the Google username they
really wanted wasn't available, but it was available for chromium.org. And so
they picked a shorter, cooler username for chromium.org, which is totally -
totally fine to do. But then, every time you have to remember, oh, I know them
by this longer Google username, but actually they use this shorter username for
Chromium.
37:13 SHARON: Yeah, you have to remember their real life name. You have to
remember their work email. And then now you have to remember another work
email.
37:19 ELLY: Well, we have software that can help with that a bit.
37:25 SHARON: Yeah, for sure. So as part of that - and this is, in a way, a
thing that to me feels very related - there's a thing called being a committer
in Chromium. So what does it mean to be a committer? And what does it entail?
37:37 ELLY: Yeah, so committers are basically people who are trusted to commit
CLs, for want of a better way of putting it. So the way the project is
structured, anyone can upload a CL. And anyone anywhere on the internet can
upload a CL. It has to be reviewed by the OWNERS of the directories that it
touches or whatever. But there are some files that are actually, like, OWNERS
equals star. So for example, the build file in chrome/browser, because
everybody needs to edit it all the time, it just has OWNERS equal star. And
there's a comment that's like, hey, if you're making a huge change, ask one of
these people. But otherwise, you're just freely allowed to edit it. And so if
the committer system didn't exist, anyone on the internet would be allowed to
edit a bunch of parts of the project without any review, which is pretty bad.
And so there's this extra little speed bump where it's like, you have to send
in a few CLs to show that you're really a legit person who's contributing to
the project. And once you've done that, you get this committer status, which
actually allows you to push the button that makes Gerrit commit your change
into the tree. And that's what it does mechanically. We culturally tend to have
it mean something a little different than that, but it's - culturally, it's
like a sign of trust of the other project members in you. So getting that
committer status really means, we collectively trust you to not totally screw
things up. That's what it is. And so you have to be a committer to actually be
in an OWNERS file, for example. You can't be listed as an owner until you're a
committer. Because if you're not a committer yet, we're not really - if we're
not trusting you to commit code, we're not really going to trust you to review
other people's code. And, yeah, when you're new joining the project, it's
actually a pretty big milestone to become a committer. You become a committer
after you've been working for anywhere from three to six months, I would say.
And it's definitely this moment of being like, yeah, I've really arrived. I'm
no longer new on the project. I'm now a full committer.
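For concreteness, here is a minimal sketch of the kind of wildcard OWNERS file
Elly describes; the comment wording is illustrative, not copied from the
Chromium tree:

```
# OWNERS "equals star": any committer can approve changes here.
# If you're making a huge change, ask one of the usual reviewers first.
*
```

The `*` entry is what lets any committer approve a change, which is why
committer status has to act as the trust gate.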
39:51 SHARON: Can you briefly tell us what the steps, mechanically, to becoming
a committer are?
39:51 ELLY: Yeah, so you need to have landed enough CLs to convince people you
know what you're doing. And there is no hard and fast limit, but it's like - it
should be convincing. And so I often hear maybe 15 to 20 nontrivial CLs is a
pretty good number. Having done that, you need someone to propose you or
nominate you for committership. So there's actually - there's a mailing list
for having these discussions. And so whoever's going to nominate you, who has
to already be a committer, they'll send mail to that list, basically being
like, I would like to nominate this person for committer. There's a comment
period during which people can reply. And then if there's nobody who is raising
a big objection to you being a committer, after - I don't know what the actual
time period is - but after some amount of time, the motion carries with no
objections, and then your Chromium account becomes a committer. I think Google
accounts can also be committers as well, but I've only ever done this process
for Chromium accounts. And so those threads - what's going on in those threads
is mostly people endorsing the request. So let's say that I have someone who's
new on my team who I want to propose as a committer. I'll start the thread
nominating them as a committer, and then I'll go and talk to maybe two or three
of the people who have reviewed a lot of their changes, and I'll be like, hey,
would you endorse this person for a committer? If so, please post in this
thread. And so in the thread, there will actually be a couple of replies that
are like, plus 1, or, yes, this seems like a good fit. Very rarely, there might
be a reply, which is like, hey, I saw some - I saw some stuff on this CL that
shows that maybe this person isn't quite ready. We had a whole bunch of back
and forth comments, and eventually it really didn't seem like they understood
what I was asking for. And I feel like they're not really ready yet. Sometimes
that will happen. But usually the threads - by the time someone's nominating
you, you're already in good shape. So that's the mechanical process. And then
there is - it might actually just be Eric, individually, who goes through and
flips the bits on people being committers based on the threads. I'm not sure.
But there's some process by which those threads turn into people being
committers.
42:14 SHARON: OK, cool. Is there an analog of this either internally at Google
or in other open-source projects? Because internally at Google, there's the
concept of readability, which means you are vouched for that you know how to
code in this one language, which has some similarities. That's maybe a similar
thing. Are there any similar notions in other projects you've seen?
42:38 ELLY: Yeah, so many projects have this notion of being a member. And that
often combines our notions of committer and sometimes code owner. And so they
might - or for some open-source projects, you'll actually hear "maintainer" as
the thing. And so they'll be like, only people who are project members can
upload changes in the first place. And only people who are maintainers can
merge those changes. So that little speedbump on entry is pretty common.
Because it's a fact of life that if you are on the public internet and you have
no barriers to entry, you're going to have spam in your community no matter
what you do. And so that kind of split is super, super common. For some
projects that don't do open development, the entire thing might happen inside a
company or inside an organization anyway. And then there is no notion of
committer status because you're just hired onto that team and then you can
commit. But for projects that do open development and free software projects,
there is often a sense of, these are the people who are roughly trusted to land
code. And for a lot of projects, especially bigger ones, there's actually a
two-tiered model, where maybe you have people who are domain experts on a
specific thing, like, they maintain some subsystem. And they're trusted to make
whatever changes they need or approve other people's changes in that area. But
then at the wider scale, there's what's often called a steering committee or a
core group or something. And those groups have authority over the whole project
and the direction of everything that's going on. And so you'll often see that
kind of model in larger projects. At smaller scales, it's often literally a
list of one to five people who all have commit access to the same Git repo, and
there's no - no structure on top of that. But for bigger projects, governance
becomes a real concern. And so people start thinking about that.
44:35 SHARON: All right. Now, let's switch topics to talking about the more
day-to-day logistics of working on Chromium. So if you're not a Googler, don't
work at Google, to what extent can you effectively contribute to Chromium, the
project?
44:48 ELLY: Yeah, so that depends where you're coming from, both whether you're
part of another large organization, like maybe you work at Microsoft, you work
at Opera, Vivaldi, one of those companies, or if you're really an IC lone
contributor. If you're in a large organization, probably your org will have its
own structure around how you should contribute anyway. And so you might just
want to talk about that. So I'll really focus on the individual contributor
angle. And so for engineers specifically, like if you're a programmer who wants
to contribute to the code base, that's awesome. The best approach I think is
really to find an area that you're passionate about because it's so much more
fun and enjoyable to contribute when you're doing something you care about. So
find an area you care about. Get in touch with the team that works on that
area, either through their mailing lists or find their component in Monorail or
find them in the OWNERS files or whatever. Get in touch with those folks. Ask
them what are good places for you to contribute as a new person. That's often a
really great way to get started. And you'll have a person you can go to for
advice to be like, hey, how do I go about doing this thing? My experience has
been that Chromium contributors are pretty much all super helpful. And so
they're very willing to just give you guidance or do whatever. And you'll then
know who to send your code reviews to.
46:01 SHARON: Cool. Yeah. And if you're not an engineer, what are some ways you
can also contribute?
46:06 ELLY: Yeah, so there's a whole bunch of these. And by the way, these all
apply to basically every open-source project, so not just Chromium
specifically. So open-source projects, if you are a good writer, if you enjoy
doing technical writing or you enjoy doing UX writing or you want to do that
kind of thing, almost every open-source project out there is looking for people
to contribute documentation. And Chromium is no exception at all to that. So
high-quality documentation, we love that stuff. Or even if you're just honing
that craft and you want to practice, Chromium is not a bad spot to do that. If
you're a UX designer or a visual designer, a lot of open-source projects will
actually appreciate your contributions of you bringing in, like, hey, I thought
of a way that this user experience could feel or how the screen could look or
something like that. They'll often appreciate that kind of input or design
work. If you are someone who speaks multiple languages, translations are
another great way to contribute to open-source projects. A lot of open-source
projects don't have access to the same kind of - Chromium has access to a
translation team within Google who do a lot of our translations. A lot of
open-source projects don't have that. And so contributing translations of
documentation, of user-facing interface, stuff like that, can be super
valuable. And the last thing I'll say, which can be done by really anyone - you
don't even need special skills for this one - is try early releases of stuff.
So try development branches. If you're a Chrome user, try running Beta or Dev
or Canary. And then when something doesn't feel right or when it's - when it
doesn't work for you or it crashes or whatever, file bugs. And try to get
practiced at filing good bugs, with details and info and steps to reproduce the
bug and stuff like that. That's such a huge help as a developer of any
open-source projects - to get that early-user feedback and be able to correct
problems before they make it to the stable channel. And on Chromium, I've run
into a few folks who just - their main contribution to the project is really
just that they file great bugs all the time. There's a few folks who all they
really do is they run Canary on Mac, and they notice when something doesn't
feel quite right. And so they file stuff that's like, maybe the engineering
team wouldn't necessarily have noticed it. But when someone calls it out, we're
like, oh, that actually does feel kind of janky, and now we can go fix that.
And getting that feedback early is so, so valuable. So there's a lot of
different ways. Those are some, but there's plenty more, too.
48:21 SHARON: OK. Cool. Yeah, and a few things on that. If you want to really
try out random things, you can go to chrome://flags, play around there, see
what happens. In terms of going back a bit for being an engineer, there's other
web-adjacent stuff that you can do that we won't get into too much now. But
that can be things like adding web platform test, web standard stuff. And for
people who are into security, we have a VRP, Vulnerability Rewards Program. But
if you know about that, probably you're into the whole security space. This is
not how you're going to - maybe this is how you heard about it, and you want to
get into it. But, anyway -
48:59 ELLY: Yeah. I will say, if you're a security researcher and you aren't
familiar with the Chromium VRP, you should go take a look because it's -
Chromium is a really interesting project to audit for security. And the VRP can
make it very worth your while to do so if you find good bugs.
49:12 SHARON: Mm-hmm. Yeah. And going back a bit earlier to being an engineer,
like an IC, who is not at Google or any of these other big companies, there are
other barriers to entry to being a contributor, right?
49:28 ELLY: Oh, yeah.
49:28 SHARON: So I definitely encountered this after my internship. I worked on
Chrome. I was like, hey, I know what's going on now at the end of it. A couple
things we didn't finish. I'll go home, and I will keep working on this - good
intentions. And I got home, got my laptop, which was a pretty good laptop, but
still a laptop. I downloaded Chrome. That took a very long time. I built it for
the first time, which always takes a bit longer. But that took so long. And
even the incremental builds just took so long that I was like, OK, this is not
happening. I'm in school right now. I've got other things to worry about. So
how feasible is it for a typical person, let's say, to actually make changes in
Chromium?
50:05 ELLY: Yeah, that is unfortunately probably the biggest barrier to entry
for individuals who want to make technical contributions. Obviously, it doesn't
affect you if you're contributing documentation translations, whatever. But if
you're trying to modify the code, yeah, the initial build is going to be very
slow, and then the incremental builds are going to be very slow. And a lot of
the ancillary tasks are slow too, like running the test suite or running stuff
in a debugger. The project is just very big. And that's something that I think
a lot of folks on the Chromium team wish we could reduce. But Chromium is big
because the web is big and because what people want it to do is big. And so
it's not just big for no reason. But it does make it harder to get started as a
contributor. I've had this experience, too. I have a modern laptop sitting on
the desk over there. And it takes seven to eight hours to do a clean Chromium
build on that. Whereas on my work workstation, which has access to Goma,
Google's compile farm, it takes a few minutes. And the large organizations that
contribute also all have compile farms for the same reason. It's just so slow
to work when you're only doing local building and don't have access to a ton of
compilation power.
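For reference, the checkout-and-build flow being described looks roughly like
this - the standard depot_tools workflow; exact directories and targets vary by
platform:

```shell
fetch chromium                    # initial checkout; this is the slow download
cd src
gclient sync                      # pull third-party dependencies
gn gen out/Default                # generate build files
autoninja -C out/Default chrome   # the multi-hour part on a laptop
```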
51:12 SHARON: Mm-hmm. Yeah. I wonder if we could, I don't know, do a thing for
people who are individuals who contribute more. Probably that would be really
hard to do. Probably people have thought about it. But, yeah.
51:24 ELLY: It would be nice if we could. I don't know what the challenges
would be offhand, but it would be very cool if we could somehow make that
available.
51:30 SHARON: All right. That all sounds very cool. I know I learned a lot.
Hopefully some of you learned a lot, too. I think if you are working within
Google, it's really easy to not really interact with any of this more
open-source stuff, depending on which part you work on. Maybe you work on a
part that's very Google Chrome specific. I know before I was working on
Fuchsia, so that was before launch. So that was not really something we were
open to the public about anyway. And a lot of even the typical Chrome tools I
was unfamiliar with. So I think depending on which part you work on, this
stuff - it's all there, but you might not have had a chance to interact with it.
So thank you, Elly, for telling us about it and giving us some context about
free and open-source software in general.
52:08 ELLY: Yeah, of course.
52:08 SHARON: Is there anything you would like to give a shout out? Normally,
we shout out a specific Slack channel. I think in this case, the Slack in
general is the shout out. Anything else?
52:20 ELLY: The Slack, in general, definitely deserves it. Honestly, I'm going
to go a little bit larger scale here. I'm going to shout out all of the folks
who have contributed to Chromium, both at Google and elsewhere. It is the work
of many hands. And it would not be what it is without the contributions from
the folks at Google, the folks at Microsoft, folks at Yandex, folks at Naver.
All of these different browsers and projects and all of the different
individuals that have contributed, like everyone in the AUTHORS file - so shout
out to all of those folks. And also, I really want to shout out the open-source
projects not even part of Chromium that we use and rely on every day. So for
example, we use LLVM, which is a separate open-source project for our
compilation toolchain. And I think I would not be exaggerating to say that
Chromium couldn't exist in its current form without the efforts of a bunch of
other open-source projects that we're making use of. And so I'm really hopeful
and optimistic that Chromium can live up to that. We're standing on the
shoulders of a lot of other open-source projects to build the thing that we've
built. And I'm hopeful that, in turn, other projects are going to stand on our
shoulders to build yet cooler stuff and yet - yet better programs and build a
yet better open-source community. So shout out to all of the authors of all the
open-source software that Chromium uses, which is a lot of people. But they
deserve it.
53:37 SHARON: Yeah, for sure. It's very cool how it's very - all very related.
And even within Chrome, I think people stick around longer than typical other
projects. And it's cool to see people around, like a decent number of them,
from before Chrome launched. And that's probably [INAUDIBLE] to a generally
more positive engineering culture. So that's very good.
53:58 ELLY: I think so. But I'm biased, of course.
53:58 SHARON: Yeah, maybe. [LAUGHS] Cool. You mentioned mailing lists a bunch.
Any favorites that you have?
54:08 ELLY: Oh, yeah. chromium-dev is the mailing list of my heart, I would
say. It's the main open-source development mailing list for us. It's a great
place for all of your newbie questions. If you're just like, how the heck do I
even check out the source, that's a good place to ask. The topic-specific
mailing lists, especially net-dev and security-dev, are really good if you have
questions in those specific areas. But honestly, all of the mailing lists on
chromium.org are good. I haven't yet encountered one where I'm like, that
mailing list is bad. So check them all out.
54:33 SHARON: Cool. All right. Check out every single mailing list. Sounds
good.
54:38 ELLY: Yeah, every mailing list, every Slack channel.
54:38 SHARON: All right. Great.
54:38 ELLY: You're all good.
54:38 SHARON: Every Slack channel, I think - yeah, I'll add myself to the rest
of them. All right. Well, thank you very much, Elly.
54:45 ELLY: Of course.
54:45 SHARON: Thank you for chatting with us. And see you all next time.
54:51 ELLY: All right. Thank you, Sharon. Easter egg - in the second part of
this video, Elly is drinking soda.

@ -0,0 +1,691 @@
# What's Up With Site Isolation
This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 9, a 2023 video discussion between [Sharon (yangsharon@chromium.org)
and Charlie (creis@chromium.org)](https://www.youtube.com/watch?v=zOr64ee7FV4).
The transcript was automatically generated by speech-to-text software. It may
contain minor errors.
---
Site Isolation is a major part of Chrome's security. What exactly is it? How
does it fit into navigation? What about security? Today's special guest telling
us all about it is Charlie, who made it happen. He's also worked all over
navigation, making sure it works with all its complexities and remains secure.
Notes:
- https://docs.google.com/document/d/19LTLcwd2_JfiIklPXY0yu0ktpy-p8za2ZZXXzqBBVIY/edit
Links:
- [What's Up With Processes](https://www.youtube.com/watch?v=Qfy6T6KIWkI)
- [Life of a Navigation](https://www.youtube.com/watch?v=OFIvyc1y1ws)
---
0:00 SHARON: Hello, and welcome to "What's Up With That?" the series that
demystifies all things Chrome. I'm your host, Sharon, and today we're talking
about site isolation, what exactly is it? How does it fit into navigation? What
about security? Today's special guest telling us all about it is Charlie. He
helped make site isolation happen. He's worked on Chrome since before the
launch, though as an intern, and since then, he has worked all over navigation
including things like the process model, site isolation, and just making sure
that changes to that are all secure and that things still work. So welcome,
Charlie.
0:30 CHARLIE: Thank you for having me.
0:30 SHARON: OK, let's start off with what is site isolation?
0:36 CHARLIE: So site isolation is a way to use Chrome's sandbox to try to
protect websites from each other. So it's a way to improve the browser security
model.
0:43 SHARON: OK, we like security. And can you tell us a bit about what a
sandbox is?
0:50 CHARLIE: Yeah. So the sandbox is a mechanism that tries to keep web pages
contained within the renderer process even if something goes wrong. So if they
find a bug to exploit, it should still be hard for them to get out and install
malware on your computer or do things outside the renderer process.
1:05 SHARON: OK. Last video, we talked all about the different types of
processes and what they all do. So why are we particularly concerned about
renderer processes in this case?
1:17 CHARLIE: Sure. So renderer processes really have the most attack surface.
So the browser's job is to go out and get web pages from websites you don't
necessarily trust, pull down code, and run that on your machine. And most of
that code is running within this sandboxed renderer process. So an attacker
may be able to run code in there and try and find bugs to exploit. The renderer
process is where most of those bugs are going to be. It's where the attacker
has the most options and direct control. So we want that to be locked down as
much as possible.
1:55 SHARON: OK. Right. So how exactly does this work? How am I getting
attacked?
2:02 CHARLIE: Right. So all software tends to have bugs, and an attacker will
try to find ways to exercise those bugs in the code to let them accomplish
their goals. So maybe they find that there's some parsing error, and so the
code in the web browser does the wrong thing when you give it some input. And
for an attacker on the web, that input could be something in HTML or JavaScript
that makes the browser do something wrong, and maybe they can use that to their
advantage.
2:36 SHARON: So say I do get attacked. What's the worst that can happen? Should
I really be concerned about this?
2:42 CHARLIE: Well, that's exactly what we think about in the browser security
model is, what's the worst that can happen? How can we make that not be as bad
as it could be? So in the old days when browsers were first introduced, it was
basically just a program, it's all one process. And it would fetch content from
the web, and so if something went wrong, there was no sandbox. There was no
other protection. You were just relying on there not being bugs in the browser.
But if something did go wrong, that web page could then install malware in your
computer and your whole machine would be compromised. And so that might give
them access to files on your disk or other things that you have access to on
the network like your bank account or so on, which, obviously, is a big deal.
3:28 SHARON: Right. Yeah, we'd like to not have other people have that. OK,
cool. So can you tell us a bit about how site isolation actually works? What is
the mechanism behind it? What is going on?
3:41 CHARLIE: Sure. So when Chrome launched, we were using the sandbox to try
and prevent that first type of attack of installing malware in your machine or
having access to the file system or to network, but we wanted it to do more to
protect websites from each other. And to do that, you have to treat each
renderer process like it can only load pages from one website. And if you go to
visit a different website, that should be in a different process. And so
there's a bunch of aspects of site isolation for, well, OK, as you go from one
website to another, we need to use a different process, but the big one that
made this such a large change to the browser was making cross-site iframes run
in a different process.
4:30 SHARON: What is an iframe?
4:30 CHARLIE: So an iframe is basically a web page embedded inside of another
web page. So you can think about this as an ad or a YouTube video. It might be
from a different origin from the top level page that you're viewing, but it's
another web page embedded inside it. And so that has a different security
context that it's running on.
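To make that concrete, a small sketch of a cross-site iframe; the URLs and the
ad scenario are made up for illustration:

```typescript
// Imagine this runs on a page at https://news.example.
const ad = document.createElement('iframe');
ad.src = 'https://ads.other.example/banner.html';  // a cross-site document
document.body.appendChild(ad);
// The ad is a full web page with its own security context. With site
// isolation, it is rendered in a different renderer process than the
// embedding page.
```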
4:54 SHARON: You mentioned it might be from a different origin, and it might be
useful to know what the difference between a site and an origin is, especially
as it relates to what we call site isolation.
5:00 CHARLIE: Yeah, so we're being specific in using the word site isolation
instead of origin isolation. A site is a little broader, so it's a registered
domain name plus a scheme, so https://example.com would be an example of a
site, but you might have many origins within that as you get into subdomains.
So if you had foo.example.com and bar.example.com, those would be different
origins within the example.com site. The web security model is all about
origins. So foo.example.com and bar.example.com shouldn't be able to access
each other, but there are some old web APIs that have stuck with us, like being able
to modify something called document.domain, where two different origins in the
same site can sometimes access and modify each other, and we don't know in
advance if they're going to do this. So therefore, we have to put everything
from a site in the same process because we can't move things from one process
to another later. We hope that someday we can get rid of that. There is some
work in progress for that to go away. Maybe then we can do origin isolation.
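As a rough sketch of the site-versus-origin distinction: real browsers consult
the Public Suffix List to find the registrable domain, but this toy version
just keeps the last two host labels, which happens to work for example.com:

```typescript
function origin(url: URL): string {
  return `${url.protocol}//${url.host}`;  // scheme + host (+ port)
}

function site(url: URL): string {
  // Toy registrable-domain logic: keep the last two labels only.
  const labels = url.hostname.split('.');
  return `${url.protocol}//${labels.slice(-2).join('.')}`;
}

const a = new URL('https://foo.example.com/page');
const b = new URL('https://bar.example.com/page');
console.log(origin(a) === origin(b));  // false: different origins
console.log(site(a) === site(b));      // true: same site, so with site
                                       // isolation they share a process
```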
6:10 SHARON: Cool. So the site isolation stuff is all in the browser, so that's
the browser security model. What's the difference between that and the web
security model? Are these the same?
6:16 CHARLIE: They're certainly related to each other, but they're a little
different. So the web security model is conceptually what can web pages do, in
general, what are they allow to access for another website or for another
origin or for things on your machine, camera, and microphone, and things like
that. And the browser security model is more about how we build that and how do
we enforce the web security model, but also, provide some extra lines of
defense in case things go wrong. So that incorporates things like the sandbox,
the multi-process architecture, site isolation. What can we do to make it
harder for attackers to accomplish their goals, even if there are bugs.
7:04 SHARON: It seems like good stuff to have. So a couple other definitions,
maybe, to get through. So what is a security context?
7:10 CHARLIE: Yeah. So that's the environment where this code is running. In
the web, it's something like an HTML document or a worker, like a service
worker, someplace where code is running from what we would call security
principal, which is, for the web, something like an origin. So if you have an
HTML document you've gotten from example.com, that's running in a web page in
the browser that has a security context. And an ad from a different origin
would be a different security context.
7:49 SHARON: Are a security context and a security principal always the same,
or are there times where those are different?
7:55 CHARLIE: No, you can have two different security contexts, like two
different documents that have the same security principal, and they might be
able to access each other. Or they might be living in different processes, but
still have access to the same cookies or local storage, things on disk. So the
principal is, this is the entity that has access to something.
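One way to picture that distinction is as a toy data model; these types are
illustrative, not Chromium's actual classes:

```typescript
interface SecurityPrincipal {
  scheme: string;  // e.g. 'https'
  host: string;    // e.g. 'example.com'
}

interface SecurityContext {
  kind: 'document' | 'worker';
  principal: SecurityPrincipal;
}

// Two different contexts (say, two tabs showing example.com) that share
// one principal: they may live in different processes, yet both can
// access example.com's cookies and local storage.
const example: SecurityPrincipal = { scheme: 'https', host: 'example.com' };
const docA: SecurityContext = { kind: 'document', principal: example };
const docB: SecurityContext = { kind: 'document', principal: example };
```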
8:16 SHARON: When people think of site isolation, often, they think about
navigation as well, partly because that's how our teams are structured, so how
exactly do these relate, and where in the life of a navigation - name of a
talk, want to go watch - does site isolation stuff happen?
8:34 CHARLIE: Yeah, so they're definitely related. So navigation is about how
you get from one web page to another, and that might be a different security
context, different security principal. And I got interested and involved with
navigation because of site isolation, my interest in that. And as you think of
the web browser as an operating system for running programs, it's how you're
getting from one program to another. So it would make sense that as you go from
one website to another, you get a new container for that, a new process. So
that was one part of how I got involved with navigation was building what we
call a cross-process navigation. So you have to start in one renderer process
and then be able to end up in a different renderer process with all the various
parts of the life of a navigation, where you go out to the network and ask for
the web page. And maybe you have to run some beforeunload events first to see
if you're actually allowed to leave, or maybe the user has some unsaved
data. All the timing of that is tricky, and then switch to the new process at
the right time. So navigation has a lot of different corner cases and
complexity that then get involved with the process model so that you can do
this in any type of navigation, in any frame. And so that's where our team ends
up involved in both site isolation work and the navigation code in the browser.
10:06 SHARON: Right. What a cool team. So you mentioned the process model, and
that is related, but not the same as the multi-process architecture. So let's
just quickly mention what the differences there are, because in this case, it
is important.
10:22 CHARLIE: Yes. So the process model for the browser is how we decide what
goes into each process, and specifically, we're talking about renderer
processes and web pages here, where we can decide, as we create new tabs and we
visit websites on those tabs which renderer processes are we going to use. So
without site isolation, maybe it's that each newly created tab gets its own
process. But anything you visit within a given tab stays in the same process.
Or maybe you can do some cross-process transitions within that tab as long as
you're not breaking scripting between existing pages. So site isolation defines
a process model that says you can never put web pages from two different
websites in the same renderer process, and then that provides a bunch of
constraints for how navigation works.
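A toy version of that constraint, just to pin down the idea (this is a
conceptual sketch, not Chromium's actual process-assignment code):

```typescript
interface RendererProcess {
  pid: number;
  lockedToSite: string;  // e.g. 'https://example.com'
}

function pickProcess(
    targetSite: string,
    pool: RendererProcess[],
    spawn: (site: string) => RendererProcess,
): RendererProcess {
  // Site isolation's rule: a renderer may only ever be reused for the
  // one site it is locked to; any cross-site navigation must swap.
  const existing = pool.find(p => p.lockedToSite === targetSite);
  return existing ?? spawn(targetSite);
}
```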
11:16 SHARON: And then the multi-process architecture is more just the fact
that we have these different processes.
11:22 CHARLIE: Right. It makes this possible, because it gives us this ability
to run browser code and renderer code separately and plug-in code and other
utilities and network service that - yeah.
11:27 SHARON: Yeah, because back in the day, that wasn't the case. That's what
made Chrome different.
11:34 CHARLIE: Right. So when Chrome launched, we were moving from this more
monolithic browser architecture that was common at the time, where everything
ran in one process to separate browser process, renderer process that was
sandboxed, and we could play around with different process models. So when Chrome
launched, part of the internship that I was doing was looking at what should go
in each renderer process? What process model should we use? And we thought site
isolation would be great, but you can't really do that yet. It's too
complicated to get the iframe things to work. So maybe we can do a hybrid where
sometimes we swap to a new renderer process as you go from one website to
another at the top level, but then other times, you'll end up with multiple
sites in the same process. And it was like that until we were able to ship site
isolation much later.
12:23 SHARON: Cool. So this sounds, conceptually, like it makes sense. You want
to have different sites/different origins in different renderer processes, and
it sounds like it shouldn't be that hard, but it is/was/still is very hard. So
can you briefly just tell us about how and why navigation is hard? Because
other people who don't work on browsers at all or tech or even people in
Chrome, I feel like, they're just like, isn't navigation just done? This just
works, right? So why is there still a team doing this, and what is so hard
about it?
12:59 CHARLIE: That was often the most common question we would get when we
were explaining what work we were doing on site isolation was, oh, doesn't it
already work that way? And it's like, yeah, I wish. Yeah, so there's two parts
of that. There is, why is navigation hard, and why is site isolation hard? So
tying into any kind of navigation thing is tricky because of how many different
types of navigation and corner cases there are. As you're going from one page
to another, is it redirecting to a different website, or does it end up not
actually giving you a web page back? Maybe it's a download. Is it not moving to
a new document at all and it's just a navigation within the same document,
which has different properties. There's a lot of things that we need to keep
track of in the navigation system and how it affects the back-forward history
that makes it tricky. And then it continues to get more complicated over time,
as we add new fancy features to the browser. So there's lots of things that
we've layered on top of that with back-forward cache and pre-rendering and new
navigation APIs for interacting with session history, which make things faster
and nicer for web developers, but also, provide even more ways that navigation
can get into interesting corner cases, like, why didn't we think about
pre-rendering a page with a sandboxed iframe that might cause a different path to
happen? So that's where a lot of the complexity in navigation comes from and
why there's ongoing challenges, even though it's something that seems like it
has worked from the beginning. Site isolation being hard is related to the fact
that you can navigate in any frame in a page, and iframes being embedded is
something that we used to just handle entirely within the renderer process. So
this is a fun way to think about the multi-process architectures that shipped
around when Chrome was launched - and other browsers did similar things: we
could take the rendering engines that had already existed for a decade or so
from existing browsers and just run multiple copies of them. So as you
open up a new tab, we've got another copy of WebKit, which is the rendering
engine we were using at the time, and we had to make changes to make it work in
the renderer process talking to the browser process, but we didn't really need
to change fundamentally how it rendered a web page. And so it was in charge of
deciding what network requests it was going to make for getting iframe content
and then rendering the iframe and where a click was going to go, that kind of
thing. And to do out-of-process iframes, you need the iframe inside the page to
be rendered in an entirely separate renderer process. And that is a big change
to how the rendering engine works. And so that was what took all the time and
what made site isolation a multi-year project, where we had to fundamentally
introduce these new data structures, like render frame host and representations
of each frame in the browser process, change how the rendering engine worked,
and then change all the features in the browser that assumed the renderer would
take care of this. And now, we need to handle them spread across multiple
processes.
16:28 SHARON: How did that fit in with the forking of WebKit into Blink, which
is what the rendering engine in Chrome is now?
16:34 CHARLIE: Yeah, so the fork was absolutely necessary to do this. We pretty
much had to wait until that happened, because we didn't have as much
flexibility to make large, architectural changes to WebKit while we were
sharing it with other browsers, like Safari and so on. We were looking into
ways that we might be able to sort of approximate what we wanted, but when the
decision to fork WebKit into Blink was made, it opened the door and gave us a
chance to say, we can do this now. Let's go ahead and dive in and make site
isolation happen.
17:14 SHARON: That makes sense. In a quite early talk, probably from 10 years
ago now, Darin was saying how having each renderer host just one site was like
the Holy Grail, and he seemed very excited about it. So that makes sense
because of the -
17:34 CHARLIE: Yeah, and it feels like the natural use of a sandbox in a
browser. The same reason that we got all these questions, like isn't that how
it already works, is that it's such a natural fit: we have a container for
running a web page, so what is the unit that you want to put in the container?
It's a website that you're visiting. And the fact that we couldn't easily pull
them apart into different processes was totally an artifact of how web browsers
were originally built that didn't foresee this - oh, they're being used as
complicated programs with different security principals.
18:13 SHARON: Yeah, in a different talk, John from Episode 3, about //content,
had mentioned that site isolation was basically the biggest change to Chrome
since it launched, and that's probably still the case. So yeah, it was a
project.
18:29 CHARLIE: Yeah, it was a long project, and we had a lot of help from many
people across the Chrome team, but it was cool to get to this outcome, where we
could then say, now we have processes that are locked to a single security
principal, so it's nice to get to that outcome.
18:47 SHARON: So for people on the Chrome team now, what do you wish they knew
about site isolation/navigation in terms of as an engineer? Because before, I
was on a different team, and someone on my team said, oh, you should know how
navigation works. And I said, yeah, that sounds like a great idea, but how? So
what are things that people should just keep in mind when they're out and about
doing their stuff that usually isn't directly interacting with navigation even?
19:14 CHARLIE: Right. Yeah, so I think that the biggest thing to keep in mind
is to limit what we put into a renderer process or what a renderer process has
access to, to not include cross-site data. And we already have to have this
mentality in Chrome that we don't trust the renderer process. If it sends an
IPC or Mojo call to the browser process, we should assume that it might be
lying or asking for things that it shouldn't have access to. And I think it's
in the back of a lot of people's heads already that, OK, I shouldn't let it
like go get a file from disk, but also, we don't want it to mix data from
different sites. It shouldn't be able to ask for something from - to lie and
say, oh, I'm origin x, please give me data from there. Because that's often how
APIs used to work in Chrome: the renderer process would say what origin it's
asking for, and please give me the cookie for that.
20:12 SHARON: That sounds bananas.
20:12 CHARLIE: Yeah. Now, it sounds crazy. And so we think that the browser
process should already know based on who's asking what they have access to. So
in order to avoid site isolation bypasses, that's really the thing developers
should keep in mind. So for features like Autofill or something
where it's easy to think, oh it would be nice for me to just have that data on
hand in the renderer process and I can just put it in when it's needed. No, you
should keep it out of the renderer, and then only provide the data that's
needed.
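
(Editor's note: to make that concrete, here is a minimal, self-contained C++
sketch of the pattern Charlie describes. This is not real Chromium code, and
all the names in it are hypothetical, though Chromium does centralize checks
like this in classes such as ChildProcessSecurityPolicy. The point is that the
browser derives the origin from which process is asking - something it
recorded when it created the process - rather than trusting an origin named in
the message payload.)

```cpp
#include <map>
#include <string>

// Hypothetical sketch of browser-process bookkeeping, for illustration only.
class BrowserProcess {
 public:
  // Called when a renderer process is created and locked to a site.
  void LockProcessToOrigin(int process_id, const std::string& origin) {
    process_origin_[process_id] = origin;
  }

  // BAD (the old pattern): the renderer names an origin in the message.
  // A compromised renderer can lie and ask for another site's cookies.
  std::string GetCookiesUnsafe(const std::string& claimed_origin) {
    return cookies_[claimed_origin];
  }

  // GOOD: the browser already knows who is asking. `process_id` comes from
  // the IPC channel itself, not from data the renderer controls.
  std::string GetCookies(int process_id) {
    return cookies_[process_origin_[process_id]];
  }

 private:
  std::map<int, std::string> process_origin_;  // process -> locked origin
  std::map<std::string, std::string> cookies_;  // origin -> cookie data
};
```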
20:51 SHARON: In security-discuss circles, another term you hear often is a
renderer escape or renderer bypass or whatever. Is that the same as a site
isolation bypass, or are those different?
21:00 CHARLIE: Yeah, so sandbox escape is a common term that is used for when
an attacker has found some bug already, and then they are able to escalate
their privilege to affect the browser process or get out of the browser process
and to the operating system. So a sandbox escape is a lot worse than a site
isolation bypass. It would give the attacker control of your computer, the
ability to install malware, and things like that. So for sandbox escapes, we
want to have as many
boundaries as possible to try to prevent that from happening. A site isolation
bypass is not as bad as a full sandbox escape, but it would be a way that an
attacker could find some way to get access to another website's data or attack
that website. So maybe it's able to trick the browser into giving it cookies
from that site or using the permissions that have been granted to another
website. And then renderer compromise would be another type of exploit that
happens entirely within the renderer process. That's one where the attacker has
found some bug, they can run whatever native code they want within the renderer
process, and that's what we're trying to contain with the sandbox and what site
isolation tries to make even less useful to the attacker. Because even if you
can run any code you want within the renderer process, you shouldn't be able to
install malware because of the sandbox, and you shouldn't be able to access
other sites' data because of site isolation.
22:47 SHARON: Yeah, I think when I was learning about site isolation and stuff,
I was like, whoa, this is a lot going on, and most people just have no idea
about it. And in terms of other bugs and whatnot, something that is often
mentioned is Spectre, and that's still a thing. And on Wikipedia, in the
Mitigation section of the Spectre page, they mention site isolation, but I was
like, this should have its own page, so maybe one day -
23:20 CHARLIE: Maybe one day.
23:20 SHARON: one of us is going to write a thing about that. But yeah, that's
kind of the bug, right? So can you just talk about that?
23:25 CHARLIE: Yeah, so Spectre and Meltdown were certainly a big change to the
security landscape for browsers. At a high level, those are attacks that are
based on the micro-architectural parts of the CPU. The way that the basic CPU
hardware works, there are ways to leak data that weren't anticipated. And we
can view it as it gives attackers what we call an arbitrary read primitive,
something that can access anything in your address space in a process. You can
think about it as the CPU not wanting to stop and wait while it goes and
accesses data from RAM, so it thinks, well, I'll just guess what the answer is going to
be and then keep running some instructions. And if I was right in my guess, the
next several steps are done already, and I can just move on from there. And if
I was wrong, well, I just throw away that work, and I do the right thing, and
we move on, and everybody is fine. But attackers found that while you're doing
those extra steps ahead of time, you're also affecting the caches on the CPU,
and cache timing attacks let you find out what work was done there. So some
very clever researchers found that you can do some things in those extra steps
that happen in this speculative state to find out what data is in addresses you
don't have access to. And so places where we thought some check in the renderer
process could say, oh, you don't have access to this thing from another
website. We're fine. Now, you could get access to it, just based on how CPUs
work, without needing any bugs in the browser. So now, we're thinking, OK,
we're running JavaScript, and if it can leak things from the renderer process,
we can't have data worth stealing in the renderer process. You could try to
find ways to prevent those attacks, but that ended up being difficult. And
ultimately, we found that it wasn't really feasible to prevent the attacks in
all the forms in which they could happen. So site isolation became the first
line of defense to say, data from other websites, data worth stealing, should
not be in the renderer process where a Spectre attack could get access to it.
Now, that
was actually one of the big, exciting events that helped us accelerate the work
on site isolation and get it launched when that was discovered in 2017 or 2018.
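
(Editor's note: for a feel for the mechanics, below is a minimal C++ sketch of
the classic Spectre variant 1 pattern from the original paper - illustrative
only, not Chromium code. If the branch is mispredicted for an
attacker-chosen, out-of-bounds `x`, the speculatively executed load leaves a
cache footprint that depends on secret memory, which a later cache-timing
probe can recover even though the architectural results are discarded.)

```cpp
#include <cstddef>
#include <cstdint>

uint8_t array1[16];          // in-bounds data the code may legally read
uint8_t array2[256 * 512];   // probe array; one cache line per byte value
size_t array1_size = 16;

void victim_function(size_t x) {
  // The CPU may predict this branch as taken and run the body
  // speculatively even when x is out of bounds.
  if (x < array1_size) {
    // Speculatively reads a byte it shouldn't, and encodes it into which
    // line of array2 gets cached; the attacker later times accesses to
    // array2 to recover the byte.
    volatile uint8_t tmp = array2[array1[x] * 512];
    (void)tmp;
  }
}
```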
26:24 SHARON: So at that point, site isolation was mostly done, and it was just
getting it out?
26:24 CHARLIE: Yeah, it was really interesting. So we'd been working on it for
several years for a different reason: we wanted it to be a second line of
defense against compromised renderer processes. We assume
people are going to find bugs in the renderer process, in V8 or in Blink or
things like that, and we wanted that to not be as big of a problem. We wanted
to say, OK, whatever. There isn't data worth stealing in that process. We had
already shipped some initial uses of out-of-process iframes in 2017 for
extensions, and we were working on trying to do some sort of initial steps
towards using site isolation for some websites and see how that goes when we
found out about Spectre and Meltdown. And so that next six months or so was a
very accelerated, OK, we've got to get everything else working - the way that
site isolation interacted with DevTools and extensions and printing and a
bunch of other features in the browser that we needed to get working. And so it
was an interesting accelerated rollout, where we even had an optional mode and
an enterprise policy where you could say, I don't care if printing doesn't
work, turn on site isolation so that Spectre attacks won't find other data
worth stealing in the process. And then we got to where it was working well
enough that we could ship it for all desktop users in, I think it was Chrome
67 in mid 2018. So it was good that it was that far along, so that we were
able to ship the full thing within a few months.
28:19 SHARON: Very cool. Yeah, I mean, those are all the things that make
navigation hard, like extensions being part of it, and there's just all these
things, and all of them go through navigation and are affected, so that's very
exciting. So what is the state of site isolation now, and are there still going
to be changes? That was a few years ago, so are things still happening?
28:45 CHARLIE: Yeah, we're still trying to make several different improvements.
We've made several improvements since the launch, so that initial launch, since
it was mostly focused on Spectre, didn't have all the defenses we wanted
against compromised renderer processes, because a Spectre attack can't inject
actual running code. It can't go and lie to the browser process. It won't give
you full control over what's running in the renderer process, but it can leak
all the data that's in there. So anything that a web page can pull into a
renderer process can be leaked. So after that initial launch, we needed to go
and actually finish the compromised-renderer defenses and say, OK, all the IPCs
that come out of the renderer, make sure they can't lie and steal someone
else's data, so get all the browser process enforcements in place. Another big
thing after that was getting it to work on Android, where we wanted this
defense. We have a much different set of resource constraints on mobile
devices, where there's not nearly as much memory and renderer processes are
often killed or just discarded. So there, we couldn't isolate all websites from
each other. We had to use heuristics to say, here are the sites that need it
the most - sites where users log in, in general, or sites where this
particular user is logged in, or other signals that this site probably needs
some protection - we'll give those isolation, and then other ones can share a
renderer process. So we've tried to improve those heuristics and isolate as
many sites as we can there. And then things that we weren't initially isolating
from each other, we have been able to. So extensions was an example where we
started by just making sure extensions didn't share a process with web pages,
but now, we make sure that no two extensions can share a process with each other.
And we're trying to get to where we could isolate all origins from each other,
depending on what resources are available, but there's some changes with,
basically, deprecating document.domain that are in flight that might make that
possible.
30:57 SHARON: So say I have a fancy computer, and I just want maximum site
isolation because I care about security. How do I go get that?
31:03 CHARLIE: Yeah, so there are some experimental ways to do that. You can go
into the chrome://flags page, where you can turn on and off different features
and experiments that are in progress. And there's one there called strict
origin isolation, which will ensure that all origins within various sites are
isolated from each other, and that works on desktop and Android. It'll just
create slightly more processes than we do today. Similarly, on Android, if you
wanted to isolate all sites, there is an option for full site isolation there
called site-per-process. You could use that or strict origin isolation to get
maximum site isolation today.
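
(Editor's note: the same experiments can also be enabled from the command
line. A sketch, using flag names that were current around the time of this
discussion and may change:)

```
# Isolate every site into its own renderer process (full site isolation):
chrome --site-per-process

# Isolate every origin, not just every site (strict origin isolation):
chrome --enable-features=StrictOriginIsolation
```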
31:51 SHARON: So another platform that Chrome does exist on is iOS. So can we
do anything there? Why is that not in [INAUDIBLE]
31:58 CHARLIE: So Chrome for iOS has to use Apple's WebKit rendering engine
today, and current versions of it don't have site isolation, and we don't have the
ability to run our own rendering engine that has support for it. So we don't
have it today, but my understanding is that WebKit is working on site isolation
as well, and actually, Firefox has also shipped their version of site
isolation, which is pretty cool to see other browser vendors building this as
well. And so if that were made available to other third-party browsers on iOS,
then maybe it could be used there. But at the moment, we're constrained, and we
can't ship it on that platform.
32:47 SHARON: In terms of how the internet happens, this seems like a good
thing to just have generally. So is it possible that this could be a spec one
day that any browser should implement? Or is it - because it's under the hood
and it's not something that's maybe necessarily visible to websites - maybe
that's not part of it, but is this an option?
33:04 CHARLIE: Yeah. I think it ties back to the earlier question about web
security model versus browser security model, where the web visible parts of
this, it's meant to be transparent to the websites. There's no behavior changes
to the web platform by turning on site isolation. There's not meant to be. And
so it's not really a spec visible thing, it's more part of the browser's
architecture, the same way that there's no spec for sandboxes in a browser. You
could build a browser that doesn't have a sandbox, but today, the best practice
is to have better security by having a sandbox. So I think the relevant thing
for web specs is just that we don't introduce APIs that don't work when
different origins are in different processes. And that sounds like, well OK,
that makes sense, and thankfully, we were sort of in that state to begin with,
and in some places we got lucky. Like postMessage, a mechanism for sending a
message to another origin, is asynchronous, so the two origins don't need to
run in the same process, because the message will be delivered at a later
time. So
we can send it to a different process running on a different thread. Some
places we got unlucky, like document.domain, where web APIs said that different
origins can script each other if they agree that it's OK, as long as they're in
the same site, and that constrained us in the process model. So we're trying to
improve things about the web spec. You could almost say that deprecating
document.domain is a way of seeing the browser security model and the web
security model align with each other, to say, OK, we want to use processes.
We want this asynchronous boundary. You shouldn't be able to script other
origins from the same site. So I think that's the closest is making sure that
specced APIs fit well with this multi-process site isolation world.
35:12 SHARON: There are some headers and tags and whatever that websites can
use to alter how the browser handles things though, right?
35:23 CHARLIE: Yes, absolutely. And those are good ways that websites can more
effectively isolate themselves, both in terms of web-visible behavior and in
the browser's architecture. In browsers that don't have full site isolation,
that don't have out-of-process iframes in all cases, web pages might still be
able to get some of the isolation benefits using those APIs. And so those are
things like the cross-origin opener policy, which says, for
example, if I open a pop up to a different website, there's not going to be any
communication between me and that pop up. So it's OK to put them in different
processes, and they can be better isolated from each other. That's good from an
architecture perspective. It's also nice from a web perspective in that you
don't have to worry about whether the window.opener variable in the pop up can
be used to do sneaky things to the page that opened it. So there's nice,
web-visible reasons to use something like a cross-origin opener policy to keep
them protected from each other. So that's one example of that. There's others
as well.
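
(Editor's note: concretely, a site opts into the behavior Charlie describes
with an HTTP response header, roughly like this:)

```
Cross-Origin-Opener-Policy: same-origin
```

Roughly speaking, a page served with this header is placed in its own browsing
context group, so cross-origin pop ups it opens get a severed window.opener
and can safely be put in a different process.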
36:46 SHARON: Something I've seen around, that is a web spec, is content
security policy. Is that related to any of this at all?
36:52 CHARLIE: It kind of is. Yeah, so content security policy is another way
for websites to tell the browser better ways to secure that site. And so some
of it is useful for saying I want to do a better job preventing cross-site
scripting attacks on my page, so don't run a script if you find it in these
random places. It should only come from these URLs or in these contexts on my
page. So that's more about what happens in a given renderer process, but there
are some places where content security policy does overlap a bit with site
isolation. There is a sandbox value you can put into a content security policy
header that makes the page get treated like a sandboxed iframe. And while we
don't yet have support for putting sandboxed iframes in another process, that
work is in progress, and we're hoping to ship it before long. And so CSP headers
that say sandbox will also be able to be isolated from the rest of their site.
So if they have some kind of untrustworthy content in them, that won't be able
to attack the rest of the site.
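
(Editor's note: a minimal sketch of that header value, for illustration:)

```
Content-Security-Policy: sandbox
```

A response served this way is treated like the contents of a sandboxed iframe:
it gets an opaque, unique origin, so it can't touch cookies or storage for the
site that served it, and scripts are disabled unless the header also lists
values like allow-scripts.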
38:04 SHARON: OK. Yeah, so it's that difference between the web versus browser,
what's visible, what's an option versus how it's actually implemented.
38:11 CHARLIE: Right.
38:11 SHARON: Cool. So a lot of this, we've talked about security a lot, and I
think for people who don't know about security, the image you have is people
trying to break into things - like, "I'm in", that whole thing - and that's
very much not what's going on here, because we're not trying to break things.
So can you tell us just a bit about the difference between offensive and
defensive security, and how this is one of those?
38:38 CHARLIE: Yeah, so a lot of attention in the security space goes to big,
exciting, flashy attacks that are found. On the offensive side, look, I found a
way to break the security of this thing, and we have big vulnerability reward
bounties to reward when people find these things so we can get them fixed. So
even on the defensive side, you want people working on offensive security,
looking for these bugs, looking for things that need to be fixed so we can
defend users. But the defensive side is super important and I find it a
satisfying place to be, even if it isn't always as glamorous. It's like, you
have to have all the defenses in place and all of these different attacks that
are found, it's like, yeah, we need to fix them, and we need to find ways to
make that less likely. But ultimately, this is the real goal: we want to
have systems that we can trust, that are safe to use, and that we can go and
visit untrustworthy web content and not have to worry about it. You need these
extra lines of defense. You need all these different ways of defending the
product and shipping security fixes fast, all the things that security works on
in a defensive sense so that people can use these systems and depend on them in
their lives. So that's the fun and fulfilling part of this, even if it isn't
quite as glamorous as I found a sandbox escape, but those are fun to look at
too.
40:17 SHARON: I heard security described as a bunch of layers of Swiss cheese.
So you have all these different layers of mitigations to try to keep bad things
from happening, but each of them is not perfect. And if the holes in those
layers line up, then that's where you get a vulnerability. So in this very
approximate metaphor, what are the neighboring slices of cheese to site
isolation? What other defensive things are related to this and are trying to
achieve the same goal?
40:46 CHARLIE: Sure. Yeah, so there's going to be holes in any layer that you
build. We have bugs in software, and in site isolation's case, it's trying to
put this boundary between the renderer process, where we assume everything is
compromised already and the data that the attacker wants to get to, other
websites, data on your machine and so on. So the adjacent layers of Swiss
cheese would be: within the renderer process, we do have security checks -
same-origin policy checks, things that try to keep certain data opaque to a
web page so that JavaScript can't look at it. Those checks in the
renderer process do matter. Today, we do have multiple origins from the same
site in the same process. The renderer process' job is to make sure that they
don't attack each other. But there's some fairly large Swiss cheese holes in
that layer that we try to fix whenever we find them. And so site isolation's
job is to be the next layer, which won't have holes in the same places,
hopefully. Its holes, site isolation bypasses, might be, oh, there's some way
for the renderer process to ask the browser process for something it shouldn't
have access to, and it tricks it, and it gets access to that. We hope that it's
tough to line those holes up, that an attacker has to find both a bug in the
renderer process and a bug in site isolation and luck out in that those bugs
line up and you can get to one from the other in order to get access to another
website's data. And then the next layer of Swiss cheese would be all the things
that the browser process does to keep the renderer isolated from the user's
machine and the sandbox itself that you shouldn't have access to the OS APIs
and so on. So those would be other ways to try and get beyond site isolation to
other things.
42:48 SHARON: That makes sense. Yeah, when I first heard about it, I was like,
oh, that's such a fun way to think about it, really. It's a good visual for seeing,
OK, this is how things go wrong. All right, cool. Do you have any other fun
stories about site isolation, making it happen, stuff since then?
43:08 CHARLIE: I mean, it's been a really fun journey the whole way. There's
been different projects and different exploratory phases, where we weren't sure
what was going to work or what we needed to get done. I've worked with a bunch
of great interns and people who have been on the team on early phases like
getting postMessage to work across renderer processes, and later phases about
what it would look like to build out-of-process iframes using something like
the plugin infrastructure - just, is this feasible? Or what is it that we
could protect, in terms of what a particular renderer process is allowed to
ask for. Can we keep allowing JavaScript from other websites into a renderer
process, while blocking your bank account information from getting in? Those
both look like network responses from different websites, but one has to be
let through for compatibility reasons, and one has to be blocked. Can we build
that? Are we doing a good job of keeping that sensitive data out? These are
things that we had some great PhD interns working with us on, and ultimately,
they got us to where we could ship this and protect a lot of data. So it's fun
working with all those people along the way.
44:35 SHARON: Yeah, that sounds very cool. So earlier on, you mentioned people
whose questions were like, why doesn't this already happen? These days, it
does happen more or less like that. So what kind of questions or
misconceptions do you still see folks who typically work on Chrome still have
when it comes to this kind of stuff?
44:52 CHARLIE: I think it's often assuming that navigation is simpler than it
is and not realizing how many corner cases matter and how all of these
different features that have been built on top of navigation interact with
each other. So I think that's where we spend a lot of our time these days,
beyond wanting to improve site isolation: we want to make these abstractions easier
for other people to understand. So I think that's one of the big challenges now
is how many different directions the navigation code has been pulled and how
those things interact with each other.
45:24 SHARON: Right. And that's kind of - was intentional initially, right? You
don't want everyone who works on Chrome to have to know how all of this works,
but then when you hide it so well, they're like, oh, this is fine. I'll just do
my thing. It'll just be my one thing, but then everyone has such a thing, and
then it becomes too many things. Yeah, I used to work on a different part of
Chrome that was not related to this, and you see some of these big classes,
like WebContents or whatever. You're like, oh, I'll just get what I need from
that, and things will be fine, but you just don't even have any idea of all the
things that could go wrong. So it's cool that someone is out here trying to
keep that under control.
46:00 CHARLIE: And I'm glad there are a lot of efforts to try to improve the
APIs for how we expose these things - WebContents, WebContentsObserver (which
is growing into quite a large API with many users) - looking at ways to make these
APIs easier to use and harder to make mistakes with. So I think those are
worthwhile efforts.
46:20 SHARON: OK. Cool. Well, I think that covers all of it. Now, folks know
how site isolation works. Problem solved. This is great. All right, thank you very
much. Great.
46:34 CHARLIE: Thanks. Oh, no. What? OK, hold on.