We've established a bit of a tradition here at Ars. Every year at Google I/O, we have a sit-down talk to learn more about Android directly from the people that make it. Of course, this year, just about every major event was canceled due to the coronavirus pandemic, nothing is really normal, and Google I/O never happened.
We can still do interviews over the Internet though! So while it happened later in the year than normal, we were still able to hold our annual chat with some of the most important Googlers at Android HQ: Dave Burke, Android's VP of Engineering, and Iliyan Malchev, Principal Engineer at Android and the lead of Project Treble.
We came prepped with questions about the more mysterious corners of Android 11, which actually led to a lot of interesting talk about the future. You'll learn about a coming re-write of the Bluetooth stack, and there's lots of talk about modularity and easy updating (like plans that will hopefully, someday, allow you to update the Linux kernel and developer APIs as easily as you download an app update).
This interview took place just days before the launch of Android 11, which went final on September 8. As usual, this has only been lightly edited for clarity, and I'll include whatever background is needed in italics. Given the odd state of everything as we all popped into a Google Meet video chat, COVID was an obvious first topic.
Ars: How are you all dealing with COVID-19 in Android development land?
Dave Burke: Good. Like everyone, when we did the work-from-home switch, it was a bit of a scramble to say the least. We had a lot of developer productivity things to iron out. A lot of people use high-powered workstations for building the operating system, with a phone tethered over USB, and we wanted to find a way that they could still use their workstations but flash to a phone that was tethered to a laptop. So we did a bunch of infrastructure work and whatnot to get everyone up and running. That actually worked pretty well. I was pretty impressed.
Google Meet worked out great, too. I remember thinking a couple years ago, "Wow, Google is investing a lot in this video conferencing stuff, why not use something commercial?" But now I'm so glad Google built its own. A lot of us now use the Meet rooms, so we have a lot of like virtual, Slack-like channels, if you want to call it that. It's been pretty good!
I mean it's obviously been tough as well, not having things like corridor conversations. We saw somewhere between like a 7 to 11 percent drop in productivity, I would say, at the beginning. Then it seemed to recover as people adapted. We did set the schedule back about a month just because we need to accommodate folks with that transition. The industry needed time to adapt, too, but yeah, it's mostly working out. Of course, I think we would all be quite happy to go back to work.
Iliyan Malchev: I think the biggest worry going into COVID was just, 'Will the ISP infrastructure be able to handle this huge spike in media consumption?' It seems to have held up for the most part.
Burke: The other thing I've been working on is exposure notifications with Apple, so that's been pretty intense. We're building capacity to detect if you've been in proximity to someone who has tested positive for COVID-19. We've been running fast on that, so that's been an exciting extra second job.
Ah, yes, Android's COVID exposure notifications rolling out to a smartphone near you. The APIs for this have shipped in Google Play Services and rolled out to basically all of the two billion Google Play Android devices out there. Full OS updates might be dependent on your device manufacturer, but Play Services updates hit every phone that has the Play Store installed, so basically every Android phone outside of China.
The current problem with COVID tracking is that individual health organizations need to make apps that plug into this API, and in the US, that's usually your state government. States don't really have competent app developers on call for something like this (Is your local DMV website as much of a disaster as mine?). So far, only six states currently have COVID apps. It sounds like Google's next step to try and fix this is to make apps itself, so states only need to supply a configuration file to get up and running.
Project Mainline updates
Project Mainline, aka "Google Play System Updates," is one of the biggest changes to come to Android in some time. The feature debuted in Android 10, and it's a major step in the modularization of Android. Mainline added a new "APEX" file type, designed to package core system components for easy updatability through the Play Store. Previously, the Play Store only shipped APK files, which, since they were built with third-party apps in mind, came with all sorts of security and functionality limitations that wouldn't work for core system components. APEX files can only come from Google or your OEM, so these offer a lot more power and start up earlier in the boot process. APEX files can do things like update the app framework, which you could never do with an app.
Mainline was also about getting OEMs on-board with Google taking over more of Android's base code—code that previously was available to OEMs to customize. Google had to sit down with all the OEMs (I imagine these meetings look like the Galactic Senate) and ask, "Do you really need to customize the way the DNS resolver works?" When all the answers come back "no," that becomes a Mainline module and everyone agrees to ship the same piece of code. When there are customization concerns, Google and OEMs are working together to upstream code into a module that everyone can use.
That was Android 10. For Android 11, the last news we got about Mainline was Google's first developer preview blog post from February, which said there were "12 new modules" in Mainline. It didn't provide much more detail than that.
Ars: Your blog post said there were "12 new apex modules" in Android 11, but what are they exactly?
Burke: Yeah, there's a bunch. I have a list here: so statsd, which is our daemon for collecting stats, and that makes sense because you want to have uniform telemetry. Wi-Fi tethering is a new module. NNAPI—the neural networks API—again, that's another space that's changing rapidly as we learn more techniques in machine learning. ADBD. Cell broadcast. There's some Wi-Fi modules. SDK extension stuff. Oh, and a media provider as well, which underpins scoped storage, so we wanted that to be updatable.
I think there's a total of 21 modules now and I think probably more important than what the actual modules are is the work that's gone into the infrastructure to make them possible. We have very advanced release management. We've got short-term, long-term telemetry. We've got A-B capability. We've got a file system snapshot in the rollback. And the other part, of course, is just a cultural change for the developers to learn how to write in a module that's being updated all the time. I'm pretty excited about the foundation that we've laid more than what the specific modules are because there's more to come.
Ars: Speaking of "more to come," I wanted to ask about that "SDK extension module," which sounds pretty important. Is this as interesting as I'm imagining? You want to deliver new API levels via the Play Store?
OK, time out while I explain this question: Android versions are identified to you and me by their version numbers, but internally Android identifies itself to apps with a number interchangeably called the "SDK level" or "API level." So all the new features, permissions, and security restrictions in Android 11 are available to apps in "API Level 30." In the past, API Levels have always gone up +1 with each Android release (even for the smaller 0.1 releases, which is why we're at level 30).
The speculation with an SDK module is that Google would be able to ship entire new SDK levels to developers, including new features, without having to push out an entire OS update. This would be absolutely incredible for Android, since full OS updates have such poor distribution and small user bases that developers are reluctant to support new APIs when no one can run them. API levels over Google Play would be just like a Play Services rollout, where a new feature can hit two billion devices virtually overnight. This also sounds very hard to believe, because a new developer API can hit any part of the OS. How could you possibly update that via a single module?
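To make the API-level idea concrete, here is a minimal Python sketch of how an app decides whether a feature is available on a given device. The version-to-level table covers the three most recent releases. The `supports` helper and the feature keys are hypothetical names used only for illustration; they are not part of any Android SDK.

```python
# Public API levels for recent Android releases.
API_LEVELS = {
    "Android 9": 28,
    "Android 10": 29,
    "Android 11": 30,
}

# Hypothetical feature table: the minimum API level at which each feature exists.
FEATURE_MIN_LEVEL = {
    "scoped_storage": 29,     # arrived with Android 10
    "data_access_audit": 30,  # arrived with Android 11
}


def supports(device_api_level: int, feature: str) -> bool:
    """Return True if a device at this API level exposes the feature."""
    return device_api_level >= FEATURE_MIN_LEVEL[feature]


if __name__ == "__main__":
    for release, level in API_LEVELS.items():
        print(release, level, supports(level, "data_access_audit"))
```

Today the device's API level only goes up with a full OS update, which is why shipping APIs through an updatable module would be such a big change.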
Burke: I think the whole idea of updatable OS modules is a pretty profound shift, so it's all pretty interesting. But yeah, we have the ability in Android 11—Android R, as we call it—to create new system APIs and deploy them in mainline modules. That's in R. In S [Android S would be version 12], we're going to plan to be able to actually deliver new public APIs in Mainline modules, so we're really just extending the breadth of what's a module and what's updatable.
Ars: That's going to have to be more limited than a full OS update, right? How well can that work compared to an OTA? It sounds amazing but also pretty hard.
Burke: Yeah, I think it's still early. You're right. The challenge with updatable modules is that the module updates but you can't assume that everything around it updates. So, you have to be careful and have stable internal APIs or boundaries between those interfaces.
So yeah, we're still working. I think what you really want is for the API to be connected only to another updateable module, otherwise it doesn't quite make sense. We're building out the capability and then we'll see what we'll use it for.
The GKI, a generic (and updatable?) Linux kernel for Android
The other big ease-of-update project going on at Google is the development of the GKI, a "Generic Kernel Image" for Android. (It seems like the majority of Android work lately is going toward making updates easier.) Just as Project Treble focused on making the user-facing top-half of the operating system more updatable, the GKI is focusing on standardizing a big chunk of the bottom half of Android: the Linux kernel.
Currently, the Linux kernel used in an Android phone is heavily forked and unique to each individual model of phone. There are usually three forks between Linux and an Android kernel. First, the Linux LTS kernel gets forked by Google into the "Android Common" kernel, with all the patches needed to run Android. Then Android Common gets forked by SoC vendors like Qualcomm or Samsung (for Exynos) into an SoC kernel, with all the changes needed to run on a particular model of SoC. Then the SoC kernel gets forked into a device kernel by a phone manufacturer, with all the changes needed for a particular device. The device kernel is what ships to users.
Just as Project Treble introduced the Generic System Image (GSI), a version of Android that can run on every single Android device thanks to lots of modularization work, the GKI is a build of the Linux kernel that can run on every Android device. The GKI isn't mainline Linux yet—though work is happening toward that—it's the Android Common kernel, so we're dealing with one fork instead of three forks. Progress!
Like the GSI, at first the GKI will only be used for testing. In Android 12, Google is planning to ship the GKI to consumers. Today, Android devices can get minor LTS kernel updates through the monthly security system updates, but they almost never jump major versions.
Soon it looks like updating the kernel will get a lot easier, though. Luca Stefani, the director of the Lineage OS Android distribution, spotted a "GKI" Mainline module in the Android source code. So will devices soon be able to update the Linux kernel as easily as you would update an app?
Ars: So there's this "GKI" Mainline module in AOSP, too. Are you guys really going to deliver a Linux kernel over the Play Store?
Malchev: Yes. I think I mentioned last time when we spoke we've been on this multi-year journey to shift kernel development more towards upstream and away from multiple forks, which is the case today, between our SoC partners and our OEM partners. GKI is just the final manifestation of this. We have several phases we're working on, but they're all predicated on GKI existing to begin with. So we've been working very closely with our chipset partners and our OEM partners behind the scenes to steer development more towards upstream. That has been the biggest challenge. It has been organizational and philosophical rather than technological, per se.
We recently extended the lifetime of LTS kernels from two to six years to be able to cover the actual lifespan of individual Android devices. The way that GKI has unfolded is that there's going to be a GKI for each supported LTS train on Android. So for example, with Android 11, we officially support three kernel versions, of which 5.4 is the new version that we're adding. So with Android 11, any device that launches with 5.4 LTS is going to have to be compatible with the GKI 5.4 kernel. We did not retroactively create GKI kernels for the older existing LTS lines because we just don't do anything retroactively with Android. It would be incredibly expensive and time-consuming. So any device that launches 5.4 in Android R is GKI compatible. What that means is if you ship 5.4, then you have to take the GKI kernels, replace your shipping kernel with it, and pass all of our qualification tests.
Going into the next Android release, we're aiming to have devices actually shipping with these Generic Kernel Images. We can update it via the OTA subsystem that OEMs have but we also want to lead the way by having GKI be treated as a mainline module in our own Pixel devices.
Burke: This is a pretty profound direction we're going in, and it's also super complicated, but I think the idea of having a generic kernel that's common between all devices within those LTS versions would be huge from a normalization perspective. You want to have some uniformity at the system level. And of course, devices can still express their unique hardware with individual drivers.
The other thing that's really interesting here is that about 40 percent of our CVEs—or our reported security bugs—are in the kernel. So the ability to have an updatable kernel would have a huge impact on security and also maintenance costs for device makers. You would simplify and reduce their costs greatly.
Because of that statistic, in practice, we found that a CVE will be reported to us and we'll realize, "Oh, because the Pixel team has been revving the kernel, it's already fixed." So the ability to be closer to upstream will have a huge impact on security overall and in the ecosystem.
Ars: I feel like what I've heard in response to updateable kernels in the past is that that's scary and dangerous and you could break a device. Is there anything about this that makes it safer?
Malchev: We have built a lot of fail-safes into the system which are shared with the broader Mainline project. If, for whatever reason, an update is pushed to the device that causes it to fail to boot (we have some complex heuristics about what "failed to boot" means), we have this pretty elaborate checkpointing mechanism built into Android where a failure to boot will cause the entire device state to be rolled back to a previous last known good point. In the case of catastrophic failure of this kind, you will simply revert back to the last known boot point in your device with whatever it was good with before.
Ars: Are you talking about the A/B partition thing that's been around for a while or is this something new?
Malchev: Dynamic partitions is, I think, what you're thinking of. It's not that. We have User Data Checkpointing, where it's not just about GKI, it's about any critical component of the system.
Time out! I get it now. User Data Checkpointing (UDC) was added in Android 10 with Project Mainline. The docs describe it as a backup system for the /data partition, just like how the A/B partition system is a backup for the /system partition. The A/B partition system allows for a rollback of updates that fail to boot, but if a bad update modifies the data partition before it crashes, the new modified data partition can't be used with the rolled back version of the system image. (Downgrading the OS while keeping user data is a security risk.) Since new /data can't be used with an old system partition, the user just lost all their data and needs to wipe their phone. Google added snapshot support to Android's F2FS file system for easy backups, and now /data can be rolled back along with /system if something bad happens. The docs specifically describe it as a backup for the /data partition, but it sounds like it also works on other partitions.
Malchev: Let's say you push an update, and the device starts booting, it's updating database schemas, it's making changes, but at one point something goes bad. And that starts causing SystemServer crashes or resets. We catch those problems and when you reboot the next time, simply throw away the checkpoint which hasn't been committed because we haven't received a "good" signal. Then, boom, you're back to the last good state as if nothing had happened. This is our safety net to catch the bad regressions. Of course, we don't ever want to get there, and this is just an insurance policy.
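The commit-or-discard behavior Malchev describes can be sketched in a few lines of Python. This is a toy model and not Android's implementation: the `CheckpointedStore` class, its method names, and the dictionary standing in for /data are all assumptions made for illustration.

```python
import copy


class CheckpointedStore:
    """Toy model of a data partition with checkpoint, commit, and rollback."""

    def __init__(self, data: dict):
        self._committed = copy.deepcopy(data)  # last known good state
        self._working = copy.deepcopy(data)    # state the running build writes to
        self._checkpoint_open = False

    def begin_update(self) -> None:
        # Open a checkpoint: later writes stay uncommitted until a "good" signal.
        self._checkpoint_open = True

    def write(self, key: str, value) -> None:
        self._working[key] = value
        if not self._checkpoint_open:
            # No update in flight, so the write is durable immediately.
            self._committed[key] = copy.deepcopy(value)

    def mark_boot_good(self) -> None:
        # The "good" signal: keep everything written since the checkpoint.
        self._committed = copy.deepcopy(self._working)
        self._checkpoint_open = False

    def reboot(self) -> None:
        # An uncommitted checkpoint is thrown away on the next boot.
        if self._checkpoint_open:
            self._working = copy.deepcopy(self._committed)
            self._checkpoint_open = False

    def read(self, key: str):
        return self._working.get(key)


if __name__ == "__main__":
    store = CheckpointedStore({"schema_version": 1})
    store.begin_update()
    store.write("schema_version", 2)  # the new build migrates a database
    store.reboot()                    # crash loop, so no "good" signal arrived
    print(store.read("schema_version"))
```

In this model the schema migration never becomes durable unless `mark_boot_good` runs first, which is the property that lets /data roll back together with the system image.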
Burke: There are a lot of preemptive things that you can do in terms of incremental rollouts and A/B comparisons. Having the telemetry built-out is super important, which is why with Mainline we've started with the less exciting modules at the beginning to prove out all of that infrastructure before you go for the big ones like this.
But you know, if you think back to operating systems like Windows 95 or Windows 3, they came on a diskette and there was no crash reporting back to Microsoft. There were no Service Packs. Then, it started to slowly change. You could send your crash reports back to Microsoft, and they started seeing patterns and then they started delivering Service Packs over the Internet and you could patch the little pieces of it. Now you look towards today at any advanced app on Android or iPhone, and they have a lot of crashalytics and crash reporting. They have like A/B data. They've got telemetry. It's very sophisticated. You might even have a React Native app where it's updating parts of the app dynamically.
I think there's a direction where operating systems are evolving to. Things that were previously super scary, like the idea of changing your kernel at all, I think if we get this right, we can start changing the narrative on this and make it not scary. It's not trivial. It's super-complicated engineering, but I think this is the way you go. You have de-identified, anonymous, aggregated statistics. You have a pulse on the health of the fleet of the system, and then you have the ability to push incremental updates and roll back, etc. That's the sort of direction that I think this is all going in.
It's interesting to hear Burke talk about all Android phones, all around the world, as a "fleet" that Google (anonymously) watches over. Google Play Android devices all come with telemetry data that can go back to Google, which you can turn on or off during the initial setup or after setup at Settings -> Google -> (top-right menu button) -> Usage & Diagnostics. It's on by default, so a good percentage of the two billion Android devices out there are sending back telemetry. In a previous interview, we heard Android engineers talk about how this enables a huge internal dashboard at Google that lets it track and improve things like battery life and run experiments on the Android beta population. It's probably great for crash reports, too.
Malchev: Oh, I think I could make one more point regarding the GKI. Since the entire operating system rests on the kernel, there's a lot more value in having a uniform kernel that the entire OS can rely on in terms of functionality and so on. Being able to rely on this is very very important for all other Mainline modules. Like something as simple as your C library, it talks directly to the kernel, but to which kernel? So when we have a single kernel that your libc library talks to, that's just transformational for the rest of the system. The transitive dependencies get so much simpler when we can reason about having a single kernel.
Ars: Right, and your SoC partners seem to not be very enthusiastic about updating the kernel, so this will help. But are we talking about LTS updates or major new kernel versions?
Malchev: We're talking about LTS train so like the 5.4 kernel keeps getting updated. We really want to get to the point where we can jump across versions as well.
It's not about enthusiasm or lack thereof, there's so much complexity hidden in the pipeline that we're trying to reduce. I'll say that our partners have been very supportive of this effort and they see the benefits more than the risks.
Whatever happened to...
Not every Android feature we get wind of is immediately adopted or even released, so why not check up on some things? The first is the "Slices API," which sounded like a big deal when it was announced for Android 9. One slide from Google I/O pitched the Slices API as modular UI code—developers could use the Slices API to build a notification, and that UI would also show up as a home screen widget, a search result, and an icon long press.
Developers tend to focus on notifications, since those pop-ups drive app engagement, but they ignore lesser-used Android features like home screen widgets. Slices would fix that by turning notification work into a UI that would work everywhere.
Ars: Whatever happened to Slices? There was this one slide at Google I/O that called the Slices API "one reusable API for remote content in Android," which sounded interesting. Did anything happen to that or was it a bad idea?
Burke: I still think it's a great idea, but I don't think we found the fit for it just yet. We actually built it out and right now we're working with the Google Assistant team to see if we can figure out something that makes sense. We don't think we found the right balance between effort for the developer and user features to make it make sense yet. It's something we're still really interested in. I think the core concepts are really good. I think it's just the details of how to apply it and use cases are not quite there yet. So we're gonna keep iterating on it.
Ars: OK. The thing that I liked about Slices—maybe it was a side goal of it—was that developers would "accidentally" make home screen widgets just by making notification UI. You could take a media player notification and pin it to your home screen or something.
Burke: Yeah, exactly. That unification of different places to remote display content was definitely part of that goal. That's also an ingredient to make sense from a developer standpoint. Like, if you can write to this one API and get multiple surfaces, the incentives are going to be good. I do think that's part of the answer but we still got to iterate more to make it fully make sense.
Another missing-in-action feature is the permissions dashboard, which we saw in a leaked build before any official Android 10 developer preview ever came out. The dashboard would give you visualizations showing what apps were using which permissions, a little like the battery dashboard, just for privacy. It never launched, but it did end up in AOSP, where Lineage OS developers picked it up and turned it into a shipping feature. What about being included in official Android, though?
Ars: What is the permissions hub in AOSP? We saw it leak out in early Android Q builds but it never ended up shipping, right?
Burke: Oh yeah. There's not much there. Generally, we're always trying different things. That was an idea that wasn't really complete. Our focus in Android 11 is on giving developers tools to better understand and improve their use of sensitive data. So one of the things we added was the data access audit and we've actually had some really good feedback on that. So this is the ability for you to sort of attribute your usage of sensitive data, like maybe location, and then later have an audit log. It's basically a callback where you learn about what permissions your app is using that it might not need.
You might wonder like, "Why would a developer need that?" But when you have these really complicated apps that have multiple teams across an organization working on them, it's very easy to lose track of all the different data access that's happening. So providing those tools actually was something that we've been asked for. That's what our focus has been on this release. And yeah, we'll try not to leak things out that are half-baked ideas.
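As a rough illustration of the attribute-then-audit idea Burke describes, here is a Python sketch. It is a simulation only and does not use the Android API: `AuditLog`, `note_access`, and the tag names are hypothetical stand-ins for the attribution and callback mechanism.

```python
from collections import Counter
from typing import Callable, List, Tuple

# (attribution tag, kind of data accessed)
AccessEvent = Tuple[str, str]


class AuditLog:
    """Collects data-access events and notifies a registered callback."""

    def __init__(self, callback: Callable[[AccessEvent], None]):
        self._callback = callback
        self.events: List[AccessEvent] = []

    def note_access(self, tag: str, data_kind: str) -> None:
        event = (tag, data_kind)
        self.events.append(event)
        self._callback(event)  # tell the app about the access as it happens

    def summary(self) -> Counter:
        # Count accesses per (tag, data kind) so each team can review its own usage.
        return Counter(self.events)


if __name__ == "__main__":
    log = AuditLog(callback=lambda event: print("accessed:", event))
    log.note_access("maps-team", "location")
    log.note_access("ads-sdk", "location")
    log.note_access("maps-team", "location")
    print(log.summary())
```

Tagging each access with the code path that made it is what lets a large app answer the "who is reading location, and do they need to?" question that Burke raises.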
One of the odder design decisions in Android 11 was to strip the Recent Apps screen (or "Overview" as Google has taken to calling it) of a ton of features. Both Android 10 and 11 have a scrolling list of thumbnails, but Android 10 had a Google search bar, a row of predictive app icons, and, when you swipe up, a full-on app drawer. Android 11 walked that way back, stripping out everything below the thumbnail view and replacing it with two buttons, "Screenshot" and "Select."
The predictive apps and app drawer in Android 10's Recent Apps screen were an odd feature. Despite being in Recent Apps, they were actually part of the home screen launcher code. This really complicates things in Android because you can replace the default home screen but not the Recent Apps screen, so what happens to the Recent Apps code when you replace the home screen? Well, uh, you lose all the app icon features. Predictive apps and the app drawer in Recent Apps only worked if you were using the packed-in launcher, and if you switched to a third-party launcher, everything below the thumbnail view just disappeared. Anyway, why the change?
Ars: The recent app screen used to have an app drawer and suggested apps, and that's been removed in Android 11. Did this have something to do with third-party launcher support or did it just not work out?
Burke: No, that was just more a design goal. Some people liked it in Overview, but we noticed a couple of things. One, we figure that screenshotting and sharing your screen was a much more natural thing to do in Overview, so we wanted to make that functionality much more prominent.
And number two, in terms of predictive apps and content, we felt like it was a much more logical place to be part of your home screen, and so we want to invest more there. So really it's about specializing the spaces and setting a clear vector for both the home screen launcher and Overview. It just gives us more clarity about what we're going to do over the next few years.
Next up: Dynamic System Updates (DSU) was another future-facing Android 10 feature we were hoping to get an update on with the launch of Android 11. DSU is a dual-boot system for Android, letting you load a second, "guest" version of Android on your device without destroying the main OS. A second copy of Android could be downloaded onto a temporary, virtual partition, which you could then boot to. A feature like this would make testing Android betas a lot easier and more accessible. Previously it required wiping out a phone and installing unstable, unfinished software as the main OS, which is a real hassle unless you have a dedicated device for it.
DSU was not super useful on Android 10 since, with Android 10 being the only compatible OS, you could only switch between different versions of Android 10. The release of Android 11 GSIs was the first time you could load a different version through the DSU system, but you would need to hook the phone up to a computer, install some SDK bits, and do some command line work. That all seemed like a lot of work, and Googlers seemed to agree. Android 11 came with a "DSU Loader" section in the developer options, which would automatically download and install a new build, direct from Google. Suspiciously, this is in the most important, top section of developer options, as if it's going to be a really big deal. So, for Android 12, will we finally see easy-to-install beta builds?
Ars: I saw there's a thing in developer options that will pull down images for DSU installs. Are we going to see this offer beta builds in the future?
Malchev: That's the intent. The idea is to be able to download a Generic System Image or really anything that is Treble-compatible to a device and run it without destroying your factory image. If you're a developer and you need a physical device to develop against, we want to make it possible for you to try out new Android images on real hardware but without the destructive aspect of "I nuked my Samsung image" or whatever.
We fetch the images internally because they have to be signed appropriately and we do not want this mechanism to be abused by anyone. DSU support has to be explicitly okayed by the OEM because we don't want to enable some sort of back door approach to this. We're not really pushing DSUs too hard because this is a complex mechanism. Like with GKI, it's a complex mechanism, so we wanted to make sure that all the various wrinkles are smoothed out before we say it's ready. So it's supported on Pixel devices right now.
Ars: OK, and there's no way to go backwards to older operating systems for testing, right? Is that a security problem?
Malchev: You can, but you probably shouldn't. The reason I'm saying that is, when you download a dynamic system update, you are not rolling back things like the keystore master and keystore house because we don't want to touch the vendor substrate. That would be truly dangerous. So what we're downloading is just the top half of the OS: the system image. And if the system image is older, it might expect an older version of the HAL to be provided and that may not be there.
What's the deal with these developer options?
A good spot to check in any new build of Android is the developer options, a hidden Android menu that is just a huge list of checkboxes for various features. Some of these are for app developers and are clearly explained and documented, but others are for Google's Android engineers and are in-development Android features. A lot of times, the just-for-Google settings are labeled as cryptically as possible and don't have any public documentation at all. A prime example from Android 11 is this "Enable Gabeldorsche" checkbox, which has something to do with a new Bluetooth stack? Huh?
Ars: The most esoteric question I have on my list is, "What is Gabeldorsche?"
Burke: That's another example of something in development that's being tested out, but this one's more thought through. This is basically a future direction for our Bluetooth stack and really it's an initiative to re-write the Bluetooth stack piece-by-piece. The goal is the usual good stuff like security and reliability and interoperability, but the most interesting thing to me is the ability to have much better automated end-to-end testing.
You know, the Bluetooth specification lends itself to a gazillion device quirks. Every device behaves slightly differently and it's very difficult to have a very reliable Bluetooth stack. You've got to accommodate all sorts of strange interpretations of the spec and then your interoperable testing burden and cost is really high as a result. Of course, if we could all go back in time, we, the industry, would have written a much tighter specification with less complexity, but without that, what we're trying to do is improve automated testing.
It was a developer option just so we could turn it on during the beta and developer previews and test it out, but it's not ready for Android R, so that developer option will disappear for R. Likely in S in Android 12, you'll see parts of Gabeldorsche start appearing in the stack that hopefully will be completely transparent to users. Bluetooth will just get better.
Ars: Great, OK, one more mysterious checkbox! There is an "enhanced connectivity" thing in developer options.
Burke: Yeah, that's another thing folks are working on and not quite finished or fully baked. This will actually probably launch in a future update like a maintenance release or Pixel feature drop. This is basically two things: the classic "parking lot" problem where you're using your phone on Wi-Fi in the parking lot and you go out of range. You switch to cellular, and we're hoping to make that more seamless and vice-versa. And then also, we're figuring out when to use 5G in a battery-efficient way. So we've been working on a bunch of ideas using machine learning, and they actually look really promising. It wasn't quite ready for Android 11, and so we hope to bring that out in an update sooner rather than later, so that developer option will disappear as well on the final version.
That's all my questions! Thanks to Dave and Iliyan for taking the time to talk. Hopefully next year it's in person again!
"Android" - Google News
September 20, 2020 at 07:30PM
https://ift.tt/2FJ5ZYQ
The Android 11 interview: Googlers answer our burning questions - Ars Technica