dps: Add concept of "partition ownership" #117

AdrianVovk · 2024-09-06T00:47:07Z

What and Why

Basically: the DPS had no way to support multi-boot without relying on /etc/fstab. But there's a whole class of distros now (the "Particle OSs") that do only OEM installations, create their rootfs on first boot, and thus don't even have a rootfs to put an /etc/fstab onto. Ultimately these distributions are completely hermetic and all state can be wiped out by the user at any point in time, but in that case /etc/fstab wouldn't persist and all information about the distribution's own partition layout would be lost. So the old approach was unusable for them.

The DPS is about making disk images self-descriptive, and multi-booting disk images should be able to describe themselves as such too.

Backwards Compat

Technically this isn't fully backwards compatible. In practice, however, I suspect that this change won't actually affect anyone. Here's my reasoning:

- If a system is being single-booted, the backwards compatibility I built into this proposal will ensure that this keeps working.
- If a system is being multi-booted, it wasn't relying completely on the DPS anyway: /etc/fstab and root= must already be involved, and these override the DPS.

The situation where this causes issues would have to be quite strange. There would need to be a distro in a multi-boot scenario, that was installed first and chose to not use /etc/fstab, while a second distro was installed and happened to name one of its partitions with the right prefix to trick the first distro.

Limitations & Alternative approaches

The big obvious one is that you can't dual-boot two copies of the same OS using this - you'd have to rely on /etc/fstab in that case. That's not a situation other automatically-partitioning OSs (e.g. Windows) seem to care about, so frankly I'm not all that interested in supporting that case.

I considered using the partition UUID instead of the label (replicating the HMAC mechanism of the /var/ partition, but using the distro's name instead of machine-id) but ran into two problems. First, that would conflict with the whole magic mechanism where the dm-verity roothash gets split across the UUIDs of two partitions. Second, it wouldn't work anyway since the UUIDs must be generic and it's completely valid to have multiple /usr partitions per OS.
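For readers unfamiliar with the /var/ mechanism mentioned above, here is a minimal sketch. As I understand the DPS, the /var/ partition UUID is compared against the first 128 bits of HMAC-SHA256 of the partition type UUID, keyed by the machine ID. The function name `derive_partition_uuid`, the sample machine ID, and the distro name `fooos` are made up for illustration; real implementations may also normalize UUID version/variant bits, which this sketch does not do.

```python
import hashlib
import hmac
import uuid

# Partition type UUID of the /var/ partition as listed in the DPS.
VAR_TYPE_UUID = uuid.UUID("4d21b016-b534-45c2-a9fb-5c16e091fd2d")


def derive_partition_uuid(key: bytes, type_uuid: uuid.UUID = VAR_TYPE_UUID) -> uuid.UUID:
    """Return the first 128 bits of HMAC-SHA256(key, type_uuid) as a UUID.

    Hypothetical helper: it skips any UUID version/variant normalization.
    """
    digest = hmac.new(key, type_uuid.bytes, hashlib.sha256).digest()
    return uuid.UUID(bytes=digest[:16])


# Existing scheme: the key is the 16-byte machine ID (made-up value here).
machine_id = bytes.fromhex("0123456789abcdef0123456789abcdef")
bound_to_machine = derive_partition_uuid(machine_id)

# Rejected alternative: key the HMAC with the distro name instead.
bound_to_distro = derive_partition_uuid(b"fooos")  # hypothetical distro ID
```

Because the derivation is deterministic, every partition of the same type owned by the same distro would end up with the same UUID, which seems to be the core of the second problem when an OS has several /usr partitions.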

It's possible to use some different separator character to make this a bit more backwards compatible. Right now, it's quite easily possible to run into a partition named distro_whatever, which might end up getting caught by this mechanism. I decided that this is actually intended behavior in spirit of the DPS. Why not - it's already extremely likely to be mounting that partition anyways (as described above), but now it'll keep doing that even if /etc/fstab is wiped out somehow.
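A rough sketch of the label-based matching, to make the discussion concrete. This is not the spec text: the `_` separator is taken from the `distro_whatever` example, while the function names and the "fall back to the first partition" rule for the single-boot case are assumptions on my part.

```python
from typing import Optional

SEPARATOR = "_"  # assumption, based on the "distro_whatever" example


def label_owner(label: str) -> Optional[str]:
    """Return the prefix before the first separator, or None if there is none."""
    owner, sep, rest = label.partition(SEPARATOR)
    # Require a non-empty owner and a non-empty remainder.
    return owner if sep and owner and rest else None


def pick_partition(labels: list[str], distro_id: str) -> Optional[str]:
    """Prefer a partition owned by distro_id; otherwise fall back to the first one."""
    for label in labels:
        if label_owner(label) == distro_id:
            return label
    # Assumed single-boot fallback: keep the old "first partition found" behavior.
    return labels[0] if labels else None
```

With these assumptions, `pick_partition(["baros_root", "fooos_root"], "fooos")` selects `fooos_root`, while a disk with only an unprefixed `root` label still resolves to `root`.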

Something else I considered is having a machine-local "installation ID". Basically we'd set aside 4 bits of GPT partition flags, which would give 16 installation IDs to use, which means you could multi-boot up to 16 different OSs. In practice, however, it's unclear to me where these installation IDs would have to be stored and sourced. Unlike a distro's ID=, or just a random UUID, these could not be derived from anything and would instead have to be sequential per-installation. If I boot OS number 2, how would it know that it's OS number 2; how and where would that be stored? Additionally, how would a fooOS installer know if OS number 2 is an existing copy of fooOS to reinstall/repair, or if it's actually a copy of barOS to avoid touching? Ultimately I decided against this approach.
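To illustrate the arithmetic of the rejected "installation ID" idea: 4 bits give 2^4 = 16 distinct values. The sketch below packs such an ID into the 64-bit GPT attribute field. The bit position (48 to 51) is purely hypothetical; nothing in this proposal assigns those bits.

```python
INSTALL_ID_SHIFT = 48  # hypothetical position inside the type-specific bits 48..63
INSTALL_ID_BITS = 4    # 4 bits -> 16 possible installation IDs
INSTALL_ID_MASK = ((1 << INSTALL_ID_BITS) - 1) << INSTALL_ID_SHIFT


def set_installation_id(flags: int, install_id: int) -> int:
    """Return flags with the installation ID stored, leaving other bits untouched."""
    if not 0 <= install_id < (1 << INSTALL_ID_BITS):
        raise ValueError("installation ID must fit in 4 bits (0..15)")
    return (flags & ~INSTALL_ID_MASK) | (install_id << INSTALL_ID_SHIFT)


def get_installation_id(flags: int) -> int:
    """Extract the installation ID from the GPT attribute flags."""
    return (flags & INSTALL_ID_MASK) >> INSTALL_ID_SHIFT
```

Encoding and decoding is the easy part. The sketch does nothing to answer the questions raised above about who hands out the numbers and how a booted OS learns its own ID.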

Allows distributions to claim ownership over partitions. This allows for multi-boot between distributions that cannot use /etc/fstab and rely only on the DPS (e.g. GNOME OS, mkosi-produced images, etc). Previously, the DPS had no way to support multi-boot without relying on `/etc/fstab`. It's the spirit of the DPS to make disk images self-descriptive about what they contain. Being self-descriptive about multi-boot is a natural extension of that.

AdrianVovk · 2024-09-06T00:47:28Z

Things left to do: an actual implementation of this! PRs will be incoming

gpt-auto-generator, implementing the new ownership scheme when picking what to mount
repart, letting distros opt into taking ownership of the partitions they create (which would also change repart's matching algorithm to consider ownership too)
systemd-dissect, extracting ownership info to list the OS a given partition belongs to
sysupdate: we can't rely on the name _empty anymore, because we want to preserve any ownership over the partition.
- No reason not to accept a partition labelled with <distro>_empty instead
- Maybe add a setting to auto-generate the partition label, so ownership can be established
- Warn about giving up partition ownership
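The sysupdate point could look roughly like this: keep recognizing the plain `_empty` label, and additionally accept `<distro>_empty` so that an empty slot stays owned. The function name and the optional `distro_id` parameter are hypothetical, not existing sysupdate API.

```python
from typing import Optional


def is_empty_slot(label: str, distro_id: Optional[str] = None) -> bool:
    """True for the legacy `_empty` label or, if distro_id is given, `<distro_id>_empty`."""
    if label == "_empty":
        return True
    return distro_id is not None and label == f"{distro_id}_empty"
```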

Are there other places I'm missing? Please let me know so I can add them to the list

septatrix · 2024-09-24T16:53:14Z

Definitely would like some way to use a /usr-only + /var partition setup which is currently pretty much impossible due to the requirement that the machine-id would need to be persisted somewhere. As I do not use multi-boot I would also be fine with simply dropping the machine-id matching but this scheme also seems reasonable and should work for me.

However, I do have an idea which could improve this, make this more powerful and at the same time allow better backwards compatibility: Add two new GPT flags, one signaling that the machine-id match logic must be applied and the other one to signal that the label logic must be applied. These could be combined (or both disabled). What do you think?

Another simpler solution to the slight backwards incompatibility problem of this proposal would be to introduce new partition types with new GUIDs which implement the new logic and have the old partitions keep the old logic. Adding this for /var would be simple but adding this also for root and /usr might be a bit too much.

septatrix · 2024-10-06T14:17:03Z

Another option would be to create an fstab.extra credential with the corresponding fstab and put it inside the EFI partition. The disadvantage of that is that it must be updated in lockstep with the kernel binary: ESP/loader/credentials/*.cred cannot be used as that would apply to all OSes and ESP/.../foo.efi.extra.d/*.cred must be used - the "foo" in that, however, is the full name of the kernel binary excluding the boot counter, so the version is included in that...

NekkoDroid · 2024-10-06T21:30:16Z

So... I've been working on my own /usr/ only image and I have also been thinking about how to support multiple images simultaneously. Two different options came to mind:

1. The root partition uses some filesystem that can mount named/identified subvolumes (e.g. by IMAGE_ID or so) as root of the system. This has the benefit/drawback (depends how you wanna see it) that all the files of the root partition would be separated between different IMAGE_IDs. Or
2. Have a /etc/machine-id.<IMAGE_ID> for each corresponding /usr/ partition image. These then get bind-mounted over /etc/machine-id depending on which image was booted. Contrary to option 1, this shares all files between different IMAGE_IDs.

Both of these would mostly still keep the same discovery logic used currently for the DPS, just an additional step is added for when the root partition is added. Option 1 could also be expanded more generically to all partitions that have such a filesystem.

These wouldn't cover the case of the _empty partitions, but there I wouldn't see why your suggested <IMAGE_ID>_empty shouldn't work.
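A small sketch of the second option: work out which per-image file would be bind-mounted over /etc/machine-id, based on the `IMAGE_ID=` field of os-release. The helper names are hypothetical, the os-release parsing is deliberately simplistic, and the bind mount itself is left out.

```python
from pathlib import PurePosixPath
from typing import Optional


def parse_image_id(os_release_text: str) -> Optional[str]:
    """Return the IMAGE_ID value from os-release content, or None if unset."""
    for line in os_release_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "IMAGE_ID":
            # Simplistic unquoting; real os-release parsing handles escapes too.
            return value.strip("\"'") or None
    return None


def machine_id_source(os_release_text: str) -> Optional[PurePosixPath]:
    """Path that would be bind-mounted over /etc/machine-id for the booted image."""
    image_id = parse_image_id(os_release_text)
    if image_id is None:
        return None
    return PurePosixPath("/etc") / f"machine-id.{image_id}"
```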

AdrianVovk · 2024-10-07T19:54:35Z

> fstab.extra

This doesn't work with factory reset either: what happens if partitions are destroyed/recreated with new UUIDs and all that? Even with fstab, I still can't set root=UUID=<...> on the kernel command-line.

> Add two new GPT flags

Why two? What should the behavior be if neither flag is set? Mount the first found partition I guess?

We could use up a GPT flag for this, though I wonder if that's actually necessary. If anything, maybe we could just have a kernel command-line switch to opt into this behavior 🤷. Or only opt in if IMAGE_ID is set (which isn't set by anyone except for the image-based distros that would actually use this).

AdrianVovk · 2024-10-07T20:00:12Z

> The root partition uses some filesystem that can mount named/identified subvolumes (e.g. by IMAGE_ID or so) as root of the system

So then root= is shared between all the installations?

I suppose that could be a use case for sticking to the current discovery logic. You can already kinda implement this via rootflags= on the kernel command line.

Still, in theory there should be one root partition per installation of the OS (Ditto with /var, unless /var is part of root). Solving it at the filesystem level could be a solution, but I shudder at the idea of multiple installations of different distros sharing one filesystem. Could work in theory, but AFAIK has yet to work in practice

septatrix · 2024-10-07T20:21:05Z

> Why two? What should the behavior be if neither flag is set? Mount the first found partition I guess?

Yeah exactly. This is mostly due to my personal aversion to ugly legacy behaviour kept for backwards compatibility. In this case the only scenario where it matters is if people have two /var partitions. This means that they must have two root partitions with different machine-ids. As there is currently no way for a distro to know which of them to use unless they are immutable, they must have used two immutable images with a machine-id embedded inside them. That is likely very, very rare, so I think it would be fair to just drop the machine-id matching for /var altogether (or make it an optional/deprecated part of the spec, and drop support for it in gpt-auto-gen in a future systemd version).

AdrianVovk · 2024-10-07T20:26:25Z

You're missing the context that the DPS came before UKIs and all of the modern immutability stuff. /var matches on machine-id because it's supposed to be derived from the rootfs, and the rootfs was supposed to be specified on the kernel commandline. The DPS even spells this out: installers should still specify root=UUID=<uuid> on the kernel command line. Installers are required to do this if their rootfs isn't the first one on disk.

UKIs, secure boot, vendor-provided initrds, and immutable kernel command lines are what broke this scheme once they showed up later (and in many ways are still showing up now).

septatrix · 2024-10-07T20:42:32Z

> You're missing the context that the DPS came before UKIs and all of the modern immutability stuff.

Yup that's it. I thought it came exactly because of them.

cgwalters · 2024-10-07T20:51:51Z

> Something else I considered is having a machine-local "installation ID". Basically we'd set aside 4 bits of GPT partition flags, which would give 16 installation IDs to use, which means you could multi-boot up to 16 different OSs. In practice, however, it's unclear to me where these installation IDs would have to be stored and sourced

In ostree, because it's just a way to store roots inside a given filesystem, it was designed from the start with this use case in mind - it's what the "stateroot" concept is about; each stateroot has its own sub-root with its own /etc and /var.

I think this "installation ID" is pretty similar; the ostree stateroot concept has similar questions around "how do you pick them". As of recently we started encouraging using default - but it can be any arbitrary string.

I'd be interested in generalizing this and cleaning it up...but it does seem like a whole other level from the partition-based world. Perhaps as we slowly replace the guts of ostree with composefs bits and perhaps try to have some higher level logic in composefs around booting, it could make sense to try to standardize something

> but I shudder at the idea of multiple installations of different distros sharing one filesystem. Could work in theory, but AFAIK has yet to work in practice

Agreed.

One big detail related to the /usr merge is handling compat symlinks like /lib that I think in the general case are different per OS/distro.

septatrix · 2024-10-07T21:53:53Z

specs/discoverable_partitions_specification.md

+If two Linux-based operating systems are installed on the same disk, the scheme
+above suggests that they may share the swap, `/home/`, `/srv/`, `/var/tmp/`,
+ESP, XBOOTLDR. However, they should each have their own root, `/usr/` and
+`/var/` partition.


This is phrased unclearly: it doesn't say whether swap/home/srv/var-tmp are actually allowed to be shared, nor whether partition ownership applies to them.

This was referenced Oct 6, 2024

- Allow relaxing machine-id matching for /var partitions #121 (Open)
- mkosi: machine ID in initrd and host are different?? systemd/systemd#32908 (Closed)
- gpt-auto-generator: add kernel command line option to relax /var partition UUID check systemd/systemd#25156 (Open)
