Discoverable Partitions Specification: Provide per-filesystem-type GUIDs #132

Open
DemiMarie opened this issue Dec 7, 2024 · 19 comments

@DemiMarie

The Discoverable Partitions Specification provides specific GUIDs for each role a filesystem plays and for the architecture of the filesystem. However, it does not specify the filesystem type. This means that a tool must probe the partition contents to determine which filesystem is present, instead of being able to just call the appropriate mount tool directly. Not only does this make it harder to write tools that dissect disk images, it also leads to potential ambiguity. Different filesystems have superblocks at different locations, so the same image might be valid for more than one filesystem. This is extremely unlikely to happen by accident, but it is not impossible, and it is conceivable that an attacker could (through ordinary filesystem operations and unprivileged ioctls) cause a filesystem image of one type to become at least probe-able as another.

@bluca
Member

bluca commented Dec 7, 2024

This information is already in the super block, there's no need to duplicate it

@DemiMarie
Author

The problem is that different filesystems have their superblocks at different places. What prevents the same image from having a BTRFS superblock, an XFS superblock, and an ext4 superblock?
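To make the concern concrete, here is a minimal sketch in Python that checks only the primary superblock magic of three filesystems. The offsets and magic values are the commonly documented ones (XFS `XFSB` at byte 0, the ext2/3/4 magic `0xEF53` at byte 0x438, btrfs `_BHRfS_M` at byte 0x10040). The name `matching_magics` is hypothetical, and real probers such as libblkid validate much more than the magic (checksums, sizes, feature flags), so this only illustrates that the locations do not overlap.

```python
# Magic-number locations, as byte offsets from the start of the partition.
# Assumption: only the primary superblock is examined, and only its magic.
MAGICS = {
    "xfs": (0x0, b"XFSB"),
    # Superblock at 0x400, s_magic (0xEF53, little-endian) at +0x38.
    # ext2 and ext3 share this magic; "ext4" is used here for brevity.
    "ext4": (0x438, b"\x53\xef"),
    # Superblock at 0x10000, magic at +0x40.
    "btrfs": (0x10040, b"_BHRfS_M"),
}


def matching_magics(image: bytes) -> list[str]:
    """Return every filesystem whose magic appears at its expected offset."""
    return sorted(
        name
        for name, (offset, magic) in MAGICS.items()
        if image[offset:offset + len(magic)] == magic
    )


# The three magics live at disjoint offsets, so nothing in the formats
# themselves stops one buffer from carrying all of them at once.
image = bytearray(0x10048)
for offset, magic in MAGICS.values():
    image[offset:offset + len(magic)] = magic
print(matching_magics(bytes(image)))
```

A buffer like this is not a mountable filesystem of any type; it only shows why a magic-based prober can see more than one candidate.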

@DemiMarie
Author

If a filesystem never writes to data before its first superblock, a rule like “first superblock wins” would be an option.

@bluca
Member

bluca commented Dec 7, 2024

That's not a valid image, so why would it matter? Just don't build broken images, such errors have nothing to do with the specification

@DemiMarie
Author

What about mutable partitions like /home and /var? IIUC ext4 doesn’t guarantee that it will never write a valid BTRFS superblock to offset 0x10000, and if one creates a huge file full of ext4 superblocks then there is even a nonzero chance of this happening unless something (that I am unaware of) guarantees otherwise.

@DemiMarie
Author

@tytso am I correct about ext4?

@bluca
Member

bluca commented Dec 7, 2024

Again, all of that is out of scope. The purpose of this spec is to establish common ways to identify the purpose of a GPT partition, as in, its mountpoint, essentially. What constitutes a broken filesystem, how to detect that, etc etc, is all out of scope, implementation-specific details and belongs somewhere else.

@poettering
Collaborator

poettering commented Dec 9, 2024

The gpt-auto logic already has a policy concept (see systemd.image-policy(7) man page). It won't allow you to restrict the set of file systems we'll accept, but it does allow you to restrict the encryption/authentication requirements. And that's what you have to do anyway for a properly secure system: because kernel file systems are not really validated against rogue fs images, you have to have some form of block-level authentication logic in place, and if you have that, the ambiguity issue goes away.

Also note that the gpt-auto logic is very restrictive: it has a short allowlist of file systems it looks for (ext4, btrfs, xfs, erofs, f2fs, squashfs, vfat), all of which should be reasonably well maintained. It won't remove the ambiguity, but it does lock down the attack surface.
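As a rough illustration of what an allowlist buys, here is a small Python sketch. It is not systemd's code: `ALLOWED_FILESYSTEMS` just repeats the names listed above, and `filter_candidates` plus the input strings (including `zfs_member`) are hypothetical.

```python
# Filesystem names taken from the list above. Illustration only; this is
# not how gpt-auto is implemented.
ALLOWED_FILESYSTEMS = frozenset(
    {"ext4", "btrfs", "xfs", "erofs", "f2fs", "squashfs", "vfat"}
)


def filter_candidates(detected: list[str]) -> list[str]:
    """Drop every detected signature that is not on the allowlist."""
    return [fs for fs in detected if fs in ALLOWED_FILESYSTEMS]


# A stale signature of a non-allowlisted type no longer competes...
print(filter_candidates(["zfs_member", "ext4"]))
# ...but two allowlisted signatures are still both candidates.
print(filter_candidates(["ext4", "btrfs"]))
```

The second call shows the limit: the allowlist shrinks the set of drivers that can be reached, but a collision between two allowlisted types remains.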

There's a TODO list item somewhere to extend the policy language to make further restrictions on fs type when probing. happy to take a patch for that.

@cgwalters

> Again, all of that is out of scope.

One perspective on this is to contrast with traditional fstab, which while it does support auto for probing the fstype I think widespread usage is that installer and management tools specify the filesystem type. My impression of usage of .mount units is similar. So the possible ambiguity here is "introduced" by the DPS.

> Just don't build broken images, such errors have nothing to do with the specification

It's not just about pre-generated images but whatever we happen to find on disk for mutable data partitions, right?

I guess the question though really is: Is ambiguity possible today accidentally? It doesn't seem impossible, but before we become concerned about it some investigation is probably needed.

@DemiMarie
Author

> > Again, all of that is out of scope.
>
> One perspective on this is to contrast with traditional fstab, which while it does support auto for probing the fstype I think widespread usage is that installer and management tools specify the filesystem type. My impression of usage of .mount units is similar. So the possible ambiguity here is "introduced" by the DPS.

Exactly!

> > Just don't build broken images, such errors have nothing to do with the specification
>
> It's not just about pre-generated images but whatever we happen to find on disk for mutable data partitions, right?

Indeed so!

> I guess the question though really is: Is ambiguity possible today accidentally? It doesn't seem impossible, but before we become concerned about it some investigation is probably needed.

I’m concerned about both accidental ambiguity (util-linux/util-linux#1305, caused by a previous ZFS filesystem that hadn’t been overwritten) and intentional ambiguity introduced by a malicious, unprivileged user who has full read-write access to some directory on the filesystem. An unprivileged user can’t directly access the disk, but they can exert substantial influence over its contents, especially if the filesystem doesn’t do encryption. For instance, if I write a 900GiB file to a 1TiB ext4 filesystem that repeats the same sector over and over, most sectors on the disk will have that value. I expect that a more clever attacker can use knowledge of the allocation algorithms to exert even more control. Attackers have a lot of experience doing exactly this with memory allocators.

My understanding is that libblkid will refuse to pick a filesystem if there is ambiguity. In this case, a fairly nasty persistent DoS could result, which might not be recoverable without data loss.
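The failure mode I have in mind looks roughly like the following Python sketch. It models only the policy described above (refuse when more than one signature is found), based on my understanding rather than on libblkid's actual API; `AmbiguousFilesystem` and `pick_filesystem` are hypothetical names.

```python
from typing import Optional


class AmbiguousFilesystem(Exception):
    """Raised when more than one filesystem signature is detected."""


def pick_filesystem(detected: list[str]) -> Optional[str]:
    """Return the single detected type, None if nothing was found,
    and refuse to choose when several types are present."""
    unique = sorted(set(detected))
    if len(unique) > 1:
        raise AmbiguousFilesystem(", ".join(unique))
    return unique[0] if unique else None


print(pick_filesystem(["ext4"]))
try:
    # A second signature appearing on a mutable partition is enough
    # to make every later probe fail.
    pick_filesystem(["ext4", "btrfs"])
except AmbiguousFilesystem as exc:
    print(f"refusing to mount: {exc}")
```

Refusing is the safe choice for the prober, but it means the partition stays unmountable until someone removes the stray signature by hand.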

There are a few countermeasures I can think of:

  1. Create new GUIDs, one for each filesystem type. This preserves attribute bits, but creates a lot of GUIDs.
  2. Reserve a few of the bits reserved for Partition Type GUID owners to indicate the filesystem type.
  3. Require mkfs tools, filesystem implementations, and libblkid to follow a defined protocol, such as the following, that forbids ambiguity:
    1. All data before the superblock must be zeroed by mkfs, and kept as zero by the filesystem driver.
    2. When probing a disk image to detect a filesystem, libblkid must use the first valid superblock that it finds, skipping the others.

Personally, I think that this specification should implement either option 1 or option 2, and that other tools should also implement option 3.
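A minimal Python sketch of the probing half of option 3 ("first valid superblock wins") might look like this. It assumes the commonly documented primary superblock offsets and magics for XFS, ext2/3/4 and btrfs, treats a matching magic as "valid" for simplicity, and uses the hypothetical name `first_superblock_wins`.

```python
from typing import Optional

# name -> (superblock offset, magic offset inside the superblock, magic)
SUPERBLOCKS = {
    "xfs": (0x0, 0x0, b"XFSB"),
    "ext4": (0x400, 0x38, b"\x53\xef"),    # 0xEF53, little-endian
    "btrfs": (0x10000, 0x40, b"_BHRfS_M"),
}


def first_superblock_wins(image: bytes) -> Optional[str]:
    """Pick the candidate whose superblock starts at the lowest offset."""
    hits = []
    for name, (sb_offset, magic_offset, magic) in SUPERBLOCKS.items():
        start = sb_offset + magic_offset
        if image[start:start + len(magic)] == magic:
            hits.append((sb_offset, name))
    return min(hits)[1] if hits else None


# An image carrying both an ext4 and a btrfs magic resolves to ext4,
# because 0x400 comes before 0x10000.
image = bytearray(0x10048)
image[0x438:0x43A] = b"\x53\xef"
image[0x10040:0x10048] = b"_BHRfS_M"
print(first_superblock_wins(bytes(image)))
```

The result is only trustworthy together with rule 3.1: if the winning filesystem could ever be made to write into the bytes before its own superblock, an earlier signature could still be forged.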

@poettering
Collaborator

option 3 is not so easy. modern file systems maintain additional copies of the superblock at various offsets, and the first one is not necessarily on sector 0. btrfs does that for example, it puts it a few MiB inside the disk, and then adds copies in logarithmically growing distances. if you'd declare that these file systems should never consider the other superblocks then they kinda lose the reason they exist in the first place...
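For illustration, the copy placement can be sketched in Python as below. This assumes the layout constants as I recall them from the btrfs on-disk format (primary copy at 64 KiB, which matches the 0x10000 offset mentioned earlier in the thread, mirror n at 16 KiB shifted left by 12·n bits, at most three copies, 4 KiB per superblock); the function names are hypothetical and the exact fit check is a simplification.

```python
KIB = 1024
SUPERBLOCK_SIZE = 4 * KIB  # assumed size of one btrfs superblock copy


def btrfs_superblock_offset(mirror: int) -> int:
    """Byte offset of superblock copy `mirror` (0 is the primary)."""
    if mirror == 0:
        return 64 * KIB
    return (16 * KIB) << (12 * mirror)


def superblock_offsets(device_size: int, copies: int = 3) -> list[int]:
    """Offsets of the copies that fit entirely on a device of this size."""
    return [
        offset
        for offset in map(btrfs_superblock_offset, range(copies))
        if offset + SUPERBLOCK_SIZE <= device_size
    ]


# A 1 GiB device has room for the primary copy and the first mirror only.
print(superblock_offsets(1024 ** 3))
```

Under these assumptions the copies sit at 64 KiB, 64 MiB and 256 GiB, so small devices carry fewer of them.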

i am not convinced that placing any info in the gpt partition metadata would be wise, because that's not protected cryptographically. you'd make things worse by trying to make them better.

dm-verity/dm-crypt/dm-integrity is the root of trust on disk for us, and that only covers partition contents, not partition metadata.

I guess what you could do is define a "secure envelope" partition type or something like that, that would take the first and last sector of a dm-verity/dm-crypt/dm-integrity partition (i.e. inside of it, covered by the protections) and would carry the info you are looking for. but that would probably be a hard sell, since you'd then have to use dm-linear or so on top of the dm-verity/dm-crypt/dm-integrity to chop off the beginning and the end sector again before you can mount things.

I wonder if this is actually really a problem. i.e. can you actually create an fs image that both qualifies as valid ext4 and valid btrfs or so? And moreover, could an unpriv user actually create that just by writing files to the fs?

@DemiMarie
Author

> i am not convinced that placing any info in the gpt partition metadata would be wise, because that's not protected cryptographically. you'd make things worse by trying to make them better.

For encrypted mutable partitions, I think it would be best to store the filesystem type in the LUKS metadata. This is about unencrypted partitions only, which are common in e.g. virtualized workloads where encryption is done by the host.

@DemiMarie
Author

> I wonder if this is actually really a problem. i.e. can you actually create an fs image that both qualifies as valid ext4 and valid btrfs or so? And moreover, could an unpriv user actually create that just by writing files to the fs?

One can at least create an image that is sufficiently close to this to fool blkid.

@bluca
Member

bluca commented Dec 10, 2024

Perhaps, but that's blkid's problem to deal with, and the kernel's to refuse such images. If one wants to create malicious images, they can create malicious GPT GUIDs too. This is wildly out of scope, and there's no real use case.

@DemiMarie
Author

> Perhaps, but that's blkid's problem to deal with, and the kernel's to refuse such images. If one wants to create malicious images, they can create malicious GPT GUIDs too. This is wildly out of scope, and there's no real use case.

To the best of my knowledge, ext4 and XFS have no idea where the other puts their superblocks, much less where e.g. ZFS puts its superblocks. If I write something that looks like a ZFS superblock to a file on my ext4 filesystem, the only thing I know of preventing it from being placed where it looks like an actual ZFS superblock is chance.

Ideally, all filesystems would use the same locations for their superblocks to prevent collisions like this, but they do not.

@DemiMarie
Author

> option 3 is not so easy. modern file systems maintain additional copies of the superblock at various offsets, and the first one is not necessarily on sector 0. btrfs does that for example, it puts it a few MiB inside the disk, and then adds copies in logarithmically growing distances. if you'd declare that these file systems should never consider the other superblocks then they kinda lose the reason they exist in the first place...

Any corruption that could clobber the first superblock could just as easily clobber the partition tables, other filesystem metadata, or both. The other superblocks are indeed useful for data recovery scenarios, but for filesystem probing I don’t think it is safe or necessary to use them. If additional fault tolerance is needed, storing the data in the partition table (which has a backup copy) or the LUKS header (without which the device is undecryptable and unrecoverable) are other options.

@bluca
Member

bluca commented Dec 10, 2024

Sure, but again: completely irrelevant for anything happening here. The kernel has a filesystem development mailing list, I'd suggest to raise these issues there, and filesystem developers should be able to help clarify those concerns: [email protected]

@poettering
Collaborator

> For encrypted mutable partitions, I think it would be best to store the filesystem type in the LUKS metadata. This is about unencrypted partitions only, which are common in e.g. virtualized workloads where encryption is done by the host.

last time i looked luks metadata is not integrity protected in any way. hence about as trustable as the gpt partition table.

tbh, I find the problem not particularly interesting if the system doesn't do encryption or integrity protection. If you don't do that then most security guarantees are gone anyway, at least in my view of the world.

@DemiMarie
Author

> > For encrypted mutable partitions, I think it would be best to store the filesystem type in the LUKS metadata. This is about unencrypted partitions only, which are common in e.g. virtualized workloads where encryption is done by the host.
>
> last time i looked luks metadata is not integrity protected in any way. hence about as trustable as the gpt partition table.
>
> tbh, I find the problem not particularly interesting if the system doesn't do encryption or integrity protection. If you don't do that then most security guarantees are gone anyway, at least in my view of the world.

What about the virtualized case?
