Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove mask on /sys/dev/block #2277

Open
grooverdan opened this issue Jan 5, 2022 · 26 comments · May be fixed by containers/podman#13708 or #2278
Open

remove mask on /sys/dev/block #2277

grooverdan opened this issue Jan 5, 2022 · 26 comments · May be fixed by containers/podman#13708 or #2278

Comments

@grooverdan
Copy link

The masking of /sys/dev/block was added in 0334b6195820f7261f87a4f4e5d739a6d560f4b2 which constrained the previous /sys/dev masking.

The contents of this filesystem (at least on 5.15.7-200.fc35), is as follows, a bunch of symlinks.

$ ls -la /sys/dev/block
total 0
drwxr-xr-x. 2 root root 0 Dec 26 17:36 .
drwxr-xr-x. 4 root root 0 Dec 26 17:36 ..
lrwxrwxrwx. 1 root root 0 Dec 26 17:37 252:0 -> ../../devices/virtual/block/zram0
lrwxrwxrwx. 1 root root 0 Dec 29 19:03 253:0 -> ../../devices/virtual/block/dm-0
lrwxrwxrwx. 1 root root 0 Dec 29 19:03 253:1 -> ../../devices/virtual/block/dm-1
lrwxrwxrwx. 1 root root 0 Dec 29 19:03 253:2 -> ../../devices/virtual/block/dm-2
lrwxrwxrwx. 1 root root 0 Dec 29 19:03 253:3 -> ../../devices/virtual/block/dm-3
lrwxrwxrwx. 1 root root 0 Dec 29 19:03 259:0 -> ../../devices/pci0000:00/0000:00:1d.4/0000:07:00.0/nvme/nvme0/nvme0n1
lrwxrwxrwx. 1 root root 0 Dec 29 19:03 259:1 -> ../../devices/pci0000:00/0000:00:1d.4/0000:07:00.0/nvme/nvme0/nvme0n1/nvme0n1p1
lrwxrwxrwx. 1 root root 0 Dec 29 19:03 259:2 -> ../../devices/pci0000:00/0000:00:1d.4/0000:07:00.0/nvme/nvme0/nvme0n1/nvme0n1p2
lrwxrwxrwx. 1 root root 0 Dec 29 19:03 259:3 -> ../../devices/pci0000:00/0000:00:1d.4/0000:07:00.0/nvme/nvme0/nvme0n1/nvme0n1p3
lrwxrwxrwx. 1 root root 0 Dec 26 17:37 7:0 -> ../../devices/virtual/block/loop0
lrwxrwxrwx. 1 root root 0 Dec 26 17:37 7:1 -> ../../devices/virtual/block/loop1
lrwxrwxrwx. 1 root root 0 Dec 26 17:37 7:2 -> ../../devices/virtual/block/loop2
lrwxrwxrwx. 1 root root 0 Dec 26 17:37 7:3 -> ../../devices/virtual/block/loop3
lrwxrwxrwx. 1 root root 0 Dec 26 17:37 7:4 -> ../../devices/virtual/block/loop4
lrwxrwxrwx. 1 root root 0 Dec 26 17:37 7:5 -> ../../devices/virtual/block/loop5
lrwxrwxrwx. 1 root root 0 Dec 26 17:37 7:6 -> ../../devices/virtual/block/loop6

The locations where these symlinks point to are still accessible in the container:

$ podman run  --rm -ti bash
bash-5.1# uname -a
Linux ef43af629ac1 5.15.7-200.fc35.x86_64 containers/podman#1 SMP Wed Dec 8 19:00:47 UTC 2021 x86_64 Linux

bash-5.1#  ls -lad /sys/devices/pci0000\:00/0000\:00\:1d.4/0000\:07\:00.0/nvme/nvme0/nvme0n1/ /sys/devices/virtual/block/
drwxr-xr-x   12 nobody   nobody           0 Jan  5 03:48 /sys/devices/pci0000:00/0000:00:1d.4/0000:07:00.0/nvme/nvme0/nvme0n1/
drwxr-xr-x   14 nobody   nobody           0 Jan  5 03:39 /sys/devices/virtual/block/

The reason this information is useful is that MariaDB uses the major/minor device numbers and follows this path to find the physical size of the blocks used for O_DIRECT calls - https://github.com/MariaDB/server/blob/385842e15bbd51ad6cad9cf3bfb69d93d0c36921/storage/innobase/os/os0file.cc#L1319-L1325.

As a feature request:

  • Don't mask /sys/dev/block - it isn't hiding anything and is useful
  • Before you mask /sys/devices, please only block the ones that aren't volume mounted within the container.

Another discussion found: https://bugzilla.redhat.com/show_bug.cgi?id=1884283
Note: the manual of podman-run doesn't list this masking.

@grooverdan
Copy link
Author

It doesn't look like moby mask this; https://github.com/moby/moby/blob/master/oci/defaults.go#L90-L101

@rhatdan
Copy link
Member

rhatdan commented Jan 5, 2022

I think this came from containers/podman#7801

@rhatdan
Copy link
Member

rhatdan commented Jan 5, 2022

You can also do

podman run --security-opt unmask=/sys/dev/block ...

@rhatdan
Copy link
Member

rhatdan commented Jan 5, 2022

https://bugzilla.redhat.com/show_bug.cgi?id=1772993
Triggered the whole thing, but I am not sure if this is public.

@grooverdan
Copy link
Author

https://bugzilla.redhat.com/show_bug.cgi?id=1772993 Triggered the whole thing, but I am not sure if this is public.

That's public, the ones it links to aren't, but it seem its sufficiently quoted to follow the story.

So its masked to stop lsblk showing the information? Even though /sys/devices is still there.

Option 3a: With /sys/dev/block masked as a tmpfs, readd symlinks for the major/minor device ids in the container.
with/without extension 3b, mask the symlink targets /sys/devices/.... of all the devices not added in 3a.

Yes, I saw the unmask option.

@github-actions
Copy link

github-actions bot commented Feb 5, 2022

A friendly reminder that this issue had no activity for 30 days.

@grooverdan
Copy link
Author

Is the above 3a option acceptable?

@rhatdan
Copy link
Member

rhatdan commented Feb 7, 2022

Makes sense to me, as long as we don't break the bugzilla fix.

@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Mar 10, 2022

@grooverdan Are you expecting us to implement something here? Are you going to open a PR or is the unmask functionality fix the issue.

@grooverdan
Copy link
Author

I was hoping you'd see the visibility of /sys/devices and only masking /sys/dev/block as the fix for rhbz#1772993 a cosmetic fix for lsblk only and not really a comprehensive masking of host storage devices.

In the event that this isn't a good enough reason for implementing this slightly over-engineered, but me friendly implementation, or I'm missing something about the ocs environment, I think I'll get back to this eventually.

In the mean time I'm just falling back to assuming a 512 sector size even though the /sys/dev/block/{maj:minor}/queue/physical_block_size would give the exact number. unmasking would resolve the precision here, but I don't want to clutter existing documentation to say use it always.

@rhatdan
Copy link
Member

rhatdan commented Mar 23, 2022

So you want the links of the block devices that are present in the container. How would I figure this out?

@grooverdan
Copy link
Author

Iterate the mount points inside the container taking into account; default mounts, volumes, and --device arguments passed

  • stat on those locations
  • use the returned st_dev broken out to the major/minor device numbers as the symlinks in /sys/dev/block to keep.
  • readlink on those.

tmpfs mask /sys/dev/block

  • recreate the symlinks.

Optionally to enhance isolation:

  • mask /sys/devices
  • selectively mount (ro?) the resolved symlinks in /sys/devices
  • mount (ro?) a whitelist of safe things /sys/devices/{cpu,power,software,tracepoint?,uprobe} (estimated, I don't have a good grasp of the function of these).

@rhatdan
Copy link
Member

rhatdan commented Mar 24, 2022

Seems like a hell of a lot of work, potentially very error prone, and I am not even sure it is the right thing to do. Might make more sense to expose the mount masks in containers.conf and just allow users to choose.

@grooverdan
Copy link
Author

If we break this into two tasks, the first and most important, re-populating the /sys/dev/block that caused a masking of Linux block device visibility discoverability away from the user-space (sysfs(5)). With this I don't we can break this any more than what the masking already did.

Once we have this right, we can think if additional masking of /sys/devices is prudent.

@rhatdan
Copy link
Member

rhatdan commented Mar 29, 2022

Can you do that without breaking the original problem we were striving to fix?

@grooverdan
Copy link
Author

I think so - containers/podman#13708 as first working POC.

I probably need to do a tmpfs mount rather than a volume. Feedback welcome.

@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented May 16, 2022

Lets move this forward. I tried out your patch and it works for me.

@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Jun 16, 2022

@grooverdan are you going to open a PR?

grooverdan referenced this issue in grooverdan/podman Jul 15, 2022
User space programs want to access information about the block
devices they are operating on. E.g. the block size is an important
aspect if doing O_DIRECT filesystem calls.

On the other hand, rhbz#1772993 wants to keep the host information
as hidden from the container running processes as possible.

We expose only the volumes and devices that are mounted into the
container by re-generating the symlinks in /sys/dev/block for the
block devices that have host based symlinks. These are generated on
ctr.state.RunDir/sysdevblock as a mountpoint and mounted ro into the
container.

The default visibility can changed by the user with
--security-opt={u,}mask=/sys/dev/block

Consolidate the libpod.mountBind implementation.

Closes #12746

Signed-off-by: Daniel Black <[email protected]>
grooverdan referenced this issue in grooverdan/podman Aug 9, 2022
User space programs want to access information about the block
devices they are operating on. E.g. the block size is an important
aspect if doing O_DIRECT filesystem calls.

On the other hand, rhbz#1772993 wants to keep the host information
as hidden from the container running processes as possible.

We expose only the volumes and devices that are mounted into the
container by re-generating the symlinks in /sys/dev/block for the
block devices that have host based symlinks. These are generated on
ctr.state.RunDir/sysdevblock as a mountpoint and mounted ro into the
container.

The default visibility can changed by the user with
--security-opt={u,}mask=/sys/dev/block

Consolidate the libpod.mountBind implementation.

Closes #12746

Signed-off-by: Daniel Black <[email protected]>
@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@grooverdan
Copy link
Author

its on containers/podman#13708

I am having difficulty with:

  • arch portability of mips/non-mips stat structures and golangci-lint complaining about unnecessary casting (as logical solution around differing sizes)
  • checkpoint restore on how to handle .../userdata/sysdevblock

@rhatdan
Copy link
Member

rhatdan commented Aug 24, 2022

Just tell lint to never mind

@giuseppe
Copy link
Member

coming back to this open issue, why wouldn't security-opt unmask be enough for this problem? Do you want it to work out of the box?

@grooverdan
Copy link
Author

As a general principle, if storage (extends to other resources too) is available to the container, it should be able to query enough information about these to use them efficiently.

While an unmask is a simple resolution, one users can apply, there a simplicity in making sure that information about the storage is always available to the container, without relying on user configuration of things they may not be familiar with or be equipped to assess the consequences of.

FYI: the original masking to hide host information is still flawed (v4.7.0): https://bugzilla.redhat.com/show_bug.cgi?id=1884283#c8

@giuseppe giuseppe transferred this issue from containers/podman Dec 17, 2024
giuseppe added a commit to giuseppe/common that referenced this issue Dec 17, 2024
there is no real advantage in masking this path, since this
information is already available under /sys/devices.  In fact, the
files under /sys/dev/block are just symlinks.

[root@953139a2851f /]# ls -l /sys/dev/block/252\:0
lrwxrwxrwx. 1 root root 0 Dec 17 12:47 /sys/dev/block/252:0 -> ../../devices/virtual/block/zram0

Fixes: containers#2277
Fixes: https://issues.redhat.com/browse/RHEL-3115

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe giuseppe linked a pull request Dec 17, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants