Extend LXCFS integration #1072
Conversation
Currently RunExecutor creates a cgroup instance and uses that. But sometimes the underlying executor needs to create a nested cgroup and put the tool into that. Now we pass this nested cgroup back to RunExecutor so that it can use it for the measurements. The effect is that we can use different cgroups for limits and for measurements (and for killing processes). But in this commit we still pass only the original cgroup instance back, so there is no behavior change.
We already do that for cgroups v2, such that the cgroup of the benchmarked tool is a child cgroup of the cgroup where we configure the limits. For cgroups v1 this has not been necessary so far, but it might be good for consistency, and it is required for better integration of LXCFS.
Building on the last commit, we now change the behavior and pass back the cgroup that the actual tool is in, instead of the parent cgroup (at least on cgroups v2; no change for cgroups v1). This should still not result in any visible changes, because measurements and killing processes should behave the same for the parent cgroup and the tool cgroup - there is nothing else in the parent cgroup.
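As an illustration, a minimal sketch of this separation on cgroups v2 might look as follows. This is not BenchExec's actual code; all names and paths are assumptions, and it presumes a cgroup-v2 mount at /sys/fs/cgroup with sufficient permissions:

```python
# Illustrative sketch only (not BenchExec's actual code; names are made up):
# separate the cgroup that carries the limits from the child cgroup that is
# used for measurements, on a cgroup-v2 hierarchy.
import os

CGROUP_ROOT = "/sys/fs/cgroup"  # assumed cgroup-v2 mount point

def create_run_cgroups(run_name, memory_limit_bytes):
    """Create a parent cgroup with limits and a child cgroup for the tool."""
    parent = os.path.join(CGROUP_ROOT, run_name)
    os.mkdir(parent)
    # Limits are configured on the parent cgroup ...
    with open(os.path.join(parent, "memory.max"), "w") as f:
        f.write(str(memory_limit_bytes))
    # (On a real system, the relevant controllers would also have to be
    # enabled for children via the parent's cgroup.subtree_control file.)

    # ... while the tool runs in a child cgroup, so that measurements and
    # killing of processes only ever touch the tool itself.
    child = os.path.join(parent, "tool")
    os.mkdir(child)
    return parent, child

def move_to_cgroup(cgroup, pid):
    """Move a process into the given cgroup via its cgroup.procs file."""
    with open(os.path.join(cgroup, "cgroup.procs"), "w") as f:
        f.write(str(pid))
```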
We recommend installing LXCFS together with BenchExec, because we use it to virtualize, for example, /proc/uptime in the container. However, a main use case of LXCFS is to virtualize files that contain information about the system, such as the available CPU cores in /proc/cpuinfo. We never advertised this, but I assumed it was working all the time. I found out that it never worked, though. The reason is that LXCFS uses the limits configured for the init process of the container, but our init process has no limits: it is not part of the same cgroup as the other processes in the container (on purpose, because we do not want to measure its resource consumption).

So now we create yet another cgroup for the init process that is below the one with the limits but outside of the one that is used for measurements. Note: a single runexec execution will now create up to 5 cgroups. This is made possible by the separation between the cgroups for limits and for measurements in the last commits.

With this change, /proc/cpuinfo now shows only the cores available in the container if LXCFS is running. This helps processes in the container to see how many CPU cores they are allowed to use and, for example, to decide how many threads to spawn. Fixes #1070
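Continuing the hypothetical sketch above, the init cgroup could be placed like this (again purely illustrative, with made-up names):

```python
# Continuing the illustrative sketch: the init process gets its own cgroup
# below the limits cgroup but outside the measurement cgroup, so LXCFS can
# read the limits from init's cgroup while init's resource usage stays out
# of the measurements.
import os

def create_init_cgroup(parent):
    init_cg = os.path.join(parent, "init")  # sibling of the "tool" cgroup
    os.mkdir(init_cg)
    return init_cg

# Resulting layout below the cgroup root:
#   <run>/       <- limits (memory.max, cpuset.cpus, ...)
#   <run>/init   <- container init process; this is what LXCFS inspects
#   <run>/tool   <- benchmarked tool; used for measurements and killing
```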
Like for /proc, LXCFS provides a virtualized /sys/devices/system/cpu that shows only the allowed cores. Of course we want to mount that in the container as well, at least if the user has not requested /sys to be hidden or to have full access to the host directory. Fixes #1069
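One quick way to see the effect is to compare the virtualized view with the host view. This assumes LXCFS's default mount point /var/lib/lxcfs and that it exposes the online file there; both are assumptions, so verify them on your system:

```python
# Quick check (assumptions: LXCFS is running, mounted at its default
# location /var/lib/lxcfs, and provides sys/devices/system/cpu/online).
LXCFS_VIEW = "/var/lib/lxcfs/sys/devices/system/cpu/online"
HOST_VIEW = "/sys/devices/system/cpu/online"

with open(LXCFS_VIEW) as f:
    print("virtualized:", f.read().strip())  # e.g. "0-3" in the container
with open(HOST_VIEW) as f:
    print("host:       ", f.read().strip())  # e.g. "0-15" on the host
```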
@schroeding A code review and testing in as many scenarios as possible would be good.
Looks good to me, test code also works with all the different grep implementations I could find on Debian & NixOS.

As a side note, the output of `/proc/cpuinfo` is inconsistent with other related outputs in `/sys/devices/system/cpu/cpu*/` - we do everything correctly, mounting the correct directory from `/var/lib/lxcfs`, but the output of lxcfs in the `/sys/devices/system/cpu/cpu*/` directories is itself inconsistent (at least on my AMD-powered test system) with `/proc/cpuinfo` and the information directly in `/sys/devices/system/cpu/*`.

Programs which parse `/sys/devices/system/cpu/cpu*` in detail (e.g. cpu-info, some versions of htop) are thus still confused for now, but I don't see anything we can do about it; this has to be fixed by lxcfs (see e.g. lxc/lxcfs#627).
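For reference, a small sketch of one way to observe such a mismatch (illustrative only; this is not the test code mentioned above):

```python
# Compare the number of processors listed in /proc/cpuinfo with the
# number of cpu* directories in /sys/devices/system/cpu.
import glob
import re

with open("/proc/cpuinfo") as f:
    proc_count = len(re.findall(r"^processor\s*:", f.read(), re.MULTILINE))

sys_count = len(glob.glob("/sys/devices/system/cpu/cpu[0-9]*"))

print(f"/proc/cpuinfo: {proc_count} processors")
print(f"/sys/devices/system/cpu: {sys_count} cpu* entries")
```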
Thanks, also for explaining the LXCFS problem. I think the problem does not look bad enough that we need a workaround.
We already use LXCFS to provide better isolation of the containers and to virtualize `/proc/uptime`. But LXCFS can do more, for example also provide a virtualized view of the CPU-core information provided by the kernel, such that applications see only the cores that they are allowed to use. This was incompletely implemented and did not work so far.