Package manager lockfiles #70
Comments
Having a manifest/lockfile as an output is indeed great for getting some idea of what the image contains, but I am not so sure it's possible to standardize. If it's just an output file you compare, a list of components/sources/patches is not enough: reproducibility needs a lot more, like which compiler flags were used, the configure arguments and prefixes, what the environment of the build process was, and so on. And that's only about having an output file.
If you also want to make the lockfile the input, then any given system using it would have identical input and produce identical output, but do it in its own way, at which point there would kinda be no point at all tbh. You would basically end up reimplementing the exact same build orchestration/package manager system in different ways, for little to no clear benefit. What advantage would that get you? (And the output could be reproducible anyway with a single instance.) Like it would be basically:
Which would raise the question of why we have (actually implement from scratch) N systems with identical input and output, and what the point is of repackaging things afterwards, since they are identical anyway. If you know the code sources (git repos, patches), the orchestration system definitions used (.spec files, debian/ w/e), and the version (or have the sources/binaries) of your build toolchain (rpm*, dpkg*), that's enough** to reproduce a build. What extra advantages would there be by having
Let me rephrase what I want: The lockfile consists of a set of allowed packages, say a set of rpm files. So given
To make this more concrete, let's split up the different phases a package manager performs:
In those phases, I want to ensure that resolved packages are also checked against the allowed set of packages in the lockfile.
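To illustrate the check I have in mind, here is a minimal sketch in Python. It is not how dnf, apt or any real package manager works internally; `Package`, `LockfileViolation` and `check_against_lockfile` are hypothetical names made up for this example, and I'm assuming a package is identified by name, version, architecture and a content digest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical package identity; real lockfile fields may differ."""
    name: str
    version: str
    arch: str
    sha256: str


class LockfileViolation(Exception):
    """Raised when the resolver picked a package the lockfile does not allow."""


def check_against_lockfile(resolved, allowed):
    """Return the resolved packages unchanged if all are in the allowed set.

    `resolved` is whatever the dependency resolver selected, `allowed` is the
    set of packages listed in the lockfile. Any mismatch fails the transaction
    instead of silently installing something else.
    """
    allowed_set = set(allowed)
    rejected = [p for p in resolved if p not in allowed_set]
    if rejected:
        names = ", ".join(f"{p.name}-{p.version}.{p.arch}" for p in rejected)
        raise LockfileViolation(f"not in lockfile: {names}")
    return list(resolved)
```

The important property is that the outcome is binary: either every resolved package is in the allowed set, or the whole operation fails.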
The problem here is that official repositories generally only include the latest few versions of packages. So anything using a lockfile and the official repositories would eventually stop building, as the requested versions would not be available anymore. Why not keep around mirror snapshots and use those instead of the official repositories?
I think there are many ways to preserve and access old packages, including keeping your own snapshot mirrors, using the ones provided by Debian, Arch, Red Hat (RHEL, Fedora), vendoring and providing packages locally, or using a form of content-addressable storage to get all packages listed in a lockfile. Either the install succeeds and the selected packages are chosen deterministically, or the install fails. This can be very useful for correct caching / cache invalidation, supply chain security and reproducibility. What I want to get at is that we should decouple the source of the packages from the benefits a lockfile can provide.
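As a rough sketch of the content-addressable option: if the lockfile records a SHA-256 digest per package, it does not matter where the bytes come from, as long as they hash to the locked value. The store layout and the helper names (`store_path`, `put`, `get`) below are assumptions for illustration only, not an existing tool or format.

```python
import hashlib
from pathlib import Path


def store_path(store: Path, sha256: str) -> Path:
    """Location of an object in the store, sharded by the first two hex digits."""
    return store / sha256[:2] / sha256


def put(store: Path, data: bytes) -> str:
    """Add package bytes to the store and return their SHA-256 digest."""
    digest = hashlib.sha256(data).hexdigest()
    path = store_path(store, digest)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return digest


def get(store: Path, sha256: str) -> bytes:
    """Fetch package bytes by the digest listed in the lockfile.

    Fails if the object is missing or if its content no longer matches
    the digest, so the install either uses the locked bytes or fails.
    """
    path = store_path(store, sha256)
    if not path.is_file():
        raise FileNotFoundError(f"package {sha256} is not in the store")
    data = path.read_bytes()
    if hashlib.sha256(data).hexdigest() != sha256:
        raise ValueError(f"content of {path} does not match its digest")
    return data
```

A snapshot mirror or a vendored directory could sit behind the same `get` interface, which is the decoupling I'm after.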
Many of the projects around image-based Linux would benefit from having standardized package manager dependency lockfiles.
I just created a proposal for the rpm / dnf ecosystem here: rpm-software-management/dnf5#833
Benefits
Incremental builds
OS image builders like mkosi could read a lockfile as an input to decide if a (layer of an) image needs to be rebuilt.
This makes incremental builds possible and would work really well to generate systemd sysext and similar formats.
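A minimal sketch of how such a rebuild decision could look, assuming the lockfile is a plain file and the builder keeps a small stamp file next to each cached layer. This is not how mkosi works today; `lockfile_digest`, `needs_rebuild`, `record_build` and the stamp file are hypothetical.

```python
import hashlib
from pathlib import Path


def lockfile_digest(lockfile: Path) -> str:
    """SHA-256 over the raw lockfile bytes."""
    return hashlib.sha256(lockfile.read_bytes()).hexdigest()


def needs_rebuild(lockfile: Path, stamp: Path) -> bool:
    """True if the layer was never built or the lockfile changed since."""
    if not stamp.is_file():
        return True
    return stamp.read_text().strip() != lockfile_digest(lockfile)


def record_build(lockfile: Path, stamp: Path) -> None:
    """Remember which lockfile content the cached layer was built from."""
    stamp.write_text(lockfile_digest(lockfile) + "\n")
```

Because the lockfile pins the exact package set, an unchanged digest means the package input of the layer is unchanged, so the cached layer can be reused.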
Reproducible builds
The dependency lockfile would be an input to the image build. This allows tools like mkosi to always use the same set of pinned packages (rpms, debs, ...) instead of using the latest packages available via package repositories.
If you want to perform reproducible OS image builds based on traditional package managers, having a lockfile or manifest is basically a requirement.
Bootstrapping a healthy dependency management ecosystem
As soon as you start pinning packages using a lockfile, you are responsible for updating the locked dependencies if a vulnerability is found.
A lot of tooling and support is required for this to work well in practice. If we set standards for package manager lockfiles, this allows the whole ecosystem to build tools on top of that.
Supply chain security
This is basically a result of the other points: if you build image-based Linux distributions on top of existing package managers, you'll want to know exactly which packages go into an image.
Having lockfiles makes this process a lot simpler.
Possible implementations
This section is intentionally vague and should only give you a rough idea.
I think the basic options are:
My feeling is that the second option is easier to implement in practice.
I'd be happy to receive feedback. Is this something the UAPI group is interested in tackling / standardizing?