Package manager lockfiles #70

Open
malt3 opened this issue Aug 23, 2023 · 4 comments

@malt3

malt3 commented Aug 23, 2023

Many of the projects around image-based Linux would benefit from having standardized package manager dependency lockfiles.
I just created a proposal for the rpm / dnf ecosystem here: rpm-software-management/dnf5#833

Benefits

Incremental builds

OS image builders like mkosi could read a lockfile as an input to decide if a (layer of an) image needs to be rebuilt.
This makes incremental builds possible and would work really well to generate systemd sysext and similar formats.
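As a rough illustration of the rebuild decision, here is a minimal sketch. It assumes a hypothetical lockfile that has already been parsed into a dict with a `packages` list of `name`, `version` and `sha256` entries; neither this layout nor the function names come from mkosi or dnf.

```python
import hashlib
import json
from typing import Optional


def lockfile_digest(lock: dict) -> str:
    """Return a stable SHA-256 digest of the locked package set.

    The lockfile layout (a "packages" list with name/version/sha256)
    is an assumption made for this sketch, not an existing format.
    """
    # Sort the entries so that reordering the lockfile does not change the digest.
    packages = sorted(
        (p["name"], p["version"], p["sha256"]) for p in lock["packages"]
    )
    canonical = json.dumps(packages, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()


def needs_rebuild(lock: dict, cached_digest: Optional[str]) -> bool:
    """Rebuild a layer only if its locked inputs differ from the cached build."""
    return lockfile_digest(lock) != cached_digest
```

An image builder could store the digest next to each built layer and skip the layer whenever `needs_rebuild` returns `False`.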

Reproducible builds

The dependency lockfile would be an input to the image build. This allows tools like mkosi to always use the same set of pinned packages (rpms, debs, ...) instead of using the latest packages available via package repositories.
If you want to perform reproducible OS image builds based on traditional package managers, having a lockfile or manifest is basically a requirement.

Bootstrapping a healthy dependency management ecosystem

As soon as you start pinning package manager packages using a lockfile, you are responsible for updating the locked dependencies if a vulnerability is found.
A lot of tooling and support is required for this to work well in practice. If we set standards for package manager lockfiles, this allows the whole ecosystem to build tools on top of that.

Supply chain security

This is basically a result of the other points: if you build image-based Linux distributions on top of existing package manager systems, you'll want to know exactly what packages go into an image.
Having lockfiles makes this process a lot simpler.

Possible implementations

This section is intentionally vague and should only give you a rough idea.
I think the basic options are:

  • try to standardize on a single lockfile format that works for all package managers
  • try to standardize on one lockfile format for each package management system (deb, rpm, ...)

My feeling is that the second option is easier to implement in practice.

I'd be happy to receive feedback. Is this something the UAPI group is interested in tackling / standardizing?

@alatiera
Member

Having a manifest/lockfile as an output is great for having some idea of what the image contains indeed, but I am not so sure it's possible to standardize.

If it's just an output file you compare, having a list of components/sources/patches is not enough as you need a lot more things for reproducibility, like what compiler flags were used, configure arguments and prefixes, what the environment of the build process was and so on. And that's only about having an output file.

> The dependency lockfile would be an input to the image build.

If you want to also make the lockfile the input, then it would mean that any given system using it would have identical input and produce identical output, but do it in its own way, at which point there would kinda be no point at all tbh. You would basically end up reimplementing the exact same build orchestration/package manager system in different ways, for little to no clear benefit. What advantage would that get you? (And the output could be reproducible anyway with a single instance.)

Like it would be basically:

input.lock -> rpmbuild orchestration thingy -> output binaries -> assert_eq(output_lock, input_lock) -> idk shove it into .rpms, OCI layers, w/e
input.lock -> debbuild orchestration thingy -> output binaries -> assert_eq(output_lock, input_lock) -> .debs or other format

Which would raise the question of why do we have (actually implement from scratch) N number of systems with identical input and output and what's the point of repackaging things afterwards since they are identical anyway?

If you know the code sources (git repos, patches), the orchestration system definitions used (.spec files, debian/, w/e), and the version (or have the sources/binaries) of your build toolchain (rpm*, dpkg*), that's enough to reproduce a build. What extra advantages would there be in having rpm and deb be able to use and output the same format?

@malt3
Author

malt3 commented Aug 29, 2023

Let me rephrase what I want:

The lockfile consists of a set of allowed packages. Let's say a set of rpm files.
What I want is an extension for package managers where the package resolution is deterministic.

So given dnf install --lockfile packages.lock <expression>, I want the command to always install the same set of packages.

To make this more concrete, let's split up the different phases a package manager performs:

  • parse the expression
  • optionally update the package index using remote repositories
  • find all requested packages using the expression and the package index
  • recursively find all transitive dependencies of the requested packages in the package index
  • perform the actual installation

In those phases, I want to ensure that resolved packages are also checked against the allowed set of packages in the lockfile.
So the new algorithm would look like this:

  • parse the expression
  • optionally update the package index using remote repositories
  • find all requested packages using the expression and the package index. If any selected package is not in the lockfile, return error
  • recursively find all transitive dependencies of the requested packages in the package index. If any selected package is not in the lockfile, return error
  • perform the actual installation
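
The modified algorithm above could be sketched roughly as follows. This is a toy model, not how dnf or any real package manager resolves dependencies: the `Package` type, the `index` and `requires` mappings, and `LockfileError` are all hypothetical names introduced for illustration, and the "index" simply maps each name to a single candidate version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    version: str


class LockfileError(Exception):
    """Raised when resolution selects a package outside the lockfile."""


def resolve(requested, index, requires, lockfile):
    """Resolve requested names plus their transitive dependencies.

    requested: iterable of package names (the parsed expression)
    index:     name -> Package selected by the toy package index
    requires:  name -> list of dependency names
    lockfile:  set of allowed Package values
    """
    selected = {}
    pending = list(requested)
    while pending:
        name = pending.pop()
        if name in selected:
            continue  # already resolved, also guards against cycles
        pkg = index[name]
        # The extra step: every selected package must be in the lockfile.
        if pkg not in lockfile:
            raise LockfileError(f"{pkg.name}-{pkg.version} is not in the lockfile")
        selected[name] = pkg
        pending.extend(requires.get(name, []))
    # Installation would happen here; the sketch only returns the selection.
    return set(selected.values())
```

The check is applied to requested packages and transitive dependencies alike, so the result is either the locked set or an error, never a silently different selection.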

@DaanDeMeyer
Member

The problem here is that official repositories generally only include the latest few versions of packages. So anything using a lock file and the official repositories would eventually stop building as the requested versions would not be available anymore. Why not keep around mirror snapshots and use those instead of the official repositories?

@malt3
Author

malt3 commented Aug 30, 2023

I think there are many ways to preserve and access old packages: keeping your own snapshot mirrors, using the ones provided by Debian, Arch, or Red Hat (RHEL, Fedora), vendoring and providing packages locally, or using a form of content-addressable storage to fetch all packages listed in a lockfile.
So using the lockfile allows you to make simplified statements about the determinism of the package selection:

Either the install succeeds and the selected packages are chosen deterministically or the install fails. This can be very useful for correct caching / cache invalidation, supply chain security and reproducibility.

What I want to get at is that we should decouple the source of the packages from the benefits a lockfile can provide.
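
To illustrate the decoupling, here is a minimal sketch of a source-independent check. It assumes a hypothetical lockfile that maps package file names to SHA-256 digests; the function name and the mapping are made up for this example. The same check would apply whether the bytes came from a snapshot mirror, a vendored directory, or content-addressable storage.

```python
import hashlib


def matches_lockfile(filename: str, data: bytes, locked: dict) -> bool:
    """Check package bytes against the digest recorded in the lockfile.

    locked maps file names to hex SHA-256 digests (an assumed layout).
    Where the bytes were fetched from does not matter to this check.
    """
    expected = locked.get(filename)
    if expected is None:
        return False  # package is not part of the locked set
    return hashlib.sha256(data).hexdigest() == expected
```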


3 participants