
pkgs.dockertools.buildLayeredImage: customisable layering strategy #122608

Conversation

@adrian-gierakowski (Contributor) commented May 11, 2021

Allow customisation of the algorithm used to convert the Nix references graph (created from the Docker image contents) into Docker layers.

A collection of building blocks (Python functions) is provided, which can be assembled into a processing pipeline by specifying a list of operations (and their initial arguments) as a Nix list.

The graph of Nix references is first converted into a Python igraph.Graph object (with each vertex representing a Nix path) and then fed into the user-defined pipeline. Each stage in the pipeline represents a function call, with initial arguments specified in Nix and the last argument being the result of the previous stage in the pipeline (or the initial Graph object).

Each step of the pipeline is expected to produce a data structure consisting of arbitrarily nested lists/dicts with Graph objects (representing Docker layers) at its leaves. The result of the last stage in the pipeline is recursively flattened (with each dict converted into the list of its values) until a flat list of Graphs remains. This is then output as a JSON array of arrays (each Graph converted into an array of paths).
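The flattening step can be sketched in plain Python (an illustrative stand-alone sketch; `Graph` here is a stand-in for `igraph.Graph`, not the actual class used by the PR):

```python
# Illustrative sketch of the recursive flattening: arbitrarily nested
# lists/dicts with Graph objects at the leaves are reduced to a flat
# list of Graphs. "Graph" is a stand-in for igraph.Graph.
class Graph:
    def __init__(self, paths):
        self.paths = paths

def flatten_to_layers(value):
    if isinstance(value, Graph):
        return [value]
    if isinstance(value, dict):
        value = list(value.values())  # each dict becomes the list of its values
    layers = []
    for item in value:
        layers.extend(flatten_to_layers(item))
    return layers

# a pipeline result: nested dict/list structure with Graphs at the leaves
result = {"main": Graph(["/nix/store/a"]),
          "rest": [Graph(["/nix/store/b"]), Graph(["/nix/store/c"])]}
# each Graph is finally emitted as a JSON array of its paths
print([g.paths for g in flatten_to_layers(result)])
# → [['/nix/store/a'], ['/nix/store/b'], ['/nix/store/c']]
```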

This functionality is exposed via a new layeringPipeline argument to the streamLayeredImage/buildLayeredImage functions. The default value of the argument has been chosen to preserve the current layering behaviour.

Functions available to use in the pipeline:

  • popularity_contest: sorts paths by popularity, with each path put in its own layer. This, when followed by limit_layers, results in layers identical to the current implementation (implementation copied and adapted from closure-graph.py).
  • subcomponent_out: splits the given Nix references graph into 2 graphs (preserving edges within each sub-graph):
    • main: includes the specified paths and all of their dependencies (including transitive ones)
    • rest: all remaining paths
  • subcomponent_in: similar to the above, but the main output includes dependants of the provided paths (instead of dependencies)
  • split_paths: implements an idea described by @roberth: removes edges targeting the given paths and splits the resulting graph into 3:
    • main: the specified paths + their dependencies − common paths (see below)
    • common: paths which can be reached both from the specified paths and from any of the remaining roots of the input graph
    • rest: paths which cannot be reached from the specified paths
  • limit_layers: limits the number of layers to the desired amount by combining excess layers from the end of the list into one
  • reverse: reverses the given list of layers. This, if inserted between popularity_contest and limit_layers, achieves the same result as layeredStrategies.popularityWeightedTop from this PR.
  • split_every: splits the given graph into layers with the given number of paths each (apart from the last layer, which might contain fewer paths).
  • remove_paths: removes vertices corresponding to the given paths from a Graph (most likely not a good idea to use this except for removing roots)
  • over: maps the given function over the value of a given key in a dict. Similar to over from Haskell lenses or Ramda. Useful for applying further transformations to a selected sub-graph in the result of subcomponent_out/in or split_paths.
  • map: maps the given function over all elements of a list
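As a behavioural illustration of one of these building blocks, limit_layers can be sketched on plain lists of paths (the real implementation operates on igraph.Graph objects; this is only a sketch of the semantics):

```python
# Behavioural sketch of limit_layers on plain lists of paths: excess
# layers from the end of the list are combined into one.
def limit_layers(max_layers, layers):
    if len(layers) <= max_layers:
        return layers
    combined = [path for layer in layers[max_layers - 1:] for path in layer]
    return layers[:max_layers - 1] + [combined]

print(limit_layers(3, [["a"], ["b"], ["c"], ["d"]]))
# → [['a'], ['b'], ['c', 'd']]
```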

Some images to demonstrate the behaviour of subcomponent_out/in and split_paths.

Given the following input graph, with the paths argument set to ["A"]:
plot_subcomponent_out_0_input

Result of subcomponent_out:

plot_subcomponent_out_0_result

With the main output consisting of the green vertices and the rest of the red ones.

Result of subcomponent_in:

plot_subcomponent_in_0_result

Colours: as above.

Result of split_paths:

plot_split_paths_0_result

main - green
common - purple
rest - red.

Note the edge removed from the graph above. The common sub-graph is calculated by taking the intersection of the sub-graph formed by vertices reachable from A and the one formed by vertices reachable from Root. If A were the root, then both common and rest would be empty.
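This computation can be sketched with plain adjacency dicts (an illustrative simplification; the actual implementation uses igraph, and the toy graph below is not the one from the plots):

```python
# Illustrative simplification of split_paths using adjacency dicts
# (edges point from a path to its dependencies); the real code uses igraph.
def reachable(graph, starts):
    seen, stack = set(), list(starts)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(graph.get(v, []))
    return seen

def split_paths(graph, paths, roots):
    # remove edges targeting the given paths
    pruned = {v: [d for d in deps if d not in paths]
              for v, deps in graph.items()}
    from_paths = reachable(pruned, paths)
    from_roots = reachable(pruned, [r for r in roots if r not in paths])
    common = from_paths & from_roots   # reachable from both sides
    main = from_paths - common         # paths + dependencies - common
    rest = set(graph) - from_paths     # unreachable from the given paths
    return main, common, rest

# toy graph: Root depends on A and B; A and B share dependency C
graph = {"Root": ["A", "B"], "A": ["C"], "B": ["C"], "C": []}
main, common, rest = split_paths(graph, ["A"], ["Root"])
print(sorted(main), sorted(common), sorted(rest))
# → ['A'] ['C'] ['B', 'Root']
```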

Example of a custom layeringPipeline

Used to package an application written in Python, optimising for sharing layers between rebuilds of the same image (but also with a layer containing the Python runtime with all its dependencies, which can be shared between images of different applications built on top of the same version of Python).

The pipeline: https://github.com/adrian-gierakowski/nix-docker-custom-layering-strategy-example/blob/6eec668c4ef19b09926d8b53b7003575a115273e/default.nix#L42-L108
The resulting docker layers: https://github.com/adrian-gierakowski/nix-docker-custom-layering-strategy-example/blob/119c15a98a16b9b85d676d9ea86ce231d7306d55/store-layers.json

TODOs:
  • write proper documentation
  • add integration tests
  • better error messages when the user-defined pipeline is invalid. We can detect 'syntactic' errors, invalid function names and wrong numbers/types of arguments. Currently there is none of this; the user will simply get a cryptic error thrown by Python if something doesn't line up.
  • consider whether the names of subcomponent_out/in and split_paths could be improved. The former are based on the corresponding method of the igraph.Graph class, and the latter is the first thing that came to mind when I started working on it.
  • split_every: allow customising whether to walk the graph in depth-first or breadth-first order (currently it simply iterates the vertices in the order in which they are stored in the igraph.Graph object, which is a bit arbitrary and probably not very useful).
  • remove code duplication between the existing references-by-popularity and the new flatten-references-graph (probably by implementing the former in terms of the latter).
Motivation for this change

The current algorithm for grouping Nix paths into layers is optimised for a specific use case (sharing layers between unrelated images), but it pessimizes others, for example the reuse of layers between consecutive rebuilds of images in which the number of Nix paths exceeds maxLayers. This significantly slows down development of applications with a large number of dependencies (where each dep is a separate Nix path).

The need for customising the layering algorithm has been highlighted in a couple of issues:
#116446
#48462

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@adrian-gierakowski (Contributor Author)

@roberth @grahamc

@roberth (Member) left a comment


Congrats on the first version!

I've read through the description and scanned parts of the diff. I'll have to take some time to review it thoroughly.
Some first observations:

  • It seems that you've accidentally duplicated (part of?) the PR description
  • The layering pipeline is a bit hard to follow. This is an area we have identified at the start of this project to be hard to predict and that it may call for some experimentation. Some complexity is to be expected for an advanced feature, but I expect that we at least improve the readability. For one, the syntax you've used is essentially a raw interface to python. It will be easier to read with a little "DSL" of Nix functions in front of it. Perhaps we can also identify some usage patterns and exploit those to make it more intuitive.
  • Given the size, one could start to consider splitting the python code into its own repo. Such a decision must be weighed against the benefits of having it together with the dockerTools Nix code that it's coupled with. Aesthetic reasons are not sufficient to motivate such a split.
  • One comment in the diff. You've already identified in the TODO that Graham's popularity contest hasn't been moved entirely into the new framework yet.

@adrian-gierakowski (Contributor Author) commented May 11, 2021

Congrats on the first version!

cheers!

I've read through the description and scanned parts of the diff. I'll have to take some time to review it thoroughly.
Some first observations:

  • It seems that you've accidentally duplicated (part of?) the PR description

fixed

  • The layering pipeline is a bit hard to follow. This is an area we have identified at the start of this project to be hard to predict and that it may call for some experimentation. Some complexity is to be expected for an advanced feature, but I expect that we at least improve the readability. For one, the syntax you've used is essentially a raw interface to python. It will be easier to read with a little "DSL" of Nix functions in front of it. Perhaps we can also identify some usage patterns and exploit those to make it more intuitive.

Do you mean something like this:

let
  layeringPipeline = with pkgs.dockerTools.layeringDSL; compile [
    (subcomponent_out [pkgs.python3])
    (over "rest" (pipe [
      (split_paths [ankisyncd])
      (over "main" (pipe [
        (subcomponent_in [ankisyncd])
        (over "rest" (pipe [
          popularity_contest
          reverse
          (limit_layers 110)
        ]))
      ]))
      (over "rest" (pipe [
        (subcomponent_in [with-env-from-dir])
        (over "rest" (subcomponent_in [pkgs.cacert]))
      ]))
    ]))
  ];

instead of

let
  layeringPipeline = [
    ["subcomponent_out" [pkgs.python3]]
    ["over" "rest" ["pipe" [
      ["split_paths" [ankisyncd]]
      ["over" "main" ["pipe" [
        ["subcomponent_in" [ankisyncd]]
        ["over" "rest" ["pipe" [
          ["popularity_contest"]
          ["reverse"]
          ["limit_layers" 110]
        ]]]
      ]]]
      ["over" "rest" ["pipe" [
        ["subcomponent_in" [with-env-from-dir]]
        ["over" "rest" ["subcomponent_in" [pkgs.cacert]]]
      ]]]
    ]]]
  ];

which is a more tightly formatted version of this?

The "DSL" version is only slightly more readable IMHO, but one benefit it could bring is that the pipeline could be validated within Nix, and an error could be thrown with a trace pointing exactly at the place which caused it.

  • Given the size, one could start to consider splitting the python code into its own repo. Such a decision must be weighed against the benefits of having it together with the dockerTools Nix code that it's coupled with. Aesthetic reasons are not sufficient to motivate such a split.

I have no preference. Happy to do whatever you (and other maintainers) decide.

  • One comment in the diff. You've already identified in the TODO that Graham's popularity contest hasn't been moved entirely into the new framework yet.

I copied over the code but haven't touched the original package, since I wasn't sure if we should keep it around (within nixpkgs only dockerTools depended on it). I guess it doesn't hurt to implement it in terms of the flatten-references-graph package in case any code outside nixpkgs depends on it. If you think we should keep it, I'm happy to do this.

@roberth (Member) commented May 15, 2021

a little "DSL" of Nix functions in front of it

Do you mean something like this:

let
  layeringPipeline = with pkgs.dockerTools.layeringDSL; compile [
    (subcomponent_out [pkgs.python3])
    (over "rest" (pipe [
      (split_paths [ankisyncd])

Yes, something like this aligns more with what people expect from a nix file.

Perhaps we can also identify some usage patterns and exploit those to make it more intuitive.

Not sure anymore. Maybe try to limit the nesting instead?
At least for the splitting operations, the user specifies a path. This means that over calls aren't necessary to find the layer where the operation should be applied.
For popularity_contest we'll still need something like over, but I don't think that's a problem.
I also noticed that, because of over, operations are applied to isolated parts of the graph rather than the whole graph. I've tried running split_paths twice in the pipeline, without over, but that did not work.

I would expect this

nix-build --expr 'with import ./. {}; dockerMakeLayers { baseJson = writeText "json" "{}"; contentsList = [ hello figlet cowsay ]; pipeline = [ ["split_paths" [ hello ] ] ["split_paths" [ figlet ] ] ]; }'

to work, but it ran into a

type error
Traceback (most recent call last):
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/bin/.flatten_references_graph-wrapped", line 9, in <module>
    sys.exit(main())
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/__main__.py", line 44, in main
    print(main_impl(file_path))
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/__main__.py", line 24, in main_impl
    result = flatten_references_graph(
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/flatten_references_graph.py", line 36, in flatten_references_graph
    return create_list_of_lists_of_strings(pipe(
  File "/nix/store/b6wvrfxrl3cbqi8vdld63mah9d11p067-python3.8-toolz-0.11.1/lib/python3.8/site-packages/toolz/functoolz.py", line 303, in __call__
    return self._partial(*args, **kwargs)
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/pipe.py", line 74, in pipe
    return tlz.pipe(
  File "/nix/store/b6wvrfxrl3cbqi8vdld63mah9d11p067-python3.8-toolz-0.11.1/lib/python3.8/site-packages/toolz/functoolz.py", line 627, in pipe
    data = func(data)
  File "/nix/store/b6wvrfxrl3cbqi8vdld63mah9d11p067-python3.8-toolz-0.11.1/lib/python3.8/site-packages/toolz/functoolz.py", line 303, in __call__
    return self._partial(*args, **kwargs)
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/split_paths.py", line 99, in split_paths
    split_path_indices = list(unnest_iterable(map(
  File "/nix/store/b6wvrfxrl3cbqi8vdld63mah9d11p067-python3.8-toolz-0.11.1/lib/python3.8/site-packages/toolz/functoolz.py", line 303, in __call__
    return self._partial(*args, **kwargs)
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/split_paths.py", line 78, in split_path_spec_to_indices
    vertex = find_vertex_by_name_or_none(graph)(split_path_spec)
  File "/nix/store/b6wvrfxrl3cbqi8vdld63mah9d11p067-python3.8-toolz-0.11.1/lib/python3.8/site-packages/toolz/functoolz.py", line 303, in __call__
    return self._partial(*args, **kwargs)
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/lib.py", line 100, in find_vertex_by_name_or_none
    return graph.vs.find(name)
AttributeError: 'dict' object has no attribute 'vs'

Let's simplify to a working example:

dockerMakeLayers { baseJson = writeText "json" "{}"; contentsList = [ hello figlet ]; pipeline = [ ["split_paths" [ hello ] ]    ]; }

This outputs a set of layers:

[ [hello]
  [libdn2 libunistring glibc]
  [figlet bash]
]

That is as expected; main, common and rest.
If I specify another split_paths in the pipeline, I would expect it to run for each layer and produce the following (with empty layers left in place to show how the result came to be, via split_paths and then a concatMap):

[ [hello] [] []
  [] [glibc] [libdn2 libunistring]
  [figlet bash] [] []
]

Here, split_paths has an effect on both [libdn2 libunistring glibc] and [figlet bash].
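This "split every layer, then concatenate" behaviour can be sketched generically (a hypothetical illustration; tuples stand in for layers/graphs, and a toy splitting function stands in for split_paths):

```python
# Hypothetical sketch of applying a splitting stage to every layer and
# concatenating the results (a concatMap). Tuples stand in for layers;
# dicts/lists are nesting that gets flattened between stages.
def flatten(value):
    if isinstance(value, dict):
        value = list(value.values())
    if not isinstance(value, list):
        return [value]  # a leaf (layer)
    out = []
    for item in value:
        out.extend(flatten(item))
    return out

def concat_map_pipe(stages, layers):
    for stage in stages:
        layers = flatten([stage(layer) for layer in layers])
    return layers

# toy splitting function: separates a layer into two sub-layers,
# keeping empty results in place to show where they came from
split_even_odd = lambda xs: {"even": tuple(x for x in xs if x % 2 == 0),
                             "odd": tuple(x for x in xs if x % 2)}
print(concat_map_pipe([split_even_odd, split_even_odd], [(1, 2, 3, 4)]))
# → [(2, 4), (), (), (1, 3)]
```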

Whether my expectation is optimal is debatable. Perhaps it's better to apply it only to components that contain split_paths arguments, because that results in fewer layers. On the other hand, splitting all layers has the benefit of commutativity, which could be really important when paths are in hash order (i.e. nondeterministic for practical purposes).

Could you try to make it possible, at least for the top-level pipeline, to allow operations to work on all components simultaneously? It seems that at least the splitting functions will behave like identities for components that are entirely unrelated to the argument path(s).

  • splitting the python code into its own repo

I have no preference. Happy to do whatever you (and other maintainers) decide.

I vote to keep it all here.

I copied over the code but haven't touched the original package since I wasn't sure if we should keep it around (within nixpkgs only dockerTools depended on it).

I'd consider the old script an internal implementation detail. If someone used it without dockerTools, they're better off forking it than using a wrapper that will bit rot.

@adrian-gierakowski (Contributor Author)

@roberth

apologies for the delay

Regarding:

nix-build --expr 'with import ./. {}; dockerMakeLayers { baseJson = writeText "json" "{}"; contentsList = [ hello figlet cowsay ]; pipeline = [ ["split_paths" [ hello ] ] ["split_paths" [ figlet ] ] ]; }'

I believe what you are trying to achieve could be done with:

pipeline = [
  ["split_paths" [ hello ] ]
  ["flatten"]
  ["map" ["split_paths" [ figlet ] ] ] 
]

We could make a version of pipe which would give the same result without the user having to add flatten and map manually. I'm not sure if that should be the default for the top-level pipeline. I actually had it working this way initially, but as you noticed it can make it difficult to apply operations like popularity_contest to only one specific layer. In some cases the layers were not ordered the way I'd like them to be (which matters if you want to apply limit_layers to the final result). Anyway, adding the auto-flattening version of pipe should be straightforward. I hope to be able to do it later this week.

@roberth (Member) commented May 26, 2021

I'm still a bit concerned about the need for this amount of configuration. The capability is probably great for power users who will be able to fine tune their images and it's very useful for experimentation. I just don't want people to have to become power users.

Perhaps I gave up too soon on optimization (in the math sense) when thinking about the split_paths approach. I've mentioned before that it optimizes for minimizing changes between updates of the same image, as opposed to popularity_contest, which optimizes for cross-image sharing. I didn't take the extra step of formalizing that, though. There's no reason it can't be done, so a fully automatic split_paths may still be possible.

Having a good algorithm without configuration should alleviate any DSL issues.

A fully automatic split_paths could split layers one at a time, selecting the optimal split in each iteration. If we find a good definition of optimal we don't need any configuration at all. If it does not work great in some cases, we can let the user adjust the metric instead of making the whole thing manual.

At a high level, for every split we're looking to maximize the expected saved cost of transmission.

max over path ∈ nodes(graph) of: P(change in path) * cost_of_change_{graph,layers}(path)

where P(change in path) is the probability that path changes between versions of the image
and cost_of_change is the cost of transmitting the layers that need to be transmitted when path changes.

This seems to be an accurate representation of what we're trying to achieve.
To use it, we need to define the two functions. cost_of_change is the most tractable one. The sum of the size of the changed layers is a good approximation. I don't think we need to account for per-layer overhead, partly because I don't expect the algorithm to produce small layers.

cost_of_change_{graph,layers}(path) = size(intersect(split_at_{graph}(layers, path), layers))

Here, split_at performs the split according to split_paths and gives all the dependents of path a new identity, so the intersect function can do its job. This is why we need graph as well as layers.

So far, this seems accurate. P(change in path) is by far the most unpredictable part. One approximation is to assume that each byte in the closure is equally likely to change. Compared to an equal probability for all paths, this seems reasonable because big paths have more entropy. It will also have the effect of putting the small config files near the root together in the "rest" layer, not wasting layers. It also doesn't favor putting small dependencies in their own layers; however, because they're deeper down, those will be grouped according to the needs of the bigger paths, as a consequence of how split_paths works.
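The objective described above could be prototyped roughly as follows (purely illustrative; the candidate data, sizes and names are hypothetical stand-ins, not code from this PR):

```python
# Rough prototype of "maximize expected saved transmission cost":
# pick the path maximizing P(change) * cost_of_change. All data and
# names here are hypothetical stand-ins, not code from this PR.
def p_change(path_size, closure_size):
    # assumption from the comment: each byte is equally likely to change
    return path_size / closure_size

def cost_of_change(changed_layer_sizes):
    # approximation: sum of the sizes of the layers that would change
    return sum(changed_layer_sizes)

def best_split(candidates, closure_size):
    # candidates: path -> (path size, sizes of layers changed by that path)
    return max(
        candidates,
        key=lambda p: p_change(candidates[p][0], closure_size)
                      * cost_of_change(candidates[p][1]),
    )

candidates = {
    "glibc": (30, [30, 5]),   # big, widely depended-upon path
    "app": (2, [2]),          # small leaf path
}
print(best_split(candidates, closure_size=100))
# → glibc
```

An automatic strategy could apply this greedily: split at the best path, recompute the candidate costs, and repeat until the layer limit is reached.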

We could make a version of pipe which would give the same result without user having to add flatten and map manually.

While writing the above, I've assumed split_paths to apply to all layers, as in the "list transformer" (in the Haskell sense) pipe you suggest, not just the layer containing path. I'm not sure we need the commutativity of this variation, though, because we apply its iterations in a deterministic order that isn't too arbitrary either. Perhaps splitting less per split_paths operation is good, because it leaves more layers to be decided by the optimization step instead. I can't judge which is better at this point.

it can make it difficult to apply operations like popularity_contest to only one specific layer. In some cases the layers where not ordered in the way I'd like them to (which matters if you want to apply limit_layers to the final result).

The necessity of this seems to ultimately stem from the desire to use as many layers as possible without too much configuration. Iterated split_paths is also automatic and can simply stop at the configured maximum. I hope we don't need to combine multiple automatic algorithms.

Just a side note, if you could generate a graphviz graph in a passthru attribute, that would be very useful when trying to understand real-world images. Clusters seem like a perfect match for layers.

@adrian-gierakowski (Contributor Author)

@roberth

I'm still a bit concerned about the need for this amount of configuration. The capability is probably great for power users who will be able to fine tune their images and it's very useful for experimentation. I just don't want people to have to become power users.

Yes, I believe in most cases users should be able to select one of the predefined algorithms/presets based on their particular needs. Your proposal sounds interesting; however, I believe it could be addressed in a separate PR. I think the capabilities implemented in this PR are still useful, so it would be great if we could get this merged while we try to figure out the "fully automatic" algorithm which optimises for image rebuilds.

@roberth (Member) commented Jun 7, 2021

We've just had a call. The plan is to work on an automatic algorithm before working on the DSL and documentation, because those may be affected by this work.
I don't expect many changes to dockerTools in the short term, although Adrian has experienced some merge conflicts during the past month. Last month has been busier than usual, possibly because of the release. If more PRs than usual show up again, I will reconsider this. (And I'll keep this PR in mind when other PRs come in.)

@adrian-gierakowski (Contributor Author)

Sorry, I have been busy with a new project; I might be able to get back to it in about a month or so. Alternatively, I could set up a bounty and contribute some $ to it if anyone would be interested in taking this up.

@roberth (Member) commented Oct 9, 2021

@adrian-gierakowski Thanks for the update. It'd be great if anyone could make progress on this.

This PR brings good changes, such as tests and a foundation for building smarter layering strategies. It would even fix a bug, #140908, so I think the priority should be to get this merged. The strategies themselves can be marked as experimental, with a message during evaluation, to manage expectations and invite collaboration.

Regarding the bounty, that may be quite helpful. It'd be most effective if you could help with the merge-readiness part. That way the bounty can be used for a more specific task that doesn't require as much background.

I should probably also plug https://opencollective.com/nix-deployments which is a new fund for maintenance and development of Nix-based deployment tech. You can donate for specific projects, such as the NixOps 2 release, or "Anything around Containers", where the ideas are open for discussion.
That said, a bounty works differently and might better fit your needs: it is more narrowly scoped and not paid ahead. Do whatever you're comfortable with; it's highly appreciated regardless.

@adrian-gierakowski (Contributor Author)

@roberth I could probably spend 1-2 hours on this next week to complete any work you think necessary to get this merged. This would obviously not be enough to implement the fully automatic strategy. I could probably work on the latter in about a month or so, and submit it as a separate PR, together with documentation etc. In the meantime I’ll look into setting up a bounty in case someone else is keen to do it earlier.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/a-faster-dockertools-buildimage-prototype/16922/2

@rihardsk commented Mar 11, 2022

I've been using this PR in one project and thought it might be useful to post some error tracebacks I've been encountering, in case this PR is ever finished:

building '/nix/store/lc5jf8pjg92gcjzdmgah7a9hl9aav558-docker-make-layers.drv'...
Traceback (most recent call last):
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/bin/.flatten_references_graph-wrapped", line 9, in <module>
    sys.exit(main())
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/__main__.py", line 44, in main
    print(main_impl(file_path))
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/__main__.py", line 24, in main_impl
    result = flatten_references_graph(
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/flatten_references_graph.py", line 36, in flatten_references_graph
    return create_list_of_lists_of_strings(pipe(
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/flatten_references_graph.py", line 17, in create_list_of_lists_of_strings
    return list(tlz.map(
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/lib.py", line 275, in flatten
    yield from flatten(x)
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/lib.py", line 275, in flatten
    yield from flatten(x)
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/lib.py", line 275, in flatten
    yield from flatten(x)
  [Previous line repeated 3 more times]
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/lib.py", line 273, in flatten
    for x in xs:
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/lib.py", line 299, in <lambda>
    (lambda: (yield merge_graphs(graphs_iterator)))()
  File "/nix/store/3g9j2hw15zj4xdk45b3hzcy5i27inq57-flatten-references-graph-0.1.0/lib/python3.8/site-packages/flatten_references_graph/lib.py", line 256, in merge_graphs
    return tlz.reduce(lambda acc, g: acc + g, graphs)
  File "/nix/store/b6wvrfxrl3cbqi8vdld63mah9d11p067-python3.8-toolz-0.11.1/lib/python3.8/site-packages/toolz/functoolz.py", line 303, in __call__
    return self._partial(*args, **kwargs)
TypeError: reduce() of empty sequence with no initial value
error: builder for '/nix/store/lc5jf8pjg92gcjzdmgah7a9hl9aav558-docker-make-layers.drv' failed with exit code 1

I've encountered this quite a few times, but I'm not certain what causes it. It happens when specifying operations for some dependencies and not for others. It's probably related to how many dependencies are contained in the "main" and "rest" parts of splits.

I know that this PR might be obsolete and there now seem to be some other PRs tackling the same problem. In case it isn't obsolete and somebody decides to tackle this again, I hope the stacktraces help.

EDIT: it seems that the error is caused by "popularity_contest" strategy having a "layer_limit" set that is higher than the actual number of dependencies in the subgraph.

@adrian-gierakowski (Contributor Author)

@rihardsk thanks for the feedback! I definitely intend to push the ball forward on this PR; however, my work situation changed soon after I submitted it and I haven't had much spare time to continue working on it.

I know that this PR might be obsolete and there now seem to be some other PRs tackling the same problem

I'm not aware of any other PRs which provide a better solution to the problem this one is trying to address, so I hope we can get it merged eventually.

EDIT: it seems that the error is caused by "popularity_contest" strategy having a "layer_limit" set that is higher than the actual number of dependencies in the subgraph.

Nice, I'll try to replicate this in the tests and see if I can catch it early to provide better feedback to the user.

@roberth you wrote above that:

It'd be most effective if you could help with the merge-readiness part.

what would have to be done (apart from rebasing and resolving conflicts) for this PR to be considered merge ready?

@rihardsk
@adrian-gierakowski I'm definitely enjoying the flexibility your solution provides. I've only had a quick glance at those other Docker image layering PRs, so I wasn't sure if anything there attempts to provide a general solution.

One slight annoyance I've had with this PR is that the layering strategy I've created has ended up being deeply nested; here's a sample from my project:

let
  # Helper functions for layering strategy creation
  popularityHelper = overWhat: layerLimit: [
    "over"
    overWhat
    [
      "pipe"
      [
        ["popularity_contest"]
        ["reverse"]
        ["limit_layers" layerLimit]
      ]
    ]
  ];
  # Note, there seems to be a bug, where the layer creation code fails with
  #   TypeError: reduce() of empty sequence with no initial value
  # that seems to be caused by the popularity_contest strategy having a layerLimit
  # that is higher than the actual number of dependencies in the dependency
  # subgraph. To work around this, try decreasing the layerLimitForRest argument.
  splitFromMain = splitWhat: layerLimitForRest:
    let mainProcessingStrategy =
      if layerLimitForRest > 0
        then
          [
            ["subcomponent_in" splitWhat]
            (popularityHelper "rest" layerLimitForRest)
          ]
        else
          [
            ["subcomponent_in" splitWhat]
          ]
      ;
    in [
      "over"
      "main"
      [
        "pipe"
        mainProcessingStrategy
      ]
    ];

  layeringPipeline = [
    ["subcomponent_out" [models]]
    [
      "over"
      "rest"
      [
        "pipe"
        [
          ["split_paths" [cudatoolkit]]
          (splitFromMain [cudatoolkit] 10)
          (popularityHelper "common" 30)
          [
            "over"
            "rest"
            [
              "pipe"
              [
                ["split_paths" [gateway]]
                (splitFromMain [gateway] 1)
                # (popularityHelper "common" 30)  # cannot use this because of TypeError: reduce() of empty sequence with no initial value
                [
                  "over"
                  "rest"
                  [
                    "pipe"
                    [
                      ["split_paths" [segmenter]]
                      (splitFromMain [segmenter] 1)
                      # (popularityHelper "common" 30)  # cannot use this because of KeyError: 'common'
                      [
                        "over"
                        "rest"
                        [
                          "pipe"
                          [
                            ["split_paths" [translator]]
                            (splitFromMain [translator] 4)
                            # (popularityHelper "common" 30)
                            [
                              "over"
                              "rest"
                              [
                                "pipe"
                                [
                                  ["split_paths" [marian-server]]
                                  (splitFromMain [marian-server] 2)
                                  # (popularityHelper "common" 30)
                                  [
                                    "over"
                                    "rest"
                                    [
                                      "pipe"
                                      [
                                        ["split_paths" [pkgs.python38Packages.supervisor]]
                                        # (splitFromMain [pkgs.python38Packages.supervisor] 10)
                                        # (popularityHelper "common" 30)
                                        [
                                          "over"
                                          "rest"
                                          [
                                            "pipe"
                                            [
                                              ["subcomponent_out" [gateway_conf translator_conf supervisord_conf]]
                                              # (popularityHelper "rest" 5)
                                            ]
                                          ]
                                        ]
                                      ]
                                    ]
                                  ]
                                ]
                              ]
                            ]
                          ]
                        ]
                      ]
                    ]
                  ]
                ]
              ]
            ]
          ]
        ]
      ]
    ]
  ];
in {} # ...

Maybe I'm using it wrong, but it would be cool if I could avoid all the nesting here and instead write something more linear that automatically pipes the "rest" part to the next step (avoiding "over").
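A small fold could generate that nesting automatically. The helper below is hypothetical (not part of this PR); it assumes `pkgs.lib` is in scope and that each stage leaves its remainder under the "rest" key, as in the samples above:

```nix
let
  inherit (pkgs) lib;

  # Hypothetical helper, not part of this PR: folds a flat list of stages
  # into the nested ["over" "rest" ["pipe" ...]] structure, so each stage's
  # leftover "rest" is piped into the next stage automatically.
  chainRest = stages: final:
    lib.foldr
      (stage: acc: [ stage [ "over" "rest" [ "pipe" acc ] ] ])
      final
      stages;

  layeringPipeline = chainRest [
    [ "split_paths" [ cudatoolkit ] ]
    [ "split_paths" [ gateway ] ]
    [ "split_paths" [ segmenter ] ]
  ] [
    [ "popularity_contest" ]
    [ "limit_layers" 20 ]
  ];
in
layeringPipeline
```

With `lib.foldr`, the last argument becomes the innermost pipe, so the written order matches the order in which the splits are applied.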

@roberth
Member

roberth commented Mar 11, 2022

I think what this needs is

  1. rebase
  2. a test case reproducing rihardsk's problem, assuming it also affects the default pipeline
  3. some expectation management regarding the dsl. It's in development and is very likely to change. Worst case, users can delete their pipelines, so that's not terrible, but it is when they don't expect to have to.
  4. merge

@roberth
Member

roberth commented Mar 11, 2022

@rihardsk it seems that you're most knowledgeable about the issue. Do you think you can write a test case?

@adrian-gierakowski
Contributor Author

@SuperSandro2000 which tool did you use to format pkgs/build-support/docker/default.nix in this commit? I hope to make merge conflicts more manageable by running the same tool on the branch of this PR before rebasing it on top of master. Thanks!

@adrian-gierakowski adrian-gierakowski force-pushed the ag/docker-customisable-layering-strategy branch from 138a673 to 5b4a8db Compare November 9, 2024 16:25
@adrian-gierakowski
Contributor Author

adrian-gierakowski commented Nov 9, 2024

Just rebased on master, fixing merge conflicts, applying nixfmt to the new nix files and squashing all commits into one.

I've backed up the previous state of this branch at https://github.com/adrian-gierakowski/nixpkgs/tree/ag/docker-customisable-layering-strategy-before-rebase-and-nixfmt-2024-11-09

@adrian-gierakowski
Contributor Author

Also ran nix-build nixos/tests/docker-tools.nix locally and the tests passed ✅

# These derivations are only created as implementation details of docker-tools,
# so they'll be excluded from the created images.
unnecessaryDrvs = [ baseJson overallClosure customisationLayer ];
layersJsonFile = buildPackages.dockerMakeLayers {
Contributor Author

for people who would like to experiment with different ways of grouping paths into layers, this can now be overridden (at the pkgs level) in order to inject arbitrary code for generating layers

probably not the most friendly interface, but at least it's an improvement over the status quo of streamLayeredImage being tightly coupled to the referencesByPopularity approach

I'd be happy to iterate on this in order to make the injection of layering logic more user friendly.

@colonelpanic8
Contributor

@roberth @tomberek Please merge this or articulate a reason why it can't be merged. It has been more than 3 years. I have been using this to build all the containers at my small business for almost a year now with no issues!

@roberth
Member

roberth commented Nov 9, 2024

@adrian-gierakowski @colonelpanic8 In principle I'm in favor of merging this as soon as 24.11 is branched off, which is planned for Thursday.

Originally I wanted to do more for this PR in terms of review and perhaps improving it further, but that has turned out unrealistic, so I am content that this is at the very least an improvement in terms of maintainability and ability to evolve this solution.

Thank you @adrian-gierakowski for implementing this (and persevering, oh my), @colonelpanic8 for testing, and @tomberek for offering to help with review.

@ofborg ofborg bot removed 2.status: merge conflict This PR has merge conflicts with the target branch 10.rebuild-darwin: 1 10.rebuild-linux: 1 labels Nov 9, 2024
@adrian-gierakowski
Contributor Author

adrian-gierakowski commented Nov 10, 2024

Thanks @roberth!

Would you be willing to nominate me, or support my self-nomination, to be added as a dockerTools maintainer and committer here? I'd be happy to iterate on this to make further improvements and eventually stabilise the interface. Thanks!

@roberth
Member

roberth commented Nov 10, 2024

I will, but anyone who only looks at the numbers may disagree. I can trust you because we've talked directly and you've shown professionalism etc., but that won't apply to others, so it's probably best to wait a bit. Until then, ping me so I can check a couple of things and merge for you. These won't necessarily be in-depth reviews or very large diffs, so that's two reasons why it will be much quicker, especially compared to this one.

@tomberek
Contributor

@adrian-gierakowski i messaged you in Matrix. Any other good way to get in touch?

@adrian-gierakowski
Contributor Author

@tomberek I don't see your message, but just sent you one on matrix as well (found you on #users:nixos.org)

@adrian-gierakowski
Contributor Author

- addressed all recent comments
- ran tests locally and they still pass

Contributor

@tomberek tomberek left a comment

Merging this as a refactor of the underlying algorithm, reviewed by @roberth and myself. The DSL is still an internal representation. It still needs good documentation, helper wrappers, and perhaps some design iteration.

I am also keeping an eye on maintainability. If this turns out to be too expensive, we should consider reverting.

@tomberek tomberek merged commit 4703b8d into NixOS:master Nov 30, 2024
16 of 17 checks passed
@tomberek
Contributor

How has this gone so far? Any issues?

@kevinmehall
Contributor

As someone who has been waiting for this functionality for a long time, I just tried it out. The pipeline syntax could definitely use more documentation and examples, but now that I've wrapped my head around it, it works very well.

A few snippets for common use cases:

Flatten python3 and all its dependencies into a single common base layer; handle other paths as before:

[
    ["subcomponent_out" [pkgs.python3]]
    ["over" "rest" ["pipe" [
        ["popularity_contest"]
        ["limit_layers" 100]
    ]]]
]

Force the often-changing custom application code into its own layer

[
    ["subcomponent_in" [my_app]]
    ["over" "rest" ["pipe" [
        ["popularity_contest"]
        ["limit_layers" 100]
    ]]]
]

Combining the two

[
    ["subcomponent_in" [my_app]]
    ["over" "rest" ["pipe" [
        ["subcomponent_out" [pkgs.python3]]
        ["over" "rest" ["pipe" [
            ["popularity_contest"]
            ["limit_layers" 100]
        ]]]
    ]]]
]

The ["over" "rest" ["pipe" [ prefix and the resulting rightward drift are a bit awkward. But in any case, it does what I need it to do, and thanks for finally getting this landed!
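For anyone landing here first: these snippets plug into the image builders via the new layeringPipeline argument described at the top of this PR. The name and contents below are illustrative:

```nix
pkgs.dockerTools.streamLayeredImage {
  name = "my-app";        # illustrative
  contents = [ my_app ];  # illustrative
  # Keep the frequently-changing application in its own layer and layer
  # everything else by popularity, capped at 100 layers.
  layeringPipeline = [
    [ "subcomponent_in" [ my_app ] ]
    [ "over" "rest" [ "pipe" [
      [ "popularity_contest" ]
      [ "limit_layers" 100 ]
    ] ] ]
  ];
}
```

The same argument is accepted by buildLayeredImage, with the default value chosen to preserve the previous layering behaviour.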

@colonelpanic8
Contributor

I was already using it on an older forked version of nixpkgs, but I have now moved to the version in unstable and I'm (pretty unsurprisingly) having no issues!
