Artifact versions ordering #101

Open
vsevel opened this issue Aug 12, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@vsevel

vsevel commented Aug 12, 2022

When Artifact CRs are applied in parallel, schemas are provisioned in the registry in a nondeterministic order.
For instance, if I apply:

apiVersion: artifact.apicur.io/v1alpha1
kind: Artifact
metadata:
  name: x-1
...
spec:
  artifactId: x
  type: AVRO
  version: '1'

and at the same time:

apiVersion: artifact.apicur.io/v1alpha1
kind: Artifact
metadata:
  name: x-2
...
spec:
  artifactId: x
  version: '2'

Sometimes the versions are registered as 1 then 2 (2 is the latest), and sometimes as 2 then 1 (1 is the latest).
We need versions to be strictly ordered in a deterministic way.

Specifically, if on day 1 I push version 2, it should be considered the only and latest version.
If on day 2 I push version 1, it should be created in the registry prior to version 2, and latest should stay on version 2.

With that improvement, I could apply Artifact CRs in any order, which also means I could apply them in parallel.

@EricWittmann EricWittmann added the bug Something isn't working label Aug 16, 2022
@EricWittmann
Member

This is an interesting use-case. The issue we have today is that version numbers are not meaningful to the registry, especially when provided by the client (as I think is being done in this case). In other words, registry doesn't really know that 2 should come after 1. So if 2 is added first, and then 1 is added, registry will think that 1 is the latest version. Which is obviously not ideal.

So it would be up to the client to ensure proper ordering of operations based on whatever criteria is desired. In this case, I think you're suggesting (correctly) that the sync operator should sort the versions to ensure consistent ordering. I agree, although we should probably support three (configurable?) sorting algorithms:

  1. Simple numeric (interpret the version as a number and do a numeric sort)
  2. Simple alphabetic
  3. Semver
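The three strategies listed above could be sketched as sort keys, roughly as follows. This is only an illustration in Python, not code from the sync operator or the registry; the names `numeric_key`, `alphabetic_key`, `semver_key`, `SORTERS` and `sort_versions` are hypothetical, and the semver handling is a simplified reading of SemVer 2.0.0 precedence (build metadata is ignored, validation is loose).

```python
import re

# Hypothetical helpers for illustration only; none of these names exist
# in Apicurio Registry or the sync operator.

def numeric_key(version: str):
    """Simple numeric: interpret the version string as an integer."""
    return int(version)

def alphabetic_key(version: str):
    """Simple alphabetic: compare the raw strings."""
    return version

_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?$"
)

def semver_key(version: str):
    """Simplified SemVer precedence: major.minor.patch, then pre-release."""
    m = _SEMVER.match(version)
    if m is None:
        raise ValueError(f"not a semver string: {version!r}")
    major, minor, patch = int(m.group(1)), int(m.group(2)), int(m.group(3))
    pre = m.group(4)
    if pre is None:
        # A release sorts after any pre-release of the same core version.
        return (major, minor, patch, 1, ())
    # Numeric identifiers sort before alphanumeric ones and compare as numbers.
    ids = tuple(
        (0, int(p), "") if p.isdigit() else (1, 0, p) for p in pre.split(".")
    )
    return (major, minor, patch, 0, ids)

# The configurable part: pick a strategy by name.
SORTERS = {
    "numeric": numeric_key,
    "alphabetic": alphabetic_key,
    "semver": semver_key,
}

def sort_versions(versions, strategy):
    """Return the versions ordered oldest to newest under the given strategy."""
    return sorted(versions, key=SORTERS[strategy])
```

Note that the strategies can disagree on the same input: under the numeric strategy "10" comes after "2", while under the alphabetic one it comes before it, which is one reason the choice would need to be explicit.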

@vsevel
Author

vsevel commented Aug 17, 2022

The issue we have today is that version numbers are not meaningful to the registry, especially when provided by the client

hmm. that is not what I see. If I create a spec with:

spec: 
  name: my-movie
  artifactId: org.acme.Movie
  groupId: default
  version: 'foo'

then I do see it in the registry, as expected:

$ curl -k  https://broker-schema-registry-infra-kafka-sbiz.apps.dev.ocp.dev.biz.lodh.com/apis/registry/v2/groups/default/artifacts/org.acme.coucou.Movie2/versions
{"count":1,"versions":[{"name":"my-movie","description":"super movie avro","createdOn":"2022-08-17T15:36:55+0000","createdBy":"","type":"AVRO","labels":["avro","kafka","ocpdeploy"],"state":"ENABLED","globalId":165,"version":"foo","properties":{"custom1":"bubu","custom2":"lala"},"contentId":98}]}

So it would be up to the client to ensure proper ordering of operations based on whatever criteria is desired.

Sorting when all artifacts are already there is one thing,
but I think this should support out-of-order apply as well.
In other words, I should be able to apply v2, then later v1, and still end up with v1 being considered an older version than v2 (i.e. v2 is latest).

This is very important because GitOps tooling does not always enforce strict ordering; it assumes the desired-state principle only.
For instance, Argo enforces strict ordering between waves only, and using waves to order artifacts is overkill.
Inside a wave, Argo orders object types (e.g. ConfigMap first). But inside an object type (e.g. Artifact), I am not sure there is a strict guarantee that v1 will be applied before v2 just because they are in sequence in the same document.
And even if they are, and Argo honors this, Argo will not wait for the individual statuses. It only does this at the end of the wave.
This opens up edge cases where Argo applies v1 and, for some reason, the operator fails to create the artifact (e.g. a temporary connectivity issue with the registry); then Argo applies v2, and the operator creates v2 as the latest (without v1).
Later the operator retries v1 in its reconcile loop and adds it to the list of versions. Now v1 is latest.
At that point the outcome depends on the operator's failure handling. We do not want to guess.
In this situation, I should be able to apply v1 then v2, or v2 then v1, with the exact same result, assuming I chose the versions to be strictly ordered according to the version field.
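To make the requirement concrete, here is a toy model in Python. It is not the operator's or the registry's code; `latest_by_arrival` and `latest_by_version` are hypothetical names, and the numeric ordering of the version field is an assumption. It contrasts "latest is whatever was created last" (the behavior described in this thread) with "latest is the highest version according to the version field".

```python
from itertools import permutations

# Toy model only: a list of version strings in the order they were created.

def latest_by_arrival(applied):
    """Latest is whichever version was created last."""
    return applied[-1]

def latest_by_version(applied, key=int):
    """Latest is the highest version under the chosen ordering (numeric here)."""
    return max(applied, key=key)

if __name__ == "__main__":
    for order in permutations(["1", "2"]):
        applied = list(order)
        print(
            "applied:", applied,
            "| latest by arrival:", latest_by_arrival(applied),
            "| latest by version:", latest_by_version(applied),
        )
```

With arrival-based ordering the result changes with the apply order, so a retried v1 can end up as latest. With version-based ordering every permutation gives the same latest, which is the property needed to apply Artifact CRs in any order or in parallel.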

we should probably support three (configurable?) sorting algorithms:

seems reasonable.

@EricWittmann
Member

This is very important because GitOps tooling does not always enforce strict ordering; it assumes the desired-state principle only. For instance, Argo enforces strict ordering between waves only, and using waves to order artifacts is overkill. Inside a wave, Argo orders object types (e.g. ConfigMap first). But inside an object type (e.g. Artifact), I am not sure there is a strict guarantee that v1 will be applied before v2 just because they are in sequence in the same document. And even if they are, and Argo honors this, Argo will not wait for the individual statuses. It only does this at the end of the wave. This opens up edge cases where Argo applies v1 and, for some reason, the operator fails to create the artifact (e.g. a temporary connectivity issue with the registry); then Argo applies v2, and the operator creates v2 as the latest (without v1). Later the operator retries v1 in its reconcile loop and adds it to the list of versions. Now v1 is latest. At that point the outcome depends on the operator's failure handling. We do not want to guess. In this situation, I should be able to apply v1 then v2, or v2 then v1, with the exact same result, assuming I chose the versions to be strictly ordered according to the version field.

Understood. We've been discussing the gitops use-case internally the last week or two, actually. There is definitely some work to do to facilitate it. There are some design choices (immutability of versions, for example) that are impactful. We may need to loosen some restrictions we currently have, and add some features we currently do not have (like deleting versions), in order to properly support the use-case. With the assumption that in a GitOps configuration, only one (or very few) service accounts will be responsible for all WRITE operations. And users will be restricted to READ-ONLY access.

@vsevel
Author

vsevel commented Aug 18, 2022

With the assumption that in a GitOps configuration, only one (or very few) service accounts will be responsible for all WRITE operations. And users will be restricted to READ-ONLY access.

yes. this is exactly our situation.

@vsevel
Author

vsevel commented Oct 12, 2022

hello, any news on this?

@EricWittmann
Member

Nothing concrete at the moment. The GitOps use-case is important and something we're talking about in various contexts here at Red Hat. But we don't have a specific path planned yet, I'm sorry to say.

@vsevel
Author

vsevel commented Mar 15, 2023

any news on this?

@carlesarnal
Member

carlesarnal commented Dec 13, 2023

The only news we have on this is that we have started working on a GitOps storage variant for Apicurio Registry v3 that will add those capabilities to the application. Once v3 is released, there will be documentation for it.
