Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail CLI Test On Unknown Additional Fields in Implementation Not Present in the Contract #403

Open
dataGriff opened this issue Sep 1, 2024 · 5 comments

Comments

@dataGriff
Copy link

There should be a feedback or failure mechanism on the data contract cli test when there are additional fields present in the implementation that are not present in the contract.

Currently you can add fields to the implementaiton of the data product and the data contract cli test will pass.

Adding the failure or feedback mechanism will prevent drift of implementation from the contract.
Ideally I think it should fail to force the updating of the contract.

It is worth noting that the data contract itself can evolve with additional optinal fields before the implementation, as these would be non-breaking forward changes, were the implementation can catch-up later.
However we should never have extra fields in an implementation that are not present in the contract.

Original slack thread.

@jochenchrist
Copy link
Contributor

I think we should make it explicit in the data contract that no other fields are allowed.

Option 1: additionalFields

A new property additionalFields (aligned with OpenAPI's additionalProperties).

  • If additionalFields is not set -> There can be additional fields or not (default)
  • If additionalFields is true -> There are likely additional fields
  • If additionalFields is false -> datacontract test should fail if there are other fields.
models:
  orders:
    description: One record per order. Includes cancelled and deleted orders.
    type: table
    additionalFields: false
    fields:
      order_id:
        $ref: '#/definitions/order_id'
        required: true
        unique: true
        primary: true
      order_timestamp:
        description: The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
        type: timestamp
        required: true
        example: "2024-09-09T08:30:00Z"

Option 2: Config Option

models:
  orders:
    description: One record per order. Includes cancelled and deleted orders.
    type: table
    config:
      failWithAdditionalFields: false
    fields:
      order_id:
        $ref: '#/definitions/order_id'
        required: true
        unique: true
        primary: true
      order_timestamp:
        description: The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
        type: timestamp
        required: true
        example: "2024-09-09T08:30:00Z"

Option 3: Other names

Having a boolean fields with a default of true might not be perfect. Other ideas for better names?

@dataGriff
Copy link
Author

I prefer option 1 as it marries up closer to how you'd handle the same situation with columns (e.g. the property required true/false).

However I think the default behaviour should be false?

additionalFields is false

My expectation would be that the contract can be ahead of the implementation, with fields marked as required : false, but the implementation should never be ahead of the contract? Even allowing that with the additionalFields option being set to true should come with a warning label 😄

In fact... I am talking myself out of there being any behaviour other than the implementation should simply not expect any additional columns compared to the contract and not even be allowed as an option?

Can the whole thing be handled with what is present in the data contract specification and just the extra logic required in the cli?

Proposal

  • The implementation can never be ahead with extra functionality (fields) than what the contract expects.
  • The contract however can evolve first ahead of the implementation with non-breaking changes by having fields set as not required.
  • The user has to ensure they reference the correct version of the specification when validating the implementation using the CLI.

@NiklasA
Copy link
Contributor

NiklasA commented Sep 26, 2024

@jochenchrist I see it the same way as @dataGriff ; I would also reject by default any attributes/columns that are not defined. If the developer wants a different behavior, they can configure that additionally for themselves.

@simonharrer
Copy link
Contributor

required set to false currently has a different meaning, namely, that the values in the field might be absent, not that the column is not available. Currently, there is no way to say that there might be a field.

@jochenchrist
Copy link
Contributor

Ticket for Specification extension: datacontract/datacontract-specification#99

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants