Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to include fields that are equal to the default value in the JSON encoding #1526

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mrabiciu
Copy link

This adds the ability to include fields that are equal to the default value in the json encoding.

This is done by adding a traversalOptions var to Visitor as well as adding a includeDefaultValues to JSONEncodingOptions which controls the JSONEncodingVisitor's traversalOptions

I've modified the generation of traverse to check traversalOptions before calling visit on each individual field.

@mrabiciu
Copy link
Author

mrabiciu commented Dec 13, 2023

This commit has all of the functional changes 59a6212

@thomasvl
Copy link
Collaborator

Do the other languages end up encoding the empty arrays when visiting default values? i.e. - does it make sense to visit on those or just skip the visit calls still? We probably want to make sure our JSON support ends up doing the same things the other upstream languages do.

In any case, since this is public api, we should spell out in the comments what the expected behavior is for all cases:

  • repeated fields
  • map fields
  • extension fields (all types)
  • proto2 syntax files (since non repeated field types have presence all have presence)
  • proto3 syntax files where the field isn't optional (expect the visit)
  • proto3 syntax files where a field is optional (likely should be the same as proto2 syntax files)

And then use a custom listener in the test to ensure all these behaviors for all the different message types.

@thomasvl
Copy link
Collaborator

For the comments spelling out the behaviors, I guess we should use the terms from the upstream pages:

@thomasvl thomasvl requested a review from tbkka December 13, 2023 20:39
@mrabiciu
Copy link
Author

Do the other languages end up encoding the empty arrays when visiting default values?

Yes I'm using the python package for comparison and if you set including_default_value_fields=True it includes empty lists and empty maps.

https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html

including_default_value_fields – If True, singular primitive fields, repeated fields, and map fields will always be serialized. If False, only serialize non-empty fields. Singular message fields and oneof fields are not affected by this option.

@mrabiciu
Copy link
Author

And then use a custom listener in the test to ensure all these behaviors for all the different message types

I'm not sure what you mean by that. Is an encoding test that asserts the encoded json is as expected not enough?

@mrabiciu mrabiciu force-pushed the simple-default-traversal branch from 7319183 to 6d44684 Compare December 14, 2023 00:15
@thomasvl
Copy link
Collaborator

And then use a custom listener in the test to ensure all these behaviors for all the different message types

I'm not sure what you mean by that. Is an encoding test that asserts the encoded json is as expected not enough?

Haven't looked in detail, it might be. I wasn't sure if json had any specific logic of it's own vs. ensure all the methods expected did/didn't get called.

@thomasvl
Copy link
Collaborator

I finally got a chance to start looking at this, and I'm going to suggest we step back to hopefully won't be a bike shed –

As drafted, the PR calls the JSONEncodingOptions addition includeDefaultValues. And Visitor gets TraversalOptions which has visitDefaultValues.

The looking at the C++ api, their JSON is always_print_primitive_fields

The Java api has: alwaysOutputDefaultValueFields & includingDefaultValueFields

I don't think we want to eventually try to do the level of the api that Java has (the Set support would really complicate the visitor pattern).

So on the naming side, my 2¢ would be:

I think default gets a little confusing around the behavior with repeated fields and map<> fields. So I think I sorta like the C++ naming because it is about primitive fields, there still is a minor stretch in that a repeated or map field is primitive (it's an array or a map), but it a single message field clearly isn't a primitive field. So…

  • Visitor gets TraversalOptions with alwaysVisitPrimitiveFields.
  • JSONEncodingOptions gets alwaysPrintPrimitiveFields

The JSON value lines up with the two other names we already have that start alwaysPrint…, and the naming gnerally lines up with the C++ api (which is what I think we normally look to).

Thoughts?

@tbkka what makes sense to you?

@tbkka
Copy link
Collaborator

tbkka commented Dec 14, 2023

I like Thomas' naming ideas here.

Copy link
Collaborator

@thomasvl thomasvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the test case added is in the regenerate commit, so I almost missed that addition.

@@ -47,6 +47,10 @@ class MessageFieldGenerator: FieldGeneratorBase, FieldGenerator {
return false
}
}

var usesDefaultValueFlagForTraversal: Bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, we don't want to have to check syntax any more (working towards Editions). Would just looking at say hasFieldPresence and if the field is repeated/map vs. singular work here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a difference in behavior between proto2 and proto3 though. At least in the python lib which I was using for reference. In proto3 unset optional primitives are omitted from json, and in proto2 unset optional primitives are included. I'm not sure if theres a better way to do this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Descriptor support should already be tracking most of that: https://github.com/apple/swift-protobuf/blob/main/Sources/SwiftProtobufPluginLibrary/Descriptor.swift#L638-L642 What's the python code you're looking at for reference?

Copy link
Author

@mrabiciu mrabiciu Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look at python code, just observed how the python lib behaves when serializing a message with every kind of field type.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an example of what I mean

Proto3 behavior

syntax = "proto3";

message Proto3Message {
    int32 singularInt32 = 1;
    repeated int32 repeatedInt32 = 2;
    optional int32 optionalInt32 = 3;
}
print(MessageToJson(Proto3Message(), including_default_value_fields=True))
{
  "singularInt32": 0,
  "repeatedInt32": []
}

Proto2 behavior

message Proto2Message {
    required int32 singularInt32 = 1;
    repeated int32 repeatedInt32 = 2;
    optional int32 optionalInt32 = 3;
}
print(MessageToJson(Proto2Message(), including_default_value_fields=True))
{
  "singularInt32": 0,
  "repeatedInt32": [],
  "optionalInt32": 0
}

hasPresence will be true for optionalInt32 for both proto2 and proto3 but we want to print it in proto2 and omit it in proto3

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good I can work on that. Thank you!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a docs rollout is happening:

https://protobuf.dev/programming-guides/proto2/#json-options

Always emit fields without presence: Fields that don’t support presence and that have their default value are omitted by default in JSON output (for example, an implicit presence integer with a 0 value, implicit presence string fields that are empty strings, and empty repeated and map fields). An implementation may provide an option to override this behavior and output fields with their default values.

As of v25.x, the C++, Java, and Python implementations are nonconformant, as this flag affects proto2 optional fields but not proto3 optional fields. A fix is planned for a future release.

https://protobuf.dev/programming-guides/proto3/#json-options

Always emit fields without presence: Fields that don’t support presence and that have their default value are omitted by default in JSON output (for example, an implicit presence integer with a 0 value, implicit presence string fields that are empty strings, and empty repeated and map fields). An implementation may provide an option to override this behavior and output fields with their default values.

Copy link
Author

@mrabiciu mrabiciu Feb 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by those docs for proto2

(for example, an implicit presence integer with a 0 value, implicit presence string fields that are empty strings, and empty repeated and map fields). An implementation may provide an option to override this behavior and output fields with their default values.

Don't integer and string fields always have presence under proto2? Under what scenario would we actually emit these in proto2?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the logic to just use hasPresence for proto2 the same way as in proto3. But I think those docs could use some clarification.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please feel free to open an issue upstream with the feedback. Some times it helps to hear directly from devs.

Sources/protoc-gen-swift/MessageFieldGenerator.swift Outdated Show resolved Hide resolved
Sources/protoc-gen-swift/MessageFieldGenerator.swift Outdated Show resolved Hide resolved
Sources/protoc-gen-swift/MessageFieldGenerator.swift Outdated Show resolved Hide resolved
Sources/SwiftProtobuf/Visitor.swift Outdated Show resolved Hide resolved
Sources/SwiftProtobuf/Visitor.swift Outdated Show resolved Hide resolved
@thomasvl
Copy link
Collaborator

@mrabiciu looks like you might need to sync also to resolve some conflicts.

@thomasvl
Copy link
Collaborator

@tbkka can you take a look, I think given the new docs from the protobuf team, we're likely pretty good to go on this.

@mrabiciu mrabiciu force-pushed the simple-default-traversal branch from 8b366c4 to 49c7175 Compare February 21, 2024 20:55
@mrabiciu
Copy link
Author

Rebased on main and squashed all my commits

@mrabiciu mrabiciu force-pushed the simple-default-traversal branch from 49c7175 to ba9708f Compare February 22, 2024 23:46
@ChristopherDrum
Copy link

ChristopherDrum commented Dec 26, 2024

@thomasvl It's been 10 months now. Was there a blocker on this? It seemed at the time like it was ready to merge in. Has this been supplanted by another PR or methodology? Any guidance from the team on how we should deal with this situation in the meantime?

Edit: Found the answer in another thread from another thread. Have to wait for v2 apparently.
#1504 (comment)

@thomasvl
Copy link
Collaborator

New apis/additions to the library generally get signoff from someone at Apple since they own the repo.

Currently they want to ensure any changes/additions to the repo are non breaking as a major version bump could cause issues for downstream packages that also depend on this repo.

Copy link
Member

@FranzBusch FranzBusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a sensible change to me. This is only additive in adding new encoding options and it adds new test cases to check that this option is working correctly. The change is non breaking since we provide a default implementation of the new protocol requirement.

@gjcairo
Copy link
Contributor

gjcairo commented Jan 2, 2025

@tbkka left a comment here in 2023 for another change that also only adds new APIs, but he mentioned it's breaking because it's breaking between the runtime and the generated code (kind of similarly to this PR, new API was added to the Message protocol, which means the generated code has to be re-generated to build).

I agree there are only API additions with default implementations on this PR, but if you have mismatching versions for the generated code and SwiftProtobuf (specifically, newer generated code running with an older swift-protobuf), things will break because the generated code now references a new field in the Visitor.

FWIW, I don't think these count as breaking changes (so I agree with @FranzBusch approving this): if someone went to the trouble of updating the generated code with a newer version of the plugin, why would they keep the old version of the runtime (or vice versa)? As long as users are updating both things simultaneously they should be safe, and I don't see why this wouldn't happen, to be honest.

@thomasvl In the README we suggest that you should try to keep the versions of SwiftProtobuf and protoc-gen-swift matching, but have we always been clear about what we officially support here?

@thomasvl
Copy link
Collaborator

thomasvl commented Jan 2, 2025

@thomasvl In the README we suggest that you should try to keep the versions of SwiftProtobuf and protoc-gen-swift matching, but have we always been clear about what we officially support here?

I don't think we have. Since we've never come up with a better way to do the version mismatch checks compared to what the C based languages can do, it's sorta a space we've always struggled with.

@thomasvl
Copy link
Collaborator

thomasvl commented Jan 2, 2025

I guess it isn't a "break", but the potential catch I can see is if someone depends on protos compiled into another package, they would need those also updated for this option to work since the support is captured in the generated code.

The only other thing I can think of that has been a concern on additions is code size increase, and this does increase the generated code size (which I believe is what pushed some of the investigation into table based generation to try and factor things like this out).

/// - repeated fields: Will always be visited, even if they are empty.
/// - map fields: Will always be visited, even if they are empty.
/// - singular message and group fields: Will be visited only if they are set.
/// - singular primitive fields: Will always be visited under proto2. Under proto3 unset optional primitive fields will not be visited.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc string likely needs an update since everything is about Editions now upstream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants