Support encoding comments and specifying the encoding format #328

arp242 · 2021-11-16T12:08:02Z

This allows encoding comments and setting some flags to control the
format. While toml.Marshaler added in #327 gives full control over how
you want to format something, I don't think it's especially
user-friendly to tell everyone to create a new type with all the
appropriate formatting, escaping, etc. The vast majority of use cases
probably just fall into a few simple categories such as "use `"""`" or
"encode this number as hex".

I grouped both features together as they're closely related: both set
additional information on how to write keys.

What I want is something that:

1. allows setting attributes programmatically;
2. supports round-tripping by default on a standard struct;
3. plays well with other encoders/decoders;
4. has a reasonably uncumbersome API.

Most options (custom types, struct tags) fail at least one of these;
there were some PRs for struct tags, but they fail at 1, 2, and
(arguably) 4. Custom types fail at 2, 3, and probably 4.

This adds SetMeta() to the Encoder type; this is already what we have
when decoding, and all additional information will be set on it. On
MetaData we add the following methods:

SetType()       Set TOML type info.
TypeInfo()      Get TOML type info.
Doc()           Set "doc comment" above the key.
Comment()       Set "inline comment" after the key.

Every TOML type has a type in this package, which supports different
formatting options (see type_toml.go):

Bool
String
Int
Float
Datetime
Table
Array
ArrayTable

For example:

meta := toml.NewMetaData().
        SetType("key", toml.Int{Width: 4, Base: 16}).
        Doc("key", "A codepoint").
        Comment("key", "ë")
// The value is an integer (0xeb == 235) so it can be written in hex.
toml.NewEncoder(os.Stdout).SetMeta(meta).Encode(struct {
        Key int `toml:"key"`
}{0xeb})

Would write:

# A codepoint
key = 0x00eb  # ë
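
As a rough illustration of how a doc comment, a zero-padded hex value, and an inline comment combine into the lines above, here is a minimal self-contained sketch. The names formatHex and writeKey are hypothetical and are not part of this PR or of the toml package; they only mimic the described behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// formatHex zero-pads v to at least width hex digits, mimicking what
// toml.Int{Width: 4, Base: 16} is described as doing (hypothetical helper).
func formatHex(v uint64, width int) string {
	return fmt.Sprintf("0x%0*x", width, v)
}

// writeKey renders an optional doc comment above the key, the key/value
// pair, and an optional inline comment after it (hypothetical helper).
func writeKey(key, value, doc, comment string) string {
	var b strings.Builder
	if doc != "" {
		fmt.Fprintf(&b, "# %s\n", doc)
	}
	fmt.Fprintf(&b, "%s = %s", key, value)
	if comment != "" {
		fmt.Fprintf(&b, "  # %s", comment)
	}
	b.WriteString("\n")
	return b.String()
}

func main() {
	fmt.Print(writeKey("key", formatHex(0xeb, 4), "A codepoint", "ë"))
}
```

The sketch keeps the value formatting and the comment placement separate, which is roughly the split between SetType() and Doc()/Comment() in the description.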

It also has Key() to set the type and the comments in one call:

toml.NewMetaData().
    Key("key", toml.Int{Width: 4, Base: 16}, toml.Doc("A codepoint"), toml.Comment("ë")).
    Key("other", toml.Comment("..."))

The advantage of this is that it reduces the number of times you have to
type the key string to 1, but it uses interface{}. Not yet decided which
one I'll stick with, and also not a huge fan of Doc() and Comment(), but
I can't really think of anything clearer at the moment (these are the
names the Go ast uses).
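
To make the interface{} trade-off concrete, here is a minimal sketch of how a variadic Key() could dispatch on option types. Everything in it (MetaData, keyInfo, Doc, Comment, Int) is a simplified stand-in written for illustration, assuming the options are plain Go types; it is not the implementation in this PR.

```go
package main

import "fmt"

// Hypothetical stand-ins for the option types named in the description.
type (
	Doc     string
	Comment string
	Int     struct{ Width, Base int }
)

// keyInfo collects everything recorded for one key.
type keyInfo struct {
	typ     interface{}
	doc     string
	comment string
}

type MetaData struct{ keys map[string]*keyInfo }

func NewMetaData() *MetaData { return &MetaData{keys: make(map[string]*keyInfo)} }

// Key records every option for key in one call. Because opts is
// ...interface{}, a wrong argument type is only caught at run time.
func (m *MetaData) Key(key string, opts ...interface{}) *MetaData {
	info, ok := m.keys[key]
	if !ok {
		info = &keyInfo{}
		m.keys[key] = info
	}
	for _, o := range opts {
		switch v := o.(type) {
		case Doc:
			info.doc = string(v)
		case Comment:
			info.comment = string(v)
		case Int:
			info.typ = v
		default:
			panic(fmt.Sprintf("Key(%q): unsupported option type %T", key, o))
		}
	}
	return m
}

func main() {
	m := NewMetaData().
		Key("key", Int{Width: 4, Base: 16}, Doc("A codepoint"), Comment("ë")).
		Key("other", Comment("..."))
	fmt.Printf("%+v\n", *m.keys["key"])
}
```

The default branch is where the cost of interface{} shows up: passing a bare string instead of Doc("...") compiles fine and only fails when the code runs.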

Decode() sets all this information on the MetaData, so this:

meta, _ := toml.Decode(..)
toml.NewEncoder(os.Stdout).SetMeta(meta).Encode(..)

Will write it out as "key = 0x00eb" again, rather than "key = 235".

This way, pretty much any flag can be added programmatically without
getting in the way of JSON/YAML/whatnot encoding/decoding.

I don't especially care for how you need to pass the keys as strings, but
there isn't really any good way to do it otherwise. There is also the
problem that the "key" as found in the parser may be different from the
"key" the user is expecting if you don't use toml struct tags:

type X struct { Key int }

Will read "key = 2" into "Key", but when encoding it will write it as
"Key" rather than "key". The type information will be set on "key", but
when encoding it will look for "Key", so round-tripping won't work
correctly, and there is potential for confusion if the wrong key is set.

This is not so easy to fix since we don't have access to the struct in
the parser. I think it's fine to just document this as a caveat and tell
people to use struct tags, which is a good idea in any case.

I'm not necessarily opposed to also adding struct tags for most of these
things, although I'm not a huge fan of them. Since struct tags can't be
set programmatically they're not really suitable for many use cases (e.g.
setting comments dynamically, using multiline strings only if the string
contains newlines, etc.). It's something that could maybe be added in a
future PR, if a lot of people ask for it.
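
The "multiline strings only if the string contains newlines" case shows why a static struct tag falls short: the choice depends on the value at run time. A minimal sketch, with a hypothetical formatString helper that escapes only backslashes and double quotes (other control characters that TOML requires escaping are out of scope here):

```go
package main

import (
	"fmt"
	"strings"
)

// escaper handles only backslash and double quote; this is a deliberate
// simplification, not a complete TOML string escaper.
var escaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`)

// formatString picks the multiline form only when s contains a newline,
// a decision that depends on the value and so cannot live in a struct tag.
func formatString(s string) string {
	if strings.Contains(s, "\n") {
		// A newline right after the opening """ is trimmed by TOML parsers,
		// so it does not become part of the value.
		return "\"\"\"\n" + escaper.Replace(s) + "\"\"\""
	}
	return `"` + escaper.Replace(s) + `"`
}

func main() {
	fmt.Println(formatString("one line"))
	fmt.Println(formatString("first\nsecond"))
}
```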

Fixes #64
Fixes #75
Fixes #160
Fixes #192
Fixes #213
Fixes #269

tschmidtb51 · 2022-02-10T20:07:44Z

@arp242 Any updates on this one?

arp242 · 2022-02-10T20:18:49Z

> Any updates on this one?

I ran into a problem: to support this properly I need to change the way keys are addressed in the MetaData type, because right now you can't address certain keys: see #169.

So that syntax needs to be changed, but that isn't a backwards-compatible change :-( My plan is to fix most of the outstanding bugs etc. and then release a v2 with this change, but I just haven't had the time.

tschmidtb51 · 2022-02-10T20:22:47Z

Thanks for the update!

arp242 force-pushed the as branch 3 times, most recently from 367083a to 030ae5e Compare November 24, 2021 19:03

arp242 force-pushed the as branch from 030ae5e to e891083 Compare November 24, 2021 19:49

bernhardreiter mentioned this pull request Feb 9, 2022

Document provider config options gocsaf/csaf#40

Merged

arp242 added the v2 label May 27, 2022

arp242 mentioned this pull request Sep 27, 2024

Use base64 to encode []byte by default #423

Open
