Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UTF-8 decode should not be required for response.clientDataJSON and cData #2100

Open
zacknewman opened this issue Jul 18, 2024 · 2 comments
Open
Assignees
Labels
@Risk Items that are at risk for L3 type:technical

Comments

@zacknewman
Copy link
Contributor

zacknewman commented Jul 18, 2024

Currently the spec states:

Let JSONtext be the result of running UTF-8 decode on the value of response.clientDataJSON.

Note: Using any implementation of UTF-8 decode is acceptable as long as it yields the same result as that yielded by the UTF-8 decode algorithm. In particular, any leading byte order mark (BOM) MUST be stripped.

for step 5 in Registering a New Credential and

Let JSONtext be the result of running UTF-8 decode on the value of cData.

Note: Using any implementation of UTF-8 decode is acceptable as long as it yields the same result as that yielded by the UTF-8 decode algorithm. In particular, any leading byte order mark (BOM) MUST be stripped.

for step 8 in Verifying an Authentication Assertion.

This seems slightly too strict. While the notes call out stripping a BOM, they also state "yields the same result …" (emphasis added); however UTF-8 decode requires decoding with the "replacement" handler as well.

According to the serialization of the CollectedClientData, it is impossible for invalid UTF-8 to be generated. This means that RPs should only have to worry about stripping a BOM but not replacing invalid UTF-8 code units with the "replacement character" (i.e., U+FFFD); as the existence of invalid UTF-8 implies the serialization algorithm has not been adhered to as mandated by the spec. I suppose one could argue prepending the "zero width no-break space" character (i.e., U+FEFF) also violates the serialization algorithm; thus interpreting its existence as a byte-order mark (BOM) and subsequently stripping it seems bizarre too.

@zacknewman
Copy link
Contributor Author

zacknewman commented Jul 19, 2024

To add to this, the limited verification algorithm doesn't even rely on UTF-8 decode or any other decoding. This means the whole BOM stripping isn't even applicable to it. Either steps need to be added to both the serialization and limited verification algorithms or UTF-8 decode should not be required. If BOM stripping is necessary, then in order for the limited verification algorithm to be consistent, step 2 should be added to the serialization section:

  1. Let result be an empty byte string.
  2. Optionally append 0xefbbbf (i.e., U+FEFF) to result.
  3. Append 0x7b2274797065223a ({"type":) to result.

Similarly steps 3–5 should be added to the limited verification algorithm:

  1. Let expected be an empty byte string.
  2. If clientDataJSON is not at least 3 bytes long, fail.
  3. Let bom be the first three bytes of clientDataJSON.
  4. If bom is 0xefbbbf (i.e., U+FEFF), then append 0xefbbbf to expected.
  5. Append 0x7b2274797065223a ({"type":) to expected.

@zacknewman
Copy link
Contributor Author

zacknewman commented Jul 27, 2024

I hope this is not misconstrued as a "rant", but I think the real problem is the serialization algorithm itself. It is far too strict. The fact this algorithm is technically required means many clients or user agents will likely accidentally not conform to the spec since they likely use an actual JSON library which will likely not guarantee the order of fields, not guarantee unnecessary whitespace is not used, etc. The spec should be written such that clients and user agents have an easier time to conform to the spec and not a harder time so long as there is not a reduction in security.

The existence of the serialization algorithm seems to violate two big goals of WebAuthn: consistency among RPs in order to mitigate risks related to a fragmented ecosystem that causes users to turn away from WebAuthn altogether and the spec as a web standard.

Personally, I'm sympathetic to libraries that don't kowtow to non-conforming clients/usage; and as an RP writer, I find it hard to "forgive" these non-conforming clients. It is obvious that allowing any conforming JSON, even if it violates the spec, is not a "security issue"; but that creates a slippery slope. While Postel's law is still common, this can cause issues if one is not careful. The purpose of a spec should be to state what is necessary and not rely on implementors to use their best judgment when the spec can, and in this case seemingly should, be violated.

The existence of the serialization algorithm and the related limited verification algorithm is a not-so-subtle way to help non-web platforms—as confirmed by #2102 (comment). When questions come up about native applications and other non-web platforms, the usual (and appropriate) response is something to the effect of "WebAuthn is only a web standard, so concerns with native applications are not applicable"; yet here we have a required algorithm that is designed for non-web platforms.

Ideally the solution would be to state that clients/user agents can send any payload that conforms to RFC 8259 (or whatever formal standard you wish to cite), and they MAY optionally begin the payload with U+FEFF. Then at most a commentary note stating that they SHOULD conform to the serialization algorithm.

With above, an RP writer that relies on the limited verification algorithm cannot claim to be "WebAuthn-conforming" since conforming clients/user agents are allowed to send any JSON (and optional BOM).

Until then, the spec should at least be consistent and ensure the required serialization algorithm and the limited verification algorithm align with other areas of the standard that state BOM stripping MUST occur; because as of now, it's a bizarre requirement to allow payloads that strictly speaking should not be possible.

Related:
#2101
#2102

@nadalin nadalin added this to the L3-WD-02 milestone Aug 14, 2024
@nadalin nadalin added the @Risk Items that are at risk for L3 label Aug 14, 2024
@nadalin nadalin modified the milestones: L3-WD-02, Futures (catch-all) Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@Risk Items that are at risk for L3 type:technical
Projects
None yet
Development

No branches or pull requests

4 participants
@agl @nadalin @zacknewman and others